The secret world of testing without mocking: domain-driven design, fakes, and other patterns to simplify testing in a microservice architecture

Much has been said about mocks, the controversial Swiss army knife of test doubles:

...the list goes on. For a tool so easy to misuse, we're using it quite a lot. Mockito is one of the most depended-upon Java libraries in the world.

I was there too once, a frequent Mockito user, perhaps like you are now. Over time, however, as my application architectures improved and I began to introduce real domain models, the tests I wrote became simpler and easier to add, and services became easier to develop. Tricky testing problems that had loomed over my head for years now had obvious solutions. Much to my surprise, I was barely using Mockito at all.

In this post, I demonstrate some compelling and, in my experience, overlooked advantages to mock alternatives. We will explore the origins of mocking, why mocking may have become so ubiquitous, a world without mocking, and the system of incentives, practices, and abstractions that evolve as a result. Whether you are a casual or devout mock-ist, I encourage you to keep calm, open your mind, and try going without for a while. This post will guide you. You may be surprised what you find.

The hidden burdens of mocking

We forget because the APIs are so nice, but mocking is fascinatingly complex under the hood. It's metaprogramming: code that implements types at runtime rather than using native language features to implement types at compile time.

Mockito's API optimizes for immediate convenience–justifiably so–but it's this immediate convenience that dominates our thinking. While less sexy, a compile-time implementation (a class) has its own conveniences. Unfortunately, they are easily overlooked because they take just a little time and investment in the short term before you can see them. By first reviewing some mocking pitfalls, we'll start to see how taking the time to write a class can pay off.

When a class under test has a mocked dependency, the dependency must be stubbed according to the needs of your test. We only stub the methods our class needs for the test, and only for the arguments we expect the class to use.

However, by not stubbing some methods, we imply we know which methods are used. By only stubbing for certain arguments, we imply we know how those methods are used. If our implementation changes, we may need to update our tests, even though the behavior hasn't changed.

// A hypothetical anti-corruption layer encapsulating credit.
// Please forgive the very naive domain model.
interface CreditService {
  CreditStatus checkCredit(AccountId account);
  void charge(AccountId account, Money money);
}

// A hypothetical domain service which depends on a CreditService
class OrderProcessor {
  final CreditService creditService;
  // snip...
  
  void processOrder(AccountId account, Order order) {
    if (HOLD.equals(creditService.checkCredit(account))) {
      throw new CreditHoldException(account, order);
    }
    // snip...
  }
}

class OrderProcessorTest {
  // snip...

  @Test
  void throwsIfAccountOnCreditHold() {
      when(creditService.checkCredit(AccountId.of(1))).thenReturn(HOLD);
      assertThrows(
          CreditHoldException.class, 
          () -> orderProcessor.processOrder(account1, testOrder));
      
      // The above test works with the current implementation, but what if our 
      // implementation instead changes to just call `charge` instead of first 
      // calling `checkCredit`, relying on the fact that `charge` will throw an 
      // exception in this case? The test will start failing, but actually there 
      // is no problem in the production code. This test is coupled to 
      // implementation detail.
  }
}

The reverse can also happen: your test passes, but the code actually doesn't work. For example, if an interface encapsulates some state between subsequent method calls, or a method has some preconditions or postconditions, and your stub does not reimplement these correctly, your tests may not be valid. That is, mocking also repeats, and is therefore coupled to, how a dependency works.
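
To make this concrete, here is a minimal sketch in plain Java, with a hand-written stub standing in for a mocking library. Everything in it is hypothetical and only for illustration: the `LimitedCredit` interface, the credit limit, and the class names. The assumed contract is that charges accumulate and an account whose charges reach its limit goes on hold. The stateless stub forgets this, so code checked against it keeps seeing `OK` where a real account would be on hold.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract: charges accumulate per account, and an account
// whose charges reach its limit is reported as HOLD.
enum Status { OK, HOLD }

interface LimitedCredit {
  Status checkCredit(String account);
  void charge(String account, long cents);
}

// What a typical stub amounts to: canned answers, no state between calls.
class StatelessCreditStub implements LimitedCredit {
  @Override public Status checkCredit(String account) { return Status.OK; }
  @Override public void charge(String account, long cents) { /* ignored */ }
}

// A fake that keeps the state the contract implies.
class StatefulCreditFake implements LimitedCredit {
  private final long limitCents;
  private final Map<String, Long> charged = new HashMap<>();

  StatefulCreditFake(long limitCents) { this.limitCents = limitCents; }

  @Override public Status checkCredit(String account) {
    return charged.getOrDefault(account, 0L) >= limitCents ? Status.HOLD : Status.OK;
  }

  @Override public void charge(String account, long cents) {
    if (checkCredit(account) == Status.HOLD) {
      throw new IllegalStateException("account on hold: " + account);
    }
    charged.merge(account, cents, Long::sum);
  }
}

// Code under test: charges twice, then reports the resulting status.
class TwoStepCheckout {
  static Status chargeTwice(LimitedCredit credit, String account, long cents) {
    credit.charge(account, cents);
    credit.charge(account, cents);
    return credit.checkCredit(account);
  }
}
```

With a limit of 100 and two charges of 60, the fake ends in `HOLD` while the stub still answers `OK`. A test written against the stub would pass either way.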

To remove some of this repetition, we can refactor the test setup to be done once in a @BeforeEach method for the whole class. You can even go one step further and pull out the stubbing into a static method, which can be reused in multiple test classes.

class Mocks {
  // A factory method for a mock that we can reuse in many test classes.
  // Note however the state of the stub is obscured from our tests, hurting 
  // readability.
  static CreditService creditService() {
    var creditService = mock(CreditService.class);
    when(creditService.checkCredit(AccountId.of(1))).thenReturn(HOLD);
    doThrow(NotEnoughCreditException.class)
        // Mockito requires every argument to be a matcher once any one is.
        .when(creditService).charge(eq(AccountId.of(1)), any(Money.class));
    return creditService;
  }
}

There is another way to make an implementation of a type reusable so that you don't have to constantly reimplement it: the familiar, tool-assisted, keyword-supported, fit-for-purpose class. Classes are built into the language to solve precisely this problem of capturing and codifying knowledge for reuse in a stateful type. Write it once, and it sticks around to help you with the next test. Not only do classes elegantly save you from reimplementing a contract for many tests, they make implementing those contracts simpler in the first place.

// A basic starting point for a "fake" CreditService.
// It sets the foundation for many improvements, outlined below.
// You could even use a mock under the hood here, if you wanted, and change it
// later. Part of the benefit of a class is that you can change the implementation 
// over time without breaking your tests.
class InMemoryCreditService implements CreditService {
  private Map<AccountId, CreditStatus> accounts =
      ImmutableMap.of(AccountId.of(1), CreditStatus.HOLD);

  @Override
  public CreditStatus checkCredit(AccountId account) {
    return accounts.getOrDefault(account, CreditStatus.OK);
  }

  @Override
  public void charge(AccountId account, Money money) {
    if (CreditStatus.HOLD.equals(checkCredit(account))) {
      throw new NotEnoughCreditException();
    }
  }
}

Object-oriented test double

Admittedly, our class isn't all that impressive yet. We're just getting warmed up. A class's real power comes from encapsulation. A class is not just a collection of delicately specific stubs, but a persistent, evolvable and cohesive implementation devoted to the problem of testing.

When all you need is a few stubbed methods, mocking libraries are great! But the convenience of these libraries has made us forget that we can often do much better than a few stubbed methods. Just as when we aimlessly add getters and setters, habitual mocking risks missing the point of object-orientation: objects as reusable, cohesive abstractions.

For example, test setup often has a higher-order semantic meaning that mock DSLs end up obfuscating. When we stub an external service as in the example above...

when(creditService.checkCredit(AccountId.of(1))).thenReturn(HOLD);
doThrow(NotEnoughCreditException.class)
    .when(creditService).charge(eq(AccountId.of(1)), any(Money.class));

...what we are really saying is, "Account 1 is on credit hold." Rather than reading and writing a mock DSL that speaks in terms of methods and arguments and returning and throwing things, we can name this whole concept as a method itself.

// Evolving our class to do more for us
class InMemoryCreditService implements CreditService {
  private Map<AccountId, CreditStatus> accounts = new LinkedHashMap<>();

  public void assumeHoldOn(AccountId account) {
    accounts.put(account, CreditStatus.HOLD);
  }

  @Override
  public CreditStatus checkCredit(AccountId account) {
    return accounts.getOrDefault(account, CreditStatus.OK);
  }

  // charge implementation stays the same...
}

Using it, our test reads like our business speaks:

creditService.assumeHoldOn(AccountId.of(1));

Now this concept is reified for all developers to reuse (including your future self). This is encapsulation: naming some procedure or concept so that we may refer to it later. It builds the ubiquitous language for your team and your tools. Having an obvious and discoverable place to capture and reuse a procedure or concept that comes up while testing: that's convenience.

I find myself using methods like these constantly while testing, further immersing my mind in the problem domain, and it is incredibly productive.

Fakes over stubs

As your class becomes more complete, it'll start to look more like a fake than a stub. You've used a fake any time you've tested with an in-memory database. A fake is a complete implementation of some interface suitable for testing.
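
As a rough sketch of what such a fake might look like, here is an in-memory repository in plain Java. The `Widget` aggregate, the `WidgetRepository` interface, and its three methods are hypothetical names chosen for illustration, not taken from any library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical aggregate with an identity and a single attribute.
final class Widget {
  final long id;
  final String name;

  Widget(long id, String name) {
    this.id = id;
    this.name = name;
  }
}

// Hypothetical repository contract, shared by production and fake.
interface WidgetRepository {
  long nextId();
  void save(Widget widget);
  Optional<Widget> byId(long id);
}

// The fake: a complete implementation backed by a map instead of a database.
final class InMemoryWidgetRepository implements WidgetRepository {
  private final AtomicLong ids = new AtomicLong();
  private final Map<Long, Widget> widgets = new LinkedHashMap<>();

  @Override public long nextId() { return ids.incrementAndGet(); }

  @Override public void save(Widget widget) { widgets.put(widget.id, widget); }

  @Override public Optional<Widget> byId(long id) {
    return Optional.ofNullable(widgets.get(id));
  }
}
```

Unlike a stub, nothing here is specific to one test: any sequence of `nextId`, `save`, and `byId` calls behaves the way a caller of the interface would expect.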

Any time you replace a non-trivial dependency, you should really ensure the replacement has its own tests. This ensures that when you use a test double instead of the real thing, you haven't invalidated your tests. If you're clever, you can even reuse the same tests as your production implementation–and you absolutely should. It saves you time and gives you confidence.

In this way, a fake also becomes a demonstration of how some type is supposed to work. It can become a kind of reference implementation and testbed, serving as documentation for ourselves, our teammates, and our successors.

// Example pattern to test a fake and production implementation against same tests

/** Defines the contract of a working repository via tests. */
abstract class RepositoryContract {
  SomeAggregateFactory factory = new SomeAggregateFactory();

  abstract Repository repository();

  @Test
  void savedAggregatesAreRetrievableById() {
    var aggregate = factory.newAggregate(repository().nextId());
    repository().save(aggregate);
    assertEquals(aggregate, repository().byId(aggregate.id()));
  }

  // etc...
}

class InMemoryRepositoryTest extends RepositoryContract {
  InMemoryRepository repository = new InMemoryRepository();

  @Override
  Repository repository() { return repository; }
}

class MongoRepositoryTest extends RepositoryContract {
  @RegisterExtension
  MongoDb mongoDb = new MongoDb();

  MongoRepository repository = new MongoRepository(mongoDb.database("test"));

  @Override
  Repository repository() { return repository; }
}

Fakes as a feature

As the software industry is increasingly concerned with safe, frequent production rollouts, fakes increasingly make sense as a shipped feature of our software rather than merely compiled-away test code. As a feature, fakes work as in-memory, out-of-the-box replacements of complicated external process dependencies–dependencies which may not even yet be specified–and the burdensome configuration and coupling they bring along with them. Running a service can then be effortless by way of a default, in-memory configuration, also called a hermetic server (as in "hermetically sealed"). As a feature, it is one of developer experience, though it still profoundly impacts customer experience through safer and faster delivery.

The ability to quickly and easily start any version of your service with zero external dependencies is game changing. A new teammate can start up your services locally with simple system setup and one command on their first day. Other teams can realistically use your service in their own testing, without understanding its ever-evolving internals, and without having to rely on expensive enterprise-wide integration testing environments, which inevitably fail to reproduce production anyway. Additionally, your service's own automated tests can interact with the entire application (testing tricky things like JSON serialization or HTTP error handling) and retain unit-test-like speed. And you can run them on an airplane.
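
One way to wire this up, sketched in plain Java without any framework: a single composition point picks the fake or the real adapter from a configuration value, defaulting to the in-memory mode. The `Notifier` port, the `Mode` enum, the `app.mode` property name, and the stubbed-out `SmtpNotifier` are all assumptions made for this sketch. In a Spring application, profiles could play the same role.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical port for an external process dependency.
interface Notifier {
  void send(String recipient, String message);
}

// Shipped fake: records messages in memory instead of contacting a server.
final class InMemoryNotifier implements Notifier {
  private final List<String> sent = new ArrayList<>();

  @Override public void send(String recipient, String message) {
    sent.add(recipient + ": " + message);
  }

  List<String> sent() { return List.copyOf(sent); }
}

// Placeholder for the real adapter; the network call is intentionally omitted.
final class SmtpNotifier implements Notifier {
  @Override public void send(String recipient, String message) {
    throw new UnsupportedOperationException("real SMTP integration not shown");
  }
}

enum Mode { IN_MEMORY, PRODUCTION }

// The only place that knows which implementation is in use.
final class Wiring {
  // Anything other than "production" (including null) means in-memory.
  static Mode modeFrom(String value) {
    return "production".equalsIgnoreCase(value) ? Mode.PRODUCTION : Mode.IN_MEMORY;
  }

  static Notifier notifier(Mode mode) {
    return mode == Mode.PRODUCTION ? new SmtpNotifier() : new InMemoryNotifier();
  }

  // Reads the hypothetical -Dapp.mode system property.
  static Notifier notifierFromSystemProperty() {
    return notifier(modeFrom(System.getProperty("app.mode")));
  }
}
```

Because the default is the in-memory mode, starting the program with no configuration at all yields a working, dependency-free process.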

This is your test. This is your test on drugs.

"Unit" tests (sometimes called "component" tests), in the ontology of testing, isolate a unit of code to ensure it functions correctly. We often contrast these with "integration" tests (confusingly, sometimes also called component tests), which test units together, without isolation. We heard that writing lots of unit tests is good, because of something about a pyramid and an ice cream cone, so we have to make sure most of our tests only use isolated units, so that most of our tests are unit tests.

So let's back up. Why are we replacing dependencies and "isolating units" in the first place?

  • With stubbed dependencies, there are fewer places to look when there is a test failure. This means we can fix bugs faster, so we can ship to our users more frequently.
  • Dependencies can be heavy, like databases or other servers which take time to set up, slowing down the tests and their essential feedback. Replacing those with fast test doubles means faster feedback cycles, and faster feedback cycles means we can ship to our users more frequently.

These two "why" stacks eventually converge on the same reason, the reason we write tests in the first place: to ship more value, more quickly (after all, features which improve safety also improve speed). While replacing collaborators can help as described, it also has effects directly counter to this end goal. That is, when you replace dependencies, your feedback cycles actually slow down because those replacements aren't what you actually ship. You aren't ever seeing your code as it truly works until you deploy and get it in front of users.¹ If you don't have good monitoring, you may not see it work–or not!–even then.

Mocks are like hard drugs... the more you use, the more separated from reality everything becomes.²


1 You can make the argument that regardless of how you test, production is still the only place you see how your code "truly works." The slightly more nuanced story is that how "close to truth" your tests are is a spectrum, and it's obviously advantageous to be closer to truth than not, all else equal. We'll touch on this more below.

2 Thank you Lex Pattison for this fantastic quote.

All tests are integration tests

If we think about our testing decisions in terms of value throughput (which is the only thing that matters) instead of fixating on isolating units, we end up making very different decisions:

  1. Don't replace a dependency unless you have a really good reason to. We've talked about some good examples of when this makes sense already: heavy dependencies, like external process integrations, in which the complexity or time justifies the expense of replacing it. Fakes, as described above, work great here.

  2. Write your production code so you can reuse as much of it in tests as possible. In particular, encapsulate your business logic, of which there is really only one correct implementation by definition, in reusable classes with injected dependencies.

By avoiding doubles altogether, you've saved yourself the time of reimplementing code you've already written and already tested. More importantly, your tests aren't lying to you; they actually provide meaningful feedback.

Unit testing, if defined by isolating a test to only one class, doesn't exist. No code exists in a vacuum. In this way, all tests are integration tests. Rather than think about unit vs integration vs component vs end to end or whatever, I recommend sticking to Google's pragmatic small, medium, and large test categorization.

If you're getting hives thinking about all the places bugs could lurk without isolating a unit–I used to–ask yourself, why are we so comfortable using the standard library, or Apache commons, or Guava, without mocking that code out too? We trust that code. Why? We trust code that has its own tests.³

We can think of our own code no differently than the standard library. If we organize our code in layers, where each layer depends on a well-tested layer beneath it, we rarely need to replace dependencies with test doubles at all. Simply use the real thing. The bug shouldn't be there, because we've tested there, mitigating one of the "cons" of integration.
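
A small sketch of the idea in plain Java: the upper layer is exercised with the real lower-layer class, and only the external boundary is replaced by a fake. `TaxPolicy`, `Invoicer`, `Ledger`, and the flat 10% rate are hypothetical, chosen only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Lower layer: pure logic with its own tests, so upper layers use it directly.
final class TaxPolicy {
  // Flat 10% rate, rounded down to whole cents (an assumption of this sketch).
  long taxOn(long netCents) { return netCents / 10; }
}

// External boundary: the one dependency worth replacing in tests.
interface Ledger {
  void record(String account, long grossCents);
}

// Fake for the boundary: keeps recorded amounts in memory for inspection.
final class InMemoryLedger implements Ledger {
  final List<Long> entries = new ArrayList<>();

  @Override public void record(String account, long grossCents) {
    entries.add(grossCents);
  }
}

// Upper layer: depends on the real TaxPolicy and an injected Ledger.
final class Invoicer {
  private final TaxPolicy taxPolicy;
  private final Ledger ledger;

  Invoicer(TaxPolicy taxPolicy, Ledger ledger) {
    this.taxPolicy = taxPolicy;
    this.ledger = ledger;
  }

  // Adds tax to the net amount, records the gross amount, and returns it.
  long invoice(String account, long netCents) {
    long gross = netCents + taxPolicy.taxOn(netCents);
    ledger.record(account, gross);
    return gross;
  }
}
```

A test of `Invoicer` built as `new Invoicer(new TaxPolicy(), new InMemoryLedger())` also re-exercises `TaxPolicy`. If the tax rounding were wrong, tests at both layers would have a chance to surface it.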

You will find tests at each layer may feel redundant. The scenarios will be similar or even the same, and will exercise much of the same code, as lower-layer tests. For example, you might have a test "places order with account in good credit standing" at the application layer invoked via the HTTP transport, at the application services layer invoking these classes directly, and at the domain model layer.

// Use your production Spring Boot configuration, but with an in-memory profile
@SpringBootTest(webEnvironment = WebEnvironment.RANDOM_PORT)
@ActiveProfiles("in-memory")
class ApplicationTest {
  // snip...
  
  @Autowired
  InMemorySubscriptions subscriptions;

  // A larger test, with broad scope and slow startup due to Spring and 
  // web server initialization. We're not just testing business logic,
  // but particularly focused on the transport specifics and application 
  // wiring: JSON serialization works how we expect, the status codes are
  // right, etc. These things are often dependent on Spring configuration,
  // and if our tests use different Spring configuration than production,
  // what are we really testing? That's why an in-memory configuration,
  // which only replaces external dependencies, is crucial.
  @Test
  void placesOrderWithAccountInGoodCreditStanding() {
    assertOk(restTemplate.postForEntity(
        "/v1/orders/",
        new HttpEntity<>(ImmutableMap.of("subscription", "SKU1")),
        Map.class));

    assertOk(restTemplate.postForEntity(
        "/v1/orders/1/",
        new HttpEntity<>(ImmutableMap.of("account", 1)),
        Map.class));
    
    // It's okay to directly use another layer if some important observable
    // effects of an API are external.
    assertThat(subscriptions.forAccount(AccountId.of(1))).hasSize(1);
  }
  
  // Other tests can also be more technical and protocol specific like 
  // JSON parse failure handling, or authentication protocol support, etc.
}

// Medium tests; fast but still broad.
// Requires a Spring context for security features, maybe transactions,
// or metrics, but doesn't require a web server.
@SpringJUnitConfig
class OrderApplicationServiceTest {
  // snip...

  @Test
  void placesOrderWithAccountInGoodCreditStanding() {
    var order = orderService.startOrder(Subscription.of("SKU1"));
    orderService.charge(order.id(), AccountId.of(1));
    assertThat(subscriptions.forAccount(AccountId.of(1))).hasSize(1);
  }
}

// Small test; fast and limited to only our domain model package.
// Requires no framework.
class OrderProcessorTest {
  InMemoryCreditService creditService = new InMemoryCreditService();
  InMemorySubscriptions subscriptions = new InMemorySubscriptions();
  OrderProcessor orderProcessor = new OrderProcessor(creditService, subscriptions);
  OrderFactory orderFactory = new OrderFactory();

  @Test
  void processesOrderWithAccountInGoodCreditStanding() {
    var order = orderFactory.startOrder();
    order.addSubscription(Subscription.of("SKU1"));
    orderProcessor.processOrder(AccountId.of(1), order);
    assertThat(subscriptions.forAccount(AccountId.of(1))).hasSize(1);
  }
}

I used to fight really hard with my tests to avoid this overlap.

It was far more trouble than it was worth.

The thing is, these aren't actually that redundant when you think about it. Remember, when you or your teammates use some class in your application, you expect it to adhere to its contract, period. This is what tests do: assert things implement their contracts. How they implement them doesn't matter to your tests, nor should it matter to you (otherwise, how can you hope to survive in a complex code base if you have to keep the whole thing in your head?). If one of these tests fails, yes, it's quite possible the problem is in another class instead of the one under test. But as we discussed, you should also have tests against that class. And if this case is missing, great! You found a missing test, and a bug! You wouldn't have found this bug (until production, if at all) if you replaced the dependency with a mock, and what is the point of tests if not to discover bugs before production?

I also illustrated the worst of it. In practice, tests at lower levels get much more detailed than upper levels, thoroughly testing all branches in your domain objects, since that's where most of your business logic is (or should be) anyway. Individual upper layers likely won't be able to reach all those branches, and don't really need to try. As a result, you end up with a familiar test pyramid, with lots of small, fast tests, and fewer larger, slow tests.

What redundancy is left is merely a reflection of the obvious: code relies on other code. And by definition that means when we test code, we're (re)testing other code, whether we wrote it or not, all the time. By accepting it, you've freed yourself up to reuse an entire application of code rather than replacing it throughout your tests, and you know your tests actually reflect reality.⁴


3 For further exploration of tested or "well understood" as the boundary for "unit" vs "integration" tests, check out the legendary Kent Beck's post, "Unit" Tests?.

4 Admittedly, the only reality is actual production, which is why testing mustn't stop at the door of prod, but embrace it through monitoring, observability, feature flags, and the like. But there's no reason you shouldn't try to get close to production on your laptop, especially where doing so saves you so much time to boot.

Summary

It's not to say mocks don't have their place, but many applications benefit from a simpler world: a world without mocks. In fact, in the simplest world, you'd reuse all the implementations you've already written.

Focus on testing the contract of the class under test. Don't worry if the tests are somehow redundant. If the application is otherwise well architected, that's only a matter of implementation. What matters is that the class, in conjunction with obedient collaborators, implements its own contract.

If it's an external process dependency, wrap it in a simple interface–what domain-driven designers may call an "anti-corruption layer"–and then implement a fake for it. Write tests that run against both the fake and the real implementation to ensure the fake is compliant. Capture common set up scenarios in the language of your problem domain as methods on your fakes.

Finally, compile your fakes with your program, and put them behind configuration flags or profiles to enable lightweight modes of execution.

Most of all, don't get too complacent with your usual solutions or tools. Change it up. You never know what new world you may discover.
