
[Diagram 1: A simple domain model]

 

Unit Testing With Sham Domain Objects

Abstract

A well-designed, object-oriented program consists of clearly defined, independent layers. The most important layer is the domain layer because all design decisions are ultimately governed by its classes, states, relationships, behaviors and constraints. Even small domain layers can be complex and difficult to test thoroughly. This article is about unit testing applications using program-generated, "sham" instances of domain classes.


Motivation

Why is testing with sham instances of domain classes important? Consider the (semi-realistic) example of a domain model in Diagram 1. The domain is that of a fictitious corporation, the Universal Services Corporation (USC), and its customers. USC sells and services a variety of machines from multiple locations.

USC and its customers are hierarchical organizations with a headquarters, divisions and branches. Each Organization can have suborganizations as its components, and each knows its parent organization (see the Composite Pattern [1]). Each USC organization also keeps a list of its customer organizations, and each customer organization is associated with a specific USC sales organization and a specific USC service organization. Since a USC organization can also use USC machines and services, it can be a customer of itself or of another USC organization. Of course, only USC sales organizations can sell machines and services, and only USC service organizations can service machines.

Each organization has a physical location that houses employees and machines. A location can have one or more installed machines, of various types. A machine can have multiple open problems, so there is a one-to-many relationship between machines and problems. Further, there is an implicit problem and service history for each machine.

All people, at both USC and its customers, are modeled uniformly by class Employee, a subclass of TypedDomainObject (see the Type Object Pattern [2]). A variety of types are defined, from "CEO" to "Field Engineer." Each location keeps a list of its employees and identifies one or more of them as "contacts": people who can be called about a problem. Further, one employee is identified as the "prime contact."

A USC Field Engineer is assigned to fix each problem, but only a Field Engineer trained to work on the affected type of machine can be assigned. Therefore, the domain model also keeps track of the training each Field Engineer has received on each machine type. This is a one-to-many relationship, but the proficiency level and currency of the training must also be tracked, so each link is represented by an instance of the Training class. FieldEngineer is a subclass of Employee, but there are no subclasses for other types of employees, because Field Engineers are the only type with additional behaviors. If and when the model is extended with specific behaviors for other types of employees, additional subclasses can be defined, with the Employee class eventually becoming abstract.

This is only a brief introduction to a small but complex domain model. You can easily imagine the rich set of applications that can be implemented on this domain, and also the variety of constraints, rules, behaviors and queries required in the complete domain model. Such a model must be thoroughly unit tested, and this testing should be independent of other parts of the applications that incorporate the model. Generating sham instances of the domain classes has value in testing all layers of the application.

Sham Objects versus Mock Objects

The concept of "mock" objects has recently gained some attention. Mackinnon, Freeman and Craig describe a mock object this way [3]:

"A Mock Object is a substitute implementation to emulate or instrument other domain code. It should be simpler than the real code, not duplicate its implementation, and allow you to set up private state to aid in testing. The emphasis in mock implementations is on absolute simplicity, rather than completeness. For example, a mock collection class might always return the same results from an index method, regardless of the actual parameters."

Mock objects are instances of classes created as substitutes for domain classes, before the domain classes are ready to be integrated into the domain layer. They exist to make it possible to test other parts of the domain model. In contrast, sham objects are real instances of domain classes generated by program logic. They are "sham" only in the sense that they do not represent actual things in the real world domain.

The differences between mock objects and sham objects lead to different approaches to unit testing. The focus of mock objects is on independently testing individual domain classes; with sham objects, it is on testing domain classes in the context of the complete domain model. Clearly, these are complementary ideas. If the coding of a domain class is incomplete and instances of it are generated by program, are they sham objects or mock objects? Or sham, mock objects?
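The following minimal Python sketch makes the contrast concrete. The article's framework is Smalltalk, and every name below is hypothetical. The mock's index method returns the same canned answer regardless of its argument, as in the quoted definition. The shams, by contrast, are genuine instances of a domain class that simply do not stand for any real person.

```python
class MockEmployeeList:
    """Mock: stands in for a real collection and ignores its argument."""

    def at(self, index):
        # Always the same canned answer, whatever index is requested.
        return "Elvis Nixon"


class Employee:
    """A real (if tiny) domain class."""

    def __init__(self, first, last):
        self.first = first
        self.last = last

    def full_name(self):
        return f"{self.first} {self.last}"


# Shams: genuine Employee instances generated by program logic;
# they just do not represent actual people.
sham_employees = [Employee("Elvis", "Nixon"), Employee("Marilyn", "Bush")]

mock = MockEmployeeList()
print(mock.at(0), "/", mock.at(1))
print(sham_employees[0].full_name(), "/", sham_employees[1].full_name())
```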

 

 

[Figure: Program layers]

The Layers of an Application Program

A well-designed, object-oriented program is divided into layers [4, 5]. The domain layer models aspects of the real world that interest the program's sponsors; the view layer graphically presents domain objects to the program's users; the application layer mediates between views and domain objects; and the persistence layer maps domain objects to and from persistent storage. Program layers should be kept independent of each other, with each layer providing services to the layer above it through well-defined interfaces.

Theoretically, it should be possible to assign separate, geographically dispersed teams of programmers to each layer, communicating only in terms of well-defined layer interfaces. In practice, a single team may be responsible for code in all layers, but this does not excuse it from respecting layer boundaries, even in XP projects.

The all-too-common approach to testing is to hire hordes of "quality assurance" testers to bang away at the completed application until they break something. A better approach is to first test each layer as an independent subsystem, at the level of its component units, and to do so within a unit testing framework that encourages frequent regression testing.

 

The View Layer

The view layer of most programs is constructed from graphic widgets arranged in a window and supported by a rendering engine. These are pretested components of generally high quality, requiring no additional unit testing. They are parameterized through a variety of properties for each type of widget, some of which pertain to presentation (such as font, color, size, and position) and others to the data being presented (such as output formatting and input validation). These properties can be set statically by window builder tools, and many can be set dynamically by the application layer. In general, the data-oriented properties should not be used, because they infringe on the responsibilities of the domain layer: responsibilities that should be tested as part of domain layer unit testing.

Visual programming frameworks are provided by many vendors in an attempt to reduce the cost and complexity of application programming. In these frameworks, the view layer is extended all the way down to the persistence layer by wiring widgets directly to database operations, thereby eliminating the need for an application layer or a domain layer. SQL statements are generated by the visual framework for immediate execution by the database. Business rules, which properly belong to the domain model, get implemented as view extensions (if they are implemented at all), and they may not be implemented consistently in all programs that access the database. In general, visual programming frameworks do not scale up very well. "Bang it till it breaks" is just about the only testing that can be done.

 

The Application Layer

The application layer mediates between the view layer and the domain layer. Two basic approaches are used: task-oriented and navigational. Task-oriented applications focus on specific tasks users need to accomplish, while navigational applications focus on user navigation to objects in the domain layer, followed by discrete operations on those objects.

Task-oriented applications require the user to pick the right user interface for the task at hand, usually through deeply hierarchical menus or nested dialogs. These applications are rigidly bound to the business process analysis that led to their design. They hide the domain model affected by each task from the user to make individual tasks appear simpler. However, the result is often more training, because users are denied an appropriate level of contextual information. Often, they do not know why they must perform certain tasks, only that they must.

It is important that task-oriented applications have a distinct domain layer. Otherwise, application code is forced to accommodate two essentially unrelated functions: mapping between the view and persistence layers, and implementing the rules and behaviors required by the domain. The temptation to bury domain logic in the application layer leads to code that is difficult to implement, debug, and maintain [6].

In contrast, navigational applications minimize the semantic gap between what the user sees and what the user can do, and this minimizes the gap between what the user requests and what the program must do to accomplish those requests. Such requests translate directly to operations on object-oriented domain models. Navigational applications are highly contextual and well suited to drag-and-drop, action-oriented visual interfaces. They rely on greater user knowledge of the domain, but that knowledge reduces the need for task-specific training.

To unit test both task-oriented and navigational applications, you need a domain layer populated with correctly interrelated domain objects. The obvious approach is to create a database and load it with test data, but this has several disadvantages. It requires all layers of the program to be developed and tested together, thereby eliminating many of the advantages of a layered architecture. It also takes special effort by a database administrator to create a test database and load it from files of existing data.

How likely is it that application layer programmers will start each test case with uncorrupted data? How good is the test data that was loaded? How likely is it that the test data will be adequate for all of the tests that need to be run? And how often will programmers be able to run regression tests? This is clearly not an adequate basis for unit testing a complex application layer, and especially not for repeatable unit testing.

The alternative is to use a sham domain: a domain layer populated with domain objects that are generated by program solely for the purpose of unit testing. The sham domain is regenerated for each unit test so that each test always begins with good data. Sham domain objects do not persist beyond unit tests. The result is a suite of repeatable unit tests for the application layer.

 

The Domain Layer

All of the most important design decisions of a program are ultimately governed by the classes, states, relationships, behaviors and constraints of the domain layer. Once a domain model has been diagrammed (in UML or whatever else suits the designers), it is relatively straightforward to create the domain classes, define their state and relationship variables, and add methods for constraints and interfaces. Unit tests can then be created for those methods of each class that depend on and affect the state of a single domain object. The unit testing framework described by Kent Beck [7] (implemented as SUnit for Smalltalk and as JUnit for Java) is ideal for this purpose.

It is simple to create an instance of a single domain class in the setup phase of a test, along with state values appropriate to the test. But much testing still remains for a) methods with complex dependencies on other domain objects, b) methods supporting the complex queries and transactions of applications, and c) methods supporting the operations of the persistence layer.

Creating a test bed of sham domain objects, a sham domain, is a key part of unit testing the domain layer. At first blush, it appears to be quite a programming chore to create a sham domain consisting of an appropriate number and variety of instances of all domain classes, with all state values and relationships fully expressed, but the programming for a sham domain need be developed only once (and maintained as the domain model evolves). If the domain model is really large, it may be necessary to do this in pieces, but if that is the case, the domain model is probably too large to be viable anyway. Think subsystems!

A single message in the setup method of a test case kicks off the creation of a complete sham domain. The result is larger than the limited scope of any single test requires, so it may seem wasteful to recreate it for every test. However, recreating it takes less programmer effort (a valuable commodity) than writing code to back out updates. Besides, the sham domain should not be all that large: a few thousand objects, which take only a fraction of a second to create. If a series of tests read but do not update the sham domain, or update it in non-interfering ways, then it makes sense to pre-create the sham domain and share it among those test cases. Otherwise, just recreate it for each test case.
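Here is a minimal sketch of both strategies using Python's unittest module, whose setUp and setUpClass hooks play the role of the setup method described above. The Employee class and the build_shams function are hypothetical stand-ins for a real domain and its sham builder.

```python
import unittest


class Employee:
    """Hypothetical domain class keeping a class-level list of its shams."""
    shams = []

    def __init__(self, name):
        self.name = name
        Employee.shams.append(self)


def build_shams():
    """Hypothetical 'buildShams': discard the old sham domain, build a fresh one."""
    Employee.shams = []
    for name in ("Elvis Nixon", "Marilyn Bush", "Grace Lincoln"):
        Employee(name)


class EmployeeUpdateTest(unittest.TestCase):
    """Tests that update the domain: rebuild the sham domain before each test."""

    def setUp(self):
        build_shams()  # one call recreates the whole sham domain

    def test_remove_employee(self):
        Employee.shams.pop()
        self.assertEqual(len(Employee.shams), 2)

    def test_starts_with_good_data(self):
        # Sees a fresh domain even though test_remove_employee ran first.
        self.assertEqual(len(Employee.shams), 3)


class EmployeeQueryTest(unittest.TestCase):
    """Read-only tests: build the sham domain once for the whole class."""

    @classmethod
    def setUpClass(cls):
        build_shams()

    def test_find_by_name(self):
        names = [e.name for e in Employee.shams]
        self.assertIn("Marilyn Bush", names)
```

Run it with `python -m unittest`. unittest runs test methods in alphabetical order, so the removal test runs first. The second update test still starts with three employees because setUp rebuilds the domain.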

 

The Persistence Layer

The domain layer should present the appearance to the application layer that all domain objects are immediately and continually available in an apparently infinite virtual address space. However, this ideal is compromised in many ways by the need for long-term persistence. Practically, only a few domain objects are loaded into memory by the persistence layer at any one time. This leads to a variety of concerns when a sham domain is created for unit testing:

  1. There are well-known "impedance mismatches" between objects and relational databases [8]. Domain classes are not necessarily one-to-one with database tables. An instance of a domain class can be populated from one or more tables in arbitrarily complex ways. This is especially true if an application is built on top of an existing relational database. A complex table join may be necessary to obtain the data required by a domain object. However, in a cleanly layered program architecture none of these complexities are visible to the application layer, which sees only the objects and interfaces explicitly provided by the domain layer. Therefore, all of the impedance mismatches masked by the persistence layer can be safely ignored when creating the sham domain.
  2. The use of a proxy is common when only a portion of a large object is required. For example, a query may return proxies for selected employees of a company, consisting of just their names, titles and identity keys. The application user can then select one of them and request more information. At that point, the whole employee object is retrieved from the database and substituted for its proxy. The sham domain must support both proxies and their base objects, and substitutions between them.
  3. In an object-oriented domain model, relationships are expressed directly by pointers and collections, leading to fast access to related objects at the cost of flexibility in performing ad-hoc queries. In contrast, in a relational database all entities of the same type are held in the same table, and relationships among entities are expressed through common attributes, often identity keys. This provides great flexibility for ad-hoc queries, but not necessarily very fast access to related entities. A sham domain generally need not be concerned with ad-hoc queries, so pointer and collection relationships are sufficient.
    Flexibility for ad-hoc queries is a key selling point for relational databases, but it is not very realistic for databases with more than a few tables. The entity model becomes too complex for anyone but a database domain expert to understand and query. Predefined queries available through the user interface necessarily become the norm.
  4. Exceptions are raised by database operations for device access, authorization, concurrency, session, caching, and transaction management errors. Few of them would occur if the domain objects were actually available in memory, and few of them are of any interest to the application layer. It is, therefore, incumbent on the persistence and domain layers to shield the application layer by trapping all persistence exceptions and raising only domain specific exceptions (or a general Error exception). If this is done, a sham domain created for unit testing need only be concerned with simulating the exceptions defined by the domain layer's external interface.
  5. Many factors affect the performance of the domain layer; in particular, the number of domain objects involved. Databases optimize performance by the early binding of queries to resources and by caching. These optimizations are not apparent to the application layer, and often not to the domain layer; the persistence layer just does what it must to provide good performance. Given the small size of typical sham domains, there is no need for any optimizations. Of course, this also means that it is not possible to do performance testing with a sham domain.
  6. An index maintained over a large collection is another way to optimize database performance, allowing individual objects to be selected by key value. Each such access by key is, in effect, a query requested by the domain layer. It is important to separate domain methods that request queries from domain methods that process the results of the query. In this way, the query can be implemented in different ways when the domain layer operates in sham mode versus persistence mode.
  7. One-to-many relationships in an object-oriented domain model are implemented by one object holding a collection of the related objects. If the number of related objects is large (e.g., the employees of a large corporation), the related objects are fetched from the database as needed. If the application needs to sequentially process them, the domain layer can create a stream on the collection so the persistence layer can use buffering techniques. In these cases, the sham domain must also create a stream, regardless of the actual size of the sham collection.
  8. A domain layer may also support predefined queries that are not directly expressed by the UML diagram of the domain model. Often this is done through messages to specific domain classes. For example, an Employee class may have a query method that answers all management employees. Temporary collections are created by these queries, so the preceding discussion of collection size also applies to them.
  9. All interactions with a database necessarily occur within the scope of transactions. This allows database operations to be performed atomically; to be fully completed or fully backed-out. A sham domain does not interact with a database, so there is no concern with the state of the database. However, the domain model must still support the same commit and rollback interfaces, and domain objects must still have transaction atomicity.
  10. As applications become increasingly complex, it is necessary to communicate various events occurring in the domain, such as the creation, deletion and updating of objects, to interested parties. There are two cases to consider:
    • Events initiated by concurrent database users. Database locking and other concurrency mechanisms can be used by the persistence layer to serialize database access, making each transaction appear to be against a static database. The sham domain can ignore the possibility of these events occurring.
    • Events initiated by concurrent application views within the same transaction. The domain layer must be able to detect such events and communicate them to observers. And of course, observers must be able to specify which events are of interest to them. These capabilities should be part of the overall domain layer framework and provided for both sham and persistent domain objects.
  11. Database operations provide feedback information, such as the number of objects read by a query. Equivalent information must be made available to applications when using a sham domain.
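Point 2 can be sketched in Python under assumed names. EmployeeProxy, resolve, find_by_serial and query_managers are all hypothetical. In sham mode, resolving a proxy just substitutes the base object already in memory; a persistence-mode implementation would read the full row at that point instead.

```python
class Employee:
    """Full (hypothetical) domain object, registered in the class's shams list."""
    shams = []

    def __init__(self, serial_number, name, title, salary, phone):
        self.serial_number = serial_number
        self.name = name
        self.title = title
        self.salary = salary
        self.phone = phone
        Employee.shams.append(self)

    @classmethod
    def find_by_serial(cls, serial_number):
        # In sham mode the "fetch" is only a search of the in-memory shams.
        for employee in cls.shams:
            if employee.serial_number == serial_number:
                return employee
        raise KeyError(serial_number)


class EmployeeProxy:
    """Lightweight stand-in holding only name, title and identity key."""

    def __init__(self, employee):
        self.serial_number = employee.serial_number
        self.name = employee.name
        self.title = employee.title

    def resolve(self):
        # Substitute the base object for this proxy.
        return Employee.find_by_serial(self.serial_number)


def query_managers():
    """Hypothetical predefined query answering proxies rather than full objects."""
    return [EmployeeProxy(e) for e in Employee.shams if e.title == "Manager"]


Employee(1, "Elvis Nixon", "Manager", 90000, "555-0101")
Employee(2, "Marilyn Bush", "Field Engineer", 70000, "555-0102")

proxies = query_managers()
full = proxies[0].resolve()
print(proxies[0].name, full.phone)
```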

Programmers of the application layer should not have to contend with the complexities arising from persistence. In fact, they should not have to look beyond the public interfaces of the domain layer. A test-bed of sham domain objects allows them to test application logic independent of these persistence considerations. Of course, it is still necessary for the persistence layer programmers to test the persistence layer.

 

 

[Figure: Domain framework]

Sham Modeling Framework

A domain model consists of classes that model aspects of the real world, but to be useful, they must be implemented in a framework that supports many practical considerations, such as security, persistence, versioning, transaction management and unit testing. It is far beyond the scope of this paper to describe a complete domain framework, but the following points relate to unit testing with a sham domain:

  1. The domain layer is under the overall control of an instance of a subclass of DomainManager (see the Singleton Pattern [1]). This object is responsible for initializing the overall layer, in particular for initializing it for operation either with sham objects or with a persistence layer. When sham objects are to be used, it initiates the process of building them. In a more complete domain framework, DomainManager would also have other responsibilities.
  2. The operation of the domain layer depends on whether sham objects or persistent objects from the database are being used. This mode can be determined by sending isUsingShams to a domain class. Individual domain objects must behave the same regardless of mode, but that still leaves a variety of open issues if the behavior of the domain model, as seen by the application layer, is to be independent of this mode.
  3. All objects in memory have a unique identity, but that identity disappears when objects are written to a database. To compensate, all domain classes are subclassed from DomainObject, which defines a serialNumber instance variable. DomainObject provides a nextSerialNumber class method that answers a unique integer per invocation. The serial numbers become the "keys" of the rows of the database tables. In a running application, serial numbers (a.k.a. sequence numbers) are often obtained from the database, but when creating sham objects, they must be generated by each class (by incrementing the serialNumber class instance variable).
  4. Another consideration is the generality of classes. Is there to be just one Employee class with a "type" indicator [2] to distinguish managers from secretaries and engineers, or should there be a separate subclass of Employee for each? It depends on the purpose of the model and the extent to which unique attributes and behaviors are required for each type. Domain objects that are generic with a type indicator are subclasses of TypedDomainObject and inherit a type instance variable. Type indicators are instances of a subclass of DomainObjectType; for example, of EmployeeType or ProductType. The list of possible types is easily maintained by an application built for that purpose or by using common database tools.
  5. A simple question is where to put the methods for creating sham instances of the domain classes. A first answer is to put all of the object creation methods for shams in a test-generator class. For a small domain model, this would work fine, but it gets unwieldy as the number of domain classes increases. Further, it is not granular enough; it does not allow for the selective generation of sham objects by class. A second answer is to distribute sham-object creation methods among the domain classes in class methods. This fits well with the general notion that classes are "instance factories." If this is done, put all of these messages into a common sham building class protocol.
  6. Another question is how to manage all of the sham domain objects. They could be dumped into a single global collection; all of the instances of a particular domain class could then be retrieved by sending the class the allInstances message (inherited from Behavior). An alternative is for each class to collect all of its sham instances in a class instance variable named shams. All of its instances can then be retrieved by sending the shams class message. Again, the first answer is simpler, but it risks mixing the domain objects of multiple runs, especially if an earlier run failed to clean up after itself. Also, in some tests it may be desirable to use a mixture (by class) of sham objects and persistent objects.
  7. In this framework, domain classes are responsible for their instances. Therefore, predefined queries to select among those instances are implemented as messages to domain classes. Each such query method answers a collection of the instances selected by the query. In database mode, the query method uses the persistence layer to send a SQL Select statement to the database (directly or through a mapping layer), and then returns the answer collection to its caller. In sham mode, the query method uses Smalltalk enumeration methods on its shams collection. It is worth repeating, here, that objects in the application layer must not be allowed to issue SQL statements to the database. The results of all database queries must be conceptualized as proper domain objects, and not as ad-hoc records.
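Points 2, 3, 6 and 7 above can be combined into a small Python sketch. It is a translation under assumptions, not the article's Smalltalk code. Python's `__init_subclass__` hook stands in for Smalltalk class instance variables, so that each subclass gets its own serial counter and shams collection. The persistence branch of `select` is only a hypothetical placeholder.

```python
class DomainObject:
    """Sketch of the framework's root class (after serialNumber,
    nextSerialNumber, shams and isUsingShams in the article)."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.initialize()  # give each subclass its own class-level state

    @classmethod
    def initialize(cls, using_shams=True):
        cls.using_shams = using_shams
        cls.serial_counter = 0  # this class's own serial-number counter
        cls.shams = []          # all sham instances of exactly this class

    @classmethod
    def is_using_shams(cls):
        return cls.using_shams

    @classmethod
    def next_serial_number(cls):
        # Sham mode: each class generates its own keys. Persistence mode
        # would typically take them from a database sequence instead.
        cls.serial_counter += 1
        return cls.serial_counter

    def __init__(self):
        self.serial_number = type(self).next_serial_number()
        if type(self).is_using_shams():
            type(self).shams.append(self)

    @classmethod
    def select(cls, predicate):
        """All predefined queries funnel through here."""
        if cls.is_using_shams():
            return [obj for obj in cls.shams if predicate(obj)]
        # Hypothetical hook: persistence mode would issue SQL via the mapping layer.
        raise NotImplementedError("persistence layer not configured")

    @classmethod
    def at_serial(cls, serial_number):
        matches = cls.select(lambda obj: obj.serial_number == serial_number)
        return matches[0] if matches else None


class Employee(DomainObject):
    def __init__(self, name, title):
        super().__init__()
        self.name = name
        self.title = title

    @classmethod
    def managers(cls):
        """A predefined query, answered as a collection of domain objects."""
        return cls.select(lambda e: e.title == "Manager")


class Location(DomainObject):
    def __init__(self, city):
        super().__init__()
        self.city = city


Employee("Elvis Nixon", "Manager")
Employee("Marilyn Bush", "Field Engineer")
Location("Dallas")
print([(e.serial_number, e.name) for e in Employee.managers()])
```

In this sketch the serial numbers are unique per class rather than globally. That matches their use as keys within a single database table.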

Building the Sham Domain Model

The process of creating a sham domain layer is straightforward. Start with a domain class with few dependencies and create an assortment of instances of it. Then, pick a second domain class with a relation to the first, and create instances of the second for each instance of the first, and so forth. Try to make the instances of each class semi-realistic, but have fun too. For example, sham Employees are given the first names of movie stars and the last names of US presidents (e.g., Elvis Nixon and Marilyn Bush).
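Here is one way to generate such names deterministically in Python. The name pools and the `sham_names` function are hypothetical. Deterministic output means every rebuild of the sham domain is identical, which keeps test failures repeatable.

```python
from itertools import product

# Hypothetical name pools: movie-star first names, US-president last names.
MOVIE_STAR_FIRST_NAMES = ["Elvis", "Marilyn", "Humphrey", "Audrey"]
PRESIDENT_LAST_NAMES = ["Nixon", "Bush", "Lincoln", "Adams", "Grant"]


def sham_names(count):
    """Answer `count` distinct full names, always in the same order."""
    names = [f"{first} {last}"
             for first, last in product(MOVIE_STAR_FIRST_NAMES, PRESIDENT_LAST_NAMES)]
    if count > len(names):
        raise ValueError(f"only {len(names)} distinct names available")
    return names[:count]


print(sham_names(3))
```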

A single buildShams message to the DomainManager subclass (here, USCDomainManager) does the following:

  1. It initializes all of the subclasses of DomainObject, which initializes the usingShams, serialNumber and shams class instance variables of each.
  2. It sends buildShams to each subclass of DomainObject. This method creates a collection of instances of the class, varying their contents in suitable ways. Each instance is related to already created instances of other classes, but relations with instances yet to be created are postponed. Instances of the subclasses of DomainObjectType are created first because they have no dependencies on other objects. Otherwise, the order of sham-object creation is something for a programmer to work out to get full coverage.
  3. Each domain class has a method for building a single instance of itself, which buildShams calls repeatedly with varying arguments. Additional methods are used to create dependent objects, such as an employee's phone number. All of these methods are in the shams building protocol.
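The three steps might look like this in Python. The class and method names mirror the article's (buildShams becomes `build_shams`, and so on), but the code is only an illustrative sketch. The recursive `all_subclasses` walk stands in for step 1. The explicit `domain_classes` tuple is one assumed way to fix the creation order for step 2.

```python
def all_subclasses(cls):
    """Yield every direct and indirect subclass of cls."""
    for sub in cls.__subclasses__():
        yield sub
        yield from all_subclasses(sub)


class DomainObject:
    @classmethod
    def initialize(cls):
        cls.using_shams = True
        cls.serial_counter = 0
        cls.shams = []

    def __init__(self):
        cls = type(self)
        cls.serial_counter += 1
        self.serial_number = cls.serial_counter
        cls.shams.append(self)


class DomainObjectType(DomainObject):
    def __init__(self, name):
        super().__init__()
        self.name = name


class EmployeeType(DomainObjectType):
    @classmethod
    def build_shams(cls):
        # Type objects first: they depend on nothing else.
        for name in ("CEO", "Manager", "Field Engineer"):
            cls.build_sham(name)

    @classmethod
    def build_sham(cls, name):  # step 3: build one instance
        return cls(name)


class Employee(DomainObject):
    def __init__(self, name, employee_type):
        super().__init__()
        self.name = name
        self.type = employee_type

    @classmethod
    def build_shams(cls):
        # Relate each Employee to EmployeeType shams that already exist.
        names = ["Elvis Nixon", "Marilyn Bush", "Audrey Lincoln"]
        for name, employee_type in zip(names, EmployeeType.shams):
            cls.build_sham(name, employee_type)

    @classmethod
    def build_sham(cls, name, employee_type):
        return cls(name, employee_type)


class DomainManager:
    domain_classes = ()  # subclasses list their classes in creation order

    @classmethod
    def build_shams(cls):
        # Step 1: reset usingShams, serial counter and shams of every class.
        for domain_class in all_subclasses(DomainObject):
            domain_class.initialize()
        # Step 2: build each class's shams in dependency order.
        for domain_class in cls.domain_classes:
            domain_class.build_shams()


class USCDomainManager(DomainManager):
    domain_classes = (EmployeeType, Employee)


USCDomainManager.build_shams()
USCDomainManager.build_shams()  # a rebuild starts from scratch
print([(e.serial_number, e.name, e.type.name) for e in Employee.shams])
```

Calling `build_shams` a second time discards the previous shams and restarts each class's serial numbers, which is what a per-test setup method needs.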

Returning now to the domain model in Diagram 1, class Organization's buildShams method creates instances in a composite hierarchy. It uses nested loops to create them level by level, varying the state variables of the instances at each level. Both USC and its customers are modeled as organizations; however, the two hierarchies are created separately. Relations between them are added to say which USC organization handles sales to a customer, and which USC organization services machines.
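A Python sketch of the nested-loop construction follows. Several details are assumptions for illustration: the Organization fields, the level names, and the round-robin choice of sales and service branches. "Acme" is a hypothetical customer.

```python
class Organization:
    """Composite: each Organization holds suborganizations and knows its parent."""

    def __init__(self, name, kind, parent=None):
        self.name = name
        self.kind = kind  # "Headquarters", "Division" or "Branch"
        self.parent = parent
        self.components = []
        if parent is not None:
            parent.components.append(self)

    def all_organizations(self):
        """Answer this organization and all of its descendants."""
        result = [self]
        for component in self.components:
            result.extend(component.all_organizations())
        return result


def build_org_shams(company, divisions=2, branches_per_division=3):
    """Build one company's hierarchy level by level with nested loops."""
    hq = Organization(f"{company} Headquarters", "Headquarters")
    for d in range(1, divisions + 1):
        division = Organization(f"{company} Division {d}", "Division", hq)
        for b in range(1, branches_per_division + 1):
            Organization(f"{company} Branch {d}.{b}", "Branch", division)
    return hq


# The two hierarchies are built separately...
usc = build_org_shams("USC")
acme = build_org_shams("Acme", divisions=1, branches_per_division=2)

# ...then related: which USC organizations sell to and service each customer.
usc_branches = [o for o in usc.all_organizations() if o.kind == "Branch"]
for i, customer in enumerate(acme.all_organizations()):
    customer.sales_org = usc_branches[i % len(usc_branches)]
    customer.service_org = usc_branches[(i + 1) % len(usc_branches)]
```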

Class Location provides an addEmployee: instance method that adds an employee to the employees collection of a location, and also assigns the location to the location variable of the Employee. This pattern is followed for all one-to-many, bi-directional relationships in the domain model.
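Here is the pattern in Python, with `add_employee` as a hypothetical counterpart of addEmployee:. As an added assumption, it also detaches the employee from any previous location, so the two ends of the relationship can never disagree.

```python
class Employee:
    def __init__(self, name):
        self.name = name
        self.location = None


class Location:
    def __init__(self, city):
        self.city = city
        self.employees = []

    def add_employee(self, employee):
        """Maintain both ends of the one-to-many relationship in one place."""
        if employee.location is not None:
            employee.location.remove_employee(employee)
        self.employees.append(employee)
        employee.location = self

    def remove_employee(self, employee):
        self.employees.remove(employee)
        employee.location = None


dallas = Location("Dallas")
boston = Location("Boston")
elvis = Employee("Elvis Nixon")
dallas.add_employee(elvis)
boston.add_employee(elvis)  # moving an employee updates both locations
```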

Wherever possible, the buildShams methods use the methods of the domain classes, so that building the sham domain also tests many of the domain's basic mechanisms, eliminating separate unit tests for them. This works fine for simple accessor and collection methods. As always, testing code means testing the testing code, too. As code for building sham objects is written, build appropriate SUnit test cases to validate the sham objects: test that data values make sense and that object relationships are complete and correct. There is little point in creating a sham domain that you are not sure of; test it as you would all other code of the application.
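Here is a sketch, using Python's unittest, of tests that validate the sham domain itself. The builder, the classes and the prime-contact rule are hypothetical. The builder uses the domain's own `add_employee` method; the tests then check that values make sense and that relationships are complete.

```python
import unittest


class Employee:
    def __init__(self, serial_number, name):
        self.serial_number = serial_number
        self.name = name
        self.location = None


class Location:
    def __init__(self, city):
        self.city = city
        self.employees = []
        self.prime_contact = None

    def add_employee(self, employee):
        self.employees.append(employee)
        employee.location = self


def build_shams():
    """Hypothetical builder: two locations with three employees each.
    Using add_employee means building also exercises that method."""
    locations, serial = [], 0
    for city, last in (("Dallas", "Nixon"), ("Boston", "Bush")):
        location = Location(city)
        for first in ("Elvis", "Marilyn", "Audrey"):
            serial += 1
            location.add_employee(Employee(serial, f"{first} {last}"))
        location.prime_contact = location.employees[0]
        locations.append(location)
    return locations


class ShamDomainTest(unittest.TestCase):
    """Tests of the sham domain itself: sensible values, complete relations."""

    def setUp(self):
        self.locations = build_shams()
        self.employees = [e for loc in self.locations for e in loc.employees]

    def test_relationships_are_bidirectional(self):
        for location in self.locations:
            for employee in location.employees:
                self.assertIs(employee.location, location)

    def test_serial_numbers_are_unique(self):
        serials = [e.serial_number for e in self.employees]
        self.assertEqual(len(serials), len(set(serials)))

    def test_prime_contact_works_at_location(self):
        for location in self.locations:
            self.assertIn(location.prime_contact, location.employees)

    def test_names_are_non_empty(self):
        self.assertTrue(all(e.name.strip() for e in self.employees))
```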

Additional domain methods may also be needed to support the building of sham objects. Some of them may later prove to be of value to applications. For example, to set the Field Engineer who serviced a machine, a query message is sent to the Machine instance requesting the list of all "field engineering" employees, at the location that services the machine, who are trained to service that type of machine. This is a complex query to a Machine that may, or may not, be used by the application layer. If it is used, promote the method from the shams building protocol to a domain protocol. In general, though, treat the building of shams as a separate programming problem.
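The query can be sketched in Python as follows. Training is modeled as an association object recording machine type, proficiency and currency, as in the domain model described earlier. The field names, the 1 to 5 proficiency scale, the ordering by proficiency and the `qualified_field_engineers` method are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Training:
    """One engineer's training on one machine type."""
    machine_type: str
    proficiency: int  # hypothetical scale: 1 (novice) to 5 (expert)
    current: bool     # whether the training is up to date


@dataclass
class Employee:
    name: str
    job_type: str  # type indicator, e.g. "CEO" or "Field Engineer"


@dataclass
class FieldEngineer(Employee):
    trainings: list = field(default_factory=list)

    def proficiency_on(self, machine_type):
        """Highest current proficiency on machine_type, or 0 if untrained."""
        levels = [t.proficiency for t in self.trainings
                  if t.machine_type == machine_type and t.current]
        return max(levels, default=0)


@dataclass
class Location:
    city: str
    employees: list = field(default_factory=list)


@dataclass
class Machine:
    machine_type: str
    service_location: Location

    def qualified_field_engineers(self):
        """Field engineers at the servicing location with current training
        on this machine type, most proficient first."""
        candidates = [e for e in self.service_location.employees
                      if isinstance(e, FieldEngineer)
                      and e.proficiency_on(self.machine_type) > 0]
        return sorted(candidates,
                      key=lambda e: e.proficiency_on(self.machine_type),
                      reverse=True)


dallas = Location("Dallas")
elvis = FieldEngineer("Elvis Nixon", "Field Engineer", [Training("Copier X1", 4, True)])
marilyn = FieldEngineer("Marilyn Bush", "Field Engineer", [Training("Copier X1", 2, False)])
grace = FieldEngineer("Grace Adams", "Field Engineer", [Training("Copier X1", 5, True)])
dallas.employees.extend([elvis, marilyn, grace, Employee("Audrey Lincoln", "CEO")])

engineers = Machine("Copier X1", dallas).qualified_field_engineers()
print([e.name for e in engineers])
```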

The number of instances of a class created in the sham domain depends on both context and relationships with other classes. For one-to-one relationships, only one instance of the dependent class is needed, but care should be taken in assigning values to its instance variables. In the sham domain, each Organization has only one Location and each Location has only one Address, but each Location requires an Address that appears to be both unique and related to the Organization. For one-to-many relations, a number of variations may be desired, typically of varying types.

Unit Testing With Sham Domain Objects

The whole point of creating a sham domain is that it helps you to separately test the application, domain and persistence layers. If your project is large enough to have separate development teams for each layer, you want to reduce scheduling dependencies among them. Once a basic domain model has been designed and implemented at the level of key classes and their relationships, you can build a sham domain and let each group use it as the base for their unit testing. Smaller projects can be developed in a more agile way, a story at a time. As each iteration progresses, implement the necessary domain classes and their sham instances. Work on each layer and unit test against the sham domain before attempting tests with persistent objects.

  1. Testing the Domain Layer.

    When you build and test the sham domain, you also test the creational methods and object relationships of domain classes, but this leaves a lot untested. You still need to test methods that validate input data, enforce business rules and constraints, perform queries, and implement complex behaviors. Much of the setup work for these tests has already been completed in the construction of the sham domain. In each test case, send messages to specific domain objects and then check for expected results. Did you get the expected return object? Was the expected exception raised? The assertion methods of SUnit make it easy to do these tests.

    But other methods change the domain model as their main effect; for example, they add an employee, change an address, etc. The domain framework described above helps you probe the domain model for expected changes. All instances of each domain class are contained in the shams collection of the class, and every object in the class has a unique serialNumber. It is easy to access an affected object by its serialNumber, or by selecting objects through collection enumeration methods.

  2. Testing the Application Layer.

    Automating the testing of the application layer means three things: presenting domain information to the user, simulating user gestures, and probing the domain model for expected results. A sham domain helps with two parts of the problem: it provides information for presentation to the user, and it provides a target for user-requested changes. Changes to sham domain objects can be easily probed for expected results. (Simulating user gestures is an unrelated problem, and someone else's.)

  3. Testing the Persistence Layer.

    Bridging the semantic gap between an object-oriented domain model and a relational database can be quite a chore. If a reliable mapping engine (like TopLink) is used, then the problem is primarily one of specifying the correct mapping descriptors. (Not using a reliable mapping engine means writing a lot of complex code. You wouldn't develop all of your own user interface widgets, would you?)

    Even so, a good deal of persistence testing is required. This is usually accomplished by loading the database and then successively loading and unloading domain objects. But then we are back to the questions previously asked about the quality of the database data, especially over a series of tests. Sham objects give testers an alternative: they can repopulate the database from the sham domain prior to each test, and do so without the services of a database administrator.
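Here is a Python sketch of domain-layer and persistence-layer tests against a sham domain. Everything here is hypothetical: `DomainError`, `change_phone` and its phone-number rule, and `populate_database`. sqlite3's in-memory database stands in for the real test database. A real persistence test would go on to load domain objects back through the mapping layer.

```python
import sqlite3
import unittest


class DomainError(Exception):
    """Domain-specific exception exposed by the domain layer's interface."""


class Employee:
    shams = []

    def __init__(self, serial_number, name, phone):
        self.serial_number = serial_number
        self.name = name
        self.phone = phone
        Employee.shams.append(self)

    @classmethod
    def at_serial(cls, serial_number):
        # Probe the sham domain by serial number (StopIteration if absent).
        return next(e for e in cls.shams if e.serial_number == serial_number)

    def change_phone(self, phone):
        """Domain method enforcing a hypothetical business rule."""
        if not phone.startswith("555-"):
            raise DomainError(f"invalid phone number: {phone}")
        self.phone = phone


def build_shams():
    Employee.shams = []
    Employee(1, "Elvis Nixon", "555-0101")
    Employee(2, "Marilyn Bush", "555-0102")


def populate_database(connection):
    """Repopulate a test database from the sham domain."""
    connection.execute("DROP TABLE IF EXISTS employee")
    connection.execute("CREATE TABLE employee "
                       "(serial_number INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
    connection.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                           [(e.serial_number, e.name, e.phone) for e in Employee.shams])


class DomainLayerTest(unittest.TestCase):
    def setUp(self):
        build_shams()

    def test_rule_violation_raises_domain_error(self):
        with self.assertRaises(DomainError):
            Employee.at_serial(1).change_phone("12345")

    def test_update_is_visible_by_serial_number(self):
        Employee.at_serial(2).change_phone("555-0199")
        self.assertEqual(Employee.at_serial(2).phone, "555-0199")


class PersistenceLayerTest(unittest.TestCase):
    def setUp(self):
        build_shams()
        self.db = sqlite3.connect(":memory:")
        populate_database(self.db)

    def tearDown(self):
        self.db.close()

    def test_rows_match_shams(self):
        rows = self.db.execute("SELECT serial_number, name FROM employee "
                               "ORDER BY serial_number").fetchall()
        self.assertEqual(rows, [(e.serial_number, e.name) for e in Employee.shams])
```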

Conclusion

Creating and testing complex applications is always difficult. A layered architecture with well-defined inter-layer interfaces is crucial to success, as is the testing of each layer as an independent unit. Because of its central role, special care should be given to the domain layer. It is worth the extra effort it takes to generate a sham domain layer. An application can be more reliably and efficiently tested, and with a degree of repeatability not otherwise possible.

References

[1] Gamma et al., Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995.

[2] Johnson, Ralph and Woolf, Bobby, The Type Object Pattern.

[3] Mackinnon, Tim, Freeman, Steve and Craig, Philip, Endo-Testing: Unit Testing with Mock Objects.

[4] Buschmann, Frank, et al., Pattern-Oriented Software Architecture: A System of Patterns, John Wiley & Sons, 1996.

[5] Brown, Kyle, Crossing Chasms: The Architectural Patterns.

[6] Demers, Richard.

[7] Beck, Kent, Simple Smalltalk Testing: With Patterns.

[8] Brown, Kyle and Whitenack, Bruce G., Crossing Chasms: A Pattern Language for Object-RDBMS Integration: "The Static Patterns".