Reference Number Attributes
In his book, Java Modeling in Color with UML [Coad99], Peter Coad [Coad] introduced the idea of a class archetype. A class archetype is a kind of class that appears frequently across a wide variety of problem domains. Classes belonging to the same archetype have similar kinds of attributes and operations. They also relate, interact, and collaborate with other kinds of classes in similar ways. However, classes belonging to the same archetype are not similar enough for us to be able to generalize the similarities in a super class or interface. The idea is to first identify various class archetypes, their typical attributes, operations, and patterns of association and collaboration. Having done this we are in a position to apply these domain neutral patterns to our specific problem domains so that we construct more robust, flexible, and extensible object models more rapidly.
Introducing Four Class Archetypes
Peter Coad lists four major class archetypes that he and colleagues have come to expect when building object models especially for business and enterprise software systems. To help communicate these archetypes and their patterns in UML diagrams, we give each archetype a color and show the relevant archetype name in a UML stereotype tag of classes belonging to that archetype. The four class archetypes are:
- Moment-Interval (pastel pink): classes that represent some moment or interval of time, usually an event or activity of some kind that needs to be recorded for business or legal reasons.
- Role (pastel yellow): classes that represent the way a party (person or organization), place, or thing participates in particular kinds of event or activity (Moment-Intervals).
- Party, Place, Thing (pastel green): classes that represent role players in Moment-Intervals (actors in the dictionary sense of the word and not the UML sense).
- Description (pastel blue): short for catalog-entry-like description; classes of this archetype represent common or default values and associated behavior for sets of Party, Place, or Thing objects and sometimes for sets of Moment-Interval objects too.
Note: The term archetype is preferred to the term stereotype precisely because the similarities between classes belonging to the archetype are not nearly as rigid as those generally associated with UML stereotypes and inheritance hierarchies. However, the class stereotype tag in UML diagrams is a very handy place to indicate the archetype of a class and most UML based modeling tools support the use of stereotypes.
As stated earlier, each of the four class archetypes has a set of attributes that we might typically expect to find in classes belonging to that archetype. In each case, this set includes some sort of reference number attribute. The Moment-Interval class archetype has an attribute called referenceNumber, the Role class archetype has an assignedNumber attribute, the Party-Place-Thing class archetype has an identityNumber attribute, and the Description class archetype has an itemNumber attribute. This article examines these reference number attributes in detail, looking at the similarities and differences between them and identifying some strategies to use when modeling and implementing them.

Figure 1: The four major class archetypes with their typical attributes
Not a Number
It is one of those strange historical quirks that what we call identity numbers or reference numbers are often not numbers at all. They are all too frequently character strings containing letters, digits, spaces, and other symbols like hyphens, underscores and colons. Therefore, assuming a reference number attribute should be modeled by an integer is generally a mistake. Even if a reference number is an integer today, there is no guarantee that users/clients will not want it to be a character string tomorrow. In addition, there is generally no requirement to perform arithmetic operations or arithmetic comparisons on reference numbers. Even if it is convenient or clever to do so, it would be courting trouble because these operations and comparisons would likely break if users/clients ever wanted a change in format or values. Our first strategy is therefore:
Reference Number Strategy #1
Model reference number attributes as character strings and not as integers because users/clients frequently do not expect them to be numbers in the arithmetic sense and may want to include letters and other characters in the values at a later date.
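As a minimal Java sketch of this strategy, a reference number can be wrapped in a small text-based value type. The ReferenceNumber class and the sample values below are hypothetical and introduced only for illustration; the point is that the type offers equality but no arithmetic or ordering, so a later change of format does not break any callers.

```java
// Hypothetical value type: holds a reference number as text, never as an int.
final class ReferenceNumber {
    private final String value;

    ReferenceNumber(String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("reference number must not be blank");
        }
        this.value = value.trim();
    }

    String value() {
        return value;
    }

    // Equality is the only comparison offered: no arithmetic, no ordering.
    @Override
    public boolean equals(Object other) {
        return other instanceof ReferenceNumber
                && value.equals(((ReferenceNumber) other).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value;
    }
}

class ReferenceNumberDemo {
    public static void main(String[] args) {
        // An all-digit value keeps its leading zeros because it is text.
        ReferenceNumber oldStyle = new ReferenceNumber("000123");
        // A later format with letters and punctuation needs no code change.
        ReferenceNumber newStyle = new ReferenceNumber("LON-2024:000123");
        System.out.println(oldStyle);
        System.out.println(newStyle);
    }
}
```

Had the attribute been declared as an int, the first value would have lost its leading zeros and the second could not have been stored at all.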
Not a Primary Key
Another important similarity between these attributes is that they do NOT necessarily make good candidates for primary keys of database tables. They are just as unsuitable for low-level persistent or remote object identifiers.
The reference number attributes exist to enable humans to match software objects unambiguously with the artifact that they represent in the real world. For example, serial numbers on items enable us to match a software object in our computer system with the actual physical item that it represents. A receipt number enables us to match a paper receipt with the software object representing that particular purchase. And so on.
Primary keys or other sorts of low-level persistent or remote object identity numbers also serve to uniquely identify a software object. However, these are technical architecture/infrastructure level identifiers and exist so that our software can store, locate and retrieve the correct objects from memory, persistent storage, and transfer them across network links, etc.
Note: This is not to belittle the importance of primary keys and persistent or remote object identifiers. They are critically important in any successful enterprise business system. They are so important that we assume their existence when building object models of a problem domain and they are not shown in our UML diagrams.
Unfortunately (or maybe fortunately, depending on your philosophical perspective), what makes for a simple, efficient identifier for a software program does not often make for an identifier that is easily remembered by a human being. Using the same attribute for both is therefore not a good idea.
Although we might be able to live with the inefficiency of using an identifier biased towards human recognition for both system-level and problem domain identifiers, there is a second bigger disadvantage. Users or clients often want to change the format of a reference number to make it easier to remember or to reflect some change such as a major restructuring of the company. If the reference number is being used as a primary key or persistent object identifier, these changes can cause very nasty ripple effects because the reference number is being used in foreign key values or their object-oriented equivalent throughout the system and possibly other systems too.
Therefore, a good rule of thumb is that we should never make the system-level identifiers of an object visible to end users. If we do make them visible, there is a good chance that end users will start using them as reference numbers and will, sooner or later, want to change them, restructure them, encode extra information into them, or make them easier to remember. This becomes our second strategy in this area:
Reference Number Strategy #2
Never make a system-level object identifier visible to end users. Instead, provide end users with their own unique reference number attributes, the format and composition of which can be changed with little or no impact on technical architecture level code.
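The separation can be sketched in Java as follows. This is only an illustration under stated assumptions: the Invoice class, its renumber operation, and the idea that a persistence layer hands in a long systemId are all hypothetical and not taken from any particular framework.

```java
// Hypothetical Moment-Interval class that keeps two distinct identifiers.
final class Invoice {
    // System-level identifier: assumed to be assigned by the persistence layer,
    // used for storage and foreign keys, and never shown to end users.
    private final long systemId;

    // Human-facing reference number: printed on paper and quoted by users.
    private String invoiceNumber;

    Invoice(long systemId, String invoiceNumber) {
        this.systemId = systemId;
        this.invoiceNumber = invoiceNumber;
    }

    long systemId() {
        return systemId;
    }

    String invoiceNumber() {
        return invoiceNumber;
    }

    // Changing the visible number touches one attribute of one object;
    // everything that refers to this invoice by systemId is unaffected.
    void renumber(String newInvoiceNumber) {
        this.invoiceNumber = newInvoiceNumber;
    }
}
```

Because other objects and tables would refer to the invoice by systemId, a company-wide change of invoice number format becomes a data update rather than a ripple through every foreign key.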
Reference Number Differences between Archetypes
Why do the four archetypes have different names for their reference number attributes if they do similar jobs, i.e. that of allowing humans to match a software object with its real-world equivalent? The answer is that in each case the names help us identify real-world reference numbers. The different names also serve to remind us of typical differences in who is responsible for creating those reference numbers and when. Let's look at each in turn.
Moment-Interval Class Archetype
In business systems, Moment-Interval classes often represent business transactions between two or more parties. Traditionally such transactions required each party to retain a paper record of that transaction. To be able to match the various parties' copies, the parties agree on a unique reference number for the transaction and include it on any piece of paper related to the transaction. We now do the same electronically by storing a reference number as an attribute in our Moment-Interval objects representing that transaction. In many cases we still need to produce pieces of paper with the reference number on. Examples include receipt numbers on till receipts at retail stores and invoice numbers on invoices sent by mail or fax.
The generation of these kinds of reference number is typically the responsibility of our computer system. In business systems this is often a concatenation of character strings derived from other properties of the transaction plus a sequence number to ensure uniqueness.
Real time systems such as supervisory control and data acquisition (SCADA) systems, often uniquely identify Moment-Intervals by the date, time, and device involved in the relevant event or activity. In these cases, we might derive a reference number completely from the properties of a Moment-Interval object.
We also have a classic space versus speed design trade off to make. Do we always generate the derivable parts of the reference number when asked for it or do we store the reference number in its own private attribute? Note that we would have to regenerate any stored value if the values of the relevant properties changed before an event or activity completed. Another point to note is that if the reference number is completely derivable and we choose not to cache its value, the referenceNumber attribute in the Moment-Interval archetype really represents a query operation rather than a private instance variable (private field, private member variable) in the object.
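The query-operation variant might look like the Java sketch below. The DeviceReading class, the device identifier, and the "device/timestamp" format are assumptions made for illustration, not a format prescribed by any SCADA product.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical Moment-Interval class for a real-time device reading.
final class DeviceReading {
    // Assumed timestamp layout, e.g. 20240115T093005.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss");

    private final String deviceId;
    private final LocalDateTime takenAt;

    DeviceReading(String deviceId, LocalDateTime takenAt) {
        this.deviceId = deviceId;
        this.takenAt = takenAt;
    }

    // Query operation: the reference number is derived on every call
    // from other properties, so there is no stored copy to go stale.
    String referenceNumber() {
        return deviceId + "/" + takenAt.format(STAMP);
    }
}
```

If profiling showed the derivation to be too slow, the result could be cached in a private field instead, at the cost of extra storage and of having to regenerate it whenever deviceId or takenAt were allowed to change.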
From this analysis we can derive three strategies:
Reference Number Strategy #3
In business systems look for reference numbers that uniquely identify relevant business transactions to all involved parties.
Reference Number Strategy #4
Challenge any printing requirement for a Moment-Interval class that does not include the reference number as part of the printed output.
Reference Number Strategy #5
In real-time systems, investigate the possibility that a unique reference number can be derived from the date and time at which the associated event or activity occurs and the identity of the device involved. In business systems, look for short, meaningful character strings derived from other properties of the Moment-Interval that, when concatenated, make a meaningful, memorable reference number.
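For the business-system half of this strategy, a Java sketch could concatenate parts derived from the transaction with a sequence number that guarantees uniqueness. The ReceiptNumberGenerator class, the store code, and the "store-date-sequence" layout are hypothetical; a real system would most likely take the sequence from a database or another durable source rather than from memory.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical generator: store code + sale date + sequence number.
final class ReceiptNumberGenerator {
    private final String storeCode;
    private long lastSequence = 0; // in-memory only, for illustration

    ReceiptNumberGenerator(String storeCode) {
        this.storeCode = storeCode;
    }

    // synchronized so that two callers sharing one generator
    // cannot be given the same sequence number.
    synchronized String next(LocalDate saleDate) {
        lastSequence++;
        String datePart = saleDate.format(DateTimeFormatter.BASIC_ISO_DATE);
        return String.format(Locale.ROOT, "%s-%s-%06d",
                storeCode, datePart, lastSequence);
    }
}
```

With a store code of "LON" and a sale date of 15 January 2024, the first two calls to next would return LON-20240115-000001 and LON-20240115-000002.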
Note: In [Coad99] this attribute was simply called number. I think referenceNumber conveys a more accurate idea of the responsibilities represented by this attribute.
Role Class Archetype
To perform the duties or obtain the privileges of a role, the role player (a person, place or thing) frequently requires some sort of authorization. Often these authorization processes are outside the scope of our problem domain. When this is so, our Role classes become responsible for remembering the authorization code, license number, and associated information that result from that process. For example, objects of a Pilot class are responsible for remembering the pilot's license number and objects of an Employee class are responsible for remembering the employee numbers assigned by the HR department. This is the purpose of the assignedNumber attribute in the Role class archetype.
User names and user IDs that help authenticate a user of a system are another common example of an assignedNumber attribute in a role class. Often the users choose these themselves or a standard security process/policy defines them (e.g. my university user name was defined for me and consisted of my first name followed by the first letter of my surname followed by the year I hoped to graduate).
Places and things sometimes play roles in our system too, and the assignedNumber attribute remains useful for them. For example, in the United Kingdom, a building used as a public house (pub) must be licensed to serve alcohol. The building is the role player, the role it is playing is a pub or bar, and the value of the role's assignedNumber attribute is the license number. Certificates may also be issued for devices indicating that they have been tested and found fit for their intended use. The device is the role player, the role is the use to which the device is to be put, and the certificate number is the value of the assignedNumber attribute.
Therefore, we have:
Reference Number Strategy #6
For role classes, look for the results of processes that authorize the role-player to play that role and that need to be remembered in assignedNumber attributes.
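The pilot example can be sketched in Java as a Role object that refers to its role player and remembers the outcome of an authorization process carried out elsewhere. The class shapes and the issuingAuthority field are assumptions added for illustration.

```java
// Party archetype: the role player.
final class Person {
    private final String name;

    Person(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}

// Role archetype: remembers the result of an authorization process
// that happens outside the scope of this system.
final class Pilot {
    private final Person rolePlayer;
    private final String licenseNumber;    // the assignedNumber for this role
    private final String issuingAuthority; // hypothetical associated information

    Pilot(Person rolePlayer, String licenseNumber, String issuingAuthority) {
        this.rolePlayer = rolePlayer;
        this.licenseNumber = licenseNumber;
        this.issuingAuthority = issuingAuthority;
    }

    Person rolePlayer() {
        return rolePlayer;
    }

    String licenseNumber() {
        return licenseNumber;
    }

    String issuingAuthority() {
        return issuingAuthority;
    }
}
```

The system never generates the license number; it only records what the external authority assigned, which is why the attribute belongs to the Role and not to the Person.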
Party Place Thing Class Archetype
Party, Place, and Thing objects are also responsible for being able to be matched with their real-world counterparts. For things, this normally means remembering a serial number of some sort. For places, this is often an address, lot or bin number, or a set of coordinates. For organizations, there is normally some legally required registration number that can be used. However, when it comes to uniquely identifying individual people, we can rapidly find ourselves in a quagmire of political sensitivity, regulations, and policies. Another article, Party Time: Modeling Legal Id's, discusses this problem in some depth.
Note: [Coad99] calls the attribute representing this responsibility in the Party-Place-Thing class archetype, serialNumber. I feel this name is far too specific to the Thing flavor of this archetype. I prefer the more general name, identityNumber for the archetype as a whole but remembering that this often translates to serialNumber for Things.
For the vast majority of systems, generating the values for role players' reference numbers is somebody else's problem. Occasionally, however, we might be asked to generate a reference number for role players. These can be used in the absence of a suitable externally supplied reference number or as a much simpler identifier, especially if external identifiers for the objects involved vary in format. One example might be producing simple bar code values for videos, CDs, and DVDs in a media rental shop so that returned items can be identified and processed quickly. Another example might be a reference number used only within an organization to simplify the identification of individual customers. See again Party Time: Modeling Legal Id's.
Therefore, we have:
Reference Number Strategy #7
To simplify identification, consider using an internally generated reference number for role-players in addition to or instead of varying forms of externally generated identification number that might already exist.
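The media rental example can be sketched in Java as below. The RentalItem and RentalStock classes and the "M" plus six digits bar code layout are hypothetical. Each item keeps whatever serial number its manufacturer supplied and also receives a uniform, internally generated bar code that the shop uses for lookup.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Thing archetype: one physical item available for rental.
final class RentalItem {
    private final String title;
    private final String manufacturerSerial; // external, format varies by maker
    private final String barCode;            // internal, uniform format

    RentalItem(String title, String manufacturerSerial, String barCode) {
        this.title = title;
        this.manufacturerSerial = manufacturerSerial;
        this.barCode = barCode;
    }

    String title() { return title; }
    String manufacturerSerial() { return manufacturerSerial; }
    String barCode() { return barCode; }
}

// Hypothetical registry that issues bar codes and finds items by them.
final class RentalStock {
    private final Map<String, RentalItem> itemsByBarCode = new HashMap<>();
    private int lastIssued = 0;

    RentalItem register(String title, String manufacturerSerial) {
        lastIssued++;
        String barCode = String.format(Locale.ROOT, "M%06d", lastIssued);
        RentalItem item = new RentalItem(title, manufacturerSerial, barCode);
        itemsByBarCode.put(barCode, item);
        return item;
    }

    // Returns null when no item carries the given bar code.
    RentalItem findByBarCode(String barCode) {
        return itemsByBarCode.get(barCode);
    }
}
```

Scanning a returned item then needs only the bar code, regardless of how the manufacturer chose to format its own serial number.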
Description Class Archetype
Classes belonging to the description class archetype often represent catalogues of products and services. For example, a car sales system might have a description class called CarModel each instance of which describes a particular make and model of car available for sale. A consultancy organization might have a similar catalogue of service offerings.
The description class archetype actually has two typical reference number attributes.
The first reference number is represented by the type attribute and this is typically used to store the manufacturer's or supplier's name or code for the product or service. For example, in our car sales system the type attribute of our CarModel class might store values like BMW 520i. Obviously the values for this attribute are typically generated outside of our system and will need to be entered or uploaded in some way.
The second reference number attribute is the itemNumber attribute. An attribute like this is typically used to store the item's entry number in a catalog. These catalog entry numbers are typically under our control and our system is likely to be responsible for generating their values as we enter product descriptions into the system.
The idea of external and internal reference numbers is similar to our previous strategy #7 for role-player classes.
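A short Java sketch of the car sales example shows the two attributes side by side. The Car class, the serial number, and the catalog entry number format are assumptions made for illustration; only the CarModel name and the BMW 520i value come from the example above.

```java
// Description archetype: values shared by every car of one make and model.
final class CarModel {
    private final String type;       // supplier's name or code, entered from outside
    private final String itemNumber; // our own catalog entry number

    CarModel(String type, String itemNumber) {
        this.type = type;
        this.itemNumber = itemNumber;
    }

    String type() { return type; }
    String itemNumber() { return itemNumber; }
}

// Thing archetype: one physical car, described by a CarModel.
final class Car {
    private final String serialNumber; // identityNumber for this Thing
    private final CarModel model;

    Car(String serialNumber, CarModel model) {
        this.serialNumber = serialNumber;
        this.model = model;
    }

    String serialNumber() { return serialNumber; }
    CarModel model() { return model; }
}
```

Many Car objects would share one CarModel, so the type and itemNumber values are stored once in the description rather than repeated on every car.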
Concluding Remarks
Remember that the attributes of a class archetype are typical and not always needed in classes belonging to that archetype. There is an overriding strategy of only including attributes in a class that help fulfill its responsibilities within a particular system. If the system does not require a particular problem domain class to have a reference number attribute, do not give it one just because its class archetype suggests one.
Also, the names of the attributes are necessarily domain neutral. It is always a good idea to replace the domain-neutral names of the attributes with more specific names. In other words, call a receipt number attribute receiptNumber in preference to referenceNumber, and call a user name attribute userName instead of assignedNumber. We could use UML stereotype tags again to indicate that our receiptNumber is a kind of reference number, but this is probably overkill and will add too much clutter to our UML diagrams.