Construction principles for information professionals: (3) Unambiguous and consistent language

By: Rutger Gooszen

In this blog series, I delve into the enduring principles of information architecture that lead to better "information structures." These principles are sometimes overlooked amid rapid technological change. This post focuses on the need for unambiguous and consistent language within a "system" or "domain," because privacy protection, information security, and information management all depend heavily on it.


Unambiguous means that something has only one interpretation: it is clear and cannot be understood differently. Another word for unambiguous is unequivocal. Consistent means free from internal contradictions. Language is therefore unambiguous and consistent when each word has exactly one meaning and the definitions do not contradict one another. The language used in an information representation of reality should meet these criteria. For instance, general administrative law defines the term 'application' as a 'request' (to make a decision). Because this establishes unequivocally that every application is a request, rules that refer to requests evidently apply to applications as well.

The meaning or semantics of words is a tricky phenomenon because it involves human perception. Each person associates new things with what they already know or have experienced. Thus, giving new meaning to something that already holds a different meaning in someone's eyes takes time.

Clarity and consistency of language pose challenges

Clarity and consistency of language are among the most challenging aspects of communication and information architecture. They are often underestimated: statements like "I don't need to explain that; everyone understands it" are commonly heard. However, even the dictionary (the Dikke Van Dale) often gives more than one meaning for a word, to say nothing of jargon or technical vocabulary, which may not be listed there at all. Information represents reality through language, which is limited in its expressive power and finesse. We therefore need many words to approximate reality. In an information system, however, we often lack the space for such elaboration, which leads us to use different key concepts or objects. The same "thing" can have different meanings or roles in an information architecture, and people find it difficult to distinguish between them. In an information context, these "things" are considered distinct if they have different birth and death criteria. That is, they become relevant to the organization, or cease to be relevant, at different moments, and someone decides when that is.

For Jaap van Rees [1], mentioned in previous blogs, consistency meant recording the unique data of different key concepts separately (so that the common data can be combined), even if those data pertain to the same real-world entity. For example, if the person "Wim de Graaf" (key concept 1: person) is both a treating doctor (key concept 2: doctor) and a patient (key concept 3: patient) in a hospital, the data specific to each of Wim's roles should be linked to that role (itself a key concept) and not lumped together with Wim as a person. Wim's address, however, can be linked to the person, because it is the same for both the patient and the treating doctor. This rule was long a necessity when building relational data collections. Nowadays, semantic models offer technology to make this more flexible, but the essence remains: an object must be named unambiguously and must have an information lifecycle.
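The separation of person, doctor, and patient described above could be sketched roughly as follows. This is only an illustration, not Van Rees's own notation: the class names, field names, dates, and address are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional, Union


@dataclass
class Person:
    """Key concept 1: data shared by every role of the same individual."""
    person_id: int
    name: str
    address: str  # identical for the doctor and the patient, so stored once


@dataclass
class Doctor:
    """Key concept 2: data that exists only because the person works as a doctor."""
    person_id: int                         # a link to Person, not a copy of it
    specialty: str
    employed_from: date                    # assumed birth criterion
    employed_until: Optional[date] = None  # assumed death criterion


@dataclass
class Patient:
    """Key concept 3: data that exists only because the person is treated."""
    person_id: int
    treatment: str
    admitted_on: date                      # assumed birth criterion
    discharged_on: Optional[date] = None   # assumed death criterion


def address_of(role: Union[Doctor, Patient], people: Dict[int, Person]) -> str:
    """Look up the shared address through the person the role belongs to."""
    return people[role.person_id].address


# Hypothetical example data.
wim = Person(1, "Wim de Graaf", "Example Street 1")
people = {wim.person_id: wim}
wim_as_doctor = Doctor(wim.person_id, "cardiology", date(2010, 1, 1))
wim_as_patient = Patient(wim.person_id, "knee surgery", date(2023, 3, 1), date(2023, 3, 5))
```

Each role carries its own lifecycle dates, while the address lives in one place and is reached through the link to the person.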

Change is an opportunity to get it right

A crucial first step in any new development or change that requires support from an information system is to clarify the key concepts or objects in the relevant domain, determine their birth and death criteria (lifecycle), and establish who owns them. The aim should be semantic interoperability: the ability of systems or organizations to interpret data in the same way on both the sender's and the receiver's side. This also applies to the relationships between objects, which often involve a hierarchy of classes and subclasses of concepts. For instance, a car is a vehicle, and a bike is a vehicle. But is a boat also a vehicle, or is it a vessel? That depends on the definition, one might say. You must therefore make those definitions explicit. If you do not, a semantic conflict arises: the two sides fail to understand each other, and the result is an inconsistent system.
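To make the boat example concrete, here is a minimal sketch of how a semantic conflict between a sender and a receiver might be detected once both have written down their classifications. The two dictionaries and the function name are assumptions for illustration only.

```python
from typing import Dict, Tuple

# Each party's explicit classification: concept -> direct superclass.
# Both taxonomies are invented for this example.
sender: Dict[str, str] = {"car": "vehicle", "bike": "vehicle", "boat": "vehicle"}
receiver: Dict[str, str] = {"car": "vehicle", "bike": "vehicle", "boat": "vessel"}


def semantic_conflicts(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return the concepts both parties use but classify differently."""
    shared = a.keys() & b.keys()
    return {concept: (a[concept], b[concept]) for concept in shared if a[concept] != b[concept]}


conflicts = semantic_conflicts(sender, receiver)
print(conflicts)  # {'boat': ('vehicle', 'vessel')}
```

The check is only possible because both definitions are explicit; as long as they remain implicit, the disagreement about "boat" stays invisible until data is exchanged.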

A common way to represent the essential key concepts and their relationships is a domain model or object model. A list of definitions for the objects helps ensure a shared language. A good list gives, for each term, the definition, the source it came from, and whether it has been formally established or is still a draft. It may also include synonyms and homonyms.
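Such a definition list could be kept in a structure like the one below. This is a sketch under assumptions: the entry fields mirror the properties named above, while the example term, its synonym, and its source are invented, and rejecting duplicate names is just one possible way to force homonyms into the open.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable, List


class Status(Enum):
    DRAFT = "draft"
    ESTABLISHED = "established"


@dataclass
class GlossaryEntry:
    term: str
    definition: str
    source: str
    status: Status
    synonyms: List[str] = field(default_factory=list)


def build_index(entries: Iterable[GlossaryEntry]) -> Dict[str, GlossaryEntry]:
    """Map every term and synonym to exactly one entry.

    A name that would point to two entries is a homonym; it is rejected
    so that someone has to resolve it explicitly.
    """
    index: Dict[str, GlossaryEntry] = {}
    for entry in entries:
        for name in [entry.term, *entry.synonyms]:
            key = name.lower()
            if key in index:
                raise ValueError(f"'{name}' already refers to '{index[key].term}'")
            index[key] = entry
    return index


# Hypothetical entry for illustration.
glossary = [
    GlossaryEntry(
        term="patient",
        definition="A person who is registered for treatment.",
        source="hypothetical hospital data policy",
        status=Status.ESTABLISHED,
        synonyms=["client"],
    )
]
index = build_index(glossary)
```

Looking up `index["client"]` returns the entry for "patient", so a synonym always leads back to a single definition with a known source and status.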

The larger the project, the sooner you need a clear design

I have previously mentioned that speaking the same "language" often requires significant effort. Translations between domains are therefore inevitable, unless linked data (the semantic web) removes the need for them. Within a single domain (e.g., healthcare or criminal justice), however, linguistic consistency is essential for effective information provisioning. An excellent example of consistent language across multiple domains (or even the entire government domain) is the system of basic registrations. But achieving this has not been without its challenges!

Our legislation spans many different domains and is an example of poor consistency across them, even where the same concept is meant. For instance, the concept of taxable income may have the same (legal) definition and calculation method for many organizations, yet it goes by different names (assessable income, taxable income) depending on the context. The result is synonyms that appear to have different meanings. Maintaining consistency proves difficult for lawmakers.

What goes wrong when this principle is not applied? If the language used in a system is inconsistent, the system's comprehensibility suffers, and maintenance becomes more challenging (as seen in our legislation). Often, the roles of an object are not adequately separated. As a result, data related to an object that is no longer relevant for registration (e.g., a discharged patient) might still be retained because the object has multiple meanings. In the example of the doctor and the patient, patient data might be kept for too long because the person is still employed as a doctor.
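The retention problem becomes easier to handle when each role has its own end date. The sketch below shows one way this might look; the record layout and especially the retention periods are hypothetical values chosen for the example, not legal retention terms.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass
class RoleRecord:
    person_id: int
    role: str                  # e.g. "patient" or "doctor"
    ended_on: Optional[date]   # None means the role is still active


# Hypothetical retention periods per role, for illustration only.
RETENTION: Dict[str, timedelta] = {
    "patient": timedelta(days=365),
    "doctor": timedelta(days=730),
}


def purgeable(records: List[RoleRecord], today: date) -> List[RoleRecord]:
    """Return the role records whose own lifecycle and retention period have ended."""
    return [
        record
        for record in records
        if record.ended_on is not None
        and today >= record.ended_on + RETENTION[record.role]
    ]


# The same person in two roles: still employed as a doctor, discharged as a patient.
records = [
    RoleRecord(person_id=1, role="doctor", ended_on=None),
    RoleRecord(person_id=1, role="patient", ended_on=date(2020, 3, 5)),
]
to_purge = purgeable(records, today=date(2024, 1, 1))
```

Here only the patient record is selected for removal. Had the patient data been stored on the person, the ongoing employment as a doctor would have kept it alive.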

An architect who is aware of this construction principle must make the "language" unambiguous and consistent from an information perspective at an early stage. Failing to do so, or doing it too late, means that in a large project, everyone will have assigned their own meanings to all the objects to be registered without realizing their interconnections. And reversing this process will be a challenge!

The construction principles in this series:

Meaningless identity designation
Decoupling points for complexity reduction and flexibility, maximizing independence of components
Language consistency
Clear distribution of responsibilities and functional separation for administration
Delegating decision-making authority as low as possible
Detaching authorization from identification/authentication
Single registration of master data
Separating data and metadata in storage and processing
Applying standard patterns without deviations
Separating application function from data storage
Device-independent development
Choose a storage structure
No hidden interfaces

References:

[1] De informatie-architect, Van Rees / Wisse, 1995
