Construction principles for the IT architect: (10) Separating application function from data storage

In this series of blogs, I discuss the still-valid information science construction principles that guarantee better ‘information building’. This 10th blog in the series continues with the first informatics principle (for the distinction, see my starting blog). This one is about the construction of data processing.

We rarely think about it, but every screen on which we process information hides an underlying construct that retrieves data from storage and lets us view or modify that data or add new data. Separating the processing function from the data storage has great advantages. In fact, this is the information science principle of decoupling, applied to computing.

Why separate?

To clarify the principle, I use the metaphor of transportation. The goods are the data. That is ultimately what has value for the end user. The various transport systems are the processing functions or application functions. If the goods were hard-coupled to one transport system (e.g. container transport), they would be ‘trapped’ in that container from production in the factory all the way to the end user. In itself, this does not prevent transport, but it creates constraints. The goods cannot be carried by any other transport system (by air or in a smaller vehicle). Container trucks and container ships would be needed everywhere, and moving the goods from the container to another means of transport would be impossible. This is inflexible from the point of view of the goods and the end user.

In other words, if the data cannot be processed separately from the application function, then it is ‘trapped’ in that application function. The user can only access the data through that application function. 

In freight transport, the ability to deliver goods through different transport systems has been created to suit the use. The goods can be ‘accessed’ by different systems to perform the desired function with them. If it has to be fast, they can go by air freight; in a city, by Coolblue's cargo bike.

If we separate application function and data storage, the data can be processed by more than that one application function! That provides all kinds of advantages.

How to separate?

Between the application and the data sits an ‘API’ (Application Programming Interface). An API is the modern decoupling component in computing that separates the data from the application. By standardizing the API, different applications can access a data store in the same way. But beware: this standardization is no mean feat! The structure of the data store and the meaning of the data need to fit the API standard. Data management should ensure that data quality is appropriate and that it is clear which data storage is the source for which data.
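To make this concrete, here is a minimal sketch in Python of two application functions that share one data store through a single standardized interface. All names (`Customer`, `CustomerApi`, `InMemoryCustomerApi`, `register_customer`, `customers_per_city`) are made up for illustration, and an in-memory dictionary stands in for a real data store. It is not a reference implementation.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str
    city: str


class CustomerApi(Protocol):
    """Hypothetical standardized contract that every application codes against."""

    def get(self, customer_id: str) -> Customer: ...
    def save(self, customer: Customer) -> None: ...
    def list_all(self) -> list[Customer]: ...


class InMemoryCustomerApi:
    """One possible implementation; applications never touch the dict directly."""

    def __init__(self) -> None:
        self._rows: dict[str, Customer] = {}

    def get(self, customer_id: str) -> Customer:
        return self._rows[customer_id]

    def save(self, customer: Customer) -> None:
        self._rows[customer.customer_id] = customer

    def list_all(self) -> list[Customer]:
        return list(self._rows.values())


# Application function 1: a registration screen.
def register_customer(api: CustomerApi, customer_id: str, name: str, city: str) -> None:
    api.save(Customer(customer_id, name, city))


# Application function 2: a separate reporting tool working on the same data.
def customers_per_city(api: CustomerApi) -> dict[str, int]:
    counts: dict[str, int] = {}
    for customer in api.list_all():
        counts[customer.city] = counts.get(customer.city, 0) + 1
    return counts


api = InMemoryCustomerApi()
register_customer(api, "c1", "Ada", "Utrecht")
register_customer(api, "c2", "Bob", "Utrecht")
register_customer(api, "c3", "Cy", "Delft")
report = customers_per_city(api)
```

Neither application function knows how the data is stored; both only know the contract. That is the decoupling, and it is also why agreeing on the contract (structure and meaning of `Customer`) is the hard part.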

Moreover, in an unsegregated application, access to the data is controlled by the application function. Once the storage is separated, access must be controlled at the storage itself. This means that calls to the API must be authorized via a security mechanism, possibly such that the end user accessing data via the API can be authenticated and authorized for that data.
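A minimal sketch of what ‘controlling access at the storage’ could look like, again in Python. The class `SecuredRecordApi`, the exception `AccessDenied` and the grants table are hypothetical. The sketch assumes the user name passed in has already been authenticated elsewhere, so it only shows the authorization step.

```python
class AccessDenied(Exception):
    """Raised when a user is not authorized for the requested action."""


class SecuredRecordApi:
    """Hypothetical API in front of a store; every call is authorized here,
    not in the calling application."""

    def __init__(self, records: dict, grants: dict) -> None:
        self._records = dict(records)
        self._grants = grants  # maps user name -> set of permitted actions

    def _check(self, user: str, action: str) -> None:
        # Assumption: `user` is an already authenticated identity.
        if action not in self._grants.get(user, set()):
            raise AccessDenied(f"{user} may not {action}")

    def read(self, user: str, key: str) -> str:
        self._check(user, "read")
        return self._records[key]

    def write(self, user: str, key: str, value: str) -> None:
        self._check(user, "write")
        self._records[key] = value


secured = SecuredRecordApi(
    records={"r1": "original"},
    grants={"clerk": {"read", "write"}, "auditor": {"read"}},
)

secured.write("clerk", "r1", "updated")
seen_by_auditor = secured.read("auditor", "r1")

try:
    secured.write("auditor", "r1", "tampered")
    auditor_write_blocked = False
except AccessDenied:
    auditor_write_blocked = True
```

Because the check lives in the API, it applies to every application that uses the store, including ones built later by another party.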

So it is certainly not easier to implement than an unsegregated application! But you get something in return in terms of vendor independence and maintainability. 

Vendor independence

One of the advantages of not being ‘trapped’ in the application function is that the user can avoid vendor lock-in. By imposing requirements on data storage, you can use the data outside the vendor's application. In the past, this was often the very reason why vendors did not apply this principle. Even today, the data portability offered by suppliers is limited, and it is difficult to renew an application without major data migration processes because the data storage only works for that one application.

This principle also contributes to the FAIR (findability, accessibility, interoperability, and reusability) principles formulated in 2016 for scientific datasets. In particular, accessibility and reusability benefit from a dataset that can be accessed separately from the application.  

Maintainability

The separation benefits the maintainability (modifiability) of both the application layer and the data storage. Functionality that is difficult to build into the existing registry application can be added as a separate application on the same data storage. The Mendix platform has exploited this advantage.

Conversely, you can replace an obsolete database with other technology (even a cloud database) while leaving the application itself untouched. Only the API needs to link to the new database location. 
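A small Python sketch of this replacement scenario. The interface `NoteStore` and both implementations are hypothetical; a plain dictionary stands in for the obsolete storage and SQLite (from Python's standard library) stands in for the new database. The application function `add_and_show` is identical in both cases.

```python
import sqlite3
from typing import Optional, Protocol


class NoteStore(Protocol):
    """Hypothetical API between the application and whatever stores the data."""

    def put(self, note_id: str, body: str) -> None: ...
    def get(self, note_id: str) -> Optional[str]: ...


class DictNoteStore:
    """Stands in for the obsolete storage technology."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, note_id: str, body: str) -> None:
        self._data[note_id] = body

    def get(self, note_id: str) -> Optional[str]:
        return self._data.get(note_id)


class SqliteNoteStore:
    """Stands in for the replacement database behind the same API."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS notes (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def put(self, note_id: str, body: str) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO notes (id, body) VALUES (?, ?)",
                (note_id, body),
            )

    def get(self, note_id: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT body FROM notes WHERE id = ?", (note_id,)
        ).fetchone()
        return row[0] if row else None


# The application function: unchanged, whichever store it is handed.
def add_and_show(store: NoteStore, note_id: str, body: str) -> str:
    store.put(note_id, body)
    return f"{note_id}: {store.get(note_id)}"


old_result = add_and_show(DictNoteStore(), "n1", "hello")
new_result = add_and_show(SqliteNoteStore(), "n1", "hello")
```

In a real migration the existing data still has to be moved to the new database; the sketch only shows that the application code does not need to change.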

Application examples

This principle is one of the central tenets of Common Ground. By separating the data from the applications, data can be shared between different applications without having to be stored multiple times. The goal is to have all applications get the data from the same source. The flip side of this benefit is the aforementioned effort needed to ensure the quality and meaning of the data.

We see another application in WhatsApp, where the media received on the phone is also stored in the phone's media store so that other apps can access it separately. Since the context is a person's phone, the security for that separate media storage is already in place.

But there are still examples where it is not applied, for example email. Try accessing a .pst file from Microsoft Outlook (which contains all your email) without Outlook. Or try saving an email as a separate file so that you can edit it in another program and forward it as an email with attachments. I am sure there are handy conversion tools on the internet, but it is not easy. And that is because the .pst store is not exposed through a standard API that other programs can use.

Another example can be seen with the many drawing tools. Each of these has its own storage format, so a drawing can only be edited in the tool that created it.

As an IT Architect, you have to weigh up this principle carefully. It is not always necessary: there must be a justification for the extra effort you have to put into developing or applying an API between the application layer and the data layer. But usually that justification is quickly found.

Read the other information science principles here:

  1. Meaningless identity designation.
  2. Decoupling points for complexity reduction and flexibility, maximizing independence of components.
  3. Language consistency.
  4. Clear distribution of responsibilities and functional separation for administration.
  5. Delegating decision-making authority as low as possible.
  6. Detaching authorization from identification/authentication.
  7. Single registration of master data.
  8. Separating data and metadata in storage and processing.
  9. Applying standard patterns without deviations.
  10. Separating application function from data storage.
