Wednesday, April 03, 2019

The NoSQL Aggregate Data Model



The aggregate data model is based on the idea that the elements of a complex object are often manipulated together (Sadalage & Fowler, 2012). Aggregates are in part a solution to the object-relational impedance mismatch problem: storing an object in the database in much the same shape it has within the application greatly simplifies saving and loading that information. Relationships between aggregates are maintained by links. This allows one aggregate (a customer, for example) to be manipulated independently of a related aggregate (such as an order). This linkage is somewhat similar to relationships in the relational model, but differs in that the things being related are potentially of much higher complexity.
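To make the idea of aggregates and links concrete, here is a minimal Python sketch. The class and field names (Customer, Order, customer_id, and so on) are illustrative assumptions, not taken from Sadalage and Fowler. The customer and the order are separate aggregates, and the order refers to the customer only by its identifier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Customer:
    # One aggregate: the customer plus the parts that are handled with it.
    customer_id: int
    name: str
    addresses: List[Address] = field(default_factory=list)


@dataclass
class LineItem:
    product: str
    quantity: int
    unit_price: float


@dataclass
class Order:
    # A separate aggregate. It holds a link (the customer's id),
    # not the customer object itself.
    order_id: int
    customer_id: int
    items: List[LineItem] = field(default_factory=list)

    def total(self) -> float:
        return sum(item.quantity * item.unit_price for item in self.items)


customer = Customer(1, "Ada", [Address("1 Main St", "Springfield")])
order = Order(100, customer.customer_id, [LineItem("widget", 2, 5.0)])

# Changing the customer aggregate leaves the order aggregate untouched.
customer.addresses.append(Address("2 Side St", "Shelbyville"))
```

Because the order stores only customer_id, either aggregate can be saved or loaded without touching the other.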

Within a document database, such as MongoDB, Binary JavaScript Object Notation (BSON) is used to store aggregates (Kaur & Rani, 2013). MongoDB supports indexes to speed up query execution. Documents within MongoDB are essentially serialized versions of an application's objects, and therefore of its aggregates. MongoDB supports both composition and association. Composition is when one aggregate is embedded within another. An association is a link, as mentioned earlier. It is possible to query MongoDB to retrieve a particular field, or element, of an aggregate (MongoDB Inc, n.d.). For example, consider an aggregate that represents a customer. The customer has a name, age, and address. It is possible to query all customers and project only the name and age fields.
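The customer example can be sketched in plain Python, with dictionaries standing in for BSON documents. The sample data and the project helper are assumptions made for illustration. The helper only mimics the effect of a projection in memory and is not part of MongoDB.

```python
# Customer documents as plain dicts, standing in for BSON documents.
customers = [
    {
        "_id": 1,
        "name": "Ada",
        "age": 36,
        # Composition: the address is embedded in the customer aggregate.
        "address": {"street": "1 Main St", "city": "Springfield"},
    },
    {
        "_id": 2,
        "name": "Grace",
        "age": 45,
        "address": {"street": "9 Oak Ave", "city": "Shelbyville"},
    },
]

# Association: the order links to a customer by storing its _id.
orders = [{"_id": 100, "customer_id": 1, "total": 10.0}]


def project(documents, fields):
    """Return copies of the documents keeping only the named top-level fields."""
    return [{f: doc[f] for f in fields if f in doc} for doc in documents]


names_and_ages = project(customers, ["name", "age"])
print(names_and_ages)
# [{'name': 'Ada', 'age': 36}, {'name': 'Grace', 'age': 45}]
```

Against a real MongoDB instance, the comparable call through the PyMongo driver would look roughly like db.customers.find({}, {"name": 1, "age": 1, "_id": 0}), where db.customers is an assumed collection name.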

One benefit of the aggregate data model is that it maps easily to the object model within an application. If the object model is well constructed, the data model will be as well. It also provides a reasonable partitioning mechanism, as the elements within an aggregate are closely related and should be stored physically close to each other. Another advantage of the aggregate data model is that retrieving a given aggregate is simpler than in the relational model. In the relational model, a transformation is often required to construct the object the calling application requires. Within the aggregate model, that work is not required.
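The difference in retrieval effort can be illustrated with a small sketch. It uses Python's built-in sqlite3 module for the relational side and a JSON string as a stand-in for a stored document. The table names, column names, and sample order are assumptions chosen for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE line_items (order_id INTEGER, product TEXT, quantity INTEGER);
    INSERT INTO orders VALUES (100, 'Ada');
    INSERT INTO line_items VALUES (100, 'widget', 2), (100, 'gadget', 1);
    """
)

# Relational model: join the rows, then fold them back into one object.
rows = conn.execute(
    """
    SELECT o.order_id, o.customer_name, li.product, li.quantity
    FROM orders AS o
    JOIN line_items AS li ON li.order_id = o.order_id
    WHERE o.order_id = ?
    ORDER BY li.rowid
    """,
    (100,),
).fetchall()

order_from_rows = {
    "order_id": rows[0][0],
    "customer_name": rows[0][1],
    "items": [
        {"product": product, "quantity": quantity}
        for _, _, product, quantity in rows
    ],
}

# Aggregate model: the stored document already has the shape the
# application needs, so loading it is a single deserialization step.
stored_document = json.dumps(
    {
        "order_id": 100,
        "customer_name": "Ada",
        "items": [
            {"product": "widget", "quantity": 2},
            {"product": "gadget", "quantity": 1},
        ],
    }
)
order_from_document = json.loads(stored_document)

# Both paths end at the same object; only the amount of assembly differs.
assert order_from_rows == order_from_document
```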

A drawback of the aggregate data model is that there are no abstraction layers comparable to views in the relational model. The consumer of an aggregate data model typically interacts directly with the data store. If the aggregate changes, it is the responsibility of the consumer of the data to adjust to those changes. Additionally, I have firsthand experience attempting to retrieve information from an aggregate data model in a way that its designers did not intend. In some systems, it is necessary to retrieve every document in its entirety to perform analytics.
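The analytics problem can be sketched as follows. The order documents and the revenue_by_product function are hypothetical. The point is only that a question the aggregates were not designed around, here revenue per product, forces a pass over every document and its embedded items.

```python
from collections import defaultdict

# Order aggregates shaped for "load one order", not for product reporting.
orders = [
    {
        "_id": 100,
        "customer_id": 1,
        "items": [
            {"product": "widget", "quantity": 2, "unit_price": 5.0},
            {"product": "gadget", "quantity": 1, "unit_price": 20.0},
        ],
    },
    {
        "_id": 101,
        "customer_id": 2,
        "items": [
            {"product": "widget", "quantity": 3, "unit_price": 5.0},
        ],
    },
]


def revenue_by_product(documents):
    """Sum revenue per product by walking every order and every embedded item."""
    totals = defaultdict(float)
    for doc in documents:          # every document is read in full
        for item in doc["items"]:  # the needed values are nested inside it
            totals[item["product"]] += item["quantity"] * item["unit_price"]
    return dict(totals)


print(revenue_by_product(orders))
# {'widget': 25.0, 'gadget': 20.0}
```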

I have worked with NoSQL databases in the past. The term “aggregate data model” was not used, but the concept is valid. Typically, individuals refer to the database’s JSON rather than to an abstract concept. MongoDB refers to the concept as a “denormalized” model (Anonymous, n.d.). When the term aggregate is used, it typically refers to an analytical operation. This is yet another example of the lack of standardization within the NoSQL database space.

References
Anonymous. (n.d.). Data Model Design. Retrieved from https://docs.mongodb.com/manual/core/data-model-design/
Kaur, K., & Rani, R. (2013, October 6-9). Modeling and querying data in NoSQL databases. Paper presented at the 2013 IEEE International Conference on Big Data.
MongoDB Inc. (n.d.). Project Fields to Return from Query. Retrieved from https://docs.mongodb.com/manual/tutorial/project-fields-from-query-results/
Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: A brief guide to the emerging world of polyglot persistence. Pearson Education.