Information Delivery 2.0 - Reference Architecture
Converged architecture for information delivery
Apr. 26, 2012 09:15 AM
Information Delivery 1.0 Issues
With new sources of data now flowing into the enterprise, it is time to look at the issues with the Information Delivery 1.0 approach found in most enterprises today.
- Disparate Data Sources: most enterprises have accumulated multiple database platforms, and even within a single platform, multiple databases, for various reasons. They spend considerable pain and effort on ETL to keep this data synchronized.
- Enterprises are slowly bringing big data and unstructured data into their information delivery scope, but lack clear means to integrate them.
- Rich Media content (audio, video, music and other binary content) is finding its place, but the real context of that content is identified mostly from the metadata and not from the content itself.
- Enterprises do not have a common search platform; searches are not targeted at the context of the content but instead rely on ID numbers and other metadata.
- Now that enterprise data volumes have moved from terabytes to petabytes and beyond, enterprises lack a clear mechanism to scale to this massive increase, both in storage and in compute power.
New platforms and tools are now available to enterprises to tackle these traditional issues. It is therefore time to provide enterprises with a reference architecture for an Information Delivery 2.0 platform, so that they can align their own information delivery architecture with the reference architecture and its best practices.
Information Delivery 2.0 Reference Architecture
The following diagram provides the reference architecture for the Information Delivery 2.0 Platform.
Before we go to the individual building blocks of the reference architecture, the following segregation of content is important.
The term Big Data refers to the massive influx of real-time data, either consumer generated (for example, from social media sites) or machine generated (such as sensor or log data). This data is unstructured, yet still textual in nature.
Rich Media data is also large in volume and unstructured, but consists of audio, video and binary documents that are not in a textual format.
While both categories are unstructured and sometimes classified together under 'Big Data', the reference architecture splits them in two because each has unique processing requirements.
Traditional Enterprise Structured Data Layer
This layer always exists in the traditional enterprise, and the typical enterprise data warehouse, data marts, operational data stores and transactional databases will continue to exist. Since it is always difficult to decommission traditional data sources, the new reference architecture has a place for them. However, under the new reference architecture, further sprawl and proliferation of disparate data sources is avoided or minimized.
Extended Enterprise - Real Time Big Data Layer
As explained above, the Big Data layer is about massive amounts of data generated from consumer-oriented sources such as social media and blogs, and from machine-related sources such as sensors and computer logs. This data is unstructured but textual in nature. Because real-time propagation of this data is very important, this layer sits on a massively parallel processing infrastructure, facilitated by platforms like Hadoop.
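The programming model that platforms like Hadoop parallelize across nodes can be sketched in a few lines. The following toy example runs both phases in-process over textual log records; the log lines and key choice are illustrative only, and a real Hadoop job would distribute the same map and reduce phases over a cluster.

```python
from collections import Counter
from itertools import chain

# Toy map/reduce over unstructured-but-textual records (e.g. log lines).
log_lines = [
    "ERROR disk full on node-7",
    "INFO checkpoint complete",
    "ERROR disk full on node-3",
]

def map_phase(line):
    # Emit (key, 1) pairs -- here, one pair per log-level token.
    level = line.split()[0]
    return [(level, 1)]

def reduce_phase(pairs):
    # Sum the counts per key, as a reducer would per partition.
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

mapped = chain.from_iterable(map_phase(line) for line in log_lines)
print(reduce_phase(mapped))  # {'ERROR': 2, 'INFO': 1}
```

Because each map call sees only one record and each reduce call only one key's pairs, the work partitions naturally across machines, which is what makes this layer scale to real-time big data volumes.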
Extended Enterprise - Rich Media Layer
As evident in the diagram, there are further data sources, such as audio, video and other unstructured documents, that are not textual in nature and require specialized tools and platforms to understand them.
Context Aware Content Layer
This layer is specific to the Extended Rich Media layer: understanding the context of rich media content such as a video or photo requires more than traditional means, namely specialized engines. Hence this optional layer is represented separately.
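One way to picture this layer is as an enrichment step that combines an item's embedded metadata with context tags produced by a specialized engine. In the hypothetical sketch below, `analyze` is only a stand-in for a real engine (speech-to-text, image recognition, and so on); all names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class MediaItem:
    uri: str
    metadata: dict                                    # embedded metadata (title, codec, ...)
    context_tags: list = field(default_factory=list)  # engine-derived context

def analyze(item: MediaItem) -> list:
    # Placeholder for a real content-analysis engine; here we merely
    # derive a coarse tag from the file extension.
    if item.uri.endswith(".mp4"):
        return ["video"]
    return ["binary"]

def enrich(item: MediaItem) -> MediaItem:
    # The context-aware layer attaches engine output alongside the
    # item's original metadata, so search can target the content itself.
    item.context_tags = analyze(item)
    return item

clip = enrich(MediaItem("s3://media/keynote.mp4", {"title": "Keynote 2012"}))
print(clip.context_tags)  # ['video']
```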
Extended Data Virtualization Layer
Having introduced non-traditional data sources alongside traditional ones in the reference architecture, we need a stronger platform to integrate them all under a single umbrella. Data virtualization platforms provide such an option; the characteristics of the data virtualization platform have already been explained in some previous articles.
The Reference Architecture also outlines the key building blocks of the Data Virtualization Layer:
- Optimization (various techniques to meet the performance needs of queries and transactions)
- Data Mining Models (an optional feature to perform predictive analysis across all sources of data, traditional and extended)
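The essence of the virtualization layer is a single facade that fans a query out to heterogeneous sources and merges the results. The sketch below assumes hypothetical adapter classes with a uniform `query` interface; a real platform would add query pushdown, caching and the optimization techniques listed above.

```python
class WarehouseAdapter:
    # Stand-in for a traditional structured source.
    def query(self, entity, **filters):
        rows = [{"customer_id": 1, "name": "Acme", "source": "warehouse"}]
        return [r for r in rows
                if all(r.get(k) == v for k, v in filters.items())]

class BigDataAdapter:
    # Stand-in for an extended-enterprise (big data) source.
    def query(self, entity, **filters):
        rows = [{"customer_id": 1, "mentions": 42, "source": "social"}]
        return [r for r in rows
                if all(r.get(k) == v for k, v in filters.items())]

class VirtualizationLayer:
    def __init__(self, *adapters):
        self.adapters = adapters

    def query(self, entity, **filters):
        # Fan out to every source and merge, so consumers see one
        # logical entity regardless of where the rows live.
        results = []
        for adapter in self.adapters:
            results.extend(adapter.query(entity, **filters))
        return results

layer = VirtualizationLayer(WarehouseAdapter(), BigDataAdapter())
print(layer.query("customer", customer_id=1))
```

The consumer asks for "customer" once and receives rows from both the warehouse and the social source, without knowing either exists.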
Enterprise Search Layer
This layer facilitates different consumers searching for content of interest without knowing the location, syntax or semantics of the underlying data source. Enterprise search is targeted at the Data Virtualization layer, which in turn performs the searches on the underlying data sources, whether traditional or extended enterprise.
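A minimal sketch of such source-agnostic search, under the assumption that the virtualization layer has already presented every item as a document with a text field (the documents and source names below are invented). A real engine would use an inverted index; a linear scan keeps the illustration short.

```python
# Each "document" may come from any underlying store; the consumer
# never names a source, only a search term.
documents = [
    {"id": "inv-001", "source": "erp",    "text": "invoice for Acme Corp, net 30"},
    {"id": "tw-881",  "source": "social", "text": "Acme support was great today"},
    {"id": "vid-17",  "source": "media",  "text": "keynote video transcript"},
]

def search(term):
    # Case-insensitive content match across all sources at once.
    term = term.lower()
    return [d["id"] for d in documents if term in d["text"].lower()]

print(search("acme"))  # ['inv-001', 'tw-881']
```

Note that the hit list spans a structured ERP record and a social media post, which is exactly the cross-source behavior this layer promises.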
Data Manipulation / Transaction Layer
This layer is important because enterprise data cannot be static; there need to be robust mechanisms to keep the data current. Traditional data requires an ACID (Atomic, Consistent, Isolated and Durable) transactional layer, while the extended enterprise requires non-transactional modification of data. Both are facilitated through this layer. Much like the Enterprise Search layer, the transaction layer routes requests through the data virtualization layer.
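The difference between the two write paths can be sketched as follows. This is only an illustration: the "transaction" is a copy-on-write commit over an in-memory dict standing in for a real database transaction, and the validation rule is invented.

```python
import copy

def acid_write(store, updates):
    # All-or-nothing: mutate a copy, swap it in only if every update
    # succeeds; otherwise return the original untouched (rollback).
    draft = copy.deepcopy(store)
    try:
        for key, value in updates.items():
            if value is None:
                raise ValueError(f"invalid value for {key}")
            draft[key] = value
    except ValueError:
        return store   # rollback
    return draft       # commit

def non_transactional_write(store, updates):
    # Extended-enterprise stores accept each valid update independently;
    # one bad field does not undo the others.
    store.update({k: v for k, v in updates.items() if v is not None})
    return store

ledger = {"balance": 100}
ledger = acid_write(ledger, {"balance": 90, "fee": None})
print(ledger)  # {'balance': 100} -- the whole batch rolled back as a unit
```

The same bad batch applied through `non_transactional_write` would keep the valid `balance` update, which is the behavior expected of the extended-enterprise sources.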
Data Consumer Layer
This final layer is about the data consumers, which vary from an ESB (Enterprise Service Bus) to reporting tools and mobile devices. As these consumers use different kinds of protocols and APIs, the Enterprise Search and Data Manipulation layers expose appropriate services for them to consume.
Governance Aspects of Information Delivery 2.0 Reference Architecture
While the reference architecture above covers the technology and platform aspects, it also covers the following governance aspects typical of information delivery.
- Master Data Management: this enables development of a "single version of the truth" by establishing common descriptions for core business entities across multiple systems. It is ideally implemented in the Extended Data Virtualization layer.
- Information Quality: this ensures the credibility of the analysis made from the information and that transactions are logically consistent. It can be implemented in all layers, with more concentration on the Data Virtualization layer.
- Data Integration: this ensures that data is shared across the enterprise in a consistent manner.
- Data Security: this ensures that people are properly authorized to access the data and that they see only what they are allowed to see.
- Information Life Cycle Management: this governs when data needs to be sourced, archived and purged. While the policies can be governed at the Data Virtualization layer, the implementation happens seamlessly at the respective data sources.
Apart from the governance aspects, this reference architecture addresses the following quality-of-service needs of information delivery.
- Availability: the reference architecture can be hosted on cloud-based delivery platforms to make it highly available, as the entire enterprise depends on it.
- Scalability: a massively parallel processing architecture facilitates high scalability for the growing needs of the enterprise. Cloud-based architectures and private cloud appliances with the ability to burst into public clouds play a major role in this quality of service.
Other QoS factors, such as performance and security, are already covered as part of the building blocks of the reference architecture.
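To make the Data Security aspect concrete, here is a minimal sketch of entitlement-based filtering at the virtualization layer: rows and columns are trimmed by the caller's role before results are returned. The roles, rules and data are purely illustrative.

```python
# Hypothetical entitlements: which regions (rows) and columns each role may see.
ENTITLEMENTS = {
    "analyst": {"regions": {"EMEA"},         "columns": {"customer", "region", "revenue"}},
    "auditor": {"regions": {"EMEA", "APAC"}, "columns": {"customer", "region"}},
}

ROWS = [
    {"customer": "Acme",    "region": "EMEA", "revenue": 120, "ssn": "x"},
    {"customer": "Initech", "region": "APAC", "revenue": 80,  "ssn": "y"},
]

def secure_query(role):
    grant = ENTITLEMENTS[role]
    # Row-level filter first, then column-level projection.
    visible = [r for r in ROWS if r["region"] in grant["regions"]]
    return [{k: v for k, v in r.items() if k in grant["columns"]}
            for r in visible]

print(secure_query("analyst"))
# [{'customer': 'Acme', 'region': 'EMEA', 'revenue': 120}]
```

Neither role ever receives the sensitive `ssn` column, and the analyst never sees rows outside EMEA, so each consumer gets only what they are allowed to see.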
As new enterprise information delivery needs emerge, the reference architecture laid out above is important because it covers all aspects of the proposed delivery model. Most large information management vendors currently provide products that fit this reference architecture. While we have yet to see a complete offering from a single vendor covering every aspect, such a scenario is possible in the near future.