Forecasting AI blog





The Data Centric approach


By G. H.


February 17, 2022




If we accept that value is created through the use of data, then in the age of Big Data we need to ask ourselves the right questions about our ability to identify, collect and analyze data and, more importantly, to perceive its value to the company, its customers, suppliers and partners.

By adopting the Data Centric concept, a company will be able to extract value from all data, whether it is white (internal), grey (external) or black (dark data: data present in large quantities in companies but not used). To achieve this, a company must change its current approach to data, inherited from 20 years of business intelligence culture.

The business intelligence vision

The data analysis architectures found in companies most often follow the same model. Data collection tools dump data into a repository, where it is cleaned and reconciled, and it is finally stored in a data warehouse for users to analyze with business intelligence solutions. As a result, data and analytics stay within a perimeter that the business contains and controls. The whole process is about answering known questions from known data.
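The pipeline described above (collect, clean and reconcile, store, then query) can be sketched in a few lines. This is only an illustrative sketch, not a real BI stack: the `RAW_ORDERS` records, the function names and the use of an in-memory SQLite database as a stand-in for the data warehouse are all assumptions made for the example.

```python
import sqlite3

# Hypothetical raw records, as a collection tool might dump them.
RAW_ORDERS = [
    {"customer": " Alice ", "amount": "120.50"},
    {"customer": "BOB", "amount": "80"},
    {"customer": "alice", "amount": "30.00"},
    {"customer": "", "amount": "15"},  # unusable: no customer name
]


def clean(records):
    """Normalize names and amounts, dropping records with no customer."""
    cleaned = []
    for record in records:
        name = record["customer"].strip().lower()
        if not name:
            continue
        cleaned.append((name, float(record["amount"])))
    return cleaned


def load(rows):
    """Store cleaned rows in an in-memory SQLite table (the 'warehouse')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    return conn


def revenue_by_customer(conn):
    """A question known in advance, asked of data known in advance."""
    cursor = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return dict(cursor.fetchall())


warehouse = load(clean(RAW_ORDERS))
report = revenue_by_customer(warehouse)
```

Every step assumes the shape of the data is known beforehand, which is exactly why this model has nothing to say about data it was never designed to receive.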

A company can anticipate the value associated with the use of white data. But what about grey and dark data, which it is by definition unprepared to receive, let alone to value?

Rethinking the approach to data

For an enterprise to extract value from all data, i.e., to collect, store and analyze all data, it must change its approach to data. The current logic is to implement a Big Data platform to deploy the Data Lake. This data lake brings a new dynamic to the information system by providing a global space to store and analyze all data, raw or refined, from internal or external sources.

Often, the mistake is that the company keeps the traditional vision of data integration: data is extracted from the source and copied into the data lake to make it available for analysis. However, with very large amounts of data, this strategy of gathering everything in one place can be counterproductive: it is potentially costly in time, processing and storage, while the value created may be low. It is therefore important to identify which data sources really need to be integrated and to rethink the cross-application integration strategy.

For the Data Centric strategy to be successful, it must be linked to another concept: the extended enterprise.

New limits for the information system

Today, IT has "abandoned" certain data because it cannot capture it or integrate it easily into the information system. This grey and dark data is, for example, digital data created in the cloud and manipulated directly by marketing departments, or certain industrial production data that stays where it is produced because it is difficult to bring back into an information system.

The notion of data isolation is thus pushed to its limits: isolated data no longer resides in a data warehouse, but is located "somewhere" inside or outside the company while remaining visible to the business.

The notion of the extended enterprise was born from the observation that IT only considers or values data if it knows how to identify, manage and, if necessary, store it. In the Data Centric logic, it should be possible to see and analyze all the data in the company, including data that lies outside the traditional boundaries of the information system. However, since copying all data from all sources does not make sense, even in the context of Big Data, the preferred approach is one in which the information system extends to virtually all sources in the enterprise.

But if not all the data is copied locally into a "data lake", how can it be browsed and analyzed so that the company benefits from it?

Edge computing

The general idea behind Edge Computing is simple: add value to data where it resides. To avoid massive and costly data movement to the Data Lake, Edge Computing processes information as close to the data source as possible and, depending on the company's data strategy, sends only the useful data to the Data Lake, at a lower cost.

The new Linky smart meters are a good example: they can transmit to the network either the customer's electricity consumption over a given period or the customer's total daily consumption. This information, including consumption peaks, is then stored in the electricity operator's data lake.
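To make the idea concrete, here is a minimal sketch of edge-side aggregation: a day of hourly readings is reduced on the device to a daily total and a peak, and only that summary would be sent to the data lake. It does not reflect how Linky meters actually work or communicate; the `Reading` class, the `summarize_day` function and the payload fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    hour: int   # hour of day, 0-23
    kwh: float  # energy consumed during that hour


def summarize_day(readings):
    """Reduce a day of hourly readings to the few values worth sending."""
    if not readings:
        raise ValueError("no readings to summarize")
    total = sum(r.kwh for r in readings)
    peak = max(readings, key=lambda r: r.kwh)
    return {
        "total_kwh": round(total, 3),
        "peak_hour": peak.hour,
        "peak_kwh": peak.kwh,
    }


# Hypothetical day: a flat base load with a single evening peak.
readings = [Reading(hour=h, kwh=0.5) for h in range(24)]
readings[19] = Reading(hour=19, kwh=2.0)

# Only this small summary leaves the device, not the 24 raw readings.
payload = summarize_day(readings)
```

The trade-off is the one the article points to: less data moves to the data lake, at the price of deciding up front which values are worth keeping.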

The Data Centric logic offers the company unprecedented flexibility in setting up a Big Data project. In addition, focusing on the value of data avoids the reflexes inherited from 20 years of BI, helps define the boundaries of the data, and ultimately drives Big Data adoption by consolidating and streamlining the use of the Data Lake. Governance and security must not be forgotten either, as they have a strong impact on the implementation of such projects.