Data Lake

Where is your data?

You have a variety of data, for example:

sales figures
inventory data
customer data
website visitor data
market data

This data is spread across a wide range of systems:

Databases
Data warehouses
ERP systems
Shop systems
Files

Within each of these groups, several systems are probably in use. This presents you with a massive interface problem. The solution is data integration.

How do I integrate my data?

You could program interfaces between all of these systems, or at least between those that seem most important. But that leaves you with a complex maintenance problem, because an update to one system entails updates to several interfaces. Another problem is that each interface requires knowledge of both systems it connects. Often, these systems are maintained by different departments, so many departments have to work with many other departments. It is questionable whether the resulting landscape is flexible and quick to adapt.

Another option is to pick one central system and arrange the others around it in a star topology, so that each system only has to provide an interface to this one system. The big question is which system can take on this role. It has to be easy to integrate and flexible enough to map the data of all the other systems. License costs should also be considered, since this new system has to be budgeted as well.
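To make the difference between the two approaches concrete, here is a minimal Python sketch that counts the interfaces each one needs. It assumes one bidirectional interface per pair of systems in the point-to-point case, and a separate central system in the star case. The function names are made up for this illustration.

```python
def point_to_point_interfaces(n_systems: int) -> int:
    """Interfaces needed if every pair of systems is connected directly.

    Assumption: one bidirectional interface per pair, i.e. n * (n - 1) / 2.
    """
    return n_systems * (n_systems - 1) // 2


def star_interfaces(n_systems: int) -> int:
    """Interfaces needed if every system connects only to one central system.

    Assumption: the central system is an additional system, so each of the
    n existing systems needs exactly one interface.
    """
    return n_systems


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(f"{n} systems: "
              f"{point_to_point_interfaces(n)} point-to-point interfaces, "
              f"{star_interfaces(n)} star interfaces")
```

With 10 systems this gives 45 direct interfaces but only 10 in the star layout, and the gap widens quickly as more systems are added.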

Many companies are starting a data lake project to solve this dilemma.

What is a Data Lake?

A data lake is a way to store all the data of a business. The data can be structured (relational databases or data warehouses), semi-structured (CSV, logs, XML, JSON) or unstructured (emails, documents or PDFs).
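As a rough illustration of these categories, the following Python sketch sorts incoming files by their extension. The mapping and the function name are hypothetical and only cover the file-based examples named above. Structured data usually arrives from databases rather than as files, so it is not detected here.

```python
from pathlib import PurePath

# Hypothetical mapping from file extension to the categories named above.
CATEGORY_BY_SUFFIX = {
    ".csv": "semi-structured",
    ".log": "semi-structured",
    ".xml": "semi-structured",
    ".json": "semi-structured",
    ".eml": "unstructured",   # emails
    ".docx": "unstructured",  # documents
    ".pdf": "unstructured",
}


def classify(filename: str) -> str:
    """Return the data category for a file name, or "unknown"."""
    suffix = PurePath(filename).suffix.lower()
    return CATEGORY_BY_SUFFIX.get(suffix, "unknown")


if __name__ == "__main__":
    for name in ("visits.json", "Invoice.PDF", "dump.bin"):
        print(name, "->", classify(name))
```

A data lake stores all of these side by side. A classification like this would only serve as descriptive metadata, not as a filter on what gets stored.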

A data lake can be implemented with Hadoop (HDFS). The legacy systems are then connected using frameworks such as Flume or Kafka.