A data lake is a central repository that allows storage, processing and securing of huge volumes of structured, semi-structured and unstructured data. This kind of system is typically used by large companies.
A data warehouse is a system used to analyse and report on data for business intelligence and decision making.
A lakehouse is a new, open architecture that combines the best elements of data lakes and data warehouses to manage large amounts of data.
There are many database products on the market. Some of the most notable and widely known are MySQL, MS SQL Server, Oracle, Snowflake, Google BigQuery, Redshift and Databricks. Between them, these cover functionality from basic to advanced, such as OLAP, OLTP, searching and sorting.
Today I attended a webinar on Oracle MySQL HeatWave Lakehouse and picked up some insights into what makes it useful and practical. This kind of database is mainly used by large corporations; individuals and small-scale businesses have little use for it. Still, it is good to keep up with the latest developments in the database industry.
Oracle MySQL HeatWave supports OLTP, analytics, machine learning and Autopilot, and scales from 1 to 512 nodes. Each node can process up to 0.5 TB of data. HeatWave is mainly designed for structured data. It is available on Oracle Cloud Infrastructure (OCI), AWS and Azure.
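To get a feel for the sizing figures above (0.5 TB per node, up to 512 nodes), here is a minimal back-of-the-envelope sketch in Python. The function name `estimate_nodes` and the simple divide-and-round-up rule are my own assumptions for illustration; this is not how HeatWave or Autopilot actually sizes a cluster.

```python
import math

# Figures quoted in the webinar summary above; everything else is assumed.
TB_PER_NODE = 0.5   # data each node can process, in TB
MAX_NODES = 512     # largest cluster size mentioned


def estimate_nodes(data_tb: float) -> int:
    """Return the smallest node count whose combined capacity covers data_tb.

    Hypothetical helper: a plain divide-and-round-up estimate, not the
    real Autopilot recommendation logic.
    """
    if data_tb <= 0:
        raise ValueError("data size must be positive")
    nodes = math.ceil(data_tb / TB_PER_NODE)
    if nodes > MAX_NODES:
        raise ValueError(
            f"{data_tb} TB needs {nodes} nodes, more than the {MAX_NODES}-node limit"
        )
    return nodes


print(estimate_nodes(3.2))  # 3.2 TB / 0.5 TB per node, rounded up
```

With these numbers, 3.2 TB of data would need 7 nodes, and a full 512-node cluster would top out at 256 TB.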
Autopilot and the built-in machine learning capability are the most fascinating features, and neither requires external apps. Autopilot makes tuning recommendations, such as the number of nodes needed to perform a task in an optimized way, and estimates the time needed to process a request.
MySQL HeatWave is already in operation and is being used by many companies. What's new is MySQL HeatWave Lakehouse, which applies all the power of HeatWave to data in an object store, for example CSV files and log files, which we can call semi-structured data. It analyzes terabytes of files to recommend data types, data sizes and the number of nodes required. It allows querying data in files as well as OLTP data.
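To illustrate what "analyzing files to recommend data types" can mean in practice, here is a toy sketch in Python that scans a small CSV sample and suggests a column type for each field. The function names `infer_type` and `recommend_schema`, the type names and the inference rules are all assumptions I made for this example; the real Lakehouse logic is not described in the webinar and is certainly more sophisticated.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest type that fits every non-empty value in a column.

    Hypothetical rule set: try integer, then floating point, and fall
    back to a string sized to the longest value seen.
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "VARCHAR"
    for caster, name in ((int, "BIGINT"), (float, "DOUBLE")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    width = max(len(v) for v in non_empty)
    return f"VARCHAR({width})"


def recommend_schema(csv_text):
    """Read CSV text with a header row and map each column to a suggested type."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = list(zip(*reader))  # transpose rows into columns
    return {name: infer_type(col) for name, col in zip(header, columns)}


sample = "id,price,city\n1,9.99,Pune\n2,12.5,Mumbai\n"
print(recommend_schema(sample))
```

For this sample the sketch suggests an integer type for `id`, a floating point type for `price` and a six-character string for `city`. A real system would sample far more data and also handle dates, nulls and malformed rows.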
It also features Autopilot, which handles the basic data management tasks that would otherwise require a database administrator. It can process up to 500 TB of data. Lakehouse does not store the file data in the database; instead, it allows queries to run against the files directly. Currently, Lakehouse is available only on OCI.
Even though Lakehouse works on data objects rather than database tables, its performance matches that of HeatWave in terms of time and price, according to the webinar. Compared with others such as Redshift, Databricks, Snowflake and Google BigQuery, it was presented as cheaper, faster and more scalable.