{"id":2974,"date":"2022-07-12T17:28:42","date_gmt":"2022-07-12T17:28:42","guid":{"rendered":"https:\/\/pramodsingla.com\/?p=2974"},"modified":"2022-07-12T18:09:26","modified_gmt":"2022-07-12T18:09:26","slug":"2974","status":"publish","type":"post","link":"https:\/\/pramodsingla.com\/?p=2974","title":{"rendered":"Data Warehouse vs Data Lake vs Data Lakehouse"},"content":{"rendered":"<h3 style=\"text-align: center;\"><strong>Data Warehouse vs Data Lake vs Data Lakehouse<\/strong><\/h3>\n<p><strong>What is a Data Warehouse?<\/strong><\/p>\n<ul>\n<li>A\u00a0data warehouse\u00a0is a type of\u00a0data management\u00a0system that is designed to support business intelligence (BI) activities. It often contains large amounts of historical data, usually loaded on a regular cadence from a wide range of sources such as transactional systems and relational databases.<\/li>\n<li>Organizations build their data warehouses using technologies like SQL Server, Oracle, Teradata, Redshift, PostgreSQL, etc.<\/li>\n<li>Both the file format and the processing engine are typically proprietary technologies.<\/li>\n<\/ul>\n<p><strong>What is a Data Lake?<\/strong><\/p>\n<ul>\n<li>A data lake is a distributed storage solution that runs on commodity hardware and easily scales out horizontally.<\/li>\n<li>The data lake architecture, unlike that of the data warehouse, decouples the distributed storage system from the distributed computing system.
This allows each system to scale out independently, as the workload requires.<\/li>\n<li>Organizations build their data lakes by independently choosing the following:\n<ul>\n<li>Storage system, e.g., Azure Data Lake Storage, AWS S3, or Google Cloud Storage.<\/li>\n<li>File format, e.g., Parquet, ORC, JSON, CSV, etc.<\/li>\n<li>Processing engine, e.g., Spark, Hive, Hadoop, HDInsight, Amazon EMR, etc.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><strong>What is a Data Lakehouse?<\/strong><\/p>\n<ul>\n<li>A data lakehouse is a data solution concept that combines elements of the data warehouse\u00a0with those of the\u00a0data lake. In short: Data Lake + ACID + Data Governance + Merge.<\/li>\n<li>Organizations build their data <strong>lakehouse<\/strong> by independently choosing the following:\n<ul>\n<li>Storage system, e.g., Azure Data Lake Storage, AWS S3, or Google Cloud Storage.<\/li>\n<li>Table format (layered over files such as Parquet), e.g., Delta Lake, Iceberg, etc.<\/li>\n<li>Processing engine, e.g., Azure Synapse, Databricks, Spark, Flink, etc.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><strong>Comparison<\/strong><\/p>\n<table style=\"height: 1178px;\" width=\"635\">\n<tbody>\n<tr>\n<td><strong>Feature<\/strong><\/td>\n<td><strong>Data Warehouse<\/strong><\/td>\n<td><strong>Data Lake<\/strong><\/td>\n<td><strong>Data Lakehouse<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Support for Scale-out<\/strong><\/td>\n<td><strong>No<\/strong><\/td>\n<td><strong>Yes<\/strong><\/td>\n<td><strong>Yes<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Support for diverse data formats<\/strong><\/td>\n<td><strong>No<\/strong><\/td>\n<td><strong>Yes<\/strong><\/td>\n<td><strong>Yes<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Support for diverse workloads<\/strong><\/td>\n<td><strong>No<\/strong><\/td>\n<td><strong>Yes<\/strong><\/td>\n<td><strong>Yes<\/strong><\/td>\n<\/tr>\n<tr>\n<td>Schema evolution<\/td>\n<td>No<\/td>\n<td>Yes<\/td>\n<td>Yes<\/td>\n<\/tr>\n<tr>\n<td>Support for ML<\/td>\n<td>No<\/td>\n<td>Yes<\/td>\n<td>Yes<\/td>\n<\/tr>\n<tr>\n<td>Support for streaming
processing<\/td>\n<td>No<\/td>\n<td>Yes<\/td>\n<td>Yes<\/td>\n<\/tr>\n<tr>\n<td><strong>ACID Compliant<\/strong><\/td>\n<td><strong>Yes<\/strong><\/td>\n<td><strong>No<\/strong><\/td>\n<td><strong>Yes<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Support for Data Governance<\/strong><\/td>\n<td><strong>Yes<\/strong><\/td>\n<td><strong>No<\/strong><\/td>\n<td><strong>Yes<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Support for Merge (Update + Delete)<\/strong><\/td>\n<td><strong>Yes<\/strong><\/td>\n<td><strong>No<\/strong><\/td>\n<td><strong>Yes<\/strong><\/td>\n<\/tr>\n<tr>\n<td>Support for Indexing<\/td>\n<td>Yes<\/td>\n<td>No<\/td>\n<td>Yes<\/td>\n<\/tr>\n<tr>\n<td>Storage and processing engine decoupled<\/td>\n<td>No<\/td>\n<td>Yes<\/td>\n<td>Yes<\/td>\n<\/tr>\n<tr>\n<td>File \/ table format<\/td>\n<td>Proprietary<\/td>\n<td>Parquet, ORC, JSON, CSV, etc.<\/td>\n<td>Delta Lake, Iceberg, etc.<\/td>\n<\/tr>\n<tr>\n<td>Processing engine<\/td>\n<td>SQL Server, Oracle, Teradata, Redshift, etc.<\/td>\n<td>Spark, Hive, Flink, Hadoop, HDInsight, Amazon EMR, etc.<\/td>\n<td>Azure Synapse, Databricks, Spark, Flink, etc.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Databricks: <a href=\"https:\/\/pages.databricks.com\/rs\/094-YMS-629\/images\/The%20Data%20Lakehouse%20Platform%20For%20Dummies%20%282%29.pdf?utm_source=databricks&amp;utm_medium=email&amp;utm_campaign=7013f000000Lj6ZAAS&amp;mkt_tok=MDk0LVlNUy02MjkAAAGFkDLIKIOTUNerbqb4XXYlWpSaj0zGQtNwbdMtptkyt-yPa0j-6GOdVP_ZDrNOaeSHSrglM-Ut0smFb7kZxlQQ7XkrlMq_i3TyzHaebDbiNlAAKQ\">The Data Lakehouse Platform for Dummies<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data Warehouse vs Data Lake vs Data Lakehouse What is a Data Warehouse?
A\u00a0data warehouse\u00a0is a type of\u00a0data management\u00a0system that[&#8230;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_mi_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[6,7,8,11,205,17,204],"tags":[49,51,206,209,210,207,208],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/posts\/2974"}],"collection":[{"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2974"}],"version-history":[{"count":9,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/posts\/2974\/revisions"}],"predecessor-version":[{"id":2983,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=\/wp\/v2\/posts\/2974\/revisions\/2983"}],"wp:attachment":[{"href":"https:\/\/pramodsingla.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2974"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2974"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pramodsingla.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2974"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}