{"id":12365,"date":"2015-04-23T03:06:31","date_gmt":"2015-04-23T03:06:31","guid":{"rendered":"http:\/\/www.smartdatacollective.com\/index.php\/post\/data-lake-debate-final-word-negative\/"},"modified":"2015-04-23T03:06:31","modified_gmt":"2015-04-23T03:06:31","slug":"data-lake-debate-final-word-negative","status":"publish","type":"post","link":"https:\/\/www.smartdatacollective.com\/data-lake-debate-final-word-negative\/","title":{"rendered":"The Data Lake Debate: The Final Word from Negative"},"content":{"rendered":"<p><span style=\"font-size: small;\"><img loading=\"lazy\" loading=\"lazy\" decoding=\"async\" class=\"imgp_img alignright size-full wp-image-12212\" style=\"float: right;\" src=\"http:\/\/www.smartdatacollective.com\/wp-content\/uploads\/2015\/03\/DLD-banner.jpg\" alt=\"Data Lake Debate\" width=\"600\" height=\"150\" \/><img loading=\"lazy\" loading=\"lazy\" decoding=\"async\" class=\"imgp_img alignleft size-full wp-image-12213\" style=\"float: left;\" src=\"http:\/\/www.smartdatacollective.com\/wp-content\/uploads\/2015\/03\/anne-bobblehead.png\" alt=\"Anne Buff\" width=\"100\" height=\"100\" \/><\/span><\/p>\n<p><span style=\"font-size: small;\"><br \/><\/span><\/p>\n<p><span style=\"font-size: small;\">Well, it seems you took the gloves off this time, Tamara. 
I appreciate the valiant effort and your passionate belief in the Hadoop ecosystem. However, given that you revisited the definition of the data lake and offered clarifications about Hadoop, I find it important to repeat the resolution we are debating: \u201c<\/span><em style=\"font-size: small;\">a data lake is essential for any organization to take full advantage of its data\u201d. <\/em><span style=\"font-size: small;\">We are not debating whether a data ecosystem is essential \u2013 just the data lake. While I will stand strong with you that a well-designed data ecosystem (open source or proprietary) of many interdependent systems is imperative for businesses to succeed in today\u2019s digital world, there are still ample concerns and cautions to consider before declaring the data lake essential. As I reflect on our debate, the following are the key issues keeping the data lake from the prestige and splendor in which you have presented it.<\/span><\/p>\n<p><span style=\"font-size: small;\"><strong>Physical attributes do not determine business value.&nbsp;<\/strong>Regardless of shape, size, or other linguistic expression used to define the qualities of a data lake, the data lake remains a storage repository. Until the data is processed and consumed, it does not provide business value. No storage repository proves itself essential for the organization on its own; it must be part of a larger, well-designed data infrastructure. The options for data storage architectures are numerous, and the implementation choice should be contingent upon business need and technical requirements. A data lake is not the catchall answer.<\/span><\/p>\n<p><span style=\"font-size: small;\"><strong>The talent gap is real.&nbsp;<\/strong>If we were to accept your argument that the Hadoop Ecosystem is what organizations should be considering, the technical skills needed to support the environment would be even greater than those needed for Hadoop, the open source project, alone. 
As I mentioned before, finding individuals with the skills to access, query and manage just Apache Hadoop is difficult. If you add in the need for skills using Hive, Spark, Ambari, Pig, HBase, etc., and the wide variety of vendor distributions, the talent pool is significantly smaller. In the event an organization is able to hire the talent (or grow it in-house), the cost and paranoia of turnover rise dramatically.<\/span><\/p>\n<p><span style=\"font-size: small;\"><strong>The risk is greater than the reward. <\/strong>It does sound idyllic to have any and all of the organization\u2019s data in a central location to serve the needs of the entire enterprise. But at what cost? As I mentioned before, copying existing structured data to a data lake (especially transactional data) would be a duplication of effort and storage and would create additional risk for the organization. How many copies of the data do we need anyway? The source system, the data mart\/store, the data warehouse and now the data lake? Data integration is far more important than data co-habitation. Data governance and security are not inherent to the data lake environment (regardless of form). Without policies, procedures and additional technology to secure and protect this massive collection of data, the organization is at enormous risk. No executive in his or her right mind will jump on board for this. There is a reason <a href=\"https:\/\/www.capgemini-consulting.com\/resource-file-access\/resource\/pdf\/cracking_the_data_conundrum-big_data_pov_13-1-15_v2.pdf\" rel=\"nofollow\">Capgemini Consulting<\/a> found \u201conly 13 percent of organizations have achieved full-scale production for their big data implementations\u201d and \u201conly 27 percent of the executives surveyed described their big data initiatives as successful.\u201d The data lake is no exception.<\/span><\/p>\n<p><span style=\"font-size: small;\"><strong>Collection without purpose is hoarding. 
<\/strong>Like you said, not everyone is a Google or a Facebook. Well, not everyone is Amazon either. Storing everything is just not an option for most organizations. So the question becomes, \u201cWhat should be stored?\u201d Answering this question without consideration of strategic business initiatives or goals is futile.<\/span><\/p>\n<p><span style=\"font-size: small;\">The organizations with which I have worked that have implemented a data lake or a data-lake-like environment for technical initiatives have all had the same concern \u2013 \u201cNow that it is built, we need to convince the business to use it.\u201d To establish value and ensure use, the business needs to be involved in the data lake development from the outset. Business stakeholders care about <em>what<\/em> is stored \u2013 not <em>how<\/em> it is stored. Value will not magically appear without purpose.<\/span><\/p>\n<p><span style=\"font-size: small;\">All of that being said, there is one scenario where \u201cwithout purpose\u201d becomes the purpose (I mentioned this before as well). In the world of analytics and data science, the data lake becomes a gold mine. The volume and variety of big data combined with the accuracy and structure of operational data provide a rich and fruitful environment for data wizards to develop and refine models that generate insights we never thought possible. Even in this situation, though, I would argue that while the data lake is definitely valuable, the essential component is the brilliant analytical minds.<\/span><\/p>\n<p><span style=\"font-size: small;\"><strong>The alternative is\u2026<\/strong><\/span><\/p>\n<p><span style=\"font-size: small;\">You asked, \u201cIf a Hadoop-based data lake is not the answer, then what is?\u201d Organizations should absolutely begin to consider new ways of collecting, packaging and delivering data both internally and externally. 
Ultimately, it doesn\u2019t matter how or where the data is stored but instead how it is integrated and accessed for purpose. An organization\u2019s data infrastructure and strategy will be an evolution based on business needs and initiatives, budgets, technical skills and available technologies. In time, a data lake may in fact be a valuable asset in the essential and indispensable well-designed, purpose-built data ecosystem. But by then maybe it will be a data river (ever flowing), or a data mountain (peaks and valleys), or whatever trendy industry term comes to be. Any which way, it will only be a part, never the essential component.<\/span><\/p>\n<p><span style=\"font-size: small;\"><em>And for the record\u2026<\/em><\/span><\/p>\n<p><span style=\"font-size: small;\"><strong>Not all data lakes are Hadoop-based.<\/strong><\/span><\/p>\n<hr \/>\n<p>&nbsp;<\/p>\n<p><span style=\"font-size: small;\">Previously in the&nbsp;<a href=\"http:\/\/smartdatacollective.com\/all\/13556\" rel=\"nofollow\">Data Lake Debate<\/a>:<\/span><\/p>\n<p>&nbsp;<\/p>\n<ul>\n<li><span style=\"font-size: small;\"><a href=\"http:\/\/smartdatacollective.com\/jilldyche\/302511\/data-lake-debate-introduction\" rel=\"nofollow\">The Introduction<\/a>&nbsp;\u2013 by Jill Dyche<\/span><\/li>\n<li><span style=\"font-size: small;\"><a href=\"http:\/\/smartdatacollective.com\/tamaradull\/306046\/data-lake-debate-pro-s-first\" rel=\"nofollow\">Pro\u2019s Up First<\/a>&nbsp;\u2013 by Tamara Dull<\/span><\/li>\n<li><span style=\"font-size: small;\"><a href=\"http:\/\/smartdatacollective.com\/cnuwvu\/307631\/data-lake-debate-questioning-pro\" rel=\"nofollow\">Questioning the Pro<\/a>&nbsp;\u2013 by Anne Buff and Tamara Dull<\/span><\/li>\n<li><span style=\"font-size: small;\"><a href=\"http:\/\/smartdatacollective.com\/cnuwvu\/308991\/data-lake-debate-negative-puts-stake-ground\" rel=\"nofollow\">Negative Puts a Stake in the Ground<\/a>&nbsp;\u2013 by Anne Buff<\/span><\/li>\n<li><span 
style=\"font-size: small;\"><a href=\"http:\/\/smartdatacollective.com\/tamaradull\/309646\/data-lake-debate-pro-cross-examines-con\" target=\"_blank\" rel=\"nofollow\">Pro Cross-Examines Con<\/a>&nbsp;&#8211; by Tamara Dull and Anne Buff<\/span><\/li>\n<li><span style=\"font-size: small;\"><a title=\"The Data Lake Debate: Pro Delivers First Rebuttal\" href=\"http:\/\/smartdatacollective.com\/tamaradull\/310921\/data-lake-debate-pro-delivers-first-rebuttal\" target=\"_blank\" rel=\"nofollow\">Pro Delivers First Rebuttal<\/a> &#8211; by Tamara Dull<\/span><\/li>\n<\/ul>\n<p><span style=\"font-size: small;\"><strong><br \/><\/strong><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Well, it seems you took the gloves off this time, Tamara. I appreciate the valiant effort and your passionate belief in the Hadoop ecosystem. However, given your revisit to the definition of the data lake and clarifications about Hadoop, I find it important to repeat the resolution we are debating: \u201ca data lake is essential 
[&hellip;]<\/p>\n","protected":false},"author":726,"featured_media":12212,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_seopress_robots_primary_cat":"","_seopress_titles_title":"","_seopress_titles_desc":"","_seopress_robots_index":"","footnotes":""},"categories":[48,50,22,20,30],"tags":[1867],"class_list":{"0":"post-12365","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-big-data","8":"category-data-management","9":"category-hadoop","10":"category-open-source","11":"category-policy-and-governance","12":"tag-data-lake-debate"},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/posts\/12365"}],"collection":[{"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/users\/726"}],"replies":[{"embeddable":true,"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/comments?post=12365"}],"version-history":[{"count":0,"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/posts\/12365\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/media\/12212"}],"wp:attachment":[{"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/media?parent=12365"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/categories?post=12365"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/tags?post=12365"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}