© 2008-23 SmartData Collective. All Rights Reserved.
The Data Lake Debate: Pro Delivers First Rebuttal

Tamara Dull

In keeping with the spirit of this Lincoln-Douglas debate format, it looks like I only have 4 minutes (or approximately 600 words) to rebut the anti-data lake arguments Anne presented in this post and this one. Let’s do it!

Timer: START!

One of the challenges in this debate – at least for me – is that Anne and I seem to be operating on different definitions of two key terms in this discussion: data lake and Hadoop. I bring this up because you see this same confusion, or lack of clarity, elsewhere. So that’s where I’d like to start.

Revisiting Definitions (Again!)

About the data lake. In my opening argument, I defined the data lake as a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. I also mentioned that a data lake can take on different shapes and sizes, and provided these examples:

  • A single data lake; or
  • A data lake with multiple data ponds—similar in concept to a data warehouse/data mart model; or
  • Multiple, decentralized data lakes; or
  • A virtual data lake to reduce data movement.

Whereas I’ve been operating under a more logical definition of a data lake during this debate, Anne has been more focused on a single, physical storage repository in her arguments.

About Hadoop. Hadoop has two primary meanings: it’s both an open source project and an ecosystem of related projects and technologies. Here’s how they differ:

  • Open source project. When Hadoop made its commercial debut, much of the discussion was around Apache Hadoop, an open source project released by the Apache Software Foundation. Apache Hadoop was built to do two things: store and process any and all kinds of data.
  • Ecosystem. Today, when you hear discussions of Hadoop, it’s more likely about the ecosystem of projects – both open source and proprietary – that work with Apache Hadoop to make it a more robust data-everything platform. Apache Hadoop was never intended to do it all. The Hadoop ecosystem, however, is hell-bent on doing it all – and then some.

During this debate, when I’ve mentioned Hadoop, I’ve been referring to the Hadoop ecosystem. From what I can tell from Anne’s arguments, she’s been talking about Apache Hadoop. Again, same word, different uses.

And the Alternative is…

Throughout Anne’s argument, she points out the shortcomings of using Apache Hadoop (not the ecosystem) as a data lake. Point taken. But when I asked what organizations are supposed to do when the majority of their data (80-90%) is not sitting in pristine data structures, Anne replied, “It is not the storage and access [of Apache Hadoop] that brings the advantage. The advantage is in the insights derived from the analysis of the data.” What’s still not clear is how and where this analysis is taking place. If a Hadoop-based data lake is not the answer, then what is? 

Without Purpose is Okay

You can see Anne squirm, as if hearing fingernails on a chalkboard, anytime someone mentions collecting and storing data without a purpose or business context. She retaliates with “There’s no value to the organization!” Au contraire, mon amie! Tell Amazon that. They haven’t thrown any data away since day 1. Do you think they knew they’d be getting a patent for anticipatory shipping – i.e., shipping your package before you buy it – when they first started out over 20 years ago?

Today, we have big data technologies, like the Hadoop ecosystem, that allow organizations to collect and store any and all data at a fraction of the cost. I fully agree with Anne that “just because you can doesn’t mean you should” – but I would also contend that just because you can’t define the purpose now doesn’t mean you shouldn’t collect and store it. Don’t be afraid to embrace the unknown unknowns in your data.

Timer: STOP! Total word count: 598


Previously in the Data Lake Debate:

  • The Introduction – by Jill Dyche
  • Pro’s Up First – by Tamara Dull
  • Questioning the Pro – by Anne Buff and Tamara Dull
  • Negative Puts a Stake in the Ground – by Anne Buff
  • Pro Cross-Examines Con – by Tamara Dull and Anne Buff

