Big Data comes with a big problem: Big Storage. Choosing how and where to store the massive amounts of information needed for large-scale analytics is a technical and economic question with many possible answers.
Nowadays, organizations aren’t restricted to traditional data warehouses. Their options include data lakes, and more recently, data lakehouses. Here’s how those work.
What is a data lake?
The term “data lake” was coined in 2010, according to Dataversity, by James Dixon, founder and former CTO of business intelligence firm Pentaho. A data lake is meant to hold Big Data so voluminous that it can’t easily be organized or navigated with SQL tools. Because the information’s value comes largely from its sheer quantity, it made economic sense to adopt a different architecture that required relatively fewer resources to run.
“In the early days, when you had on-prem data...you’re talking about having to pay for really high-performing computers with lots of storage,” Alex Merced, developer advocate at data lake analytics platform Dremio, told IT Brew. Because storage and processing lived on the same machines, expanding storage could also mean paying for processing power nobody needed at the time.
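To make that storage-versus-compute point concrete, here is a minimal sketch of the data lake pattern under stated assumptions: raw files land cheaply in a “lake” as-is, and a query engine is attached only when someone actually asks a question. The directory path, column names, and the choice of pandas and DuckDB are illustrative assumptions, not Dremio’s product or any specific vendor’s API.

```python
# A minimal sketch of the pattern described above, under stated assumptions:
# raw files land cheaply in a "lake" directory as-is, and a query engine reads
# them only when a question needs answering, so storage and compute scale
# (and are paid for) independently.
import os
import pandas as pd   # pandas + pyarrow handle the Parquet writes
import duckdb         # stands in for "compute spun up on demand"

LAKE_DIR = "./data-lake/events"  # hypothetical location; could be object storage in practice
os.makedirs(LAKE_DIR, exist_ok=True)

# Storage side: drop raw records into the lake with no upfront warehouse schema.
events = pd.DataFrame({
    "event_type": ["click", "view", "click", "purchase"],
    "user_id": [1, 2, 1, 3],
})
events.to_parquet(f"{LAKE_DIR}/batch_0001.parquet")

# Compute side: attach an engine only when someone actually runs a query.
con = duckdb.connect()
print(con.execute(
    f"SELECT event_type, COUNT(*) AS n "
    f"FROM read_parquet('{LAKE_DIR}/*.parquet') "
    f"GROUP BY event_type"
).fetchall())
```

The point of the sketch is the separation: the Parquet files could keep piling up for years at storage prices, while the query step only consumes compute for the seconds it runs.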
—TM