IT analyst firm Gartner has weighed in on data lakes, cautioning users against being swallowed up by the hype.
In what some may consider a splash of cold water to the face, Gartner has taken a sobering, down-to-earth view of data lakes. It has ‘credited’ the growing hype surrounding data lakes with causing “substantial confusion” in the IT sector. Furthermore, it noted that several unnamed vendors are marketing data lakes as an essential component for capitalising on Big Data opportunities. The caveat is that there is apparently little agreement among vendors about what constitutes a data lake, or how to get value from it.
“In broad terms, data lakes are marketed as enterprise-wide data management platforms for analysing disparate sources of data in its native format,” explained Nick Heudecker, research director at Gartner. “The idea is simple: instead of placing data in a purpose-built data store, you move it into a data lake in its original format. Once data is placed into the lake, it’s available for analysis by everyone in the organisation,” he continued.
However, Gartner pointed out that, while the marketing hype suggests audiences throughout an organisation will leverage data lakes, this assumes that all those audiences are highly skilled at data manipulation and analysis. “The need for increased agility and accessibility for data analysis is the primary driver for data lakes,” said Andrew White, vice president and analyst at Gartner. “Nevertheless, while it is certainly true that data lakes can provide value to various parts of the organisation, the proposition of enterprise-wide data management has yet to be realised.”
The data lake concept aims to solve two problems. The first is information silos: rather than having dozens of independently managed collections of data, users could combine these sources into a single unmanaged data lake. The second pertains to Big Data initiatives, which typically require a large amount of varied information. Unfortunately, this information is so varied that it is not clear what it is when it is received.
Data lake or data swamp?
White added that addressing both of these issues with a data lake certainly benefits IT in the short term, in that IT no longer has to spend time understanding how information is used. However, he stressed, getting value out of the data remains the responsibility of the business end user. White also warned that without at least some semblance of information governance, the lake will end up being a collection of disconnected data pools or information silos, all in one place. Gartner warned that data lakes therefore carry substantial risks, the most significant of which is the inability to determine data quality or the lineage of findings by other users who have previously found value in the data. Another risk concerns security and access control: data can be placed into the data lake with no oversight of its contents.
“The question your organisation has to address is this: do we allow or even encourage one-off, independent analysis of information in silos or a data lake, bringing said data together, or do we formalise that effort, and try to sustain the value-generating skills we develop?” said White. “If the former, it is quite likely that a data lake will appeal; otherwise, it is beneficial to move beyond a data lake concept quickly,” he concluded.