Name: Bhambhri: Embracing Open Source
Uploaded: 2017-05-02T07:08:09Z
Duration: 3 min 40 s

A lot of the structured data was managed in databases, and of course now there is a lot of unstructured and semi-structured data. And even in terms of structured data, there is a very large volume of data that is suddenly being generated. So in terms of where or how this is going to be managed: a lot of this data is very sparse, which means there is a lot of noise in it. So the signal-to-noise ratio is very low. So it's not very practical, without really knowing what the value in this data is, to just put it in a database. Because it's expensive to do that, and a lot of times the model of the data is not known. So how do you even put it in a database without really knowing what its schema is?
Databases, by definition, dictate structure. So for all this unstructured or semi-structured data, or data where there is no model, no schema, and the volumes are high and it's noisy, it makes sense to do a lot of pre-processing of this data on technology that runs on commodity hardware. You can pre-process and filter the data pretty quickly, without your cost going through the roof.
So there is a lot of work that has happened here in the Open Source community. And then companies like IBM, of course, and others have embraced Open Source and are extending it, so that this data can be ingested quickly, processed and filtered. And then, once these golden nuggets have been found, of course it makes sense to put them maybe in a database. It could be a warehouse or an operational data store, so that the rest of your downstream applications can continue to work in a seamless manner. It's just that now they are not only working against the structured data that had been stored in the databases; they are working with a bigger set of data, some of which came maybe from the web, like social media, the content that obviously all of us are creating there.
But I would say that the technology for storing it and analysing it is still coming from a lot of the database vendors. Of course, companies like Google and Yahoo came up with the technologies to crawl the web and be able to search for this information. But when we look at the enterprise usage of this data, it's still very much a play that is adjacent to the databases. So you want to extend the data platform beyond just the database. And like I said, there is Open Source technology, and there are capabilities from vendors like IBM; we have products that help in this space as well, which are built on top of Open Source.
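
As a rough illustration of the flow described in the talk (ingest raw data, filter out the noise cheaply, then load only the useful records into a database), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the sample lines, the `author` and `text` field names, the `mentions` table, and the use of an in-memory SQLite database as a stand-in for a warehouse or operational data store. It is not the tooling the speaker refers to.

```python
import json
import sqlite3

# Hypothetical raw feed: semi-structured lines, some noisy or malformed.
RAW_LINES = [
    '{"author": "a1", "text": "love the new phone", "lang": "en"}',
    'not json at all',
    '{"author": "b2", "text": ""}',
    '{"author": "c3", "text": "battery died after a week", "extra": {"geo": null}}',
    '{"text": "missing author field"}',
]


def extract_signal(lines):
    """Yield (author, text) pairs from lines that parse and carry both fields."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # noise: line is not valid JSON
        if not isinstance(record, dict):
            continue  # noise: valid JSON but not an object
        author = record.get("author")
        text = record.get("text")
        if author and text:
            yield (author, text)  # keep only records with non-empty fields


def load_into_store(rows):
    """Load filtered rows into a fixed-schema table (in-memory SQLite here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mentions (author TEXT NOT NULL, text TEXT NOT NULL)")
    conn.executemany("INSERT INTO mentions (author, text) VALUES (?, ?)", rows)
    conn.commit()
    return conn


conn = load_into_store(extract_signal(RAW_LINES))
kept = conn.execute("SELECT author, text FROM mentions ORDER BY author").fetchall()
print(kept)
```

The schema is only imposed at the last step, after filtering, which mirrors the point that the database dictates structure while the pre-processing stage does not need to.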