Every day more than 2.5 quintillion (2.5 × 10^18) bytes of data are created, coming from business and bank transactions, posts on social media sites, digital photos, videos, sensors such as GPS receivers, and more. Big Data is the name given to the mass of unstructured data now available on the Internet.
This mass of data is a major resource, and many of these data sets are available to everybody. Some companies are already exploiting it. You may have guessed that Google looks at the subjects you are interested in and presents you with ads related to that content. Facebook, for example, looks at the friends of your friends to suggest new contacts. Other examples are less obvious but make good business sense, such as an airline improving on the pilot’s ETA for a flight by using, among other things, weather and air traffic information. The new ETA is more accurate and helps reduce idle time at airports. The McKinsey Global Institute calls Big Data ‘the next frontier for innovation, competition and productivity’.
The European Commission believes that ‘data is the new gold’. To boost the economy, it has created the Open Data Initiative, which aims to open up Public Sector Information. As the Commission puts it:
Public sector information (PSI) is the single largest source of information in Europe. It is produced and collected by public bodies and includes digital maps, meteorological, legal, traffic, financial, economic and other data. Most of this raw data could be re-used or integrated into new products and services, which we use on a daily basis, such as car navigation systems, weather forecasts, financial and insurance services.
Re-use of public sector information means using it in new ways by adding value to it, combining information from different sources, making mash-ups and new applications, both for commercial and non-commercial purposes. Public sector information has great economic potential. [..] Increase in the re-use of PSI generates new businesses and jobs and provides consumers with more choice and more value for money.
And they are not the only ones: the UN also has its own open data initiative. So it’s time to let your imagination fly and ask yourself what information could help your business, however unimaginable it would have been to have access to it before. Managers can now make decisions based on real data analysis. There are many sectors where you can generate financial value from Big Data; among them, the McKinsey Global Institute points to health care, public sector administration, global personal location data, retail and manufacturing.
From the technological perspective, exploiting Big Data is a great challenge. The data come from different sources and are stored in different locations and in different formats, so navigating through them is not an easy task. Until now, companies have used their own stored data to run their business. They defined the format and created the metadata (information on how to interpret each piece of content, that is, what each bit of information means), and used both consistently throughout the company. For this kind of data (called ‘structured data’), there are a number of proven techniques for manipulating the data, usually stored in ‘databases’ or ‘data warehouses’, and providing answers for business management.
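To make the role of metadata concrete, here is a minimal Python sketch of bringing records from two differently formatted sources into one common schema. Everything in it is hypothetical and chosen only for illustration: the source names (`csv_feed`, `json_feed`), the field names and the sample flight-delay values do not come from any real data set.

```python
import csv
import io
import json

# Hypothetical metadata: for each source, which of its fields holds
# which piece of information in the common schema.
FIELD_MAP = {
    "csv_feed": {"flight": "flight_no", "delay_min": "delay"},
    "json_feed": {"flight": "id", "delay_min": "delayMinutes"},
}


def normalize(record, source):
    """Translate one source-specific record into the common schema."""
    mapping = FIELD_MAP[source]
    return {
        "flight": str(record[mapping["flight"]]),
        "delay_min": int(record[mapping["delay_min"]]),
    }


def load_csv(text):
    """Parse CSV text and normalize every row."""
    return [normalize(row, "csv_feed") for row in csv.DictReader(io.StringIO(text))]


def load_json(text):
    """Parse a JSON array and normalize every item."""
    return [normalize(item, "json_feed") for item in json.loads(text)]


# Made-up sample data standing in for two independent sources.
csv_text = "flight_no,delay\nAB123,15\nCD456,0\n"
json_text = '[{"id": "EF789", "delayMinutes": 42}]'

records = load_csv(csv_text) + load_json(json_text)
print(records)
```

Once every record follows the same schema, the combined list can be queried as structured data. The hard part in practice is that such a mapping has to be discovered or negotiated for each new source, rather than being defined once inside a single company.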
But unstructured data is another matter entirely. The challenge is not only, as mentioned earlier, navigating through data from different locations and converting from one format to another, but also dealing with the huge volume of data: think of the number of bytes that have to be analysed! Also, to be worth the effort, the analysis has to be done in time, that is, the answer to a question has to arrive while it still matters (in some cases that can be days or hours; in others, such as a car guidance program, it is measured in seconds). This is really hard, and classical programs are not up to the challenge. New algorithms are being created and different initiatives are under way, all competing to gain momentum and become standards. For me, this trend is worth following. If you are interested, check Roberto Zicari’s presentation from ODBMS.org.