Big Data is such a huge change for businesses that it can easily seem overwhelming. The BigDataEurope project meets interested companies halfway by providing an integrated stack of tools to manipulate, publish and use large-scale data resources.
Looking at the very long list of projects funded under Horizon 2020 and the large spectrum of topics being covered, it would be easy to forget that the EU's biggest research and innovation programme to date is all about addressing seven major societal concerns: health and wellbeing; food, agriculture and the bioeconomy; energy; transport; climate change; freedom and security; and the place of Europe in a changing world.
What is even easier to forget is the fact that these seemingly very different topics and the related sectors of activities all share at least one common trait: how they could benefit from digital innovation, and more specifically from Big Data.
To ensure that they do, the BigDataEurope (Integrating Big Data, Software and Communities for Addressing Europe's Societal Challenges) project created seven communities and tried to better understand what they would need from Big Data. The result is a platform able to ingest data from a variety of sources, which can be tailored to target innovative applications across the seven H2020 challenges.
What gaps did you aim to fill with this project and how is this important?
It is widely acknowledged that the analysis of large amounts of data (Big Data) profoundly influences our economy and society as a whole. However, it is important that the corresponding technologies are not just available to a small circle of companies, but can also be widely used by smaller enterprises and initiatives as well as in research and academia.
BigDataEurope filled this gap first by providing a platform for realising Big Data applications, and then by discussing requirements and pilot applications with communities representing the societal challenges identified by the H2020 framework programme.
What makes your approach innovative?
Numerous events organised with stakeholder groups made us realise that in addition to volume and velocity, the variety of data is a key aspect to be dealt with in societal applications.
To address this requirement, we devised and produced a semantic data description layer for Big Data. This layer uses vocabulary and knowledge graphs, and allows communities to develop a common understanding of their data while at the same time interlinking and integrating this data on a technical level.
What were the main difficulties you faced in bringing all these stakeholder groups and data sources together, and how did you overcome them?
A key challenge lay in the different terminologies, cultures and practices found in stakeholder groups, which also resulted in very different requirements and viewpoints. Whilst, for example, open data already plays a key role in mobility applications, data security, privacy and anonymisation are of paramount importance in healthcare scenarios.
We addressed this challenge by avoiding developing a one-size-fits-all platform, instead integrating components that fulfil a very specific purpose such as the processing of streaming data or anonymisation. As a result, the user can combine and integrate the most suitable data management components for any concrete application scenario of the BigDataEurope platform.
What are the advantages of integrating all this data? Can you provide a real-life example?
The project produced seven demonstrators showcasing the value of integrated data for the different societal challenges. These included, for example, forecasting road traffic and congestion based on historic and current sensor data, combined with information from social networks.
Another example is precision farming, which aims to provide plants such as grapevines with optimal nutrition, fertilisation and irrigation based on climate and research data.
Did the project results meet your initial expectations? How so?
Overall, the need to deal with data variety was something we expected, and it was confirmed through stakeholder and community meetings. Thanks to the semantic integration approach followed by the platform, we managed to take a step forward, but we are still some way from the vision of seamlessly integrating and analysing large amounts of aggregated data with minimal effort. In addition, the data privacy and sovereignty of data providers will require more attention in the future.
How can interested stakeholders start using your platform?
The platform, the documentation and the pilot implementations are fully open source and available for reuse. In addition, a number of BigDataEurope consortium partners (including, for example, Fraunhofer) are ready to provide assistance and support.
What are your follow-up plans?
Consortium members are pursuing research on Big Data management in their own domains. For example, several recently started H2020 projects continue to maintain parts of the BigDataEurope platform and deepen its application in the healthcare and life-science domains.