In barely ten years, the widespread use of the internet has brought about the compilation of a phenomenal amount of data across very varied sectors. The data generated by internet activity itself and, more recently, by the growing enthusiasm for connected objects add to this trend. This vast amount of information, combined with the storage and processing capacities offered by the various cloud-computing infrastructures, forms what we now call Big Data.
Big Data and Geographical Information
The world of geographical information has not been sidelined in this movement, since approximately 80% of the data generated contains georeferenced information. Geographical information also covers the concept of geolocation, which is natively incorporated into a multitude of connected objects such as smartphones, sensors and other components of all kinds. The increasingly popular OpenData approach and crowdsourcing projects such as OpenStreetMap add to this.
These collaborative projects generate extremely diverse sources of spatial data and offer very varied databases. They not only provide a multitude of spatial objects, but also a structure that is conducive to introducing innovative analysis mechanisms. Recently, the satellite data of the Sentinel constellation, provided by the European Copernicus programme, has been made available free of charge. This data adds a new dimension to the problem, since it will, in time, make it possible to use databases of radar, optical and multi-dimensional images several petabytes in size.
Geomatys and Big Data
Geomatys has long been aware of the stakes of geospatial Big Data. We have been carrying out diverse Earth-observation development activities for several years, and this work has taught us how to manage massive data streams. For the past few years, our teams have been working on upgrading the Constellation-SDI platform in order to produce a product that can meet the challenge of supporting large volumes of data while adapting to ever-growing computation demands. This approach gave birth to the EXAMIND project, a flexible spatial data infrastructure able to analyse very large volumes of data.
A Flexible and Interoperable Infrastructure
EXAMIND aims to offer a platform that makes the best use of the cloud-computing infrastructures of the different entities involved in the field. From the start, the project has been structured around the concept of containers and uses the Docker project to offer a modular solution that is simple to manage, whatever the particularities of each cloud solution. The architecture of the EXAMIND project has been designed to use new storage standards, such as object stores like Amazon S3, and to adopt multi-tenant behaviour. The high level of resilience of the infrastructure must be guaranteed, along with the capacity to absorb significant load increases.
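The container-based approach described above can be sketched with a minimal Docker Compose file. The service layout, image names and environment variables below are purely illustrative assumptions, not the actual EXAMIND distribution:

```yaml
# Hypothetical deployment sketch: every name here is illustrative.
version: "3.8"
services:
  examind:
    image: example/examind-server:latest   # hypothetical image name
    ports:
      - "8080:8080"
    environment:
      # object-storage backend, e.g. an S3-compatible store
      - STORAGE_ENDPOINT=https://s3.example.org
    depends_on:
      - database
  database:
    image: postgres:15
    environment:
      - POSTGRES_PASSWORD=change-me
```

Keeping each component in its own container is what makes the platform portable across cloud providers: the same composition file can be deployed wherever a container runtime is available.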
Using Information Streams in Real Time
EXAMIND allows streams of information to be processed in real time. The platform uses the OGC Sensor Web Enablement standard and the O&M model to standardise any type of sensor data, including data from connected objects. Data from heterogeneous sources can be cross-referenced and served in real time. EXAMIND thus provides a particularly dynamic analysis capacity.
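The value of a common model like O&M is that payloads from heterogeneous sensors can be mapped onto one record before analysis. The sketch below illustrates the idea in Python; the field and payload names are illustrative assumptions, not EXAMIND's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# A minimal record inspired by the O&M model's core concepts
# (procedure, observed property, feature of interest, result, time).
# Field names are illustrative, not EXAMIND's actual schema.
@dataclass
class Observation:
    procedure: str          # identifier of the sensor that produced the value
    observed_property: str  # e.g. "air_temperature"
    feature_of_interest: str
    phenomenon_time: datetime
    result: float
    unit: str

def normalise(raw: dict) -> Observation:
    """Map one heterogeneous sensor payload onto the common record."""
    return Observation(
        procedure=raw["sensor_id"],
        observed_property=raw["property"],
        feature_of_interest=raw.get("station", "unknown"),
        phenomenon_time=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        result=float(raw["value"]),
        unit=raw.get("unit", ""),
    )

obs = normalise({"sensor_id": "urn:sensor:42", "property": "air_temperature",
                 "ts": 1700000000, "value": "12.5", "unit": "Cel"})
print(obs.observed_property, obs.result)  # air_temperature 12.5
```

Once every source speaks the same model, cross-referencing streams becomes a matter of joining records rather than writing one parser per device.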
Capacity to Perform Heterogeneous Processing
Given the volume of data involved in the Big Data movement, implementing distributed processing systems becomes problematic: it would require transferring data streams on a scale that current communication networks cannot support.
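A back-of-envelope calculation makes the point concrete. The link speeds below are illustrative assumptions, not measured figures:

```python
# Time needed to move a petabyte-scale archive over a sustained network link.
PETABYTE_BITS = 1e15 * 8  # one petabyte expressed in bits

def transfer_days(link_gbit_per_s: float) -> float:
    """Days needed to ship 1 PB over a link of the given sustained speed."""
    seconds = PETABYTE_BITS / (link_gbit_per_s * 1e9)
    return seconds / 86_400  # seconds per day

print(f"1 Gbit/s : {transfer_days(1):.0f} days")   # roughly three months
print(f"10 Gbit/s: {transfer_days(10):.1f} days")
```

Even on a dedicated 10 Gbit/s link, moving a single petabyte takes over a week, which is why shipping the processing to the data is the more realistic option.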
In this context, it becomes pertinent to bring the processing closer to the data sources. To allow heterogeneous processing technologies to be used, EXAMIND offers a containerisation infrastructure for its processing, along with an environment for managing the deployment of the different containers, so that remote processing chains can be built.
An Engine for Big Data Analytics
Geomatys has developed a Big Data analytics engine within the EXAMIND project. It integrates transparently with all the OGC services offered and incorporates distributed data processing and analysis technologies. The technical and architectural choices made render the project fully compatible with the cloud-computing infrastructures currently on the market.
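Because the engine sits behind standard OGC services, any compliant client can talk to it. As one sketch of what that interoperability looks like, the snippet below builds a WMS 1.3.0 GetMap request; the endpoint and layer name are hypothetical, while the query parameters follow the WMS standard:

```python
from urllib.parse import urlencode

def wms_getmap_url(base: str, layer: str, bbox: tuple, size=(800, 600)) -> str:
    """Build a WMS 1.3.0 GetMap URL against any OGC-compliant endpoint."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "CRS": "EPSG:4326",
        # WMS 1.3.0 with EPSG:4326 uses latitude/longitude axis order
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": size[0],
        "HEIGHT": size[1],
        "FORMAT": "image/png",
        "STYLES": "",
    }
    return f"{base}?{urlencode(params)}"

# Hypothetical endpoint and layer, for illustration only.
url = wms_getmap_url("https://example.org/examind/wms",
                     "sst_analysis", (40.0, -10.0, 55.0, 10.0))
print(url)
```

Building on open standards like this is what lets the analytics engine stay decoupled from any particular client or cloud vendor.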