Big Data and us

Big Data

In just under two decades, the widespread adoption of web-based services (commerce, health, banking, government, etc.) has generated a tremendous amount of data across a wide variety of industries. Added to this are the data produced by activity on those websites themselves and, more recently, by the growing popularity of connected objects. This mass of information, combined with the storage and processing capacities offered by the various cloud-computing infrastructures, represents what we now call Big Data.

Big Data and Geographic Information

The world of geographic information technology has been no stranger to this movement. Around 80% of all data generated contains geo-referenced information. Geographic information also extends to geolocation, which is natively built into a large number of connected objects such as smartphones, sensors and other components. Added to this are an increasingly popular Open Data approach and crowdsourcing projects such as OpenStreetMap.

These collaborative projects generate extremely varied sources of spatial data and highly diverse databases. This brings not only a wealth of spatial objects, but also a structure conducive to innovative analysis mechanisms. Today, data from the SENTINEL satellite constellation of the European Copernicus program is available free of charge, and new Earth-observation satellites, both public and proprietary, are launched frequently. These data bring a new dimension to the subject: it is now possible to exploit databases of several petabytes of radar, optical and multidimensional images, covering observations across the globe on an increasingly regular basis.

Geomatys and Big Data

Geomatys has long been attentive to both the challenges and the opportunities of exploiting geospatial Big Data. For several years, we have led development work on the theme of Earth Observation, which has familiarized us with managing massive data flows. For over a decade, our teams have been developing the EXAMIND platform, a product able to handle large volumes of geo-referenced data and adaptable to ever-increasing computing demands. The EXAMIND project was born from this approach: an elastic spatial data infrastructure capable of processing, combining, analyzing and visualizing very large volumes of data, and of disseminating the results in operationally useful ways.

An elastic and interoperable infrastructure

EXAMIND makes the best use of the cloud-computing infrastructures already widely deployed in the field. The project is structured around containers: it uses Docker to provide modular offerings that are easy to administer and independent of the specifics of each cloud solution. The architecture of the EXAMIND project has been designed to exploit newer storage approaches, such as object stores like Amazon S3, and to support multi-tenant operation. These capabilities are necessary to ensure maximum resilience of the infrastructure, as well as the ability to respond to heavy data loads.

Exploitation of real-time data streams

EXAMIND can process real-time data streams. The platform uses the OGC Sensor Web Enablement (SWE) standards and the Observations and Measurements (O&M) model to harmonize all types of sensor data, including data from connected objects. Data from heterogeneous sources can be cross-referenced and returned in real time, which gives EXAMIND a particularly dynamic analysis capacity.
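To make the idea of harmonization concrete, here is a minimal Python sketch. It is not EXAMIND code: the two payload formats, the function names and the `Observation` class are hypothetical, and the record only borrows a few property names from the O&M model (procedure, observed property, feature of interest, phenomenon time, result) rather than implementing the standard.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """Simplified record using O&M property names (not the full model)."""
    procedure: str             # sensor or process that produced the value
    observed_property: str     # what was measured
    feature_of_interest: str   # the real-world thing being observed
    phenomenon_time: datetime  # when the measurement applies, in UTC
    result: float
    uom: str                   # unit of measure (UCUM code)


def from_weather_station(msg: dict) -> Observation:
    # Hypothetical payload: Celsius value, Unix timestamp in seconds.
    return Observation(
        procedure=msg["id"],
        observed_property="air_temperature",
        feature_of_interest=msg["site"],
        phenomenon_time=datetime.fromtimestamp(msg["ts"], tz=timezone.utc),
        result=float(msg["temp_c"]),
        uom="Cel",
    )


def from_phone(msg: dict) -> Observation:
    # Hypothetical payload: Fahrenheit value, ISO 8601 timestamp with offset.
    return Observation(
        procedure=msg["device"],
        observed_property="air_temperature",
        feature_of_interest=msg["place"],
        phenomenon_time=datetime.fromisoformat(msg["time"]).astimezone(timezone.utc),
        result=round((float(msg["temp_f"]) - 32.0) * 5.0 / 9.0, 2),
        uom="Cel",
    )
```

Once both sources are mapped to the same record, with one unit and one time reference, their values can be compared or combined directly, whatever the original device format was.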

Ability to orchestrate heterogeneous data processing

Given the volume of data arising from the Big Data movement, it becomes problematic to build distributed processing systems that pull the data to them. Such systems would require data transfers that are incompatible with the capacities of current communication networks.

In this context, it makes sense to move processing as close as possible to the data sources. To allow the use of heterogeneous processing technologies, EXAMIND offers an infrastructure for containerizing data processing, together with an environment for the orchestrated execution of the different containers, so that remote processing chains can be built.
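As an illustration of bringing the processing to the data, the following Python sketch plans a chain of containerized steps to be run on the host that holds the data. It is an assumption-laden example, not the EXAMIND orchestrator: the `Step` class, the image names and the `/data` mount point are hypothetical. Only the `docker run --rm -v` command line is standard Docker usage.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    image: str                                # hypothetical container image
    args: list = field(default_factory=list)  # arguments passed to the container


def docker_command(step: Step, data_dir: str) -> list:
    """Build a `docker run` call that mounts the local data directory.

    The container is started where the data lives, so only the (small)
    image travels over the network, not the data itself.
    """
    return ["docker", "run", "--rm", "-v", f"{data_dir}:/data",
            step.image, *step.args]


def plan_chain(steps: list, data_dir: str) -> list:
    """Return the commands for each step, in execution order."""
    return [docker_command(step, data_dir) for step in steps]


# Two steps built with different technologies, chained on the same data.
chain = [
    Step("calibrate", "example/calibration:1.0", ["/data/raw", "/data/calibrated"]),
    Step("classify", "example/classifier:2.3", ["/data/calibrated", "/data/result"]),
]
commands = plan_chain(chain, "/mnt/archive")
```

Each command could then be executed in order on the data host, for example with `subprocess.run(cmd, check=True)`, so that a failing step stops the chain.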

An engine for Big Data Analytics

Geomatys has developed a Big Data Analytics engine as part of the EXAMIND project. It integrates seamlessly with all the OGC services offered and incorporates distributed data analysis and processing technologies. The technological and architectural choices made give the project full compatibility with the cloud-computing infrastructures currently available on the market.