Proj.4 versus Apache SIS: a performance comparison

28/08/2017, Martin Desruisseaux

On August 17th, 2017, we presented an introduction to Apache Spatial Information System (SIS) at the Free and Open Source Software for Geospatial (FOSS4G) conference. Discussions about performance and accuracy were planned, but skipped for lack of time. A first part of the results that we intended to show is below.

Disclaimer: the author of this blog is an Apache SIS contributor.

Summary

The performance of two map projection libraries was compared:

- Proj.4, a library in the C language.
- Apache Spatial Information System (SIS), a library in the Java language.

The benchmark code was written in Java. The Proj.4 functions were invoked through the Java Native Interface (JNI). The use of JNI introduces a bias in benchmark measurements for Proj.4, but this bias is estimated to be lower than one standard deviation in the time measurements.

In our results, Proj.4 is faster except for the Mercator inverse projection. The Proj.4 performance advantage is explained by the cost of Java 8 trigonometric functions like Math.sin(φ) and Math.asin(y) compared to their C counterparts. Java 9 is known to be faster than Java 8 but has not been tested yet. The Mercator inverse projection exception is explained by the mathematical work done in Apache SIS, where some formulas have been rewritten in more efficient ways using mathematical equivalences; in the Mercator case those gains outweigh the handicap of slower trigonometric functions.

In a few extreme cases, Apache SIS is 400 times faster than Proj.4. Those extreme cases are explained by the capability of Apache SIS to detect when a chain of coordinate operations can be simplified to a single affine transform.

In all tested map projections, Proj.4 and Apache SIS results differ by a few micrometres or less.

In datum shift tests, Proj.4 and Apache SIS are sometimes in agreement and sometimes apart by one or two metres.
Those differences are explained by the way the two libraries use the EPSG geodetic dataset. More details are given in the discussion after the “Material and method” section.

Material and method

The benchmark compared Proj.4 release 4.9.3 (August 2016) with Apache SIS 0.8-jdk8-SNAPSHOT (August 2017). The environment is Java 1.8.0_144-b01 on macOS 10.12.6. The tests use the GeoAPI interfaces. GeoAPI 3.0 is an OGC standard which allows writing code without knowledge of the underlying implementation. It is similar in this respect to the Java DataBase Connectivity (JDBC) interfaces. This approach allows us to write benchmark or test code only once, then execute it on arbitrary GeoAPI implementations. Apache SIS is one such implementation. The sis-gdal module provides another implementation as wrappers around the Proj.4 library (another variant is available in the geoapi-proj4 module provided by the GeoAPI project).

Coordinate Reference System (CRS) instantiations

All Coordinate Reference System (CRS) objects are created from an EPSG code through the GeoAPI CRSAuthorityFactory interface. The Apache SIS CRS class provides convenience static methods which delegate to GeoAPI implementations for performing the real work.

Coordinate Reference Systems backed by Proj.4 are created as below. Note that despite the “epsg” part, this is considered a Proj4 definition (indicated by the “Proj4::” prefix) rather than an EPSG definition. Those definitions differ in axis order and sometimes in units of measurement.

    CoordinateReferenceSystem crs = CRS.forCode("Proj4::+init=epsg:4326");

Coordinate Reference Systems backed by Apache SIS are created as below. The first line creates a CRS as defined by EPSG. The second line modifies the CRS so that it has the same axis order as Proj.4 (note that the CRS intentionally loses its “EPSG::4326” identifier in this process).
    CoordinateReferenceSystem crs = CRS.forCode("EPSG::4326");
    crs = AbstractCRS.castOrCopy(crs).forConvention(AxesConvention.RIGHT_HANDED);

Note: alternatively, we could have created the Proj.4 CRS with the “Proj4::+init=epsg:4326 +axis=neu” definition string. But this approach forces us to maintain a list of axis orientations for all supported EPSG codes. This is the approach implemented by the GeoAPI wrappers for Proj.4, as a way to get an EPSG factory closer to authoritative definitions. Apache SIS takes a different approach where EPSG codes are handled by Apache SIS itself and the codes provided by Proj.4 are considered to be in a different, non-EPSG, name space.

Coordinate Operation instantiations

Many coordinate operations may exist for the same pair of source and target CRS. For example, the EPSG geodetic dataset contains about 85 operations from NAD27 to WGS84, each operation using a different set of parameters for a different state or geographic area. For making a choice, the Apache SIS CRS class provides another convenience method:

    CoordinateOperation op = CRS.findOperation(sourceCRS, targetCRS, areaOfInterest);

The areaOfInterest option is set to null in this test, which defaults to the widest domain of validity. Note that in the case of NAD27 to WGS84 transformations, the “widest domain of validity” criterion causes Apache SIS to select a transformation for Canada, not for the USA. In order to avoid ambiguity, “NAD27 to WGS84” transformations are not compared in this benchmark. The issue of Proj.4 and Apache SIS selecting different transformations will be discussed in more detail in another blog post, to be named “Proj.4 versus Apache SIS: an accuracy comparison”. For the performance comparison discussed in this post, suffice it to say that we verified that the benchmark compares the same operation methods.

All coordinate conversions or transformations are executed through the GeoAPI MathTransform interface.
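
As a rough illustration of the calling pattern, GeoAPI's MathTransform has a bulk method, transform(double[] srcPts, int srcOff, double[] dstPts, int dstOff, int numPts), which processes many points packed as consecutive ordinates in a single call. The sketch below mimics that signature using only the Java standard library. The BulkTransform interface, the class name, the sphere radius and the plain spherical Mercator formulas are all assumptions made for this example; this is neither the Apache SIS nor the Proj.4 implementation, and it is not the benchmark code.

```java
import java.util.Random;

class BatchTransformSketch {

    /** Hypothetical stand-in mirroring the bulk method signature of GeoAPI's MathTransform. */
    interface BulkTransform {
        void transform(double[] srcPts, int srcOff, double[] dstPts, int dstOff, int numPts);
    }

    /** Sphere radius in metres (an assumption for this sketch, not a value from the benchmark). */
    static final double R = 6371007.0;

    /** Spherical Mercator forward: (longitude, latitude) in degrees to (x, y) in metres. */
    static final BulkTransform FORWARD = (src, srcOff, dst, dstOff, numPts) -> {
        for (int i = 0; i < numPts; i++) {
            double lambda = Math.toRadians(src[srcOff + 2*i]);
            double phi    = Math.toRadians(src[srcOff + 2*i + 1]);
            dst[dstOff + 2*i]     = R * lambda;
            dst[dstOff + 2*i + 1] = R * Math.log(Math.tan(Math.PI / 4 + phi / 2));
        }
    };

    /** Spherical Mercator inverse: (x, y) in metres to (longitude, latitude) in degrees. */
    static final BulkTransform INVERSE = (src, srcOff, dst, dstOff, numPts) -> {
        for (int i = 0; i < numPts; i++) {
            double x = src[srcOff + 2*i];
            double y = src[srcOff + 2*i + 1];
            dst[dstOff + 2*i]     = Math.toDegrees(x / R);
            dst[dstOff + 2*i + 1] = Math.toDegrees(2 * Math.atan(Math.exp(y / R)) - Math.PI / 2);
        }
    };

    public static void main(String[] args) {
        final int numPts = 65536;                     // same group size as in the benchmark
        double[] geographic = new double[2 * numPts];
        Random random = new Random(42);               // fixed seed for reproducible input
        for (int i = 0; i < numPts; i++) {
            geographic[2*i]     = random.nextDouble() * 360 - 180;   // longitude
            geographic[2*i + 1] = random.nextDouble() * 160 - 80;    // latitude, away from poles
        }
        double[] projected = new double[2 * numPts];
        double[] roundTrip = new double[2 * numPts];

        // One call per direction for the whole group, instead of one call per point.
        FORWARD.transform(geographic, 0, projected, 0, numPts);
        INVERSE.transform(projected, 0, roundTrip, 0, numPts);

        double maxError = 0;
        for (int i = 0; i < roundTrip.length; i++) {
            maxError = Math.max(maxError, Math.abs(roundTrip[i] - geographic[i]));
        }
        System.out.println("Maximal round-trip difference (degrees): " + maxError);
    }
}
```

With a GeoAPI implementation on the classpath, the same pattern would presumably apply to the transform returned by op.getMathTransform(), with no change to the calling code when switching implementations.
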
In the Proj.4 wrapper case, MathTransform delegates to the Proj.4 pj_transform function. All those functions can work on an arbitrary number of coordinates. Our benchmark uses this capability for applying coordinate operations on groups of 65536 coordinates. In the Proj.4 case, this means one single JNI call and one single pj_transform call per group of 65536 coordinates. The cost of those calls has been estimated by performing an identity transform (where source and destination CRS are the same) on 65536 random coordinates with Proj.4 and assuming that all the execution time was due to JNI overhead. We measured 0.2 milliseconds, which is about 0.6% of the average execution time of other coordinate operations tested in this benchmark.

Conversions or transformations have been tested between the following pairs of CRS. Those pairs have been selected in order to use different operation methods:

Cylindrical Equal Area