benchmark-4.0
Not logged in

Benchmark (aka "comparing apples and oranges")

Back to main page

Introduction

Today I've casually read this blog page; many thanks to my dear friend Markus Neteler (the maintainer of GRASS GIS) for pointing my attention on this topic.
The original Clipping Contest simply compared several popular Desktop GIS (ArcGis, QGIS, gvSIG, GRASS and others), but there was an obvious omission.
No Spatial DBMS was considered at all. (too bad; a really embarrassing omission)

After all a Clipping Contest sounds more like a pure data-processing problem than like a typical GIS/GUI problem, so it's probably useful introducing some Spatial DBMS into this Contest. This was my first thought while reading the benchmark report, so I duly started preparing my own testbed based on SpatiaLite (BTW I'm going to release the new version 4.0.0 in the very next days, so after all performing some further test is surely welcome).

Test configuration


I tested both Windows and Debian on the same hardware. The measured timings are rather similar; anyway Linux performed slightly better than the native Windows system (more or less, by a 10% - 20% factor). This is not at all surprising for me, because this trend seems to be constantly verified each time I manage some comparative benchmark. Very often Linux is a best performer than Windows, even when running as a Virtual Machine on the top of a physical Windows system. Quite paradoxical, but indisputably true and objectively verified many times.

The sample dataset

I've simply downloaded the same identical sample dataset already used for all other tests reported in the Clipping Contest.
Very shortly said: it's a really huge set of Contour Lines (the uncompressed Shapefile is about 1,3 GB)

Fig. 1

This one is the original set of Contour Lines to be clipped.
original dataset

Fig. 2

This is the final result of the clipping operation.
  • the Study Area boundary (polygon) is represented in red.
  • the Contour Lines are represented in dark blue.
  • the azure rectangle marks the detail area shown in the next figure.
clipped

Fig. 3

This is a magnified detail of the Study Area and Clipped Contour Lines.

As you can easily notice there are lots of Contour Lines, and each Line has many vertexes.

This one is a rather complex problem to be resolved: and it surely represents a thorough and very demanding test.
load_extension

The Spatial SQL approach

This is a pure data-processing approach; we'll use the most classical Spatial Is Not Special methodology, i.e. we'll go performing the Clipping Contest exactly as if it was a trivial ordinary SQL query performed on behalf of plain generic data. We'll not use at all any GIS-specific frill, and we'll simply use a couple of Spatial SQL functions where strictly required. (BTW it's a good opportunity to introduce few basic concepts about Spatial SQL).
Let's go ...

Preparing the SQL script

--
-- SQL script performing the Clipping Contest
-- Target DBMS: SQLite with SpatiaLite extension
--

SELECT DateTime('now'), 'Loading the StudyArea SHP';
--
-- loading the StudyArea Shapefile into the DB
--
.loadshp StudyArea1MileBuffer StudyArea1MileBuffer CP1252 2260 Geometry Id POLYGON 2d no no_spatial_index

SELECT DateTime('now'), 'Loading the Contours SHP';
--
-- loading the Contour Shapefile into the DB
--
.loadshp Contours20Ft Contours20Ft CP1252 2260 Geometry Id LINESTRING 2d no with_spatial_index

SELECT DateTime('now'), 'Clipping ... please wait';
--
-- creating a new table containing the Clipped Contours
--
CREATE TABLE Contours20FtClipped AS
SELECT c.Id AS Id, c.OBJECTID AS OBJECTID, 
  c.Contour AS Contour, c.Shape_Leng AS Shape_Leng,
  CastToMultiLinestring(ST_Intersection(
      c.Geometry, a.Geometry)) AS Geometry
FROM Contours20Ft AS c, StudyArea1MileBuffer AS a
WHERE c.ROWID IN (
    SELECT ROWID 
    FROM SpatialIndex
    WHERE f_table_name = 'Contours20Ft' 
        AND search_frame = a.Geometry
);

SELECT DateTime('now'), 'deleting NULL Geometries';
--
-- clean up
-- deleting any NULL Geometry from Clipped Contours
--
DELETE FROM Contours20FtClipped 
WHERE Geometry IS NULL;

SELECT DateTime('now'), 'recovering a genuine Geometry column';
--
-- recovering a genuine Geometry Column (Clipped Contours)
---
SELECT RecoverGeometryColumn('Contours20FtClipped', 'Geometry', 2260, 'MULTILINESTRING', 2);

SELECT DateTime('now'), 'creating the Spatial Index';
--
-- creating a Spatial (Index supporting Clipped Contours)
--
SELECT CreateSpatialIndex('Contours20FtClipped', 'Geometry');


SELECT DateTime('now'), 'all done ... quitting';
--
-- end job
--
Don't be fouled; complexity is more apparent than real. The following is a quick explanation of the step by step process defined by the above SQL script.
.loadshp parameters

Step #1

simply a dot macro command understood by the SpatiaLite front-end tool, allowing to directly import an external Shapefile into a corresponding Spatial Table within the DB.
CREATE TABLE Contours20FtClipped AS
SELECT c.Id AS Id, c.OBJECTID AS OBJECTID, 
  c.Contour AS Contour, c.Shape_Leng AS Shape_Leng,
  CastToMultiLinestring(ST_Intersection(
      c.Geometry, a.Geometry)) AS Geometry
FROM Contours20Ft AS c, StudyArea1MileBuffer AS a
WHERE c.ROWID IN (
    SELECT ROWID 
    FROM SpatialIndex
    WHERE f_table_name = 'Contours20Ft' 
        AND search_frame = a.Geometry
);

Step #2

this single SQL query performs all the actual Clipping task.
ST_Intersection() is a Spatial SQL function computing the intersection between the Study Area (polygon) and each Contour Line (linestring). The returned result could be:
  • a Linestring or a MultiLinestring (depending on the specific shape of both geometries)
  • a NULL (i.e. absolutely nothing) if the Contour Line and the Study Area do not intersect at all.
  • CastToMultiLinestring() simply is a cast operator ensuring that all returned geometries will be of the same type.
  • the inner sub-query SELECT ROWID FROM SpatialIndex ... is simply intended to access the Spatial Index supporting the Contour Lines, so to speed up the whole query.
  • CREATE TABLE Contours20FtClipped AS simply intends that the resultset returned by this SQL query has to be permanently saved into yet another table (to be created on the fly).

DELETE FROM Contours20FtClipped 
WHERE Geometry IS NULL;

Step #3

just a stupid post-processing step; the previous query has surely generated many NULL geometries, and we'll now get rid of all them.
SELECT RecoverGeometryColumn('Contours20FtClipped', 'Geometry', 
    2260, 'MULTILINESTRING', 2);
SELECT CreateSpatialIndex('Contours20FtClipped', 'Geometry');

Step #4

the very final post-processing step: transforming the table containing all Clipped Contour Lines into a genuine Spatial Table supported by its own Spatial Index.
SELECT DateTime('now'), 'some message';

measuring the time

this simply is a weird SQL trick; we'll query the system clock immediately before and after performing each step, so to get full trace of the corresponding timings.

Bringing things to a head: running the test

# spatialite my_test.sqlite < test.sql

SpatiaLite version ..: 4.0.0-RC2	Supported Extensions:
	- 'VirtualShape'	[direct Shapefile access]
	- 'VirtualDbf'		[direct DBF access]
	- 'VirtualXL'		[direct XLS access]
	- 'VirtualText'		[direct CSV/TXT access]
	- 'VirtualNetwork'	[Dijkstra shortest path]
	- 'RTree'		[Spatial Index - R*Tree]
	- 'MbrCache'		[Spatial Index - MBR cache]
	- 'VirtualSpatialIndex'	[R*Tree metahandler]
	- 'VirtualFDO'		[FDO-OGR interoperability]
	- 'SpatiaLite'		[Spatial SQL - OGC]
PROJ.4 version ......: Rel. 4.7.1, 23 September 2009
GEOS version ........: 3.2.0-CAPI-1.6.0

2012-11-10 00:03:52|Loading the StudyArea SHP
2012-11-10 00:03:52|Loading the Contours SHP
2012-11-10 00:04:54|Clipping ... please wait
2012-11-10 00:07:31|deleting NULL Geometries
2012-11-10 00:07:33|recovering a genuine Geometry column
2012-11-10 00:07:34|creating the Spatial Index
2012-11-10 00:07:36|all done ... quitting
#

Final considerations


Please note: this Clipping Contest really is like comparing apples and oranges.

What are we measuring, really ?



Back to main page