SpatiaLite: Ground Control Points

Ground Control Points

Starting since version 4.3.0 SpatiaLite supports an optional GCP module; the GCP acronym simply stands for Ground Control Points.
Understanding the intended role of GCP is very intuitive, anyway the underlying mathematic and statistic operations are really complex and apparently intimidating.
In this short presentation we'll make any possible effort so to keep the discussion at the more elementary level.

Introduction: defining the problem

Imagine two different maps obviously representing the same identical geographic area, as the two shown on the side.
Now suppose that only in one case the Coordinate Reference System is well known and fully qualified; however (for any good reason) the other map is completely lacking any useful information about its own CRS.

You could be legitimately interested in merging together both maps, anyway this operation seems to be completely forbidden simply due to the complete lack of critical informations about one of the two maps.
In this specific case adopting the usual coordinate transformations based on PROJ.4 and ST_Transform() is completely out of discussion.

Just to say, this is the usual condition of many potentially interesting historical maps; but in many others cases too it frequently happens to encounter some odd map based on a completely undocumented CRS.
Other times it could be a well known CRS unhappily unsupported by the standard EPSG geodetic definitions simply because it's based on some extravagant or not very common projection system completely unfit to match the standard EPSG expectations
(an infamous example surely familiar to many Italian readers is represented by the Cassini-Soldner projection widely adopted by the Italian National Cadastre).

Well, in all these cases the GCP approach can successfully resolve all your problems in a rather simple way.
Let's see how it practically works:

you are required first to identify several pairs of closely matching points on both maps.
the accuracy and precision of your measures will have a strong influence on the final result, so you must be very careful during this step.
accurately choosing appropriate terrain features allowing a precise and non ambiguous identification is equally important: good candidates are e.g. bell towers, chimneys, relevant buildings (church, castle, palace, railways station), mountain pinnacles, ponds, small water bodies and alike, but it obviously widely depends on the relative scale and epoch of both maps.
identifying at least three GCP pairs is the bare minimum required to attempt resolving a 2D problem, and at least four GCP pairs are required in the 3D case.
defining a bigger number of GCP pairs usually helps a lot in order to achieve a better precision, because in the case of an overdetermined system several statistic methods can be usefully applied so as to smooth and correct single measure errors.
any uneven distribution of the GCPs should always be carefully avoided, because it could easily introduce noticeable distortions; as far as possible all GCPs should be wisely placed so to regularly cover the map as a whole and possibly avoiding any directional preference.
Placing many precisely measured points aligned along the same line is an inherently useless waste of time.
splitting a big problem into many smaller problems usually helps: achieving a good precision is easier when separately processing small-sized samples just covering a precisely selected area of reasonable extension, because this way each single sub-zone can benefit from using its own precisely calibrated transformation.

You can now download the sample dataset we'll use in the following practical examples.

in this example the fully qualified map (shown in yellow on the bottom side figure) consists of two Shapefiles:
- tuscany: administrative boundaries of Tuscan Local Councils (MultiPolygons 2D).
  SRID=4326, charset=CP1252
- geonames_ppl: Tuscan Populated Places (Points 2D).
  SRID=4326, charset=CP1252
the mysterious unqualified map (shown in gray on the top side figure) also consists of two Shapefiles:
- unknown: another dataset of different origin representing the administrative boundaries of Tuscan Local Councils (MultiPolygons 2D).
  SRID=-1, charset=CP1252
- unknown_ppl: a second dataset representing Tuscan Populated Places fully consistent with the unknown Admin Boundaries (Points 2D).
  SRID=-1, charset=CP1252

SELECT org.name, org.geometry, dst.geometry
FROM unknown_ppl AS org, geonames_ppl AS dst
WHERE Upper(org.name) = Upper(dst.name);

As you can easily check by executing the above SQL query, the Populated Places could be effectively used so to automatically create a set of 269 GCP pairs.
We cannot probably expect an hi-quality result starting from a raw collection of unchecked data, but that's certainly enough to immediately start a first test session. Let's go on.

Just for the sake of curiosity:

tuscany actually corresponds to the Admin Boundaries available from the 2011 ISTAT national census (released under CC-BY license) and simply reprojected into SRID 4326.
geonames_ppl simply is a selection from the GeoNames dataset (release under CC-BY license).
unknown and unknown_ppl are the official Admin Boundaries released under CC-BY by Tuscany Region and their mysterious Reference System simply is 3003 Monte Mario / Italy zone 1 aka Gauss-Boaga - West fuse.

Anyway this one is a didactic example, so nicely forget all these informations for now and continue to ignore anything about the mystery dataset.

First attempt and SQL familiarization

all SQL functions depending on the GCP module start with a GCP_ common prefix. the GCP_Compute() SQL function do actually resolves a GCP problem then returning a BLOB-encoded object intended to be passed to some other GCP SQL function for further processing. This function will return NULL on invalid arguments or when any error is encountered (typically: too few GCP pairs were passed, or the GCP pairs are of a so poor quality that finding any valid solution was impossible). Note well: GCP_Compute() is an aggregate function, and consequently will take in account all GCP pairs coming from the input resultset. the first argument corresponds to the origin Geometry, i.e. to the unknown component. the second argument corresponds to the destination Geometry, i.e. to the fully qualified component. the third (optional) argument can assume the following values: 1 (default) the GCP problem will be solved by applying the RMSE regression method and will return a set of polynomial coefficients of the 1st order. 2 the GCP problem will be solved by applying the RMSE regression method and will return a set of polynomial coefficients of the 2nd order. 3 the GCP problem will be solved by applying the RMSE regression method and will return a set of polynomial coefficients of the 3rd order. 0 the GCP problem will be solved by applying the Thin Plate Spline (aka TPS) method. the GCP_Transform() SQL function is strictly similar to ATM_Transform(), and will directly recompute all coordinate values accordingly to the coefficients found in the BLOB-encoded object created by GCP_Compute().	CREATE TABLE tuscany_1st AS SELECT u.name, GCP_Transform(u.geometry, g.matrix, 4326) AS geometry FROM unknown AS u, (SELECT GCP_Compute(org.geometry, dst.geometry) AS matrix FROM unknown_ppl AS org, geonames_ppl AS dst WHERE Upper(org.name) = Upper(dst.name)) AS g; SELECT RecoverGeometryColumn('tuscany_1st', 'geometry', 4326, 'MULTIPOLYGON', 'XY');
	CREATE TABLE tuscany_2nd AS SELECT u.name, GCP_Transform(u.geometry, g.matrix, 4326) AS geometry FROM unknown AS u, (SELECT GCP_Compute(org.geometry, dst.geometry, 2) AS matrix FROM unknown_ppl AS org, geonames_ppl AS dst WHERE Upper(org.name) = Upper(dst.name)) AS g; SELECT RecoverGeometryColumn('tuscany_2nd', 'geometry', 4326, 'MULTIPOLYGON', 'XY');
	CREATE TABLE tuscany_3rd AS SELECT u.name, GCP_Transform(u.geometry, g.matrix, 4326) AS geometry FROM unknown AS u, (SELECT GCP_Compute(org.geometry, dst.geometry, 3) AS matrix FROM unknown_ppl AS org, geonames_ppl AS dst WHERE Upper(org.name) = Upper(dst.name)) AS g; SELECT RecoverGeometryColumn('tuscany_3rd', 'geometry', 4326, 'MULTIPOLYGON', 'XY');
results assessment The side figure is a visual representation of the above SQL queries: the yellow background corresponds to the original tuscany map (i.e. to the reference target). the results of the 1st order transformation are shown in red. the results of the 2nd order transformation are shown in blue. the results of the 3rd order transformation are shown in green. As you can easily notice, this first results are surely encouraging (they directly confirm that we are proceeding in the right direction), anyway they certainly suffer from many obvious problems. We've got decently accurate and fairly self-consistent transformations near to the center of the map. Anyway there is an obvious degrading accuracy and a strong dispersion in the most peripheral areas of the map. Short conclusion: our candidate GCP pairs are of very poor quality, and fail to produce hi-quality results. Honestly we were expecting something like this, once considered that we've lazily used GCP pairs of unchecked quality simply based on matching Populated Places coming from two unrelated sources. It wasn't too much difficult imagining that a so simplistic approach was rather inadequate. Lesson learnt: GCP can surely produce better results than this, but now it's absolutely evident that accurately defining a good set of valid GCP pairs is a demanding and time consuming task. Accuracy and precision of input GCP pairs have a big impact on the final result quality.

second attempt: using selected hi-quality GCP pairs

SELECT org.name, org.geometry, dst.geometry
FROM unknown_ppl AS org, geonames_ppl AS dst
WHERE dst.selected = 1 AND Upper(org.name) = Upper(dst.name);

As you can easily check by yourself the geonames_ppl dataset includes a selected column containing BOOLEAN values; the intended scope is to allow filtering a limited set of points presenting a very strict correspondence with their mates on the other map.

CREATE TABLE tuscany_bis_3rd AS
SELECT u.name, GCP_Transform(u.geometry, g.matrix, 4326) AS geometry
FROM unknown AS u,
     (SELECT GCP_Compute(org.geometry, dst.geometry, 3) AS matrix
      FROM unknown_ppl AS org, geonames_ppl AS dst
      WHERE dst.selected = 1 AND Upper(org.name) = Upper(dst.name)) AS g;

SELECT RecoverGeometryColumn('tuscany_bis_3rd', 'geometry', 4326, 'MULTIPOLYGON', 'XY');

So you can now duly repeat the same SQL queries as above just introducing few changes so to restrict the GCP selection to this hi-quality subset.

The side figure graphically shows the position of such selected points; as you can notice they present a sub-optimal distribution, because they just cover the central portion of the map leaving many peripheral areas completely uncovered.

results assessment

We'll use the same colour schema we've already adopted in the previous test:

the yellow background corresponds to the original tuscany map (i.e. to the reference target).
the results of the 1st order transformation are shown in red.
the results of the 2nd order transformation are shown in blue.
the results of the 3rd order transformation are shown in green.
the results of the TPS transformation are shown in cyan.

As the top side figure shows we've now a noticeably increased overall quality.
Many obvious issues still affects the most peripheral areas of the map, anyway there is a significant enhancement on a wider central area.

The bottom side figure shows a comparison of 2nd order and TPS transformations (in the previous test TPS was unable to resolve the GCP problem due to the poor quality of input data).
It's interesting noting that TPS too produces very similar results to the ones produced by the 2nd order polynomial transformation.

Lesson learnt: using a limited number of accurately selected hi-quality GPCs can easily produce better results than using a huge number of poorly defined GCPs.

Quality assessment

The top side figure directly compares the tuscany and the unknown datasets; the second dataset has been reprojected into SRID=4326 via ST_Transform().
The scene corresponds to a barge pier in the harbour of Portoferraio (Elba Island).

As you can notice the two datasets are noticeably different under many aspects, and several fine details are represented in a completely different way.
This is absolutely not surprising, once considered that:

the tuscany dataset (represented as a yellowish surface with brown edges) is an excerpt of Admin Boundaries produced by ISTAT in the context of the 2011 national census.
the unknown dataset (represented as a blue line) is an excerpt of Admin Boundaries produced by Tuscany Region.
it's rather obvious that the Tuscany Region dataset is more rich of fine grained details, and is more accurate. The ISTAT dataset is more like a summarized sketchy representation and intentionally ignores many small details.
Anyway a fairly good spatial correspondence clearly exists between the two datasets.

The bottom side figure represent the same area, but in this case the blue line corresponds to the result of the 2nd order polynomial transformation.
At this representation scale it's absolutely obvious that there is a noticeable position error.

Our efforts to use a GCP approach so to make the two datasets to spatially coincide still is someway unsatisfactory. We are absolutely required to identify a better strategy producing higher quality results.

final attempt: separately processing a smaller sub-sample and wisely placing hi-quality GCP pairs

In this last exercise you are expected to download a further sample dataset; it simply contains a reduced sub-sample just covering the Elba Island, the Pianosa Island and the Montecristo Island. As shown in the side figure, we'll then proceed by using a more realistic set of control points; this time 20 GCPs have been carefully identified on both maps accordingly to the following criteria: only well marked and easily identifiable small-sized map features have been used so to reduce any error margin. Considering the specific geographic area all GCPs identify a cliff, a cape or a pier. all GCPs have been located in such a way so to regularly cover the whole map extension as far as possible. an exceeding number of GCPs has been determined so to compensate for eventual inaccurate Point placements. The names of the Shapefiles are exactly the same adopted in our fist example, so you can directly create a new DB-file by simply recycling the same SQL statements we have already used before without applying any change.
results assessment We'll use the same colour schema we've already adopted in the previous tests: the yellow background corresponds to the original tuscany map (i.e. to the reference target). the results of the 1st order transformation are shown in red. the results of the 2nd order transformation are shown in blue. the results of the 3rd order transformation are shown in green. the results of the TPS transformation are shown in cyan. As the side figure shows we've now a fairly good quality. Just in the case of the Montecristo Island (the southern one) we can notice at first glance some evident symptoms of a someway imperfect overlapping condition. This is not too much surprising, because Montecristo is not very closely related to other two islands, Adopting an even more restricted sub-sample could probably help in order to overcome these latest issues; anyway we can surely appreciate a noticeable overall improvement.
Quality assessment 1st order polynomial transformation: represented as a red line. Not really perfect, but reasonably good; some evident shift. 2nd order polynomial transformation: represented as a blue line. Almost perfect alignment, very small and practically negligible shift. 3rd order polynomial transformation: represented as a dotted line. Still decent, but obviously affected by a noticeable shift. TPS transformation: represented as a green line. Very similar to the 2nd order transformation; almost perfect alignment. Both 2nd order and TPS seems to be on par, and we finally have an hi-quality transformation affected by very reduced approximation errors. The 1st order give us a decent but clearly inferior result; the 3rd order surely is the less attractive of them all.	Portoferraio harbour - Barge piers
A quick check on different areas of the map confirms the above reported general trends. results of the 1st order transformation are shown as a pink line. the 2st order is shown as a blue line. the TPS transformation is shown as a red line. Both 2nd order and TPS transformations are roughly equivalent and produce hi-quality results. It's worth noting that both methods present a strong mutual consistency. The 1st order transformation is clearly affected by several troubles.	Portoferraio harbour - Ferry piers
	Pianosa Island - Harbour

Boring SQL functions: a formal introduction

SQL Function	Description
GCP_Transform ( BLOB Geometry , BLOB GCP-coeffs ) : BLOB Geometry	Will return a new Geometry by applying a Polynomial or TPS Transformation to the input Geometry. The output Geometry will preserve the original SRID, dimensions and type. NULL will be returned on invalid arguments.
GCP_Transform ( BLOB Geometry , BLOB GCP-coeffs , int srid ) : BLOB Geometry	Same as above, but the output Geometry will assume the explicitly set SRID.
GCP_IsValid ( BLOB GCP-coeffs ) : BOOLEAN	Will check if a BLOB do really correspond to encoded GCP-coeffs then returning TRUE or FALSE. -1 will be returned if the argument isn't of the BLOB type.
GCP_AsText ( BLOB GCP-coeffs ) : TEXT	Will return a text serialized representation of the GCP-coeffs. NULL will be returned on invalid arguments.
GCP_Compute ( BLOB GeometryA , BLOB Geometry B ) : BLOB GCP-coeffs GCP_Compute ( BLOB GeometryA , BLOB GeometryB , int order) : BLOB GCP-coeffs	Will return a BLOB-encoded set of coefficients representing the solution of a GCP problem. NULL will be returned on invalid arguments. the first two arguments are always expected to be Geometries of the POINT type. a strong dimensional consistency is always required: all input Points are required to be exclusively of the 2D type or exclusively of the 3D type. Mixing together 2D and 3D Points will be considered an illegal operation. the first argument corresponds to the origin Geometry, i.e. to the unknown component. the second argument corresponds to the destination Geometry, i.e. to the fully qualified component. the third (optional) argument can assume one of the following values: 1 (default) the GCP problem will be solved by applying the RMSE regression method and will return a set of polynomial coefficients of the 1st order. Note: all polynomial solutions can be either of the 2D or of the 3D type; the selection of the dimensional mode is implicitly determined by the nature of the input Points. 2 the GCP problem will be solved by applying the RMSE regression method and will return a set of polynomial coefficients of the 2nd order. 3 the GCP problem will be solved by applying the RMSE regression method and will return a set of polynomial coefficients of the 3rd order. 0 the GCP problem will be solved by applying the Thin Plate Spline (aka TPS) method. Note: all TPS solutions are always of the 2D type, even when 3D Points are supplied. Aggregate function
GCP2ATM ( BLOB GCP-coeffs ) : BLOB ATM-matrix	Will convert a BLOB GCP-coeffs object into a BLOB-encoded Affine Transformation Matrix. NULL will be returned on invalid arguments. This function only accepts an argument corresponding to a set of polynomial coeffs of the 1st order. Note: only a polynomial transformation of the first order is mathematically equivalent to an Affine Transformation Matrix.

Credits

The GCP module internally implemented by libspatialite fully depends on C code initially developed for GRASS GIS, and more precisely on the low-level sources implementing the v.rectify operator.
SpatiaLite itself simply implements a thin layer of SQL compatibility, but beyond the scenes all the hard mathematical and statical work is completely delegated to the routines kindly borrowed by GRASS GIS.
The advantages of such a solution are rather obvious:

avoiding to reinvent the wheel yet another time.
taking full profit from reusing already existing code coming from a well renown source of indisputable excellence.

Many thanks to Markus Neteler and to all the GRASS developers team for producing and maintaining such a valuable code base.

License considerations

The C code borrowed from GRASS GIS was initially released under the GPLv2+ license terms.
Due to the intrinsic viral nature of GPL by enabling the GCP module you'll automatically cause libspatialite as a whole to fall under the GPLv2+ license legal terms.
There is no contradiction in all this: as the SpatiaLite's own license verbatim states:

SpatiaLite is licensed under the MPL tri-license terms; you are free to choose the best-fit license between:
    the MPL 1.1
    the GPL v2.0 or any subsequent version
    the LGPL v2.1 or any subsequent version

by enabling the GCP module you'll then automatically opting for the "GPL v2.0 or any subsequent version" legal clause.

If you eventually would wish better to fully preserve the unconstrained initial tri-license you simply have to completely disable the GCP module when configuring libspatialite itself.
The default setting always corresponds to ./configure --enable-gcp=no: so you necessarily have to express an explicit consent in order to trigger the mandatory GPLv2+ license terms.

Back to main 4.3.0 Wiki page