Tutorial: how to fully identify Invalid Geometries
backDownloading the sample dataset
In this tutorial we'll use am Administrative Shapefile made available from ISTAT, the Italian Census Agency; more precisely we'll use the Local Councils (2011 census) you can download from here: Com2011_WGS84.zipLoading the sample dataset
Please use the following settings when importing the Shapefile:- SRID = 32632
- charset = CP1252
Checking for Invalid Geometries (simple)
SELECT pro_com, nome FROM Com2011_WGS84 WHERE ST_IsValid(geometry) <> 1; ------------------------------ pro_com comune ------------------------------ 72001 Acquaviva delle Fonti 72037 Rutigliano 74001 Brindisi 75031 Gallipoli 75072 Santa Cesarea Terme 72040 Sannicandro di Bari 81019 Santa Ninfa 87009 Bronte 107002 Calasetta 104012 La MaddalenaThis first query will identify several invalid geometries; anyway we still continue to ignore why they are invalid.
Checking for Invalid Geometries (using GEOS messages)
SELECT pro_com, nome, GEOS_GetLastWarningMsg(), ST_AsText(GEOS_GetCriticalPointFromMsg()) FROM Com2011_WGS84 WHERE ST_IsValid(geometry) <> 1; --------------------------------------------------------------------------------------------------------------- pro_com comune reason geometry --------------------------------------------------------------------------------------------------------------- 72001 Acquaviva delle Fonti Ring Self-intersection at or near point ... POINT(1158442.637743 4556057.858299) 72037 Rutigliano Ring Self-intersection at or near point ... POINT(1177064.506819 4571209.19257) 74001 Brindisi Ring Self-intersection at or near point ... POINT(1258539.785894 4539825.213707) 75031 Gallipoli Ring Self-intersection at or near point ... POINT(1265551.03412 4472508.244351) 75072 Santa Cesarea Terme Ring Self-intersection at or near point ... POINT(1308926.03571 4476059.989228) 72040 Sannicandro di Bari Ring Self-intersection at or near point ... POINT(1154968.428081 4561653.816523) 81019 Santa Ninfa Ring Self-intersection at or near point ... POINT(841730.1146 4190972.9645) 87009 Bronte Ring Self-intersection at or near point ... POINT(1028304.0538 4195170.1147) 107002 Calasetta Ring Self-intersection at or near point ... POINT(444979.6776 4320623.5316) 104012 La Maddalena Ring Self-intersection at or near point ... POINT(528108.5157 4571486.8074)In this second step we still continue to get the same invalid geometries exactly as before. But in this case we are finally able to fully disclose the invalidity cause: there is some self-intersection.
That's not only; we'll now be able to exactly identify where the self-intersection is. We could eventually open some Desktop GIS and may be attempt to manually apply some corrective action.
Checking for Invalid Geometries (using PostGIS-like functions)
SELECT pro_com, nome, ST_IsValidReason(geometry), ST_AsText(ST_IsValidDetail(geometry)) FROM Com2011_WGS84 WHERE ST_IsValid(geometry) <> 1; ---------------------------------------------------------------------------------------------------------------------------- pro_com comune reason geometry ---------------------------------------------------------------------------------------------------------------------------- 72001 Acquaviva delle Fonti Ring Self-intersection[1158442.63774317 4556057.85829892] POINT(1158442.637743 4556057.858299) 72037 Rutigliano Ring Self-intersection[1177064.50681879 4571209.19256991] POINT(1177064.506819 4571209.19257) 74001 Brindisi Ring Self-intersection[1258539.785894 4539825.21370734] POINT(1258539.785894 4539825.213707) 75031 Gallipoli Ring Self-intersection[1265551.03412014 4472508.24435147] POINT(1265551.03412 4472508.244351) 75072 Santa Cesarea Terme Ring Self-intersection[1308926.03570968 4476059.98922819] POINT(1308926.03571 4476059.989228) 72040 Sannicandro di Bari Ring Self-intersection[1154968.42808096 4561653.81652273] POINT(1154968.428081 4561653.816523) 81019 Santa Ninfa Ring Self-intersection[841730.1146 4190972.9645] POINT(841730.1146 4190972.9645) 87009 Bronte Ring Self-intersection[1028304.0538 4195170.1147] POINT(1028304.0538 4195170.1147) 107002 Calasetta Ring Self-intersection[444979.6776 4320623.5316] POINT(444979.6776 4320623.5316) 104012 La Maddalena Ring Self-intersection[528108.5157 4571486.8074] POINT(528108.5157 4571486.8074)We'll get more or less the same results even when using this second approach. There are slight differences in the text describing the invalidity reason, but there is no substantial difference between the two alternatives.
Subtle differences
SELECT pro_com, nome, ST_IsValid(geometry), GEOS_GetLastWarningMsg(), GEOS_GetLastErrorMsg(), GEOS_GetLastAuxErrorMsg(), ST_AsText(GEOS_GetCriticalPointFromMsg()) FROM Com2011_WGS84; -------------------------------------------------------- pro_com comune valid warning error aux Point -------------------------------------------------------- 1004 Albiano d'Ivrea 1 NULL NULL NULL NULL 1005 Alice Superiore 1 NULL NULL NULL NULL ........ SELECT pro_com, nome, ST_IsValid(geometry), ST_IsValidReason(geometry), ST_AsText(ST_IsValidDetail(geometry)) FROM Com2011_WGS84; ----------------------------------------------------- pro_com comune valid reason Point ----------------------------------------------------- 1004 Albiano d'Ivrea 1 Valid Geometry NULL 1005 Alice Superiore 1 Valid Geometry NULL ........Anyway there are few remarkable differences between the two alternatives:
- How Valid Geometries are reported:
- in this case the message-based approach will report absolutely nothing.
- but instead ST_IsValidReason() will explicitly state for validity.
- Speed / timings:
- the first query based on messages performs three times faster than the second one based on PostGIS-like functions.
And this isn't surprising, because PostGIS-like functions will repeat three times the same basic validation work: each single function will re-examine independently the input Geometry.
On the other hand messages are cost-free, because just a single pass performed by ST_IsValid() will be required: any further information will be then extracted by the received messages.
- the first query based on messages performs three times faster than the second one based on PostGIS-like functions.
back