Wiki page
[About ST_Subdivide()] by
sandro
2019-02-15 19:44:20.
D 2019-02-15T19:44:20.785
L About\sST_Subdivide()
P d7e14cd8a2a43e43aa2da8bc53be3464964753f7
U sandro
W 10871
<table cellspacing="12" width="100%">
<tr><td colspan="2">
<table width="100%" bgcolor="#f0f0f8">
<tr><td align="center">
<h1>ST_Subdivide(): a quick intro</h1>
</td></tr></table>
<table width="100%"><tr>
<td width="33%" align="left"></td>
<td align="center"><a href="https://www.gaia-gis.it/fossil/libspatialite/wiki?name=4.3.0+doc">back to index</a></td>
<td width="33%" align="right"></td>
</tr></table><br>
<h2>About ST_Subdivide()</h2>
Since version <b>5.0.0</b> SpatiaLite supports <b>ST_Subdivide()</b>, an advanced SQL function already available on <a href="https://postgis.net/docs/ST_Subdivide.html">PostGIS</a>.<br>
The implementation is strictly similar in both Spatial DBMSes because on PostgIS the function is built on the top of the internal <b>lwgeom</b> library, and on SpatiaLite is built on the top of <b>librttopo</b> that simply is a more universal porting of lwgeom outside PostGIS.<br><br>
A short rationale: processing huge geometries having an impressive number of Vertices (many thousands or even more) is an intrinsically slow process.<br>
Subdividing them into many smaller parts (still preserving full topological consistency) usually helps to restore a satisfying processing speed.
This is exactly the intended scope of <b>ST_Subdivide()</b>:
<ul>
<li>this function will receive an input geometry (may well be a very huge one).</li>
<li>each <b>Linestring</b> or <b>Polygon</b> found within the input geometry will be then processed:
<ul>
<li>all Linestrings or Polygons using a number of Vertices lesser or equal than the given threshold will be returned as they are.</li>
<li>but all Linestrings or Polygons using an exceeding number of Vertices will be recursively split, untill they'll be represented by a <b>collection</b> of elementary <b>parts</b> using no more than the required number of vertices.</li>
</ul></li>
<li>at the end of the process the function will return a collection containing all elementary parts (a <b>MultiLinestring</b> or a <b>MultiPolygon</b> depending on the nature of the input geometry).</li>
</ul>
<h2>Basic Examples</h2>
The following examples are based on the <b>am_reg_multipart</b> Shapefile freely available for download from <a href="http://www502.regione.toscana.it/geoscopio/cartoteca.html">here</a> (search for <b>Ambiti Amministrativi</b> - <i>Administrative Boundaries</i>).<br><br>
This one supports a very accurate and precise map representation, and the main exterior ring has more than 100,000+ Vertices.<br>
It's a very good example of a Geometry so huge to make extremely slow computing any Spatial operator (as e.g. ST_Intersects, ST_Touches, ST_Covers and so on).<br><br>
<table bgcolor="#e0ffe0" cellspacing="8" cellpadding="6" border="1">
<tr><th>SQL query</th><th>Visual Sample</th></tr>
<tr>
<td>
<verbatim>
SELECT ST_Subdivide(geometry) FROM tuscany;
</verbatim>
in this first call no optional argument is specified, and thus <b>ST_Subdivide()</b> will implicitly assume the standard threshold of <b>max 128</b> Vertices.<br><br>
As the side figure shows, the returned result is a collection of about 4,300+ elementary parts.
</td>
<td>
<img src="https://www.gaia-gis.it/gaia-sins/subdivide/sub128.png" alt="max=128">
</td>
</tr>
<tr>
<td>
<verbatim>
SELECT ST_Subdivide(geometry, 512) FROM tuscany;
</verbatim>
in this second call the optional argument is explicitly set, and <b>ST_Subdivide()</b> is now assuming a threshold of <b>max 512</b> Vertices.<br><br>
As the side figure shows, the returned result is a collection of about 1,800+ elementary parts.
</td>
<td>
<img src="https://www.gaia-gis.it/gaia-sins/subdivide/sub512.png" alt="max=512">
</td>
</tr>
<tr>
<td>
<verbatim>
SELECT ST_Subdivide(geometry, 2048) FROM tuscany;
</verbatim>
in this third and final call the threshold is set to <b>max 2,048</b> Vertices.<br><br>
As shown by the figure, the returned result is now a collection containing just 1,200+ elementary parts.
<h3>Short conclusion</h3>
<b>ST_Subdivide()</b> is effectively capable to subdivide some nasty huge Geometry into a collection of many smaller parts much more reasonable to be handled.<br>
And it's reasonably flexible and easy to be customized as required; you just have to set the most appropriate <b>max. Vertices</b> threshold.
</td>
<td>
<img src="https://www.gaia-gis.it/gaia-sins/subdivide/sub2048.png" alt="max=2048">
</td>
</tr>
</table>
<br>
However, a <b>big caveat</b> should be always considered about the multi-part collections returned by <b>ST_Subdivide()</b>.<br>
At least in the case of <b>MultiPolygons</b> any collection returned by ST_Subdivide() is inherently invalid, as the following SQL query demonstrates:
<verbatim>
SELECT ST_IsValid(ST_Subdivide(geometry)) FROM tuscany;
---------
0
</verbatim>
It's not at all difficult undestanding why this happens.<br>
Accordingly to standard <b>OGC/SFS</b> rules two single-part Polygons belonging to the same MultiPolygon are alwayes forbidden to reciprocally touch.<br>
More precisely: they can only touch on specific point(s), but they can never share a common boundary.<br><br>
<table bgcolor="#ffd0d0" cellspacing="8" cellpadding="6">
<tr><td align="center"><h3>Caveat !!!</h3></td></tr>
<tr><td><b>Never ever</b> directly pass a Geometry of this kind to any Spatial operator such as ST_Intersects, ST_Touches, ST_Covers and alike.<br><br>
This will certainly make crazy the <b>GEOS</b> library.<br>
Uncorrect results may easily follow.<br>
And in the worst case some unexpected crash could eventually occur.
<h3>You have been warned !!!</h3>
</td>
</tr>
</table>
<br><br>
<hr>
<h2>Advanced Examples</h2>
Now we'll go to explore the real effectiveness of <b>ST_Subdivide()</b> in one of the most common Spatial Problems: identifying all intersections between a dataset of the POINT type and a huge Polygon/Multipolygon.<br><br>
During this second test we'll use the following datasets:
<ul>
<li>once again, the same administrative boundary of Tuscany we've already used before.</li>
<li>in the role of the POINT dataset we'll use the <b>civici</b> Shapefile (<i>House Numers</i>) freely available for download from <a href="http://www502.regione.toscana.it/geoscopio/cartoteca.html">here</a> (search for <b>Grafo Stradale</b> - <i>Road Network</i>).<br>
This too is a huge dataset containing about 1.5 million points.
</ul>
<h3>A first naive attemp</h3>
<verbatim>
SELECT Count(*)
FROM civici AS c
LEFT JOIN tuscany AS t ON (ST_Intersects(c.geometry, t.geometry) = 1);
SELECT Count(*)
FROM civici AS c
LEFT JOIN tuscany AS t ON (ST_Intersects(c.geometry, t.geometry) = 1
AND c.rowid IN (
SELECT rowid FROM SpatialIndex
WHERE f_table_name = 'civici' AND search_frame = t.geometry));
</verbatim>
The only difference between the above two SQL queries is in that the second one explicitly calls the Spatial Index.<br><br>
<b>Actual findings</b>:
<ul>
<li>I was forced to prematurely abort the first test when I discovered that during the first half an hour it have made a very little progress.<br>
Accordingly to my extimated estrapolation, waiting for completion will have preasumably required between 4 and 6 hours.<br>
Definitely this isn't a practical solution for our problem.</li>
<li>the second test was, if possible, even slower than the first one.<br>
And there is very good reason explaining for this; all Points to be checked fall inside th BBOX of Tuscany.<br>
So the Spatial Index isn't of any possible help.<br>
And even worst, querying the Spatial Index imposes a further overhead, thus requiring a longer time for absolutely nothing.</li>
<li><b>Short conclusion</b>: computing many million times the intersections between a Point and a Polygon defined by some hundredth thousands Vertices is a damn hard computational problem.<br>
And consequently it requires a very long time, so long to be absolutely unpractal.</li>
</ul>
<h3>A second attemp based on ST_Subdivide() and VirtualElementary</h3>
<verbatim>
CREATE TABLE xx (id INTEGER PRIMARY KEY);
SELECT AddGeometryColumn('xx', 'geom', 3003, 'MULTIPOLYGON', 'XY');
INSERT INTO xx VALUES (1, NULL);
</verbatim>
We'll start first by creating a convenience Table intended to temporarily store the result of ST_Subdivide(). This Table is very simple:
<ul>
<li>it has just two columns, a Primary Key and a MultiPolygon Geometry.</li>
<li>and it's intended to store just a single row identified by <b>id=1</b></li>
</ul>
<verbatim>
UPDATE xx SET geom = (SELECT ST_Subdivide(geometry, 2048) FROM tuscany) WHERE id = 1;
SELECT Count(*)
FROM ElementaryGeometries AS t
JOIN civici AS c ON (ST_Intersects(c.geometry, t.geometry) = 1
AND c.rowid IN (
SELECT rowid FROM SpatialIndex
WHERE f_table_name = 'civici' AND search_frame = t.geometry))
WHERE f_table_name = 'xx' AND origin_rowid = 1;
----------------
1480991 (in 38.094 seconds)
UPDATE xx SET geom = (SELECT ST_Subdivide(geometry, 512) FROM tuscany) WHERE id = 1;
SELECT Count(*)
FROM ElementaryGeometries AS t
JOIN civici AS c ON (ST_Intersects(c.geometry, t.geometry) = 1
AND c.rowid IN (
SELECT rowid FROM SpatialIndex
WHERE f_table_name = 'civici' AND search_frame = t.geometry))
WHERE f_table_name = 'xx' AND origin_rowid = 1;
----------------
1480991 (in 12.617 seconds)
UPDATE xx SET geom = (SELECT ST_Subdivide(geometry) FROM tuscany) WHERE id = 1;
SELECT Count(*)
FROM ElementaryGeometries AS t
JOIN civici AS c ON (ST_Intersects(c.geometry, t.geometry) = 1
AND c.rowid IN (
SELECT rowid FROM SpatialIndex
WHERE f_table_name = 'civici' AND search_frame = t.geometry))
WHERE f_table_name = 'xx' AND origin_rowid = 1;
----------------
1480991 (in 9.534 seconds)
</verbatim>
<b>Quick evalutation</b>:
<ul>
<li><b>ST_Subdivide</b> definitely is the tool of the trade we were searching for.<br>
It effectively allows to resolve a computationally hard problem in a surprisingly quick time.<br>
Passing from many long hours to just a handfull of seconds is a simply astonishing improvement.<br></li>
<li>subdividing the initial huge Polygon into many simpler/smaller parts has several beneficial effects:
<ul>
<li>the time required to compute a point-to-polygon intersection quicly increases with the number of Vertices.<br>
Smaller polygons are obviously much more faster to be evaluated than bigger ones.</li>
<li>smaller polygons doesn't shadow the Spatial Index, that can be efficiently queryied.</li>
</ul></li>
</ul>
<br><br>
<table width="100%"><tr>
<td width="33%" align="left"></td>
<td align="center"><a href="https://www.gaia-gis.it/fossil/libspatialite/wiki?name=4.3.0+doc">back to index</a></td>
<td width="33%" align="right"></td>
</tr></table>
Z a16df845bfea2a085ee5e682872d121e