Artifact [22f4f55084]
Not logged in

Artifact 22f4f550841c61347fc9eddb86d9cdbc1fe4201c:

Wiki page [About ST_Subdivide()] by sandro 2019-02-16 10:48:06.
D 2019-02-16T10:48:06.719
L About\sST_Subdivide()
P fc842978ba0d7b9506859f68d4e4e2db8628dc3a
U sandro
W 12608
<table cellspacing="12" width="100%">
<tr><td colspan="2">
<table width="100%" bgcolor="#f0f0f8">
<tr><td align="center">
<h1>ST_Subdivide(): a quick intro</h1>
</td></tr></table>
<table width="100%"><tr>
<td width="33%" align="left"></td>
<td align="center"><a href="https://www.gaia-gis.it/fossil/libspatialite/wiki?name=4.3.0+doc">back to index</a></td>
<td width="33%" align="right"></td>
</tr></table><br>
<h2>About ST_Subdivide()</h2>
Since version <b>5.0.0</b> SpatiaLite supports <b>ST_Subdivide()</b>, an advanced SQL function already available on <a href="https://postgis.net/docs/ST_Subdivide.html">PostGIS</a>.<br>
The implementation is strictly similar in both Spatial DBMSes because on PostGIS the function is built on the top of the internal <b>lwgeom</b> library, and on SpatiaLite is built on the top of <b>librttopo</b> that simply is a more universal porting of lwgeom outside PostGIS.<br><br>
A short rationale: processing huge geometries having an impressive number of Vertices (many thousands or even more) is an intrinsically slow process.<br>
Subdividing them into many smaller parts (still preserving full topological consistency) usually helps to restore a satisfying processing speed.
This is exactly the intended scope of <b>ST_Subdivide()</b>:
<ul>
<li>this function will receive an input geometry (may well be a very huge one).</li>
<li>each <b>Linestring</b> or <b>Polygon</b> found within the input geometry will be then processed:
<ul>
<li>all Linestrings or Polygons using a number of Vertices lesser or equal than the given threshold will be returned as they are.</li>
<li>but all Linestrings or Polygons using an exceeding number of Vertices will be recursively split, until they'll be represented by a <b>collection</b> of elementary <b>parts</b> using no more than the required number of vertices.</li>
</ul></li>
<li>at the end of the process the function will return a collection containing all elementary parts (a <b>MultiLinestring</b> or a <b>MultiPolygon</b> depending on the nature of the input geometry).</li>
</ul>
<h2>Basic Examples</h2>
The following examples are based on the <b>am_reg_multipart</b> Shapefile freely available for download from <a href="http://www502.regione.toscana.it/geoscopio/cartoteca.html">here</a> (search for <b>Ambiti Amministrativi</b> - <i>Administrative Boundaries</i>).<br><br>
This one supports a very accurate and precise map representation, and the main exterior ring has more than 100,000+ Vertices.<br>
It's a very good example of a Geometry so huge to make extremely slow computing any Spatial operator (as e.g. ST_Intersects, ST_Touches, ST_Covers and so on).<br><br>
<table bgcolor="#e0ffe0" cellspacing="8" cellpadding="6" border="1">
<tr><th>SQL query</th><th>Visual Sample</th></tr>
<tr>
<td>
<verbatim>
SELECT ST_Subdivide(geometry) FROM tuscany;
</verbatim>
in this first call no optional argument is specified, and thus <b>ST_Subdivide()</b> will implicitly assume the standard threshold of <b>max 128</b> Vertices.<br><br>
As the side figure shows, the returned result is a collection of about 4,300+ elementary parts.
</td>
<td>
<img src="https://www.gaia-gis.it/gaia-sins/subdivide/sub128.png" alt="max=128">
</td>
</tr>
<tr>
<td>
<verbatim>
SELECT ST_Subdivide(geometry, 512) FROM tuscany;
</verbatim>
in this second call the optional argument is explicitly set, and <b>ST_Subdivide()</b> is now assuming a threshold of <b>max 512</b> Vertices.<br><br>
As the side figure shows, the returned result is a collection of about 1,800+ elementary parts.
</td>
<td>
<img src="https://www.gaia-gis.it/gaia-sins/subdivide/sub512.png" alt="max=512">
</td>
</tr>
<tr>
<td>
<verbatim>
SELECT ST_Subdivide(geometry, 2048) FROM tuscany;
</verbatim>
in this third and final call the threshold is set to <b>max 2,048</b> Vertices.<br><br>
As shown by the figure, the returned result is now a collection containing just 1,200+ elementary parts.
<h3>Short conclusion</h3>
<b>ST_Subdivide()</b> is effectively capable to subdivide some nasty huge Geometry into a collection of many smaller parts much more reasonable to be handled.<br>
And it's reasonably flexible and easy to be customized as required; you just have to set the most appropriate <b>max. Vertices</b> threshold.
</td>
<td>
<img src="https://www.gaia-gis.it/gaia-sins/subdivide/sub2048.png" alt="max=2048">
</td>
</tr>
</table>
<br>
However, a <b>big caveat</b> should be always considered about the multi-part collections returned by <b>ST_Subdivide()</b>.<br>
At least in the case of <b>MultiPolygons</b> any collection returned by ST_Subdivide() is inherently invalid, as the following SQL query demonstrates:
<verbatim>
SELECT ST_IsValid(ST_Subdivide(geometry)) FROM tuscany;
---------
0
</verbatim>
It's not at all difficult understanding why this happens.<br>
Accordingly to standard <b>OGC/SFS</b> rules two single-part Polygons belonging to the same MultiPolygon are always forbidden to reciprocally touch.<br>
More precisely: they can only touch on specific point(s), but they can never share a common boundary.<br><br>
<table bgcolor="#ffd0d0" cellspacing="8" cellpadding="6">
<tr><td align="center"><h3>Caveat !!!</h3></td></tr>
<tr><td><b>Never ever</b> directly pass a Geometry of this kind to any Spatial operator such as ST_Intersects, ST_Touches, ST_Covers and alike.<br><br>
This will certainly make crazy the <b>GEOS</b> library.<br>
Incorrect results may easily follow.<br>
And in the worst case some unexpected crash could eventually occur.
<h3>You have been warned !!!</h3>
</td>
</tr>
</table>
<br><br>
<hr>
<h2>Advanced Examples</h2>
Now we'll go to explore the real effectiveness of <b>ST_Subdivide()</b> in one of the most common Spatial Problems: identifying all intersections between a dataset of the POINT type and a huge Polygon/MultiPolygon.<br><br>
During this second test we'll use the following datasets:
<ul>
<li>once again, the same administrative boundary of Tuscany we've already used before.</li>
<li>in the role of the POINT dataset we'll use the <b>civici</b> Shapefile (<i>House Numbers</i>) freely available for download from <a href="http://www502.regione.toscana.it/geoscopio/cartoteca.html">here</a> (search for <b>Grafo Stradale</b> - <i>Road Network</i>).<br>
This too is a huge dataset containing about 1.5 million points.
</ul>
<h3>A first naive attempt</h3>
<verbatim>
SELECT Count(*)
FROM civici AS c
LEFT JOIN tuscany AS t ON (ST_Intersects(c.geometry, t.geometry) = 1);


SELECT Count(*)
FROM civici AS c
LEFT JOIN tuscany AS t ON (ST_Intersects(c.geometry, t.geometry) = 1 
     AND c.rowid IN (
         SELECT rowid FROM SpatialIndex
         WHERE f_table_name = 'civici' AND search_frame = t.geometry));
</verbatim>
The only difference between the above two SQL queries is in that the second one explicitly calls the Spatial Index.<br><br>
<b>Actual findings</b>:
<ul>
<li>I was forced to prematurely abort the first test when I discovered that during the first half an hour it had made only a very little progress.<br>
Accordingly to my estimated extrapolation, waiting for completion will had presumably required between 4 and 6 hours.<br>
Definitely this isn't a practical solution for our problem.</li>
<li>the second test was, if possible, even slower than the first one.<br>
And there is very good reason explaining for this; all Points to be checked fall inside the BBOX of Tuscany.<br>
So the Spatial Index isn't of any possible help.<br>
And even worst, querying the Spatial Index imposes a further overhead, thus requiring a longer time for absolutely nothing.</li>
<li><b>Short conclusion</b>: computing many million times the intersections between a Point and a Polygon defined by some hundredth thousands Vertices is a damn hard computational problem.<br> 
And consequently it requires a very long time, so long to be absolutely unpractical.</li>
</ul>
<h3>A second attempt based on ST_Subdivide() and VirtualElementary</h3>
<verbatim>
CREATE TABLE xx (id INTEGER PRIMARY KEY);
SELECT AddGeometryColumn('xx', 'geom', 3003, 'MULTIPOLYGON', 'XY');
INSERT INTO xx VALUES (1, NULL);
</verbatim>
We'll start first by creating a convenience Table intended to temporarily store the result of ST_Subdivide(). This Table is very simple:
<ul>
<li>it has just two columns, a Primary Key and a MultiPolygon Geometry.</li>
<li>and it's intended to store just a single row identified by <b>id=1</b></li>
</ul>
<verbatim>
UPDATE xx SET geom = (SELECT ST_Subdivide(geometry, 2048) FROM tuscany) WHERE id = 1;
SELECT Count(*)
FROM ElementaryGeometries AS t
JOIN civici AS c ON (ST_Intersects(c.geometry, t.geometry) = 1
     AND c.rowid IN (
         SELECT rowid FROM SpatialIndex
         WHERE f_table_name = 'civici' AND search_frame = t.geometry))
WHERE f_table_name = 'xx' AND origin_rowid = 1;
----------------
1480991 (in 38.094 seconds)


UPDATE xx SET geom = (SELECT ST_Subdivide(geometry, 512) FROM tuscany) WHERE id = 1;
SELECT Count(*)
FROM ElementaryGeometries AS t
JOIN civici AS c ON (ST_Intersects(c.geometry, t.geometry) = 1
     AND c.rowid IN (
         SELECT rowid FROM SpatialIndex
         WHERE f_table_name = 'civici' AND search_frame = t.geometry))
WHERE f_table_name = 'xx' AND origin_rowid = 1;
----------------
1480991 (in 12.617 seconds)


UPDATE xx SET geom = (SELECT ST_Subdivide(geometry) FROM tuscany) WHERE id = 1;
SELECT Count(*)
FROM ElementaryGeometries AS t
JOIN civici AS c ON (ST_Intersects(c.geometry, t.geometry) = 1
     AND c.rowid IN (
         SELECT rowid FROM SpatialIndex
         WHERE f_table_name = 'civici' AND search_frame = t.geometry))
WHERE f_table_name = 'xx' AND origin_rowid = 1;
----------------
1480991 (in 9.534 seconds)
</verbatim>
<b>Quick evaluation</b>:
<ul>
<li><b>ST_Subdivide</b> definitely is the tool of the trade we were searching for.<br>
It effectively allows to resolve a computationally hard problem in a surprisingly quick time.<br>
Passing from many long hours to just a handful of seconds is a simply astonishing improvement.<br></li>
<li>subdividing the initial huge Polygon into many simpler/smaller parts has several beneficial effects:
<ul>
<li>the time required to compute a point-to-polygon intersection quickly increases with the number of Vertices.<br>
Smaller polygons are obviously much more faster to be evaluated than bigger ones.</li>
<li>smaller polygons doesn't shadow the Spatial Index, that can be efficiently queried.</li>
</ul></li>
<li>using <b>VirtualElementary</b> in order to process all sub-parts one by one surely imposes some further complexity to our SQL queries.<br>
But at the end of the day it confirms to be a very efficient mechanism.</li>
<li><b>Note</b>: SpatiaLite is not PostGIS, and SQLite3 is not PostgreSQL; their respective internal architectures significantly differ under many aspects.<br>
More specifically, SQLite is completely unable to support arrays or multi-row results, so using VirtualElementary is an absolutely necessary workaround.</li>
</ul>
<h3>A final experimental attempt based on a pure SQL approach</h3>
There is an alternative mechanism based on pure SQL and not requiring to use nor ElementaryGeometries neither a convenience Table:
<verbatim>
WITH RECURSIVE magic(n, geom) AS (
    VALUES(1, (SELECT ST_Subdivide(geometry, 2048) FROM tuscany))
    UNION ALL
    SELECT n + 1, geom
    FROM geom_ind
    WHERE ST_GeometryN(geom, n) IS NOT NULL
)
SELECT n, ST_GeometryN(geom, n) FROM magic;
</verbatim>
<ul>
<li>we simply have to write a <b>Recursive SQL Query</b> in order to get a multi-row resultset, with a distinct row for each elementary part.</li>
<li>it effectively works, but seems to be slower than ElementaryGeometries.</li>
<li>and even more relevant, I was unable to write a complete Query determining all intersections between Points and Polygons in a reasonable time.<br>
I can simply suppose that the Query Optimizer of SQLite will end up in ignoring the Spatial Index, thus completely vanishing the usefulness of this approach.<br>
May well be that someone else could succeed where I failed; any further third-party investigation on this specific area is surely welcome.</li>
</ul>
<br><br>
<hr>
<br><br>
<table width="100%"><tr>
<td width="33%" align="left"></td>
<td align="center"><a href="https://www.gaia-gis.it/fossil/libspatialite/wiki?name=4.3.0+doc">back to index</a></td>
<td width="33%" align="right"></td>
</tr></table>
Z fc382a8b9504a084eef3c8e1b8005df3