SpatiaLite: Artifact [a027126d7e]

Artifact a027126d7e8b7b78fa1b0d9dd1e7071278910314:

Wiki page [About ST_Subdivide()] by sandro 2019-02-24 17:26:15.
D 2019-02-24T17:26:15.353
L About\sST_Subdivide()
P 22f4f550841c61347fc9eddb86d9cdbc1fe4201c
U sandro
W 12546
<table cellspacing="12" width="100%">
<tr><td colspan="2">
<table width="100%" bgcolor="#f0f0f8">
<tr><td align="center">
<h1>ST_Subdivide(): a quick intro</h1>
</td></tr></table>
<table width="100%"><tr>
<td width="33%" align="left"></td>
<td align="center"><a href="https://www.gaia-gis.it/fossil/libspatialite/wiki?name=4.3.0+doc">back to index</a></td>
<td width="33%" align="right"></td>
</tr></table><br>
<h2>About ST_Subdivide()</h2>
Starting with version <b>5.0.0</b> SpatiaLite supports <b>ST_Subdivide()</b>, an advanced SQL function already available on <a href="https://postgis.net/docs/ST_Subdivide.html">PostGIS</a>.<br>
The implementation is very similar in both Spatial DBMSes because in PostGIS the function is created with the internal <b>lwgeom</b> library, and in SpatiaLite with <b>librttopo</b>, which is a universal porting of lwgeom without PostGIS.<br><br>
A short rationale: processing huge geometries having an impressive number of Vertices (many thousands or even more) is an extremely slow process.<br>
Subdividing them into many smaller parts (while preserving full topological consistency) usually results in a satisfying processing speed.
This is exactly the intended scope of <b>ST_Subdivide()</b>:
<ul>
<li>this function will receive an input geometry (which may very well be extremely large).</li>
<li>each <b>Linestring</b> or <b>Polygon</b> found within the input geometry will be processed:
<ul>
<li>all Linestrings or Polygons using a number of Vertices lesser or equal than the given threshold will be returned as they are.</li>
<li>everything else will be recursively split, until a <b>collection</b> of elementary parts exists, using no more than the required number of vertices.</li>
</ul></li>
<li>when compleated, the function will return one geometry as a collection containing all of the elementary parts (a <b>MultiLinestring</b> or a <b>MultiPolygon</b> depending on the nature of the input geometry).</li>
</ul>
<h2>Basic Samples</h2>
The following sampless are based on the, freely available, <b>am_reg_multipart</b> Shapefile and can be download from here  <a href="http://www502.regione.toscana.it/geoscopio/cartoteca.html">here</a> (search for <b>Ambiti Amministrativi</b> - <i>Administrative Boundaries</i>).<br><br>
This Shapefile supports a very accurate and precise map representation, and the main exterior ring has more than 100,000+ Vertices.<br>
It's a very good example of a Geometry, that is so huge and will cause any Spatial operator, such as ST_Intersects, ST_Touches, ST_Covers etc., to become extremely slow.<br><br>
<table bgcolor="#e0ffe0" cellspacing="8" cellpadding="6" border="1">
<tr><th>SQL query</th><th>Visual Sample</th></tr>
<tr>
<td>
<verbatim>
SELECT ST_Subdivide(geometry) FROM tuscany;
</verbatim>
in this first call, with no optional argument specified, <b>ST_Subdivide()</b> will start splitting <b>after</b> use the standard threshold of <b>128 Vertices</b> is exceeded.<br><br>
As the side figure shows, the returned result is a collection of about 4,300+ elementary parts.
</td>
<td>
<img src="https://www.gaia-gis.it/gaia-sins/subdivide/sub128.png" alt="max=128">
</td>
</tr>
<tr>
<td>
<verbatim>
SELECT ST_Subdivide(geometry, 512) FROM tuscany;
</verbatim>
in this second call the optional argument is explicitly set, and <b>ST_Subdivide()</b> is now assuming a threshold of <b>512 Vertices</b>.<br><br>
As the side figure shows, the returned result is a collection of about 1,800+ elementary parts.
</td>
<td>
<img src="https://www.gaia-gis.it/gaia-sins/subdivide/sub512.png" alt="max=512">
</td>
</tr>
<tr>
<td>
<verbatim>
SELECT ST_Subdivide(geometry, 2048) FROM tuscany;
</verbatim>
in this third and final call the threshold is set to start after <b>2,048 Vertices</b>.<br><br>
As shown by the figure, the returned result is now a collection containing only 1,200+ elementary parts.
<h3>Short conclusion</h3>
<b>ST_Subdivide()</b> is very capable of subdividing a huge, nasty, Geometry into a collection of smaller parts, that can be handled in a more reasonable manner.<br>
And it's ay flexible and easy to customize; you just have to set an appropriate <b>Vertices</b> threshold.
</td>
<td>
<img src="https://www.gaia-gis.it/gaia-sins/subdivide/sub2048.png" alt="max=2048">
</td>
</tr>
</table>
<br>
However, a big disadvantage should be always be considered when using the multi-part collections returned by <b>ST_Subdivide()</b>.<br>
At least in the case of <b>MultiPolygons</b>, any collection returned by ST_Subdivide() is inherently invalid, as the following SQL query demonstrates:
<verbatim>
SELECT ST_IsValid(ST_Subdivide(geometry)) FROM tuscany;
---------
0
</verbatim>
It's not at all difficult to understand why this is so.<br>
Accordingly to standard <b>OGC/SFS</b> rules: two single-part Polygons, belonging to the same MultiPolygon, are always forbidden to share a common border.<br>
More precisely: they can only touch on specific point(s), but they can never share a common boundary.<br><br>
<table bgcolor="#ffd0d0" cellspacing="8" cellpadding="6">
<tr><td align="center"><h3>[&iexcl;&iexcl;&iexcl; Warning !!!]</h3></td></tr>
<tr><td><b>Never, ever, pass such a Geometry  to any Spatial operator such as ST_Intersects, ST_Touches, ST_Covers etc..<br><br>
This will certainly drive the <b>GEOS</b> library nutty.<br>
Incorrect results may easily follow.<br>
And in the worst case some unexpected crash could eventually occur.
<h3>You have been warned !!!</h3>
</td>
</tr>
</table>
<br><br>
<hr>
<h2>Advanced Samples</h2>
Now we'll go on to explore the real effectiveness of <b>ST_Subdivide()</b> for one of the most common Spatial Problems: identifying all intersections between a dataset of the POINT type and a huge Polygon/MultiPolygon.<br><br>
During this second test we'll use the following datasets:
<ul>
<li>using, once again, the same administrative boundary of Tuscany we've already used before.</li>
<li>for the POINT dataset we'll use the <b>civici</b> Shapefile (<i>House Numbers</i>)), also freely available, and can be download here <a href="http://www502.regione.toscana.it/geoscopio/cartoteca.html">here</a> (search for <b>Grafo Stradale</b> - <i>Road Network</i>).<br>
This too is a huge dataset containing about 1.5 million points.
</ul>
<h3>A first simple attempt</h3>
<verbatim>
SELECT Count(*)
FROM civici AS c
LEFT JOIN tuscany AS t ON (ST_Intersects(c.geometry, t.geometry) = 1);


SELECT Count(*)
FROM civici AS c
LEFT JOIN tuscany AS t ON (ST_Intersects(c.geometry, t.geometry) = 1 
     AND c.rowid IN (
         SELECT rowid FROM SpatialIndex
         WHERE f_table_name = 'civici' AND search_frame = t.geometry));
</verbatim>
The only difference between the two SQL queries is in that the second one explicitly calls the Spatial Index.<br><br>
<b>Actual findings</b>:
<ul>
<li>I was forced to prematurely abort the first test when I discovered that, during the first half hour, it had made little progress.<br>
My estimated conclusion was that between 4 and 6 hours would be required for completion.<br>
Definitely this isn't a practical solution for our problem.</li>
<li>the second test was, if possible, even slower than the first one.<br>
And for very good reasons, that are easy to explain; all Points to be checked fall inside the BBOX of Tuscany.<br>
So the Spatial Index isn't of any possible help.<br>
And to make matters worst, querying the Spatial Index imposes a further overhead, thus requiring a far longer time to achieve absolutely nothing.</li>
<li><b>Short conclusion</b>: computing millions of intersections between a Point and a Polygon, containing hundreds of thousands of Vertices, is a complex computational problem.<br> 
Requiring such a very long time, makes such a solution absolutely unpractical, so a new approch to solve this problem is needed.</li>
</ul>
<h3>A second attempt based on ST_Subdivide() and VirtualElementary</h3>
<verbatim>
CREATE TABLE xx (id INTEGER PRIMARY KEY);
SELECT AddGeometryColumn('xx', 'geom', 3003, 'MULTIPOLYGON', 'XY');
INSERT INTO xx VALUES (1, NULL);
</verbatim>
We'll start first by creating a convenience Table, which will store the result of ST_Subdivide().<br>
This is a very simple Table:
<ul>
<li>it has just two columns, a Primary Key and a MultiPolygon Geometry.</li>
<li>and will store only a single row identified by <b>id=1</b></li>
</ul>
<verbatim>
UPDATE xx SET geom = (SELECT ST_Subdivide(geometry, 2048) FROM tuscany) WHERE id = 1;
SELECT Count(*)
FROM ElementaryGeometries AS t
JOIN civici AS c ON (ST_Intersects(c.geometry, t.geometry) = 1
     AND c.rowid IN (
         SELECT rowid FROM SpatialIndex
         WHERE f_table_name = 'civici' AND search_frame = t.geometry))
WHERE f_table_name = 'xx' AND origin_rowid = 1;
----------------
1480991 (in 38.094 seconds)


UPDATE xx SET geom = (SELECT ST_Subdivide(geometry, 512) FROM tuscany) WHERE id = 1;
SELECT Count(*)
FROM ElementaryGeometries AS t
JOIN civici AS c ON (ST_Intersects(c.geometry, t.geometry) = 1
     AND c.rowid IN (
         SELECT rowid FROM SpatialIndex
         WHERE f_table_name = 'civici' AND search_frame = t.geometry))
WHERE f_table_name = 'xx' AND origin_rowid = 1;
----------------
1480991 (in 12.617 seconds)


UPDATE xx SET geom = (SELECT ST_Subdivide(geometry) FROM tuscany) WHERE id = 1;
SELECT Count(*)
FROM ElementaryGeometries AS t
JOIN civici AS c ON (ST_Intersects(c.geometry, t.geometry) = 1
     AND c.rowid IN (
         SELECT rowid FROM SpatialIndex
         WHERE f_table_name = 'civici' AND search_frame = t.geometry))
WHERE f_table_name = 'xx' AND origin_rowid = 1;
----------------
1480991 (in 9.534 seconds)
</verbatim>
<b>Quick evaluation</b>:
<ul>
<li><b>ST_Subdivide</b> definitely is the <b><i>tool of the trade</i></b> we were searching for.<br>
It effectively allows to resolve a computationally complex problem in a surprisingly quick time.<br>
From many long hours, to just a handful of seconds, is a singularly astonishing improvement.<br></li>
<li>subdividing the initial huge Polygon into many simpler/smaller parts has several beneficial effects:
<ul>
<li>the time required to compute a point-to-polygon intersection quickly increases with the number of Vertices.<br>
Smaller polygons are obviously swifter to be evaluated than bigger ones.</li>
<li>smaller polygons can be an efficient Spatial Index filter.</li>
</ul></li>
<li>using <b>VirtualElementary</b>, in order to process all sub-areas one by one, surely imposes further complexity to our SQL queries.<br>
But in the end, this method proves itself to be a very efficient mechanism to resolve this complex problem.</li>
<li><b>Note</b>: SpatiaLite is not PostGIS, and SQLite3 is not PostgreSQL; their respective internal architectures significantly differ in many ways.<br>
More specifically, SQLite is completely unable to support arrays or multi-row results, so using VirtualElementary is an absolutely necessary workaround.</li>
</ul>
<h3>A final experimental attempt based on a pure SQL approach</h3>
Another alternative mechanism, based on pure SQL and not requiring the use of ElementaryGeometries or of a convenience Table is:
<verbatim>
WITH RECURSIVE magic(n, geom) AS (
    VALUES(1, (SELECT ST_Subdivide(geometry, 2048) FROM tuscany))
    UNION ALL
    SELECT n + 1, geom
    FROM geom_ind
    WHERE ST_GeometryN(geom, n) IS NOT NULL
)
SELECT n, ST_GeometryN(geom, n) FROM magic;
</verbatim>
<ul>
<li>we simply have to write a <b>Recursive SQL Query</b> in order to get a multi-row resultset, with a distinct row for each elementary part.</li>
<li>it effectively works, but seems to be slower than ElementaryGeometries.</li>
<li>and even more relevant, I was unable to write a complete Query determining all intersections between Points and Polygons in a reasonable time.<br>
I suspect that the Query Optimizer of SQLite simply ignores the Spatial Index, thus ruining any advantage that this approach would bring.<br>
It may well be, that someone else could succeed where I failed; any further third-party investigation on this specific area would be surely welcome.</li>
</ul>
<br><br>
<hr>
<br><br>
<table width="100%"><tr>
<td width="33%" align="left"></td>
<td align="center"><a href="https://www.gaia-gis.it/fossil/libspatialite/wiki?name=4.3.0+doc">back to index</a></td>
<td width="33%" align="right"></td>
</tr></table>
Z c3964904ae73b24660b5e533be65989e