Broken R*Tree caused by shadowed ROWID issues
A new issue affecting the R*Tree Spatial Index has been recently reported.The specific problem causing this issue seems to have a very low probability to happen under normal conditions; anyway, once this problem will actually arise the consequences could easily have a really strong negative impact on the overall performance of SpatiaLite.
So you are kindly invited to carefully read this technical note.
Understanding what really is a ROWID
From the SQLite's own documentation:Each entry in an SQLite table has a unique 64-bit signed integer key called the "rowid". The rowid is always available as an undeclared column named ROWID, OID, or _ROWID_ as long as those names are not also used by explicitly declared columns. If the table has a column of type INTEGER PRIMARY KEY then that column is another alias for the rowid.In other words: a ROWID represents a Unique Identifier allowing to retrieve every individual row from a table in a very quick time.
Using ROWID values as relational keys whenever is possible is really useful in many SQL queries, because relational JOIN ops based on ROWIDs always are very efficient and fast to be resolved.
The SpatialLite's own implementation of the Spatial Index widely depends on ROWIDs in order to JOIN both the main table containing Geometries and the corresponding R*Tree table.
Anyway there is a dangerous hidden trap in all this: it works well only since ROWID values returned by the main table do exactly correspond to ROWID values stored into the R*Tree. If (for any reason) the above assumption is no longer satisfied then the Spatial Index will simply return unreliable and useless results.
You'll probably be well aware that not declaring an INTEGER PRIMARY KEY supporting any Geometry table could directly cause many severe Spatial Index releted headaches, e.g. immediately after performing a VACUUM. If you missed this critical point, reading this old technical note is warmly suggested.
The recently discovered issue is even worst (if possible) than the previous one: it will not simply cause a severe Spatial Index corruption, it could eventually transform any SQL query attempting to access the Spatial Index into a deadly slow query.
Now we'll read yet again the above paragraph taken from SQLite's own documentation, this time paying the due attention to any single world:
The rowid is always available as an undeclared column named ROWID, OID, or _ROWID_ as long as those names are not also used by explicitly declared columns. |
Here is the dangerous pitfall: if the same table containing your Geometries actually declares (for any possible reason) an ordinary column named "rowid", then all values contained into this column will completely shadow the expected Unique Identifiers allowing to immediately access each single row (aka ROWID values, as usually intended).
And as a rather obvious consequence a strong relationship JOINing each Geometry to its Spatial Index entry will immediately vanish.
That's not all: the worst consequence will now be that instead of using a really fast key allowing to quickly retrieve each row, your query under such odd conditions will require a lengthy full table scan for each single row to be retrieved; and consequently a Spatial Index query expected to add a noticeable performance boost will cause instead a catastrophic performance loss.
Happily enough this issue doesn't seems to materialize too often in real world cases; anyway once it occurs it could really have a dramatic negative impact.
Be aware and well conscious of the problem.
Remedies and cures
The current code base (already available from the Fossil repository) adopts the following precautions in order to mitigate this recently discovered issue:- CreateSpatialIndex SQL function
will now refuse to build a Spatial Index if the corresponding table declares any column named rowid, returning in this case a -1 value (specif error condition). Please note: this isn't a complete remedy, because a column named rowid could eventually be added to the table in a second time by invoking an ALTER TABLE ... ADD COLUMN rowid ... statement. - CheckSpatialIndex SQL function
will always refuse to check any Spatial Index corresponding to a shadowed rowid immediately returning a -1 value. - RecoverSpatialIndex SQL function
will refuse as well to check any Spatial Index corresponding to a shadowed rowid immediately returning a -1 value. - VirtualSpatialIndex interface
will refuse to access any Spatial Index corresponding to a shadowed rowid. - last but not least, a brand new CheckShadowedRowid SQL function has been implemented.
How to use CheckShadowedRowid
SELECT f_table_name, f_geometry_column FROM geometry_columns WHERE CheckShadowedRowid(f_table_name) = 1; ------ broken_table geometry insane_table geometry crazy_table geometryExecuting an elementary simple query as the above one requires just few seconds and could probably avoid to spend many long hours in fruitless debugging sessions.
Don't hesitate to immediately perform a check like this if you simply suspect that something is going bad in your Spatial Indices, or when you notice some puzzling or unexpected behaviour even vaguely related to Spatial Indices.
back