Back to main SpatiaLite-Tools Wiki page

Changes affecting the spatialite CLI frontend

A new dot macro already supported by the standard sqlite3 front end has been added starting since version 3.8.5:

.shell CMD ARGS......

or

.system CMD ARGS.....

Both .shell and .system are exact synonyms: invoking the one or the other always produces identical effects. By invoking one of these new dot macros you are now free to executes any system command directly from the CLI; and this obviously extends to any SQL script being executed from the standard input.

$ spatialite
......
Enter ".help" for instructions
SQLite version 3.8.5 2014-06-04 14:06:34
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
spatialite> .shell ls -l *.sqlite
-rw-r--r-- 1 sandro Administrators  27779072 Mar  5 18:44 beppe.sqlite
-rw-r--r-- 1 sandro Administrators    427008 Jul  3 21:14 gpkg_test.sqlite
-rw-r--r-- 1 sandro Administrators 276952064 Apr 28 19:00 grafo.sqlite
spatialite>

As you can see in this example, you simply have to specify the command name followed by any required argument immediately after declaring the .shell or .system macro.

new XML processing tools

Starting since version 4.2.0 three new tools have been added, and they are specifically intended to support XML processing via pure SQL:

spatialite_xml_load: this tool will parse any generic XML document of unlimited complexity, and will consequently create and populate a DB-file faithfully translating the XML tree into many relationally joined DBMS tables.
The whole translation is performed in such a way so to be sure that absolutely no information will be never lost or suppressed.
If two (or even more) XML files share exactly the same identical logical layout (i.e. they all support the same identical formal schema definition) this tool is capable to merge all them within the same DB-file.
There are no imposed size-limits: some huge GML files as big as many GBs have been successfully loaded by using this tool.
spatialite_xml_collapse: this tool is intended to simplify (by collapsing any not strictly required ramification) any DB-file created by spatialite_xml_load.
Collapsing a DB-file is a final irreversible operation.
spatialite_xml_print: this tool is intended to export a single XML file corresponding to the current DB-file content.
It's useful i.e. when you've some way modified the original contents by performing some SQL operation.

when and why using the XML tools could be a brilliant idea

Just few general order considerations:

if you simply are an ordinary end-user (i.e. a people mainly interested in consuming already available data coming from external trusted sources) you'll probably judge the whole XML/SQL stuff boring, cumbersome and practically useless.
if you are some kind of power-user (i.e. an highly skilled professional not worrying about performing complex data processing tasks) you'll probably find the XML/SQL tools some way interesting and may be sometimes useful.
anyway the intended scope for the XML/SQL tools is to efficiently support complex (and possibly repetitive) processes related with data validation and acceptance tests. In other worlds: it's an highly specialized tool mainly intended for peoples involved in producing high-quality and robustly self-consistent datasets.

Short conclusion: the XML tools make relatively easy and painless transforming any complex XML document into a DBMS schema. Starting from this point forward you'll be absolutely free to deploy full SQL (and eventually Spatial SQL) firepower in any conceivable way.
There are no practical limits except your own technical skills and creative imagination: you'll be now free to process, edit or validate your data in any possible way. And by wisely using SQL scripting techniques you could eventually nicely automatize many boring repetitive tasks.
If such a scenario sounds even vaguely interesting to you, giving a deeper glance at the new XML tools could be an interesting experience.

Mapping XML Entities into DBMS relational Tables: general order principles

any XML document can always be represented as a tree composed of elementary nodes.
every node in the tree will have a parent and could eventually have children.
- with the very remarkable exception of the root node who has no parent at all; this always is the first node found in any XML document.
  Only a single root can exist in a tree.
- nodes lacking any child are usually known as leaf nodes.
- all nodes sharing the same parent are usually referenced as siblings.
parent-child relationships are the real backbone modeling the whole tree.
Special condition: the <C><D><Z> chain has no internal ramification, so it's a very good candidate to be collapsed in a single cumulative node.
Please note: the same kind of node can eventually appear in different tree positions under different kind of parents.
In this arbitrary example this is e.g. the case of <Z> nodes (child of either <D> or <A> nodes).
The same condition applies to <W> nodes (child of either <B> or <E> nodes).

<W alpha="value1" beta="value2">
    <X gamma="value3">value4</X>
    <Y>value5</Y>
</W>

Accordingly to the XML format specifications:

a node could eventually contain a value, as e.g. in
<X>value4</X>
<Y>value5</Y>
a node could eventually contain one (or more) attributes, and each attribute will be expressed as a name=value pair as e.g. in
<W alpha="value1" beta="value2">
<X gamma="value3">

the whole XML tree will be mapped into many DBMS tables.
each kind of node will directly match a corresponding Table.
all parent-child relationships will be represented as Primary / Foreign Key relational constraints joining two Tables.
as a direct consequence, when the same kind of node appears on the tree in different positions (under different kind of parents) distinct tables must necessarily be defined, so to faithfully represent the original parenthood.
This is the case of e.g. "a_b_c_d_z" and "a_z" tables; both them are intended to match an original <Z> XML node, but we absolutely have to keep these two tables well separated so to exactly map parent-child relationships as PK-FK constraints.
Exactly the same applies to "a_b_w" and "a_e_w" tables; but in this second case this further propagates respectively to "a_b_w_x", "a_b_w_y" and "a_e_w_x", "a_e_w_y" tables.

CREATE TABLE "a_b_c_d_z" (
node_id INTEGER PRIMARY KEY AUTOINCREMENT,
parent_id INTEGER NOT NULL,
node_value TEXT,
CONSTRAINT "fk_a_b_c_d_z" FOREIGN KEY (parent_id) 
  REFERENCES "a_b_c_d" (node_id) ON DELETE CASCADE);

Each DBMS table will always support this basic layout:

the node_id column (Primary Key) is the unique identifier for each node.
the parent_id column (Foreign Key) directly references the parent node.
the node_value column will eventually contain the value associated to this node (if any). Such value will always be of the Text type accordingly to XML baseline requirements.

CREATE TABLE "a_e_w" (
node_id INTEGER PRIMARY KEY AUTOINCREMENT,
parent_id INTEGER NOT NULL,
alpha TEXT,
beta TEXT,
node_value TEXT,
CONSTRAINT "fk_a_e_w" FOREIGN KEY (parent_id) 
  REFERENCES "a_e" (node_id) ON DELETE CASCADE);

if an XML node will eventually contain one (or more) attributes, they'll be consequently added to the DBMS Table as Text columns.

Mapping XML Entities into DBMS relational Tables: fine-grained details

The above criteria allow to faithfully map any possible XML document into a DBMS relational schema. It's a nice and elegant design, it's rather simple and intuitive, and it can easily be extended so to automatically adapt to any possible data layout without imposing any constrained assumption.
Anyway real-world XML documents can easily present very ramified and very deep tree structures, thus causing an impressive proliferation of the required DBMS tables. This is not a big issue by itself: any DBMS is surely capable to store many hundredths tables.

So the first practical problem we have to preliminary resolve is the one to assign appropriate (and possibly intuitive) names to each DBMS tables.

a first naive naming criterion could be the one to simply assign artificially generated names safely ensuring uniqueness: e.g. as in tbl001, tbl002 and so on.
The main defect in this method is in that this way we'll certainly finishing to fight against lots of obscure and meaningless names; not at all a comfortable situation.
a surely better approach is the one to compose each table name by directly using XML node names exactly as they are, and chaining all the ancestors names so to precisely represent the hierarchical position of each table.
spatialite_xml_load exactly adopts such a criterion as its base naming strategy. Anyway this could easily produce intolerably long names, as e.g. dbtopofeaturecollection_dbtopofeaturemembers_porzioneterritoriorestituito_geometria_multisurface_surfacemember_polygon_interior_linearring_poslist; any attempt to use such a long table name in some SQL query will then be desperately impractical.
happily enough, a smarter criterion leading to shorter names is available.
Explicitly referencing the whole chain of ancestors always starting from the root node isn't usually required, because in many cases just referencing two or three ancestors (e.g. father, grandfather and great-grandfather) will ensure uniqueness. spatialite_xml_load supports such an advanced option.

A second option allowing to significantly reduce the depth of the XML tree structured is the one to immediately collapse all GLM geometries:

GML notations are usually very verbose and very ramified.
thus immediately resolving any GML geometry eventually found when parsing the input XML document will surely reduce both complexity and depth of the DBMS layout. A further benefit comes from activating this option; this way you'll immediately get a genuine SpatiaLite geometry you can directly use in any following SQL query.

XML MetaCatalog

Any DB-file created by the spatialite_xml_load tool will contain these two tables intended to facilitate the correct interpretation of the DBMS layout:

xml_metacatalog_tables: this first table will contain a row for each single table created by spatialite_xml_load.
- the table_name column is the Primary Key
- the tree_level column shows the corresponding depth of this Node in the XML tree (the root node corresponds to level 0).
- the xml_tag_namespace and xml_tag_name exactly correspond to the XML node name (as e.g. in <ns:tag>).
- the parent_table_name exactly identifies the corresponding parent node (will be NULL for the root node).
- the gml_geometry_column will be usually set to NULL, and will contain a meaningful value only in the case of collapsed GML Geometries.
- the status column will tell if the table was eventually affected by any subsequent post-processing operation (typically: after executing the spatialite_xml_collapse tool).
xml_metacatalog_columns: this second table will contain a row for each column in the above tables.
- the table_name and column_name columns are the Primary Key; table_name is a Foreign Key referencing xml_metacatalog_tables.
- the origin column will state the intended role for the column (e.g. a value directly imported from the XML file, DBMS internal identifiers, and son on).
- the destination column will state if the column was affected by any post-processing operation (after executing the spatialite_xml_collapse tool).
- the xml_reference will exactly correspond to the original XML syntactic role of the column (if any).

The XML MetaCatalog contains many useful informations, and allows to fully reconstruct the original structure of the input XML tree.
It's used e.g. by the spatialite_xml_print tool when exporting again an XML document from the DBMS.

Tutorials and practical examples

how-to process some simple GML dataset.
how-to process some complex GML-Topology dataset.

Back to main SpatiaLite-Tools Wiki page

Credits
Development of SpatiaLite XML-Tools 4.2.0 has been funded by Tuscany Region - Territorial and Environmental Information System
Regione Toscana - Settore Sistema Informativo Territoriale ed Ambientale.