XML Tools Tutorial: how to process a Complex GML-Topology dataset
Back to main pageDownloading the sample dataset
In this example we'll use a rather small but anyway complete GML-Topology sample kindly contributed by Tuscany Region (many thanks to Dr. Maurizio Trevisani).- Download the sample from here
- then uncompress the 7z archive
- as you can easily check, there are three strictly related GML files composing this dataset:
- the GML_RT98030316-3-328-4.xml file contains the main GML Topology.
- the GML_RT98030316-3-328-4_Curve_Livello.xml file contains the corresponding contour lines.
- the GML_RT98030316-3-328-4_Punti_Quotati.xml file contains the corresponding spot elevations.
Loading into the DBMS
$ spatialite_xml_load -x GML_RT98030316-3-328-4.xml -d rt-sample.sqlite -cg -xl -pl 2 -nl 1 SQLite version: 3.8.5 SpatiaLite version: 4.2.0-rc2 Target DB: rt-sample.sqlite Input XML: GML_RT98030316-3-328-4.xml Done - inserted/updated rows: 589652 [0:00:26.070] $This first invocation of spatialite_xml_load will create and initialize a new DB-file, and will import into it the main Topology.
- You've already encountered the -cg, -pl and -nl arguments in the previous Tutorial.
- Please note: in this specific case we'll be required to specify the -xl argument as well.
The GML Topology standard heavily relies on indirect references based on <gml:id> and <xlink:href> correspondences; by specifying the special -xl argument you'll requests spatialite_xml_load to activate any required support.
$ spatialite_xml_load -x GML_RT98030316-3-328-4_Curve_Livello.xml -d rt-sample.sqlite -cg -xl -pl 2 -nl 1 SQLite version: 3.8.5 SpatiaLite version: 4.2.0-rc2 Target DB: rt-sample.sqlite Input XML: GML_RT98030316-3-328-4_Curve_Livello.xml Done - inserted/updated rows: 14922 [0:00:04.586] $ spatialite_xml_load -x GML_RT98030316-3-328-4_Punti_Quotati.xml -d rt-sample.sqlite -cg -xl -pl 2 -nl 1 SQLite version: 3.8.5 SpatiaLite version: 4.2.0-rc2 Target DB: rt-sample.sqlite Input XML: GML_RT98030316-3-328-4_Punti_Quotati.xml Done - inserted/updated rows: 68827 [0:00:04.914] $And finally you simply have to load the last two GML files into the same DB-file.
direct check
Start a spatialite CLI or spatialite_gui session and connect to the DB-file you've just created:- As you can see, there are lots of tables; the XML tree supporting this GML schema is rather complex, but this simply is because this one is a real-world sample corresponding to an high-quality vector map.
Recall: you can query both xml_metacatalog_tables and xml_metacatalog_columns in order to correctly reconstruct the whole tree hierarchy. - we'll now focus our attention just on the more relevant topology items:
- Topology Nodes:
- the dbtopofeaturemembers_edge_directednode_node table does contain all Nodes; each individual Node is expected to declare its own unique identifier (gml_id column).
- Topology Edges:
- the dbtopofeaturemembers_topocurve_directededge_edge table does contain all gml_id values uniquely identifying each single Edge.
- the dbtopofeaturemembers_directededge_edge_curveproperty (child of the above one) does contains the collapsed geometry corresponding to each Edge.
- the dbtopofeaturemembers_directededge_edge_directednode table (sibling of the above one) does contain the start-Node and end-Node for each Edge.
- Please note: sometimes an Edges could reference a Node via a direct parent-child relationship; but in many other cases this relation is expressed in an indirect way using gml_id and xlinl_href cross references.
- Please note well: the coordinates for each Node are always expressed in a virtual way, i.e. they are assumed to coincide with the first (or respectively last) point of the Edge.
- Topology Faces
- the dbtopofeaturemembers_toposurface_directedface_face table does contain all gml_id values uniquely identifying each single Face.
- the dbtopofeaturemembers_directedface_face_directededge table (child of the above one) does enumerates all Edges delimiting each Face; every Edge is always indirectly identified by its xlink_href value.
- Please note: all Faces will never have an explicit Geometry. Accordingly to Topology principles such Geometry will always be indirectly determined by their delimiting Edges.
- Topology Nodes:
few useful Topology validations via pure SQL processing
extracting all Topology Nodes
Recall: all Topology Nodes are simply represented as pure virtual entities uniquely identified by a gml_id value.So we'll be necessarily required to scan all Edges (and their corresponding Geometries) in order to extract the physical Nodes with an appropriate Geometry.
CREATE TABLE aux_nodes AS SELECT e.gml_id AS EdgeId, n.gml_id AS NodeId, 'start' AS role, ST_StartPoint(eg.from_gml_geometry) AS NodeGeom FROM dbtopofeaturemembers_topocurve_directededge_edge AS e JOIN dbtopofeaturemembers_directededge_edge_curveproperty AS eg ON (e.node_id = eg.parent_id) JOIN dbtopofeaturemembers_directededge_edge_directednode AS nr ON (e.node_id = nr.parent_id AND nr.orientation = '-' AND nr.xlink_href IS NULL) JOIN dbtopofeaturemembers_edge_directednode_node AS n ON (nr.node_id = n.parent_id);This first SQL query will create and feed an auxiliary table containing all start Nodes associated to their corresponding Edges via direct parent-child relations.
- the ST_StartPoint() will be used so to extract the first point of each Edge's Geometry (always of the Linestring type).
- the nr.orientation = '-' term is intended to select only the start Node for each Edge.
- the nr.xlink_href IS NULL term is intended to exclude any indirect reference based on gml_id / xlink_href relations.
INSERT INTO aux_nodes SELECT e.gml_id AS EdgeId, n.gml_id AS NodeId, 'start' AS role, ST_StartPoint(eg.from_gml_geometry) AS NodeGeom FROM dbtopofeaturemembers_topocurve_directededge_edge AS e JOIN dbtopofeaturemembers_directededge_edge_curveproperty AS eg ON (e.node_id = eg.parent_id) JOIN dbtopofeaturemembers_directededge_edge_directednode AS nr ON (e.node_id = nr.parent_id AND nr.orientation = '-' AND nr.xlink_href IS NOT NULL) JOIN dbtopofeaturemembers_edge_directednode_node AS n ON (nr.xlink_href = n.gml_id);This second SQL query will append into the same auxiliary table all start Nodes associated to their corresponding Edges via indirect gml_id / xlink_href relations.
- in this case too the ST_StartPoint() will be used so to extract the first point of each Edge's Geometry; exactly as the nr.orientation = '-' term is intended to select only the start Node for each Edge.
- but in this second query the nr.xlink_href IS NOT NULL term is intended to consider only indirect references based on gml_id / xlink_href relations.
INSERT INTO aux_nodes SELECT e.gml_id AS EdgeId, n.gml_id AS NodeId, 'end' AS role, ST_EndPoint(eg.from_gml_geometry) AS NodeGeom FROM dbtopofeaturemembers_topocurve_directededge_edge AS e JOIN dbtopofeaturemembers_directededge_edge_curveproperty AS eg ON (e.node_id = eg.parent_id) JOIN dbtopofeaturemembers_directededge_edge_directednode AS nr ON (e.node_id = nr.parent_id AND (nr.orientation <> '-' OR nr.orientation IS NULL) AND nr.xlink_href IS NULL) JOIN dbtopofeaturemembers_edge_directednode_node AS n ON (nr.node_id = n.parent_id);This third SQL query will continue in appending into the same auxiliary table all end Nodes associated to their corresponding Edges via direct parent-child relations.
- the ST_EndPoint() will be used so to extract the last point of each Edge's Geometry (always of the Linestring type).
- the (nr.orientation <> '-' OR nr.orientation IS NULL) term is intended to select only the end Node for each Edge.
Recall: orientation = '-' always identifies the start Node of an Edge; but the end Node could be identified by either an explicit orientation = '+' or by NULL (accordingly to default GML assumptions). - the nr.xlink_href IS NULL term is intended to exclude any indirect reference based on gml_id / xlink_href relations.
INSERT INTO aux_nodes SELECT e.gml_id AS EdgeId, n.gml_id AS NodeId, 'end' AS role, ST_EndPoint(eg.from_gml_geometry) AS NodeGeom FROM dbtopofeaturemembers_topocurve_directededge_edge AS e JOIN dbtopofeaturemembers_directededge_edge_curveproperty AS eg ON (e.node_id = eg.parent_id) JOIN dbtopofeaturemembers_directededge_edge_directednode AS nr ON (e.node_id = nr.parent_id AND (nr.orientation <> '-' OR nr.orientation IS NULL) AND nr.xlink_href IS NOT NULL) JOIN dbtopofeaturemembers_edge_directednode_node AS n ON (nr.xlink_href = n.gml_id);And this last SQL query will finally append into the same auxiliary table all end Nodes associated to their corresponding Edges via indirect gml_id / xlink_href relations.
All right, we've now extracted all start/end Nodes for all Edges; so we can now continue into our validation process.
validating the Edges/Nodes
SELECT e1.EdgeId, e1.NodeId AS StartNode, e1.NodeGeom AS StartNodeGeom, e2.NodeId AS EndNode, e2.NodeGeom AS EndNodeGeom FROM aux_nodes AS e1 LEFT JOIN aux_nodes AS e2 ON (e2.role = 'end' AND e1.EdgeId = e2.EdgeId) WHERE e1.role = 'start' ORDER BY e1.EdgeId;This first query is mainly intended for didactic purposes, and will return the full list of all Edges with the corresponding Start/EndNodes.
SELECT n1.NodeId, n1.EdgeId, n2.EdgeId FROM aux_nodes AS n1, aux_nodes AS n2 WHERE n1.NodeId = n2.NodeId AND n1.EdgeId <> n2.EdgeId AND ST_Equals(n1.NodeGeom, n2.NodeGeom) <> 1;This second query is certainly more interesting for validation purposes; it will return an empty resultset, so we can safely conclude that all Topology Nodes are in a fairly self-consistent state.
SELECT a.Cardinality AS Cardinality, Count(*) AS Frequency FROM ( SELECT NodeId, Count(*) AS Cardinality FROM aux_nodes GROUP BY NodeId) AS a GROUP BY a.Cardinality; ----------------- 2 3388 3 5451 4 600 5 15This last query is interesting as well, because it reports a quick cardinality analysis:
- many Nodes simply have a cardinality 2: this means that they simply connect a left to a right Edge.
- as we could easily expect, many more Nodes have a cardinality 3: this identifies an Y-shaped junction between three different Edges; similarly cardinality 4 identifies an X-shaped junction between four Edges. Higher cardinality Nodes are very scarce.
- It's surely worth noting that there aren't cardinality values less than 2, so we surely have non dangling Nodes or Edges: our Topology is consistent and fairly well-connected.
materializing virtual Face geometries
Recall: all Topology Faces are simply described by the corresponding delimiting Edges; there is no explicitly set Geometry, it's always a purely virtual Geometry.SELECT f.gml_id AS FaceId, ST_Polygonize(e.from_gml_geometry) AS FaceGeom FROM dbtopofeaturemembers_toposurface_directedface_face AS f JOIN dbtopofeaturemembers_directedface_face_directededge AS er ON (f.node_id = er.parent_id) JOIN dbtopofeaturemembers_topocurve_directededge_edge AS de ON (de.gml_id = er.xlink_href) JOIN dbtopofeaturemembers_directededge_edge_curveproperty AS e ON (de.node_id = e.parent_id) GROUP BY f.gml_id;Anyway we could eventually materialize all Face Geometries by just executing a rather trivial SQL query like the above one.
Quick conclusions
SQL and Spatial SQL are powerful and very flexible processing tools; and SQL scripting is a smart technique allowing to nicely automatize many boring repetitive tasks.After loading an XML document into a DBMS you'll be absolutely free to deploy all the incredible firepower of SQL (and SQL scripts) at its best.
This should not always be a plain and easy task: but it surely is a powerful instrumentation you could successfully deploy so to resolve many complex problems as e.g. thoroughly validating some not trivial and may be huge dataset.
back to main page