Internal XML-BLOB format

SpatiaLite stores XML Documents internally in the XML-BLOB format, using ordinary SQLite BLOB columns.

The main rationale for adopting a specialized format for XML Documents is the need to embed several useful pieces of information within the BLOB itself.

The second good reason for using a purpose-built encoding is the need to be sure that a generic BLOB value really corresponds to a valid XML-BLOB; remember that one of SQLite's specific features is its very weak column-type enforcement.
So SpatiaLite relies on ordinary SQLite support only to check that a column value contains a generic BLOB, and then performs its own further checks to decide whether that value can be considered a valid XML-BLOB. To make this possible, SpatiaLite inserts special markers at predictable and strategic positions.
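
As an illustration of this kind of marker-based check, here is a minimal Python sketch; it is not SpatiaLite's own code. It tests only the markers whose position does not depend on any variable-length field: byte 0 (0x00), byte 2 (0xAB), byte 13 (0xBA) and the final byte (0xDD), as listed in the format table in this document. The function name `looks_like_xml_blob` is hypothetical, and the minimum length of 36 bytes is an assumption obtained by summing every fixed-size field of the layout (markers, FLAGS, length fields and the 4-byte checksum).

```python
# Marker values taken from the XML-BLOB layout described in this document.
START, HEADER, SCHEMA, END = 0x00, 0xAB, 0xBA, 0xDD

# Assumed minimum: sum of all fixed-size fields when every
# variable-length field (URI, identifiers, payload, ...) is empty.
MIN_LEN = 36


def looks_like_xml_blob(value) -> bool:
    """Cheap plausibility test: is this value shaped like an XML-BLOB?

    Only the fixed-position markers are inspected; a True result does not
    prove that the value is fully valid (no CRC or length checks are done).
    """
    # SQLite enforces column types weakly, so anything may show up here.
    if not isinstance(value, (bytes, bytearray, memoryview)):
        return False
    blob = bytes(value)
    if len(blob) < MIN_LEN:
        return False
    return (
        blob[0] == START        # offset 0
        and blob[2] == HEADER   # offset 2
        and blob[13] == SCHEMA  # offset 13
        and blob[-1] == END     # last byte
    )
```

A check of this kind is deliberately cheap: it rejects most unrelated BLOB values before any more expensive parsing or checksum verification is attempted.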

The following is the general internal XML-BLOB format used by SpatiaLite:

| Byte Offset Start | Byte Offset End | Content | Notes |
|---|---|---|---|
| 0 | | START [hex 00] | an XML-BLOB encoded value must always start with a 0x00 byte |
| 1 | | FLAGS | a bitwise-inclusive OR of the following flags: 0x01 [ENDIANNESS]: if set, every integer value in the XML-BLOB header and footer is expected to be in little-endian order, otherwise in big-endian order; 0x02 [COMPRESSED]: if set, the XML Document is compressed using the DEFLATE algorithm; 0x04 [VALIDATED]: if set, the XML Document has successfully passed Schema Validation |
| 2 | | HEADER [hex AB] | an XML-BLOB encoded value must always have a 0xAB byte in this position |
| 3 | 6 | SIZE | a 32-bit integer value [little- or big-endian, according to the ENDIANNESS flag]; size (in bytes) of the uncompressed XML Document payload |
| 7 | 10 | COMPR_SZ | a 32-bit integer value [little- or big-endian, according to the ENDIANNESS flag]; size (in bytes) of the compressed XML Document payload; if the COMPRESSED flag is not set, this value is expected to be identical to SIZE |
| 11 | 12 | URI_LEN | a 16-bit integer value [little- or big-endian, according to the ENDIANNESS flag]; length (in bytes) of the Schema URI used for Schema Validation; set to zero if no Schema URI is defined |
| 13 | | SCHEMA [hex BA] | an XML-BLOB encoded value must always have a 0xBA byte in this position |
| 14 | 13 + URI_LEN | SCHEMA_URI | the Schema URI used for Schema Validation; empty (skipped) if no Schema URI is defined: in this case the SCHEMA marker is immediately followed by the FILEID_LEN field |
| 14 + URI_LEN | 15 + URI_LEN | FILEID_LEN | a 16-bit integer value [little- or big-endian, according to the ENDIANNESS flag]; length (in bytes) of the fileIdentifier string optionally declared by ISO Metadata documents; set to zero if no fileIdentifier is defined |
| 16 + URI_LEN | | FILEID [hex CA] | an XML-BLOB encoded value must always have a 0xCA byte in this position |
| 17 + URI_LEN | 16 + URI_LEN + FILEID_LEN | FILE_IDENTIFIER | the fileIdentifier string optionally declared by ISO Metadata documents; empty (skipped) if no fileIdentifier is defined: in this case the FILEID marker is immediately followed by the PARENTID_LEN field |
| 17 + URI_LEN + FILEID_LEN | 18 + URI_LEN + FILEID_LEN | PARENTID_LEN | a 16-bit integer value [little- or big-endian, according to the ENDIANNESS flag]; length (in bytes) of the parentIdentifier string optionally declared by ISO Metadata documents; set to zero if no parentIdentifier is defined |
| 19 + URI_LEN + FILEID_LEN | | PARENTID [hex DA] | an XML-BLOB encoded value must always have a 0xDA byte in this position |
| 20 + URI_LEN + FILEID_LEN | 19 + URI_LEN + FILEID_LEN + PARENTID_LEN | PARENT_IDENTIFIER | the parentIdentifier string optionally declared by ISO Metadata documents; empty (skipped) if no parentIdentifier is defined: in this case the PARENTID marker is immediately followed by the TITLE_LEN field |
| 20 + URI_LEN + FILEID_LEN + PARENTID_LEN | 21 + URI_LEN + FILEID_LEN + PARENTID_LEN | TITLE_LEN | a 16-bit integer value [little- or big-endian, according to the ENDIANNESS flag]; length (in bytes) of the Title string optionally declared by ISO Metadata and SLD/SE Style documents; set to zero if no Title is defined |
| 22 + URI_LEN + FILEID_LEN + PARENTID_LEN | | TITLE [hex DB] | an XML-BLOB encoded value must always have a 0xDB byte in this position |
| 23 + URI_LEN + FILEID_LEN + PARENTID_LEN | 22 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN | TITLE_TAG | the Title string optionally declared by ISO Metadata and SLD/SE Style documents; empty (skipped) if no Title is defined: in this case the TITLE marker is immediately followed by the ABSTRACT_LEN field |
| 23 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN | 24 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN | ABSTRACT_LEN | a 16-bit integer value [little- or big-endian, according to the ENDIANNESS flag]; length (in bytes) of the Abstract string optionally declared by ISO Metadata and SLD/SE Style documents; set to zero if no Abstract is defined |
| 25 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN | | ABSTRACT [hex DC] | an XML-BLOB encoded value must always have a 0xDC byte in this position |
| 26 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN | 25 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN + ABSTRACT_LEN | ABSTRACT_TAG | the Abstract string optionally declared by ISO Metadata and SLD/SE Style documents; empty (skipped) if no Abstract is defined: in this case the ABSTRACT marker is immediately followed by the GEOMETRY_LEN field |
| 26 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN + ABSTRACT_LEN | 27 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN + ABSTRACT_LEN | GEOMETRY_LEN | a 16-bit integer value [little- or big-endian, according to the ENDIANNESS flag]; length (in bytes) of the Geometry (long/lat MBR aka BBOX) optionally declared by ISO Metadata documents; set to zero if no Geometry is defined |
| 28 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN + ABSTRACT_LEN | | GEOMETRY [hex DD] | an XML-BLOB encoded value must always have a 0xDD byte in this position |
| 29 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN + ABSTRACT_LEN | 28 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN + ABSTRACT_LEN + GEOMETRY_LEN | MBR | the Geometry (long/lat MBR aka BBOX) optionally declared by ISO Metadata documents; empty (skipped) if no Geometry is defined: in this case the GEOMETRY and PAYLOAD markers are expected to be immediately adjacent |
| 29 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN + ABSTRACT_LEN + GEOMETRY_LEN | | PAYLOAD [hex CB] | an XML-BLOB encoded value must always have a 0xCB byte in this position |
| 30 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN + ABSTRACT_LEN + GEOMETRY_LEN | 29 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN + ABSTRACT_LEN + GEOMETRY_LEN + COMPR_SZ | XML_DOCUMENT | the XML Document payload |
| 30 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN + ABSTRACT_LEN + GEOMETRY_LEN + COMPR_SZ | | CRC [hex BC] | an XML-BLOB encoded value must always have a 0xBC byte in this position |
| 31 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN + ABSTRACT_LEN + GEOMETRY_LEN + COMPR_SZ | 34 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN + ABSTRACT_LEN + GEOMETRY_LEN + COMPR_SZ | CHECKSUM | a 32-bit integer value [little- or big-endian, according to the ENDIANNESS flag]; the CRC32 checksum; the checksum has to be computed over every byte starting from the START marker and ending at the CRC marker (both included) |
| 35 + URI_LEN + FILEID_LEN + PARENTID_LEN + TITLE_LEN + ABSTRACT_LEN + GEOMETRY_LEN + COMPR_SZ | | END [hex DD] | an XML-BLOB encoded value must always have a 0xDD byte in this position |
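
The layout above can be walked sequentially, since each variable-length field is preceded by its length and its marker. The following Python sketch shows one possible decoder; it is an illustration written for this document, not SpatiaLite's implementation. The names `XmlBlob` and `parse_xml_blob` are hypothetical. It also rests on three assumptions that the format description does not spell out: the 4-byte CHECKSUM immediately follows the CRC marker with END as the very last byte, the checksum is the standard CRC-32 as computed by zlib, and a compressed payload is a zlib-wrapped DEFLATE stream.

```python
import struct
import zlib
from dataclasses import dataclass

FLAG_LITTLE_ENDIAN, FLAG_COMPRESSED, FLAG_VALIDATED = 0x01, 0x02, 0x04


@dataclass
class XmlBlob:  # hypothetical container for the decoded fields
    validated: bool
    schema_uri: bytes
    file_identifier: bytes
    parent_identifier: bytes
    title: bytes
    abstract: bytes
    geometry: bytes
    xml: bytes


def parse_xml_blob(data) -> XmlBlob:
    """Decode an XML-BLOB; raise ValueError on any layout violation."""
    blob = bytes(data)
    if len(blob) < 36:  # sum of all fixed-size fields
        raise ValueError("too short for an XML-BLOB")
    if blob[0] != 0x00 or blob[2] != 0xAB:
        raise ValueError("bad START or HEADER marker")
    flags = blob[1]
    order = "<" if flags & FLAG_LITTLE_ENDIAN else ">"
    # SIZE (3..6), COMPR_SZ (7..10) and URI_LEN (11..12) are contiguous.
    size, compr_sz, uri_len = struct.unpack_from(order + "IIH", blob, 3)
    pos = 13

    def expect(marker: int) -> None:
        nonlocal pos
        if pos >= len(blob) or blob[pos] != marker:
            raise ValueError(f"marker 0x{marker:02X} not found at offset {pos}")
        pos += 1

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(blob):
            raise ValueError("field runs past the end of the BLOB")
        chunk = blob[pos:pos + n]
        pos += n
        return chunk

    expect(0xBA)                      # SCHEMA
    schema_uri = take(uri_len)
    strings = []
    # FILEID, PARENTID, TITLE, ABSTRACT, GEOMETRY: length, marker, value.
    for marker in (0xCA, 0xDA, 0xDB, 0xDC, 0xDD):
        (length,) = struct.unpack(order + "H", take(2))
        expect(marker)
        strings.append(take(length))
    expect(0xCB)                      # PAYLOAD
    payload = take(compr_sz)
    expect(0xBC)                      # CRC
    crc_span = blob[:pos]             # START through CRC marker, inclusive
    (stored_crc,) = struct.unpack(order + "I", take(4))
    expect(0xDD)                      # END
    if pos != len(blob):
        raise ValueError("unexpected bytes after the END marker")
    if zlib.crc32(crc_span) != stored_crc:
        raise ValueError("CRC32 mismatch")
    if flags & FLAG_COMPRESSED:
        try:
            xml = zlib.decompress(payload)  # assumes a zlib-wrapped stream
        except zlib.error as exc:
            raise ValueError("payload cannot be decompressed") from exc
    else:
        xml = payload
    if len(xml) != size:
        raise ValueError("payload length does not match SIZE")
    return XmlBlob(bool(flags & FLAG_VALIDATED), schema_uri, *strings, xml)
```

The string fields are returned as raw bytes because the format description does not state their text encoding. Walking the fields with a running position, rather than computing each offset from the table, keeps the decoder short and makes every marker check an implicit check on the preceding length field.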