back to XmlBlob main page
XmlBlob - Quick start
Starting since version 4.1.0 SpatiaLite supports a specialized format (XmlBlob) intended to store a generic XmlDocument into the DB as a BLOB value.The binary format specification supporting XmlBlob is publicly available.
Supporting XmlDocuments is enough interesting by itself; but its real utility become absolutely clear considering the growing role played by XML in the GeoSpatial context: metadata, SLD/SE styles and many other representations are fully based on XML.
In the following steps we'll get a first quick glance at XmlBlob using very simple practical examples.
TOC - Table Of Contents
- Creating an XmlBlob from an XmlDocument
- Extracting an XmlDocument from within an XmlBlob
- XmlBlob compression
- XML: well-formedness and Schema validation
- The internal XML Schema cache
- Other useful misc SQL functions supporting XmlBlob
1 - Creating an XmlBlob from an XmlDocument
SELECT Hex( '<a><b>alpha</b></a>' ); --- 3C613E3C623E616C7068613C2F623E3C2F613E SELECT XB_Create( x'3C613E3C623E616C7068613C2F623E3C2F613E' ); --- XmlBLOB-resultThe XB_Create() SQL function in its simplest forms accepts a simple argument corresponding to a BLOB (XmlDocument payload).
The returned value (if successful) corresponds to a BLOB value (namely: an XmlBlob value).
Please note: the input XmlDocument is expected to be passed as a BLOB, not as Text string; and there is a very good reason accounting for this.
An XML Document can be internally encoded using several different charset encodings (UTF-8, UTF-16, ASCII ...): many of such encodings cannot be safely handled as Text strings, but surely all them can always be handled as a binary BLOB value.
The Hex() function will simply return the hexadecimal representation corresponding to a Text string, and the x'08..AF' notation can then be used to define a BLOB-constant.
A simpler way to pass an XML sample as an argument is by directly invoking the CastToBlob() SQL function; a third way to perform the same identical task is by using the SQL CAST operator.
SELECT XB_Create( CastToBlob( '<a><b>alpha</a></b>' ) ); --- NULL SELECT XB_Create( CAST( '<a><b>alpha</a></b>' AS BLOB ) ); --- NULLIf the input BLOB value passed to XB_Create() doesn't contains any valid XmlDocument, a NULL value will be returned.
SELECT XB_GetLastParseError(); --- noname.xml:1: parser error : Opening and ending tag mismatch: b line 1 and a <a><b>alpha</a></b> ^ noname.xml:1: parser error : Opening and ending tag mismatch: a line 1 and b <a><b>alpha</a></b> ^You can then call XB_GetLastParseError() in order to get a meaningful message explaining why the most recent call to XB_Create() failed.
2 - Extracting an XmlDocument from within an XmlBlob
SELECT XB_IsValid( x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' ); --- 1The XB_IsValid() SQL function checks if a generic BLOB actually corresponds to some valid XmlBlob value.
SELECT XB_GetDocumentSize( x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' ); --- 19And XB_GetDocumentSize() will return the corresponding XmlDocument length (measured in bytes).
SELECT XB_GetPayload( x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' ); --- BLOB-valueYou can then call XB_GetPayload() in order to extract the XmlDocument from the XmlBlob, exactly as it was initially loaded from a previous call to XB_Create() (i.e., as a BLOB value fully preserving the original character encoding).
SELECT XB_GetDocument( x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' ); --- <a><b>alpha</b></a>A similar SQL function is supported, i.e. XB_GetDocument(); the main difference is in that this second one will always return a Text string (UTF-8 encoded).
SELECT XB_GetDocument( x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' , 2 ); --- <?xml version="1.0"?> <a> <b>c</b> </a> SELECT XB_GetDocument( x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' , 8 ); --- <?xml version="1.0"?> <a> <b>c</b> </a> SELECT XB_GetDocument( x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' , 0 ); --- <?xml version="1.0"?><a><b>c</b></a>You can eventually pass a second argument to XB_GetDocument(), this corresponding to indentation.
In this case a nicely formatted and properly indented XmlDocument will be returned. Any indent factor ranging from 1 to 8 is actually supported; any other bigger number will silently be understood as 8.
Any negative value will simply return the original XML Document exactly as it was initially stored; 0 will return the whole XML Document as a single line.
The SQL function XB_GetPayload() as well supports a similar indentation optional argument.
3 - XmlBlob compression
XmlDocuments notoriously tend to be verbose: storing an XmlBlob in a compressed form usually saves half of the required storage (and possibly even more).SELECT XB_IsCompressed( x'0001AB13000000130000000000BA0000CA0000DA0000DB0000DC0000DDCB3C613E3C623E616C7068613C2F623E3C2F613EBC9AB71406DD' ); --- 0The XmlBlob we've used in the previous example is uncompressed, as reported by XB_IsCompressed()
SELECT Hex( XB_Compress( x'0001AB13000000130000000000BA0000CA0000DA0000DB0000DC0000DDCB3C613E3C623E616C7068613C2F623E3C2F613EBC9AB71406DD' ) ); --- 0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DDThe XB_Compress() SQL function will return a new compressed XmlBlob exactly corresponding to the input one.
If the input XmlBlob is already compressed this one is a harmless no-op.
SELECT XB_IsCompressed( x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' ); --- 1As reported by XB_IsCompressed() this second XmlBlob actually is of the compressed kind; anyway it's indisputably bigger than the uncompressed one !!!
Don't be worried: this paradoxical effect can be easily explained.
The corresponding XmlDocument is very short-sized (unrealistically short-sized): under these very peculiar circumstances there is not enough entropy supporting an effective compression, and the net result will have a negative impact (inflation instead of the expected deflation).
But under more realistic conditions (i.e. using an XmlDocument of some KB or more) you can reasonably expect an efficient compression ratio of 50% or even more.
SELECT Hex( XB_Uncompress( x'0003AB130000001B0000000000BA0000CA0000DA0000DB0000DC0000DDCB789CB349B4B349B24BCC29C848B4D14FB2B3D14FB403003BC705D3BC2A1B7A02DD' ) ); --- 0001AB13000000130000000000BA0000CA0000DA0000DB0000DC0000DDCB3C613E3C623E616C7068613C2F623E3C2F613EBC9AB71406DDThe XB_Uncompress() SQL function will return a new uncompressed XmlBlob exactly corresponding to the input one.
If the input XmlBlob already is uncompressed this one is a harmless no-op.
4 - XML: well-formedness and Schema validation
it's now time passing to explore some more serious XML example.SELECT Length( XB_Create( CastToBlob( '<?xml version="1.0" encoding="utf-8"?> <Books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="books.xsd"> <author firstName="Nathaniel" lastName="Hawthorne"> <title>The Scarlet Letter</title> </author> <author firstName="William" middleName="Makepeace" lastName="Thackeray"> <title>Vanity Fair</title> <title>The Luck of Barry Lyndon</title> </author> </Books>' ), 1 ) ); --- 287XB_Create() accepts an optional second argument corresponding to compression=TRUE | FALSE
SELECT Length( XB_Create( CastToBlob( '<?xml version="1.0" encoding="utf-8"?> <Books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="books.xsd"> <author firstName="Nathaniel" lastName="Hawthorne"> <title>The Scarlet Letter</title> </author> <author firstName="William" middleName="Makepeace" lastName="Thackeray"> <title>Vanity Fair</title> <title>The Luck of Barry Lyndon</title> </author> </Books>' ), 0 ) ); --- 455As you can easily notice, this time the compression effectively works as expected, because the document size is a reasonable one.
In both cases XB_Create() was able to create a valid XmlBlob because this XML sample formally respects the well-formedness rules, thus being successfully parsed. So it indisputably is a valid XMLDocument.
Anyway this first check simply is formal, and nothing states about the actual semantic correctness of this XmlDocument.
In order to get a more comprehensive and thorough validity check (i.e. XML Validation) we are required to define an appropriate XML Schema, as the following one:
<?xml version="1.0" encoding="utf-8"?> <xs:schema elementFormDefault="qualified" version="0.1" xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="Books"> <xs:complexType> <xs:sequence> <xs:element name="author" minOccurs="1" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> <xs:element name="title" type="xs:string" minOccurs="1" maxOccurs="unbounded" /> </xs:sequence> <xs:attribute name="firstName" type="xs:string" use="required" /> <xs:attribute name="middleName" type="xs:string" use="optional" /> <xs:attribute name="lastName" type="xs:string" use="required" /> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>An XML Schema simply is yet another XmlDocument, representing the expected data layout of the target XMLDocument to be validated.
In this specific example:
- a root-node <Books> is expected to be declared.
- <Books> is expected to contain a list of <author> items of arbitrary length, but at least one item is required (... minOccurs="1" maxOccurs="unbounded" ...).
Each <author> should declare two mandatory attributes firstName and lastName (... use="required" ...); a third facultative attribute middleName could be eventually used whenever appropriate. - <author> is expected to contain a list of <title> items of arbitrary length (at least one is required).
- All values are requested to be of the Text/String data-type (... type="xs:string" ...).
Now you simply have to cut&paste the above XML Schema, saving it into some file on your local filesystem (usually XmlSchema documents are identified by the .xsd file-name extension).
SELECT XB_Create ( CastToBlob ( '<?xml version="1.0" encoding="utf-8"?> <Books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="books.xsd"> <author firstName="Nathaniel" lastName="Hawthorne"> <title>The Scarlet Letter</title> </author> <author firstName="William" middleName="Makepeace" lastName="Thackeray"> <title>Vanity Fair</title> <title>The Luck of Barry Lyndon</title> </author> </Books>' ), 1, 'C:/sample.xsd' ) ); --- 1XB_Create() accepts a third facultative argument corresponding to the Schema URI; in this specific example the XmlSchema is expected to be found on a file into the local Windows filesystem (C:/sample.xsd).
When a SchemaURI is explicitly specified XB_Create() will perform a full XML Validation against the XmlSchema.
The XB_IsSchemaValidated() SQL function can then be used in order to detect if some XmlBlob has successfully passed a full XML Validation.
SELECT XB_Create ( CastToBlob ( '<?xml version="1.0" encoding="utf-8"?> <Books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="books.xsd"> <author firstName="Nathaniel" lastName="Hawthorne"> <novel>The Scarlet Letter</novel> </author> <author firstName="William" middleName="Makepeace" lastName="Thackeray"> <title lang="en">Vanity Fair</title> <book>The Luck of Barry Lyndon</book> </author> </Books>' ), 1, 'C:/sample.xsd' ); --- NULLWhen a SchemaURI is explicitly defined and the XML Validation fails (for any reason) the XmlDocument will be assumed to be invalid, and a NULL value will be returned by XB_Create()
SELECT XB_GetLastValidateError(); --- Element 'novel': This element is not expected. Expected is ( title ). Element 'title', attribute 'lang': The attribute 'lang' is not allowed. Element 'book': This element is not expected. Expected is ( title ).You can then call XB_GetLastValidateError() in order to get a meaningful message explaining why the most recent call to XB_Create() failed.
5 - The internal XML Schema cache
Performing XML validation is a reasonably fast operation; anyway downloading from the web some very complex XML Schema definition sometimes could require a noticeable time length. (and this usually is the case of e.g. Metadata Schemata)SpatiaLite silently handles an internal cache of XML Schemata; once a Schema is successfully downloaded from the web it's immediately stored into this cache.
So any following reference to the same identical Schema URI will be serviced using the cached Schema, thus avoiding any further network access and allowing a noticeably faster processing.
SELECT XB_CacheFlush();();The XB_CacheFlush() SQL function immediately resets such internal cache to its initial empty state.
Please note: the internal cache is strictly connection-related; each single connection has its own internal cache.
6 - Other useful misc SQL functions supporting XmlBlob
SELECT XB_GetSchemaURI( XmlBlob-value ); --- C:/sample.xsdThe XB_GetSchemaURI return the SchemaURI initially used to perform XML Validation (if any).
SELECT XB_GetInternalSchemaURI( CastToBlob ( '<?xml version="1.0" encoding="utf-8"?> <Books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.gaia-gis.it/books.xsd"> <author firstName="Nathaniel" lastName="Hawthorne"> <title>The Scarlet Letter</title> </author> </Books>' ) ); --- http://www.gaia-gis.it/books.xsdPlease don't be confused: the XB_GetInternalSchemaURI() SQL function performs a completely different task:
- it accepts on input an XmlDocument, not an XmlBlob.
- the returned SchemaURI (if any) is the one internally declared within the XmlDocument itself.
Please check the XML declarations xsi:noNamespaceSchemaLocation and/or xsi:schemaLocation
SELECT XB_GetEncoding( XmlBlob-value ); --- utf-8The XB_GetEncoding() SQL function will return the character encoding used by the XmlDocument stored within the XmlBlob.
Please note: NULL could be eventually returned, because some XmlDocuments has no explicit encoding declaration. In this case UTF-8 will be implicitly assumed as a silent default.
We've already encountered the XB_GetPayload() and XB_GetDocument() SQL functions. They are very closely related, but behaves quite differently under an important aspect:
- the XmlDocument returned by XB_GetDocument() will always be UTF-8 encoded.
- the XmlDocument returned by XB_GetPayload() will always be encoded into its original unaffected encoding.
This has many interesting further consequences:
- an XmlDocument returned by XB_GetDocument() surely is a valid TEXT string: you can safely process it in ant possible way using using any SpatiaLite tool (UTF-8 being the default encoding adopted by SpatiaLite).
- an XmlDocument returned by XB_GetPayload() is assumed to be a BLOB value; attempting to directly visualize or print such BLOB could be very difficult if it was e.g. UTF-16 encoded.
- anyway while re-exporting yet again the original XmlDocument from its corresponding XmlBlob representation strictly requires using XB_GetPayload(), so to carefully avoid introducing any possible discrepancy with the encoding internally declared by the XmlDocument itself.
SELECT XB_SchemaValidate( XmlBlob-value, SchemaURI ); --- a possibly SchemaValidated XmlBlobWe've already seen as XB_Create() directly supports XML Validation.
Anyway a two-steps approach is possible as well, by calling XB_Create() first (no Schema Validation), and then calling XB_SchemaValidate() in a second time so to perform a deferred XML Validation.
back to XmlBlob main page