Back to main SpatiaLite-Tools Wiki page
spatialite_xml2utf8
All spatialite-tools that import data from XML files are based on the very popular expat XML parser. As practical experience has shown, this parser isn't directly able to correctly handle XML documents that use a character set other than UTF-8: attempting to import a non-UTF-8 encoded XML file may produce "invalid character" errors. The new spatialite_xml2utf8 CLI tool is specifically intended to work around this limitation in the easiest way.
a practical example
Suppose an XML document starts with a first line like:
<?xml version="1.0" encoding="windows-1252" ?>
...
or
<?xml version="1.0" encoding="ISO-8859-1" ?>
...
Such an XML file clearly adopts a character set other than UTF-8, and could easily cause the parser to complain about "invalid character" errors:
- windows-1252 (aka CP1252) corresponds to Windows Latin-1
- ISO-8859-1 corresponds to Latin-1 West European
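Before converting, it can help to check which encoding a file actually declares. The following is a minimal Python sketch of that check; it is not part of spatialite-tools, and the helper names `declared_encoding` and `needs_conversion` are hypothetical. It only inspects the XML declaration and assumes the declaration sits at the very start of the data, written in ASCII-compatible bytes.

```python
import re

# Hypothetical helpers (not part of spatialite-tools).
# Matches an XML declaration at the very start of the data and captures
# the value of its encoding attribute, e.g. "windows-1252".
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)

def declared_encoding(head: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    match = _DECL.match(head)
    return match.group(1).decode("ascii") if match else None

def needs_conversion(head: bytes) -> bool:
    """True when the declaration names an encoding other than UTF-8."""
    enc = declared_encoding(head)
    return enc is not None and enc.lower() not in ("utf-8", "utf8")

first_line = b'<?xml version="1.0" encoding="windows-1252" ?>'
print(declared_encoding(first_line), needs_conversion(first_line))
```

In practice the first line of the file could be passed in, for example by opening it in binary mode and calling `readline()`.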
Resolving this issue is simple: just run spatialite_xml2utf8 to get a copy of the XML file correctly encoded as UTF-8:
spatialite_xml2utf8 CP1252 <old.xml >new.xml
- there is just a single argument to be passed, specifying the input character set. You can check the full list of supported encodings with their canonical names from here.
- the original XML file is read from the standard input.
- the corresponding UTF-8 XML file is written to the standard output.
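The stdin-to-stdout conversion described above can be pictured with a short Python sketch. This is only an illustration of the idea, not the tool's actual implementation: the function name `transcode_stream` is hypothetical, Python's codec name `cp1252` stands in for the charset argument, and the sketch leaves the XML declaration itself untouched.

```python
import io

def transcode_stream(src, dst, charset: str) -> None:
    """Read all bytes from src, decode them as `charset`, write UTF-8 to dst.

    src and dst are binary file-like objects. Decoding raises
    UnicodeDecodeError if the input contains bytes undefined in `charset`.
    """
    text = src.read().decode(charset)
    dst.write(text.encode("utf-8"))

# Demo with in-memory streams: 0xE9 is the single CP1252 byte for "é".
src = io.BytesIO(b"<name>caf\xe9</name>")
dst = io.BytesIO()
transcode_stream(src, dst, "cp1252")
converted = dst.getvalue()
print(converted)  # the "é" is now the two-byte UTF-8 sequence 0xC3 0xA9
```

To mimic the command-line usage with real streams, the same function could be called as `transcode_stream(sys.stdin.buffer, sys.stdout.buffer, "cp1252")` after importing `sys`.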