readosm
1.1.0
There are two distinct formats used to ship OSM datasets: both contain exactly the same information, but their internal layouts are radically different.
OSM files based on the XML notation are widely used: usually they are identified by the .osm suffix.
XML is notoriously verbose and usually requires lots of storage space; fortunately, XML is highly compressible.
For this reason, the most commonly found OSM files are identified by the .osm.bz2 suffix: this simply means that the .osm (XML) file has been compressed using bzip2. Actually processing a .osm.bz2 OSM file always requires a two-step approach: first the file has to be decompressed so as to recover the original XML, and then the XML has to be parsed.
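The two steps can be sketched in a few lines of Python. This is only an illustration built on the Python standard library, not ReadOSM's own API; the sample data and the function name count_osm_objects are made up for the example.

```python
import bz2
import io
import xml.etree.ElementTree as ET

# A tiny, made-up OSM XML document standing in for a real .osm file.
SAMPLE_OSM = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <node id="1" lat="43.0" lon="11.0"/>
  <node id="2" lat="43.1" lon="11.1"/>
  <way id="10"><nd ref="1"/><nd ref="2"/></way>
</osm>
"""


def count_osm_objects(compressed: bytes) -> dict:
    """Count nodes, ways and relations in bzip2-compressed OSM XML."""
    counts = {"node": 0, "way": 0, "relation": 0}
    # Step 1: undo the bzip2 compression (BZ2File decompresses on demand
    # as the XML parser reads from it).
    with bz2.BZ2File(io.BytesIO(compressed)) as xml_stream:
        # Step 2: parse the recovered XML, one element at a time.
        for _event, elem in ET.iterparse(xml_stream, events=("end",)):
            if elem.tag in counts:
                counts[elem.tag] += 1
                elem.clear()  # drop attributes and children once counted
    return counts


compressed = bz2.compress(SAMPLE_OSM)  # stands in for a .osm.bz2 file
print(count_osm_objects(compressed))
```

With a real file, the in-memory buffer would be replaced by the path of the .osm.bz2 file passed to bz2.BZ2File.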
An alternative OSM file format is based on Google's Protocol Buffer encoding [https://developers.google.com/protocol-buffers/docs/encoding].
This OSM format is based on a public and documented specification: [http://wiki.openstreetmap.org/wiki/PBF_Format].
OSM files based on Protocol Buffer encoding are usually identified by the .pbf suffix.
The main benefit of .pbf files is that they are much more compact (smaller size) than the corresponding .osm.bz2 files, and that they can be parsed immediately, with no preliminary decompression step.
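One ingredient of the Protocol Buffer wire encoding referenced above is the "varint": an integer is stored in 7-bit groups, least significant group first, with the high bit of each byte signalling that more bytes follow, so small values take a single byte. The sketch below illustrates that encoding only; it is neither a PBF reader nor part of ReadOSM, and the function names are chosen for the example.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)         # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next_position) for the varint starting at data[pos]."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]  # raises IndexError if the input is truncated
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


print(encode_varint(300).hex())    # ac02
print(decode_varint(b"\xac\x02"))  # (300, 2)
print(len(encode_varint(5)))       # 1
```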
ReadOSM is intended to parse both OSM formats transparently. There is no need to take care of any internal low-level detail, because the library itself silently handles every required step. The simple abstract interface implemented by ReadOSM is meant to let reader applications consume OSM input files as painlessly as possible, and all this requires only a very limited memory footprint.