Post by Generic Usenet Account
Hello,
Are there any tools/W3C standards/design patterns etc. for linearizing
XML content? Basically I want to send information, which is natively
in XML, to a resource constrained device that does not have XML
awareness. In other words, the resource constrained device does not
do any DOM or SAX processing of XML.
depends on what exactly you are wanting...
if a library:
one option is to use (or write) an XML library, but depending on memory
resources, this may be too memory-hungry (for example, a lot of XML as
DOM nodes will eat up a large chunk of memory even on desktop PCs).
if one adjusts the implementation to their needs, they can do a DOM-like
implementation which needs a lot less memory than standard DOM (if one
omits namespaces and doubly-linked structures, and uses ASCII or UTF-8
rather than UTF-16, a fair bit can be saved).
SAX could be better, as it can allow a small implementation which does
not require in-memory storage.
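
to illustrate the SAX-style approach, here is a minimal sketch using Python's standard xml.sax module. the names LinearizingHandler and linearize are made up for this example, and the tuple layout of the events is just an assumption about what a "linear" form could look like, not any standard. the point is only that the parser hands over one event at a time, so no tree needs to be held in memory:

```python
import xml.sax


class LinearizingHandler(xml.sax.ContentHandler):
    """Collects SAX callbacks as a flat list of event tuples (hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Copy the attributes into a plain dict so the event is self-contained.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Skip whitespace-only runs; note that a parser may split one text
        # node across several characters() calls.
        text = content.strip()
        if text:
            self.events.append(("text", text))


def linearize(xml_bytes):
    """Parse XML bytes and return the flat event list."""
    handler = LinearizingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


if __name__ == "__main__":
    for event in linearize(b"<a x='1'><b>hi</b></a>"):
        print(event)
```

a real sender would write each event out as it arrives rather than collecting a list; the list is only there to keep the example easy to inspect.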
if a binary interchange:
well, WBXML could work.
http://en.wikipedia.org/wiki/WBXML
there is EXI, but EXI looks likely to require a more complex
implementation (but is entropy/huffman coded so could save some bytes).
http://en.wikipedia.org/wiki/Efficient_XML_Interchange
also maybe relevant:
http://msdn.microsoft.com/en-us/library/cc219210%28PROT.10%29.aspx
for my uses, I rolled my own format (which I call SBXE) which is
structurally vaguely similar to WBXML, but in general is more compact in
my tests (for generic/schema-free operation, which is my main use-case),
and is simpler and faster to decode than textual XML. its main
difference from WBXML is that tags/strings are defined inline and go
into MRU lists; a tag/string that is already in the list is then
referenced by its MRU index (a variant of "move to front" was used).
it also responds favorably to deflate.
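
as a rough illustration of the inline-definition plus MRU-index idea, here is a sketch of plain move-to-front coding over a sequence of tag names. this is not the actual SBXE scheme (which uses its own variant and byte layout); the function names, the ("def", ...)/("ref", ...) token shapes, and the max_size limit are all assumptions made for the example:

```python
def mtf_encode(strings, max_size=64):
    """Turn a sequence of strings into 'def' (first use) and 'ref' (MRU index) tokens."""
    mru = []   # most recently used string is at index 0
    out = []
    for s in strings:
        if s in mru:
            i = mru.index(s)
            out.append(("ref", i))   # already known: send only its index
            mru.pop(i)
        else:
            out.append(("def", s))   # first use: send the string inline
            if len(mru) >= max_size:
                mru.pop()            # drop the least recently used entry
        mru.insert(0, s)             # move (or add) to the front
    return out


def mtf_decode(tokens, max_size=64):
    """Inverse of mtf_encode; keeps its MRU list in lockstep with the encoder."""
    mru = []
    out = []
    for kind, value in tokens:
        if kind == "ref":
            s = mru.pop(value)
        else:
            s = value
            if len(mru) >= max_size:
                mru.pop()
        mru.insert(0, s)
        out.append(s)
    return out


if __name__ == "__main__":
    tags = ["item", "name", "item", "item", "name"]
    tokens = mtf_encode(tags)
    print(tokens)
    print(mtf_decode(tokens) == tags)
```

since frequently repeated tags end up with small indices, the output tends to contain many small, repetitive numbers, which fits with the observation that such a stream compresses well under deflate.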
some info (if server stays up...):
http://cr88192.dyndns.org/2010-10-27_SBXE11.txt
it was first defined/implemented around 2005, but I forgot about it for
several years due to not having much use for it at the time.
I designed a new variant which could be (potentially) more compact, but
the improvement was likely modest and not worth the hassle of having to
re-implement it.
looking, there are a few holes in the spec...
the UVLI (unsigned variable-length integer) scheme is like this:
0..127: 0xxxxxxx
128..16383: 10xxxxxx xxxxxxxx
16384.. ...: 110xxxxx xxxxxxxx xxxxxxxx
...
note: high-bits/bytes come first.
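
a minimal sketch of the three forms listed above, with the prefix bits in the first byte and the high-order bytes first. the function names are made up for this example, and since the longer forms are elided in the list, the sketch simply rejects anything that does not fit in 21 bits rather than guessing at how the pattern continues:

```python
def uvli_encode(value):
    """Encode a non-negative integer using the 1-, 2- or 3-byte form."""
    if value < 0:
        raise ValueError("UVLI is unsigned")
    if value < 0x80:                       # 0xxxxxxx (7 bits)
        return bytes([value])
    if value < 0x4000:                     # 10xxxxxx xxxxxxxx (14 bits)
        return bytes([0x80 | (value >> 8), value & 0xFF])
    if value < 0x200000:                   # 110xxxxx xxxxxxxx xxxxxxxx (21 bits)
        return bytes([0xC0 | (value >> 16), (value >> 8) & 0xFF, value & 0xFF])
    raise ValueError("longer forms are not covered by this sketch")


def uvli_decode(data, pos=0):
    """Decode one UVLI starting at data[pos]; return (value, next_pos)."""
    first = data[pos]
    if first & 0x80 == 0x00:
        return first, pos + 1
    if first & 0xC0 == 0x80:
        return ((first & 0x3F) << 8) | data[pos + 1], pos + 2
    if first & 0xE0 == 0xC0:
        value = ((first & 0x1F) << 16) | (data[pos + 1] << 8) | data[pos + 2]
        return value, pos + 3
    raise ValueError("longer forms are not covered by this sketch")


if __name__ == "__main__":
    for n in (0, 127, 128, 16383, 16384):
        encoded = uvli_encode(n)
        print(n, encoded.hex(), uvli_decode(encoded))
```

the decoder only needs to look at the leading bits of the first byte to know how many bytes follow, which keeps it cheap on a small device.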
for the signed variant (VLI), the sign is folded into the LSB, so the
unsigned codes 0, 1, 2, 3, ... map to:
0, -1, 1, -2, 2, -3, 3, ...
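
the folding order above can be sketched as a pair of helper functions (the names fold_sign and unfold_sign are made up for this example). an odd code means a negative value, an even code a non-negative one, and the folded result would then be written out as an ordinary unsigned variable-length integer:

```python
def fold_sign(value):
    """Map 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... (sign ends up in the LSB)."""
    if value >= 0:
        return value * 2
    return -value * 2 - 1


def unfold_sign(code):
    """Inverse of fold_sign: odd codes are negative, even codes non-negative."""
    if code & 1:
        return -((code + 1) >> 1)
    return code >> 1


if __name__ == "__main__":
    print([unfold_sign(c) for c in range(7)])
    print([fold_sign(v) for v in (0, -1, 1, -2, 2, -3, 3)])
```

the benefit is that small negative numbers stay small after folding, so they still fit in the short 1-byte form.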