Thursday, December 13th, 2007

Kindle, Mobipocket and file format standards


I wondered what the details of the Kindle’s main file format, AZW, were. It’s well known that it’s based on the common .mobi format, which has been used by MobiPocket for eBooks for some time. But where is that format documented? Nowhere public, it seems.

There are enough interested parties to be working it out, though, and the news is pretty bad, according to this forum post.

It’s a PRC: an archive format for the Palm Pilot that’s not officially documented, though reverse-engineered specifications are floating around the Net.
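To make that concrete, here is a minimal Python sketch of reading the fixed header at the start of a PRC/PDB file. The 78-byte layout and the "BOOK"/"MOBI" type and creator codes come from those reverse-engineered descriptions, not from any official specification, so treat them as assumptions; the function names are mine, purely for illustration.

```python
import struct

# Palm database header as described in reverse-engineered notes (assumed layout):
# 32-byte name, attributes, version, three timestamps, modification number,
# appInfo offset, sortInfo offset, 4-byte type, 4-byte creator,
# unique ID seed, next record list ID, record count. All big-endian.
PDB_HEADER = struct.Struct(">32sHHIIIIII4s4sIIH")


def parse_pdb_header(data):
    """Return a few header fields from the first 78 bytes of a PRC/PDB file."""
    if len(data) < PDB_HEADER.size:
        raise ValueError("too short to contain a Palm database header")
    fields = PDB_HEADER.unpack_from(data)
    return {
        # The name is NUL-padded to 32 bytes.
        "name": fields[0].split(b"\0", 1)[0].decode("latin-1"),
        "type": fields[9].decode("ascii", "replace"),
        "creator": fields[10].decode("ascii", "replace"),
        "num_records": fields[13],
    }


def looks_like_mobipocket(header):
    """Mobipocket books are reported to use type 'BOOK' and creator 'MOBI'."""
    return header["type"] == "BOOK" and header["creator"] == "MOBI"
```

Feeding it the first 78 bytes of a file (for example `open(path, "rb").read(78)`) should be enough to tell whether the container at least claims to be a Mobipocket book.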

Inside that is HTML. It isn’t clear which version of HTML, but it’s old, and it’s a fair bet that many Mobi readers, including the one in the Kindle, would not parse modern HTML correctly. Reports indicate that it doesn’t do a terrific job with Unicode, and that’s borne out by the Russian MobiPocket books I’ve tried and failed to view in Windows.

It seems a terrible shame for the Kindle to be hamstrung by a messy file format, less designed than cobbled together by accumulated accidents of history. I nearly used the word ‘evolved’, but that would imply a drift towards fitness.

Pragmatically speaking, it doesn’t matter one jot — if you want to read ASCII text on a Kindle, you can. If you want to generate .mobi files, there are various ways to do it (using a gratis downloadable Windows binary; using the Amazon email system). But dammit, it’s ugly.

There are better eBook formats emerging — for example ePub, which is little more than HTML, images, some meta-information, all wrapped up in a ZIP archive. I suppose the timing wasn’t quite right, and Amazon had to go with something established, with plenty of content already available. If you have the rights and the knowledge of the formats, it should be simple to convert between formats like these. Specifically, tidying old HTML into modern XHTML is a solved problem.
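As a rough illustration of how little there is to an ePub container, the following Python sketch opens one with the standard zipfile module and reports what it finds. It assumes the usual ePub conventions of a `mimetype` entry and a `META-INF/container.xml` file; the function name `describe_epub` is hypothetical.

```python
import zipfile


def describe_epub(source):
    """Summarise an ePub archive given a path or a binary file-like object."""
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        mimetype = None
        if "mimetype" in names:
            # By convention this entry holds the string "application/epub+zip".
            mimetype = archive.read("mimetype").decode("ascii", "replace").strip()
        return {
            "mimetype": mimetype,
            # container.xml points at the package file holding the metadata.
            "has_container": "META-INF/container.xml" in names,
            # The book content itself is ordinary (X)HTML.
            "documents": [
                n for n in names if n.lower().endswith((".xhtml", ".html", ".htm"))
            ],
        }
```

Nothing beyond a ZIP reader is needed to get at the text, which is rather the point.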

So let’s hope the Kindle can receive firmware upgrades, and let’s hope they migrate the service to a variant of ePub when it becomes possible. They can bung some DRM in there if they like — let the market stamp out that particular evil. Just do it using good open standards.

2 Responses to “Kindle, Mobipocket and file format standards”

  1. David Hayes Says:

    Seems the DRM has already been hacked
    http://gizmodo.com/gadgets/hacks/kindle-drm-hacked-that-was-easy-333415.php
    You have to wonder why companies bother with DRM.

  2. John Says:

    Sort of. All he’s done is notice that Mobipocket DRM is the same as Amazon DRM, and so demonstrated how you can persuade Mobi vendors to encrypt files with your Kindle’s credentials. It just allows you to buy DRM’d content from more sources.

    For the DRM to be properly cracked, one would need to be able to buy DRM content, and extract it so that it could be read on any device. I’m sure that will happen sooner or later.
