Kindle, EPUB, And Amazon’s Love Of Reinventing Wheels

 2 years ago
source link: https://hackaday.com/2022/05/17/kindle-epub-and-amazons-love-of-reinventing-wheels/

Last month, a post from the relatively obscure Good e-Reader claimed that Amazon would finally allow the Kindle to read EPUB files. The story was picked up by all the major tech sites, and for a time, there was much rejoicing. After all, it was a feature that owners have been asking for since the Kindle was first released in 2007. But rather than supporting the open eBook format, Amazon had always insisted on coming up with their own proprietary formats to use on their readers. Accordingly, many users have turned to third-party programs which can reliably convert their personal libraries over to whatever Amazon format their particular Kindle is most compatible with.

Native support for EPUB would make using the Kindle a lot less of a hassle for many folks, but alas, it was not to be. It wasn’t long before the original post was updated to clarify that Amazon had simply added support for EPUB to their Send to Kindle service. Granted, this is still an improvement, as it represents a relatively low-effort way to get the open format files on your personal device; but in sending the files through the service they would be converted to Amazon’s KF8/AZW3 format, the result of which may not always be what you expected. At the same time, the Send to Kindle documentation noted that support for AZW and MOBI files would be removed later this year, as the older formats weren’t compatible with all the features of the latest Kindle models.

If you think this is a lot of unnecessary confusion just to get plain-text files to display on the world’s most popular ereader, you aren’t alone. Users shouldn’t have to wade through an alphabet soup of oddball file formats when there’s already an accepted industry standard in EPUB. But given that it’s the reality when using one of Amazon’s readers, this seems as good a time as any for a brief rundown of the different ebook formats, and a look at how we got into this mess in the first place.

EPUB

The history of the EPUB format can be traced back to 1999, with the version 1.0 release of the Open eBook Publication Structure (OEBPS). Used by some of the very first dedicated electronic readers from the likes of Sony and Intel, it essentially consisted of a manifested ZIP archive that contained pages written in a form of XHTML, with CSS used for styling. OEBPS went through several revisions over the years, and in 2007 it became the official technical standard of the International Digital Publishing Forum (IDPF). At that point it was renamed to EPUB, short for Electronic Publication.

EPUB continued to evolve over the years, and in 2016 the IDPF merged with the World Wide Web Consortium (W3C) in an attempt to bring the publishing industry in line with the latest in web development. The current version of the EPUB format (3.2) was released in May of 2019, and offers features such as the ability for Internet-connected devices to load fonts and other content from outside the container file itself.

While the 3.x branch has introduced some fairly large changes in the core format to better handle multimedia content, EPUB can still ultimately be thought of as a relatively simple web page contained in a ZIP file. As they are exceptionally easy to parse and render, you can find EPUB reader applications on even very low-end devices.
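
To make the "web page in a ZIP file" idea concrete, here is a minimal Python sketch that opens an EPUB with nothing but the standard library, follows META-INF/container.xml to the package document, and lists the files named in its manifest. The function names (`describe_epub`, `build_sample_epub`) and the tiny sample book are made up for illustration; the sample is deliberately stripped down and would not pass an EPUB validator.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the container file and the package document.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def describe_epub(source):
    """Return (mimetype, package path, manifest hrefs) for an EPUB.

    `source` may be a filesystem path or a file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        mimetype = zf.read("mimetype").decode("ascii").strip()
        # container.xml points at the package document (the "manifest").
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        package = ET.fromstring(zf.read(opf_path))
        items = package.findall("opf:manifest/opf:item", OPF_NS)
        return mimetype, opf_path, [item.get("href") for item in items]


def build_sample_epub():
    """Build a tiny, illustrative (not spec-complete) EPUB in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr(
            "META-INF/container.xml",
            '<container version="1.0" '
            'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
            "<rootfiles><rootfile full-path=\"OEBPS/content.opf\" "
            'media-type="application/oebps-package+xml"/></rootfiles>'
            "</container>",
        )
        zf.writestr(
            "OEBPS/content.opf",
            '<package xmlns="http://www.idpf.org/2007/opf" version="3.0">'
            '<manifest><item id="ch1" href="chapter1.xhtml" '
            'media-type="application/xhtml+xml"/></manifest></package>',
        )
        zf.writestr("OEBPS/chapter1.xhtml", "<html><body>Hi</body></html>")
    return buf.getvalue()


if __name__ == "__main__":
    print(describe_epub(io.BytesIO(build_sample_epub())))
```

Pointing `describe_epub` at a real DRM-free EPUB on disk should work the same way, since it only relies on the container layout described above.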

It’s also worth noting that, while the EPUB format does allow for Digital Rights Management (DRM), it is not part of the standard. That means if a vendor wants to implement DRM in EPUB, they have to figure out how to do it themselves. In theory this could lead to incompatibility issues between vendor-specific solutions, but in practice, most people who are using EPUBs are doing so specifically because they are DRM-free.

MOBI/AZW

Even older than EPUB, MOBI has its origins in the PalmDOC format from 1996. Originally conceived as a way of storing large text files on the Palm Pilot, the format offered little in the way of formatting outside the ability to mark the start and end points of paragraphs. It did however offer basic bookmarking capability, which in some cases was used to offer a rudimentary table of contents. Being that PalmDOC was a variation of the standard “Palm Database” file, it also featured the ability to store various bits of metadata in a standardized header, such as the author name, book title, and current reading position.
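
As a rough illustration of that standardized header, the sketch below unpacks the fixed 78-byte header that starts a Palm Database file. The field layout is an assumption taken from community documentation of the Palm Database format rather than from anything in this article, and `parse_pdb_header` is a hypothetical helper name. The type/creator pair is what distinguishes the variants: classic PalmDOC files are commonly reported as `TEXt`/`REAd`, while MOBI files use `BOOK`/`MOBI`.

```python
import struct

# Big-endian layout of the 78-byte Palm Database header (assumed layout):
# name, attributes, version, three timestamps, modification number,
# app-info offset, sort-info offset, type, creator, unique ID seed,
# next record list ID, record count.
PDB_HEADER = struct.Struct(">32sHHIIIIII4s4sIIH")


def parse_pdb_header(data: bytes) -> dict:
    """Extract a few human-readable fields from a Palm Database header."""
    if len(data) < PDB_HEADER.size:
        raise ValueError("file too short to contain a Palm Database header")
    fields = PDB_HEADER.unpack_from(data, 0)
    raw_name, db_type, creator, num_records = (
        fields[0], fields[9], fields[10], fields[13],
    )
    return {
        # The name is NUL-padded to 32 bytes.
        "name": raw_name.split(b"\x00", 1)[0].decode("latin-1"),
        "type": db_type.decode("ascii", "replace"),
        "creator": creator.decode("ascii", "replace"),
        "records": num_records,
    }


if __name__ == "__main__":
    # Synthetic header for demonstration; a real file would be read from disk.
    sample = PDB_HEADER.pack(
        b"My Book", 0, 0, 0, 0, 0, 0, 0, 0, b"BOOK", b"MOBI", 0, 0, 2
    )
    print(parse_pdb_header(sample))
```

Book-level metadata such as the author lives in later records and format-specific headers, which this sketch does not attempt to read.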

Figure: MobiPocket Reader on Palm OS

While suitable enough for the low-resolution displays of the early Palm Pilots, the lack of any real formatting support in PalmDOC became a liability as the hardware improved. In 2000 MobiPocket, developers of ebook reader applications on Palm, Symbian, and later BlackBerry devices, decided to take matters into their own hands and expand PalmDOC. They added an HTML-like markup language, improved support for images, and as it was an open format, even borrowed a bit from OEBPS. Since they didn’t have the authority to call it an update to the original PalmDOC, they dubbed their creation MOBI.

The story might have stopped here if it wasn’t for the fact that in 2005, Amazon purchased MobiPocket, and in turn the rights to MOBI. But rather than use the format as-is for the Kindle, they added a new DRM scheme and cranked the format’s LZ77 compression to the maximum. As the first-gen Kindle only offered a relatively meager 250 MB of onboard storage and was limited to downloading new titles over a 3G cellular connection, they wanted to shave off as many bytes as possible.
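
For a feel of what LZ77-style compression looks like in this family of formats, here is a small decoder for the byte-oriented scheme that community documentation attributes to PalmDOC. Treat the byte ranges as an assumption drawn from that documentation, not from Amazon; the sketch also says nothing about whatever stronger compression mode Amazon actually shipped. The function name `palmdoc_decompress` is hypothetical.

```python
def palmdoc_decompress(data: bytes) -> bytes:
    """Decode one record compressed with the PalmDOC LZ77 variant."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if b == 0x00 or 0x09 <= b <= 0x7F:
            # Plain literal byte.
            out.append(b)
        elif 0x01 <= b <= 0x08:
            # The next `b` bytes are copied through untouched.
            out += data[i:i + b]
            i += b
        elif 0x80 <= b <= 0xBF:
            # Two-byte back-reference: 11 bits of distance, 3 bits of length.
            pair = (b << 8) | data[i]
            i += 1
            distance = (pair >> 3) & 0x07FF
            length = (pair & 0x07) + 3
            if distance == 0 or distance > len(out):
                raise ValueError("back-reference points outside the output")
            # Copy byte by byte so overlapping references repeat correctly.
            for _ in range(length):
                out.append(out[-distance])
        else:
            # 0xC0..0xFF: shorthand for a space followed by one character.
            out.append(0x20)
            out.append(b ^ 0x80)
    return bytes(out)


if __name__ == "__main__":
    # "abc", then a back-reference meaning "copy 3 bytes from 3 bytes ago".
    print(palmdoc_decompress(b"abc\x80\x18"))
```

The back-reference is the LZ77 part: repeated phrases are replaced by a short pointer into text that has already been decoded, which is why prose compresses so well.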

This tweaked version of MOBI, which became the standard format for Amazon’s ebook empire, was dubbed AZW. From here on out Amazon essentially starts using AZW as a blanket term for their ebook containers, and the actual formats underneath start getting a bit blurry. In the early days, it was possible to come across other similarly named file types:

.azw1 / .tpz: Known officially as Topaz, this proprietary Amazon format has little relation to MOBI/AZW beyond a shared DRM scheme and similar metadata header. In addition to supporting larger images compared to the earlier formats, it was unique in that each title could include its own fonts and glyphs rather than relying on what was built into the Kindle itself. This made it well suited for old books or non-English works, as it could better retain the original text and style.

.azw2: This actually isn’t an ebook format at all, so don’t be surprised if you’ve never run across one. Rather, this is a container file for executable Kindle applications and games.

KF8/AZW3

With the release of the first Kindle Fire tablet in 2011, Amazon needed a new format that could handle multimedia content. The answer was KF8, which is essentially a combination of EPUB and MOBI. In fact, it specifically picks up some of the EPUB 3.x features such as support for HTML5 and CSS3. New support for both fixed-layout pages and SVG images makes this format well suited for comic books, which was a big selling point for the large color display of the Kindle Fire.

Rather than maintaining two different file formats, Amazon decided to move all of their readers over to AZW3 and make it the new standard for the marketplace. While the electronic paper Kindles may not necessarily benefit from the features offered by the new format, all of them beyond the first and second generation are able to read them thanks to redundant MOBI header information which is kept specifically for backwards compatibility.

KFX/AZW8

With the release of the Kindle Paperwhite 3 in 2015, Amazon rolled out their latest format, KFX. Technical information about KFX is a bit hard to come by, as it appears Amazon developed it in-house to be their “ultimate” book format. Some of the new improvements include an enhanced typesetting engine, additional fonts, and support for JPEG XR images. It also rolls in support for video and interactivity, theoretically allowing the same format to be used for both books and software applications.

But perhaps the most obvious change was the enhanced DRM, which has caused plenty of headaches for users who wish to read Amazon-purchased ebooks on other devices. At this point the format and DRM are understood well enough that they can be handled by third-party software, but it takes additional steps and intermediary tools that aren’t required for AZW3 content.

It’s generally recommended that anyone who wishes to maintain their own local library of ebook files should avoid this format altogether — though as more and more of Amazon’s library switches over, that may mean you need to purchase your books elsewhere.

Alexandria On Your Hard Drive

If all you ever do is read Amazon-purchased books on your Kindle, then you’ve probably never had to worry about any of this. To their credit, Amazon has largely perfected the experience of buying and consuming electronic books; there is, after all, a reason the Kindle has become the de facto ereader. All this technical shuffling about is hidden from view, and for the most part, you just tap the book you want to read and get on with your life.

But for those of us who want to source our books from multiple marketplaces, keep an offline copy of our purchased books, or read our Amazon books on a non-Amazon reader, things can get a bit messy. The best advice I can give you, if you’ve managed to get this far without hearing it already, is to grab a copy of Kovid Goyal’s phenomenal Calibre.

This cross-platform GPLv3 program lets you build a format-agnostic virtual library that lives on your local computer, and seamlessly performs device-specific file conversion when uploading to your reader. It might not be quite as easy as spending your days in Amazon’s walled garden, but for users who demand a bit more control over their digital content, it’s a price worth paying.

