Building electronic books using ANT

It’s been a while now since I wrote about my first encounter with electronic literature. Since then a lot has happened in the e-book world, and I’ve been bitten by the bug. Like many others, I now read most of my books, newspapers and magazines on a Kindle, an iPad or one of the other reading devices. So I thought it only appropriate to use my knowledge to help create such publications from a programmer’s perspective. Several tools are already available, but I could not find any that can be used as part of a continuous integration build.

The basic idea is described in Eclipse bug 332122. Wiki markup converted to HTML (using Mylyn WikiText), or pre-existing HTML files, are processed by an ANT script, and an EPUB file is created as the result. I initially thought it would be a good idea to use Xtext to create a DSL for describing the publication. I’m not so sure about that any more, as it would add some overhead and is not strictly required. Maybe at a later stage.

There are several e-book formats available, but I chose EPUB (version 2.0.1) as it is open, free and based on existing standards such as HTML and CSS. It also appears to be quite popular, although it’s not supported on the Amazon Kindle, which is currently the most popular reading device.
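Because EPUB 2.0.1 builds on existing standards, the file itself is an ordinary ZIP archive (the OCF container) with two fixed parts: a `mimetype` entry that must come first and be stored uncompressed, and `META-INF/container.xml`, which points to the OPF package document. The sketch below writes just those two parts using only `java.util.zip`. The class name `EpubContainer`, the method `writeContainerParts` and the path `OEBPS/content.opf` are assumptions for illustration, not part of the tool described in this post.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class EpubContainer {

    static final String MIMETYPE = "application/epub+zip";

    // Points reading systems at the OPF package document (path is an assumption).
    static final String CONTAINER_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<container version=\"1.0\" xmlns=\"urn:oasis:names:tc:opendocument:xmlns:container\">\n"
        + "  <rootfiles>\n"
        + "    <rootfile full-path=\"OEBPS/content.opf\""
        + " media-type=\"application/oebps-package+xml\"/>\n"
        + "  </rootfiles>\n"
        + "</container>\n";

    /** Writes the fixed OCF entries; the caller adds OPF, NCX and XHTML afterwards. */
    static void writeContainerParts(ZipOutputStream zip) throws IOException {
        byte[] mime = MIMETYPE.getBytes(StandardCharsets.US_ASCII);

        // "mimetype" must be the first entry and must not be compressed.
        // A STORED entry needs its size and CRC-32 before it is written.
        ZipEntry mimeEntry = new ZipEntry("mimetype");
        mimeEntry.setMethod(ZipEntry.STORED);
        mimeEntry.setSize(mime.length);
        CRC32 crc = new CRC32();
        crc.update(mime);
        mimeEntry.setCrc(crc.getValue());
        zip.putNextEntry(mimeEntry);
        zip.write(mime);
        zip.closeEntry();

        // Remaining entries use the stream's default (deflated) method.
        zip.putNextEntry(new ZipEntry("META-INF/container.xml"));
        zip.write(CONTAINER_XML.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            writeContainerParts(zip);
            // The OPF, the NCX and the XHTML content would be added here.
        }
        System.out.println("Wrote " + buf.size() + " bytes");
    }
}
```

The method takes an open `ZipOutputStream` rather than closing it, so that a complete writer could append the package document, the NCX and the content files as ordinary deflated entries in the same archive.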

[Screenshots: iBooks on iPad]

So far I have Ecore models representing the Open Packaging Format (OPF), a subset of Dublin Core and the Navigation Control File (NCX), which are all required parts of a properly assembled EPUB file. In addition, there is a helper type for composing and creating the EPUB file without having to pay attention to all the details (and there are quite a few). There is still a lot of work remaining, but my approach seems sound: I can easily create readable content, as the illustration shows (screenshots from iBooks on iPad). The book I made for testing is based on the “Development Conventions and Guidelines” page on the Eclipse wiki.
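To give an idea of what the OPF, Dublin Core and NCX parts contain, here is a minimal plain-Java sketch of such a helper. It is not the Ecore-based implementation described above; the class `EpubPackage`, its methods and the example values are hypothetical. It emits the three Dublin Core elements OPF 2.0 requires (`dc:title`, `dc:language`, `dc:identifier`), a manifest, a spine whose `toc` attribute refers to the NCX, and an NCX whose `dtb:uid` repeats the OPF identifier.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: a plain-Java stand-in for the OPF/NCX models. */
public class EpubPackage {

    /** One XHTML document, in reading order. */
    record Chapter(String id, String href, String title) {}

    final String identifier, title, language;
    final List<Chapter> chapters = new ArrayList<>();

    EpubPackage(String identifier, String title, String language) {
        this.identifier = identifier;
        this.title = title;
        this.language = language;
    }

    EpubPackage add(String id, String href, String chapterTitle) {
        chapters.add(new Chapter(id, href, chapterTitle));
        return this;
    }

    /** Escapes text for use in XML content and double-quoted attributes. */
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** OPF 2.0 package document: Dublin Core metadata, manifest and spine. */
    String toOpf() {
        StringBuilder sb = new StringBuilder()
            .append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
            .append("<package xmlns=\"http://www.idpf.org/2007/opf\" version=\"2.0\" unique-identifier=\"BookId\">\n")
            .append("  <metadata xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n")
            .append("    <dc:title>").append(esc(title)).append("</dc:title>\n")
            .append("    <dc:language>").append(esc(language)).append("</dc:language>\n")
            .append("    <dc:identifier id=\"BookId\">").append(esc(identifier)).append("</dc:identifier>\n")
            .append("  </metadata>\n  <manifest>\n")
            .append("    <item id=\"ncx\" href=\"toc.ncx\" media-type=\"application/x-dtbncx+xml\"/>\n");
        for (Chapter c : chapters) {
            sb.append("    <item id=\"").append(esc(c.id())).append("\" href=\"").append(esc(c.href()))
              .append("\" media-type=\"application/xhtml+xml\"/>\n");
        }
        sb.append("  </manifest>\n  <spine toc=\"ncx\">\n");
        for (Chapter c : chapters) {
            sb.append("    <itemref idref=\"").append(esc(c.id())).append("\"/>\n");
        }
        return sb.append("  </spine>\n</package>\n").toString();
    }

    /** NCX table of contents: dtb:uid matches the OPF identifier, playOrder starts at 1. */
    String toNcx() {
        StringBuilder sb = new StringBuilder()
            .append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
            .append("<ncx xmlns=\"http://www.daisy.org/z3986/2005/ncx/\" version=\"2005-1\">\n")
            .append("  <head>\n")
            .append("    <meta name=\"dtb:uid\" content=\"").append(esc(identifier)).append("\"/>\n")
            .append("    <meta name=\"dtb:depth\" content=\"1\"/>\n")
            .append("    <meta name=\"dtb:totalPageCount\" content=\"0\"/>\n")
            .append("    <meta name=\"dtb:maxPageNumber\" content=\"0\"/>\n")
            .append("  </head>\n")
            .append("  <docTitle><text>").append(esc(title)).append("</text></docTitle>\n")
            .append("  <navMap>\n");
        int order = 1;
        for (Chapter c : chapters) {
            sb.append("    <navPoint id=\"nav-").append(esc(c.id())).append("\" playOrder=\"").append(order++)
              .append("\"><navLabel><text>").append(esc(c.title())).append("</text></navLabel>")
              .append("<content src=\"").append(esc(c.href())).append("\"/></navPoint>\n");
        }
        return sb.append("  </navMap>\n</ncx>\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical example values.
        EpubPackage book = new EpubPackage("urn:example:book-1",
                "Development Conventions and Guidelines", "en")
            .add("ch1", "chapter1.xhtml", "Introduction");
        System.out.println(book.toOpf());
        System.out.println(book.toNcx());
    }
}
```

Generating both files from the same in-memory model keeps details such as the matching identifier and the consecutive `playOrder` values consistent, which is the kind of bookkeeping a composing helper can hide from the user.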

If you have any ideas on how to make this an even more useful tool, please feel free to add them to the bug report mentioned above or comment on this blog. I’ll keep you posted on the progress.

 

6 Comments

  1. If the goal is EPub and you are starting with WikiText, then I might suggest the following solution.

    1. Mylyn WikiText to Docbook.
    2. Docbook to ePub

    You can use the DocBook Project's XSL stylesheets to generate the necessary ePub files. Much simpler than re-inventing the wheel.

  2. Thank you for the tip David. Looks like a good idea that would probably work nicely if conversion is the only goal.

    My idea is to model the EPUB structure in Ecore and provide mechanisms for writing and reading this model to and from an EPUB file. This would work whether you already have XHTML files to use or files that must be converted first. It will also be possible to read in EPUB files for modification and write these back. So this code is basically an API for managing EPUB data.

    The ANT task, however, will have options for converting markup to HTML prior to assembling the EPUB file. This will of course use Mylyn WikiText to do the job.

  3. Looks interesting. Have you considered creating a Maven plugin for use with Tycho?

    I'm aggressively trying to move all my Eclipse projects from ANT to Tycho.

  4. Thanks Alan.

    No, I have to admit that I've not. However, I have noticed that Tycho is becoming increasingly popular so checking it out is definitely on my to-do list. I guess adding a Maven plug-in would be a good start. Thank you for the tip.

  5. Something to think about regarding the Xtext angle.

    – Implementing an Xtext DSL in this context is only useful if it provides better tooling support (e.g., clearer, more direct error reporting) than the current chain.
    – You already have an Ecore: that's an excellent starting position.
    – Because the tokenizers that Xtext (or actually: ANTLR) produces are context-insensitive, it's typically a bit of a challenge to get unwrapped generic content (e.g., literal text) to play nice with markup content (like commands, headers and such). In the situations where I needed to get this working smoothly (an existing, SGML-like language), I've written my own tokenizer.
