EPUB Validation Made Easy

Having had the opportunity in the last year to help write and edit the EPUB 3 specification, I can commiserate with people who find the process of putting a publication together a little daunting. The fundamentals aren’t all that complicated, but when you have a specification that’s made up of four distinct documents (Content Documents, Publications, Media Overlays and the Open Container Format), it can be a challenge to pull all the basic requirements together and verify whether you’ve met them all.

Fortunately, EPUB is built on an open standards-based foundation, and each of these standards has programmatically verifiable rules and requirements. That’s a lot of jargon for saying that there are tools that can help make sure that your coding is correct and that your book will open and work in reading systems. These tools are called validators. Not surprisingly, the IDPF maintains a free one called EpubCheck for exactly this purpose.

If you’ve distributed EPUB 2 publications through the likes of Apple’s iBookstore, you’re undoubtedly already familiar with this tool, as checking that your publication is compliant is one of the first tests in determining whether your ebook will be accepted or rejected. It can’t tell you what your ebook will look like on any system, but it will help ensure that rendering issues aren’t the result of a lack of compliance with the standards.

The tool hasn’t always been the simplest for lay people to use, however; it has a bit of a reputation for being built by geeks for geeks. If that’s been your perception, don’t fret, because help is on the way! The IDPF recently rolled out a beta Web-based version as part of expanding the tool to support EPUB 3. The incredibly simple interface has only a single file selection field:

EPUB Validator form

All you have to do is select your EPUB, upload it and wait for your results to come back; the validator can determine which version you’ve uploaded automatically from the package document.
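To give a feel for what "determining the version from the package document" can involve, here is a minimal Python sketch. It is not how EpubCheck itself is implemented; it simply assumes the usual EPUB layout, where META-INF/container.xml points at the package document and the root package element carries a version attribute. The function name epub_version is made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in the Open Container Format.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def epub_version(epub_path):
    """Return the version attribute of the package document, or None.

    epub_path may be a file name or a file-like object holding the EPUB.
    """
    with zipfile.ZipFile(epub_path) as archive:
        # container.xml names the package document in a rootfile element.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        if rootfile is None:
            raise ValueError("container.xml has no rootfile entry")
        # The root element of the package document states the EPUB version.
        package = ET.fromstring(archive.read(rootfile.attrib["full-path"]))
        return package.attrib.get("version")
```

Calling epub_version("book.epub") on a typical publication would return a string such as "2.0" or "3.0", which is all a validator needs to pick the right set of rules.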

I stripped out some font files from a test document available from the EPUB 3 working group downloads page and here’s a screenshot of the errors it reported:

Validator results showing errors

No more reading command line output!

The errors are easily identified, and in this case the messages tell me exactly what I’ve done wrong. The validator reports the font files as missing; the other way to read the same error is that I left the font entries in the manifest after taking the font files out.
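The missing-file check is easy to approximate yourself. The following Python sketch is only an illustration of the idea, not EpubCheck's own logic: it assumes every manifest item href is relative to the package document, skips anything that looks like a remote URL, and reports entries that have no matching file in the archive. The name missing_manifest_items is hypothetical.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_NS = "http://www.idpf.org/2007/opf"


def missing_manifest_items(epub_path):
    """List manifest entries that point at files absent from the EPUB."""
    with zipfile.ZipFile(epub_path) as archive:
        names = set(archive.namelist())
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        opf_path = rootfile.attrib["full-path"]
        package = ET.fromstring(archive.read(opf_path))

    # Manifest hrefs are resolved against the package document's folder.
    base = posixpath.dirname(opf_path)
    missing = []
    for item in package.iter(f"{{{OPF_NS}}}item"):
        href = unquote(item.attrib["href"])
        if "://" in href:
            continue  # assumption: remote resources are not in the archive
        target = posixpath.normpath(posixpath.join(base, href))
        if target not in names:
            missing.append(target)
    return missing
```

Run against my stripped-down test document, a check like this would list the font paths that are still declared in the manifest, which is the same information the validator's error messages give.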

EPUB validator error markup

If the problem were more specific to an individual file, the file name, line and position columns would provide that detail. In other words, it can even catch specific coding errors and typos. Here’s a screenshot of the result I get after introducing a markup error to show these fields in action:

mark up error

The error message now tells me that I have a p tag appearing where it’s not allowed in chapter_001.xhtml. When I look at my file, I can see that I forgot to close the h1 heading right before this element.
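If you're curious why an unclosed heading shows up as a misplaced p, the sketch below mimics the effect in Python. It is a rough stand-in for real schema validation, not what EpubCheck does internally: it streams through the markup, keeps track of which elements are still open, and flags any p that starts while a heading is open. The function name and message wording are my own.

```python
import xml.parsers.expat

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def find_misplaced_paragraphs(xhtml_text):
    """Return (line, column, message) tuples for p elements inside headings."""
    problems = []
    open_elements = []  # stack of element names that are currently open
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        if name == "p":
            # Look for the nearest heading that has not been closed yet.
            for ancestor in reversed(open_elements):
                if ancestor in HEADINGS:
                    problems.append((parser.CurrentLineNumber,
                                     parser.CurrentColumnNumber + 1,
                                     f"p not allowed inside {ancestor}"))
                    break
        open_elements.append(name)

    def end(name):
        open_elements.pop()

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    try:
        parser.Parse(xhtml_text, True)
    except xml.parsers.expat.ExpatError as error:
        # A heading that is never closed also breaks well-formedness later on.
        problems.append((error.lineno, error.offset + 1,
                         "not well-formed: "
                         + xml.parsers.expat.ErrorString(error.code)))
    return problems
```

Because the h1 is still open when the parser reaches the paragraph, the p is the first thing reported, even though the real mistake sits one line earlier. That is worth remembering when reading validator output: the line an error points at is where the problem was noticed, not always where it was made.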

Running the tool is now the easy part of validation; learning how to interpret the messages that come out of it can still be a bit of a challenge, though.

Work is ongoing to improve the readability of the messages, however, and there is a page on the EpubCheck site that lists some common errors (but it’s getting a bit out of date now). The new IDPF forums are also now up and running, which is another useful resource for finding answers when you do get stumped. A forum dedicated to EpubCheck is also anticipated.

And the downloadable Java version is still available from the EpubCheck site. After you get used to validation, you’ll probably find it faster and simpler not to go through the upload process every time. And if you’re making EPUBs over 10MB in size, you’re not going to have any other choice. If the thought of getting up and running seems daunting, have a look at this blog post from ThreePress. It’ll run you through the install process for both Windows and Mac, as well as explain how to open a command line and run the tool.
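Once the Java version is installed, it is easy to script. The Python sketch below is one possible wrapper, under a few assumptions: the jar is run as java -jar followed by the path to the EPUB, the 10MB upload limit is treated as 10 × 1024 × 1024 bytes, and a zero exit code is taken to mean the check passed. The jar path and function names are placeholders for illustration.

```python
import os
import subprocess

# Assumption: the Web validator's 10MB limit, read as binary megabytes.
WEB_UPLOAD_LIMIT = 10 * 1024 * 1024


def needs_local_validation(epub_path):
    """True if the file is too large for the Web-based validator."""
    return os.path.getsize(epub_path) > WEB_UPLOAD_LIMIT


def epubcheck_command(jar_path, epub_path):
    """Build the command line for the downloadable Java version."""
    return ["java", "-jar", jar_path, epub_path]


def run_epubcheck(jar_path, epub_path):
    """Run EpubCheck and return (passed, combined output text)."""
    result = subprocess.run(epubcheck_command(jar_path, epub_path),
                            capture_output=True, text=True)
    # Assumption: exit code 0 means no errors were reported.
    return result.returncode == 0, result.stdout + result.stderr
```

A call such as run_epubcheck("epubcheck.jar", "book.epub") would hand back the same messages you would otherwise read in the terminal, ready to be saved or searched.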

And that’s EpubCheck in a nutshell. Happy validating!

Matt Garrish is the author of What Is EPUB 3? (O’Reilly). He will be speaking about EPUB 3 and Accessibility at Technology Forum 2012 in March.