ACT in 3 acts: An overview of EPUB accessibility conformance testing

Photo of Marisa DeMeglio.

Marisa DeMeglio is a software developer for the DAISY Consortium. Her background is in Computer Science and she has been working in the field of accessibility and ebooks for the past 18 years. She leads the epubtest.org website development, is a core member of the Ace, by DAISY project, and participates in the W3C Web Publications working group, particularly in regards to accessible synchronized media. Outside of work, she volunteers with Rock n’ Roll Camp for Girls Los Angeles as an instructor and member of their Executive Team. She will be at ebookcraft with Romain Deltour leading a workshop called Be an Ally to a11y: An Accessibility Workshop with DAISY Consortium.

So, you wanna make an accessible EPUB? Let Marisa DeMeglio walk you through the steps to verify that it's accessible.
CLICK TO TWEET

Hello! So, you wanna make an accessible EPUB — that’s awesome! Not only are you setting a great example by thoughtfully formatting the information you’re putting out into the world, you’re also reaching more markets and perhaps even complying with laws in your region.

This post intends to walk you through the steps to verify that an EPUB publication is accessible. (It won't really cover how you create it in the first place.) I'll spend some time with tools that help the process go faster, and talk about how to make corrections based on feedback from those tools.

For this post, I’ve made a new sample EPUB, quickly hand-coded from the Wikipedia page on Paleontology. I just wanted some new content to work with, instead of the same old demo books, and also, I think if you’re a person like me who works on EPUB standards and writes EPUB-centric software, that you should, from time to time, make an actual EPUB that’s more than a few test sentences. Will it be perfect? No. But what it will be is accessible.

So, fast-forward past a lot of copy-pasting, a bit of cursing, a few moments of “why did I think this was a better idea than using Moby Dick?”, and some light CSS tweaks: The sample files we’ll be using can be found in this GitHub repository.

There are different folders in the repository, each representing a version of the book in one of the steps below. Many of those versions have errors, so if you’re looking for a sample to copy, go right to Step4.Accessible. For this blog post, we’ll start with what’s in the folder called Step1.NotValidYet.

Validation

Before we can start accessibility conformance testing, we have to make sure that our publication validates. This doesn’t necessarily relate to finding accessibility issues but it’s something we have to do anyway, so why not take a minute and go over it.

When you create an EPUB fileset, the rules for what files must be included and how they should be formatted are defined by a standard. The most current EPUB standard, as of March 2019, is version 3 (with 3.2 on its way). Running your publication’s fileset through a validation tool will tell you if it’s compliant or if there are warnings and errors.

We’ll be using a free and open source tool called EPUBCheck to validate our fileset. If you’re playing along at home, you can download EPUBCheck from the releases page and run it from the command line.

In the example below, I downloaded and unzipped the brand-new beta of EPUBCheck 4.2 into ~/Downloads/epubcheck-4.2.0-beta/.

Go to the directory containing the book:

$ cd ./paleontology-demo-epub/Step1.NotValidYet

Run EPUBCheck on the folder containing our EPUB:

$ java -jar ~/Downloads/epubcheck-4.2.0-beta/epubcheck.jar --mode exp .

And sit back and wait for the all-clear! Or, as it turns out, the not-at-all-clear:

Error notices: the output of validation.

Well that output looks a bit scary. But let’s take a closer look. In this case, there are a lot of errors in the package document, and a few stray markup errors elsewhere. Many look like typos and things I forgot (e.g., media-type=”TODO”). Rather than go over each and every error, I’m going to tell you that I fixed the problems in the package document first, referencing the EPUB specification when I needed clarity on the rules. This cleared up a lot of other errors throughout and I was left with just a handful of easily resolved markup issues, like the following:

As far as I can tell, this was an attempt to quote some text, but the markup had errors when ported from Wikipedia to an XHTML context. (EPUB 3’s text files are XHTML, which is stricter than your average HTML web page.)

<dl>
<dd>
<dl>
<dd>
"the number of distinct genera alive at any given time; that is, those whose first occurrence predates and whose last occurrence postdates that time"
<sup id="cite_ref-RohdeMuller2005_97-0" class="reference">
<a href="refs.xhtml#cite_note-RohdeMuller2005-97">&#91;97&#93;</a>
</sup>
</dd>
</dl>
</dd>
</dl>

View source

So, I replaced it with this, which works just as well:

<blockquote>
"the number of distinct genera alive at any given time; that is, those whose first occurrence predates and whose last occurrence postdates that time"
<sup id="cite_ref-RohdeMuller2005_97-0" class="reference">
<a href="refs.xhtml#cite_note-RohdeMuller2005-97">&#91;97&#93;</a>
</sup>
</blockquote>

View source

There were one or two other cases like the above example where that bit of editing was all that was required. The corrected files are in the Step2.NotAccessibleYet folder. So let’s go to that folder and run EPUBCheck again:

$ cd ../Step2.NotAccessibleYet
$ java -jar ~/Downloads/epubcheck-4.2.0-beta/epubcheck.jar --mode exp .
Validating using EPUB version 3.2 rules.
No errors or warnings detected.
Messages: 0 fatal / 0 errors / 0 warnings / 0 info
EPUBCheck completed

Great, so our book is now valid EPUB 3.2!

Automated testing

Now we can start testing our newly-validated EPUB for accessibility. The first step is to run it through an automated checker.

The first thing to know about automated accessibility testing is that it covers around 30% of the total task of accessibility checking. The rest has to be done manually, but don’t worry, there are tools to help with that too.

We’re going to use Ace, by DAISY to do our automated checking. (Full disclosure, Romain and I are both developers of Ace.) Ace is free and open source, and you can install it by following these instructions.

Hang on, that’s the link for the command line version. It generates the exact same report, so it would be just fine to use. But I’m going to show you screenshots from the GUI version, newly developed, and with a public beta coming soon. If you were feeling intrepid, you could check out the project from GitHub and build it yourself, and while it’s a few issues shy of release, it works quite well.

You just drop your EPUB folder or file on the logo (or sidebar)! And then it spins and thinks a bit…

You just drop your EPUB folder or file on the logo (or sidebar)! And then it spins and thinks a bit…

And then you get a report about your EPUB, organized into five tabs: Summary, Violations, Metadata, Outlines, and Images.

And then you get a report about your EPUB, organized into five tabs: Summary, Violations, Metadata, Outlines, and Images.

We’re going to look at the Violations tab, which shows a table where each row represents a violation found in the book. The rows can be filtered by several properties, such as the rule that triggered the violation, or the file in which it was found.

We’re going to look at the Violations tab, which shows a table where each row represents a violation found in the book. The rows can be filtered by several properties, such as the rule that triggered the violation, or the file in which it was found.

I see that our book has a lot of missing alt attributes, some missing metadata, a few other minor things, and one case of an empty title element, which EPUBCheck didn’t care about. (It cares that there is a title element, but not what it contains.)

The rightmost column in the table describes possible causes for each violation, and includes a link where you can learn more about that topic. This link will take you to the relevant page in the DAISY Accessible Publishing Knowledge Base, where you’ll find a topic overview, techniques, and examples.

First, I’ll tackle the images. In addition to the Accessible Publishing Knowledge Base, I’m going to reference The DIAGRAM Center for guidance. And I'll have Ace's Images tab open.

Ace’s Images tab shows all the images in the EPUB, along with their alt text, aria-described-by content, and associated figcaption, so this is a quick way to see what needs attention.

Ace’s Images tab shows all the images in the EPUB, along with their alt text, aria-described-by content, and associated figcaption, so this is a quick way to see what needs attention.

With my text editor open, I added alt attribute values and also took this chance to reformat my images as figure elements:

<figure>
<img alt=”T. Rex skull” src="../images/TREX.jpg" width="150" height="100"/>
<figcaption>Analyses using <a href="https://wikipedia.org/wiki/Engineering" title="Engineering">engineering</a> techniques show that <i><a href="https://wikipedia.org/wiki/Tyrannosaurus" title="Tyrannosaurus">Tyrannosaurus</a></i> had a devastating bite, but raise doubts about how fast it could move.</figcaption>
</figure>

View source

All the images should be sorted out now. If I wanted, I could re-run Ace to check again.

The next thing to look at is metadata. Ace reported some violations related to metadata; in addition, the Metadata tab shows me which accessibility metadata is missing. Note that “missing” in this case does not mean “must have,” it just means that it’s not there. Not all accessibility metadata is required, but it’s good to use as much as is relevant to your publication. I’ll use the EPUB Accessibility 1.0 Specification as a reference for the metadata requirements.

As a result, I added the following to the package document:

<meta property="schema:accessMode">textual, visual</meta>
<meta property="schema:accessibilityFeature">alternativeText</meta>
<meta property="schema:accessibilityHazard">none</meta>
<meta property="schema:accessibilitySummary">
The publication has been partially evaluated for accessibility.
</meta>
<meta property="schema:accessModeSufficient">textual</meta>

So, now that I’ve addressed all the issues reported by Ace, I’m going to re-run both EPUBCheck and Ace to make sure we're good to go.

This is exciting! No errors and no accessibility violations. These tools really should show a little celebratory animation or something. Someone should tell the developers…

EPUBCheck:

$ cd ../Step3.AutomatedChecksDone
$ java -jar ~/Downloads/epubcheck-4.2.0-beta/epubcheck.jar --mode exp .
Validating using EPUB version 3.2 rules.
No errors or warnings detected.
Messages: 0 fatal / 0 errors / 0 warnings / 0 info

Ace:

Ace showing zero violations.

Ace showing zero violations.

The corrected files have been saved under Step3.AutomatedChecksDone, which are the files we’ll start with in the next section.

I also want to save my Ace report. To do that, go to the File menu and choose “Export Report.” This provides a zipfile containing two versions of the report: the raw JSON data and the more visually-appealing HTML version. Unzip the archive and have report.json ready for the next part.

Manual testing

Our EPUB is in pretty good shape at this point — we’ve fixed all the problems that the automated tools could find. Take a break and celebrate!

Okay, not so fast. This book isn’t accessible yet. As you might imagine, it’s not possible at this point in time for a computer to tell you if your image description is accurate or your pages are arranged in a meaningful sequence. That’s where manual testing comes into play.

Manual testing can be the most tedious and overlooked part of accessibility testing, but we can use a tool called SMART to lighten the load significantly. It picks up where Ace left off.

SMART is a web-based tool. Go to the SMART homepage and click “sign up” to try it out.

SMART is a web-based tool. Go to the SMART homepage and click “sign up” to try it out.

Drag and drop the report.json file saved from Ace into SMART. This file tells SMART about the different features of your book, so that SMART can intelligently filter a series of checkpoints based on your content. For example, there’s no point in spending time on audio or video checkpoints if your book contains neither.

You might be wondering where these checkpoints come from — if you look at the EPUB Accessibility specification, you’ll get an idea. It include things like checks that come from WCAG, for example, regarding images; and EPUB-specific topics such as checking that your spine is arranged logically.

So, let’s go through the checkpoints and see how our book looks. We won’t go over every single checkpoint together, but the examples below should give you a good idea of what SMART does.

I’m going to keep my Ace Images tab open so I can skim the images list while I verify what SMART is asking about how I use alt and figcaption.

I’m going to keep my Ace Images tab open so I can skim the images list while I verify what SMART is asking about how I use alt and figcaption.

Oh hang on, what’s that alt tag for the T. Rex image on the cover? alt="It’s an image"? Let’s fix that so it’s a little more useful. Ace didn't mention this as a violation because it saw content; it didn't care what it was.

After reviewing the images, I move on to the content structure, making sure the order looks good and the headings are in order.

I can use Ace’s Outlines tab for a high-level overview.

I can use Ace’s Outlines tab for a high-level overview.

Then, back in SMART, I can sign off on those items by marking them as “Pass,” and continue along through the checkpoints. I notice that many are greyed-out because they are about topics not relevant to my publication (e.g., video captioning).

SMART notes under “Contrast" that if I haven’t run Ace, I'll need to go over this section manually, without a doubt. But I did run Ace, which would have let me know if I was failing contrast minimums, so I feel pretty good about this part too.

Next, SMART helps me generate some better metadata than what I had added previously. I hadn’t been as thorough as I could have been. I copy and paste the improved metadata into my package document.

SMART metadata generator. View source.

SMART metadata generator. View source.

Proceeding through the rest of SMART’s prompts and checkpoints gives me a chance to create metadata for an ONIX record and also certify this publication as accessible.

SMART ends by generating a report summarizing your publication’s accessibility.

SMART ends by generating a report summarizing your publication’s accessibility.

And here is your reminder to run EPUBCheck again if you've edited your files at all since the last time.

Conclusion

I still see room for improvement, so I hope one day I can continue this post by experimenting with some of the charts from the original Wikipedia article that were left out here. There was a lot of really weird markup on these, so making them accessible would definitely be an interesting challenge.

Even with the relatively simple text I prepared for this example, I saved a lot of time by using tools to help me do the accessibility testing. They considerably reduced the tedium by providing automated analysis and guided manual checks; and they also gave appropriate reference information about important concepts along the way.

The goal of this post was to show the steps of accessibility conformance testing for EPUBs, using currently available resources. The open-source tools demonstrated here are actively being developed further and benefit from your support and involvement.

Resources

Thanks to Romain Deltour for his reviews and edits of this post!

Marisa DeMeglio and Romain Deltour will be leading an accessibility workshop at ebookcraft on March 18, 2019 in Toronto. You can find more details about the conference here or sign up for the mailing list to get all of the conference updates.