Nellie McKesson likes solving problems. Based in New York, she currently spends her days trying to find technological or workflow solutions for the publishing industry, combining her past experience in print and digital book production and design with her knowledge of digital workflows and programming. She has meandered through various roles in the publishing industry, starting out in journal publishing, then tackling tech publishing and authoring platforms at O'Reilly Media (where she pioneered an HTML/CSS-centric workflow and was part of the team that wrote the HTMLBook spec), and now rethinking trade publishing tools and workflows at Macmillan Publishers. Nellie went to school for philosophy, and plays the drums in an artsy metal band. She's written here about the automated EPUB toolchain to give you a glimpse of what you can expect from her ebookcraft 2017 talk, How I Built an Automated Ebook Production Platform—and You Can, Too!
In my role at Macmillan Publishers, I’ve spent the last couple of years balancing traditional publishing technology (e.g., InDesign, XML, XSLT) with modern web markup and conversion methods to build the first version of an automated ebook production toolchain that converts Microsoft Word manuscripts to EPUB files ready for distribution.
Tools already exist for taking content, adjusting it, and transforming it into other formats, so we tied some of them together with common open-source programming languages to create a complete toolchain. I wanted to leverage modern technology (and the vast development and support community that comes with it) to quickly build a toolchain that is simple, easy to maintain, and braced for whatever the future may bring.
I’ll be diving into our journey at ebookcraft 2017, where I’ll be talking about why we made certain decisions and how we implemented the toolchain. In this post, I’ll give you a brief overview of three popular technologies driving web programming that are at the heart of our automated ebook production system.
Microservice architecture
Microservice architecture is a concept rather than a tool, and it’s a hot (and sometimes controversial) trend in software architecture today. At a basic level, it means splitting the work you need to do into multiple small services that connect to each other, rather than tying your entire process up in a single monolith that does everything. Since we’re a small development team working relatively quickly, we had to take some liberties with the underlying microservice ideology, but the philosophy is a constant part of our decision-making. It lets us adjust our toolchain quickly and add functionality without rewriting the entire codebase.
This article gives a nice overview of what microservice architecture encompasses and how it works.
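To make the idea concrete, here is a minimal Python sketch of a conversion pipeline built from small, single-purpose steps. It is an illustration only, not the Macmillan toolchain: the step names (convert_to_html, add_structure, package_epub) and the Job record are hypothetical, and plain functions stand in for what would normally be separate services or scripts.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """State handed from one step to the next (hypothetical structure)."""
    source: str
    content: str = ""
    log: List[str] = field(default_factory=list)


# Every step takes a Job and returns a Job, so steps stay interchangeable.
Step = Callable[[Job], Job]


def convert_to_html(job: Job) -> Job:
    # Stand-in for a manuscript-to-HTML converter.
    job.content = f"<p>{job.source}</p>"
    job.log.append("convert_to_html")
    return job


def add_structure(job: Job) -> Job:
    # Stand-in for a step that wraps content in book-level markup.
    job.content = f'<section data-type="chapter">{job.content}</section>'
    job.log.append("add_structure")
    return job


def package_epub(job: Job) -> Job:
    # Stand-in for packaging; a real step would write an EPUB file.
    job.content = f"EPUB[{job.content}]"
    job.log.append("package_epub")
    return job


def run_pipeline(job: Job, steps: List[Step]) -> Job:
    """Run each step in order, passing the job along."""
    for step in steps:
        job = step(job)
    return job
```

Because each step only depends on the shape of the Job it receives, adding a step (say, a hypothetical validation pass) means inserting one more function into the list passed to run_pipeline, without touching the others.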
HTML and HTMLBook
Since HTML is the heart of web content and of the EPUB format, it’s no surprise that it’s also at the core of our content conversion toolchain. HTML is a worldwide standard for semantically describing content, so by using it for our books we’re positioned to take advantage of the web development community and the tools it builds.
One such tool is HTMLBook. HTMLBook is a set of rules for how to use HTML to describe book content specifically. It’s pure HTML5, with additional rules about when to use certain tags and which attributes must be added to identify the sections of your book. It was created by the O’Reilly development team (and, in fact, I was part of that team when HTMLBook was taking shape) as a publicly available spec for storing book content as HTML. The full spec is online here.
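As a rough illustration of what those rules look like in practice, HTMLBook identifies book components with a data-type attribute on ordinary HTML5 elements, for example a section element marked as a chapter. The Python sketch below uses only the standard library to list the data-type of each section in a small, made-up sample. The sample markup and the helper name section_types are assumptions for illustration, and this is not a full conformance check against the spec.

```python
from html.parser import HTMLParser

# Made-up sample: one properly typed chapter and one section missing its type.
SAMPLE = """<body data-type="book">
<section data-type="chapter">
<h1>Chapter One</h1>
<p>It was a dark and stormy night.</p>
</section>
<section>
<h1>Untyped section</h1>
</section>
</body>"""


class SectionTypeCollector(HTMLParser):
    """Records the data-type attribute of every <section> start tag."""

    def __init__(self):
        super().__init__()
        self.section_types = []

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            # attrs is a list of (name, value) pairs; None marks a missing type.
            self.section_types.append(dict(attrs).get("data-type"))


def section_types(html):
    """Return the data-type of each <section>, in document order."""
    collector = SectionTypeCollector()
    collector.feed(html)
    collector.close()
    return collector.section_types
```

Calling section_types(SAMPLE) reports the first section as a chapter and flags the second with None, which is the kind of gap that would need fixing before handing the file to HTMLBook-aware tooling.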
But the real selling point was that HTMLBook also comes with pre-built scripts to transform raw HTML files into a packaged EPUB (the caveat being that those HTML files must conform to the HTMLBook specification), and we used those scripts for our core EPUB conversion. You can find them on GitHub.