HTML-First at Wiley

Tzviya Siegman is Wiley’s Information Standards Lead. Tzviya wrote and maintained Wiley’s ebook specifications and stylesheets and serves as Wiley’s liaison to industry standards groups. She currently works in Wiley’s Platform Architecture Group, joining her interests in content structure, standards, accessibility, and linked data. Tzviya co-chairs the W3C Publishing Working Group, helping to make the web and publishing better friends.

Benjamin Young is a web, digital publishing, and open source advocate. Benjamin's focus is on content and how human beings interact with it, and with each other around it. He currently explores the edges of a re-decentralized web using annotation, distributed identity, and offline-friendly web apps and extensions. Benjamin is a Solutions Architect at John Wiley & Sons, where he works on the Web Annotation Working Group and Digital Publishing Interest Group at the W3C.

Tzviya and Benjamin will be speaking at ebookcraft in a session called HTML-First at Wiley, or, How We Learned to Stop Worrying and Love the DOM.

Background

Wiley has been publishing content for 210 years. As technology has changed, so, too, have our workflows. We were early adopters of XML-first publishing for our books and journals in the 1990s, and the resulting WileyML format is still the primary interchange format in our pipeline. But as more and more reading moved to the web, we realized we needed HTML, the lingua franca of the web.

Why HTML?

There are some key advantages of using HTML:

  • It makes it easier to build in-browser tools for review, editing, conversion, and data extraction.
  • It has native accessibility. (For example, many HTML elements carry implicit ARIA roles, which convey their purpose to assistive technologies.)
  • HTML describes content and CSS describes appearance, so many different CSS files can be associated with a given HTML file, allowing the same source to be used in an ebook, on the web, or even in print.
  • It can be created by XML toolchains and, as XHTML, can even be processed by XML tools. Adapting existing XML processes to HTML is often easier than starting from scratch.
  • It serves as a foundation that's closer to the future of Web Publications (an in-progress W3C standard that needs your help).
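
To make the XML-toolchain point concrete, here is a minimal sketch of an ordinary XML parser reading XHTML. It uses Python's standard library; the sample markup and the section_headings helper are hypothetical illustrations, not Wiley content or Wiley tooling.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical article fragment, invented for this example.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample article</title></head>
  <body>
    <h1>Sample article</h1>
    <section><h2>Introduction</h2><p>First paragraph.</p></section>
    <section><h2>Methods</h2><p>Second paragraph.</p></section>
  </body>
</html>"""


def section_headings(xhtml: str) -> list[str]:
    """Return the text of every h2 that is a direct child of a section."""
    # Because XHTML is well-formed XML, a plain XML parser can read it.
    root = ET.fromstring(xhtml)
    # XHTML elements live in the XHTML namespace, so queries must name it.
    ns = {"h": XHTML_NS}
    return [h2.text for h2 in root.findall(".//h:section/h:h2", ns)]


if __name__ == "__main__":
    print(section_headings(SAMPLE))
```

The same approach would not work on HTML that is not well-formed XML (unclosed tags, unquoted attributes), which is one reason the XHTML serialization is convenient for teams that already have XML pipelines.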

Wiley Research, our journals business, was a good place to start implementing HTML. Journal articles are relatively simple compared to books: There's only one document. And we have a platform (Wiley Online Library) on which people already access HTML. Streamlining the article workflow to HTML-in, HTML-out was a logical place to start.

How did we get there?

In scholarly publishing, the metadata is about as important as the content. Researchers’ jobs can depend on their citations and the data they use and reference in their work. XML can hold extensive metadata, but how does that work in HTML?

It's possible to embed data into HTML via Linked Data, a method of exposing and connecting data on the web from different sources, using syntaxes such as RDFa and JSON-LD. These techniques are common in the world of search engine optimization and can bring the same sort of visibility and value to scholarly publishing and the tools around it.
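
As a rough illustration of what embedded Linked Data looks like, the sketch below puts a JSON-LD block inside an HTML page and reads it back with Python's standard library. The sample page, the schema.org vocabulary choice, and the JsonLdExtractor and extract_jsonld names are assumptions made for this example; they do not show Wiley's actual markup.

```python
import json
from html.parser import HTMLParser

# Hypothetical page, invented for this example.
SAMPLE = """<!DOCTYPE html>
<html>
  <head>
    <title>Sample article</title>
    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "ScholarlyArticle",
      "headline": "Sample article",
      "author": {"@type": "Person", "name": "A. Author"}
    }
    </script>
  </head>
  <body><p>Article text.</p></body>
</html>"""


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # JSON-LD is carried in script elements with this media type.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html):
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    for item in extract_jsonld(SAMPLE):
        print(item["@type"], "-", item["headline"])
```

The metadata travels in the same file as the readable content, so a search engine, a citation tool, or an in-browser script can each pick it up without a separate XML record.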

A small group of Wiley XML expats got together to see if we could make things work in HTML with RDFa. The result is what we call Melville (a nod to our origins as one of Herman Melville’s publishers). It's HTML that can be transformed to WileyML and can easily be stripped (normalized) to simplified HTML for display in browsers.
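
One way to picture the stripping step is sketched below, under a loud assumption: that normalizing means removing RDFa attributes and leaving the rest of the markup alone. The post does not describe what Melville's normalization actually does, so the attribute list, the sample markup, and the strip_rdfa helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes defined by RDFa that have no other meaning in HTML.
# Attributes shared with plain HTML (href, src, rel, content) are left alone.
# This list is an assumption for the sketch, not Melville's actual rule set.
RDFA_ATTRS = {
    "vocab", "typeof", "property", "resource",
    "prefix", "about", "datatype", "inlist",
}

# Hypothetical well-formed fragment, invented for this example.
SAMPLE = """<article vocab="https://schema.org/" typeof="ScholarlyArticle">
  <h1 property="headline">Sample article</h1>
  <p>By <span property="author" typeof="Person"><span property="name">A. Author</span></span></p>
  <p class="abstract" property="abstract">A short abstract.</p>
</article>"""


def strip_rdfa(markup: str) -> str:
    """Return the markup with RDFa-only attributes removed."""
    root = ET.fromstring(markup)
    for element in root.iter():
        # Copy the keys first so the dict is not mutated while iterating.
        for name in list(element.attrib):
            if name in RDFA_ATTRS:
                del element.attrib[name]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(strip_rdfa(SAMPLE))
```

The sketch only removes attributes, so wrapper elements such as the nested spans remain in the output; a real normalization step might also unwrap them.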

If you’re intrigued by some of what we’ve mentioned here and want to see some samples, we’d love to have you join us on March 22 at ebookcraft!

Thanks to Dave Cramer and Laura Brady for their helpful edits.

If you'd like to hear more from Tzviya Siegman and Benjamin Young about digital standards, register for ebookcraft, March 21-22, 2018 in Toronto. You can find more details about the conference here, or sign up for our mailing list to get all of the conference updates.