BookNet Canada

Tom Richardson
October 21, 2009
BiblioShare, ONIX, Standards & Metadata

Data Exchange Tip #1: Why XML?


I’m going to do a series of blog posts on some of the very basic issues in file trading: what needs to be done before you submit an ONIX file (or an e-book, if your e-book is in XML). In doing this I’m hoping that publishers will comment about software they like (and don’t), the problems they have, and, with any luck, their successes.

So, for the first post: Why XML?

Any discussion about file exchange has to start with why XML works: its underlying assumptions, and the software that enforces them. The main assumption is that every character in every file, including line returns and visible and hidden content, is recognized. XML software tests for this, and it’s so important that information about it normally appears in the first line of an XML file as an encoding statement, right after you identify that this is an XML document:

<?xml version="1.0" encoding="utf-8"?>

or

<?xml version="1.0" encoding="iso-8859-1"?>

Think about that for a moment: How obvious is that, and how could it be otherwise? And then think about just how unlikely it is to be true of a publisher’s ONIX file, built up over long periods of time through cut and paste from who knows what source documents. You don’t really know where all the millions of characters in your ONIX file came from, do you? And that’s why trading delimited files or database files doesn’t work: neither format tests the incoming data. But XML software does, and it won’t work with less than “well encoded” data.
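As a rough illustration of that test, here is a minimal sketch using the XML parser in Python's standard library. The function name `is_well_formed` and the sample `<Title>` element are made up for this example and are not taken from ONIX. The "bad" sample declares UTF-8 but contains the single byte that Windows-1252 uses for an en dash, which is the kind of thing cut and paste can leave behind.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Return True if an XML parser accepts the raw bytes (hypothetical helper)."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


declaration = b'<?xml version="1.0" encoding="utf-8"?>'

# An en dash stored as real UTF-8 bytes: matches the encoding statement.
good = declaration + "<Title>Maps \u2013 Atlases</Title>".encode("utf-8")

# An en dash stored as the single Windows-1252 byte 0x96: not valid UTF-8.
bad = declaration + b"<Title>Maps \x96 Atlases</Title>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

The parser accepts the first sample and rejects the second, because the bytes do not match what the encoding statement promises. A delimited file carrying the same byte would simply be passed along.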

Publishers can think of it this way: You’ve probably heard of or published a book where an “incompetent freelance designer didn’t use the right font” (or used “outdated software,” or provided “bad thingies”) and the files got screwed up when they went to the printer. And your production manager “fixed that file” with a lot of overtime and foul language. That’s an encoding problem: What you sent to someone else didn’t appear as you intended. If you were trading files in XML and did it right, that wouldn’t happen. All sorts of other things might, but not that.

The trick to the encoding statement is that it doesn’t really matter where the characters came from. It’s not about your ability to answer the Zen koan: “What is the encoding of the letter you’re typing now?” What matters is what happens when someone else loads the file. Does their software recognize all the characters? You may have software designed to create an ONIX file, but does it monitor what’s going into it? Does it stop you, with an error message, from loading dashes from Word 97 or WP5.1? Does it ask you what the output encoding should be and prevent anything else from going in? It would be surprising if it did.
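To show what "monitoring what's going in" could look like, here is a small sketch in Python. It assumes you have chosen an output encoding and want to flag any pasted character that cannot be written in it. The function name `unencodable_characters` and the sample text are invented for illustration; this is not how any particular ONIX tool works.

```python
from typing import List, Tuple


def unencodable_characters(text: str, encoding: str) -> List[Tuple[int, str]]:
    """List (position, character) pairs that the chosen encoding cannot represent."""
    problems = []
    for index, char in enumerate(text):
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            problems.append((index, char))
    return problems


# Text pasted from a word processor: curly quotes and an en dash.
pasted = "Smart \u201cquotes\u201d \u2013 dash"

# ISO-8859-1 has no curly quotes or en dash, so all three are flagged.
print(unencodable_characters(pasted, "iso-8859-1"))

# UTF-8 can represent every one of these characters, so nothing is flagged.
print(unencodable_characters(pasted, "utf-8"))
```

A check like this only tells you whether the text fits the output encoding you picked. Whether the characters are the ones you meant is still something a person has to look at.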

So the first rule of data exchange is that you must test the ONIX output every time you create it. You test your data with XML software before you send it. The XML standard demands it. The ONIX standard depends on it.
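One way to make that test routine is a small script you run on every export. The sketch below uses Python's standard library parser and reports where the first problem was found. The function name `check_xml_file` is hypothetical, and the check covers well-formedness only: it does not validate the file against the ONIX specification, which needs separate tooling.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_xml_file(path: str) -> Optional[str]:
    """Return None if the file parses, otherwise a short description of the first error."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple for the failure point.
        line, column = err.position
        return f"{path}: line {line}, column {column}: {err}"
    return None
```

Calling `check_xml_file("onix_export.xml")` (the file name is a placeholder) returns `None` for a file the parser accepts and an error message with a line and column otherwise, which gives you somewhere to start looking before the file goes out.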

That’s why XML works. The XML standard and software are designed to enforce things like this. You may think you can trade data using Excel or delimited formats, but none of these will do a good job of ensuring that what you send can be read at the other end. XML does (somewhat; don’t think it’ll be perfect), and that’s the main reason it’s better for data transfer.

Tagged: xml, data exchange tips

