
BookNet Canada

Tom Richardson
October 27, 2009
BiblioShare, ONIX, Standards & Metadata

Data Exchange Tip #2: So What('s) Encoding Anyway?


In the first tip, I tried to establish why you, the ONIX file sender, have to test your file: simply to ensure that the file’s content, all of its characters, will be recognized by the aggregator’s software. The “encoding” declaration in the first line of the file (for example, <?xml version="1.0" encoding="UTF-8"?>) tells the recipient what to expect, and your job is to ensure that the file matches it.
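To make “the file matches its declaration” concrete, here is a minimal sketch in Python (our choice of language; nothing in ONIX requires it). It reads the encoding named in the XML declaration, falls back to UTF-8 when none is given (the XML default), and tries to decode the whole file with it. The function names are hypothetical, and the regular expression only handles the common single-line declaration with no byte order mark in front of it.

```python
import re

# Matches the encoding pseudo-attribute in an XML declaration such as
# <?xml version="1.0" encoding="UTF-8"?>  (single or double quotes).
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(xml_bytes: bytes) -> str:
    """Return the encoding named in the XML declaration, or UTF-8 if none."""
    match = _DECL.match(xml_bytes)
    return match.group(1).decode("ascii") if match else "UTF-8"


def first_mismatch(xml_bytes: bytes):
    """Return (byte_offset, bad_bytes) for the first byte sequence that is
    not valid in the declared encoding, or None if the whole file decodes."""
    try:
        xml_bytes.decode(declared_encoding(xml_bytes))
        return None
    except UnicodeDecodeError as err:
        return err.start, xml_bytes[err.start:err.end]
```

Reporting the byte offset, rather than just “failed,” is the useful part: it tells you exactly where in the file to look for the offending character.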

If you’re trading files in English-speaking North America, you’ve got a choice of three encodings that will almost certainly be acceptable to aggregators. (There are lots of others, but I’m assuming that you’re trading files largely in English, with some French and/or Spanish thrown in.)

The default encoding for XML, and therefore for ONIX, is UTF-8. It’s the most commonly used encoding for XML in English North America and the best supported by XML software. For the English-language keyboard characters it’s identical to what was called ASCII (but not to extended ASCII): those characters take one byte each, while accented letters and other symbols take two to four bytes. So any text document that sticks to plain English keyboard characters is already valid UTF-8 without any work on your part.
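A quick way to see the ASCII relationship for yourself, sketched with Python’s built-in codecs: plain keyboard text produces exactly the same bytes in ASCII and UTF-8, while an accented letter or a symbol such as the euro sign takes two or three bytes in UTF-8.

```python
# Plain English keyboard text gives identical bytes in ASCII and UTF-8.
text = "ONIX for Books"
same_bytes = text.encode("ascii") == text.encode("utf-8")
print(same_bytes)  # True

# Characters beyond ASCII need more than one byte in UTF-8.
utf8_widths = {ch: len(ch.encode("utf-8")) for ch in "Aé€"}
print(utf8_widths)  # {'A': 1, 'é': 2, '€': 3}
```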

The other common encoding is ISO-8859-1, what might be called ‘extended ASCII’ or Latin-1. It supports most of the accented characters used in French, Spanish and German. BISG has identified this as the preferred encoding for the US supply chain. We in Canada are more demure and think it slightly impolite to discuss, but are OK with it too.
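In ISO-8859-1 every character is exactly one byte, which is why it can only cover a fixed set of 256 characters. The Python sketch below (the `fits_latin1` helper is hypothetical) shows French, Spanish and German accents fitting in one byte each, and shows that characters outside the set, such as the euro sign or the French “œ,” simply can’t be written in this encoding.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character in text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# French, Spanish and German accents: one byte per character.
for word in ["Québec", "España", "Müller"]:
    assert fits_latin1(word)
    assert len(word.encode("iso-8859-1")) == len(word)

print(fits_latin1("10 €"))  # False: no euro sign in ISO-8859-1
print(fits_latin1("cœur"))  # False: no "œ" ligature either
```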

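And then there is “windows-1252.” This is what, in desperation, your trading partners will use when they hope you’re on the Windows operating system and your file is screwing up when they load it. It’s the Windows version of ISO-8859-1: the two are identical except that windows-1252 uses the bytes 0x80 to 0x9F, which ISO-8859-1 reserves for invisible control codes, for extra printable characters such as curly quotes, dashes and the euro sign. Who could possibly care about this?!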
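The practical difference between the two shows up as soon as the same bytes are decoded both ways. In this Python sketch, the bytes are an illustrative example of what a Windows program might write for curly quotes, an en dash and a euro sign; windows-1252 reads them as those characters, while ISO-8859-1 reads them as control codes.

```python
# Bytes a Windows program might write for: “Book” – 10 €
raw = b"\x93Book\x94 \x96 10 \x80"

as_cp1252 = raw.decode("windows-1252")
as_latin1 = raw.decode("iso-8859-1")

print(as_cp1252)           # “Book” – 10 €
print(repr(as_latin1[0]))  # '\x93' (an invisible control code)

# Outside 0x80-0x9F the two encodings agree byte for byte.
upper = bytes(range(0xA0, 0x100))
print(upper.decode("windows-1252") == upper.decode("iso-8859-1"))  # True
```

This is also why a file full of curly quotes can look fine on the sender’s Windows machine and break when it’s declared as ISO-8859-1 or UTF-8.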

Here’s the dummy version: when you hit a computer key, a numeric code is generated, interpreted, and shown as a character on your screen. There are conventions and standards that control all this, and when you bought your computer, if the salesperson was awfully knowledgeable, they might have been able to tell you which conventions your computer follows. If you’re on a PC with a number pad, try this: hold the ALT key down and type 80 on the number pad. If you did that, you made a big pee (a capital “P”), and I’m really, really pleased with myself for getting you to do it. My only point is that there really isn’t a way to know what your computer is doing, except that:

  • If you bought your computer in English speaking North America;
  • and no one said it wasn’t a standard keyboard;
  • and you’ve not really thought much about it;

then the simple keystrokes you make almost certainly produce plain ASCII characters, which are also valid UTF-8 (unless some piece of software is screwing with what you type). Can you cut and paste your text into a text document or email without it (usually) turning to gibberish? Then it’s more or less UTF-8.
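Behind the ALT+80 trick: 80 is the character code for a capital “P,” and for every plain keyboard character (codes 0 to 127) UTF-8, ISO-8859-1 and windows-1252 all produce the same single byte. That’s why unaccented English text rarely causes trouble whichever of the three is declared. A small Python illustration, using the built-in `str.isascii()` (Python 3.7 or later):

```python
# ALT+80 on a Windows number pad types character code 80:
print(chr(80))  # P

title = "The Canadian Book Market"
print(title.isascii())  # True: only codes 0-127

# For ASCII-only text, all three encodings give identical bytes.
encodings = ["utf-8", "iso-8859-1", "windows-1252"]
print(len({title.encode(enc) for enc in encodings}))  # 1

# One accented letter is enough to make UTF-8 disagree with the other two.
print(len({"Québec".encode(enc) for enc in encodings}))  # 2
```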

XML software won’t guess. It’s up to you to tell it what your characters are, and as a start, assume that you’re typing largely in UTF-8. You don’t really have a choice. But here’s a quick fix if you’re testing your ONIX file and it won’t load properly because of unrecognized characters: change the encoding declaration to encoding="iso-8859-1" and hope. It may be all that you need, but more likely you’ll still have a small number of unrecognized characters in your file.
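Why relabelling to iso-8859-1 so often makes the load errors disappear: ISO-8859-1 assigns a character to every one of the 256 possible byte values, so a decoder reading the file that way never hits an invalid byte. The catch is that it can’t tell whether those characters are the right ones. A minimal Python illustration:

```python
# Every possible byte value decodes under ISO-8859-1, so nothing is rejected.
all_bytes = bytes(range(256))
print(len(all_bytes.decode("iso-8859-1")))  # 256

# The typical failure: a Latin-1 "é" (byte 0xE9) in a file declared as UTF-8.
latin1_text = "Café".encode("iso-8859-1")  # b'Caf\xe9'
try:
    latin1_text.decode("utf-8")
    rejected_at = None
except UnicodeDecodeError as err:
    rejected_at = err.start
print("rejected at byte", rejected_at)  # rejected at byte 3

# Relabelling fixes that file, but if the text really was UTF-8,
# each accented letter turns into two wrong characters instead.
utf8_text = "Café".encode("utf-8")
print(utf8_text.decode("iso-8859-1"))  # CafÃ©
```

So relabelling is a diagnostic shortcut, not a cure: check the loaded text for garbled pairs like “Ã©” before assuming the problem is solved.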

To summarize: you must test all XML files before sending them, and the first point of testing is to ensure that every character in the file is recognized and defined. There are secondary data quality and validation issues that will come up when the ONIX standard itself is discussed, but the first step is always a coherent file that XML software can read.
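As a final pre-send check, the file can be run through an actual XML parser, which reads the encoding declaration itself and refuses bytes that don’t match it. This Python sketch uses the standard library’s `xml.etree.ElementTree`; the `well_formed` helper is hypothetical, and it only checks that the file is well-formed XML, not that it’s valid ONIX (that’s the separate validation step mentioned above).

```python
import xml.etree.ElementTree as ET


def well_formed(xml_bytes: bytes) -> bool:
    """Parse raw bytes, so the parser honours the encoding declaration."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError as err:
        print("Not well-formed:", err)
        return False


decl = '<?xml version="1.0" encoding="UTF-8"?><Title>Café</Title>'
utf8_ok = decl.encode("utf-8")        # bytes really are UTF-8
mismatch = decl.encode("iso-8859-1")  # bytes are Latin-1, declaration says UTF-8

print(well_formed(utf8_ok))   # True
print(well_formed(mismatch))  # False: byte 0xE9 on its own isn't valid UTF-8
```

Passing bytes rather than an already-decoded string matters here: it leaves the decision about encoding to the parser and the declaration, which is exactly what the aggregator’s software will do.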

The next post offers some practical tips on cleaning files, and the one after that covers what to do with special characters that fall outside your declared encoding, so don’t worry about your weekly excitement just yet.

Tagged: xml, data exchange tips

BookNet Canada is a non-profit organization that develops technology, standards, and education to serve the Canadian book industry. Founded in 2002 to address systemic challenges in the industry, BookNet Canada supports publishing companies, booksellers, wholesalers, distributors, sales agents, industry associations, literary agents, media, and libraries across the country.

 


BOOKNET CANADA

Contact us | (416) 362-5057 or toll free 1 (877) 770-5261

We acknowledge the financial support of the Government of Canada through the Canada Book Fund (CBF) for this project.


BookNet Canada acknowledges that its operations are remote and our colleagues contribute their work from the traditional territories of the Mississaugas of the Credit First Nation, the Anishnawbe, the Haudenosaunee, the Wyandot, the Mi’kmaq, the Ojibwa of Fort William First Nation, the Three Fires Confederacy of First Nations (which includes the Ojibwa, the Odawa, and the Potawatomie), and the Métis, the original nations and peoples of the lands we now call Beeton, Brampton, Guelph, Halifax, Thunder Bay, Toronto, Vaughan, and Windsor. We endorse the Calls to Action from the Truth and Reconciliation Commission of Canada (PDF) and support an ongoing shift from gatekeeping to spacemaking in the book industry.