BookNet Canada

Graham Bell, EDItEUR
July 14, 2023
ONIX, Standards & Metadata

Character encoding mismatches in ONIX

The following post was featured in the international ONIX implementation group — a mailing list that you really should follow for ONIX announcements and discussion. Want to stay on top of ONIX implementation issues in a global discussion led by EDItEUR? Join the group here and continue reading.

NÃ¢zÄ±m Hikmet

Marie-AdÃ©laÃ¯de BarthÃ©lemy-Hadot

â€˜Excellently writtenâ€™ â€“ WisÅ‚awa Szymborska

It’s hard to make any sense of text like this. But it’s a sure sign of a mismatched character encoding. In fact, the above should look like this…

Nâzım Hikmet

Marie-Adélaïde Barthélemy-Hadot

‘Excellently written’ – Wisława Szymborska

The three names, of Turkish, French and Polish writers, require some special characters. And those characters can be “encoded” digitally in a variety of ways. Ultimately, for data to look correct, the encoding method needs to match all the way along the data supply chain. But you often see character encoding issues like this, in online bookstores and elsewhere, and a publisher asked about it earlier this week because author names on their own website did not match the names in their internal system.

The publisher’s staff probably know how to make these names look correct in their own system. If that system exports ONIX, the character encoding it uses is declared in the ONIX itself, on the first line of the ONIX message:

<?xml version="1.0" encoding="UTF-8"?>

It needn’t always be UTF-8 (Unicode) — it can be Windows-1252 or a range of other encodings.
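As a minimal sketch of how a recipient might read that declaration before decoding, the Python below looks for the encoding named on the first line. The function names `declared_encoding` and `decode_onix` are hypothetical, and the sketch assumes an ASCII-compatible encoding with no byte order mark; a full XML parser covers more cases.

```python
import re

# Matches the encoding named in an XML declaration such as
# <?xml version="1.0" encoding="UTF-8"?>
_XML_DECL = re.compile(rb'<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']')

def declared_encoding(onix_bytes: bytes, default: str = "utf-8") -> str:
    """Return the encoding named on the first line, or the default if none is given."""
    match = _XML_DECL.match(onix_bytes)
    return match.group(1).decode("ascii") if match else default

def decode_onix(onix_bytes: bytes) -> str:
    """Decode the raw bytes using the encoding the sender declared."""
    return onix_bytes.decode(declared_encoding(onix_bytes))

# Hypothetical fragment sent as Windows-1252 (not a complete ONIX message).
sample = (b'<?xml version="1.0" encoding="windows-1252"?>'
          b'<PersonName>Marie-Ad\xe9la\xefde Barth\xe9lemy-Hadot</PersonName>')
print(declared_encoding(sample))
print(decode_onix(sample))
```

Reading the declaration first means the bytes are interpreted the way the sender intended, whatever encoding the recipient system uses internally.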

And then an ONIX recipient needs to respect that encoding when the ONIX data is imported into another system, for example, a retailer’s system or a website server. The retailer's system or webserver might use that same encoding internally, or they might use a different encoding — and if different, the encoding of the data needs to be explicitly changed into whatever the retail system uses, as it is imported.

The top example here illustrates what happens if the ONIX encoding is not respected as the data is imported. This is data that was encoded using UTF-8 Unicode in the ONIX, but was imported into a retailer system that uses Windows-1252 encoding as if it were already encoded using Windows-1252.
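The mismatch can be reproduced in a few lines of Python. This is only an illustration: `misread_as` is a hypothetical helper, and `cp1252` is Python's codec name for Windows-1252.

```python
def misread_as(text: str, actual: str = "utf-8", assumed: str = "cp1252") -> str:
    """Encode text one way, then decode the bytes as if they used another encoding."""
    return text.encode(actual).decode(assumed)

names = ["Nâzım Hikmet", "Marie-Adélaïde Barthélemy-Hadot", "Wisława Szymborska"]
for name in names:
    # Each non-ASCII character becomes two bytes in UTF-8, and each of
    # those bytes is then shown as a separate Windows-1252 character.
    print(name, "->", misread_as(name))
```

Each accented letter turns into two unrelated characters, which is the pattern seen in the garbled names at the top of this post.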

Conversion from one encoding into another is mostly straightforward for developers.
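For instance, a conversion step might be sketched in Python as below. The helper name `transcode` is an assumption made for illustration, and an import routine converting a whole ONIX file would presumably also need to update the encoding named on the file's first line.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target."""
    return data.decode(source).encode(target)

utf8_bytes = "Marie-Adélaïde Barthélemy-Hadot".encode("utf-8")
cp1252_bytes = transcode(utf8_bytes, "utf-8", "cp1252")

# The text is the same, but the bytes differ: each accented letter takes
# two bytes in UTF-8 and one byte in Windows-1252.
print(len(utf8_bytes), len(cp1252_bytes))
print(cp1252_bytes.decode("cp1252"))
```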

Some encodings are more comprehensive than others. The ONIX can therefore contain a character that cannot be converted into the encoding used by the recipient system, because that encoding has no representation for the character at all. Cyrillic or Arabic characters cannot be encoded using Windows-1252, for example. Unicode is the most universal, and forward-looking developers will mostly be using Unicode internally.
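A recipient could check for such characters before converting. The Python below is a rough sketch, with `unencodable` as a hypothetical helper name. Note that two of the names from the top of this post contain a letter that Windows-1252 cannot represent.

```python
def unencodable(text: str, encoding: str) -> list[str]:
    """List the distinct characters in text that the encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing

for name in ["Marie-Adélaïde Barthélemy-Hadot", "Nâzım Hikmet",
             "Wisława Szymborska", "Толстой"]:
    print(name, unencodable(name, "cp1252"))
```

An empty list means the text converts cleanly. Anything else has to be handled deliberately, for example by substituting a near equivalent or by storing the data as Unicode instead.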

There is an appendix in the ONIX Implementation and Best Practice Guide that covers the question of character encodings in more detail.

You might ask why the ONIX tags themselves, like <Header> or <PersonName>, are not affected by this. That’s because the tags use only ASCII characters, whereas the data within the tags can use any available characters. ASCII is a tiny subset of all the characters available, and for historical reasons almost every encoding method encodes them in the same way. So whichever encoding you are using, <PersonName> will look like <PersonName>, even if Marie-Adélaïde Barthélemy-Hadot ends up looking like Marie-AdÃ©laÃ¯de BarthÃ©lemy-Hadot.
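A short Python check illustrates the point. The `record` string is a made-up fragment rather than a complete ONIX record, and `cp1252` is Python's codec name for Windows-1252.

```python
record = "<PersonName>Marie-Adélaïde Barthélemy-Hadot</PersonName>"
tag = "<PersonName>"

# ASCII-only text produces identical bytes in all three encodings.
same_bytes = tag.encode("ascii") == tag.encode("utf-8") == tag.encode("cp1252")

# Misreading UTF-8 bytes as Windows-1252 damages the name but not the tags.
garbled = record.encode("utf-8").decode("cp1252")

print(same_bytes)
print(garbled)
```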

Graham Bell is Executive Director of EDItEUR, responsible for the overall development of EDItEUR’s standards and the management services it provides on behalf of other standards agencies (including the International ISNI agency and the International DOI Foundation).

He joined EDItEUR as its Chief Data Architect in 2010, focused on the continuing development and application of ONIX for Books, and on other EDItEUR standards for both the book and serials sectors.

Tagged: onix, book metadata best practices
