
BookNet Canada

Graham Bell, EDItEUR
November 17, 2023
ONIX, Standards & Metadata

Checking a problematic ONIX file: Part one


The following post was featured in the international ONIX implementation group — a mailing list that you really should follow for ONIX announcements and discussion. Want to stay on top of ONIX implementation issues in a global discussion led by EDItEUR? Join the group here.

A couple of weeks ago, an EDItEUR member sent us a file they had received from another organization. It wouldn’t validate, so the member had rejected it, but they were unsure exactly what was wrong — so EDItEUR took a look.

In fact, there turned out to be several things wrong, mostly unrelated, any one of which would have caused problems for a data recipient.

1. Loading the file into a web browser — this is a good way of viewing a small ONIX file — caused an error. What was happening was this. On the first line, the ONIX said:

<?xml version="1.0" encoding="UTF-8"?>

So far, so good, but the file was NOT encoded as UTF-8 (Unicode). In fact, the file used the Windows character set. Now, whichever character set you use, all characters are stored or transferred as binary numbers (A = 01000001, B = 01000010, and so on). Certain numbers that are used in the Windows character set are not valid in Unicode UTF-8, and therefore cause an ‘unknown character’ error or even stop the file from loading at all. Changing the beginning of the faulty ONIX file to this allowed it to load properly:

<?xml version="1.0" encoding="Windows-1252"?>

How do you spot the wrong encoding? Sometimes the ONIX file will not load without creating an error. Other times, it loads, but you get obvious character flaws like:

<PersonName>Marie-Adélaà ̄de Barthélemy-Hadot</PersonName>

which clearly should look like this:

<PersonName>Marie-Adélaïde Barthélemy-Hadot</PersonName>

There’s no simple method of checking how the characters have been encoded, beyond a bit of trial and error. UTF-8 and Windows-1252 are the most common in European and North American ONIX, but other encodings might be used: it depends somewhat on the configuration of the computer used to compile the ONIX file, and often on the language used by the data sender.
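The trial and error can be partly automated. The following is a minimal Python sketch, not a definitive detector: the function name `guess_encoding` and the candidate list are assumptions made for illustration. It tries each encoding in turn and reports the first one that decodes the raw bytes without error.

```python
# Candidate encodings to try, strictest first. This list is an assumption;
# extend it for the languages your data senders use.
CANDIDATES = ["utf-8", "windows-1252", "iso-8859-1"]


def guess_encoding(raw: bytes, candidates=CANDIDATES):
    """Return the first candidate that decodes raw without error, or None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding; try the next
        return name
    return None


# Bytes as they would appear in a file saved with the Windows character set.
sample = "<PersonName>Marie-Adélaïde Barthélemy-Hadot</PersonName>".encode("windows-1252")
print(guess_encoding(sample))  # windows-1252
```

A successful decode is only a hint. ISO-8859-1, for instance, accepts any sequence of bytes, so the decoded text still needs a visual check for flaws like the one shown above.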

 

2. Once the ONIX file could be loaded into an XML parser to check it, it was clear the XML was not “well-formed.” What this means is that the XML tags themselves don’t follow the correct <tag>data</tag> structure. EDItEUR uses Oxygen, but any XML parser will reveal these same issues. Here are a couple of examples from the file that illustrate the issue:

<Name>
    <PersonNameType>01</PersonNameType>
    <KeyNames>Voltaire</KeyNames>
</name>

and

<RegionCode)FR</RegionCode>

You can see that the final </name> tag doesn’t match the opening <Name> tag, because XML is case-sensitive. And the <RegionCode> opening tag needs a > character instead of the ). Unless these are fixed, the file doesn’t match the required XML syntax: it’s not well-formed.
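Any XML parser will report these problems. As a minimal sketch, here is the same check using the parser in Python's standard library. The helper name `well_formedness_error` is hypothetical, and the exact wording of the message depends on the underlying parser.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text: str):
    """Return None if xml_text is well-formed, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)  # includes the line and column of the first problem
    return None


# The first faulty snippet from the file, and a corrected copy.
bad = (
    "<Name>\n"
    "    <PersonNameType>01</PersonNameType>\n"
    "    <KeyNames>Voltaire</KeyNames>\n"
    "</name>"
)
good = bad.replace("</name>", "</Name>")

print(well_formedness_error(bad))                           # an error message
print(well_formedness_error(good))                          # None
print(well_formedness_error("<RegionCode)FR</RegionCode>"))  # an error message
```

The parser stops at the first error it finds, so a file with several faults has to be fixed and re-checked one fault at a time.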

 

3. Once the file is well-formed, it can be checked to see if it is valid ONIX. There are various ways to do that (the process of validation) but I won’t go into the details. The first line of the file was:

<ONIXMessage xmlns="http://www.editeur.org/onix/2.1/reference">

which means that the file is intended to be ONIX 2.1 with so-called “long tags.” Ignoring the fact that ONIX 2.1 hasn’t been recommended for use for about a decade, and was declared obsolete by EDItEUR earlier this year, old ONIX 2.1 files must normally include a DOCTYPE declaration like this:

<!DOCTYPE ONIXMessage SYSTEM "http://www.editeur.org/onix/2.1/reference/onix-international.dtd">

But this DOCTYPE declaration can be omitted, by agreement between parties, and an XML namespace declared instead — that’s the xmlns attribute in the ONIXMessage tag. You’d add this namespace attribute in the expectation of validating the ONIX file using an XSD schema instead of a DTD. DTDs need a DOCTYPE, XSDs need an xmlns attribute. And XSDs are very much preferred these days — DTDs are much more limited and run the risk of indicating a file is valid when it is clearly not. This file was validated using the ONIX XSD — and it was not “valid.”

An “invalid” XML file means “this matches the XML syntax requirements to be well-formed, but does not match the extra requirements to be correct ONIX.”
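Before choosing a schema to validate against, it can help to see which namespace the file declares. The following is a minimal sketch using Python's standard library; the helper name `root_namespace` is an assumption for illustration. It only reads the xmlns declaration on the top-level tag and does not validate anything. Validation itself still needs a schema-aware tool.

```python
import xml.etree.ElementTree as ET


def root_namespace(xml_text: str):
    """Return (namespace, local_name) for the root element.

    namespace is None when the root tag declares no default namespace.
    """
    root = ET.fromstring(xml_text)
    # ElementTree writes a namespaced tag as "{namespace}LocalName".
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
        return namespace, local_name
    return None, root.tag


message = '<ONIXMessage xmlns="http://www.editeur.org/onix/2.1/reference"></ONIXMessage>'
print(root_namespace(message))
# ('http://www.editeur.org/onix/2.1/reference', 'ONIXMessage')
```

For the problematic file, the namespace shows that the sender intended ONIX 2.1 with reference ("long") tags, which tells the recipient which XSD to use.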

 

4. XML parsers like Oxygen are used to check that an ONIX file matches the requirements of the ONIX specification. (There are many other parsers you can use; check out our pre-recorded webinar on validation on a Mac and a similar document for Windows. XML Nanny on the Mac or XML Notepad on Windows are both good examples of free or very low-cost validation tools.)

The first validation error in the problematic ONIX file indicated that a <Name> composite (the ONIX 2.1 equivalent of ONIX 3.0’s <AlternativeName>) followed <BiographicalNote>. The ONIX 2.1 Specification requires that it must precede it, and moving <BiographicalNote> down fixed this issue. In all versions of ONIX, the tag order is vital — the tags must be in the order they are documented in the Specification — although contrarily, where a particular tag is repeated, for example using multiple <Contributor> composites when there are multiple contributors, the order of the repeats isn’t particularly significant.
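The ordering rule can be illustrated with a small Python sketch. This is not a substitute for XSD validation. The list `DOCUMENTED_ORDER` is an abbreviated, illustrative subset of the tags inside a <Contributor> composite, and the function name `first_out_of_order` is hypothetical; the real order must be taken from the Specification.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of the tag order inside an ONIX 2.1 <Contributor>.
# Assumption: abbreviated for this example, not the full documented list.
DOCUMENTED_ORDER = ["ContributorRole", "PersonName", "Name", "BiographicalNote"]


def first_out_of_order(parent, documented_order=DOCUMENTED_ORDER):
    """Return the first child tag that follows a later-documented tag, or None."""
    rank = {tag: i for i, tag in enumerate(documented_order)}
    highest = -1
    for child in parent:
        position = rank.get(child.tag)
        if position is None:
            continue  # tag is outside this subset, so it is skipped, not judged
        if position < highest:
            return child.tag
        highest = position  # equal positions (repeated tags) are allowed
    return None


contributor = ET.fromstring(
    "<Contributor>"
    "<ContributorRole>A01</ContributorRole>"
    "<PersonName>François-Marie Arouet</PersonName>"
    "<BiographicalNote>...</BiographicalNote>"
    "<Name><PersonNameType>01</PersonNameType><KeyNames>Voltaire</KeyNames></Name>"
    "</Contributor>"
)
print(first_out_of_order(contributor))  # Name
```

Repeated tags pass the check because only a step backwards in the documented order is reported, which matches the point that the order of repeats is not significant.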

The second validation error was caused by use of <ContributorPlace>. This is in fact an ONIX 3.0 and 3.1 tag, and cannot occur within an ONIX 2.1 file: you can’t mix and match your tags from different versions of the standard. In ONIX 2.1, contributors can only be associated with a single country or region, and 2.1 cannot make use of the different <ContributorPlaceRelator> codes that ONIX 3.0 uses.

Third, the problematic ONIX contained this:

<PublicationDate>20230102</PublicationDate>
<CopyrightYear>2023</CopyrightYear>
<CopyrightOwner>
    <CopyrightOwnerIdentifier>01</CopyrightOwnerIdentifier>
    <PersonName>Graham Bell</PersonName>
</CopyrightOwner>

This appears in roughly the right order within the ONIX, after <PublicationDate> as you can see, but it must be enclosed within a <CopyrightStatement> composite. <CopyrightOwnerIdentifier> is a composite rather than a single tag (a composite is a small group of tags nested within another tag), and so when creating ONIX, you need to take account of the proper nesting of tags inside composites inside bigger composites. This works:

<PublicationDate>20230102</PublicationDate>
<CopyrightStatement>
    <CopyrightYear>2023</CopyrightYear>
    <CopyrightOwner>
        <CopyrightOwnerIdentifier>
            <CopyrightOwnerIDType>16</CopyrightOwnerIDType>
            <IDValue>0000000427566266</IDValue>
        </CopyrightOwnerIdentifier>
        <PersonName>Graham Bell</PersonName>
    </CopyrightOwner>
</CopyrightStatement>

Why did the data sender of the problematic ONIX use <CopyrightOwnerIdentifier> 01? Well, in the codelist for CopyrightOwnerIDType (List 44), 01 means “proprietary,” so I suspect this might have been an attempt to show the copyright was proprietary to (i.e., owned by) the rights holder, but that isn’t what that code means. The correct use of <CopyrightOwnerIDType> is to specify what kind of identifier you are using to identify the rights holder, and you might be using a “proprietary identifier” (a non-standard ID created for internal use by a specific organisation, as opposed to a standard ID like an ISBN or an ISNI). The most commonly used proprietary identifier is probably the ASIN, Amazon’s internal product identifier. In this case, I’ve given the rights holder a 16-digit ISNI instead of a proprietary ID.

There were more issues with what should have been a relatively simple, one-record ONIX file. These will be covered in part two of this post.

Graham Bell is Executive Director of EDItEUR, responsible for the overall development of EDItEUR’s standards and the management services it provides on behalf of other standards agencies (including the International ISNI agency and the International DOI Foundation).

He joined EDItEUR as its Chief Data Architect in 2010, focused on the continuing development and application of ONIX for Books, and on other EDItEUR standards for both the book and serials sectors.


Tagged: onix, book metadata best practices


 

 
