It's serious stuff, metadata and standards, so I can't help but invoke revolutionaries like Lenin and Nikolay Chernyshevsky and cultural icons like George A. Romero and Jean Baudrillard.
To define my terms, starting with "living data": As far as I know, there's a developed consensus that we need metadata that can change and adapt. Business needs change, therefore what's in the metadata, i.e., what describes your business, should change as well. If your business has defined a need to use a standard capable of change, then you've identified a need for "living data."
The reverse simulacra of ONIX is mostly a bad and pretentious pun, but it references ONIX versions 2.1 and 3.0, which share a likeness. Simulacrum is a poor choice of term since neither ONIX version is an imitation of the other, but I'm trying to highlight that 3.0 develops the concepts and implementation first done in 2.1. Normally, a simulacrum implies degradation through imitation and the loss of clarity that results, but by adding "reverse" I want to convey that ONIX 3.0 is more focused and, dare I say, more capable of describing what's real than its predecessor. So, mostly, it's a bad pun but it allows me an explanation that contains a truth. Some of my recent blog posts have tried to address that truth by looking at specific changes that are genuine improvements.
What is to be done, then?
Accept that your business needs change and that in order to adapt to those changes you will have an ongoing need to develop your business's metadata.
Efficiency in metadata is best defined by following a standard. For senders, that means knowing that data expressed in a particular way will produce a particular result. For recipients, it means using and displaying data in a predictable way when it arrives, because both sides' expectations are defined by the same standard.
Publishers (let's say they're all in the UK) who are using the book subtitle space for promotional copy are responding to the lack of promotional appeal of on-line display;
Retailers who are seeing the discoverability of products ruined by indexing ludicrous terms from those subtitles;
both have the same option:
Implement the Promotional Header provided by the ONIX standard to fulfill publishers' legitimate needs while protecting retailer indexing and processing needs. As an added bonus, not mucking around with a data field intended to match the actual book will make librarians a lot happier.
Both publishers and retailers would need to change their current practices. Both would need to live up to their theoretical desire for living data by implementing new practices. The point of my pretentious pun is to use icons and terms from political and cultural theory to emphasize that this is NOT A THEORETICAL POINT. Praxis, meaning your actual data practice, is the answer.
How do we know what we should do? It's a big standard.
Use the standards community to implement change systematically. Both sides have to agree to talk. Both sides have to agree to change.
Neither can wait for the other to act. Both have a responsibility to act.
You have to demonstrate an ROI for metadata!
Define any ongoing business need that requires regular updating whose ROI can be defined in terms of a current publishing season or year. Please — add it to the comments.
It's not a single implementation; it's an ongoing need and one that blockchain will not help. Poorly implemented metadata securely distributed across multiple platforms will remain bad metadata. Same if the metadata is distributed in an app using JSON. Wait! You could exclude metadata that doesn't meet certain parameters so your app only distributes good metadata but there's another piece of the puzzle.
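The idea of excluding metadata that doesn't meet certain parameters can be sketched in a few lines. This is only an illustration: the field names (isbn, title, sales_rights) and the rules themselves are hypothetical, chosen for the example and not taken from ONIX or any real feed specification.

```python
import json

# Hypothetical minimum requirements; a real feed would define its own.
REQUIRED_FIELDS = ("isbn", "title", "sales_rights")


def problems(record: dict) -> list[str]:
    """Return the reasons a record fails the (assumed) quality gate."""
    found = [f"missing {field}" for field in REQUIRED_FIELDS if not record.get(field)]
    isbn = str(record.get("isbn", ""))
    if isbn and not (len(isbn) == 13 and isbn.isdigit()):
        found.append("isbn is not 13 digits")
    return found


def split_feed(raw_json: str) -> tuple[list[dict], list[dict]]:
    """Separate records that pass the gate from those that do not."""
    accepted, rejected = [], []
    for record in json.loads(raw_json):
        (rejected if problems(record) else accepted).append(record)
    return accepted, rejected
```

A gate like this controls what one app distributes. It does nothing about what anyone downstream is willing to accept.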
Virtually all metadata, no matter how badly done, is better than no metadata if there are sales to be made. If a buyer has decided a product can make money and has placed an order on that basis, then a retailer will do what it takes to sell the product.
And there squats a sad truth: There's no practical way to enforce good metadata as long as any metadata will be accepted.
But let's unpack that: Any metadata for a product that will sell will be accepted regardless of its quality but that doesn't describe most books.... Maybe the ROI of metadata can be defined for books that don't sell. Maybe there's an unacceptable level of metadata quality for generic and unknown products.
I'm not denying that there are costs to producing good metadata but you invest in and derive a return from a product, not from its description. Processing efficiency — getting the right data in place for the least cost to maximize the opportunities for sales — is a more sensible way to analyze metadata than the ROI of... well what?
We can't afford to change. Isn't there an alternative?
Do you accept the need for living data or not? Okay, I can envision that the industry might develop standard interpretations. How might that work?
Territory statements for sales rights and marketing areas are often poorly presented. It's important because retailers want to take in data from more than one market for a variety of reasons. They may be selling in more than one market, or want to develop the option to, or it may be a way to integrate more information into their records to enhance their ability to sell. Each scenario requires that they load data with an understanding of which market this data represents.
There are still distributors and data suppliers in North America and the UK who provide what I call "local" files. That's data where who sent it, and to whom, supplies part of the definition of the market areas it serves. The information isn't embedded in the metadata. Any company other than the publisher that uses the publisher's sales rights to represent its own rights is producing a "local" file. So is a US company supplying metadata to Canada that states the ISBN is offered for sale only in Canada when the same ISBN is for sale in other markets.
Data like that has to be interpreted properly. The sender expects the receiver to respect their intent, but they're not supporting the receiver's need to integrate that data with other sources. Standard industry interpretations could provide a solution: we agree on processing rules for incomplete files or typical poor practices. We enhance the predictability of processing by agreeing on how we will process poorly produced metadata, i.e., what we will add to it. We, as an industry, make side deals. To solve problems where dates appear from other markets, we could propose that wherever publication-date is after on-sale-date, on-sale-date is to be understood as publication-date.
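As a rough sketch, the proposed date rule comes down to a single comparison. The function name is made up for illustration, and leaving a record untouched when either date is missing is my assumption, since the proposal says nothing about that case.

```python
from datetime import date
from typing import Optional


def effective_publication_date(
    publication_date: Optional[date], on_sale_date: Optional[date]
) -> Optional[date]:
    """Apply the proposed rule: a publication date later than the on-sale
    date is read as the on-sale date. Otherwise the stated date stands."""
    # Assumption: with either date missing, nothing is reinterpreted.
    if publication_date is None or on_sale_date is None:
        return publication_date
    if publication_date > on_sale_date:
        return on_sale_date
    return publication_date
```

The code is trivial. The hard part is getting every sender and receiver to agree that this is the rule.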
We do it already without talking about it. Sticking to Sales and Marketing rights data, I've noted an improvement over the past couple of years in that I see more complete rights statements. However, it's being done using country lists in every segment. For example, if you're selling the book in CA and US you list every other country in the world as "not for sale." Being a nerd, I've sampled these now and then, and even at that cursory level of testing I've found out-of-date and redundant coding, missing countries, and contradictory entries. But the industry seems to have reached a consensus that we won't use the simplicity and readability the standard offers, which is to minimize code list entries by referencing "WORLD." Instead we use statements only a computer can read, but apparently don't use computers to proof and update the results.
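Proofing those lists by computer doesn't take much. The sketch below checks for the four kinds of problems mentioned above. The country sets are deliberately tiny stand-ins, the function names are hypothetical, and the compact statement is plain text to show the idea, not ONIX syntax.

```python
# Illustrative stand-ins only: a real check would load the full, current
# ISO 3166-1 country list rather than these hand-picked samples.
KNOWN_COUNTRIES = {"CA", "US", "GB", "AU", "NZ", "IE", "ZA", "IN"}
WITHDRAWN_CODES = {"AN", "CS", "YU"}  # codes retired from ISO 3166-1


def proof_rights(for_sale: list[str], not_for_sale: list[str]) -> dict[str, list[str]]:
    """Report the kinds of problems found when sampling long country lists."""
    sold, unsold = set(for_sale), set(not_for_sale)
    # A code is redundant when it is repeated within the same list.
    repeated = {c for lst in (for_sale, not_for_sale) for c in lst if lst.count(c) > 1}
    return {
        "contradictory": sorted(sold & unsold),
        "redundant": sorted(repeated),
        "out_of_date": sorted((sold | unsold) & WITHDRAWN_CODES),
        "missing": sorted(KNOWN_COUNTRIES - sold - unsold),
    }


def compact_statement(for_sale: list[str]) -> str:
    """State the intent once and cover everything else with one
    rest-of-world clause instead of listing every other country."""
    return f"for sale: {' '.join(sorted(set(for_sale)))}; not for sale: rest of WORLD"
```

The compact form has nothing to go out of date, because there is no long list to maintain.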
Does it matter? After all, most sales come from an incredibly small subset of countries. It does if we want retailers to interpret the data we provide. It does if you're a company that has identified higher growth coming from atypical foreign sales.
Decisions are being made — we just haven't decided to make them.
I think anyone who spends any amount of time thinking about how this might work will realize that implementing a well-documented, fully thought-out standard costs less than developing a series of protocols that sustain current poor data practices by standardizing how they are interpreted. Do I need to point out that doing this would still force both sides, sender and receiver, to change their processing to accommodate the standardized interpretation of the practices they want to maintain?
I do think we should talk more about how we interpret data. And part of using the standard well is knowing that presentation X gets results Y. That's not something that happens in isolation.
The missing link
Communication from end users back to data producers is the missing link. "Good" metadata can't exist in a vacuum: if no one tells a sender about a problem, then they assume there is none. Nor should a retailer who displays or uses data poorly be exempt from feedback. I don't have a solution; fully documenting the problems that prevent a file from loading adds about twice the time it takes to fix and load the file. But feedback, in all directions, is an important part of the solution. Currently metadata travels in a single direction and that needs to change.
Imagine if your incoherent Sales Rights statement was returned by a retailer with the note "interpreted as providing us access to full world sales." And you responded with a correction or updated metadata that you knew would correct the issue. Would that help?
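To make that concrete, here is a minimal sketch of what such a returned note might contain. Everything in it is hypothetical: the RightsFeedback structure, the function name, and the fallback to "full world sales," which is simply the interpretation imagined above and not a recommended policy.

```python
from dataclasses import dataclass


@dataclass
class RightsFeedback:
    """A hypothetical message a retailer could send back to a data sender."""
    isbn: str
    interpreted_as: str
    reason: str


def interpret_rights(isbn: str, for_sale: set[str], not_for_sale: set[str]) -> RightsFeedback:
    """Say how the statement was read, and why, instead of applying the
    reading silently."""
    conflicts = sorted(for_sale & not_for_sale)
    if conflicts:
        return RightsFeedback(
            isbn, "full world sales", "countries listed both ways: " + " ".join(conflicts)
        )
    if not for_sale and not not_for_sale:
        return RightsFeedback(isbn, "full world sales", "no sales rights supplied")
    return RightsFeedback(isbn, "as stated", "statement is consistent")
```

The useful part is the reason field: it tells the sender exactly what to fix.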