Podcast: Best practices for book classification

In this month's episode, we ask our Bibliographic Manager, Tom Richardson, to ruminate on how subject metadata can be optimized to better serve the book industry. What's the best subject code to market your book? How many subject codes should you use? Why does it even matter? Strap in for 20 minutes of subject codes, meandering, and lots of telling it like it is.

Looking for the keywords white paper mentioned in the episode? Find it right here: www.bisg.org/best-practices-keywords-metadata.

(Scroll down for a transcript of the conversation.)

Want to make sure you never miss an episode of the podcast? You can subscribe for free on iTunes, Stitcher, Pocket Casts, TuneIn, or SoundCloud.

Transcript

Krista Mitchell: Hello, everyone. And welcome to BookNet's podcast. I'm Krista Mitchell, the Marketing Associate here at BookNet Canada. In the latter part of 2015, BISG, backed by extensive stakeholder feedback, found a demonstrated need to give YA and teen content its own BISAC classifications, separate from Juvenile. With the 2015 code list, two new top-level trees have been added for Young Adult Fiction and Young Adult Non-Fiction to address young adult audiences. Up until 2014, such content would largely be classified under a limited set of Juvenile Fiction and Juvenile Non-Fiction headings. YA and YA non-fiction now have much larger sets of classification codes, up to 446 new codes actually. All in all, it's definitely a change for the better. To talk about new codes and classification overall, Tom Richardson, BookNet's own keeper of the data, will be joining me today. Let's get into it. I don't know if you wanna introduce yourself or if you want me to introduce you.

Tom Richardson: Just don't call me a rockstar, please.

Krista: Was that the worst? Okay. I'll avoid...

Tom: Whatever you like to do, I mean, I can introduce myself as necessary.

Krista: We'll do one where you introduce yourself, and then we'll see how you feel about it.

Tom: I will say I'm Tom Richardson, BookNet Canada's Bibliographic Manager.

Krista: Perfect. So today we're going to talk about the YA BISAC codes which are new. So how many codes did they introduce? Do you remember?

Tom: Not exactly, 460 or something like that.

Krista: And how does that work when they...they just directly mapped them from the old Juvenile codes, or did they come up with brand new ones for the YA?

Tom: Largely, they are the old Juvenile codes basically made into YA ones, but, you know, there's clearly some adjustments. I mean, some codes don't exist. You know, toilet training has not been moved into YA and...

Krista: It's not relevant.

Tom: And then there were some Juvenile codes that were simply removed, you know, things like pregnancy, that level of stuff just went into YA, that type of thing. So, I mean, it's not 100% on anything, and I think there's probably a few unique codes been added, why wouldn't they?

Krista: There's one that I thought was, like, particularly interesting, which was the YA code for magic realism. Like, they don't seem to have a Juvenile code for magic realism. And I'm wondering why that is, I guess, in Juvenile magic realism would just be fantasy.

Tom: Do you really think that, you know, like, people under 12 are reading magic realism in a way...

Krista: Maybe. They might be.

Tom: Well, exactly. I mean, like, magic realism strikes me as being a very Adult, like breakout. Now, it probably makes sense to add it to Young Adult. And I don't think it probably has any more meaning to Juvenile than a Juvenile thing on concepts where you're talking about books about red and colours and numbers would have for YA. I mean, you know, there are conceptually... Okay. Dystopian fiction, YA, magic realism, same.

Krista: Right. So when the committee was deciding to break out these YA codes, they were basing them mostly on the age ranges.

Tom: Well, okay. Use is gonna prove the point. Okay. There is some sort of, like, thing where there's a recommendation for a specific age range. But, you know, conceptually, it's not that difficult to imagine that there is a book for 14-year-olds that might really still be considered Juvenile without it really being made into a YA classification. Age range is very approximate. Juvenile books are famous for everything being bits and pieces. They're just very hard to classify. I mean, you have students who are, like, reading well above their age, and generally speaking, those students are conceptually above their age as well. And that's why they're reading above level. And then there are, you know, 17-year-olds who read at, you know, the grade four or five level. Now, they are probably conceptually closer to their own age than they are to their reading level. You know, so you would have YA books designed for readers at a reading level that's well below YA, and you may have, you know, like, Juvenile books designed for a reading level well above their age range, you know, so that the concepts could be held down to an appropriate level, but the reading and the words might be up to the right level.

Krista: Right. So, like, one of the cases that I'm most interested in discovering is a series like "Harry Potter," where you start off solidly in middle grade and move into young adult concepts. How would the "Harry Potter" books be classified? Would their codes change as the books aged up, or would they all be classified as one series with one code?

Tom: Haven't a clue.

Krista: Haven't a clue?

Tom: Exactly. I mean, it would be however it makes the most sense. Okay. Going back to first principles...

Krista: Right. A good place to start.

Tom: ...is in general... I mean, not in general, specifically, BISAC requests that every book be assigned a main subject. And the main subject, you know, like, should not be violated. So, if you classify the main subject as Juvenile, it is a Juvenile book. If it's a YA book, it's a YA book. And if it's an Adult book, it's an Adult book. And you shouldn't mix those up. So you shouldn't have secondary codes from different areas, and things like that. And the logic behind that is, well, multifold. One part of the logic would be, if you approach a retailer, they have a buyer. The buyers are normally broken up by subject and age, and that type of classification. So when you start crossing those boundaries, it's like you're trying to present the book to multiple buyers. And generally speaking, retailers only want one buyer to have the book represented to them. So that simplifies things. The other part of it is equally like that, which is what you do not want. Okay. It's perfectly reasonable for an intelligent publisher to be able to classify a book that is both Adult and YA and do it coherently and reasonably. However, experience in the industry shows that the number of people who do it badly outnumbers the ones who do it...doing it well.

Krista: What's an example of a way that it's done poorly versus a way that it could be done correctly?

Tom: No, I mean, the solution is that...

Krista: Don't do it at all.

Tom: ...don't do it at all, right? Is that it is far, far better to provide a very specific code to the book and only to sort of like segment it into like one spot and then to rely on the fact that other professionals actually deal with this material and will do the appropriate thing with it at, you know...

Krista: So reaching out to an audience that the book might not be intended for, like crossing audiences should be done from a marketing perspective and not a data perspective.

Tom: Well, okay. Yeah, exactly. And that's kind of one of the main points is that different parts of the metadata have a different intended audience. And while BISAC codes are arguably an important part of marketing, they are really intended for the professional industry. They're intended to be used by librarians to classify the book. They're intended to be used by booksellers to classify the book. So they are the area within the metadata that is most likely to be changed by the end-user before it's presented to the consumer. And the reason for that is the retailers and librarians and everybody else reserves the right to understand their market directly. And they are catering to specific markets and may have legitimate disagreements with the publisher on things or may not.

But, I mean, it's one of the aggravations of publishers that people run around changing these codes, but there's reasons they're being changed. So you should understand that these are really being used by other professionals and orientate yourself towards them. So, again, going back to why you wouldn't want to classify it across multiple, sort of, like, levels: it's because a professional will look at this and...you know, in the case of "Harry Potter," how do they deal with the fact that the early books are, like, maybe for a younger age than the later books? Well, they market them within their domain properly. They're not idiots. And they don't need to be guided all that much. There are other areas within the metadata where you can provide guidance. But the area where you really need to be thinking that you're providing information to professionals is audience range, age range, interest range, complexity, which is basically reading level, and subject. Those are the areas that are intended primarily to be used by professionals for the classification of the book, and not specifically oriented towards the consumer. Now, keywords...

Krista: Keywords.

Tom: Keywords would be a spot where, you know, like, it can go a little bit crazier, and you can provide a list of, like, how you perceive the consumer as having an entry point to your book. So that's like kind of the bridge between like the professional side and the other side. And keywords should be used as, you know, terms that a consumer is gonna go onto this site and search for a book.

Krista: Right. So the keywords and the subject codes are utilized wildly differently, but should still overlap in some ways?

Tom: Well, okay. I mean, you got a book. The book's got a subject. The book has an audience. I mean, all these things are things that people have to know about. I mean, the editor is working to produce...I mean, particularly in the Juvenile, YA things, "You're producing a book for a specific audience and age range and reading level and all these things." And if you don't know what those things are, then you probably shouldn't be producing a professionally done YA Juvenile book. So, you know, like, these are all part and parcel of how it's done. So, you know, it's not like an author necessarily has to, like, know absolutely what they're doing in these things, but authors should be writing for those sorts of things. And their editor should be guiding them and helping them in that prospect. The marketing should be, like, geared to it as well.

Yeah, exactly. The book's professional metadata on these areas should be specific, clear, and unambiguous because it should all be matching to the intent of the book, going back to the editor. It has to be taken seriously. And it's probably not something best left to somebody who doesn't know anything more than the catalogue copy of the book. It probably shouldn't be left to an intern. And you know, like, people within the organisation should really be taking these primary points kind of seriously. I mean, on one level, the answer to the problem of "BISAC doesn't have the right code for my book," or "Thema doesn't have the right code for my book," or otherwise, should be answered by, "Are there actually enough books in the supply chain to make this worth highlighting for a retailer?" If the answer is yes, then someone should be contacting their local people who would be responsible for this thing to actually get the right code into BISAC. So, if you're an editor and you're frustrated by the BISAC codes, it's your lack of involvement in the supply chain that actually creates the problem. So, you know, contact BISG or BookNet Canada, and see what you can do about getting the codes right. Okay. So that's...

Krista: Step one.

Tom: Step one is paying attention to what you're doing and regarding it as being a reason for your involvement. Now, looking at it from the point of view of BISG and, like, the setting of the BISAC codes, and to...some degree, the same holds for Thema, the question is: what do you want as a goal for subject headings for retailers? Well, first of all, it has to be relatively friendly and not for expert use, which basically means, like, the librarian market actually provides training to librarians as to how to use the subject classification systems. I'm sorry, Seneca College doesn't. Okay? This is not being used by experts. So I'm sort of contradicting myself by saying that it's a non-expert system when we want it to be used professionally, but, you know, expert, professional, there's some differences in intent here.

So what you want to have is something that's relatively simple that doesn't have like an infinite capacity for subject headings. That should be what is selling today at any given time. That's what a retailer-driven system is trying to do. So it should more or less, well, metaphorically speaking fit the shelves of a bookstore or the online subject things of it. If it's too granular, it's no good. And if it's too broad, people can't find the book that easily. So each year BISG culls the listing of codes and adds some. And generally speaking, about 50 codes a year kind of change in terms of new ones. There's probably a similar number dropped. So maybe 100 codes a year kind of get updated. Not that much. This year is exceptional because, you know, like, the list of changes is like 550.

Krista: That's right. And it's like a whole new subject category.

Tom: Right. So, I mean, this is a whole subject. It doesn't usually work that much. Why is it being done? It's straightforward enough. We were making money selling YA books. Therefore, we needed a subject system that was appropriate to the money we're making.

Krista: Why do you think that didn't happen when New Adult really boomed at first? Because for a long time, it seemed like New Adult was the thing, like you were selling like crazy at first. They've sort of tapered off and levelled out a little bit now, but...

Tom: Well, I would disagree that the capacity wasn't there. I mean, there has always been a specific audience code for YA. YA, for 20 years, has had its own code as an audience code. It was perfectly possible to use a Juvenile code with a YA audience code and have pretty much exactly what BISG has developed, which is the Juvenile codes largely transferred over to the YA thing, except that people actually didn't do the work in order to use consistent audience codes to enable that to actually work. So, actually, what this represents is a workaround for the fact that people don't support audience codes properly, so that they're being forced to use a subject code to circumvent the audience code, which they haven't been using properly.

Krista: In a way, it just sort of like adds clarification to the whole system.

Tom: Well, used properly. I mean, one of the main sorts of points that one might wanna make here is if your audience code and your new YA codes don't match, you've just contradicted yourself. It's stupid. And getting back to how a professional would, like, use your data, a librarian will go up and say, "I need YA books." They'll go to the audience code and say, "They have a code for YA. I'll do a search based on it." That search might be done two ways. They might do, "Gimme all the codes that match it," or depending upon what they're actually looking for, they might say, "Show me everything but something else." Right? And because there are two ways of doing it, that's another reason why you don't want to start mixing your codes up. Because if you provide two codes for any particular book and someone uses that alternate way of, you know, deselecting, like removing...you actually wind up being lost from two lists instead of one. Right?
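The two search styles Tom describes, and the way a book with mixed codes drops out of more than one exclusion list, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the function names, and the sample titles are hypothetical, and the subject codes are illustrative values. The audience value "03" is the ONIX audience code for young adult, and YAF and YAN are the prefixes of the two new BISAC Young Adult trees.

```python
# Simplified, hypothetical records; real ONIX feeds carry far more detail.
YA_AUDIENCE = "03"            # ONIX audience code value for young adult
YA_PREFIXES = ("YAF", "YAN")  # BISAC Young Adult Fiction / Nonfiction trees

BOOKS = [
    {"title": "A", "audience": "03", "subjects": ["YAF019000"]},
    {"title": "B", "audience": "02", "subjects": ["YAF019000"]},               # juvenile audience, YA subject
    {"title": "C", "audience": "03", "subjects": ["YAF019000", "FIC009000"]},  # YA and adult subjects mixed
    {"title": "D", "audience": "01", "subjects": ["FIC009000"]},
]


def has_prefix(book, prefixes):
    """True if any subject code on the book starts with one of the prefixes."""
    return any(code.startswith(prefixes) for code in book["subjects"])


def contradicts_itself(book):
    """True when the audience code and the subject codes disagree about YA."""
    return (book["audience"] == YA_AUDIENCE) != has_prefix(book, YA_PREFIXES)


def include_search(books, prefixes):
    """'Gimme all the codes that match it.'"""
    return [b["title"] for b in books if has_prefix(b, prefixes)]


def exclude_search(books, prefixes):
    """'Show me everything but something else.'"""
    return [b["title"] for b in books if not has_prefix(b, prefixes)]


if __name__ == "__main__":
    print([b["title"] for b in BOOKS if contradicts_itself(b)])  # ['B']
    print(exclude_search(BOOKS, YA_PREFIXES))                    # ['D']
    print(exclude_search(BOOKS, ("FIC",)))                       # ['A', 'B']
```

Book C, which carries both a YA and an adult fiction code, is missing from both exclusion results, which is the "lost from two lists instead of one" effect. Book B is the contradiction case: a YA subject code paired with a juvenile audience code.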

Krista: Right.

Tom: Okay. So, again, there's other...you know, if you look at how things are being used, then you kind of have to sit down and think through what you're creating. I can give a concrete example. Somebody asked BookNet to produce a list of business books. So I can tell you that if you take the active list of business books that we have in BiblioShare right now, simply by subject category, going by the main subject category, there are about 50,000 active titles, or slightly under. Of those books, approximately 8,000 of them had both a...there's two ways in an ONIX 2.1 file you can provide that piece of data, right? Okay. One of them uses the main subject composite, which is the new way, and then there's an old way, which is just the simple way that most people use.

Okay. So some people provide both pieces of data. Okay. If you eliminated the matches because that would be how you'd want to see it, it's only one main subject, so, therefore, they should match. If you eliminate the matches, you were still left with 2,000 books that provided different main subject codes. So they provided two main subjects within their thing, depending upon main and the old version, new version. Who knows what they were meaning? Who knows why they did that? But they did. So that's kind of, like, really confusing. Because I understood the data well enough, I then carefully actually took the list for the additional subjects. And before I, you know, matched them onto the data that I was creating, I actually sat down and I took the list that I had already created and eliminated those codes from the list. So I didn't really keep track of the numbers. But, you know, probably I eliminated another third of the data as being duplication. You know, basically saying that none of that should be there.
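The clean-up Tom describes can be sketched roughly as follows. This is not his actual process, just a minimal Python illustration under stated assumptions: the field names `basic_main_subject` and `main_subject_composite` are hypothetical stand-ins for the older simple way and the newer composite way of supplying a main subject in ONIX 2.1, and the records and subject codes are made up for the example.

```python
# Hypothetical, flattened records standing in for parsed ONIX 2.1 data.
RECORDS = [
    {"id": "rec-1", "basic_main_subject": "BUS071000",
     "main_subject_composite": "BUS071000",
     "additional_subjects": ["BUS071000", "BUS041000"]},
    {"id": "rec-2", "basic_main_subject": "BUS071000",
     "main_subject_composite": "SEL027000",
     "additional_subjects": ["SEL027000", "BUS041000", "BUS041000"]},
    {"id": "rec-3", "basic_main_subject": "BUS000000",
     "main_subject_composite": None,
     "additional_subjects": []},
]


def main_subject_status(record):
    """Classify how the two main-subject fields relate to each other."""
    basic = record.get("basic_main_subject")
    composite = record.get("main_subject_composite")
    if basic and composite:
        return "match" if basic == composite else "conflict"
    if basic or composite:
        return "single"
    return "missing"


def dedupe_additional(record):
    """Drop additional subjects that repeat a main subject or each other."""
    seen = {record.get("basic_main_subject"), record.get("main_subject_composite")}
    kept = []
    for code in record.get("additional_subjects", []):
        if code not in seen:
            seen.add(code)
            kept.append(code)
    return kept


if __name__ == "__main__":
    for rec in RECORDS:
        print(rec["id"], main_subject_status(rec), dedupe_additional(rec))
```

In this toy data, the second record is the kind Tom calls confusing: it supplies two different main subjects, so a reader of the data cannot tell which one the publisher meant.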

Krista: Yeah. And this way, with better classification tools, you get more accurate results, and retailers can better put your books in the hands of the intended audience, which is really why this change was made to begin with.

Tom: But again, yeah, a librarian doing that type of activity, and librarians do that type of activity, would sit down and basically about this point say, "The data is crap."

Krista: "I have no idea what's going on anymore."

Tom: And they would start basically refusing to do it. And they would then go back to their own databases, their own data sets, which are clean, coherent, and usable to their needs, which is... I mean, we've had real reluctance by the library community to really use ONIX, and there's been a level of education involved in getting them trained up to it so that they could actually take advantage of all of the depth. They want the depth the publishers are providing. They want the consumer information that the publisher is providing, but they are driven crazy by the fact that you can't just search by audience code and get your YA list. You can't do it this way. If you do it that way, you'll find contradictions of... At a webinar, just recently, I read an email from a retailer who was doing the same thing. They were aggregating data from multiple sources and trying to isolate the YA books because they wanted to match them onto the new YA codes.

Guy's a little crazy in some ways because I don't know why a retailer would be doing this. Why not wait for the publisher to do it for you? But okay, I'll leave that aside. And he had exactly the same problem. The data he had wasn't usable for the purpose. Why do you need to pay attention? It's just, you know, you're creating a coherent thing: audience, subject, range values, complexity. I mean, complexity, in particular, needs to be sort of like, for instance, a bought reading level, Lexile...there's another one which I'm always forgetting, Fonthill. But, yeah, the reading level is, in a sense, bought from professional sources precisely so that people know this. Now, whether or not that's worthwhile for a given publisher, for a given book, is like a separate thing. That's an expense.

So you don't want to do it stupidly. But, you know, if it's there, it should be a good quality one that is usable because you're providing a reading level that should be used by the professionals. The age ranges, interest ranges, grade ranges, tend to be more approximate, but unless they're very specific and things, they are completely and utterly useless to the professional market to make any sense of it. You know, you get past three years of a range and it's clearly just wrong. I mean, it can't be, the book can't be written out that well, not unless there's a coherent sort of system where, say, you know, the reading level is set very low, and the age level's set very high, and the complexity level then supports it, where you can use a very well-designed book.
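Tom's rule of thumb about age ranges is easy to turn into a quick data check. The Python sketch below is only an illustration: the three-year threshold comes from the conversation and is not an ONIX or BISAC rule, and the function and parameter names are hypothetical. It also ignores the exception Tom mentions, where a low reading level paired with a high age level is deliberate.

```python
MAX_SPAN_YEARS = 3  # rule of thumb from the conversation, not a standard


def age_span_too_wide(age_from, age_to, max_span=MAX_SPAN_YEARS):
    """Flag an age range that spans more than max_span years."""
    if age_from > age_to:
        raise ValueError("age_from must not be greater than age_to")
    return (age_to - age_from) > max_span


if __name__ == "__main__":
    print(age_span_too_wide(12, 15))  # False: a three-year range passes
    print(age_span_too_wide(8, 16))   # True: an eight-year range is flagged
```

A flagged range is a prompt to look at the record again, not proof that it is wrong.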

Krista: But there are publishers who do publish those sort of special readers for older ages that have sort of lower reading level, those exist.

Tom: Right. Or you might be trying to highlight the fact that, you know, like, there are Adult books that reach down into Young Adult. That's different from that Young Adult book that we just opened into the Adult market. So, you know, an Adult book that has legs for Young Adults is a different thing than a Young Adult book that, you know...so you can kind of start seeing the subtleties of that. And other professionals in the industry can make sense of that level, but only if you produce the data in a usable format for them.

Krista: Right. So, overall, I think what we're making a case for here is like richer, more robust data coming in, and the new subject codes assist with that, but you have to have sort of all of the pieces together.

Tom: Right. And conveniently enough, BISG did a white paper on keywords, which actually put all the pieces together. So, keywords, which is its primary focus, was a slight something, but conveniently enough, it mentioned the fact that, you know, the subject codes were in there, the audience codes were in there, all these other things, and linked all the pieces together. So you could read about how to create good quality keywords and see how all the various parts fit together in this one lovely little white paper, you know, about 15 pages long, lots of white space, not too bad as these things go, very readable.

Krista: We can link that in the description for the podcast. Actually, we'll just link that down below. And if people are interested in taking a look at that, they can certainly go right from this podcast to check it out.

Tom: Are we done?

Krista: I think we're done. That was pretty good. That was like all... That about does it for the podcast. Thank you for joining me today, Tom. To learn more about what we do at BookNet, you can find us at booknetcanada.ca. We gratefully acknowledge the financial support of the Government of Canada through the Canada Book Fund. And, of course, thanks to you for listening. We'll see you next month.