This month’s episode features a conversation with Sophie Wu, a Master's candidate in Digital Humanities at McGill University, about AI and one of its potential applications in the publishing industry: storytelling analysis.
(Scroll down for the transcript.)
Want to make sure you never miss an episode of the podcast? You can subscribe for free on Spotify, iTunes, Pocket Casts, TuneIn, or SoundCloud.
Further Reading/Listening
Tech Forum presentations from current and former .txtLAB members:
Tech Forum sessions on AI:
Transcript
Nataly Alarcón: Hello, everyone, and welcome to the BookNet Canada podcast. I am Nataly Alarcón, your host for this episode. AI has been here for a while, and it's not going away anytime soon. At BookNet, we have been cautiously thinking and reflecting on the opportunities, challenges, and potential applications of this technology in the book industry. As part of our Tech Forum programming, we have invited industry experts to talk about the good, the bad, the ugly, and the unknown. Leaning into the opportunities side of the conversation, we have invited Sophie Wu, a Master's candidate in Digital Humanities at McGill University, to chat with us about her latest research on storytelling analysis using AI.
Welcome, Sophie. Let's start by sharing a bit of context with our listeners. So, can you please introduce yourself and tell us a little bit about .txtlab and how the research we're about to discuss came to be?
Sophie Wu: Yeah, for sure. So, I'm currently doing my MA in Digital Humanities at McGill. And for anyone who has never heard of Digital Humanities before (that's most people that I talk to), it's this umbrella term for any sort of work that combines anything computational, so anything digital, with anything in the humanities. So, as you can imagine, this is pretty broad. There's lots of people who do very different things. But my corner of this field is looking at how data and computational methods can be used to understand culture.
And specifically, the tagline for my thesis and the research that I'm involved with is that I use AI to study storytelling and to study stories. And where this came from is I've always been a pretty big reader. I love books, so I'm really excited to talk more about the publishing angle on this research today too. And when ChatGPT came out in the last year of my undergrad, I just felt a lot of, like, "Whoa, it's crazy that computers can read now." Obviously, the way that they "read" is super different from how humans read, but it's still really fascinating, and I wanted to learn more about how they do this and what we can do with it.
So, I applied to my programme at McGill, and I'm now working with my supervisor, Prof. Andrew Piper, who runs .txtlab. So, that's a text analytics group at McGill that focuses a lot on this digital study and computational and data-driven study of culture. And we focus a lot on storytelling because we work with a lot of text, as the name implies. And there's been a really big explosion of new research possibilities in this field with AI. So, that's what I'm going to talk about today.
Nataly: That's very exciting. Some of our listeners may remember Andrew; he has presented at Tech Forum before. We will make sure to share the links to his presentations, as well as to presentations from some of his other students, which are also very research-driven. But back to you, Sophie. So, when you tell someone at a party what you are researching, how do you explain it in simple terms?
Sophie: Yeah, so I usually expand on the "I use AI to study stories" tagline like this. I start with explaining that I use a specific kind of AI. So, I use large language models, which are ... If you've interacted with ChatGPT before or any sort of AI chatbot where you can give it text and it'll write text back to you, that was probably a large language model. So, I work with the kind of AI that can "read" language.
And for the rest of this podcast, I will use AI as a stand-in for this. But whenever I talk about AI, I will be talking about large language models because that's the kind of AI that I deal with. And using these large language models, I basically have them scan lots and lots of different texts. So, humans can only read so fast. You can only read so many books in a month, a year. But an AI can scan hundreds, thousands, millions of texts, even in multiple languages, really, really quickly.
So, with that, we can then annotate a lot of text at the same time. So, an example of this is if I have a data set of a lot of different stories. So, as an example, maybe I have a lot of different summaries of novels on Wikipedia. I can get an AI to annotate all the stories and to answer questions that I have about the stories. Like, I could say, "Hey, ChatGPT, can you read this story thoroughly and then tell me if there's a female protagonist in the story? Answer one for yes and zero for no." And the result is if I do that with all my stories, then I get a super big data set where I can then count, "Hey, how many of these stories actually have female protagonists? Have we gotten more stories with female protagonists over time? Are they more common in certain genres?" Things like that. And then that way, we can actually study culture really at large and using a lot of data to answer a lot of interesting questions.
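For readers who want to picture the mechanics, here is a minimal sketch of the annotation loop Sophie describes. It is illustrative only, not .txtLAB's actual pipeline: ask_llm is a hypothetical stand-in for whatever model or API you use, and the stories are placeholders.

```python
# Illustrative sketch of LLM-based story annotation (not actual research code).
# ask_llm() is a hypothetical stand-in for whichever LLM API you call.

def ask_llm(prompt: str) -> str:
    """Send a prompt to your model of choice and return its text reply."""
    return "0"  # placeholder so the sketch runs; replace with a real API call

stories = {
    "Story A": "A placeholder plot summary ...",
    "Story B": "Another placeholder plot summary ...",
}

annotations = {}
for title, summary in stories.items():
    prompt = (
        "Read this story summary and tell me if there is a female protagonist. "
        "Answer 1 for yes and 0 for no.\n\n" + summary
    )
    reply = ask_llm(prompt).strip()
    annotations[title] = 1 if reply.startswith("1") else 0

# With an answer for every story, you can start counting.
share = sum(annotations.values()) / len(annotations)
print(f"{share:.0%} of stories have a female protagonist")
```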
Nataly: So, to further dive into that, can you walk us through what annotating stories means within the context of your research?
Sophie: Yeah, definitely. So, annotations are a really, really big, important part of my research. So, when we start out with a story, we have this really complex cultural object. There's usually a lot of details in it, and that's part of why we like to read stories. We like to engage with all those details.
But then when we tell people why we like a story or what happened in a story, as humans, we're pretty good at condensing that information to say like, "Oh, well, what I liked about the story was that I was really surprised when this happened. I really liked that this character did that." We take all those details and we can condense that information into whatever was most meaningful to us.
And when we have lots and lots of stories, we need to condense a lot of that information to study patterns and differences and what happens between all those different stories. So, the process of annotation is the process of turning a single story into an annotation, so into some sort of data that we can analyse.
With that previous example that I gave, with asking ChatGPT to tell us if there's a female protagonist in the story, the annotation is just the result of that. It's the answer that the AI gave us or that a human annotator could give us too. Before we had AI, we would just have humans do these sorts of annotations.
And then at the end, when we have all our annotations, what we have is a data set of all our stories and all the annotations that tell us the answer to that question that we have about those stories. And then once we have all those annotations, that's when we can start to answer those more data-driven questions like, "Well, where are we seeing more female protagonists? Do we actually have a lot of female protagonists to begin with?" Because then we actually have the numbers on that.
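Once those annotations exist, the counting step Sophie mentions is straightforward. Below is a small, hypothetical example using pandas; the titles, genres, years, and labels are invented purely for illustration.

```python
import pandas as pd

# Hypothetical annotation table: one row per story, one column per annotation.
df = pd.DataFrame({
    "title": ["Story A", "Story B", "Story C", "Story D"],
    "genre": ["romance", "mystery", "romance", "sci-fi"],
    "year": [1995, 2004, 2018, 2021],
    "female_protagonist": [1, 0, 1, 1],  # the 1/0 answers from the annotator
})

# How many stories have female protagonists overall?
print(df["female_protagonist"].mean())

# Are they more common in certain genres?
print(df.groupby("genre")["female_protagonist"].mean())

# Has the share changed over time (here, grouped by decade)?
print(df.groupby(df["year"] // 10 * 10)["female_protagonist"].mean())
```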
Nataly: That's very cool. So, you've kind of alluded to this already in your answers, but I'm wondering what kind of information in your research, particularly, is the AI identifying when it analyzes a story?
Sophie: Yeah, it's a really interesting question. I like to think of AI as kind of like a really, really good high school essay writer. When you think of the kinds of things that we usually ask high school students to write in an essay after reading a book, you really want them to form a clear, polished argument where they can extract details from whatever it is that they just read and explain why they think whatever they think about what they just read.
So, AI is generally pretty decent at this. So, in terms of what information the AI is identifying when it's analyzing the story, it's really a tool to help you do that information-condensing process that I was talking about before. And whatever it's going to identify and focus on in the story, that's going to be highly dependent on whatever you've asked it to do in the prompt that you gave it. So, you really want to be careful with more subjective prompts.
In some of my research, I've found that asking something like, "Is someone experiencing emotion in the story?" will get you a different response from, "Is someone feeling emotion in the story?" So, if it's the kind of question where a human might answer differently from another human being, you could expect that the AI might answer it differently from humans too. And that's a big part of my research: comparing human annotations to AI annotations to see whether or not AI tends to have similar responses to humans, so that we can even use it for this sort of cultural analysis in the first place. Because if it turns out that AI has these massive biases, then we probably don't want to be using it to assume what people might think.
But generally AI is pretty detail-focused. So, I think that's part of why people like to use things like ChatGPT for everyday questions nowadays, because it's really thorough, and it's really quick, and it can get to these super detail-oriented conclusions much faster than a human can. So, if you've asked it something like, "Hey, so you said that this is the moral of the story. Can you explain why this is the moral of the story?" it will give you these super detailed bullet points that really go through all the different parts of the story, assuming it wasn't too long for the AI to process. It will explain all of them as clearly as possible and will address and justify all those points. That can feel really useful when you just want to quickly get from some details in the story that you've asked it to focus on to some meaningful conclusion about the story.
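Sophie mentions comparing human annotations to AI annotations before trusting the AI's labels. One common way to quantify that kind of comparison is an agreement statistic such as Cohen's kappa; the sketch below uses scikit-learn with made-up labels, just to show the shape of the check (it is not her stated method).

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical yes/no annotations for the same ten stories.
human = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
ai    = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

raw_agreement = sum(h == a for h, a in zip(human, ai)) / len(human)
kappa = cohen_kappa_score(human, ai)  # agreement corrected for chance

print(f"Raw agreement: {raw_agreement:.0%}")
print(f"Cohen's kappa: {kappa:.2f}")
```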
Nataly: That sounds very promising. So, looking at potential applications in the book industry, do you think that this could be, for example, a tool to help publishers spot patterns that indicate, for example, which manuscripts align with what they are looking for? So, in case they are looking to highlight stories specific to a group or region or theme, do you think that this could be a potential application?
Sophie: Yeah, definitely. I will give a disclaimer here that I'm not in the publishing industry, but like I said before, I do read a lot, and I'm thinking often about different ways that this sort of technology could be used. So, it is fun to brainstorm about these things, and these are some ideas off the top of my head right now.
I definitely think that AI could be really useful to help group stories by traits and themes automatically using that process that I described, where you have a lot of stories and you're collecting a lot of annotations on them. It could probably help you spot patterns that humans might miss at first glance. We do have limitations in how many things we can read at once.
I will say that one limitation with manuscripts is I would imagine that a lot of them are pretty long, and a lot of AI models still really struggle with length. Especially on longer documents, they really could end up missing details and resorting to things that don't have anything to do with the story itself to try to answer the question, although it seems like they are improving a lot on this recently.
But say you have an AI model that you really trust to analyse something well, and maybe you even have a collection of something shorter, like a bunch of book synopses or descriptions or query blurbs. You really want to know what makes them different; you've already read them, but you want to directly compare all of them at once. You could give the AI all the different plot summaries or texts and then ask things like, "What are the key differences in certain things that you might be interested in?"
And then assuming you're not doing super data-driven research and you're not analyzing hundreds of texts at once, and you're just comparing a few at a time, you also probably don't need to have questions that really condense the information all that much. You can ask it to be really exploratory, and you can ask more open-ended questions. And you can get more free-text responses and maybe even more closely look at those responses. And I would assume that that could really help you at least brainstorm more, and compare more, and at least have another angle on understanding exactly what's different and exactly what's important and meaningful about the different manuscripts that you're comparing.
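As a rough sketch of the few-at-a-time, open-ended comparison Sophie describes, you might build one prompt that bundles a handful of synopses with an exploratory question. The synopses and wording below are placeholders, and the prompt would be sent to whatever model you use (for example via the ask_llm stand-in from the earlier sketch).

```python
# Hypothetical synopses; in practice these might be descriptions or query blurbs.
synopses = {
    "Manuscript 1": "A retired detective returns to her hometown ...",
    "Manuscript 2": "Two estranged siblings inherit a failing orchard ...",
    "Manuscript 3": "A linguist deciphers a signal from deep space ...",
}

question = "What are the key differences in tone, stakes, and intended audience?"

prompt = question + "\n\n" + "\n\n".join(
    f"{title}:\n{text}" for title, text in synopses.items()
)

print(prompt)  # send this to the model of your choice and read its free-text answer
```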
Nataly: And what about, for example, helping marketing teams understand what elements of a story will resonate within a specific audience? I'm thinking using the information that the AI generates to develop marketing materials, of course, always keeping a human in the loop.
Sophie: Yeah, I definitely think there's a lot of potential here too. There's probably a similar principle here to what I was talking about before with analysing manuscripts, where the AI can be really good for brainstorming and for providing a different angle and different ideas.
So, if you have a specific book that you're looking at, you could probably ask questions like, "What makes this book unique? What are some tropes that this book embodies?" And then the AI can just help you generate more ideas, and you have more material available to think with on your own. I will say I would imagine that it's still the human marketer's job, like you said, to keep the human in the loop, to really filter those ideas, and to apply their expertise and judgement. And most importantly, their own experience of knowing what works and what doesn't, and what would actually resonate with humans, to choose which of those ideas might be worth taking to the next step. Those ideas might need a little bit more tweaking, and that tweaking process is probably still pretty important for making sure that anything an AI produces, in terms of ideas or content, can actually be useful. So, at this point, I think AI as a brainstorming partner is probably pretty promising, even if it shouldn't be the final decision maker.
Nataly: And given that a reader's taste is very subjective, right? What someone likes, what they don't. So, I'm wondering how the AI handles this complexity, and whether there are limitations to it in addition to the ones that you've already mentioned.
Sophie: Mm-hmm. I like to think of AI taste as a really averaged-out taste. So, the way that large language models are trained is they're meant to predict the statistically most likely next word to appear after the previous words. So, that's why they're pretty good at producing fluent language and even at being those detail-oriented readers that we were talking about before. But that also means they're not meant to simulate any specific person or kind of person. They're kind of this average of all the bazillions of words that they've read in all their training data, a lot of which is sourced from the Internet.
So, you end up getting this really powerful machine at simulating language in general, but you don't actually get a sample of a specific kind of taste. So, the strength of this is that you get really smooth and polished writing from AI. And sometimes this is even preferred. I read a study recently that found that some people liked AI poems over classical poems, and I thought that was pretty interesting. But I also had a second thought, which was, "Well, the average person isn't buying classical poem books. It's the kind of person who likes classical poem books that will be buying those books, right?" And those kinds of people, if you show them AI poems versus classical poems, they would probably choose the classical poems.
So, in general, I would say AI could be good at noticing those details that we were talking about earlier and helping us reach those really objective conclusions when you have a question like, "Well, is this happening or not happening?" But when it comes to actually simulating a person's individual taste, that's one of the larger weaknesses. And this is probably something worth keeping in mind if you're using AI to help you generate any sort of potential content or ideas. The weaknesses of AI are that it can produce a lot of clichés. It uses a lot of cheesy language. There's probably not a lot of novelty in the actual ideas that it's bringing up, and there's also a huge risk of biases in using these AI models too. A lot of these large language models are known to have biases towards English, Eurocentric, and Western values, just because that's where most of the text that they've been trained on has come from.
But obviously, in marketing, sometimes clichés sell as well. So, I think as long as you're not expecting originality and you're just looking for help getting some really general, averaged-out, expected kinds of sayings or words, AI can be pretty good at helping you generate that.
Nataly: So, there's an ongoing conversation about AI's role in creative industries. There's a lot of curiosity but also a lot of skepticism, which makes sense. Publishing professionals bring so much nuance, creativity, and expertise to their work, which is something that you've already mentioned. So, I'm wondering, how do you see this technology working with that human insight rather than trying to replace it?
Sophie: Mm-hmm. Yeah, I absolutely think that some of that skepticism is valid. Anyone who's interacted with AI chatbots has probably come to points where they're like, "Oh, wow, it's really brilliant that it can do this," and also points where it's like, "Oh, there's a reason why we still need a human being to do certain things." So, there are still a lot of big gaps in what AI can do. I'm expecting that AI will get a lot better at many of the things it's not so good at, but there will still be many areas where we're always going to need human beings to help us out. And then, of course, there's also the fear of AI taking over people's jobs, and a lot of ethical considerations around using AI with privacy, the environment, and copyrighted materials. So, I also understand if people are reluctant to use AI for any of those reasons. But all the same, given that AI can be so powerful and can be a tool to help us understand a lot of new things, I would encourage anyone who's interested to at the very least experiment with how they can use it, and to keep in mind that, even as you're using AI, the human experience and your own human perspective should still stay central to whatever it is that you're doing.
And I would imagine that with publishing, the easy way to do this is to remember that AI can't enjoy reading the same way that people do. When you give an AI a story, it really needs direction to be able to get some sort of meaning out of it. You have to ask it a question. You have to tell it to focus on something specific. But human readers, when we pick up a book, are so open to whatever the book has to offer us. It can really surprise us. We might relate to a character that we really didn't like at the beginning. We might be surprised by what we got out of it. We might hate it or love it for reasons we had no idea about before we picked up the book. And that sort of experience is the kind of expertise that human beings bring to the table, and it's what drives people to want to buy books and what drives the publishing industry in the first place.
So, I think that in this way, AI can really be a tool to support human creativity and to help us think more about our own experiences. But it obviously can't replace those experiences. So, as professionals bring their expertise and creativity to figuring out how to integrate AI into their workflows, I would just encourage a lot of that sort of curiosity, and a lot of reminding yourself that you have to integrate AI with your own experiences and your own perspectives in order to find the places where it's most appropriate to use.
Nataly: I appreciate that. So, we've reached the end of our questionnaire, but I'm wondering just before we go, how can we follow your research? Are you planning on publishing a paper? What's next for all this experimentation that you were talking about?
Sophie: Yeah, I have a website. And if you want to search up my research, you can check my Google Scholar page too; some of my papers should show up there. And I have some others in the works that I've been hinting at today, including one that I'm really excited about on multilingual story moral generation using AI. And if anyone is ever interested in talking more about these things (I'm especially curious about how my research and the things that I'm interested in could possibly be applied in industries like publishing), feel free to shoot me a message. I'd be really happy to chat.
Nataly: That's wonderful. Thank you, Sophie.
Thank you to Sophie for this wonderful conversation.
Before I go, I’d like to take a moment to acknowledge that BookNet Canada’s operations are remote and our colleagues contribute their work from the traditional territories of the Mississaugas of the Credit, the Anishinaabe, the Haudenosaunee, the Wyandot, the Mi’kmaq, the Ojibwa of Fort William First Nation, the Three Fires Confederacy of First Nations (which includes the Ojibwa, the Odawa, and the Potawatomie), and the Métis, the original nations and peoples of the lands we now call Beeton, Brampton, Guelph, Halifax, Thunder Bay, Toronto, Vaughan, and Windsor. We encourage you to visit the native-land.ca website to learn more about the peoples whose land you are listening from today.
Moreover, BookNet endorses the Calls to Action from the Truth and Reconciliation Commission of Canada and supports an ongoing shift from gatekeeping to spacemaking in the book industry. The book industry has long been an industry of gatekeeping. Anyone who works at any stage of the book supply chain carries a responsibility to serve readers by publishing, promoting, and supplying works that represent the wide extent of human experiences and identities in all that complicated intersectionality.
We, at BookNet, are committed to working with our partners in the industry as we move towards a framework that supports "spacemaking," which ensures that marginalized creators and professionals all have the opportunity to contribute, work, and lead. We'd also like to acknowledge the Government of Canada for their financial support through the Canada Book Fund. And of course, thanks to you for listening.