Large Language Models in Quest for Adventure
Keywords: Large Language Models, Computational Literary Studies, Adventure Novels, Digital Humanities
TL;DR
How can we use Large Language Models (LLMs) in the digital humanities? This blog post proposes a methodological exploration of generative LLMs in a literary context: we attempt to identify what constitutes stereotypical passages in adventure novels. To this end, we used GPT-3.5 as a tool for large-scale annotation. Human evaluation shows that the automatic annotations agree with human judgment more than six times out of ten, indicating that the model was good enough for the task while still struggling to capture such an abstract and complex notion. We then used these annotations to train a model capable of recognizing GPT's annotations with 86% accuracy. Finally, we generalized this model to a large corpus of literary texts and discuss the results and their potential implications.
Introduction
"Adventure is the essence of fiction" (Tadié, 2013).
Adventure is a leap forward, a physical and psychological plunge into uncertainty, that irresistible urge that makes us ask, "but what is behind this door?". It is a state of putting oneself in danger, of confrontation with the real, which is not necessarily pleasant. Indeed, the leap into adventure introduces a disturbance of state, an eruption of chance that makes death possible, even in homeopathic doses. This echoes Jankélévitch's thought: "Man burns to do what he fears the most" (Jankélévitch, 1963). Our human condition sets us in resistance to the desire for adventure, caught between what is reasonable and what is desired. This Kierkegaardian anxiety [^1] leaves us torn between the fear of disturbing our comfortable daily life and the mad desire to finally push open that door and discover the secrets hidden behind it.
Returning to the introductory quote, fiction is one among many means of satisfying this thirst for the unknown, this longing for elsewhere. Reading fiction is a way to experience this probable death, as if it allowed a cathartic resolution of this anxiety, making it bearable. The sub-genre of the adventure novel is surely the one that crystallizes this definition of adventure. In the French context, the adventure novel was narrowly defined in Letourneux's PhD thesis (Letourneux, 2010). He delimits a specific period (1870-1930), which allows him to identify the genre's consistency: "The importance of exoticism [...], and the central place attributed to violent action that exposes the hero to a risk of death or at least physical peril". The adventure novel is not just a novel with adventures, but a novel that cannot exist without them.
This brings us to the research questions that motivate this post and the method used to answer them. What are the links between novelistic fiction as a whole and the adventure novel? How is adventure materialized in adventure novels? Can we detect passages suited to adventure in novels that, at first glance, do not seem to lend themselves to it? We want to delve into the texts and understand which scenes exactly play an important role in classifying what falls under adventure or not. A related question: is it possible to find in the adventure novel what Tomachevski calls the "dominant" of the genre (Tomachevski, 1966), that is, the set of traits specific to the genre?
Our approach here is that of computational literary studies, and this project allows us to test the advantages and limitations of a technology taking up more and more space in our field: LLMs.
Method
We started with randomly segmented texts, ensuring each segment began with a complete sentence. For the passage length, we chose an arbitrary value of 3000 words (approximately 9 pages). The idea was to have relatively short passages for efficient annotation, yet not too short, since one of the primary objectives was to compare embeddings of raw text with embeddings derived from the output of BookNLP-fr (Lattice, 2021), an NLP pipeline that extracts semantic information such as how characters are characterized and spatio-temporal markers.
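A minimal sketch of this segmentation step, under simplifying assumptions: the function name, the naive regex-based sentence splitter, and the sequential (rather than random) sampling are our illustration, not the exact pipeline code.

```python
import re

def segment_text(text, target_words=3000):
    """Split a novel into ~target_words-long passages, each starting on a sentence boundary."""
    # Naive sentence split on terminal punctuation (illustrative only).
    sentences = re.split(r"(?<=[.!?])\s+", text)
    passages, current, count = [], [], 0
    for sent in sentences:
        current.append(sent)
        count += len(sent.split())
        # Close the passage once the word budget is reached; the next
        # passage therefore begins with a complete sentence.
        if count >= target_words:
            passages.append(" ".join(current))
            current, count = [], 0
    if current:
        passages.append(" ".join(current))
    return passages
```

In the real pipeline, one would then sample passages at random from the resulting pool rather than keeping them all.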
We randomly sampled 1000 passages from an adventure novel corpus that we would annotate automatically using GPT-3.5 turbo. For each text snippet, the API call was accompanied by an instruction providing a broad definition of what we wanted to detect, specifically an adventure scene.
```python
instruction = "Give me as an output only one word, if this text is typical of the adventure novels genre; write ADVENTURE; elif unsure; write NON_ADVENTURE. Prefer NON_ADVENTURE when unsure. HELP: adventure novels are characterized by the importance of change of scenery (historical/geographical/ fantastical or social) and of violent action putting the hero in mortal danger or physical peril. Typical adventure scene: someone (described as brave/heroic) doing something dangerous in a heroic manner in a wild setting."
```
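Each annotation call can be sketched as follows. The endpoint URL and payload shape follow OpenAI's chat completions API; the `annotate_passage` and `normalize_label` helpers are our illustrative additions, not the exact code used.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def annotate_passage(snippet: str, instruction: str, model: str = "gpt-3.5-turbo") -> str:
    """Send one passage with the instruction above and return the label."""
    payload = {
        "model": model,
        "temperature": 0,  # deterministic output for annotation
        "messages": [
            {"role": "system", "content": instruction},
            {"role": "user", "content": snippet},
        ],
    }
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)["choices"][0]["message"]["content"]
    return normalize_label(reply)

def normalize_label(raw: str) -> str:
    """Collapse free-form model replies onto the two expected labels."""
    return "ADVENTURE" if raw.strip().upper().startswith("ADVENTURE") else "NON_ADVENTURE"
```

Normalizing the reply matters in practice: even with a "one word only" instruction, the model occasionally adds punctuation or casing variations.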
A check on 100 GPT annotations revealed a human-machine agreement (Cohen's kappa) of 0.73. This suggests that GPT-3.5 does not fully grasp an abstract concept like 'adventure' and over-annotates, labeling a passage as adventure as soon as any hint of action or description occurs. Despite these limitations, the results were interesting enough that we decided to train a model to recognize GPT's annotations.
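For reference, Cohen's kappa compares observed agreement with the agreement expected by chance. In practice one would use `sklearn.metrics.cohen_kappa_score`; a minimal stdlib version for illustration:

```python
def cohen_kappa(human, machine):
    """Cohen's kappa between two annotators over the same items."""
    assert len(human) == len(machine)
    n = len(human)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(h == m for h, m in zip(human, machine)) / n
    # Expected agreement: chance overlap given each annotator's label frequencies.
    labels = set(human) | set(machine)
    expected = sum(
        (human.count(lab) / n) * (machine.count(lab) / n) for lab in labels
    )
    return (observed - expected) / (1 - expected)
```

A kappa of 0.73 thus indicates substantial, but not perfect, agreement beyond chance.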
Results
The next step was to generalize these annotations across a large novel corpus. To spare a non-extensible PhD scholarship (an understatement) and avoid further enriching OpenAI, we decided to train a model on these synthetic data and have it predict on the rest of the corpus. For this, we needed a representation of the text snippets, and the embedding approach prevailed. We compared the performance of OpenAI's ADA encoder with a method widely used in our field, paragraph vectors or Doc2vec (Quoc, 2014). A scikit-learn SVM (Pedregosa, 2011) trained on the synthetic data reached up to 86% accuracy. For the remaining 14%, it was hard to tell whether the errors came from GPT's annotations, from the embedding models, or from the SVM. The method needs further strengthening, but the performance seemed good enough to generalize across the entire corpus.
| | All tokens | BookNLP tokens | Random tokens |
|---|---|---|---|
| ADA embeddings | 0.86 | 0.77 | 0.72 |
| DBoW | 0.78 | 0.67 | 0.63 |

Table 1: Benchmark evaluation (classification accuracy)
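The training step can be sketched as follows. This is a toy version under stated assumptions: random vectors stand in for the actual passage embeddings, and the labels follow a synthetic linear rule purely so the task is learnable.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy stand-ins for passage embeddings: one row per passage, one column per dimension.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
# Hypothetical GPT labels (1 = ADVENTURE, 0 = NON_ADVENTURE), here derived
# from a simple linear rule so that the toy task is learnable.
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf", probability=True).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

With `probability=True`, `clf.predict_proba` yields a per-passage adventure score between 0 and 1, which is what the novel-scale analyses below rely on.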
A note on the results with BookNLP-fr features: the model's performance on these elements was surprisingly good! We lose about 10 points of accuracy, but it is still a pleasant surprise, since BookNLP-fr retrieves only some 100 to 150 tokens per 3000-word snippet and discards all word-order information. The information contained in these words (verbs and adjectives attached to characters, spatio-temporal markers) is therefore relevant enough for the task, which is reassuring both about the quality of BookNLP-fr and about our intuition: an adventure scene is characterized in part by its spatio-temporal unfamiliarity.
Multiscale Analysis
Novel Scale
Twenty Thousand Leagues Under the Sea
Our initial idea to assess the validity of our approach was to return to the scale of a novel, specifically the most famous adventure novel in the Francophone sphere: Twenty Thousand Leagues Under the Sea.
Almost all passages from the novel are considered stereotypical of the adventure genre by our model, with nearly all scores oscillating between 0.9 and 1. Here are a few sentences from segment 100 as an example:
During this part of the journey, we sailed for entire days on the surface of the waves. The sea seemed abandoned. Barely a few sailing ships, laden for the Indies, heading towards the Cape of Good Hope. One day we were pursued by the boats of a whaler who undoubtedly mistook us for some enormous whale of great value. But Captain Nemo did not want to waste these brave people's time and efforts, and he ended the chase by diving beneath the waters.
This part of the passage reveals a calm and seemingly monotonous period of the narrator's journey aboard the Nautilus. Navigating on the surface of the sea for entire days suggests a soothing routine but also a sense of solitude, as the sea is described as "seemed abandoned". The limited presence of sailing ships, mainly bound for the Cape of Good Hope, reinforces the idea of isolation. However, the element of adventure emerges when the Nautilus is pursued by the boats of a whaler. This adds an element of suspense and danger to the narration. The whaler's mistake, taking the Nautilus for a highly valuable whale, underscores the mysterious and unique nature of the submarine.
Sentimental Education
We then continued our validation quest with a novel deliberately unrepresentative of adventure: Gustave Flaubert's Sentimental Education. The next figure shows an equally stable but inverted trend compared to Jules Verne's novel: most passages fall between 0 and 0.2 probability of representing an adventure scene, which is quite consistent with what we know about Flaubert's novel. However, a few outlier cases draw our attention, especially passage 121, which obtains almost 0.8 probability of being an adventure scene. Here is an excerpt:
Frédéric was walking on the road when suddenly a sentinel crossed bayonet with him. Four men seized him, shouting: 'He's one of them! Be careful! Search him! Brigand! Rascal!' And his astonishment was so profound that he let himself be taken to the barrier post, right in the roundabout where the boulevards des Gobelins and de l'Hôpital converge with the streets Godefroy and Mouffetard. Four barricades formed, at the end of the four roads, enormous mounds of cobblestones; torches flickered here and there; despite the rising dust, he distinguished infantrymen of the line and National Guards, all with black faces, disheveled, haggard. They had just taken the place, had shot several men; their anger still lingered. Frédéric said he came from Fontainebleau to help a wounded comrade living on Rue Bellefond; at first, no one believed him; they examined his hands, even sniffed his ear to make sure he didn't smell of gunpowder.
This excerpt clearly contains adventure elements; Flaubert here refers to the events of the 1848 Revolution in Paris, which Frédéric observes with little interest in the narrative. However, in this passage, Frédéric suddenly encounters a situation of conflict and is accused by the sentinels. There's a sense of urgency and tension, especially with the soldiers expressing suspicion and using strong language ("Brigand! Rascal!"). The mention of barricades, torches, and the aftermath of a conflict further contributes to the adventure-like atmosphere. The probability assigned to our passage is, therefore, quite well-justified in this case.
Subgenre Scale
In the following paragraphs, we present the results obtained on 10,000 text excerpts, a sample representative of our base corpus (the "Chapitres" corpus, approximately 180,000 passages of 3000 tokens).
On the scale of sub-genres: Remember Tadié's quote, "Adventure is the essence of fiction". The adventure passage is not limited to the specific novel genre of adventure novels. Another form of validation for our approach would be to observe the presence of adventure-like passages in other novel sub-genres, meaning passages featuring significant displacement or scenes endangering the novel's heroes. We conducted this test using the "Chapitres" corpus, a collection of nearly 3000 French novels (Leblond, 2022). The period covered spans two centuries of novel production, from the 19th to the 20th century. A notable advantage of this corpus is that nearly two-thirds of it is annotated with sub-genre labels. This annotation is based on the classification of the Bibliothèque nationale de France. We chose to focus our analysis on the ten most predominant sub-genres in the corpus: adventure novels, romance, detective fiction, youth literature, memoirs, fantasy, historical novels, travel novels, science fiction, and erotic novels.
In the previous figure, it is interesting to note that the adventure novel is not the sub-genre containing the most stereotypical adventure passages. While roughly 50% of passages in adventure novels are adventure passages, travel novels and science fiction reach 60% and 57%, respectively. This makes sense, considering that the definition of adventure we used emphasizes displacement. It also suggests that our definition may not have been restrictive enough with respect to the endangerment of the heroes, even though it was intended to be broad. Nevertheless, these results are quite satisfactory, with good scores for historical and fantasy novels on one side, and relatively low scores for autobiographical novels, erotic novels, and romance on the other.
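The per-subgenre proportions reported above boil down to a simple aggregation over the classifier's predictions; a sketch, assuming the predictions are available as `(subgenre, predicted_label)` pairs (the function name and data shape are our illustration):

```python
from collections import defaultdict

def adventure_share_by_genre(rows):
    """Return the proportion of ADVENTURE-labeled passages per subgenre.

    rows: iterable of (subgenre, predicted_label) pairs.
    """
    totals, hits = defaultdict(int), defaultdict(int)
    for genre, label in rows:
        totals[genre] += 1
        hits[genre] += label == "ADVENTURE"
    return {genre: hits[genre] / totals[genre] for genre in totals}
```

The same aggregation, keyed by publication year instead of subgenre, yields the diachronic curve discussed in the next section.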
Corpus Scale
On a 1/18 sample of the "Chapitres" corpus:
The following figure illustrates the proportion of adventure passages aggregated by year over two centuries of literature. The result is striking: typical adventure passages decrease over time, from nearly 30% of passages in the early 19th century to 15% in the early 21st century. These results should be taken with caution, but the trend seems significant and could be explained by several factors:
The first factor is the temporal distribution of the adventure sub-genre. Indeed, it is the most prolific sub-genre in the "Chapitres" corpus (over 360 texts), all of which fall between 1840 and 1880.
The second argument revolves around the potential shift in the consumption medium for adventures. It may not be that people are less interested in adventure, but rather that they no longer read literature to fulfill this need. For the latter part of the 20th century, the rise of cinema and a shift in the medium through which people consume adventure works could explain this overall trend. Thus, one might observe the diffusion of adventure into other formats and onto different platforms than purely literary ones. However, people retain a certain interest in this narrative approach, as evidenced by a kind of plateau throughout the second half of the 20th century, which could be explained by the emergence and persistence of sub-genres such as science fiction or fantasy.
Limitations and Further Work
More time should be dedicated to annotation to assess how closely GPT follows the instructions and to give more credit to these synthetic data. A major issue we have not delved into is GPT-3.5's memorization. Are the synthetic annotations we retrieve only good because we present canonical texts to the LLM, texts which are likely in its training corpus? A more rigorous examination of this aspect seems important for future approaches: a research team has demonstrated the significance of these issues with a practical test measuring how deeply a model has memorized a book (Chang, 2023). In essence, they ask the model to fill in a character's masked first name. Running this test, Chang and colleagues found that GPT-4 recalls many books in detail, and that this memorization correlates with accuracy on other tasks, such as estimating publication dates. These results may not seem significant at first glance, but they become practically problematic if performance drops when annotating passages from lesser-known books (i.e., >98% of historical corpora for the 19th century).
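Such a name-cloze probe can be sketched as follows. The wording paraphrases Chang et al.'s setup rather than reproducing their exact prompt, and the function name and mask token are our illustration:

```python
def name_cloze_prompt(passage: str, name: str, mask: str = "[MASK]") -> str:
    """Build a name-cloze query: mask a character's first name in a passage
    and ask the model to restore it from memory."""
    assert name in passage, "the passage must contain the name to mask"
    # Mask only the first occurrence, leaving the rest of the passage intact.
    masked = passage.replace(name, mask, 1)
    return (
        "You have seen the following passage in your training data. "
        f"Fill in the {mask} with the first name of a single character. "
        "Answer with the name only.\n\n"
        f"Passage: {masked}"
    )
```

If the model restores the name for many passages of a book, that book is likely memorized, and annotation scores obtained on it may not transfer to lesser-known works.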
Conclusion
To wrap up, this initial experiment suggests that LLMs are interesting tools for computational humanities scholars. We have shown, first, that automated annotation of a specific phenomenon is possible with these LLMs, within certain limitations (the task must be kept simple and not too abstract); and second, that a distillation or generalization of these annotations to large literary corpora is achievable.
The methodology is likely to evolve in the coming months, with the advent of open models or the exploration of fine-tuned models on specialized tasks and corpora.
However, while LLMs present exciting research possibilities, the need for openness, trustworthiness and the question of whether they are indispensable components for humanists are still pending.
Additional Resources
Code and data used for this post are available on GitHub here.
The ideas discussed here were previously presented in Paris at a workshop on AI for the analysis of literary corpora.
Notes
[^1]: "When we observe children, we find the anxiety more definitely intimated as a seeking after the adventurous, the prodigious, and the mysterious" (Kierkegaard, 1844).
References
Jean-Yves Tadié. Le roman d’aventures. Number 396 in Collection Tel. Gallimard, 2013. ISBN 978-2-07-013966-8.
Vladimir Jankélévitch. L’aventure, l’ennui, le sérieux. Champs - Essais. Flammarion, 2023. ISBN 978-2-08-043573-6.
Søren Kierkegaard. The concept of anxiety: a simple psychologically oriented deliberation in view of the dogmatic problem of hereditary sin. Liveright Publishing Corporation, liveright paperback edition, 2015. ISBN 978-1-63149-004-0. OCLC: 880566265.
Matthieu Letourneux. Le roman d’aventures: 1870-1930, 2010. ISBN: 9782842875084 Series: Médiatextes.
Boris Tomachevski. Thématique. In Cvetan Todorov, editor, Théorie de la littérature: textes des formalistes russes., préface Roman Jakobson, Paris, Seuil, 1966, p. 263‑307 Éd. rev. et corr edition, 2001. ISBN 978-2-02-049749-7.
F. Mélanie, J. Barré, O. Seminck, C. Plancq, M. Naguib, M. Pastor, and T. Poibeau. BookNLP-fr, 2022.
Quoc V. Le and Tomás Mikolov. Distributed representations of sentences and documents, 2014.
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
Aude Leblond. Corpus chapitres, December 2022.
Kent Chang, Mackenzie Cramer, Sandeep Soni, and David Bamman. Speak, memory: An archaeology of books known to ChatGPT/GPT-4. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7312–7327, Singapore, December 2023. Association for Computational Linguistics.