A computational exploration of the Corpus Platonicum and its influence on the corpus of Ancient Greek literature
When I was only thirteen years old, I faced one of the hardest decisions in my academic life: the choice between learning ancient Greek or being trained in Computer Science. Back then I opted for ancient Greek and followed my interest in Computer Science outside the school curriculum. Little did I know that I would be able to connect those two seemingly different interests in 2017 at the Center for Hellenic Studies. Although I’d already recognised the potential of computer-aided classical philology when I wrote a very traditional text-critical commentary on parts of Petronius’ Satyrica, it was my subsequent employment at the department of Digital Humanities at the University of Leipzig and involvement in its many collaborations that allowed me to produce my own theories on Historical Language Processing and Complex Computational Philology. Now, in 2017, and thanks to the support of CHS, I finally have the chance to resolve my teenage dilemma by applying my computational and philological knowledge to one of the most influential philosophical corpora in history: the Corpus Platonicum.
The Corpus Platonicum is one of the best-known bodies of ancient literature, and yet questions about its tetralogical arrangement and the authorship of some of its works remain unresolved. For scholars, tracing its ideas through two millennia of Greek literature can be a daunting task that requires not only intimate knowledge of the Corpus Platonicum, but also the reading and manual analysis of several million words of Greek. Fortunately, the last decade of research in Natural Language Processing has produced promising automated methods for analysing huge amounts of text. Furthermore, over the same period, a lot of work has been going on behind the scenes: the Perseus Digital Library and the Open Greek and Latin Project (OGL) have digitised and curated most of the Corpus Platonicum. In collaboration with CHS, in what is called the First Thousand Years of Greek Project, OGL has also made the Definitiones of the Corpus Platonicum machine-actionable. My project is therefore timed to build on the work of Perseus, OGL, and CHS, because now, for the first time in history, it is possible to apply computational analyses to the whole Corpus Platonicum. This means we can try to disclose complex patterns and intratextuality in this corpus, and can help train a machine to detect Platonic thinking in a huge corpus of unclassified ancient Greek text.
Often, when Classics scholars think of this kind of digital research, they first think of simple search functionality; yet it offers so much more. Put simply, complex computational textual research can be broken down into a number of analysis and synthesis processes: researchers distil information from the text in individual analyses and can then combine the results to form more complex arguments. For instance, individual analyses of function words, word co-occurrence, topic modelling, metrical analysis, or morpho-syntactical analysis can be combined to cluster or classify texts. Additionally, the robust citation architecture provided by Canonical Text Services (CTS) allows us to combine the results of relatively simple analyses into a complex decision pattern that can help to cluster the Corpus Platonicum efficiently, revealing information and producing the statistical basis for decisions regarding authorship and intratextual relations. My hope is therefore that this will produce a deeper understanding of the intratextual relations within the Corpus Platonicum and the intertextual relations within all Greek literature accessible through CTS citation at OGL and Perseus.
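To make the idea of an individual analysis feeding a later synthesis more concrete, here is a minimal sketch in Python of just one such analysis: a function-word profile per passage, compared pairwise by cosine similarity. It is an illustration under stated assumptions, not the method used in the project. The five-word list, the function names, and the passage identifiers are all hypothetical; in practice the identifiers could be CTS citations, and the word list would be far larger and philologically motivated.

```python
import math
import unicodedata
from collections import Counter

# Hypothetical, deliberately tiny list of Greek function words (accents
# stripped). A real study would use a much larger, justified list.
FUNCTION_WORDS = ["και", "δε", "μεν", "γαρ", "τε"]


def normalise(token):
    """Lower-case a token and drop diacritics and punctuation."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    # After NFD, accents and breathings are separate combining characters,
    # which (like punctuation) are not alphabetic and are filtered out here.
    return "".join(ch for ch in decomposed if ch.isalpha())


def function_word_profile(text, vocabulary=FUNCTION_WORDS):
    """Return the relative frequency of each vocabulary word in the text."""
    tokens = [t for t in (normalise(raw) for raw in text.split()) if t]
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty passages
    return [counts[word] / total for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two frequency vectors (0.0 if one is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(passages):
    """Rank all passage pairs by similarity of their function-word profiles.

    `passages` maps an identifier (a placeholder here) to its text.
    Returns a list of (similarity, id_a, id_b), most similar pair first.
    """
    profiles = {key: function_word_profile(text) for key, text in passages.items()}
    ids = sorted(profiles)
    pairs = [
        (cosine_similarity(profiles[a], profiles[b]), a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    ]
    return sorted(pairs, reverse=True)
```

Calling `most_similar` on a dictionary of passages returns a ranked list of pairs. A synthesis step would then combine such rankings with the results of other analyses (topic models, co-occurrence, morpho-syntax) before any claim about authorship or intratextual relations is made; a single feature of this kind is far too weak to support such a claim on its own.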
Thomas Koentges is an Akademischer Assistent (Assistant Professor) at the Alexander-von-Humboldt Department of Digital Humanities at the University of Leipzig, Germany. After completing a traditional PhD summa cum laude in Classics (University of Otago, New Zealand), his research broadened and is now located in digital humanities: in particular, the effects of the organization and delivery of cultural heritage metadata on the work of humanities researchers, the computational research of historical languages, topic modelling, digital stemmatology, and citizen science, as well as the research and production of digital editions and the curation of digital images of manuscripts. His topic-modelling methods have been used when researching Latin literature at the University of Leipzig and have since been applied to other morphologically complex languages, including Ancient Greek, Arabic, Sanskrit, and Persian. Several higher education institutions are using the alpha version of his Latent-Dirichlet-Allocation topic modelling app, (Meletē)ToPān v.0.2. During the CHS fellowship he plans to apply these methods to the Corpus Platonicum and the corpus of Greek literature collected and enriched by CHS and the Perseus Project.