The CHS team had the chance to connect with Leonard Muellner, Director of Publications and Information Technology at the Center for Hellenic Studies, to talk about the background of, and most recent progress on, The Free First Thousand Years of Greek project.
The Free First Thousand Years of Greek project seeks to present source texts of Classical Greek in an open, dynamic corpus that will radically improve access to a multitude of resources and to different versions of editions and publications on Greek. This initiative is part of a larger project, the Open Greek and Latin Project.
Q. What is the mission of The Free First Thousand Years of Greek project, and why was it developed?
The Free First Thousand Years of Greek project was born about ten years ago. It was the brainchild of Professor Neel Smith of the Classics Department at the College of the Holy Cross and of the CHS Homer Multitext Project — he is one of the HMT’s two information architects. It was born of frustrations felt by him and staff researchers at the Center for Hellenic Studies with the proprietary, fee-based, and restricted nature of existing resources for computational research on Greek language and literature. The initial goal of the Free 1st 1K of Greek was to create a major subset of the corpus of Greek literature, to cover Archaic, Classical, and Hellenistic literature attested in manuscript form from the beginnings until the 3rd Century CE, more or less — later works that are generally considered indispensable for the study of the earlier ones, like Stobaeus or the Suda, will be included. The collection would be for anyone and everyone to use freely. Another goal was to create tools that will encourage people who make use of such a corpus to participate in its development, improvement, and expansion. The study of Ancient Greece does not need barriers to access in this day and age.
Q. Who have been the main contributors to this project and how has it evolved over time?
A lot of preliminary work, such as creating an elegant and powerful system (called CTS/CITE) for making texts in corpora machine-actionable and developing an electronic catalogue of the works in the corpus, has been carried out by Professor Smith and his colleague Professor Christopher Blackwell (Classics, Furman University), with the help of student interns at Holy Cross who worked under their direction and with financial support from the Center for Hellenic Studies. By “machine actionable” we mean, for example, that one can create an electronic pointer (in the form of a URN) down to the level of a letter of a word in a text or a specific edition of a text.
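To make the idea of an electronic pointer concrete, here is a minimal sketch in Python of how a CTS URN might be broken into its parts. It is an illustration only, not the project's own software: the function name parse_cts_urn and the class CtsUrn are hypothetical, the sample URN (the first word of Iliad 1.1 in a Perseus edition) is used purely as an example, and the sketch assumes a single passage rather than a range of passages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CtsUrn:
    """Hypothetical container for the parts of a CTS URN (illustration only)."""
    namespace: str                # e.g. "greekLit"
    text_group: str               # e.g. an author identifier
    work: Optional[str]           # a work within the text group
    version: Optional[str]        # a specific edition or translation
    passage: Optional[str]        # a citation such as book.line
    subreference: Optional[str]   # a word or letter within the passage


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into its components.

    Simplifying assumption: the passage is a single citation, not a range.
    """
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")

    namespace = parts[2]

    # The work component is dot-separated: text group, work, version.
    work_parts = parts[3].split(".")
    text_group = work_parts[0]
    work = work_parts[1] if len(work_parts) > 1 else None
    version = work_parts[2] if len(work_parts) > 2 else None

    # The optional passage component may carry a subreference after "@",
    # which is what lets a pointer reach a single word or letter.
    passage = None
    subreference = None
    if len(parts) == 5 and parts[4]:
        passage_text, _, sub_text = parts[4].partition("@")
        passage = passage_text or None
        subreference = sub_text or None

    return CtsUrn(namespace, text_group, work, version, passage, subreference)


if __name__ == "__main__":
    # Example URN chosen for illustration: first occurrence of a word in line 1.1.
    print(parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.1@μῆνιν[1]"))
```

The same pattern would apply to a pointer at a single letter: only the text after the "@" changes, while the rest of the URN still identifies the edition and the passage.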
This spring, the project has had new life breathed into it, first by a grant of $50,000 from the Arcadia Fund through the effort of Rhea Lesage, the librarian for Hellenic Studies and Coordinator for the Classics at the Harvard College Library, then by a grant of the same sum, $50,000, from the Center for Hellenic Studies, thanks to the support of the Administrative Director of the Center, Ms. Zoie Lafis, and its Director, Professor Gregory Nagy. The First 1K of Greek has now been adopted as a subset of the Open Greek and Latin Project (OGLP) under the direction of Gregory Crane, the Humboldt Professor of Computer Science at the University of Leipzig, Professor of Classics at Tufts University, and Editor-in-Chief of the Perseus Project. (The goal of the OGLP is to digitize and translate into multiple languages all the works in Greek and Latin from antiquity forward.) With these combined grants and with the help of a growing team of researchers, interns, and technicians, Professor Crane believes that the digitization of the remaining works for the project can be completed soon. A large number of items have already been digitized and encoded by the OGLP and the Perseids group in connection with the coming new edition of Perseus itself, though many need to have their encoding updated and then verified with a suite of Python-based software tools developed by Thibault Clérice, a doctoral candidate at the University of Leipzig. Work will also begin in August on designing a user interface for accessing the corpus that will allow users to download texts, search the corpus, and manipulate it and its elements for analysis to an unprecedented degree. Another crucial development is the extraordinary sophistication and success of software for Optical Character Recognition of Greek, and for its verification, developed by Professor Bruce Robertson of the Classics Department at Mount Allison University, New Brunswick.
Q. Have you encountered any obstacles to this process? Where does the project currently stand?
Fragments of larger works embedded in other more complete works present a particularly complex set of problems for computer access, so they were at first ruled out of the collection. Since then, Monica Berti, assistant professor at the University of Leipzig and a long-time staff member of the Perseus Project at Tufts University, has made much progress on the technology for making precise references to them. The result is that major collections, such as the fragments of Greek historiographers or of Greek comedy and tragedy, are now being incorporated in the corpus. We hope that papyri and inscriptions will likewise become part of the corpus, though they are not yet part of the project’s workflow.
As of last week (the week of May 12, 2016), the digitization and encoding of the remaining works in the catalogue of the Free First 1K of Greek have begun; next week the first group of undergraduate interns in publications at the Center for Hellenic Studies in Washington will arrive, and their training in the variety of tasks that are part of this project will begin. Later in June, Professor Berti will join the team training the CHS interns in the handling of fragmentary texts. Also last week, five graduate and undergraduate students were chosen as interns at the University of Virginia under the auspices of Arts Librarian Lucie Stylianopoulos, and they will be trained and begin work on the project in August and September of this year.