News from CHS | The Free First Thousand Years of Greek over the summer


The CHS team is pleased to share news of the forthcoming Free First Thousand Years of Greek project. The project seeks to present source texts of Classical Greek in an open, dynamic corpus that will radically change access to a multitude of resources and to different editions and publications on Greek. This initiative is part of a larger project, the Open Greek and Latin Project.
Director of Publications and Information Technology at the Center for Hellenic Studies, Leonard Muellner, shares the most recent progress on The Free First Thousand Years of Greek project.
Q. We are very excited to share the latest news about this important project! Can you tell us a bit about the team and what you accomplished this summer?
We had a talented group of summer interns (Caitlin Miller, Yale University; Josh Blecher-Cohen, Harvard University; and Jack Duff, University of Massachusetts) working at CHS this summer with the OGLP (Open Greek and Latin Project) team at the University of Leipzig, especially Matt Munson, Annette Geisner, and Monica Berti, as well as with Bruce Robertson and Francesca Patten at Mount Allison University in New Brunswick, who work on the OCR (optical character recognition) of Greek texts and their textual apparatus. All teams worked with lists of authors and works that have been compiled by Alison Babeu Jones of the Perseus Project and with a GitHub repository of texts that have been digitized and marked up in what should be compliant EpiDoc XML; these texts have already been tested electronically for validity. Many of the major works by the major authors are already digitized and marked up, but there are many smaller texts that still need to be worked on, including a few texts that were mistakenly omitted from standard, proprietary corpora.
The major task that the interns accomplished this summer was to find texts in the repository that were not correctly marked up, identify the errors they contained, fix those errors, re-validate the texts, and check them back into the repository. They also worked on correcting texts for the OCR process by way of the OCR Challenge site, a publicly accessible workspace, and on reconciling competing numbering systems in works like Athenaeus’ Deipnosophistae. That huge text has been digitized and corrected, but it inherited two very different numbering systems that need to be reconciled so that both are accessible to computers. Meanwhile, with money from a grant to the Harvard University Library, work has continued on marking up and correcting newly OCR’d texts to 99% accuracy.
Q. What are your goals for the coming months, and when do you hope to complete the digitization phase of this project?
Next week, the team of CHS interns will help train a group of five new undergraduate and graduate interns from the University of Virginia, Charlottesville, in what they learned to do in June and July, so that the work can continue throughout the academic year. In addition, the OGLP is applying for a grant to complete the larger project, and starting in January it will also begin to draw on funds from CHS to develop a user interface for the corpus of texts, both for the Free First Thousand Years of Greek and for the larger corpus of the OGLP and perhaps of other partners in Europe and elsewhere. The funds will also be used to digitize, correct, and mark up whatever texts remain to be done once the grant from the Harvard University Library is expended. We expect the digitization of texts to be complete within about a year’s time, and we hope to begin testing user interface designs this coming spring.
Find out more about the project on its dedicated webpage.