Q&A with Kyle P. Johnson and Luke Hollis of the Classical Language Toolkit


The Classical Language Toolkit (CLTK) has just been accepted as a mentor organization in the Google Summer of Code program for 2016. Through the Google Summer of Code (GSoC), post-secondary students aged 18 and older spend their summer learning to code while working for a mentor organization. In the past 11 years, more than 11,000 students have participated in creating open source content through the program. The student application window for this summer is March 14 to 25.

CHS had a chance to sit down with Kyle P. Johnson, Ph.D. and Luke Hollis, CLTK leaders and GSoC mentors, to learn more about the project and their participation in this innovative, community-building program.

Q. What is the Classical Language Toolkit? Can you tell us a bit about your vision?

Kyle: The Classical Language Toolkit (CLTK) is free and open source software for doing natural language processing in ancient languages (namely, the surviving literature of the entirety of Eurasia and North Africa, from roughly 3000 B.C. to A.D. 1500). Among its goals are to (a) make data sets for particular languages, (b) be a framework for scientific humanist research, and (c) act as a platform for interdisciplinary research. Its ultimate goal is to facilitate an integrated, multidisciplinary study of the ancient world.

Luke: To help those who study classical languages, we are currently developing a modern reading interface that provides tools specifically designed to address the challenges that reading classical languages entails. A web-based application presents an ideal medium for including metadata and linguistic information to assist in both serious and casual study of classical texts. The reading interface will also provide a simple and thoughtfully designed interface for studying text with the CLTK’s advanced natural language processing capabilities.

Q. What prompted you to apply for this grant?

Kyle: The CLTK’s vision is global in nature, so Google is a natural partner. Since the application period opened, we’ve been flooded with hundreds of emails from dozens of countries. It is heartening to learn there are so many young people who share our inclusive understanding of the Classics.

Q. What is it that you seek in potential participating students? How might a participating student contribute to your project over the summer?

Kyle: We welcome students from all disciplines and at all skill levels. For language students, we help tailor tasks which are philological in nature. We guide those from computer science to important linguistic problems and explain how AI can solve them. While for GSoC the bar is especially high, the CLTK has important work for every type of student.

Luke: Students interested in reading environments and annotation systems are invited to contribute to the web application, which we are developing in the JavaScript and Python languages. Students will be involved in building core components of the application such as retrieving text from classical documents and textual metadata from our application programming interface (API), which also enables access to natural language processing tools in the CLTK.

Q. You are also seeking additional mentors. Why is that important to your vision for this project?

Kyle: The CLTK needs experts in all ancient languages and across the breadth of NLP technologies. Since no one person knows more than a few subjects deeply, we are reaching out to a wide range of scholars. For GSoC, we will need at least one, and perhaps two, mentors for each student project. We also have a board of Academic Advisors, composed of leaders in their fields, who offer sage advice.

Q. What do you hope to accomplish as a result of participating in the Summer of Code program? How do you see your project moving forward after this summer?

Kyle: Technology has transformed how scholarship is done. The CLTK is not a static product made by a single scholar, but an evolving synthesis of the work of dozens of contributors. We have received terrific applications for making dependency grammar parsers and machine translation models for many languages (including Greek, Latin, Sanskrit, Old English, and Classical Chinese), and for improving the accuracy of our current Greek and Latin offerings.

Luke: We hope that student involvement in developing the reading environment will push our platform forward as students contribute code and improve the user experience of studying classical texts. In the upcoming months, we seek to add functionality to our prototype reading interface and create an environment where anyone who is interested in programming and classical languages can work on adding features and contribute code. After this summer, we hope to add more corpora to the reading interface and continue to discover and add textual metadata to make the reading interface a useful application for reading classical languages for years to come.

Q. If a student reading this wants to work with you at the CLTK this summer, what should they do next?

Kyle: Interested students should start on the project’s homepage for information about the application process [https://cltk.org/blog/2016/02/29/cltk-participating-google-summer-code.html].