Susan Guettel Cole
From GML to XML
A project to publish a collection of inscriptions about the cult of Dionysos straddles two different methods of recording and sorting evidence. Data originally collected on index cards in the early 1980s and encoded with structured markup in GML on an IBM mainframe, in the twilight of mainframe technology, hibernated until XML and Unicode made possible a stable environment for the preservation, retrieval, and dissemination of the material. The editors of The Stoa Consortium (www.stoa.org) have overseen the conversion of over 900 documents in this project to XML, and those files may now be viewed on a website for browsing, searching, and downloading texts, translations, and commentaries on the subject of Dionysos.
Looking for Dionysos
Dionysos was to be found everywhere in the lands around the ancient Mediterranean, omnipresent in literature, ubiquitous in the visible culture, and a partner in such hazardous experiences as drinking wine, viewing mimetic representations on the stage, or facing death. As a god who marked the dangerous boundary between self and other, he offered gifts that held the promise of pleasure, but crossing that boundary without his protection could carry risks. Dionysos was perhaps the most popular divinity in all the lands where Greek influence was felt. In spite of the god’s great appeal and popularity, however, few traces of conventional ritual actually survive. Euripides’ fictional account in the Bakkhai, although taken by modern scholars as a paradigm for bacchic experience, obscures the facts. As a result, this god is as elusive for us as he seems to have been for his ancient audiences. He had few sanctuaries, and surviving temples are rare. Moreover, the sparse records for his sacrificial procedures indicate that his rituals could be abnormal. Interested in the mysteries of Dionysos, I turned to the epigraphical evidence for Dionysian cult to try to explain the social structure of his worshippers, the organization of his ritual, and the relation between private groups and the public administration of his ceremonies. In 1979, with support from the NEH, I began to collect epigraphical texts relating to the activities of those who worshipped this puzzling god.
The Era of the Index Card
When I began the project, I had no idea that it would become so large. I began by collecting texts that mentioned μυστήρια of Dionysos. I soon realized that there was no category that could be defined as “mysteries,” because there was no discernible hard boundary between mysteries and other forms of Dionysian worship.
It is useful at this point to recall how we organized our research in the long period of scholarly achievement Before Computers. This was the Era of the Index Card. No one I knew had ever touched a computer. I used the largest cards I could find and made a card for every text I found. I sorted them by city. I read every book about Dionysos I could get my hands on. My method was determined in part by the available bibliographic tools. The great obstacle for epigraphical research in the mid-seventies was the hiatus in the publication of the SEG. The only bibliographical tools available on a year-by-year basis were L’année philologique and Louis Robert’s Bulletin épigraphique. These are fine tools, but they are not set up to isolate individual texts. I spent the summer of 1979 in the library, going through several decades of these resources just to supplement the bibliographies of the standard works of 20th-century scholars who had written extensively on the epigraphy of Dionysian cult, such as Quandt, Nilsson, and Burkert. In 1979, at the Center for Hellenic Studies, I read and sorted the texts themselves. I spent the summer of 1980 at the American School of Classical Studies in Athens, where I systematically took every epigraphical publication off the shelf and went through it searching for texts. The cards filled up with references, and the files of papers continued to grow.
Community Computing
In 1982 Michael Alexander, a colleague at the University of Illinois at Chicago, introduced me to my first computer. Actually, I never saw this machine. What I faced was a giant terminal, one of dozens in a large university lab filled with students and noise. The terminal weighed over fifty pounds and had a black-and-white screen. I began to learn how to edit texts in a program called Wylbur using codes from Waterloo Script. Every emphasized word had to have its own line beginning with a code. At first, changes could be made only by calling up individual lines from the command line, and we did not know how a file was formatted until it came from the printer.1 We had to wait hours for a printout, which was produced on dot-matrix printers entirely in upper-case letters on large sheets of connected pages. The IBM mainframe that powered all of this was the size of a small battleship; it filled rooms full of glass cases and whirling tapes.
Notes to this section
1. Michael Alexander recalls, “With Wylbur you typed in your line-by-line text, formatted it, and then printed it out. So when you proofread it and wanted to make changes, you needed to go back and find the number of the line you wanted to change, and then call up that line specifically in order to change it (e.g., m 325). You did your Wylbur composing and editing in one system (Wylbur), and then used CMS to issue commands like Script and Print.”
Digital life improved greatly when one of the directors of the computing center gave me a flat keyboard (similar to the flat keyboard on a cash register at McDonald’s) that could be attached to a television set and hooked up to the telephone. I no longer had to sit in a noisy lab, but could type at home. There were problems with this arrangement, however. The editing line on the television screen was only forty characters wide, and a single sentence could fill the entire screen (our television was a portable black-and-white 12-inch model). When I was working, my children could neither watch television nor use the telephone.
In 1982-83 I began to write the commentaries on the texts of Ionia, and I spent 1983-84 at the Institut für Altertumskunde at the University of Cologne, with its rich store of epigraphical resources. When I boarded the plane in Chicago, I carried the index cards with me; they were now heavier than a large cement construction block. The bag was so heavy I could hardly walk up the ramp. When the attendant tried to lift it into the overhead bin, she could not even move it. Today I would probably be put off the plane. In Cologne, I began to type the commentaries on an old IBM Selectric typewriter with Greek and Latin fonts on little interchangeable balls.
When I returned to Chicago, the files were larger and the bag of index cards much heavier. In 1984 the Computer Center introduced X-Edit, an economical and well-designed IBM editing program with a cursor that could be moved with the arrow keys.2 The improvement seemed phenomenal. The screen was efficient, global changes were easy, and the terminals were now lighter, becoming cheaper (although still costing more than $300), and beginning to be sprinkled in ones and twos around the campus. In 1984 I was able to have a terminal in my own office. Still powered by the IBM mainframe, this terminal could handle more than one font only with long, complicated entity codes for each character not on the keyboard. Nevertheless, I was able to prepare the framework for the Dionysos project. Modelling my city files on the Cologne publications, I entered all the material on the index cards except the Greek texts. The next year I met Michael Sperberg-McQueen, then on the staff of the UIC Computing Center. Already hatching the Text Encoding Initiative, he was interested in the encoding of non-standard scripts. He showed me how to design a template in GML (Generalized Markup Language).
Notes to this section
2. Michael Alexander recalls, “With X-Edit you could just move along through the lines, pick out what you wanted to change, and change it on the spot.”
GML was one of the backbones of Waterloo Script, and these were the programs the University provided. In 1986 there was no institutional funding at UIC for personal computers in the Humanities; as it was, I had to apply to the campus Research Board for a grant just to have a terminal in my office. GML was a very well-designed markup language. It required both starting and ending codes and permitted a structure flexible enough to handle a variety of situations and needs. I set up each inscription with a description, a bibliography of editions, a bibliography of scholarly works commenting on the particular text, date, text, translation, and line-by-line commentary.
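Paired start and end codes make a record like this mechanically convertible, and the short Python sketch below illustrates why. It is not the conversion the project actually used. It assumes Waterloo-style GML tags of the form `:tag.` closed by a matching `:etag.`, and it ignores tag attributes. The tag names (`insc`, `descr`, `edbib`, `combib`, `date`, `text`, `transl`, `comm`) are hypothetical stand-ins for the sections listed above; the project's real tag set is not reproduced here.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical tag names, one per section of an inscription record:
# description, editions, commentary bibliography, date, text,
# translation, and line-by-line commentary, wrapped in a record tag.
TAGS = {"insc", "descr", "edbib", "combib", "date", "text", "transl", "comm"}

# Waterloo-style GML tag: a colon, a lower-case name, and a period.
TAG_RE = re.compile(r":([a-z]+)\.")


def gml_to_xml(gml: str) -> str:
    """Turn :tag. / :etag. pairs into <tag> / </tag>, escaping the text between."""
    out = []
    pos = 0
    for m in TAG_RE.finditer(gml):
        name = m.group(1)
        if name in TAGS:
            xml_tag = f"<{name}>"  # start code
        elif name.startswith("e") and name[1:] in TAGS:
            xml_tag = f"</{name[1:]}>"  # end code: 'e' + tag name
        else:
            continue  # unknown tag: leave it in the text untouched
        out.append(escape(gml[pos:m.start()]))
        out.append(xml_tag)
        pos = m.end()
    out.append(escape(gml[pos:]))
    return "".join(out)


record = ":insc.:date.2nd c. BC:edate.:text.*DIONU/SW|:etext.:einsc."
print(gml_to_xml(record))
# prints <insc><date>2nd c. BC</date><text>*DIONU/SW|</text></insc>
```

Because every section has an explicit end code, any XML parser can check the output. A missing `:e…` tag then shows up as a well-formedness error instead of silently merging two sections.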
Thomas Corsten, who had recently finished his doctoral work with Reinhold Merkelbach at Cologne, joined me in 1986. He was supported by the Feodor Lynen Program of the Alexander von Humboldt-Stiftung and the Packard Humanities Institute. The Packard Humanities Institute loaned us an Ibycus computer, and from this time on we were able to enter the Greek texts. The PHI CD was not yet finished, but other scholars shared their material. William West at UNC sent a computer tape of the first volume of IG I³ (which he had typed in code himself). Donald McCabe, working on Ionia and Karia at the Institute for Advanced Study at Princeton, sent me tapes of texts from this project. These were important because the texts of many πόλεις had not been collected since the CIG was published in the late 19th century. Nancy Kelly at Cornell, working on the IG digitization project, sent me disks for Delos and Athens. At this time, any Greek we used, except for that on the Ibycus, was in Beta Code. When I was invited to teach at the University of Michigan in 1988, we packed up the Ibycus, hired a truck, and moved both families to Ann Arbor for the spring semester.
At just about this time, IBM began to market a terminal with a dual keyboard able to handle two character sets. Michael Sperberg-McQueen encouraged me to ask the Research Board to fund one for my project, and he showed me how to design my own screen font. We set up the second keyboard in Beta Code with the same keyboard arrangement as the Ibycus computer. A student programmer linked the keyboard to the ASCII codes for each symbol so that the Xerox 8700 laser printer could print it out. This printer, which resided in the Science and Engineering building, was the size of a small car. We were now able to convert all of the files from the Ibycus to mainframe format. Colleagues working elsewhere on electronic projects were generous with their help and service. We sent the Ibycus disks to Cornell to be converted to DOS format so that they could be mounted and added to the growing collection of digitized epigraphical texts already encoded in the GML master text on the mainframe.
I finally abandoned the index cards when Michael Sperberg-McQueen told me about setting reference codes. I wrote driver files, and the whole manuscript could now be printed at once. These were the glory days. The coding was accurate, corrections were fairly simple, and global search-and-change commands made it easy to find errors. In 1992 I had over 900 texts collected in a 680-page manuscript.
The Obstacle Course
When I moved to the University at Buffalo in 1992, the computing world had changed. Personal computers had appeared even on the desks of humanists. Soon after I arrived in Buffalo, the university abandoned the IBM mainframe, and I was left to write macros to convert my files to WordPerfect for the Macintosh. The pre-Power Mac in my office was so slow that I started macros when I left for class, so that they would be finished when I returned. The SE/30 I had at home was even slower.
I eventually finished the conversions, but I was disappointed in the result. I had chosen WordPerfect over Microsoft Word because WordPerfect allowed the user to see its formatting codes. I was dismayed, however, by the loss of the carefully prepared GML codes and the lack of detail in the word-processing files. I decided not to continue the project until I could preserve the markup of the GML files. (When WordPerfect for the Mac was discontinued after the company was sold, I knew I had done the right thing.) At any rate, the decision seemed sensible at the time. I knew about SGML, but there was no professional consultation readily available on campus, and the task of yet another conversion seemed formidable. Gradually drawn into administrative tasks for a department in the throes of rebuilding and distracted by other research projects, I remained convinced that eventually I would find someone on the local level who could rescue my files and put Dionysos back on track. What I did not realize was how long this would take and how far from home I would have to go.
The University at Buffalo is content to let humanists use their computers like typewriters. Administrators are happy to spend as few dollars as possible on research that will never win large outside grants. We are a Microsoft campus, and structured markup is definitely not a priority. Learning HTML when the Internet hit the campus was as far as most of the consulting staff wanted to go.
The Happy Ending
I had long been confident that the solution to my file conversion problems was to be found at the TEI website. When the Stoa appeared, I knew that other classicists were blazing the trail. I realized that XML and Unicode were the tools I needed, but I was not able to figure out how to validate a file. Nevertheless, I did the preliminary work for conversion, preparing translation tables for the Greek characters and defining the GML tags used to mark up the text. In 2002 Sandra Boero-Imwinkelreid converted the files to HTML Transitional encoding and mounted both the original GML files and the HTML files on a University at Buffalo website. Shortly after this, the Center for Hellenic Studies announced an XML workshop for the summer of 2003. I wrote in to describe the project, now easily accessible on the Internet, and imagine my surprise when, in March 2003, I heard from Michael David Jones, then a senior at the University of Kentucky majoring in Classics and Computer Science. As part of the Stoa team led by Ross Scaife, Michael Jones was writing a Perl script to convert the Dionysos GML files to TEI-conformant XML. The rest of the story is easy to tell.
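The project's actual translation tables for the Greek characters are not reproduced here, and the Stoa conversion itself was written in Perl. As a rough illustration of what such a table has to do, the Python sketch below maps Beta Code, the ASCII transliteration mentioned above, to Unicode Greek. It is deliberately simplified. It covers only the 24 letters, seven common diacritics, capitals marked with `*`, and an assumed word-final sigma rule. It ignores digamma, the numbered sigma variants (`S1`, `S2`, `S3`), epigraphic sigla, and Greek punctuation. Diacritics are emitted as Unicode combining characters and then composed by NFC normalization.

```python
import unicodedata

# Beta Code letters (upper-case ASCII) -> lower-case Greek.
LETTERS = {
    "A": "α", "B": "β", "G": "γ", "D": "δ", "E": "ε", "Z": "ζ",
    "H": "η", "Q": "θ", "I": "ι", "K": "κ", "L": "λ", "M": "μ",
    "N": "ν", "C": "ξ", "O": "ο", "P": "π", "R": "ρ", "S": "σ",
    "T": "τ", "U": "υ", "F": "φ", "X": "χ", "Y": "ψ", "W": "ω",
}

# Beta Code diacritics -> Unicode combining characters.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}

# Characters after which S is treated as word-final (a simplifying assumption).
WORD_BREAKS = set(" \t\n.,;:'")


def beta_to_unicode(beta: str) -> str:
    s = beta.upper()
    out = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == "*":
            # Capital: Beta Code puts the diacritics between '*' and the letter.
            j = i + 1
            marks = []
            while j < len(s) and s[j] in DIACRITICS:
                marks.append(DIACRITICS[s[j]])
                j += 1
            if j < len(s) and s[j] in LETTERS:
                out.append(LETTERS[s[j]].upper())
                out.extend(marks)  # combining marks must follow the base letter
                i = j + 1
                continue
            out.append(ch)  # malformed capital: keep the asterisk as-is
        elif ch == "S":
            nxt = s[i + 1] if i + 1 < len(s) else " "
            out.append("ς" if nxt in WORD_BREAKS else "σ")
        elif ch in LETTERS:
            out.append(LETTERS[ch])
        elif ch in DIACRITICS:
            out.append(DIACRITICS[ch])  # lower case: marks follow the letter
        else:
            out.append(ch)  # anything else passes through unchanged
        i += 1
    # Compose base letters and combining marks into precomposed characters.
    return unicodedata.normalize("NFC", "".join(out))


print(beta_to_unicode("*DIONU/SW|"))  # prints Διονύσῳ
```

Emitting combining marks and letting normalization compose them keeps the table small: there are seven diacritic entries instead of a row for every precomposed character. Any combination that Unicode does not precompose simply remains decomposed, which is still valid text.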
The Stoa team has provided technological skills to convert over 900 documents to XML, and we are now starting to develop ideas for a website where my project can be available for searching, collecting, and downloading texts, translations, and commentaries on the subject of Dionysos. New texts can easily be added as they are finished, and the entire collection can also eventually be formatted for a conventional publication in book form.
Since I began this project, there have been many new finds, and scholarship has not stood still. Riedweg has edited the gold tablets, almost certainly produced by worshippers of Dionysos; Le Guen has collected a corpus of texts about the Dionysiac τεχνῖται; and Jaccottet has published a study of texts issued by Dionysiac groups. Their work will enrich the larger project. For its progress I am, as ever, indebted to Sandra and the three Michaels, all of whom at one time or another have guided this project through the Early Electronic Era and into the Digital Age.
A brief selection of recent works on the epigraphy of the cult of Dionysos.
To refer to this, please cite it as follows:
Susan Guettel Cole, “From GML to XML,” C. Blackwell, R. Scaife, edd., Classics@ volume 2: C. Dué & M. Ebbott, executive editors, The Center for Hellenic Studies of Harvard University, edition of April 3, 2004.