Four URLs, Limitless Apps: Separation of Concerns in the Homer Multitext Architecture
D.N. Smith, C.W. Blackwell
This paper summarizes both the underlying scholarly model and the implementation as network services of the scholarly repository of the Homer Multitext project (HMT). We fully expose the rich data repository of the HMT in four network services, keyed by citation of objects using URN notation.
While Gregory Nagy’s prolific publication record will be familiar to all readers of this Festschrift, we offer this article as a small token of gratitude for a less visible aspect of his pioneering professional work. The Homer Multitext project originates both in his scholarship on Homeric poetry, and in his teaching of students like Casey Dué and Mary Ebbott, who formulated together with Nagy the idea of a digital multitext of Homer. As Director of the Center for Hellenic Studies, Nagy has been a mentor to the Homer Multitext project, and has provided intellectual, moral and material support for the development of the digital architecture summarized here. Over the decade that we have worked on the Homer Multitext project, we have tried to respond to a number of questions that are fundamental to many kinds of digital scholarship, but that can only be effectively addressed through sustained, intense engagement, in consultation with the best scholars working in digital classics today. We were able to do this only because of the support that Nagy gave us through the CHS.
Scholarship begins with citation. A scholarly citation identifies an object of study uniquely, with precision, and completely. A citation is independent of any implementing technology. “Homer, Iliad, edition of Wolf (1804), Book 1, line 1” can refer to only one string of Greek words; the citation is meaningful whether a scholar consults a printed text of Wolf’s edition, or a scanned copy in Google Books. A citation like this is an ontology, capturing key semantics of the object it points to: there is a group of works under the heading “Homer”; the Iliad is one of them; there is a particular edition of the Iliad by Wolf and from 1804; we are pointing to a unit, Book 1, and a sub-unit within it, line 1. If we are discussing the words, Μῆνιν ἄειδε, θεά, Πηληϊάδεω Ἀχιλῆος, as Friedrich August Wolf asserted in his 1804 edition, there is nothing we could take away from that citation, and nothing we need to add.
That citation is scholarly in that it is complete, precise, and independent of implementation, but it is not machine actionable. The international standard of “Uniform Resource Names” (or URNs) allows us to make scholarly citations that are unique, complete, precise, and machine actionable. URNs can also be hierarchical, allowing us to cite at greater or lesser degrees of precision.
The URN notation is widely used across many disciplines and industries that require the exchange of information across networks. The standard requires that a URN begin with the string urn: followed by an identifier for the domain covered by this URN. For the Homer Multitext, URNs that identify passages of text fall in the domain cts, for Canonical Text Services: urn:cts:…. Data objects in collections—manuscript folios, images, and so on—are in the cite domain: urn:cite:….
CTS URNs illustrate the potential of URN notation to express a data model in a citation. CTS URNs refer to texts as an ordered hierarchy of textual units. This model itself is a revision of an initial hypothesis which proposed that the true nature of a text is an “ordered hierarchy of content objects”: book, chapter, section, paragraph, sentence, clause, phrase, word, character (DeRose et al. 1990). This hypothesis (the OHCO hypothesis) was abandoned by the original authors, who later recognized that different perspectives on the content of a text might require conflicting, overlapping hierarchies: no single structure can capture the “true” nature of a text (Renear et al. 1996). We adopt a pragmatic modification of the OHCO theory based on our practice of citation (Smith and Weaver 2009). No matter what analytic perspective on a text we choose, we must agree on a canonical form of citation. These citation units form an agreed-upon ordered hierarchy of citation objects: even if we want to analyze units that overlap citation units (such as sentences overlapping poetic lines), we will refer to the analytical units in terms of citation units. We refer to this model as OHCO2.
In the OHCO2 model, texts are composed of citable units that have the following properties:
- each unit has a place in a sequence
- each unit has a place in a citation hierarchy
- each unit belongs to a version of the text situated in an ontological hierarchy of group-author-[edition/translation]-exemplar
- citation units may include further structural information (markup), in addition to the explicit identification of the citation structure
A CTS URN identifies itself as a URN in the Canonical Text Services domain (urn:cts:…), and further specifies the namespace for subsequent identification of texts. Works in Ancient Greek that are preserved through manuscript tradition are identified (and their identifications are guaranteed to be unique) by the namespace greekLit (urn:cts:greekLit:…). Following the namespace is a hierarchy identifying the work, analogous to the hierarchy of the Functional Requirements for Bibliographic Records (FRBR). Texts exist in a “text-group,” such as “Homeric poetry.” A text-group contains one or more “works”: Iliad, Odyssey. In the greekLit namespace, identifiers for groups and works follow, where possible, the identifications in the Thesaurus Linguae Graecae: Canon of Greek Authors and Works (Berkowitz and Squitier 1990).
A CTS URN can stop here, pointing to a “work” in a “text-group”, e.g. urn:cts:greekLit:tlg0012.tlg001. This is a notional work, the Iliad considered as an abstraction that includes every edition and translation of the poem. To refer to a specific edition or translation, we can add a level, the edition-level, to the URN: urn:cts:greekLit:tlg0012.tlg001.msA, referring to our transcription of the Iliad as it appears on the Venetus A manuscript. At this point, our CTS URN scheme of citation captures the ontology of a work that was largely sufficient in a world of physical libraries of printed books.
We expect a digital library service to be able to reach inside a text, and to find and deliver passages of text; we expect to be able to do this for large passages (in order to display a book of the Iliad in a web browser), or for individual characters (for example, to discuss the reconstruction of a damaged fragment of text). The CTS URN standard follows the version-level element with an additional citation string identifying a passage, in the citation hierarchy defined for that work. For the Iliad, we cite by book and line, so urn:cts:greekLit:tlg0012.tlg001.msA:1.1 refers to our edition of the Venetus A’s text of the Iliad, Book 1, line 1. Both elements of the citation 1.1 are arbitrary labels. In this case, they resemble numbers, but there is no implicit or explicit promise that the next passage will be 1.2. Different texts are cited differently, in hierarchies of different depths. Hesiod’s Theogony would have a one-level citation; Euclid’s Elements has citations like 1.proposition.21, and so forth.
A CTS URN can identify a range of passages: urn:cts:greekLit:tlg0012.tlg001.msA:1.610-2.2 refers to the four lines of poetry in one edition of the Iliad that include the last two lines of Book 1, and the first two lines of Book 2. Our CTS URN scheme of citation can reach inside the most granular citable unit of text to identify specific words and characters: urn:cts:greekLit:tlg0012.tlg001.msA:1.1#μῆνιν(1) refers to the first instance of the string “μῆνιν” in Book 1, line 1, of our edition of the Venetus A’s text of the Iliad.
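The hierarchy packed into a CTS URN can be made concrete with a small parser. The following Python is only a sketch under stated assumptions, not the project's implementation: the class and function names are invented for illustration, it uses the urn:cts: prefix and TLG-style identifiers described above, it handles at most three levels in the work hierarchy (text-group, work, version), and it treats a single trailing #… subreference rather than one on each end of a range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CtsUrn:
    """Components of a CTS URN (illustrative names, not an official API)."""
    namespace: str               # e.g. "greekLit"
    text_group: str              # e.g. "tlg0012"
    work: Optional[str]          # e.g. "tlg001"
    version: Optional[str]       # e.g. "msA"
    passage: Optional[str]       # e.g. "1.1" or "1.610-2.2"
    subreference: Optional[str]  # e.g. "μῆνιν(1)"

    @property
    def is_range(self) -> bool:
        # A hyphen in the passage component joins the two ends of a range.
        return self.passage is not None and "-" in self.passage


def parse_cts_urn(urn: str) -> CtsUrn:
    # Split off an optional "#string(n)" subreference first.
    body, _, subref = urn.partition("#")
    parts = body.split(":")
    if len(parts) not in (4, 5) or parts[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work_part = parts[2], parts[3]
    passage = parts[4] if len(parts) == 5 else None
    levels = work_part.split(".")
    if not 1 <= len(levels) <= 3:
        raise ValueError(f"unsupported work hierarchy: {work_part!r}")
    # Pad missing lower levels with None so a bare text-group still parses.
    levels = levels + [None] * (3 - len(levels))
    return CtsUrn(namespace, levels[0], levels[1], levels[2],
                  passage or None, subref or None)
```

Parsing urn:cts:greekLit:tlg0012.tlg001.msA:1.610-2.2 with this sketch yields the namespace, the three work-hierarchy levels, and a passage component that is_range reports as a range. Note that the passage labels are kept as opaque strings, consistent with the point that 1.1 is a label rather than a number.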
Since the sole function of our citation values is to identify a passage uniquely, we can apply them equally well to all versions of the text. An edition of a papyrus containing only a few lines of the Iliad does not have to supply a text for the entire poem, and lines appearing in a different order from that of Byzantine manuscripts can simply be labeled with the appropriate identifier. We can leave it up to our text management systems to enumerate the sections of a text, and determine which citable passage follows which in a particular edition.
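Because citation values are labels, "what comes next" has to be answered by consulting the sequence recorded for a particular version rather than by arithmetic on the label. A minimal Python sketch of that idea follows; the function name is hypothetical, and the label lists are illustrative stand-ins for whatever sequence a text management system would actually hold.

```python
from typing import Optional, Sequence


def next_label(ordered_labels: Sequence[str], label: str) -> Optional[str]:
    """Return the label following `label` in this version's own sequence.

    `ordered_labels` is the sequence of citable units for ONE version of a
    text; a ValueError is raised if `label` does not occur in that version.
    """
    i = ordered_labels.index(label)
    return ordered_labels[i + 1] if i + 1 < len(ordered_labels) else None


# End of Book 1 and start of Book 2 in a full version of the poem.
full_version = ["1.610", "1.611", "2.1", "2.2"]

# A hypothetical fragment that preserves only a few lines (invented data).
fragment = ["1.5", "1.6", "1.8"]

following_in_full = next_label(full_version, "1.611")   # crosses into Book 2
following_in_fragment = next_label(fragment, "1.6")     # skips a missing line
```

The same lookup serves both versions: in the full version the unit after 1.611 is 2.1, while in the invented fragment the unit after 1.6 is 1.8, because 1.7 simply does not occur there.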
Since our citations are independent of any technology, we can even use them to refer to texts that are not yet online. In the course of our editorial work, we often want to state that a particular range of lines occurs on a particular folio of a manuscript before that text has been edited. We can use a CTS URN even at that stage of our work: it will remain a valid citation when a digital text becomes available.
This exemplifies an important guiding principle for all digital scholarship: separation of concerns. How we cite a text does not depend on the progress of our editorial work: those are separable problems. In choosing an archival format for our data, we need to ensure that we represent the structural information reflected in a CTS URN, but we do not need to concern ourselves with how an application will retrieve this information. In designing retrieval services using CTS URNs, we do not need to define how end-users will see this passage.
This brief overview of the CTS URN has shown that this standard does the following:
Citing discrete objects
collaborators spent several years developing the CTS URN standard. We then broadened our original work with texts to encompass citation of other kinds of material, in an architecture we have called the CITE architecture. (For an overview of the CITE architecture, see http://www.homermultitext.org/hmt-doc/cite/index.html
.) We developed a URN syntax for citing collections of objects with a similar structure, called CITE Object URNs, or just CITE URNs for short.
Each CITE Collection belongs to a CITE namespace, or group. The Homer Multitext project’s collections are in the group hmt (urn:cite:hmt). Since all objects in a CITE Collection share a common structure, individual collections resemble (and could be implemented with) databases. urn:cite:hmt:msA identifies a collection in the Homer Multitext group for pages of the Venetus A manuscript, and already implies that every object in that collection will have the same properties. urn:cite:hmt:msA.msA-12r uniquely identifies a single object in that collection, the folio 12 recto of the Venetus A manuscript.
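The structure of a CITE URN can be sketched in the same way. The Python below is an illustration only, with invented names; it assumes the layout shown in the examples above, in which the namespace is followed by a collection identifier and, optionally, a period and an object identifier.

```python
from typing import NamedTuple, Optional


class CiteUrn(NamedTuple):
    """Components of a CITE Object URN (illustrative names)."""
    namespace: str            # e.g. "hmt"
    collection: str           # e.g. "msA"
    object_id: Optional[str]  # e.g. "msA-12r"; None when citing a whole collection


def parse_cite_urn(urn: str) -> CiteUrn:
    parts = urn.split(":")
    if len(parts) != 4 or parts[:2] != ["urn", "cite"]:
        raise ValueError(f"not a plain CITE URN: {urn!r}")
    # The first period separates the collection from the object identifier.
    collection, _, object_id = parts[3].partition(".")
    return CiteUrn(parts[2], collection, object_id or None)


folio = parse_cite_urn("urn:cite:hmt:msA.msA-12r")
whole_collection = parse_cite_urn("urn:cite:hmt:msA")
```

Here folio names one object in the msA collection, and whole_collection names the collection itself, with object_id left as None.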
An essential part of the Homer Multitext is our extensive collection of the project’s digital photography of manuscripts in the Biblioteca Marciana in Venice, and the Escorial Monastery near Madrid. We have organized our archive so that every passage of text in our diplomatic editions is related to citations of visual documentation. (See below, on “Graphs of relations.”) Citing the unique image is not enough for these purposes: we need to be able to indicate more specifically a region of an image where we observe a feature or passage of text.
The CITE URN syntax allows an optional, type-specific extension to the canonical identifier for a discrete object. For images, we extend the CITE URN notation with a scale-independent rectangle bounding a region of interest (RoI). The extended RoI element is made up of four numbers in the range 0 <= n < 1.0. Their values give:
- left edge of the region
- top edge of the region
- width of the region
- height of the region
The URN urn:cite:hmt:chsimg.VA001VN-0503 identifies a unique object in the chsimg Collection; the URN urn:cite:hmt:chsimg.VA001VN-0503:0,0,0.5,0.5 specifically cites a region on that unique image. (The region is the top left quarter of the image: it begins at the top left corner, 0,0, and extends across 50% of the image in width, and 50% in height.)
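Since the region of interest is expressed as fractions of the image, a client only needs the pixel dimensions of a particular rendering to locate the rectangle. The following Python sketch shows the arithmetic; the function names and the 4000 × 3000 pixel dimensions are assumptions made for illustration, and the four values are read in the order left, top, width, height.

```python
from typing import Optional, Tuple

Roi = Tuple[float, float, float, float]  # left, top, width, height (fractions)


def split_roi(urn: str) -> Tuple[str, Optional[Roi]]:
    """Separate an image URN from its optional region-of-interest extension."""
    parts = urn.split(":")
    if len(parts) == 5:
        values = tuple(float(v) for v in parts[4].split(","))
        if len(values) != 4:
            raise ValueError(f"expected four RoI values in {urn!r}")
        return ":".join(parts[:4]), values
    return urn, None


def roi_to_pixels(roi: Roi, image_width: int, image_height: int) -> Tuple[int, int, int, int]:
    """Scale a fractional rectangle to pixels for one rendering of the image."""
    left, top, width, height = roi
    return (round(left * image_width), round(top * image_height),
            round(width * image_width), round(height * image_height))


base, roi = split_roi("urn:cite:hmt:chsimg.VA001VN-0503:0,0,0.5,0.5")
# Hypothetical rendering of 4000 x 3000 pixels.
pixel_box = roi_to_pixels(roi, 4000, 3000)
```

Because the citation itself never mentions pixels, the same URN remains valid for a thumbnail and for a full-resolution rendering alike.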
Using this RoI extension, we can easily cite the visual evidence for specific observations. The URN urn:cite:hmt:chsimg.VA012RN-0013:0.045,0.2225,0.135,0.0938, for example, defines a rectangle on a digital image that includes the large initial letter Mu, the first letter of the first word of the Iliad on the Venetus A manuscript. Submitting this URN to an image service, we can retrieve a slice of the binary data illustrating that feature: see http://amphoreus.hpcc.uh.edu/tomcat/chsimg/Img?&request=GetBinaryImage&urn=urn:cite:hmt:chsimg.VA012RN-0013:0.045,0.2225,0.135,0.0938&w=3000 (Blackwell 2011). (See below under “Retrieval,” including notes below on how we can mask complex URLs for human readers.)
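A request of this kind can be assembled mechanically from a URN. The Python below is a sketch that mirrors the parameters visible in the example URL above (request, urn, and w); the function name is hypothetical, and the endpoint is simply copied from that example, with no claim that it is still reachable.

```python
from urllib.parse import urlencode

# Endpoint copied from the example URL in the text; it may no longer be live.
IMG_ENDPOINT = "http://amphoreus.hpcc.uh.edu/tomcat/chsimg/Img"


def binary_image_url(urn: str, width: int, endpoint: str = IMG_ENDPOINT) -> str:
    """Build a GetBinaryImage request for an image URN (optionally with RoI)."""
    # Keep ":" and "," unescaped so the URN stays readable inside the URL.
    query = urlencode(
        {"request": "GetBinaryImage", "urn": urn, "w": width},
        safe=":,",
    )
    return f"{endpoint}?{query}"


mu_url = binary_image_url(
    "urn:cite:hmt:chsimg.VA012RN-0013:0.045,0.2225,0.135,0.0938", 3000
)
```

The URN is the only part of the request that carries scholarly meaning; the endpoint and the width parameter are details of one particular service installation.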
Our decade-long engagement with digital scholarship has brought us to a point where we can discuss the entire contents of the HMT archive — texts, data documenting objects, and images — at any level from the most abstract to the most fine-grained, using concise, unique, human-readable and machine-actionable citations.
From the beginning of the Homer Multitext project, all the principal contributors have wanted to rethink how applications working with a digital multitext could support more fully reproducible and verifiable scholarship than print media allow: we want to publish not just conclusions or results, but a replicable process. We are designing a Homer Multitext machine more than a Homer Multitext publication.
The most fundamental scholarly action that all higher-order or analytical actions depend on is retrieval: that is, looking up a digital representation of an object. Human users will often require no more than looking up an object — a passage of text to read, or an image to view — but software will also require an automated way of retrieving digital objects. When software and end users alike are joined together by the global internet, the obvious way to implement retrieval is with a network service (discussed further in relation to citation by URN value in Smith 2009). Because every digital object we create in the Homer Multitext project can be unambiguously identified with a CTS URN or a CITE URN, the only parameter that a retrieval service requires is a single URN value. The services are RESTful: in exchange for a single http parameter giving a URN value, they return an XML response validating against a schema defined as part of the service.
We have two fundamental URN notations (CTS URNs for text, and CITE Object URNs for collections of objects), so we need one retrieval service for each URN type. We have defined a Canonical Text Service for identifying and retrieving passages of text by URN value, and a CITE Collection Service for identifying and retrieving discrete objects by URN value. The Homer Multitext project documentation includes further technical details on the Canonical Text Service at http://www.homermultitext.org/hmt-doc/cite/texts/cts.html, and on the CITE Collection Service at http://www.homermultitext.org/hmt-doc/cite/collections.html.
Separately defining a citation syntax and a retrieval service follows the basic design principle of separation of concerns. Following this principle, we can base a wide range of applications on this simple foundation. For example, while the Canonical Text Service and the CITE Collection Service specify schemas for their XML replies that might appear rebarbative to end users, in implementing the services, we have opted to include hints linking to XSLT stylesheets that format the raw XML of the service reply as web pages designed for human readers. (See below, “Clients and end-users.”) Of course the XML source text is unchanged, so programs interoperating with one of the services still receive the specified XML reply: the question of formatting replies for human readers can be cleanly and completely separated from the explicitly defined schema for replies that other software can work with.
Just as we can transform replies for the convenience of human readers, so, too, we can transform the format of requests to either retrieval service. The URL for a request to retrieve a passage of text from a CTS, for example, looks like this:
We can use URL rewriting support in web servers like Apache Tomcat to reformat in the form
This concise form is easy to write, and easy for human eyes to interpret.
We saw that the CITE architecture provides a mechanism for extending citation of unique objects with further type-specific reference information, such as the RoI notation we can use with images. In parallel with this, we can extend the CITE Collection service with requests that understand the extended URN notation. This includes requests to automate citation and retrieval of the binary data for images.
The result is that we can retrieve any piece of textual or binary data in the HMT project archive using one of three URLs: a retrieval request to a Canonical Text Service, a CITE Collection Service, or a CHS Image Extension to a collection service.
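Choosing among the three retrieval services can be decided from the URN alone. The Python below sketches one possible dispatch rule; the function name and the labels "text", "collection", and "image" are invented for illustration, and routing every CITE URN that carries an extension to the image service is a simplifying assumption, since extensions are type-specific.

```python
def service_for(urn: str) -> str:
    """Pick a retrieval service for a URN (labels are illustrative only)."""
    parts = urn.split(":")
    if parts[:2] == ["urn", "cts"]:
        return "text"          # Canonical Text Service
    if parts[:2] == ["urn", "cite"]:
        # Assumption: a fifth component is a region-of-interest extension.
        return "image" if len(parts) == 5 else "collection"
    raise ValueError(f"unrecognized URN type: {urn!r}")


examples = {
    "urn:cts:greekLit:tlg0012.tlg001.msA:1.1": service_for(
        "urn:cts:greekLit:tlg0012.tlg001.msA:1.1"),
    "urn:cite:hmt:msA.msA-12r": service_for("urn:cite:hmt:msA.msA-12r"),
    "urn:cite:hmt:chsimg.VA001VN-0503:0,0,0.5,0.5": service_for(
        "urn:cite:hmt:chsimg.VA001VN-0503:0,0,0.5,0.5"),
}
```

A caller never needs to know in advance what kind of object a citation points to: the domain in the URN is enough to route the request.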
Relations of objects and semantic navigation
Describing related pairs of objects
If the most fundamental piece of scholarly work is simply to identify the objects we are studying, the next challenge is to specify how identified, citable objects relate to each other. This demands an understanding of a scholarly domain that may often be internalized and applied implicitly in traditional scholarship, but in our digital work needs to be specified explicitly.
URN notation can carry much of the burden of this work: we can express a relation between any two objects as an association of two URNs. To express the fact that lines 1–25 of Iliad 1 occur on folio 12 recto of the Venetus A manuscript, for example, we could map a URN for Iliad 1.1–1.25 to a URN for folio 12r, and we will already have captured the idea that we are relating a passage of text to an object in a Collection. These associations or mappings of pairs of URNs are always typed: the type of the previous example might express the general idea “text passage appears on physical artifact”. The typed mapping of URNs forms a triplet that we can think of as a sentence, where the type of association is a “verb”, and the two nouns identified by URN values are the subject and object of the verb.
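As a data structure, such a typed mapping is just three strings. The Python below sketches the example from this paragraph; the class name and the verb label "appearsOn" are placeholders invented for illustration, not terms from the project's vocabulary.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """A typed association of two URNs: subject, verb, object."""
    subject: str
    verb: str
    obj: str

    def as_sentence(self) -> str:
        return f"{self.subject} {self.verb} {self.obj}"


# Hypothetical verb label standing for "text passage appears on physical artifact".
APPEARS_ON = "appearsOn"

iliad_on_folio = Triplet(
    subject="urn:cts:greekLit:tlg0012.tlg001.msA:1.1-1.25",
    verb=APPEARS_ON,
    obj="urn:cite:hmt:msA.msA-12r",
)
```

Nothing in the triplet depends on either object being retrievable: the statement is well formed as soon as both URNs and the verb are agreed upon.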
The idea of expressing relations between objects as a triplet analogous to subject-verb-object is not new: we do not even need to invent our own notation, since the World Wide Web Consortium has defined a language called the Resource Description Framework (RDF) that we can directly apply. RDF is conceived primarily as “a language for representing information about resources in the World Wide Web” (http://www.w3.org/TR/rdf-primer/), but fortunately for our purposes the RDF notation is defined in a way that can relate two URNs, not merely addresses on the web (or URLs). This is not so trivial a point as it might appear: in the example above, we can express the idea that a passage of text appears on a particular folio page whether or not that text has ever been published on the WWW. By restricting our use of RDF to describe only URN values, linked by verbs in an HMT vocabulary, we create a formal description of relations that is technology independent (it is not limited to resources on the WWW), but is machine actionable, much as URN notation allows us to create formal identifiers that are technology independent but machine actionable.
Graphs of relations
Triplets can also be thought of as a pair of nodes joined by an edge in a graph. The full set of triplets describing relations among objects in the HMT data archive can then be considered a graph modeling the domain studied by the Homer Multitext project. In 2012, as the quantity of material collected in the HMT project archive has grown rapidly, we have added two components to the project’s digital architecture: an automated system for building a graph of the project, and a network service for retrieving a graph of relations to a requested object.
Some objects in our archive are inherently related by their structure to other objects. Because citable nodes of text occur in an ordered hierarchy, for example, they have an inherent sequential relation to preceding or following nodes, and an inherent hierarchical relation to containing or contained passages of text. Similarly, while all CITE Collections are unique sets of objects, some may be further specified as ordered sets. A CITE Collection modeling the folios in a manuscript might define an inherent order based on their order of appearance in the current binding of the manuscript, or in a rebound manuscript might even define an order based on a reconstruction of an earlier or original sequence of folios.
When editors add semantic markup to their edition of a text, they are always relating the content to its context. Identifying a personal name or a place name with a URN value implies a relation between that URN and the URN of the text passage where the name occurs.
One basic principle the HMT project has adopted in editing manuscripts is that we aim to record every feature, textual or graphic, that we see on a folio. In the Venetus A manuscript, for example, we see a series of numbers written in large figures in the exterior margins that, as Dindorf observed in the nineteenth century, mark epic similes. In our edition, we record every occurrence of such a number in a notebook that relates the annotation both to the folio where it is seen, and to a region of interest on a documentary image. (For an interesting interpretation based on this systematic record, see Roughan 2011.)
Our automated build system takes all of these kinds of relation into consideration as it processes the entire data archive to generate hundreds of thousands of RDF statements about each manuscript we are editing. We define a namespaced vocabulary that is continuing to evolve as the range of edited material expands. (The current definitions are available from the Homer Multitext project’s documentation at http://www.homermultitext.org/hmt-doc/standards/rdfvocabulary.html.) The build process writes a simple text file of RDF statements that again separates concerns, by isolating the new question of how objects relate to each other from our prior concerns of how to identify and retrieve objects.
By expressing these relations in a standard language, we immediately have access to the world of supporting technologies for working with RDF. We can, for example, directly import these RDF statements into a triplet store supporting querying using SPARQL, a query language for RDF developed by the W3C. (For more information about SPARQL, see http://www.w3.org/TR/rdf-sparql-query/.)
Querying a triplet service with SPARQL is a practical way to work with RDF data, but to maintain the clear separation of concerns in our design, we need to be able to work more abstractly with a graph of URNs. We use a SPARQL server as a back end to our fourth network service, a service for working with the HMT project graph. Like our three retrieval services, the HMT Graph service is a RESTful design accepting a single URN as its only parameter. The Graph service returns an XML description of all nodes linked to the requested URN. As with our three retrieval services, we include hinting links to XSLT stylesheets that can format the XML data as a web page for human readers. Because all nodes in the graph are identified by URN values, we can easily include links to recenter the graph on any listed node, or alternatively to retrieve and display the requested object, whether it is a text, an image, or an artifact such as a manuscript page.
This opens up a kind of semantic navigation of the Homer Multitext archive. For a traveler at any node location in the graph, navigation can be reduced to finding adjacent nodes. Since all nodes are citable objects, they are identified by URN, and the same Graph service that found the adjacent nodes can recenter the graph on any related node.
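Navigation by adjacency can be sketched over a plain list of triplets. The Python below is an in-memory illustration only, not the Graph service itself: the function name and verb labels are hypothetical, and the link between folio 12r and the image is included as sample data for the sake of the example.

```python
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (subject URN, verb, object URN)


def adjacent(triplets: List[Triplet], urn: str) -> List[Tuple[str, str]]:
    """List (verb, neighbor URN) pairs for every triplet that touches `urn`."""
    neighbors = []
    for subject, verb, obj in triplets:
        if subject == urn:
            neighbors.append((verb, obj))
        elif obj == urn:
            neighbors.append((verb, subject))
    return neighbors


# Illustrative data with hypothetical verb labels.
TEXT = "urn:cts:greekLit:tlg0012.tlg001.msA:1.1-1.25"
FOLIO = "urn:cite:hmt:msA.msA-12r"
IMAGE = "urn:cite:hmt:chsimg.VA012RN-0013"
graph = [
    (TEXT, "appearsOn", FOLIO),
    (FOLIO, "illustratedBy", IMAGE),
]

# Start at the text passage, then recenter on the first neighbor found.
from_text = adjacent(graph, TEXT)
recentered = adjacent(graph, from_text[0][1])
```

Starting from the text passage, one step reaches the folio, and recentering on the folio exposes both the text and the image: each move is the same operation applied to a different URN.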
So, in addition to our three retrieval URLs, a fourth URL for our graph service can satisfy all our navigational needs for a hypertext of the HMT data archive. But our separation of concerns helps us beyond this: the rich web of relations that the Graph service works with can be viewed in different ways that effectively create new HMT applications. The simple architecture of two basic URN formats supported by four service URLs enables us to create a nearly limitless world of HMT applications.
Clients and end-users
Our four Homer Multitext URLs—Texts, Objects, Images, Graph—return responses in well-formed, namespaced XML. These responses are easily processed using standard tools and methods for a variety of purposes. The expected recipient of these XML responses, the “client” for these services, is of course a web-browser.
Modern web-browsers (as of autumn 2012) are well-equipped to receive XML and process it in useful ways for human readers. A standard technology for passing to a browser instructions for how to handle an XML response is an XSLT stylesheet. An XSLT stylesheet is a document, itself in XML, that provides a script for transforming each element of an XML document into some other dialect of XML, including HTML. As a stylesheet takes XML and creates an HTML page out of it, the stylesheet can also include links to CSS stylesheets and scripts, which can turn a static HTML page into a dynamic interactive web-based application.
Each of the four services—our four URLs—can include a link to an XSLT stylesheet in their XML response. A URL that includes a CTS URN, requesting a passage of text, will generate an XML response that begins by pointing to a Stylesheet that does the following:
- Formats the requested passage’s metadata (work-group, title, publication information) for display to human readers;
- Formats the passage itself, including line-numbers, editorial content, and any other embedded markup or features we choose to support;
- Finds the <prev> and <next> elements, which contain CTS URNs for the passages preceding and following the requested passage in the requested edition of the text, and turns those into URLs that serve as links for navigation.
For the CHS Image Extension, the default stylesheet reads the (very simple) XML returned as a response to the URL and transforms it into a sophisticated web-application that draws a view of a high-resolution digital image that the user can zoom and pan dynamically in the browser.
The HMT Graph service takes a Text, Object, or Image URN and returns a list of all other objects (identified by URNs) related to it, with their particular relationships expressed as namespaced verbs. This service can also include a link to an XSLT stylesheet that can turn this simple list of URN → [verbs + object-URNs] into a dynamic and interactive web-based application. The HMT Graph service accepts an app= parameter, and can return different XSLT depending on the value of app. By default, HMT Graph returns a stylesheet that allows navigation from object to object. If app=facsimile, the HMT Graph service will return a stylesheet that turns the output of the service into an application for navigating a digital facsimile of a manuscript. If app=multitext, the service will return XSLT that focuses on relationships among different editions of a text.
For each of these applications, the stylesheets present the requested object and the objects related to it; the stylesheet generates links that allow users to choose any of the related objects as the “center” of the view, thus allowing navigation of the entire graph of the Homer Multitext, hundreds of thousands of relationships (and growing).
One important element of all of these end-user applications is our ability to “resolve” a CTS or CITE URN in place. Our applications can place references to URNs in a web page of HTML, and then automatically retrieve the text, object, or image pointed to by the URN: within the context of a web page, we can automatically turn a citation into a quotation. We have published, and are regularly updating, the package of code that makes this possible in our applications at http://www.homermultitext.org/hmt-apps/html-ctskit.html
It is important to emphasize that the services of this architecture always return XML. A client that is designed for human readers will automatically apply these stylesheets and thus create human-friendly environments for reading, study, and exploration. A client that is not interested in generating such an application will ignore the stylesheets. Such clients might include automated processes of indexing and analysis, or end-user applications designed by scholars outside of the Homer Multitext, who would like to draw on this data and present it in different ways.
The CITE Architecture of the Homer Multitext is built on a strict separation of concerns:
- We can identify any object of study—text, data-object, image—by URN …
- … at any level of granularity
- … whether or not the resource we cite is on line
- We can assert relationships among texts, objects, and images with a simple triplet of URN + verb + URN.
- We can generate a list of all relationships for any object as a series of triplets expressed as RDF XML.
- We can turn that XML expression of a point in the graph of relationships into any number of dynamic, navigable end-user applications.
- We can “resolve” any URN with one of three URLs, turning the citation into a quotation.
At the time of this writing, the graph of objects in the HMT numbers in the hundreds of thousands. As we add more edited texts of scholia, more information about morphology and syntax, information about the three-dimensional shape of manuscript pages, alignments between different photographs of manuscript pages, and more scholarly commentary (each comment stored as a CITE Object with a CITE URN), this number will rapidly reach millions.
But what is required to navigate a graph of millions of objects and follow relationships from facsimile page to image to text to multitext to morphology to commentary will not change. Our four URLs will continue to suffice: a URL for a list of an object’s relationships, a URL for retrieving text, a URL for retrieving data, a URL for retrieving images.
As the HMT grows, and as technology evolves, our expectations and desires for presenting and working with these texts, images, and objects will inevitably change rapidly. By virtue of the separation of concerns, however, the challenge of displaying the data of the HMT will remain tractable, since it will remain merely a matter of transforming well-known XML structures. The tools for performing those transformations are so central to the global information economy that their quality, ease of use, and power will grow under the care of others. The HMT will be able to profit from a widespread need for such technological infrastructure as efficient triplet stores for massive sets of data.
This separation of concerns leaves the editors of the HMT and their collaborators free to focus on the intellectual challenges of studying the vast wealth of data on the history of Greek epic poetry. The HMT’s editors can remain focused on creating a digital multitext of Homeric poetry, one that is possible only now, and only after the process of study and experimentation among an intergenerational group of collaborators that Gregory Nagy assembled and set in motion.
But even as the CITE Architecture serves the Homer Multitext, it is being adopted elsewhere as a conceptual framework for scholarly study of many aspects of the ancient world. Projects currently exploring the potential of CTS/CITE include collaborative work on Greek morphology, on the fragmentary Greek historians, on automated discovery of quotations of Greek and Latin in large libraries of scanned printed books, on integrating archaeological records from archives with ongoing textual data, commentary, and (both two-dimensional and three-dimensional) digital imagery from ongoing excavations, on the manuscript tradition of Propertius and Athenaeus, and on Greek epigraphy and palaeography.
In the late 1990s, the first conversations on the subject of a “Homer Multitext” recognized that a sufficient treatment of Greek epic must necessarily include such a wide body of material, from different times, different genres, different media, and different disciplines, that it was impossible to plan a “Homer Multitext Application”; what was called for was a true, and truly generic, digital library infrastructure that could provide both simplicity and virtually unlimited scale. We believe that we are on the path to achieving this goal. This has been possible because of Greg Nagy’s vision and scholarship, and also his generosity, dedication to collaboration, and wisdom to grant to a complex challenge a long gestation.
Berkowitz, L., and K. A. Squitier, eds. 1990. Thesaurus Linguae Graecae: Canon of Greek Authors and Works. Oxford.
DeRose, S. J., D. G. Durand, E. Mylonas, and A. H. Renear. 1990. “What is Text, Really?” Journal of Computing in Higher Education 1:3–26.
Renear, A., E. Mylonas, and D. Durand. 1996. “Refining our Notion of What Text Really Is: The Problem of Overlapping Hierarchies.” Research in Humanities Computing (eds. N. Ide and S. Hockey). Oxford.
Saur, K. G. 1998. “Functional Requirements for Bibliographic Records.” IFLA Study Group on the Functional Requirements for Bibliographic Records. UBCIM Publications, new series, 19.
Smith, N., and G. Weaver. 2009. “Applying Domain Knowledge from Structured Citation Formats to Text and Data Mining: Examples Using the CITE Architecture.” Text Mining Services: Building and Applying Text Mining Based Service Infrastructures in Research and Industry (ed. Gerhard Heyer) 129–139. Leipziger Beiträge zur Informatik 14. Leipzig. Reprinted in Dartmouth College Computer Science Technical Report series, TR2009–649, June 2009.