Making electronic publication easier, faster, and more powerful with Hydra, a drag-and-drop TEI publishing environment
Hydra is an experimental drag-and-drop electronic publishing environment for TEI-conformant XML texts. This article introduces the capabilities and limitations of the system.
The Need for Hydra
A number of tools have been written to transform TEI-conformant XML for display on the web, but these tools often tax the skills of new users. Hydra was created so that authors of TEI-conformant XML can simply drop a file into a folder and immediately view the text in HTML or PDF format. The goal is to encourage the adoption of TEI-conformant XML as a standard by making it far easier to transform that XML into more readable formats.
A Description of Hydra
Hydra is a custom distribution of Cocoon, an XML publishing framework. Hydra builds on Cocoon's flexible publishing environment and its separation of concerns between content, logic, and style. Cocoon handles these concerns with components arranged in pipelines, where each component in a pipeline performs a particular function. In a typical example, a generator component reads and parses an XML file, a transformer component converts that XML markup into different XML markup using XSLT, and a serializer component produces the final output. The base distribution of Cocoon includes a number of generators, transformers, and serializers that anyone can use in an application, among them generators that read from the native filesystem and from XML files, and serializers that output PDF, RTF, SVG, and other formats. It is also possible to build custom components. Hydra uses one such custom transformer, Transcoder, written by Hugh Cayless, which converts TLG Betacode-encoded Greek into a number of other Greek encodings, such as SPIonic, Sgreek, and Unicode.
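The generator, transformer, and serializer chain can be illustrated with a minimal sketch in Python. It is not Cocoon's API (Cocoon components are Java classes wired together in the sitemap), and every function name below is hypothetical. The transformer stage performs a Betacode-to-Unicode conversion of the kind the Transcoder provides, but it is not Hugh Cayless's code: it covers only the base letters, capitals, and the common diacritics, and it assumes that Greek passages are marked with a `lang="greek"` attribute.

```python
# Sketch of a generator -> transformer -> serializer pipeline in plain Python.
# This is NOT Cocoon's Java API; all function names here are hypothetical.
import re
import unicodedata
import xml.etree.ElementTree as ET
from typing import Callable, Sequence

# Beta Code letters A..W in Greek alphabetical order, mapped to the lowercase
# Greek letters U+03B1..U+03C9 (skipping final sigma, U+03C2).
_GREEK = [chr(c) for c in range(0x3B1, 0x3CA) if c != 0x3C2]
LETTERS = dict(zip("ABGDEZHQIKLMNCOPRSTUFXYW", _GREEK))
# Beta Code diacritics mapped to Unicode combining characters.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}
SIGMA, FINAL_SIGMA = "\u03c3", "\u03c2"


def betacode_to_unicode(beta: str) -> str:
    """Convert a basic subset of Beta Code to precomposed (NFC) Unicode Greek."""
    src = beta.upper()  # assumption: lowercase input is treated like uppercase
    out = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "*":
            # Capitals write their diacritics between "*" and the letter: *)A
            j = i + 1
            marks = ""
            while j < len(src) and src[j] in DIACRITICS:
                marks += DIACRITICS[src[j]]
                j += 1
            if j < len(src) and src[j] in LETTERS:
                out.append(LETTERS[src[j]].upper() + marks)
                i = j + 1
                continue
            out.append(ch)  # stray "*": keep it unchanged
        elif ch in LETTERS:
            out.append(LETTERS[ch])
        else:
            # Lowercase diacritics follow their letter; anything else
            # (spaces, punctuation) passes through unchanged.
            out.append(DIACRITICS.get(ch, ch))
        i += 1
    text = unicodedata.normalize("NFC", "".join(out))
    # A sigma that is not followed by another word character is word-final.
    return re.sub(SIGMA + r"(?!\w)", FINAL_SIGMA, text)


Stage = Callable[[ET.Element], ET.Element]


def generate(xml_text: str) -> ET.Element:
    """Generator: parse the source document into an XML tree."""
    return ET.fromstring(xml_text)


def transcode_greek(root: ET.Element) -> ET.Element:
    """Transformer: convert the text of elements marked lang="greek" (assumed)."""
    for el in root.iter():
        if el.get("lang") == "greek" and el.text:
            el.text = betacode_to_unicode(el.text)
    return root


def serialize(root: ET.Element) -> str:
    """Serializer: write the tree back out as a string."""
    return ET.tostring(root, encoding="unicode")


def run_pipeline(source: str, transformers: Sequence[Stage]) -> str:
    """Run one generator, any number of transformers, then one serializer."""
    tree = generate(source)
    for transform in transformers:
        tree = transform(tree)
    return serialize(tree)


if __name__ == "__main__":
    doc = '<p>The word <foreign lang="greek">LO/GOS</foreign> means "word".</p>'
    print(run_pipeline(doc, [transcode_greek]))
```

Because each stage only takes and returns a tree, another transformer (for example, one that maps the Greek to a different font encoding) could be added to the list without touching the generator or serializer, which is the point of the pipeline design.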
Hydra aggregates many open source projects in a single distribution. Two of the most important are Sebastian Rahtz’s XSL stylesheets and the Apache Project’s Formatting Objects Processor (FOP). Rahtz’s stylesheets transform TEI XML documents into HTML and into XSL Formatting Objects (XSL-FO), which FOP then uses to create PDFs. Not every TEI element is converted to XSL-FO or HTML, so Hydra may not display some texts properly. However, all of the texts in the base distribution of Hydra display in both PDF and HTML, so they can serve as a useful reference. FOP, the second of these projects, is a Java application that takes XSL-FO and renders it into a number of output formats, including PDF, SVG, PostScript, and plain text. Hydra relies in particular on FOP’s PDF output.
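To see how an unsupported element can spoil the display, consider a hypothetical converter that maps only a few TEI elements to HTML and reports everything else. The real TEI stylesheets are XSLT and far more complete; the element mapping below is an assumption chosen purely for illustration. Listing the unmapped elements is one way an author might check a file against what a given set of stylesheets supports.

```python
# Sketch: map a handful of TEI elements to HTML and report the rest.
# The real TEI stylesheets are XSLT; this mapping is a hypothetical subset.
import xml.etree.ElementTree as ET
from typing import Set, Tuple

TEI_TO_HTML = {
    "text": "div",
    "body": "div",
    "div": "div",
    "head": "h2",
    "p": "p",
    "hi": "em",
    "lg": "div",  # line group
    "l": "p",     # verse line
    "quote": "blockquote",
}


def local_name(tag: str) -> str:
    """Drop a namespace such as {http://www.tei-c.org/ns/1.0} if present."""
    return tag.rsplit("}", 1)[-1]


def tei_to_html(tei: ET.Element, unhandled: Set[str]) -> ET.Element:
    """Recursively convert TEI to HTML, recording elements with no mapping.

    Attributes (e.g. rend) are dropped; this is a simplification.
    """
    name = local_name(tei.tag)
    html_tag = TEI_TO_HTML.get(name)
    if html_tag is None:
        unhandled.add(name)
        # Keep the text visible, but without any element-specific formatting.
        html = ET.Element("span", {"class": "tei-" + name})
    else:
        html = ET.Element(html_tag)
    html.text = tei.text
    for child in tei:
        converted = tei_to_html(child, unhandled)
        converted.tail = child.tail
        html.append(converted)
    return html


def convert(tei_xml: str) -> Tuple[str, Set[str]]:
    """Return the HTML string and the set of TEI elements that were not mapped."""
    unhandled: Set[str] = set()
    html = tei_to_html(ET.fromstring(tei_xml), unhandled)
    return ET.tostring(html, encoding="unicode"), unhandled


if __name__ == "__main__":
    html, missing = convert("<text><body><p>Sing, <hi>goddess</hi>, the "
                            "<name>wrath</name>.</p></body></text>")
    print(html)
    print("not converted:", sorted(missing))
```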
All of the components in Cocoon are controlled by the sitemap, a file that resolves URLs and calls the appropriate generators, transformers, and serializers. The logic and display concerns are handled within the Hydra sitemap, leaving the user free to focus only on the content of the electronic publication. The sitemap handles the logic by reading directories and files and dynamically generating a listing of anthologies (collections of texts) and the texts they contain. The user can choose to display a text in a number of formats, including HTML, PDF, and its native XML. Hydra can display Greek encoded in Betacode in the font of the user’s choice, and the Greek words can be linked to the Perseus morphological parser. Finally, each collection of texts may be displayed with a custom template by modifying the stylesheets included in the base distribution.
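The sitemap's job of listing anthologies from the filesystem and routing each request to an output format can be sketched as follows. This is not Cocoon sitemap syntax. It assumes a hypothetical layout in which each subfolder of a texts directory is an anthology and each `.xml` file inside it is a text, and that a request such as `epic/iliad1.pdf` names the anthology, the text, and the desired format.

```python
# Sketch of sitemap-style logic: list anthologies from the filesystem and
# resolve request paths to (anthology, text, format). Not Cocoon's sitemap
# syntax; the directory layout and the route table are assumptions.
import re
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# Wildcard patterns loosely modeled on sitemap matchers: "*" matches within
# one path segment, and the captured values play the role of {1} and {2}.
ROUTES = [("*/*.html", "html"), ("*/*.pdf", "pdf"), ("*/*.xml", "xml")]


def wildcard_to_regex(pattern: str):
    """Compile a '*' wildcard pattern into an anchored regular expression."""
    pieces = (re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + "([^/]+)".join(pieces) + "$")


def list_anthologies(root: Path) -> Dict[str, List[str]]:
    """Each subdirectory of root is an anthology; each *.xml file in it is a text."""
    return {
        folder.name: sorted(f.stem for f in folder.glob("*.xml"))
        for folder in sorted(root.iterdir())
        if folder.is_dir()
    }


def resolve(url_path: str,
            catalog: Dict[str, List[str]]) -> Optional[Tuple[str, str, str]]:
    """Return (anthology, text, format) for a request, or None if nothing matches."""
    path = url_path.strip("/")
    for pattern, fmt in ROUTES:
        match = wildcard_to_regex(pattern).match(path)
        if match:
            anthology, text = match.groups()
            if text in catalog.get(anthology, []):
                return anthology, text, fmt
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "epic").mkdir()
        (root / "epic" / "iliad1.xml").write_text("<TEI.2/>")
        catalog = list_anthologies(root)
        print(catalog)
        print(resolve("/epic/iliad1.pdf", catalog))
```

Because the catalog is rebuilt from the folders on each call, a newly dropped file appears in the listing without any configuration change. A real deployment would still need to hand each resolved tuple to the matching generator, transformer, and serializer chain; that step is omitted here.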
A Demonstration of Hydra
Vicus Unguentarius, a project dedicated to the study of the Roman epigraphic record pertaining to the scent industry, was initially implemented using Hydra. The project demonstrates several of the customizations that Hydra allows. For example, the XSLT stylesheets have been modified with new colors and images, the front page has a navigation structure suited to the project, and the option to choose a Greek font has been removed as unnecessary in this case. To make these customizations, the author of the project, Sandra Bolero-Imwinkelreid, needed not only to write her files in TEI-conformant XML but also to be familiar enough with XSLT to modify the parameters in the appropriate stylesheets.
Current Limitations and the Future of Hydra
One limitation of Hydra stems from its use of “off-the-shelf” XSL stylesheets freely distributed by the TEI Consortium, which do not convert every TEI element; anyone with an understanding of XSLT can, however, modify them to support the requirements of particular documents. On the other hand, a major advantage of Hydra (as of Cocoon generally) lies in the modularity of its constituent parts. Whenever a new version of FOP, Transcoder, or the stylesheets appears, it can readily be downloaded and patched into the system.
Another limitation concerns the way Hydra displays texts. It needs better support for chunking, the process of breaking up a text contained in a single file into smaller sections. It should also provide a built-in capacity for searching the texts, possibly using the Apache Project’s Lucene search engine. These improvements, along with more substantial documentation, would make Hydra a much more satisfactory electronic publication environment, possibly even one suited to some public uses.
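Both proposed improvements can be sketched together: chunking a single file at its `<div>` boundaries, and indexing the resulting chunks so that a query returns the sections in which its words appear. The sketch below works under stated assumptions (un-namespaced, TEI P4-style markup, with chunks taken from the `<div>` elements directly under `<body>`). It shows the inverted-index idea that underlies search engines such as Lucene, but it does not use Lucene itself, and the function names are hypothetical.

```python
# Sketch: split one TEI file into chunks (one per <div> under <body>) and
# build a tiny inverted index over them. This illustrates the idea behind
# a search engine such as Lucene; it does not use Lucene's API.
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Set


def chunk_by_div(tei_xml: str) -> Dict[str, str]:
    """Return {chunk id: plain text} for each <div> directly under <body>.

    Assumes un-namespaced markup; the chunk id is the div's n attribute
    when present, otherwise its position.
    """
    root = ET.fromstring(tei_xml)
    body = root.find(".//body")
    if body is None:
        return {}
    chunks = {}
    for position, div in enumerate(body.findall("div"), start=1):
        chunk_id = div.get("n", str(position))
        # Joining text nodes with spaces keeps <head> and <p> apart, at the
        # cost of splitting a word that is broken by inline markup.
        chunks[chunk_id] = " ".join(" ".join(div.itertext()).split())
    return chunks


def build_index(chunks: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the set of chunk ids containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for chunk_id, text in chunks.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(chunk_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the ids of chunks containing every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    doc = ('<TEI.2><text><body>'
           '<div n="1"><head>Proem</head><p>Sing, goddess, the wrath.</p></div>'
           '<div n="2"><head>Assembly</head><p>The wrath of the king.</p></div>'
           '</body></text></TEI.2>')
    chunks = chunk_by_div(doc)
    print(search(build_index(chunks), "sing wrath"))
```

Keying the index by chunk id means a search result can link directly to the chunk that a chunked display would serve as its own page.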
To cite this article, please use the following form:
Michael Jones, “Making electronic publication easier, faster, and more powerful with Hydra, a drag-and-drop TEI publishing environment,” C. Blackwell, R. Scaife, edd., Classics@ volume 2: C. Dué & M. Ebbott, executive editors, The Center for Hellenic Studies of Harvard University, edition of April 3, 2004.