By Gregory Crane, Brian Fuchs, Amy C. Smith and Clifford E. Wulfman - October 2000
The Perseus Digital Library already enjoys strong affinities with many projects being developed in Europe today. Mirror sites for Perseus have been maintained in Oxford and Berlin for several years, and we have worked extensively with the Max Planck Institute for the History of Science, Berlin, since 1998. Most recently, we have begun to collaborate with the Center for the Study of Ancient Documents and the Beazley Archive at Oxford University as well as with the team at Cambridge now writing a new intermediate Greek Lexicon. European collaborations are natural for us; while most of the technical research in digital libraries being done in the US is readily applicable to European efforts, the Perseus Digital Library Project is unusual in that, technology aside, its efforts to date have focused on a cultural heritage shared by the US and Europe alike. Given the magnitude of the task before us all, such US/European partnerships are essential, and we are eager to expand our ties to colleagues in Europe. We are therefore grateful for the opportunity to contribute to Cultivate Interactive.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Perseus's goals are twofold. First, we seek to contribute to the growing set of materials available in electronic form, and we are committed to providing access not only for scholars but for the widest possible audience. For us, electronic media provide powerful tools with which humanists can reach beyond the academy and democratize access to the shared cultural heritage of humanity. Second, we are struggling to understand how best to structure information for electronic environments. Simple word-processed or HTML representations of traditional documents are only a first step towards true electronic publication. Not only must we develop protocols for radically new kinds of documents (e.g., archaeological reports that include databases and virtual-reality reconstructions); we must also rethink many of the humanists' traditional tools: for example, what is the relationship between new research lexica and such non-hierarchical, semantically organized databases as EuroWordNet? Our larger goals, and those of the international community of scholars and researchers in humanities computing, must be broader, however. Ultimately, we believe that digital libraries have the potential to revolutionize intellectual life across the world, but only if humanists vigilantly monitor the rush of technological change. The World Wide Web has already begun to realize the dazzling potential of richly networked electronic resources, but the success of the WWW also highlights its limitations: its content is uneven, its organization is poor, and its overall interoperability is rough and fragile. Even more than their colleagues in the sciences, humanists must assimilate the implications of new technologies, for during the next few years a new electronic infrastructure, both technical and institutional, will take shape, one that will very likely constrain what we can and cannot do for a generation.
By building concrete electronic collections and by working with humanists from many different disciplines and countries, we seek to help the humanities influence this new developing infrastructure. Until we know how to organize and create documents that will be useful over the long term, we lavish work on projects that will not conform to shifting best practices and that will become prematurely obsolete. Publications in some fields, particularly in the scientific disciplines, rapidly become outdated because the field's information infrastructure (its current best wisdom) is in constant flux. The "hotter" a discipline and the more resources its technological infrastructure attracts, the shorter its chronological horizon. Intellectual work in the humanities typically enjoys a longer productive life, however. We develop editions, reference works, and other documents that are often designed to serve for generations. As we contemplate the staggering task of converting the record of humanity into digital form, we need to design an infrastructure that can grow and evolve with technology and create documents that will exploit the capacities of systems and uses that none of us can yet anticipate. Humanists thus must themselves take the lead as they peer toward a receding horizon.
Even where the technology is available and its utility clear, scholarly practice lags. Most photographs collected by field archaeologists today, for example, are obsolete before they are developed. With today's digital technology, we can easily stitch pictures into 360-degree panoramas that give viewers a radically different sense of a building's contextual space than isolated snapshots of structures and commanding views can provide. To create such panoramas, however, we need many different views (ideally at least twelve wide-angle photos shot from a tripod-mounted camera), and few slide collections taken before the advent of technologies such as QuickTime Virtual Reality (QTVR) contain such depth of coverage. Furthermore, many of the most useful images provide overviews of terrain: surveys of a plain or a site from a hill or some neighboring point. In May 2000, the US government removed its restrictions on satellite data and allowed inexpensive hand-held Global Positioning System units to provide data accurate to within ten meters. By using such GPS units, archaeologists can now "geo-reference" any outdoor image that they take, allowing the images to be plotted on a map so that those who subsequently use them can study the terrain with much greater precision. Yet although archaeologists visited thousands of sites around the world in the summer of 2000, virtually none of them systematically collected images that were geo-referenced or suitable for panoramas. It will not always be practical to drag a tripod to a remote site and we may not even geo-reference every picture, but the scholarly community needs to reevaluate the costs and benefits and to rethink its "best practices."
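To make the idea of a geo-referenced, panorama-ready image set concrete, the following is a minimal sketch in Python. It is an illustration only, not a description of any Perseus tool: the `GeoImage` record, the `heading` field, and the 60-degree `field_of_view` default are all assumptions introduced here. Only the figure of twelve shots comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoImage:
    """Hypothetical record for one geo-referenced field photograph."""
    filename: str
    lat: float      # decimal degrees, e.g. read from a hand-held GPS unit
    lon: float      # decimal degrees
    heading: float  # compass bearing of the camera, in degrees


def largest_heading_gap(images):
    """Return the widest angular gap (degrees) between neighbouring shots."""
    headings = sorted(img.heading % 360 for img in images)
    if not headings:
        return 360.0
    gaps = [b - a for a, b in zip(headings, headings[1:])]
    # Close the circle: gap from the last heading back round to the first.
    gaps.append(headings[0] + 360 - headings[-1])
    return max(gaps)


def panorama_ready(images, min_shots=12, field_of_view=60.0):
    """Rough check: enough shots, and no gap wider than one lens view.

    field_of_view is an assumed value for a wide-angle lens, not a
    figure taken from the article.
    """
    return (len(images) >= min_shots
            and largest_heading_gap(images) < field_of_view)
```

A set of twelve shots taken every 30 degrees from one tripod position would pass this check, while six of the same shots would not. A real workflow would also need overlap between frames and a consistent camera height, which this sketch ignores.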
Abstract extrapolation only carries us so far, however. We can build a digital environment that mirrors and indeed replicates many of the limitations of print (many on-line collections currently do so), but the new environment is, or should be, a radical step beyond print standards. In our experience, the best method forward is to develop solid collections that serve real communities and then evaluate not only how these new collections support preexisting needs but how they open up new modes of inquiry. Even the most basic features of an electronic environment (text searching, for example, or the presence of very high resolution images) may well over time have a profound impact on the questions that we ask. At Perseus, we seek to develop collections that exploit at least some element of the electronic environment and that attract real use from students in the field. We can then study the real-world usage to determine what does and does not work, thus refining our models and guiding redesign of the collection. And while we have created an integrated digital library environment, we remain focussed on the back-end structures by which the data is organized. Systems, however elaborate, are ephemeral: they evolve and can be replaced much more easily than massive and expanding contents.
Figure 1: The standalone Perseus CD-ROMs.
From the beginning of our research, the Perseus Project has been interested primarily in the interplay between technology and intellectual inquiry: already in the mid 1980s it was clear, first, that we could represent in digital form every category of data that was available in a library and, second, that the electronic environment would influence the questions that individuals posed. Our initial focus was on the ancient Greek world: we were able to develop a critical mass of heterogeneous information, including source texts, modern translations, linguistic analysis tools, lexica, and commentaries, site plans and geospatial data, thousands of new catalogue entries for objects and sites, and tens of thousands of images. These materials were published together in several CD-ROMs: Perseus 1.0 in 1992, Perseus 2.0 in 1996, and Platform Independent Perseus in 1999. Support from the US National Endowment for the Humanities allowed us to begin expanding into the Roman world in 1997. Roman and additional resources have been published on our Web sites, which served 300,000 pages for 30,000 individual sessions within peak twenty-four hour periods in spring 2000.
While our work on Greco-Roman civilization continues, major government support from the National Science Foundation and the National Endowment for the Humanities under the Digital Library Initiative Phase II has allowed us to approach the broader issues confronting digital libraries in the humanities. Our method has been to collaborate with experts in various subsets of the humanities, capitalizing on and then expanding our existing expertise. We concentrate in particular on coherent but heterogeneous collections, where automatically generated cross-references and links create an interactive environment supporting new modes of data discovery and visualization. We have just completed the first of five years of support under the DLI-2. Since the DLI-2 does not support the development of collections per se, our efforts under this program are analytical: these collections serve as tools with which we can study the organization and use of digital libraries. We are also, however, eager to work with those who are creating new content. Working with the Modern Language Association of America and the Max Planck Institute for the History of Science in Berlin we have, for example, been able to create substantial new digital content. Over the next year we will bring into production new collections on the History of Mechanics, the Histories of London and the United States, the archaeology of Giza (Egypt), and works of Shakespeare, each of which should appeal to a different disciplinary audience and will confront different technical challenges. Our collaborative project with the Museum of Fine Arts, Boston, on their excavation materials from the ancient necropolis at Giza, for example, must integrate traditional documents, such as field notebooks, with the photographic record of the excavation. We would welcome collaboration with other projects to which our expertise could contribute and that are dedicated to making their materials available to the widest possible audience.
Figure 2: Perseus Digital Library on the Web
From this roster of new activities, it is clear that our development is both incremental and opportunistic. The technologies that we developed for managing Latin and Greek, for example, gave us immediate leverage with early modern European materials, many of which either are in Latin or cite passages in classical languages: we thus began projects on electronic editions of Christopher Marlowe and Shakespeare and on early modern scientific texts. Our experience integrating textual and geospatial materials helped us design a digital library on the history and topography of London and its environs, and our work on archaeological sites led us to explore the possibilities of extensive documentation for the Roman port city of Ostia and virtual reconstructions of tombs at Giza. These projects currently cluster around the following topics:
Well structured digital libraries make it possible for individuals to extend their intellectual range: computational tools that link source texts to lexica, translations, and so on allow non-specialists to make much more effective use of limited linguistic training. Indeed, if the tools are sufficiently powerful they can enable those with no knowledge of a particular language to identify key terms and linguistic structures. Such tools, however, require documents that are extensively structured, and scholarly labor is often much more abundant than the technical expertise needed to apply a complex tagging scheme or to create a document that integrates many different databases. We are working closely with the Max Planck Institute for the History of Science, helping them to develop a working environment for a team studying the history of mechanics from antiquity through the early modern period. Though focused, this project, called the Archimedes Project, still requires direct analysis in texts from more languages (including Greek, Latin, and Arabic) than most professional scholars could ever master. We are working with our colleagues at the Institute so that nontechnically-minded scholars, without extensive training or support, will be able to add a wide range of data to Archimedes' corpus of documents. A specialist in Arabic, for example, will be able to structure Arabic texts on the history of mechanics so that they can be used by those with no knowledge of Arabic; the Arabic specialist can, however, then use similarly prepared documents in Greek or Latin. Each member of the team will thus help others extend their range while benefiting him- or herself in the same way. Mutual enrichment is the focus of the project in another way as well, in that, unlike many digital libraries, it is built on the notion that its content will be continually deepened by the commentaries of scholars working on the texts in the corpus. 
Metadata generated by scholars will become available to users in the form of visualizations, which will in turn become the starting point for new investigations or even the subject of commentary themselves. In this way a rich superstructure of metadata will be encouraged to grow up around the source text, without the kind of prior determination of content or direction that is inevitably the result of digital encoding projects that begin by drawing up exhaustive DTDs. The challenge here has been to design a working environment that allows for the continual addition of heterogeneous and sometimes even contradictory metadata without forfeiting its heterogeneity. The project hopes that this experimentation with open-ended data formats will contribute to an expansion of the existing functionality of digital libraries.
Figure 3: Comparing versions of Christopher Marlowe's Dr. Faustus
Building on our experience with creating an electronic edition of the complete works of Christopher Marlowe, we are harnessing Perseus's chief strength (the ability to create, automatically, interconnecting cross-references among texts, commentaries, maps, timelines, and other information sources) to build a comprehensive textual resource for scholars of Shakespeare. As a part of this more general effort, we are working with the Modern Language Association of America to establish an electronic format for the massive New Variorum Shakespeare, a series of critical editions of Shakespeare's texts and important scholarship. Shakespeare is an attractive choice for several reasons: besides providing a high-profile subject with broad readerly and scholarly appeal, building the resource poses several interesting technical difficulties with which to challenge our methods and practices. How, for example, does one transform a self-standing book into an interactive electronic document while maintaining the contours of its traditional form? In our view, so-called new media should not break with old media but rather become a natural extension of them. Because the New Variorum Shakespeare will exist in both codex and electronic forms, it is important to maintain interoperability. Readers of one medium should be able to work easily with the other, so we must carry over the conventions of the print form of the New Variorum Shakespeare to the electronic edition while extending it to include global searches, hypertext linking, coordination of multimedia resources, and other functionality available only in an electronic resource.
Another challenge lies in developing new methods of structuring a large heterogeneous body of scholarly resources. The New Variorum Shakespeare will contain, in addition to critical editions of the primary texts, an extensive selection of important Shakespearean scholarship which will, in turn, refer to the massive body of scholarship on Shakespeare and the early modern period. While Perseus has long had the capacity to build links between primary texts and extratextual resources (commentaries, maps, images, and so on), we will need to build new tools to support links among the secondary materials themselves, both within the digital library and in the wider universe of scholarly discourse. Online archives of scholarly writing, such as JSTOR, Project Muse, and others, for example, might contain articles that are referenced by secondary materials in the Variorum Shakespeare; these articles could easily be retrieved and displayed in a networked system. References to materials not available online in full-text form may still exist citationally in online catalogues and other indices; users of the digital library should be able to link directly to these citations as a first step to retrieving the materials by traditional means. Thus the New Variorum Shakespeare project will, like other Perseus projects, pose questions about the design and distribution of academic resources in an open environment.
In building databases of sites and architectural monuments for Perseus's initial coverage of ancient Greek civilization, it quickly became apparent that to be effective, a digital library should have an organization that is scalable. In a scalable system, the wealth of information and materials available for larger, well excavated and/or well documented sites could be presented at whichever level each user required for his/her purposes, rather than being artificially restricted to the scale at which smaller sites were covered. Perseus's initial coverage of archaeological sites, for example, comprised extensive hyperlinked catalog entries on the individual sites, many of the buildings on each site, some of the other monuments found therein, and linked plans and other images, but such a system is unwieldy for larger sites, such as Delphi and Olynthus, not to mention cities such as the ancient port of Ostia or nineteenth-century London. Not only do the sizes and chronological spans of each of these places vary, but the materials that pertain to each of them, whether materials found in them or materials documenting them, differ greatly from site to site. The necropolis, or cemetery, of Giza, for example, is like a city in that it comprises many architectural monuments that may be represented in still photographs or in QTVR walkthroughs, which in turn should be linked to other images, QTVR files, and text descriptions of objects found within those monuments. For the bigger picture, one must also provide a geospatial reference for each building, so that users may understand how monuments relate to each other, to the city plan as a whole, and in turn to the location of the city in relation to the world around it.
In this vein, our initial work on London has necessitated our building tools with which various information pertaining to any city or citylike complex, modern or ancient, might be approached through a variety of digital media. In addition to constructing an atlas that allows a user to view the whole city or to zoom into a particular building, we are building three-dimensional walkthroughs which are based on 18th century drawings and maps of city streets. With these tools, a user may access the maps in either 3D or 2D format and then follow links to still images of the streets and buildings as they appear today. Perseus's architecture also enables users to follow links between literary references to urban artifacts and their visual representations.
Figure 4: Hadrianic Capitolium
Whereas our virtual 19th-century London is based primarily on archival data, supplemented with contemporary images, we have approached Ostia from quite the opposite perspective. A massive photographic campaign garnered 10,000 new images of the site as it appears today, including at least a hundred QTVR walkthroughs. When sufficiently documented and linked to a detailed georeferenced plan of this city, this body of photographic information (comprising, like the London materials, both 2D and 3D formats) will allow users, whether scholars, students, or mere passers-by, to visit the archaeological site of Ostia from anywhere in the world. Eventually, as with all of the cities covered in the Perseus Digital Library, detailed virtual reconstructions of some architectural complexes will further allow users to flesh out the architectural spaces that might not be actually reconstructed: complex political, environmental, and structural issues, as well as the prohibitive costs, combine to leave most archaeological monuments as heaps of rubble.
Although the above discussion concentrates on architectural complexes, the wealth of catalog entries in the Perseus Project relates to individual art works (ca. 1500 coins, 1800 sculptures, and 2000 vases). In most cases these materials may not be georeferenced, per se, but many have known findspots: either particular cities or, more informatively, the actual tombs, rooms, or buildings in which they were found. These findspots may be integrated with our city- or site-level documentation, both to enhance the inherent interest in the city or site and to expand one's appreciation of each art work itself. It is access to a wealth of supporting information that makes the Perseus Digital Library an attractive content partner for the Beazley Archive's pottery database, while the Beazley Archive, with its database of more than 65,000 known vases in what aims to be a complete database of Attic painted pottery from the Archaic and Classical periods, adds breadth to Perseus's depth of coverage. Both teams have welcomed the technological opportunity to maintain two databases that are distinct in appearance as well as function but may be searched as one.
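The idea of two databases that stay distinct but "may be searched as one" can be sketched as a small federation layer. The Python below is an illustration under stated assumptions, not the actual Perseus or Beazley Archive interface: the record fields, the identifiers, and the function names (`search_perseus`, `search_beazley`, `federated_search`) are all hypothetical, and the sample records are invented placeholders rather than real catalogue data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VaseHit:
    """Common result shape that both back ends are normalized into."""
    source: str
    identifier: str
    description: str


# Placeholder records with deliberately different (invented) schemas.
PERSEUS_VASES = [
    {"id": "P-1", "summary": "Attic red-figure kylix, symposium scene"},
]
BEAZLEY_VASES = [
    {"vase_no": 1001, "shape": "kylix", "technique": "red-figure"},
    {"vase_no": 1002, "shape": "amphora", "technique": "black-figure"},
]


def search_perseus(term):
    """Adapter for the first collection: free-text summaries."""
    for record in PERSEUS_VASES:
        if term.lower() in record["summary"].lower():
            yield VaseHit("Perseus", record["id"], record["summary"])


def search_beazley(term):
    """Adapter for the second collection: structured shape/technique."""
    for record in BEAZLEY_VASES:
        text = f'{record["technique"]} {record["shape"]}'
        if term.lower() in text.lower():
            yield VaseHit("Beazley", str(record["vase_no"]), text)


def federated_search(term, backends=(search_perseus, search_beazley)):
    """Query every back end and merge the normalized hits in order."""
    hits = []
    for backend in backends:
        hits.extend(backend(term))
    return hits
```

Each collection keeps its own schema and its own adapter, so either side can change its internal structure or presentation without affecting the other. Only the shared `VaseHit` shape has to be agreed on.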
3D models, which serve as databases of geographic and architectural information in their own right, may also serve as the underpinning for a contextual presentation of architectural and freestanding sculptures. We are taking a first step with coverage of the lavishly decorated pan-Hellenic sanctuary of Apollo at Delphi, where the greatest challenge is in integrating text and 2D photographic documentation with 3D reconstructions. Projects such as ARCHEOGUIDE push such limits of interactive technology; current bandwidth limitations with the WWW make the large-scale superimposition of 3D images on top of 3D reconstructions impossible. We are investigating the more tractable solution of providing links to QTVR movies of selected objects that are only shown from one side in conventional publications. The visual placement of each artwork in its known findspot and/or original location would of course enlarge the user's understanding of the purpose of these art works, but even more intriguingly, 3D models might also allow us to glimpse the monuments of antiquity that have been long lost not only from their original locations but entirely from the world and are now known only through copies at best. The 3D photographic scanning technique invented by Marc Levoy and the Michelangelo Project at Stanford (reported in Science Daily Magazine and elsewhere) is an incipient technology that might be useful in reconstructing these lost works while presenting them faithfully. Recently Levoy's team has worked on ancient materials such as the Laocoon in the Vatican Museums and the Forma Urbis Romae or Severan marble plan. Although the Stanford scientists have worked on recreating these objects in digital media, it is up to scholars to make use of them and to disseminate them for use in research and teaching, through digital libraries.
While all these projects differ substantially, they are united by our consistent effort to study the ways in which documents which are distinct in print libraries begin to merge with one another in a digital library, dissolving their individual structures and supporting new patterns of intellectual inquiry.
Digital libraries integrate many different kinds of data, and they involve many different aspects of system design. The back-end data structures affect the features that the front-end system can support. Experts in a discipline need to ponder the interaction between system and data structures. Research on a wide range of topics can contribute to our understanding of what digital libraries can or could do to support various audiences. For us at Perseus, the following topics stand out as areas of particular interest and collaboration:
The development of new integrated collections: This might include federating existing resources (e.g., a text corpus and a lexicon, or museum collections and archaeological records) related to materials in Perseus or entirely new projects. This may entail integration of new data into the Perseus Digital Library Interface or looser federation with separate collections. We are particularly interested in architectures for intensive back-end transactions that seamlessly integrate data from geographically separate digital libraries: a user on a relatively slow connection might call up a page that required hundreds of transactions between servers linked on a high speed connection.
The cognitive effects of digital libraries: What happens when we can link historical maps, modern GIS data, images, and texts? Can we change the role that geographic structures play? How do links between source texts and dictionaries affect text comprehension? How do patterns of information discovery change as large collections become public? How do the intellectual lives of various communities (specialists, interdisciplinary researchers, students at various levels, and the general public) change?
Integration of modern computational linguistic techniques (e.g., machine translation, cross-language information retrieval) to historically significant languages: Natural Language Processing remains a dynamic subject, with theories in rapid flux and data structures constantly being redefined. Nevertheless, we can now begin to identify structures of persistent value that would be suitable foundations for long term reference tools in the humanities. EuroWordNet, for example, builds upon WordNet, which masterfully balances theoretical insight with practicality. Machine translation systems may come and go, but bilingual corpora are likely to remain valuable parts of any language analysis system. A relatively modest investment (e.g., 3-5 years of labor) would allow us to bring Latin and Greek to a point where they could exploit a wide range of computational linguistic techniques. Other languages could follow suit and share much of the same infrastructure.
Information Extraction and Visualization: We have done substantial work automatically extracting dates and placenames both from database records and from text, as well as readily identifiable features such as money. Our research has focused on the problem of disambiguation (e.g., does "Wellington" refer to a person or a place; if a place, which Wellington is meant?) and then on visualization. We have been developing automatically generated timelines and maps to help users grasp at a glance the chronological and geographic coverage of a single document or a collection. We are looking to extend our capabilities for both back-end feature extraction and front-end display.
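The "Wellington" problem can be illustrated with a toy example. The Python sketch below is not the method used in Perseus, which the article does not describe. It assumes two simple heuristics chosen here for illustration: a preceding title such as "Duke" marks a personal name, and an ambiguous placename is resolved to the candidate nearest the unambiguous places mentioned in the same text. The gazetteer, its approximate coordinates, the cue list, and all function names are hypothetical.

```python
import math
import re

# Hypothetical toy gazetteer: name -> [(label, lat, lon)], coordinates approximate.
GAZETTEER = {
    "Wellington": [("Wellington, New Zealand", -41.29, 174.78),
                   ("Wellington, Somerset", 50.98, -3.22)],
    "London": [("London, England", 51.51, -0.13)],
    "Taunton": [("Taunton, Somerset", 51.02, -3.10)],
}
PERSON_CUES = {"Duke", "Lord", "General", "Mr"}  # assumed cue words
YEAR = re.compile(r"\b(1[0-9]{3})\b")            # four-digit years 1000-1999


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def extract_years(text):
    """Return every four-digit year found in the text, in order."""
    return [int(y) for y in YEAR.findall(text)]


def resolve_places(text):
    """Map each placename mention to one gazetteer entry."""
    tokens = re.findall(r"[A-Za-z]+", text)
    mentions = []
    for i, tok in enumerate(tokens):
        # Skip names preceded by a title within two words ("Duke of Wellington").
        if tok in GAZETTEER and not PERSON_CUES & set(tokens[max(0, i - 2):i]):
            mentions.append(tok)
    # Unambiguous places act as anchors for resolving the ambiguous ones.
    anchors = [GAZETTEER[m][0] for m in mentions if len(GAZETTEER[m]) == 1]
    resolved = {}
    for m in mentions:
        candidates = GAZETTEER[m]
        if len(candidates) == 1 or not anchors:
            resolved[m] = candidates[0][0]  # fall back to first-listed entry
        else:
            resolved[m] = min(
                candidates,
                key=lambda c: sum(haversine_km(c[1], c[2], a[1], a[2]) for a in anchors),
            )[0]
    return resolved
```

Given "The coach left Taunton for Wellington in 1843.", this sketch picks the Somerset town because it lies nearest to Taunton, while in "the Duke of Wellington returned to London" the name is treated as a person and only London is mapped. The extracted years and resolved coordinates are the kind of data from which a timeline or map of a document's coverage could then be drawn.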
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Perseus Digital Library Project
124 Eaton Hall
Medford, MA 02155
Phone: (617) 627-3830
Fax: (617) 627-3032
Max Planck Institute for the History of Science, Berlin
Amy C. Smith
Department of Classics
University of Reading
Clifford E. Wulfman
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
For citation purposes:
Crane, G., Fuchs, B., Smith, A.C. and Wulfman, C.E. "The Symbiosis Between Content and Technology in the Perseus Digital Library", Cultivate Interactive, issue 2, 16 October 2000
Date of Article: 16 October 2000
Copyright ©2000 - 2006 University of Bath. Published by UKOLN.