OAI Open Meeting

By Rachel Heery - May 2001

The Open Archives Initiative (OAI) develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content. In February Rachel Heery attended their Open meeting held in the Berlin State Library (Staatsbibliothek zu Berlin). The meeting marked the start of a validation period for the specification.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Open Archives Initiative (OAI) designers and early adopters launched the recently released OAI Metadata Harvesting Specification to a packed meeting in the Staatsbibliothek, Berlin, in February. Following on from a parallel event in Washington, DC in January, this meeting marked the start of a ‘validation period’ for the specification. Over the next year experimental implementations of the specification will inform the OAI and the wider community as to the possibilities offered by the OAI model for metadata exchange. This short report summarises only some of the many presentations from the interesting and varied programme, highlighting the themes that emerged and noting some issues of particular interest. Readers are referred to the OAI Web site [1], where there are copies of presentation slides.

After a warm welcome from Diann Rusch-Feja, Max Planck Institute, who is one of the European members of the OAI steering committee, the programme got underway. Presentations for the day included views from a number of stakeholders representing the OAI executive, implementers from both information services and software development backgrounds, existing e-print archives, and vendors of library management systems.

Carl Lagoze, executive director of the OAI, led off with an overview of its history and an account of progress to date. The origins of the Open Archives Initiative were in the e-print community. The impetus for the initiative was a desire for effective interworking between e-print archives. In the early days the e-print community’s efforts were concentrated on enhancing interoperability between e-print archives, culminating in the Santa Fe convention in 1999 [2]. The work of the initiative continues to be relevant to this community; however, as time went on it became clear that the fundamental enabling technology for simple metadata exchange is relevant in a much wider context.

Carl explained that its harvesting protocol positions the OAI independently from any specific content or economic model. OAI’s future ambitions promise to have much broader relevance in opening up access to a wide range of digital materials. The ambition is to enable 'interoperability that will work', and at a low cost so that the entry level for providing interworking services is lowered.

Paul Ginsparg, director of arXiv.org, the well known e-print archive at Los Alamos National Laboratory (LANL), gave a perspective from the longest established open archive. Serving the physics community, this pre-print archive is central to scholarly information exchange, and has been successfully built on the model of author self-archiving. His analysis of both author and end-user interactions with the archive gave a fascinating insight into the patterns of user behaviour that can be gleaned from statistics. The LANL archive does not provide open access to robots at present and has no plans to change this policy. Paul explained this was primarily to exclude adverse impact on performance, but also indicated that such 'diffusion' of the target audience might not be beneficial. If this policy were to change, it would be interesting to compare the behaviour of users who arrived via search engines, for example Google, with that of users who accessed the site directly.

The rapid development of the OAI specification is certainly impressive, as is the early focus on a very specific well-scoped implementation area. Carl went on to give a detailed consideration of the harvesting protocol and how it fits into the overall OAI interoperability framework. Drawing on work carried out with Herbert Van de Sompel, Carl gave a detailed presentation of the core concepts in the OAI metadata harvesting specification and how these are built into the protocol. The model is of a number of 'service providers' using the OAI protocol to harvest metadata from 'data providers'. The protocol allows a limited number of simple requests to be made within the gathering transaction. In order to facilitate interoperability, data providers must provide their metadata in simple Dublin Core using XML encoding, although they may choose to provide metadata conforming to other schemas in addition if they wish.
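As a rough illustration of the model described above, the following Python sketch shows how a service provider might form a harvesting request and pull simple Dublin Core fields out of a reply. It is a minimal sketch and not taken from any presentation at the meeting. The base URL is hypothetical, the `ListRecords` verb and `oai_dc` metadata prefix are the names used in the published OAI protocol documents and should be checked against the version in use, and the sample reply is a simplified stand-in for a real OAI response envelope. No network access is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of the simple Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_request(base_url, verb, **args):
    """Return the URL a service provider would fetch for one request."""
    params = {"verb": verb}
    params.update(args)
    return base_url + "?" + urlencode(params)


# Simplified stand-in for a data provider's reply (assumption: a real
# response has a richer envelope and a namespaced container element).
SAMPLE_RESPONSE = """<response>
  <record>
    <identifier>oai:example:0001</identifier>
    <metadata>
      <dc xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>An example e-print</dc:title>
        <dc:creator>A. Author</dc:creator>
        <dc:subject>physics</dc:subject>
      </dc>
    </metadata>
  </record>
</response>"""


def extract_dc(xml_text):
    """Collect Dublin Core element values for each record in a reply."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter("record"):
        fields = {}
        for el in rec.iter():
            # Keep only elements in the Dublin Core namespace.
            if el.tag.startswith("{" + DC_NS + "}"):
                name = el.tag.split("}", 1)[1]
                fields.setdefault(name, []).append(el.text)
        records.append(fields)
    return records


if __name__ == "__main__":
    # Hypothetical data provider address, for illustration only.
    print(build_request("http://archive.example.org/oai",
                        "ListRecords", metadataPrefix="oai_dc"))
    print(extract_dc(SAMPLE_RESPONSE))
```

Each Dublin Core element is stored as a list of values because simple Dublin Core allows elements to be repeated.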

The emphasis within OAI is on simplicity, and it will be interesting to see how far this simplicity will be retained in operational services, or whether there will be an imperative to provide refinements and differentials which will require adding complexity to the simple exchange of simple metadata.

The next part of the programme involved a number of first hand accounts of implementation experience from representatives of the group of alpha testers of the OAI specification. The alpha test period ran from November 2000 to early 2001 and involved participation from institutions in a variety of domains. There were a number of approaches to alpha testing: some looked at making metadata available for harvesting (acting as data providers), others looked at the role of service providers gathering metadata from repositories, and some focused on developing compliant software building on existing systems.

Kurt Maly, Old Dominion University, gave an account of the experience of testing OAI harvesting from the perspective of a federated service of e-print archives. The alpha test involved harvesting data from arXiv, CogPrints, the Virginia Tech Thesis/Dissertation collection and several other institutional repositories. In a summary of lessons learned Kurt noted that the expense of maintaining a quality federation service is highly dependent on the quality of metadata declared by data providers. Using a unified controlled vocabulary, or at least defining mapping relationships, is important in a federated archive service. He also noted that, when using XML syntax and character encoding, a single error could affect a large set of data, and that such character encoding errors occur frequently among data providers. Service providers also need to consider the trade-off between data freshness and harvest efficiency.
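One way a service provider might limit the damage from a single encoding error is to parse harvested records one at a time and set aside those that fail, so that one bad record does not invalidate the whole batch. The Python sketch below illustrates the idea only; it is not the approach reported by the alpha testers. The function name `parse_records` is hypothetical, and it assumes the harvester has already split a reply into one byte string per record.

```python
import xml.etree.ElementTree as ET


def parse_records(chunks):
    """Parse each record separately; return (parsed, rejected).

    `chunks` is a list of byte strings, one per harvested record
    (an assumption made for this illustration).
    """
    parsed, rejected = [], []
    for index, chunk in enumerate(chunks):
        try:
            parsed.append(ET.fromstring(chunk))
        except ET.ParseError as err:
            # Record the position and reason so the data provider
            # can be told which record needs repair.
            rejected.append((index, str(err)))
    return parsed, rejected


if __name__ == "__main__":
    chunks = [
        b"<record><title>ok</title></record>",
        # Latin-1 byte in a document parsed as UTF-8: not well-formed.
        b"<record><title>Caf\xe9</title></record>",
        # The same text correctly encoded as UTF-8.
        b"<record><title>Caf\xc3\xa9</title></record>",
    ]
    parsed, rejected = parse_records(chunks)
    print(len(parsed), "parsed;", len(rejected), "rejected")
```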

Heinrich Stamerjohanns and Susanne Dobratz explained testing of the protocol at the Humboldt University, Berlin, which runs an eprint archive service for theses, dissertations, and scientific publications. The archive contains text in a variety of formats (SGML, XML, PDF, PS, HTML) as well as non-text data (video, simulations). The archive is now compatible with OAI version 1.0.

Jean-Yves Le Meur told of his experience at the CERN library. This involved a test collection of books and eprints. Metadata was provided for these using three metadata formats: the mandatory Dublin Core, plus MARC and RFC 1807. One issue was scoping the collection to limit the metadata declared for the OAI repository, which meant trying to identify a sub-set of the whole CERN collection. Within the declared metadata there was also some question as to the best identifier to use. CERN also considered how full text identified by OAI metadata might be exchanged; at present the OAI protocol does not specify procedures for linking to full text.

Eva Krall, Ex Libris, outlined implementation of the OAI protocol in the library management system Aleph 500. Ex Libris were successful in using the OAI protocol to provide a simple means of maintaining a union catalogue as an alternative to message-based replication of data between systems. However, Eva noted that in the context of libraries there were some issues, such as the lack of authorisation mechanisms and the need to transfer holdings data, so some refinements and enhancements might be required.

Andy Powell, UKOLN, carried out a test implementation of the OAI protocol within the Resource Discovery Network, a co-operative network of UK subject gateways giving access to high quality Internet resources. Within the RDN cross searching has been implemented using Z39.50, but because of performance issues and difficulty with building a flexible browse interface there is interest in looking at a record sharing solution. One of the issues that emerged from testing was the richness, or lack of richness, of the simple Dublin Core schema for records. Simple Dublin Core does not support all the elements included within RDN records; for example, it does not indicate the subject classification scheme in use. It may be that a richer metadata schema would be more appropriate. Andy indicated that issues of authentication and branding might also need further exploration.
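The loss of the classification scheme can be pictured with a small Python sketch. The record layout and the function name `to_simple_dc` are hypothetical and are not the RDN record format; they only illustrate how a subject held with its scheme becomes a bare value once flattened to simple Dublin Core.

```python
def to_simple_dc(record):
    """Flatten a richer record into simple Dublin Core element lists."""
    dc = {"title": [record["title"]], "subject": []}
    for subject in record["subjects"]:
        # Simple Dublin Core has no place to record the scheme,
        # so only the value survives the conversion.
        dc["subject"].append(subject["value"])
    return dc


if __name__ == "__main__":
    # Hypothetical richer record: each subject names its scheme.
    rich = {
        "title": "An example gateway record",
        "subjects": [
            {"scheme": "DDC", "value": "025.04"},
            {"scheme": "LCSH", "value": "Metadata"},
        ],
    }
    print(to_simple_dc(rich))
```

A service provider receiving the flattened record cannot tell which subject values came from which scheme, which is the limitation noted in testing.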

Les Carr, University of Southampton, reviewed the eprints.org software which facilitates institutional and author self-archiving. The CogPrints Cognitive Sciences E-print Archive alpha tested the OAI protocol and is now OAI v1.0 compliant. The eprints.org software is freely available and is designed to be as flexible and adaptable as possible, so that universities world-wide can adopt and configure it with minimal effort for their institutional self-archiving needs. Les went on to consider ideas for building a citation database derived from analysis of use of e-print archives, and considered how analysis of use of archives might suggest the tools needed to support archive administration and user interfaces.

Future plans for implementation of the OAI protocol are being drawn up in different application areas. Donatella Castelli, Istituto di Elaborazione della Informazione, gave a brief overview of the Cyclades project, recently funded as part of the EC IST programme. Its aim is to support scholars in interacting with multi-disciplinary archives as members of networked scholarly communities. The project intends to develop a working space for groups to have shared access to their own documents, to other collections, and to related links and annotations. It will test whether such a quality service can be built on the OAI low barrier interoperability framework.

Jeff Young, OCLC, then outlined activity within the ALCME project at OCLC. ALCME is working with the Networked Digital Library of Theses and Dissertations (NDLTD) to develop a name authority linking mechanism. Participants will create authority records in their local repositories and share them with other repositories using the OAI protocol for metadata exchange. Jeff intends to explore the use of RDF to enable participants to annotate each other's records. Further details of this ambitious application are available from the OCLC Web site.

During question time there was some reflection on already existing alternatives to the OAI framework. Inevitably a comparison with Z39.50 was suggested. But how can one compare OAI and Z39.50? To consider all the functionality of Z39.50 in such a comparison would be far too wide a scope. However, a realistic comparison should include more than the capability of both protocols to gather all metadata instances from a compliant repository.

The event also gave the audience some insights into options for 'OAI next steps'. Presentations during the day prompted ideas (in this member of the audience) ranging from facilitating shared metadata creation, in effect collaborative cataloguing, to more specific implementation matters such as working towards recommendations for the optimal size of a metadata repository. Identifying criteria to guide the harvesting process also seems of significant importance, in order to achieve a balance between distributed and centralised repositories. Of major interest is the business impact of the OAI model: where is the burden of work located for services following the OAI 'technical framework'? The next year will give data providers and service providers the opportunity to explore some of these issues. Further meetings have already taken place, and more are planned, to take this forward.

References

  1. Open Archives Initiative Web site
    URL: <http://www.openarchives.org/>
    Please note that links to information about alpha testers are available from the OAI site, so are not listed here.
  2. The Santa Fe Convention of the Open Archives Initiative, Herbert Van de Sompel and Carl Lagoze, D-Lib Magazine, February 2000.
    URL: <http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html>

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Author Details

Rachel Heery
Research and Development Team Leader
UKOLN
University of Bath
BATH
BA2 7AY
United Kingdom

r.heery@ukoln.ac.uk
<http://www.ukoln.ac.uk/>

Phone: +44 1225 826724

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

For citation purposes:
Heery, R. "OAI Open Meeting", Cultivate Interactive, issue 4, 7 May 2001
URL: <http://www.cultivate-int.org/issue4/oai/>