By Robin Yeates - July 2002
Robin Yeates reports on the investigation by the COVAX Project into the suitability of XML in providing integrated access to collections and materials in libraries, museums and archives. He also draws conclusions on the Project's prototype and work with trial users.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
One of the three research priorities of the European Commission-funded IST Programme since 1999 has been 'ensuring integrated access to collections and materials held in libraries, museums and archives'.
How much progress have we made since then, and what are the current prospects for achieving what might be called a European Information Environment?
In the European Information Environment, everyone would have seamless access to a range of appropriate digital networked content and services provided by libraries, museums, galleries and archives. This would range from secondary indexes, catalogues and finding aids to full texts and multimedia objects and resources. These would be provided by the vast majority of existing institutions, having adapted their management practices and systems to participate in the environment.
One major technology component of the European Information Environment will be eXtensible Markup Language (XML), since it is rapidly becoming a basis for application-to-application and, to some extent, even application-to-human interoperability throughout the quasi-global Information Society. This article looks at the policy context, but concentrates on the activities and findings of COVAX (Contemporary Culture Virtual Archive in XML) [1], one of the few large-scale projects so far to look at the practical effects of a move to XML-based networking in European libraries, museums and archives. It does not consider the related question of how far existing Z39.50-based solutions can meet the requirements, since this was not the focus of COVAX.
The European Commission has stepped back from its considerable activity in the field of innovation in digital heritage and cultural content, and commissioned an extensive and valuable study led by Salzburg Research entitled "Technological Landscapes for Tomorrow's Cultural Economy" (2002), known as the DigiCULT Report [2]. This tries to assess 'the way Europe's cultural institutions should approach technology-driven mutation' and make recommendations.
The DigiCULT Report Executive Summary begins 'Being digital for many European archives, libraries and museums (ALMs) is no longer an option but a reality. They have turned into "hybrid institutions" that take care of both, analogue as well as digital cultural resources. The conversion of all sorts of cultural contents into bits and bytes opens up a completely new dimension of reaching traditional and new audiences by providing access to cultural heritage resources in ways unimaginable a decade ago'.
Many of these cultural heritage or memory institutions have for years been managing their collections using sophisticated and expensive commercial software systems. They have been considering how they might migrate or replace their software systems to ensure that their resources are available to those who need them in the new era of the Information Society.
Others, especially smaller institutions, may not be so concerned about existing or future software investments, but they need to ensure that their own holdings become more visible and accessible in the networked environment. Otherwise, large commercial endeavours that compete more effectively on the web may eventually displace the services traditionally provided by small institutions with poorer-quality alternatives.
From a supplier perspective, it is becoming increasingly important to offer products and services that inter-operate effectively with those of other suppliers, at least at a basic level. Alliances and partnerships need to be developed, particularly between management systems suppliers and content aggregators and large publishers. Moreover, they need to be based on workable architectures and practical ways of migrating from current infrastructure and customer/client environments to those expected to become more widely available. Smaller publishers will need to make their primary or secondary works available through larger intermediaries. If this is to happen, all the stakeholders involved need to develop a shared understanding of the issues so that they can contribute to successful innovation.
There is no doubt that a strategic political demand exists to encourage sharing of cultural data, although it has yet to be fully recognized that such work can only indirectly be self-sustaining, through education, social cohesion, personal motivation and self-fulfilment. At present the emphasis has been on technologies, rather than on these indirect benefits of wider access to and use of cultural resources. One of the Lund Principles of 4 April 2001, taken up within the eEurope digitisation strategy, is that the Member States could make progress on the eEurope objective to 'create a co-ordination mechanism for digitisation programmes across Member States' if they 'worked in a collaborative manner to make visible and accessible the digitised cultural and scientific heritage of Europe.' (Lund Principles, 2001) [3].
A European Information Space may develop as a logical extension of national policies. For example, the Heritage for All projects of the 5th Framework Programme (CHIMER, CIPHER, COINE and MEMORIAL) all intend to develop new and more powerful tools and services that involve cultural heritage organizations such as museums, archives and libraries more closely in end-user learning, and allow users to interact more deeply with content held by or delivered via them. Programmes such as the public lottery-funded People's Network in the UK will produce a far deeper understanding within the cultural sector of the complex issues relating to the management of digitisation, and may generate support for future integration with other European programmes. However, COVAX has demonstrated that a large number of technical issues remain to be resolved simply to enable the sector's 'legacy' resources to be made visible and accessible to web users.
In Issue 3 of Cultivate Interactive, January 2001, Carlos Wert and Francisca Hernández described the aims of the COVAX project at its start [4]. The main objective of the project, which ended in December 2001, was to define the phases and procedures needed to transform the management and information systems currently used in archives, libraries and museums to an XML environment.
Here we outline the actual creation of a prototype resource discovery system containing a wide range of content types, its internal formative assessment and a summative evaluation of the outcomes of the project, some six months after completion, from the point of view of one of the partners.
One obstacle to technical development is that a realistic pool of digital data for modelling future systems is not always readily available to researchers, since priorities and formats have not yet been fully defined by local managers. Because the kind of information that network discovery tools must discover and manage is in constant flux, projects with fixed, tight time-scales have to take pragmatic approaches.
One answer to this problem has been to focus on existing, often subject-based communities. These will have more clearly defined aims and target user groups, and will offer a clear vision that builds on the present.
If we are to expect new forms of interdisciplinary learning to develop, however, we must develop ways for new communities to be built that are founded on new ambitions and opportunities created through the network itself. These communities will have to set their own technical standards and guidelines, and potential members will need to be able to accept and adhere to them without causing prohibitive local disruption.
In practice, the COVAX content used to build the two prototypes to date does not form a coherent dataset for any particular community. Instead, we have used samples that enabled us to develop solutions for what will become widespread problems. In effect we have taken a worst-case scenario and considered the surrounding issues, rather than creating a finished product. The resulting in-depth learning, however, has meant that all partners feel confident in their technical planning; indeed, the partners intend to continue working together on their future systems, as they found the processes involved in technical integration and development so beneficial.
The actual data used consisted mainly of text and textual metadata, with some related images, as follows:
There are two main approaches to the use of existing data for XML based delivery. Data can be exported from existing systems in batches and converted directly or indirectly to XML. Alternatively, it can be left in an existing, typically relational database, and converted dynamically on demand.
Neither of these main approaches is likely to provide a complete solution for all memory institutions. One reason is that collection sizes and data management options vary enormously. Contributors may need to publish only a small number of records, or may require their management system to be completely separated from the network for security reasons, making a dynamic interface impractical. Conversely, large datasets that must be made fully accessible from existing systems may be impractical to handle using batch transfers.
A further issue is how to maintain interface compatibility across numerous disparate sites, especially while standards are still being extensively revised and developed. COVAX solved this problem by controlling the range of open standards used, and by developing an agreed architectural framework for expansion. The system was able to layer services and content transformations so that contributors could be fully supported, whether they had no local XML systems or skills at all, had newly established, advanced multimedia XML repositories, or, like most institutions, fell somewhere in between.
The data conversion efforts undertaken have been described elsewhere with examples (ELPUB, 2001) [5]. It is sufficient here to note that we found a strong service support network for data management to be crucial in enabling content providers to make their content available. This support ranged from basic and advanced XML/XSL skills to specialist knowledge of the source data formats. In the COVAX case these were mainly MARC-based formats, but partners used several variants, and it was decided early in the project to convert to MARC21 in order to simplify conversion to XML. In general there is a distinct lack of bulk record conversion and validation tools that work with cultural schemas.
Converting to MARC21 meant that we were able to use existing facilities to batch convert data, and could leverage work done by the Library of Congress and others. A policy decision to use existing tools wherever possible therefore led us to adopt a fully reversible, MARC21-based LoC XML format rather than invent our own simplified solution. Local projects should not need to undertake such technically complex work, and we feel that we are now in a good position to use newer schemas and DTDs as they become available, without being required to develop them ourselves. Developing our own might have brought some short-term benefits, such as improved performance of the COVAX prototypes, but could have severely restricted our capacity to integrate with future developments. A huge benefit of this approach is that new data conversion and dynamic interfacing service providers can be brought into the consortium network, and practices can migrate over time as content providers gain skills and local system capabilities, such as XML query handling.
Memory institutions that want to publish large amounts of content, particularly libraries and larger museums or archives, will usually already have management systems using SQL-accessible relational databases such as MS Access, Oracle or MS SQL Server. Z39.50 techniques have already allowed such systems to be integrated to some degree for resource discovery. COVAX was tasked with determining whether so-called native XML databases might also be used. These allow any XML resources to be held and managed in purpose-built repositories that provide access to objects, documents, statistics and other functions via web browsers and XML clients. In this way bibliographic information, finding aids, metadata, full-text documents and related multimedia assets can be retrieved in whole or in part using XML-based queries as well as SQL. Native XML databases are now becoming widespread as the basis of new content management systems, and as they become more sophisticated and robust, they will either replace existing systems or provide additional options for data management and security.
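The difference between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not COVAX code: the `catalogue` table, its columns and the `<record>` element names are hypothetical, and an in-memory SQLite database stands in for a legacy relational management system.

```python
import sqlite3
import xml.etree.ElementTree as ET


def row_to_xml(row):
    """Map one (id, title, creator) row to a hypothetical <record> element."""
    rec = ET.Element("record", {"id": str(row[0])})
    ET.SubElement(rec, "title").text = row[1]
    ET.SubElement(rec, "creator").text = row[2]
    return rec


def batch_export(conn):
    """Approach 1: export all rows in one go and convert them into a single XML document."""
    root = ET.Element("records")
    for row in conn.execute("SELECT id, title, creator FROM catalogue ORDER BY id"):
        root.append(row_to_xml(row))
    return ET.tostring(root, encoding="unicode")


def convert_on_demand(conn, record_id):
    """Approach 2: leave data in the database and convert one record per request."""
    row = conn.execute(
        "SELECT id, title, creator FROM catalogue WHERE id = ?", (record_id,)
    ).fetchone()
    return None if row is None else ET.tostring(row_to_xml(row), encoding="unicode")


# Hypothetical sample data standing in for a legacy management system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalogue (id INTEGER PRIMARY KEY, title TEXT, creator TEXT)")
conn.executemany(
    "INSERT INTO catalogue VALUES (?, ?, ?)",
    [(1, "Sample letters", "Anon."), (2, "Sample map", "Unknown")],
)
print(batch_export(conn))
print(convert_on_demand(conn, 2))
```

Batch export suits small or security-separated collections, since the resulting file can be moved off the management system; on-demand conversion suits large collections, at the cost of keeping the database reachable from the network.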
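What 'fully reversible' means can be shown with a minimal sketch. It uses the Library of Congress MARCXML ('MARC21 slim') schema as a stand-in, which may differ in detail from the MARC21 DTD COVAX actually used; the dictionary input layout and the sample values are hypothetical simplifications of a parsed MARC record. Because the leader, control fields, indicators and subfield codes are all carried into the XML, the original record can be rebuilt from it.

```python
import xml.etree.ElementTree as ET

NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", NS)


def q(name):
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{NS}}}{name}"


def to_marcxml(rec):
    """Convert a simplified MARC21 record (hypothetical dict layout) to MARCXML."""
    root = ET.Element(q("record"))
    ET.SubElement(root, q("leader")).text = rec["leader"]
    for tag, value in rec["controlfields"]:
        ET.SubElement(root, q("controlfield"), {"tag": tag}).text = value
    for tag, ind1, ind2, subfields in rec["datafields"]:
        df = ET.SubElement(root, q("datafield"), {"tag": tag, "ind1": ind1, "ind2": ind2})
        for code, value in subfields:
            ET.SubElement(df, q("subfield"), {"code": code}).text = value
    return root


def from_marcxml(root):
    """Rebuild the same dict from MARCXML: nothing is lost, so the mapping is reversible."""
    return {
        "leader": root.findtext(q("leader")),
        "controlfields": [(cf.get("tag"), cf.text) for cf in root.findall(q("controlfield"))],
        "datafields": [
            (df.get("tag"), df.get("ind1"), df.get("ind2"),
             [(sf.get("code"), sf.text) for sf in df.findall(q("subfield"))])
            for df in root.findall(q("datafield"))
        ],
    }


# Hypothetical sample record.
record = {
    "leader": "00000nam a2200000 a 4500",
    "controlfields": [("001", "covax-0001")],
    "datafields": [("245", "1", "0", [("a", "Sample title /"), ("c", "Sample author.")])],
}
xml_text = ET.tostring(to_marcxml(record), encoding="unicode")
print(xml_text)
```

A simplified in-house format that dropped, say, indicators would be smaller and faster to index, but the round trip back to MARC21 would no longer be possible.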
Figure 1: COVAX Deployment
COVAX began in 1999, when few native XML database options were available and none of the partners had one installed. We did not intend to explore the full potential of these systems during the project, but we did need to create a distributed network of them to provide our testbed.
A key partner, Software AG, supplies the Tamino product [6], which was made available to the content provider partners; some of them installed it, running under both MS Windows and Solaris on a Sun platform. As a high-end solution, Tamino is intended for enterprise-level applications, but we had no serious difficulty setting up and using it on five servers in Madrid, London, Rome and Salzburg. For the project these sites supported some ten production databases and five test databases. The whole system is managed via web browsers, apart from some bulk processing and similar scripts.
Content providers felt the need for a simpler, lower-cost solution for add-on repositories to existing systems. Only one suitable product was found at the time, although more have become available since. TextML™ [7] from Ixiasoft was used to develop an additional seven production databases and one test database in Barcelona, Karlskrona (Sweden) and Graz. AIT needed to provide some additional software for this system so that the COVAX meta-search engine could use a single query format to query both Tamino and TextML repositories. XPath [8] and XQL were used as the query languages in COVAX, but some issues remain owing to the immaturity of these standards.
In future, improved XPath/XQuery standardisation and wider take-up by suppliers are likely to reduce or even eliminate the need for adaptor software for each native XML database system. This will be essential for full interoperability of systems.
Each database holds a collection of XML documents that use a particular schema. Schemas were issued centrally to consortium members by the technical partners: Software AG in Madrid, Salzburg Research, and AIT in Graz. These covered MARC, AMICO, TEI header and EAD formats, using openly published external schemas that were adapted in minor ways only where absolutely essential, for operational reasons or to correct errors. Content provider staff experts put a great deal of work into developing suitable mappings from all the accepted COVAX formats to the required index access points, which were based on the Dublin Core Metadata Element Set to allow cross-domain searches. In addition, each partner also had to provide appropriate mappings and conversion from a much wider range of local formats.
In addition, COVAX holds XML records based on the principle of Z39.50 Explain, describing content providers, system information and collection information (collections were referred to as databases during the project). For these, a set of new schemas was prepared; content was supplied in one of six native languages and in English, then translated into all the others by the relevant language partners. Users therefore have access to at least collection-level information in their native language.
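The adaptor idea can be sketched as follows: the meta-search engine issues one backend-neutral query, and each adaptor translates it into whatever its repository accepts. The `Query` class and both adaptors are hypothetical. The first uses the XPath subset supported by Python's ElementTree; the second stands in for a repository with a different native lookup interface. Neither reproduces the Tamino or TextML APIs.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Query:
    """Backend-neutral query used by the meta-search engine (hypothetical)."""
    element: str  # index access point, e.g. "title"
    value: str    # exact value to match


class XPathAdaptor:
    """Adaptor for a repository that accepts XPath (here: ElementTree's XPath subset)."""

    def __init__(self, root):
        self.root = root

    def translate(self, q):
        # No escaping in this sketch: values containing ' would break the expression.
        return f".//record[{q.element}='{q.value}']"

    def search(self, q):
        return [r.get("id") for r in self.root.findall(self.translate(q))]


class IndexAdaptor:
    """Adaptor for a repository with a different, non-XPath lookup interface (hypothetical)."""

    def __init__(self, index):
        self.index = index  # {(element, value): [record ids]}

    def translate(self, q):
        return (q.element, q.value)

    def search(self, q):
        return list(self.index.get(self.translate(q), []))


def meta_search(adaptors, q):
    """Send one query through every adaptor and concatenate the hits."""
    hits = []
    for adaptor in adaptors:
        hits.extend(adaptor.search(q))
    return hits


repo_a = ET.fromstring(
    "<records><record id='a1'><title>Vienna 1900</title></record>"
    "<record id='a2'><title>Rome</title></record></records>")
repo_b = {("title", "Vienna 1900"): ["b7"]}
print(meta_search([XPathAdaptor(repo_a), IndexAdaptor(repo_b)], Query("title", "Vienna 1900")))
```

If every repository accepted the same standard query language, `translate` would become the identity function and the adaptor layer could shrink to almost nothing.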
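A mapping to Dublin Core access points is essentially a crosswalk table. The sketch below uses a small illustrative subset (MARC 245$a to title, 100$a to creator, 260$c to date, 650$a to subject; EAD `unittitle`, `origination` and `unitdate`) and assumes un-namespaced EAD elements and hypothetical sample values. The actual COVAX mapping tables were far more extensive and are not reproduced here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Illustrative subset only; real crosswalks cover many more fields.
MARC_TO_DC = {("245", "a"): "title", ("100", "a"): "creator",
              ("260", "c"): "date", ("650", "a"): "subject"}
EAD_TO_DC = {"unittitle": "title", "origination": "creator", "unitdate": "date"}


def marc_to_dc(subfields):
    """subfields: iterable of (tag, subfield code, value) triples from a MARC21 record."""
    dc = defaultdict(list)
    for tag, code, value in subfields:
        element = MARC_TO_DC.get((tag, code))
        if element:  # unmapped fields are simply not indexed
            dc[element].append(value.strip())
    return dict(dc)


def ead_to_dc(did):
    """did: the <did> element of an EAD component (assumed to have no namespace)."""
    dc = defaultdict(list)
    for child in did:
        element = EAD_TO_DC.get(child.tag)
        if element:
            dc[element].append("".join(child.itertext()).strip())
    return dict(dc)


book = marc_to_dc([("245", "a", "Sample title"), ("100", "a", "Doe, Jane"),
                   ("260", "c", "1925"), ("500", "a", "General note")])
papers = ead_to_dc(ET.fromstring(
    "<did><unittitle>Correspondence</unittitle><unitdate>1920-1930</unitdate>"
    "<physdesc>2 boxes</physdesc></did>"))
print(book)
print(papers)
```

Once both formats are reduced to the same Dublin Core keys, a single query on `title` or `date` can be run across library catalogues and archival finding aids alike, which is what makes cross-domain searching possible.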
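Serving collection descriptions in the user's own language can be sketched with `xml:lang` attributes and an English fallback. The `<collection>` and `<description>` element names and the sample texts are hypothetical; the actual COVAX Explain schemas are not reproduced here.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<collection id="coll-01">
  <description xml:lang="en">Letters and diaries</description>
  <description xml:lang="de">Briefe und Tagebücher</description>
  <description xml:lang="sv">Brev och dagböcker</description>
</collection>"""


def describe(collection, lang, fallback="en"):
    """Return the description in `lang`, or in the fallback language if it is missing."""
    texts = {d.get(XML_LANG): d.text for d in collection.findall("description")}
    return texts.get(lang, texts.get(fallback))


coll = ET.fromstring(SAMPLE)
print(describe(coll, "sv"))
print(describe(coll, "it"))  # no Italian text in this sample, so English is returned
```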
Figure 2: Part of a COVAX XML Explain document
Altogether we created some 17 production databases and 6 test databases containing the collections in Figure 3:
Figure 3: COVAX Content
One of the main requirements and benefits of the project was to develop our understanding of XML at both the technical and the information professional level. We achieved this by surveying the state of the art in XML handling software (available on the project website), and by using market-leading tools. The most important of these was XML Spy, an IDE (Integrated Development Environment) from Altova GmbH. The free or open-source XML tools available were not found particularly suitable or easy to use, especially compared with tools available for HTML, web authoring, Java and JavaScript. XML Spy supports full XML syntax, parsing, well-formedness checking, validation and encoding; DTD definition; schema definition; XSL and XSLT management; HTML and XHTML rules (XHTML reformulates HTML 4 as an XML application, adding a more rigorous syntax and compatibility with XML environments); syntax highlighting; interoperability with external applications (for example, imports from MS Word and MS Access); and some ASP support. Such a tool was found suitable for technical staff and skilled authors, giving us the means to ensure that only valid content reached the repositories, and to test and check COVAX meta-searching. However, we found problems when trying to validate large batches of records, typically exported from existing management systems, since most tasks required all records to be held in main memory at once. For this reason, and because these tools are generic, partners also used other techniques to validate and correct specific types of content and to convert character encodings where necessary.
In order to evaluate usability and design issues, two COVAX prototypes were built in Java, using XML files for configuration and storage information and XSL stylesheets for transforming XML from one form to another. The second version contained the final set of COVAX features. A shared gateway user interface for resource discovery was created, allowing browsing of collections in six languages and cross-searching of all the distributed repositories, although a public version has not yet been made available. Users can select their preferred interface language, and the system architecture is designed to hold group or personal profiles and persistent storage between sessions, along with search histories and statistics. However, where possible, such facilities would be provided using existing authentication or storage services, and the open architecture also allows search aids such as thesauri, or XML transformation and enhancement services, to be added at a later date.
COVAX is essentially 'middleware': not necessarily visible to end users, but capable of enhancing portal or local web services by delivering an integrated stream of Dublin Core-compliant records, formatted as XML or HTML, for diverse types of cultural content from a consortium of content providers. It provides:
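The memory problem can be avoided by streaming. The following is a minimal sketch, assuming each record is a `<record>` child of the batch root and that `title` and `identifier` are required (both assumptions made for illustration). `xml.etree.ElementTree.iterparse` checks well-formedness as it reads, and clearing the root after each record keeps memory use roughly flat however large the batch is. This is not the technique the COVAX partners used, only one way of addressing the limitation described above.

```python
import io
import xml.etree.ElementTree as ET

REQUIRED = ("title", "identifier")  # hypothetical minimum content for each record


def check_batch(source):
    """Validate a large batch file one <record> at a time.

    Returns (record_count, problems), where problems lists (record id, missing
    elements). Raises ET.ParseError if the batch is not well-formed XML.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    count, problems = 0, []
    for event, elem in context:
        if event == "end" and elem.tag == "record":
            count += 1
            missing = [t for t in REQUIRED if not (elem.findtext(t) or "").strip()]
            if missing:
                problems.append((elem.get("id"), missing))
            root.clear()  # discard finished records so memory use stays flat
    return count, problems


batch = io.BytesIO(
    b"<batch>"
    b"<record id='r1'><title>Sample</title><identifier>1</identifier></record>"
    b"<record id='r2'><title> </title></record>"
    b"</batch>"
)
print(check_batch(batch))
```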
Figure 4: COVAX Architecture
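The middleware role can be sketched as a parallel fan-out: send the query to every repository at once, keep whatever answers arrive within a time limit, and merge them into one stream of Dublin Core elements ordered by title. This is a minimal Python sketch under stated assumptions: the repositories are in-process stand-in functions rather than remote services, the `<results>`/`<record>` wrapper is hypothetical, and only the Dublin Core namespace URI is standard. It is not the COVAX implementation, whose prototypes were written in Java.

```python
import time
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor, wait

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def fan_out(repositories, query, timeout):
    """Query every repository in parallel; keep results that arrive within `timeout` seconds.

    Returns (records, missing), where `missing` names repositories that
    timed out or raised an error.
    """
    pool = ThreadPoolExecutor(max_workers=len(repositories))
    futures = {pool.submit(search, query): name for name, search in repositories.items()}
    done, not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on repositories that are still running
    records, missing = [], [futures[f] for f in not_done]
    for f in done:
        if f.exception() is None:
            records.extend(f.result())
        else:
            missing.append(futures[f])
    return records, sorted(missing)


def to_dc_xml(records):
    """Merge records (dicts of Dublin Core element -> value) into one XML stream, ordered by title."""
    root = ET.Element("results")
    for rec in sorted(records, key=lambda r: r.get("title", "")):
        item = ET.SubElement(root, "record")
        for element, value in rec.items():
            ET.SubElement(item, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical stand-ins for remote repositories.
def archive_a(q):
    return [{"title": f"{q} letters", "creator": "Archive A"}]


def library_b(q):
    return [{"title": f"{q} maps", "creator": "Library B"}]


def slow_museum(q):
    time.sleep(0.5)  # simulates a repository that answers too late
    return [{"title": f"{q} paintings"}]


records, missing = fan_out(
    {"A": archive_a, "B": library_b, "C": slow_museum}, "Salzburg", timeout=0.2)
print(to_dc_xml(records))
print("No answer from:", missing)
```

Returning partial results together with a list of unavailable repositories, rather than waiting for the slowest one, is one way of limiting the long waiting times and time-outs that any distributed meta-search has to contend with.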
A fuller description of the user interface has been published in Program (Yeates, 2002), but the following figures show the search form for a logged-in user and brief search results. Full search result displays vary depending on the resource type, but are shown, at least partially, in sequence on a single results page for speed of in-page navigation.
Figure 5: COVAX Prototype 2 User Interface
Figure 6: Swedish language display of bibliographic results from an Italian collection
Figure 7: Results from an Austrian museum image collection (AMICO format)
COVAX has implemented a complex demonstration of a fully XML-based resource discovery network that takes careful account of the wide variations in cataloguing practice across several European countries. However, it is not yet a complete product.
Users should drive the design of any system, although hidden systems, as much of COVAX is, present challenges for design collaboration. Some parts of COVAX were specified by cultural and information professionals; others were designed by expert web technologists. It was therefore important to involve outside stakeholders and users in shaping further development of the system. Expert usability assessment advice was provided from outside the project team but within one of the partner organisations, Salzburg Research.
A complete usability assessment framework and usability toolkit were created through a project workshop followed by individual partner development work. This gave us a clear matrix of target groups, usage scenarios for each group, and instructions and worksheets for carrying out interviews, observations and pre- and post-trial questionnaire surveys at a wide range of sites internationally. The work was undertaken over a short period, but thanks to careful planning it generated much useful information, especially as we could directly compare independent results.
Feedback was contributed by many stakeholders, ranging from those responsible for national digitisation policy to web researchers, cataloguing experts, non-specialist academics and the general public. Groups studied were:
The main conclusions of the user assessment were:
Overall, the issues arising from these assessments came as no surprise to the consortium, because the main problems had already been identified: long response times, time-outs, the ordering of results and the need to revise the interface design. These modifications have been discussed at consortium meetings and held over for future development, some of which will depend on general improvements in XML networking.
We have shown that it is feasible to migrate legacy cultural services to an XML environment, and that users benefit if this happens. They may gain more immediate access to deeply linked, high-quality content held in a multitude of European, and indeed global, repositories. Awareness of materials will rise, as certain multilingual access and customisation facilities can be implemented relatively easily using XML and XSL. However, tools are not yet fully mature, especially within the cultural heritage sector. Nevertheless, services can be built now that encourage staff development and stakeholder involvement.
The COVAX partners continue to develop their repositories, but we expect much to change in terms of access arrangements and ultimate service design, as we develop understanding of new professional and commercial opportunities.
Silke Grossmann, Vic Haesaerts, Gerda Koch and Walter Koch (2002) have reported on the REGNET Project [9], which aims to set up a functional network of service centres in Europe providing IT services dedicated to cultural heritage organisations. This may be one useful way forward, and other initiatives of similar scope are under way.
Nevertheless, we recommend that all institutions, large and small, pay urgent attention to XML. The complex MARC21 DTD used by COVAX is likely to be replaced by more appropriate XML-based information models for bibliographic data. Full-text documents and lengthy finding aids need better techniques for adapting their content for resource discovery, in order to improve performance. Excessive nesting of elements in XML documents obstructs the mapping of access points and indexing. The standards and protocols for searching distributed databases need to be improved, and adapting Z39.50 for HTTP and XML is a promising approach.
The COVAX architecture is applicable not just to cultural heritage but also to the distribution of information about e-learning products or tourism. The principle of cross-domain searching was strongly endorsed by COVAX trial users, but much more work is needed by everyone to provide appropriate content and system performance so that a full European Information Environment can be achieved.
So, what of the future? Support for the autonomous learner is likely to receive increased emphasis, in order to promote lifelong learning and not merely formal education at school, college or university. Learners, as opposed to teachers, need to be able to interact more deeply with resources, and teachers want to capitalise on new digital resource provision in order to gain the improvements in student motivation and self-confidence that these resources can generate.
We certainly need to include legacy materials in the mix of learning opportunities. However, it may be more important to explore how we might build new innovation platforms for the creation and development of new cultural heritage services that will attract future learners.
Evidence from COVAX shows the value of XML in resource discovery, but also the need for agencies to provide ongoing data conversion services. It shows the value of developers working with intermediaries, but also the difficulty of delivering meaningful services without wider partnerships being created.
Robin Yeates
Associate Director
LITC
South Bank University
103 Borough Rd.
London SE1 0AA
United Kingdom
URL: <http://www.sbu.ac.uk/litc/>
Email: yeatesrb@sbu.ac.uk
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
For citation purposes:
Yeates, R. "COVAX: Making Visible the Culture of Europe", Cultivate Interactive, issue 7, 11 July 2002
URL: <http://www.cultivate-int.org/issue7/covax/>
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Copyright ©2000 - 2001 Cultivate. Published by UKOLN. Design by ILRT.