Cultivate Interactive, Issue 6

Going Beyond Traditional Digital Libraries for Cultural Heritage: The COLLATE Collaboratory

By Adelheit Stein, Ulrich Thiel and Jürgen Keiper - February 2002

Project COLLATE is developing a new type of WWW-based collaboratory for cultural heritage information. The implemented system provides access to a newly constructed digital library of rare historical film documents. Professional domain experts use the COLLATE system to analyze, evaluate and collaboratively interpret, index and annotate the documents in the digital repository. The metadata they provide are managed by an advanced XML-based content manager and made accessible through an intelligent content- and context-based retrieval system.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Introduction

In September 2000, an international team of cultural content providers, film domain experts and technology developers, together with a designated evaluation partner, set out to develop and put into practice a new type of collaboratory in the domain of cultural heritage. The EU-funded project “COLLATE – Collaboratory for Annotation, Indexing and Retrieval of Digitized Historical Archive Material” (IST-1999-20882) is part of the EU DIGICULT programme and runs for three years; see the project Web site [1].

Over the last few years, the idea of collaboratories has emerged in the Natural Sciences and has been taken up by several disciplines and research groups. As defined by Kouzes et al. [2], a “collaboratory” (a blend of the terms collaboration and laboratory) is a virtual centre on the Web where professionals and lay persons are given the means to interact with colleagues, access instrumentation, share data and computational resources, and access information stored in digital libraries and archives.

Various collaboratories have been in use since the early 1990s, mainly in the Natural Sciences. Apart from some systems with very limited functionality, however, we have so far found only a few similar efforts in the Arts and Humanities. The organisation and preservation of historical knowledge in the Arts and Humanities are broadly comparable to those in the sciences. Some of the work processes in the more interpretive disciplines are different, though, and need to be supported by appropriate system functionality. Large collections of important historic and cultural sources are scattered across national archives with no electronic versions available, which severely impedes immediate access to and work with this material. On the other hand, there are many contacts between cultural archives, mostly informal and non-institutional, and these constitute specific professional communities. However, effective and efficient technological support for collaborative knowledge work is still missing. Technologically, the World Wide Web can serve both as a standard communication platform for such communities and as a gateway for document-centered digital library applications.

The COLLATE system does more than provide the functions of a traditional digital library. It also employs a generic approach to collaborative knowledge work with cultural sources. This means supporting users as they analyze, interpret and evaluate the sources and as they create new knowledge from this work. Because the approach is generic, the system and its components can be adapted to heterogeneous cultural domains. As an example application, COLLATE focuses on the film domain in general. In particular, it addresses specific questions of historic film documentation, such as the comparative analysis of film censorship in several countries. Three major European film archives are providing several thousand digitized multi-format documents on early 20th-century European films for COLLATE’s data repository. In principle, however, the tools and user interfaces developed can easily be adapted to other content domains, types of application and users.

The COLLATE Approach

Designed as a content- and context-based knowledge work environment for distributed user groups, the COLLATE system supports both individual work and collaboration. Its users are domain experts who analyze, evaluate, index and annotate the material in the multimedia data repository. The system continuously integrates the user knowledge thus derived into its metadata repositories. On this basis, it can offer improved content-based retrieval functionality within the information system [3]. Users can therefore both access and create valuable knowledge about the cultural, political and social contexts of the material. This in turn allows other end-users to retrieve and interpret the material more effectively.

Technology development on the one hand and extended empirical investigation of the acceptance of a collaboratory in the cultural domain on the other are the two pillars of the COLLATE project.

Results from the two areas of project work strongly influence each other, enabling iterative, dynamic system development. Evaluation steps are explicitly built in, and the users themselves are actively involved throughout the development cycles.

A first prototype of the COLLATE system was implemented at the end of the first project year. It supports formal cataloguing, content-based indexing and annotation of digitized text documents and document passages [4]. Subsequent versions of the system will add advanced document pre-processing modules for automatic analysis and indexing of multimedia data. Above all, they will offer more extensive support for collaboration between users, based on an explicit collaborative task model.

COLLATE focuses on the film domain and has built up a large digital repository of rare historic film censorship documents from the 1920s and 1930s. Most of these documents are not otherwise available in electronic form and are scattered across various archives. Some have not yet even been analyzed and catalogued. The documents are highly relevant for film historians studying the history of censorship, as well as for social scientists in general. For a subset of significant films, the repository additionally offers enriched documentation, including digitized newspaper articles, photos, stills, posters and film fragments. In-depth analysis and comparison of such documents can provide evidence about different film versions and cuts, for example. Such evidence can be used to reconstruct lost or damaged films or to identify actors and film fragments of unknown origin.

As a prerequisite for collaborative work, all material is analyzed, indexed, partly translated, annotated and interlinked by film experts. The COLLATE system provides them with task-based interfaces for in-depth indexing, annotation and other tasks. It also provides supporting knowledge management tools, such as indexing aids and special keyword lists. End-users can also take an active part in evaluating sources and can add valuable information through annotations. In this way, a growing body of metadata emerges over time, which the system exploits through XML-based content management and advanced retrieval methods [5]. The final version of the online collaboratory will integrate cutting-edge document processing and management facilities. Examples include XML-based document handling, digital watermarking, and semi-automatic segmentation, categorization and indexing of digitized text documents and pictorial material (photos, posters, film fragments).

By combining the results of the manual and automatic indexing procedures, elaborate content-based retrieval mechanisms can be applied. These help users find what they are actually looking for, combine evidence from various sources, and relate sources and knowledge that were previously unconnected. Thus, the information repository is constantly improving in size and richness, and also in quality, affordability and acceptability.
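The following Python sketch gives a rough idea of how evidence from manual and automatic indexing might be combined at query time. It scores documents by a weighted sum of two kinds of evidence:

- terms assigned by experts;
- terms extracted automatically, each with a confidence value.

The weights `MANUAL_WEIGHT` and `AUTO_WEIGHT`, the document identifiers and the confidence values are hypothetical assumptions. The article does not describe COLLATE's actual ranking function.

```python
from collections import defaultdict

# Hypothetical weights (assumptions): expert-assigned index terms count fully,
# automatically extracted terms count half, scaled by their confidence.
MANUAL_WEIGHT = 1.0
AUTO_WEIGHT = 0.5


def combined_ranking(query_terms, manual_index, auto_index):
    """Rank documents by combining manual and automatic index evidence.

    manual_index: {doc_id: set of terms assigned by film experts}
    auto_index:   {doc_id: {term: confidence in [0, 1]} from automatic indexing}
    Returns (doc_id, score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for term in query_terms:
        for doc_id, terms in manual_index.items():
            if term in terms:
                scores[doc_id] += MANUAL_WEIGHT
        for doc_id, weighted_terms in auto_index.items():
            scores[doc_id] += AUTO_WEIGHT * weighted_terms.get(term, 0.0)
    # Keep only documents with some evidence; sort by score, then by id.
    ranked = [(doc_id, score) for doc_id, score in scores.items() if score > 0]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Hypothetical sample data for illustration only.
manual = {
    "censorship_card_A": {"censorship", "cut"},
    "press_review_B": {"premiere"},
}
auto = {
    "press_review_B": {"censorship": 0.8},
    "poster_C": {"censorship": 0.4, "cut": 0.2},
}
print(combined_ranking(["censorship", "cut"], manual, auto))
```

A real system would use an inverted index rather than scanning every document for each query term. The linear scan here just keeps the sketch short.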

The COLLATE System

The system features innovative models and techniques in several areas.

Offering content- and context-based access to the data repository is a crucial feature of COLLATE. To implement an information system with advanced retrieval functions, we must go beyond the current practice of merely providing digital reproductions of historic sources and simple online access to them. Instead, the results of current and previous scholarly work, such as the evaluation and indexing of these sources, must be incorporated into the information system, e.g., in the form of metadata and annotations.

COLLATE users are directly and indirectly involved in system development because they help to enrich the document repository with new sources, successive annotations and index entries. In particular, annotation, a typical task in the humanities, is central to the COLLATE concept. As a multifunctional means of in-depth analysis, annotation can be done individually but also collaboratively. Examples include annotating annotations, or jointly evaluating and comparing documents. As a result, a large amount of value-added information is provided in addition to the digitized documents.
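Annotation of annotations can be modelled as a simple tree. An annotation on a source document can itself receive annotations, and so on. The Python sketch below shows a minimal, hypothetical data structure for such threads. The class and field names (`Annotation`, `passage`, `replies`) and the sample content are assumptions for illustration, not COLLATE's data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """A hypothetical annotation: the root of a thread targets a document
    passage; entries in `replies` are annotations targeting this annotation."""
    author: str
    text: str
    document_id: Optional[str] = None   # set only when annotating a source document
    passage: Optional[str] = None       # hypothetical passage locator
    replies: List["Annotation"] = field(default_factory=list)

    def annotate(self, author: str, text: str) -> "Annotation":
        """Attach a new annotation to this annotation and return it."""
        reply = Annotation(author=author, text=text)
        self.replies.append(reply)
        return reply


def thread(annotation: Annotation, depth: int = 0) -> List[str]:
    """Flatten an annotation thread into indented lines, depth first."""
    lines = ["  " * depth + f"{annotation.author}: {annotation.text}"]
    for reply in annotation.replies:
        lines.extend(thread(reply, depth + 1))
    return lines


# Hypothetical sample thread on a document passage.
root = Annotation("expert1", "Was this scene cut in the Austrian version?",
                  document_id="censorship_card_A", passage="para-3")
answer = root.annotate("expert2", "Compare with the German censorship decision.")
answer.annotate("expert1", "See also the surviving film fragment.")
print("\n".join(thread(root)))
```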

Its support for collaborative work goes beyond contemporary groupware products by offering innovative collaboration functions.

The dynamic accumulation of value-added information through annotations requires scalable and extensible data structures. To capture these dynamics, we chose XML, a de facto standard, for encoding generic document and metadata representation schemata. Using XML ensures the generality of our approach. The schemata can be enriched and tailored to accommodate additional sources and knowledge without any need to re-model the whole system. In addition, XML is the basis for integrating knowledge processing methodology and retrieval functionality into the system. The system can therefore capture the dynamics of collaboration while retaining the flexibility of scalable and extensible representation schemata. These schemata can be transferred to other content domains as well.
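The extensibility argument can be illustrated with Python's standard `xml.etree.ElementTree` module. A component that reads only the elements it knows keeps working when a record is enriched with new elements. The element names (`document`, `archive`, `annotation`, `keyword`) and the record content are hypothetical and do not reproduce the COLLATE schemata.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record; element names are illustrative only.
RECORD_V1 = """
<document id="censorship_card_A">
  <title>Censorship decision</title>
  <archive>DIF</archive>
</document>
"""

# The same record after enrichment: new elements are added alongside the
# existing ones, which keep their names and structure.
RECORD_V2 = """
<document id="censorship_card_A">
  <title>Censorship decision</title>
  <archive>DIF</archive>
  <annotation author="expert1">Mentions a cut scene.</annotation>
  <keyword>censorship</keyword>
</document>
"""


def read_core_fields(xml_text):
    """Read only the fields this component knows; other elements are ignored."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "archive": root.findtext("archive"),
    }


def read_annotations(xml_text):
    """A newer component that also understands the added annotation elements."""
    root = ET.fromstring(xml_text.strip())
    return [(a.get("author"), a.text) for a in root.findall("annotation")]


print(read_core_fields(RECORD_V1) == read_core_fields(RECORD_V2))  # True
print(read_annotations(RECORD_V2))
```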

The COLLATE collaboratory is a multi-functional software package that integrates a wide variety of functions, realized by interrelated software modules. It comprises several databases and different document representation schemata. XML is used as the uniform internal representation language for the documents in the repository and their associated metadata. It is also used for the communication protocol among the system modules.
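The sketch below shows one way modules might exchange XML messages. It uses Python's standard `xml.etree.ElementTree` module and a simple dispatcher that routes each message by its `receiver` attribute. The message format (`message`, `param`, `sender`, `receiver`, `action`) and the `retrieval_module` stand-in are hypothetical assumptions. The article does not specify COLLATE's protocol.

```python
import xml.etree.ElementTree as ET

# Hypothetical message format for inter-module communication (assumption).


def make_message(sender, receiver, action, **params):
    """Serialize a message envelope with named parameters to an XML string."""
    msg = ET.Element("message", sender=sender, receiver=receiver, action=action)
    for name, value in params.items():
        ET.SubElement(msg, "param", name=name).text = str(value)
    return ET.tostring(msg, encoding="unicode")


def parse_message(xml_text):
    """Return (sender, receiver, action, params) from an XML message string."""
    msg = ET.fromstring(xml_text)
    params = {p.get("name"): p.text for p in msg.findall("param")}
    return msg.get("sender"), msg.get("receiver"), msg.get("action"), params


class Dispatcher:
    """Routes XML messages to registered module handlers by receiver name."""

    def __init__(self):
        self.handlers = {}

    def register(self, module_name, handler):
        self.handlers[module_name] = handler

    def send(self, xml_text):
        sender, receiver, action, params = parse_message(xml_text)
        handler = self.handlers[receiver]  # KeyError if no such module
        return handler(sender, action, params)


def retrieval_module(sender, action, params):
    """Hypothetical stand-in for a retrieval component; replies in XML."""
    if action != "search":
        raise ValueError(f"unsupported action: {action}")
    return make_message("retrieval", sender, "result",
                        query=params.get("query"), hits=0)


bus = Dispatcher()
bus.register("retrieval", retrieval_module)
reply = bus.send(make_message("interface", "retrieval", "search", query="censorship"))
print(parse_message(reply))
```

Because every message is plain XML, modules only need to agree on the message format, not on each other's internal representations.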

Figure 1: The COLLATE system is structured into several functional layers

Three document pre-processing modules are being developed and will be incorporated into the final version of the COLLATE system.

As shown in Figure 1, the COLLATE system is structured into several functional layers:

Operational Layer – The distributed digital data repository comprises a variety of data, ranging from scanned-in text documents to multimedia data and the accumulated annotations related to one or more of these original data sources.

Domain Metadata Layer – To organize the stored data in a way that supports the complex, knowledge-intensive tasks users perform, suitable tools for metadata management are being provided. The knowledge structures, which are represented by specific XML schemata, constitute the Domain Model. They comply with the TEI (Text Encoding Initiative) and CES (Corpus Encoding Standard) metadata standards. However, we needed to extend these standards to cope with the rich structure of our domain.

Collaborative Task Layer – The COLLATE system allows a wide variety of user types to access, work with and evaluate the digitized material. A generic task model has been developed for complex working tasks such as editing sources, identifying lost or cut film scenes, and preparing a virtual exhibition. Some of these tasks can be performed collaboratively, e.g., the joint inspection and interpretation of source material. A collaboration model is therefore being developed, which forms the basis for offering context-dependent interface functions for collaboration between users.

Interface Layer – To support users in accomplishing their tasks, COLLATE provides appropriate interfaces for convenient work with the digital documents. In future system versions, these interfaces can be semi-automatically derived from the underlying task model. Specialized interface components for annotation, mark-up, editing, search and retrieval facilitate user interaction. The specification of the interface structure also uses XML, allowing generic mapping to concrete instantiations (e.g., Java Swing).
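The following Python sketch makes the idea of deriving interfaces from a task model more concrete. It works in three steps:

1. It represents tasks as a tree.
2. It derives a list of abstract interface components from the current task, adding collaboration components only when the task is collaborative.
3. It maps those components onto Java Swing class names.

The task structure, the component types and the derivation rule are hypothetical assumptions. The XML encoding of the interface specification is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sketch: task names echo the examples in the text, but the
# component types, the derivation rule and the data layout are assumptions.


@dataclass
class Task:
    name: str
    collaborative: bool = False
    subtasks: List["Task"] = field(default_factory=list)


# Components offered for every task, plus extra ones offered only when the
# task context is collaborative (context-dependent collaboration functions).
BASE_COMPONENTS = ["document-viewer", "annotation-editor", "search-form"]
COLLABORATION_COMPONENTS = ["annotation-thread", "comparison-view"]

# One possible mapping of abstract components onto javax.swing classes.
SWING_MAPPING = {
    "document-viewer": "JEditorPane",
    "annotation-editor": "JTextArea",
    "search-form": "JTextField",
    "annotation-thread": "JTree",
    "comparison-view": "JSplitPane",
}


def find_task(task: Task, name: str) -> Optional[Task]:
    """Depth-first search for a (sub)task by name; None if absent."""
    if task.name == name:
        return task
    for sub in task.subtasks:
        found = find_task(sub, name)
        if found is not None:
            return found
    return None


def derive_interface(task: Task) -> List[str]:
    """Derive the abstract interface components for a task."""
    components = list(BASE_COMPONENTS)  # copy, so the shared list is untouched
    if task.collaborative:
        components += COLLABORATION_COMPONENTS
    return components


def instantiate(components, mapping):
    """Map abstract components to concrete widget class names."""
    missing = [c for c in components if c not in mapping]
    if missing:
        raise ValueError(f"no concrete widget for: {missing}")
    return [(c, mapping[c]) for c in components]


lost_scenes = Task("identify lost or cut film scenes", subtasks=[
    Task("inspect censorship documents"),
    Task("interpret source material", collaborative=True),
])

for name in ("inspect censorship documents", "interpret source material"):
    task = find_task(lost_scenes, name)
    print(name, "->", instantiate(derive_interface(task), SWING_MAPPING))
```

The abstract components are kept separate from the toolkit mapping. Targeting a different toolkit would therefore only require a different mapping table.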

As indicated above, communication between these layers is realized through XML-based communication protocols. The implementation employs XML transformations (XSLT) as a basis for the communication infrastructure in the COLLATE system.

Conclusion

The COLLATE system represents a new type of collaboratory that supports work with cultural document sources through innovative task-based interfaces for in-depth content indexing and annotation. In this way, access to and retrieval of content from traditionally scattered and electronically unavailable sources are significantly improved. Furthermore, advanced features of the system, such as digital watermarking, semi-automatic document segmentation and picture analysis, increase its usability.

References

  1. COLLATE project Web site
    URL: <http://www.collate.de>
  2. Kouzes, Richard T.; Myers, James D. & Wulf, William A. (1996) Collaboratories: Doing Science on the Internet, IEEE Computer, 29 (8), August 1996.
    URL: <http://www.emsl.pnl.gov:2080/docs/collab/presentations/papers/IEEECollaboratories.html>
  3. Brocks, Holger; Thiel, Ulrich; Stein, Adelheit & Dirsch-Weigand, Andrea (2001) Customizable Retrieval Functions Based on User Tasks in the Cultural Heritage Domain, in Proceedings of the 5th European Conference on Research and Advanced Technology for Digital Libraries (ECDL '01), September 4-9, 2001, Darmstadt, Germany, Berlin: Springer, 2001, 37-48.
  4. Keiper, Jürgen; Brocks, Holger; Dirsch-Weigand, Andrea; Stein, Adelheit & Thiel, Ulrich (2001) COLLATE – A Web-Based Collaboratory for Content-Based Access to and Work with Digitized Cultural Material, in Proceedings of the International Cultural Heritage Informatics Meeting (ICHIM '01), ed. Bearman, D. & Garzotto, F., Milano: Politecnico di Milano, 2001, 495-511.
  5. Stein, Adelheit; Gulla, Jon Atle; Müller, Adrian & Thiel, Ulrich (1998) Abductive dialogue planning for concept-based multimedia information retrieval, in Integrated Publication and Information Systems. 10 Years of Research and Development, ed. Fankhauser, P. and Ockenfeld, M., Sankt Augustin: GMD – Forschungszentrum Informationstechnik, 1998, 129-148.

Author Details

Dr. Adelheit Stein
Project manager
Fraunhofer IPSI (Institute for Integrated Publication and Information Systems)
Dolivostrasse 15
D-64293 Darmstadt
Germany

Phone: +49 6151 869-841

stein@ipsi.fraunhofer.de
<http://www.ipsi.fhg.de/~stein>

Dr. Adelheit Stein is the head coordinator of the COLLATE project. She has been a senior researcher at Fraunhofer-IPSI (formerly GMD-IPSI) for several years. Her background is in Sociology and Philosophy, with a special focus on cognition and social interaction. At IPSI she has been involved in several European IT projects and in university teaching. Her current research interests include human-computer interaction, collaboration support, dialogue planning and intelligent user interfaces.

Dr. Ulrich Thiel
Project manager
Fraunhofer IPSI (Institute for Integrated Publication and Information Systems)
Dolivostrasse 15
D-64293 Darmstadt
Germany

Phone: +49 6151 869-855

thiel@ipsi.fraunhofer.de
<http://www.ipsi.fhg.de/~thiel>

Dr. Ulrich Thiel is responsible for the technical coordination of the COLLATE system. He holds a diploma in Computer Science and a PhD in Information Science. He has been a senior researcher at Fraunhofer-IPSI (formerly GMD-IPSI) for several years and has managed many EU-funded projects. His research interests include intelligent information retrieval, dialogue planning, and conversational and adaptive information systems.

Jürgen Keiper
Project manager
Deutsches Filminstitut – DIF
Schaumainkai 41
60596 Frankfurt am Main
Germany

Phone: +49 69 96 12 20 0

keiper@deutsches-filminstitut.de
<http://www.deutsches-filminstitut.de>

Jürgen Keiper is the scientific coordinator of the content providers and film archives in the COLLATE project. He holds a master's degree from the University of Frankfurt, and his background is in Theater, Media and Film Sciences. He has worked for the DIF (German Film Institute) on various research projects for several years. His research interests include film theory and criticism and the social history of film. He is editor of the film journal Film und Kritik.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

For citation purposes:
Stein, A., Thiel, U. and Keiper, J. "Going Beyond Traditional Digital Libraries for Cultural Heritage: The COLLATE Collaboratory", Cultivate Interactive, issue 6, 11 February 2002
URL: <http://www.cultivate-int.org/issue6/collate/>