Cultivate Interactive Issue 6: Regular Articles

-------------------------------------------------------------

DIGICULT Column

By Ian Pigott - February 2002

This section aims to provide news of the European Commission's initiatives in the field of digital heritage and cultural content. Its objectives are to pinpoint the latest developments in programmes, projects and activities and to give a clear picture of progress in the area since the last issue. It certainly does not pretend to be a comprehensive account of what the EC is doing in the area but rather a short summary of some of the key items. The content is based largely on the information provided in the eCulture Newsletter, published by the European Commission, DG Information Society, Cultural Heritage Applications Unit [1].

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Since the last column in October, we have had an active time on all fronts. The IST programme has issued its last Call for Proposals under the present framework programme and we are all beginning to look at the 6th Framework Programme with a view to identifying key research themes for the future. During the last six months we have been working on the Lund Action Plan and on the Brussels Quality Framework. The Belgian Presidency has been very supportive of our activities and very forward looking in the Council Resolution on culture and the knowledge society. Contacts with the Spanish Presidency have been very positive and we hope to continue to develop with them the political vision for eCulture over the next six months.

The Lund Action Plan provides a framework for the implementation of the Lund Principles, a set of recommendations produced and agreed by representatives nominated by Member States to drive and stimulate digitisation of cultural and scientific content across Europe. Significant headway has already been made on plans for implementation: 1- wide diffusion and consensus building around the initiative, both at political and professional level, to be achieved in part by translation into all the official EU languages; 2- the National Representatives Group (NRG) has been established by all Member States as the coordinating group; 3- some specific working groups of experts nominated by Member States have been launched under the coordination of the NRG, with a view to building a common platform on key issues.

In regard to opportunities for projects, the 8th IST call [2] is open until 21 February 2002. While our sector has completed most of the focused work in the area of RTD for scientific and cultural heritage, there are openings for proposals in a number of more general areas: Action Line III.5.1 provides opportunities for disseminating the results from KAIII actions; Action Line III.5.2 looks ahead to RTD roadmaps and collaborative schemes for FP6 while Action Line III.5.3 aims at future paradigms for next-generation knowledge and interface technologies. The Cross Programme Activity V.1.15 CPA15 concerns technology platforms for cultural and arts creative expression.

We would like to draw special attention to Cultivate-list, an email discussion list for anyone in the cultural heritage sector interested in the Information Society Technologies (IST) Programme. If you need details of calls for proposals, if you are looking for partners, if you just want to find out about IST projects - this list is for you. The Archives of the Cultivate E-list can be accessed online [3]. Cultivate has also found a prominent place on the Information Society Discussion List in connection with the launching of the "Your Voice in Europe" initiative [4].

There has been considerable progress on the DigiCULT study. Indeed, after one year of research and extensive input from Cultural Heritage experts, the study on "Technological Landscapes for Tomorrow's Cultural Economy", DigiCULT, is in its final phase. The primary objective of this study is to provide archives, libraries, museums and other institutions with recommendations on how to adapt to the technological and organisational challenges of the digital environment. The main topics considered are national policies and initiatives, organisational change, exploitation and technology. Within a few weeks, you will find the executive summary and final report for downloading [5]. The DigiCULT study is commissioned by the Cultural Heritage Applications Unit of the DG Information Society and paid for by the European Commission. DigiCULT co-organised the session on Culture and Community-Building at the IST Conference 2001, held in Düsseldorf in December last year [6].

The DLM-Forum has announced an important conference on "@ccess and preservation of electronic information: Best practices and solutions". This third multidisciplinary European DLM-Forum on electronic records will take place at the Palacio de Congresos de Cataluña, in Barcelona, Spain, on 7 and 8 May 2002. The opening of the exhibitions and preconference activities will take place on 6 May 2002. The DLM-Forum 2002 welcomes specialists and executives representing different disciplines from public administrations, archives, ICT industries and research. A large number of participants from the EU Member States, regions, candidate states and other European countries are expected. The DLM-Forum 2002 will be organised by the Secretariat for the Information Society of the Catalan government together with other Catalan institutions and departments of the Spanish central government. Support is being provided by the European Union Presidencies of Sweden (1st half of 2001), Belgium (2nd half of 2001) and Spain (1st half of 2002), the European Commission (Secretariat General, DG Information Society) and representatives from the ICT industry. The aspects of short- and long-term preservation, transparency, access and openness of public information will play an important role at the DLM-Forum 2002. The forum aims to achieve concrete results in this area. Notably, it will examine the creation of a Europe-wide network of excellence on electronic archives in order to achieve an even wider cooperation in this area between Member States, regions and at Community level [7].

The DELOS Network of Excellence, funded under Digital Heritage and Cultural Content, in co-operation with the Digital Library Initiative (DLI) of the US National Science Foundation (NSF), is organising a two-day concertation workshop of all projects funded by the EC IST program and the US DLI program in the field of Digital Libraries (DL), also including invited representatives of other relevant initiatives in Europe. The "EC/NSF DL All Projects Workshop" will be held in Rome on 25 and 26 March 2002. For further information contact Tarina Ayazi [8].

Forthcoming Research Programme

Discussions are currently underway on the EU's future research programme (a 6th Framework Programme for Research and Development). The Cultural Heritage Applications unit of the European Commission's DG Information Society is contributing to the cultural part of this discussion. Relying on the presence of many key experts at the successive EVA 2001 conferences in Florence (March), Glasgow (July) and Berlin (November), responses and comments to some basic questions were collected and collated in the form of a "Florence Agenda" followed by the "Glasgow Response" and then, finally, the "Berlin Conclusions".

In line with a general focus on consolidating the development of tools and services for an inclusive approach to raising scientific and cultural awareness for education, quality of life and tourism while highlighting common and specific aspects of cultural identity, more detailed recommendations were made in a number of areas.

In regard to the key aspects of access, content and users, the primary concern is the quality of content, the challenge for the cultural sector being to create high quality and pertinent digital resources (including a stable infrastructure for delivery). This includes the creation of truly integrated digital archives (providing seamless/dynamic access to large volumes of cultural objects and documents of various types, often in the form of distributed resources). Specific attention should also be given to research investigating methods of improving user-friendly access to scientific and cultural content in accordance with a wide variety of personalised user requirements. The unanswered question of how, why and when users (individuals or communities) interface with digital collections requires detailed assessment, including consideration of emotional and intellectual access to content, especially for those who might otherwise be excluded (cultural poverty). Interaction with digital collections and the ability of individuals or communities to interact with and to add their own creativity also deserve consideration. Care should be taken to attract young people, both as users in their own right and as a vector of future trends. Finally, attention should be paid to access across languages, establishing a dialogue of cultures and building new online communities.

It is also considered important to develop a common European view on standards, promoting development and use of open standards, particularly in the context of widely applicable international solutions. Here, efforts should be made to promote a more active role for Europe. Another key issue for unlocking the full potential of scientific and cultural resources is the creation of a harmonised European approach to a legal system with adequate protection of IPR.

On the technical front, mobile services will be the next great challenge, requiring incorporation of 3G technologies for services meeting the needs of the cultural institutions and operators. Further enhancement of 3D and VR representations will provide a basis for enhancing representation of real objects. Another challenge will be to support preservation and long term availability of digitally born content. And cultural content will contribute to speeding up technological developments, making the vast resource of authentic materials in digital form available for enhanced and focused research into digital collections as an integral part of an intellectual infrastructure for Europe's future research area.

From an economic viewpoint, there is a need for institutional knowledge resources and skills to be improved and sustained. In addition, it will be important to work towards lowering the cost of ICT applications in order to encourage competitive growth. Co-operative efforts across the whole of Europe should include not only culture but also research and creativity and should target job creation as an important objective. Culture is increasingly considered a strategic element for business in the context of culturalising the economy. Institutions must start to consider themselves not just as cultural actors but should take on an economic role. The economic potential of broadband access to the internet should also be borne in mind.

The importance of the European dimension was also stressed. Only a truly co-operative and integrated approach would allow for a meaningful European digital cultural and scientific landscape. All European countries, not just the existing Member States, have key assets for cultural tourism calling for a co-ordinated approach across the continent.

Finally, the key stakeholders - whether from universities, research institutes, cultural institutions or large and small enterprises - should be seen as participants in the creation and evolution of pan-European laboratories of knowledge.

Project Developments

Turning to news of our projects and related developments, Europe's public libraries and cultural organisations have a vital role to play in the development of an e-Europe. The PULMAN Network of Excellence (Public Libraries Mobilising Advanced Network) now includes representatives of 26 European countries. The PULMAN approach is inclusive and participation will be extended, in the first instance by the establishment of wider groups of activists in each country. Activities and plans of the PULMAN Network are presented on the web, including guidelines manuals, a policy conference, an international cooperation agenda and training workshops [9]. PULMAN Express, the network's newsletter, is published at regular intervals throughout the year and is free. To subscribe, visit the PULMAN web server and fill in the on-line form [10]. The first issue is now available [11].

The SCHEMAS project recently hosted a workshop to review ways in which projects can share information about metadata use, and to consider how collaboration can be rendered more effective. Participants were updated on recent developments regarding use of metadata schemas and application profiles. The role of registries in providing access to information about schemas was reviewed, and the SCHEMAS registry was demonstrated. Presentations from the day are available on the SCHEMAS site [12].

RENARDUS is a collaborative project that aims to improve academic users' access to a range of existing Internet-based information services across Europe. The latest project developments are available in the news digest [13].

The DIFFUSE project has developed an excellent and potentially comprehensive reference to projects dealing with standards issues, including short but effective project descriptions. All projects should visit the site and make sure that the information presented is correct. Projects are able to register or submit updates themselves [14].

The TRIS Accompanying Measure is to provide services aimed at increasing the coordination, impact and dissemination of the 25 take-up trials selected under the 4th IST Call for Proposals and funded under Digital Heritage and Cultural Content. In particular, it will provide projects with assistance on clustering, maximising impact, exchanging experiences and success stories, and supporting and facilitating the execution of IST TRIAL actions by encouraging standardisation, synergy, technology transfer and exploitation [15].

The MULTIMOD project started in October 2001 and will last three years. It will focus on improving the human-machine interface with particular reference to biomedical applications. The target of the demonstrator applications will be the musculo-skeletal apparatus [16].

References

  1. eCulture Newsletter
    URL: <http://www.cordis.lu/ist/ka3/digicult/en/newsletter.html>
  2. 8th IST Call
    URL: <http://www.cordis.lu/ist/calls/200104.htm>
  3. The Archives of the Cultivate E-list
    URL: <http://lists.ukoln.ac.uk/cultivate-list/>
  4. Your Voice in Europe initiative
    URL: <http://europa.eu.int/information_society/services/discussion/index_en.htm>
  5. DIGICULT Study
    URL: <http://www.salzburgresearch.at/fbi/digicult/>
  6. Culture and Community-Building, IST Conference 2001
    URL: <http://2001.istevent.cec.eu.int/december_3-5/session.asp?id=42>
  7. DLM Forum
    URL: <http://europa.eu.int/historical_archives/dlm_forum/index_en.htm>
  8. Tarina Ayazi, email address tarina@iei.pi.cnr.it
  9. PULMAN Web
    URL: <http://www.pulmanweb.org/about/about.htm>
  10. PULMAN newsletter registration form
    URL: <http://www.pulmanweb.org/news/register.asp>
  11. PULMAN newsletter
    URL: <http://www.pulmanweb.org/pulmanexpress/October2001.pdf>
  12. Schemas 4th Workshop presentations
    URL: <http://www.schemas-forum.org/workshops/ws4/programme.html>
  13. Renardus News Digest
    URL: <http://www.renardus.org/news/digest10.html>
  14. DIFFUSE
    URL: <http://www.diffuse.org/projects.html>
  15. TRIS
    URL: <http://www.trisweb.org/home.php>
  16. MULTIMOD
    URL: <http://www.ior.it/multimod/>

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Author Details

Ian Pigott
Project Officer

Ian.Pigott@cec.eu.int

With the assistance of the Cultural Applications team <http://www.cordis.lu/ist/ka3/digicult/en/our_team.html> in Luxembourg.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

For citation purposes:
Pigott, I. "DIGICULT Column", Cultivate Interactive, issue 6, 11 February 2002
URL: <http://www.cultivate-int.org/issue6/digicult/>

-------------------------------------------------------------

The TRIAL projects and their accompanying measure TRIS

By Monika Segbert - February 2002

Monika Segbert introduces the TRIAL projects and enlightens us on ACTIVATEd BEASTS that KIST MATAHARI IN VALHALLA. Interested? Well, read on.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Background

Under Action Line III.1.5 (Trials on new access modes to cultural and scientific content) the Cultural Applications unit of DG Information Society launched as part of its 4th call in 2000 a programme for ‘Take-Up Actions’. The intention was to launch trial actions across Europe, which will encourage take-up of results and stimulate the implementation of innovative products and services in the cultural heritage sector. The focus of the Action Line was on exploring and experimenting with novel ways of creating, manipulating, managing and presenting new classes of intelligent, dynamically adaptive and self-aware digital cultural objects, either held by memory institutions (archives, libraries, museums, etc) or directly involving digitally born objects or art forms. The proposals were to be user-centred and include:

The workplan recommended that focus should be given to the sustainable development of valuable digital repositories in Europe’s libraries, museums and archives, on models for future virtual collections and on guidelines for integrating real and virtual objects and collections. Proposers were asked to provide examples of how dynamic user interaction with the cultural and scientific content can enhance the user experience, addressing the experiences of learning, exploring and entertaining for the user.

As a result of this call 25 new TRIAL projects were chosen. The average duration of the projects is about 12 months. They address a wide range of user communities (specialist but also general, such as children or tourists) and cultural heritage themes. This demonstrates that this is a very worthwhile initiative that has the potential to create a European-wide momentum for innovation not only in the larger cultural institutions, as most of the projects are driven by cultural institutions run by local authorities or by SMEs with local interests. The projects will be interesting to many potential users, as in many cases they will be improving access to cultural assets in museums, archives and libraries, through their use of innovative technologies such as mobiles, digitisation techniques and Internet support.

The TRIAL projects

Playing with the EC acronyms led to the opening line of this article: ACTIVATE will show interesting innovative solutions for accessing cultural resources. It "will also build a virtual reality model of a historic landscape, to provide a new way of accessing existing rich stores of cultural content concerning the landscape". BEASTS will benefit the tourism and leisure industry in Wales: go to the website and read that “eight of the ten SMEs who will be taking part in the trialing have been identified - and represent a wide range of tourism interests. Shops, farms, trekking centres and a riding for those with learning disabilities initiative are included”. KIST will use 3D, audio, animation and video for a digital exploration of the collections of the National Museum of Scotland, whereas MATAHARI will use portable information devices for access to information gleaned from libraries, archives and museums about outdoor objects in order to enhance the visitor’s experience. Finally, VALHALLA is going to provide a resource displaying video & explanation of historic gardens and parks.

The TRIALS meet head on the challenge of cultural institutions taking on ICT, not in an RTD setting, but by experimenting with technology and in partnership between technology providers and (in many cases) smaller cultural institutions – local archives, museums, libraries. These institutions face the challenge of responding to organisational change, of integrating new skills and competences, of meeting the digital challenge. Their motivation is to make cultural content more visible and accessible, to offer their users new ways of interaction and experiences. Those that will benefit from the results of the trials include tourists, teachers and SMEs, historians and scholars, botanists, scientists, the interested citizen, our everyday European life. Particular attention is given to the inclusion and involvement of young people through games and in the creation of content.

Prime examples of involving children are CHOSA, which is developing an interactive web game and a WAP tour, both fun and educational, for access to and awareness of an archaeological site; TREBIS - Trial and Evaluation of a Biodiversity Information System for public use in a natural history museum. This project trials a natural history museum approach to the use of multimedia techniques to enhance awareness of biodiversity, endangered species and ecology. The user community – school children – will be given access to the database and to digital maps. It is a good example of a young Austrian software firm, Biogis Consulting, partnering with a museum and an educational institution (Institute for Didactics of Biology at the University of Munich and Natural History Museum of Vorarlberg); TPHS is trialling an innovative approach to promoting information on architecture and heritage in that it focuses on information on buildings and related objects which children consider to be of particular interest - it directly deals with Cultura as seen and appreciated by children, in an engaging and playful way.

Another project aims to bring the sources of history to citizens and tourists - ARCHIVIEW plans to open the resources of historical city archives to broader audiences by integrating easy tools for the management of collections with solutions for creating "narratives" around sources and publishing the results on the Web. This will make a wealth of first-hand information available to those interested in getting to know and feel the past of the towns they live in or are visiting.

A few projects are studying new and attractive means of linking to town history and historical collections through the resources of virtual reality. VRCHIP, VIRTUAL and HITITE bring slightly different approaches to a common core idea, i.e. showcasing flagship monuments and resources in an interactive and fascinating way to make them an access point to the heritage of local identity.

VALHALLA brings an interesting outlook on historical gardens, letting users visit and experience them in rich detail, including the garden architecture and its relationship with the surrounding buildings. This is an unusual example of how "minor" but extremely relevant cultural resources of towns can be better known and exploited.

And, as a prime example of the rich diversity of the TRIAL projects, there are the two projects starting with an e-: e-Islam and e-Stage. The former aims to promote the Islamic collection of the Benaki Museum in Greece by creating digital surrogates of the exhibited items (for example items currently displayed in an exhibition of Glass of the Sultans), while the latter aims to set up a web-based resource on European puppetry.

Time and space are too short to discuss all fascinating aspects of all the other trials not mentioned above in this article (but you will hear more about them elsewhere):

BOOKS2U!: This proposal with an Austrian partnership trials a new approach to inter-library loans which intends to have far reaching impact on necessary improvements across Europe.

CTIC: The partnership, which consists of several UK museums and art galleries, is trialling an online interface enabling users to access cultural content displayed in their collections.

DOMINICO: The project features an Austro-Slovenian partnership to trial innovative technologies for networking smaller museums and exhibition designers as a basis for enhancing a series of exhibitions.

EULER-TAKEUP: Based on the pilot developed through the EULER RTD project, the trial is setting up and evaluating a European digital library for mathematics. The consortium joins partners from Germany, the Netherlands and Italy and addresses the needs of a clearly identified user community.

HYPERGUIDE: The trial builds an XML tool based on a web description methodology, for access to high-value web-based resources in order to enhance selection, filtering and usability of information resources in specific domains.

LAB-VR: The proposal is aiming to improve 3D photographic Internet access to research laboratories and their research activities. Users will be able to view the operational research environment as an interface for gaining further information from the Web.

POUCE: The trial seeks to validate a model for a common access portal to a group of French museums on the Web. The XML-based approach and the exploitation plans are targeted to a sustainable level of service.

SANDALYA: An Italian partnership which will trial the results of previous research dealing with the digitisation of manuscripts. The project has a direct focus on sustainability through both its technical background and active exploitation policies.

SEAX-DAMAS: The project trials a wide range of aspects of archive management in a regional record office relying fully on international standards.

UHI-NMS: The Scottish consortium will trial an approach designed to add value to National Museums of Scotland’s digital content for the National Grid for Learning. Special attention will be given to appropriate pedagogical approaches.

VIRMUS: This project from Latvia proposes to experiment with the use of market-ready 3DML tools in order to enable first-time users to create 3D pages in cultural heritage buildings on the Web. The project aims at a catalytic effect in expanding virtual reality in the museum sector.

More information about each project, and links to individual websites can be found on the TRIS Web site [1].

Accompanying the TRIALS: TRIS

The TRIS Accompanying Measure will cooperate with all TRIAL projects to strengthen and enhance the effectiveness and the benefits of individual actions and projects beyond their own perimeter. The co-ordination, grouping and dissemination activities of TRIS will help the projects to reach critical mass and substantial cultural, scientific and commercial impact. In particular, TRIS will:

TRIS also plans to actively foster the participation of relevant interest groups that may not otherwise be present in IST. This relates in particular to the participation of non-EU countries, mostly within the PHARE, TACIS and MEDA areas. Despite the availability of significant resources and the active promotion policies undertaken by the Commission, these areas have experienced difficulties in their involvement in European RTD activities. It is one of the working hypotheses of TRIS that the TRIALS format, because of its lightweight footprint and its direct concern with results and technology transfer, may represent an optimal vehicle for the involvement of these countries and potentially a bridgehead for a low-risk inclusion of these areas into the 6th FP.

Within the EU, TRIS aims at providing a contribution to the core topic of transferring the results of RTD into the mainstream of territorial activities funded under the Structural Funds in Objective Areas (over EUR 120,000 million targeted at sustainable territory development and at fostering employment). The exploitation of cultural heritage and the interrelationships of culture and tourism are considered to be key to developing a sustainable culture economy that may represent, at least in a large number of areas, one of the most powerful engines for local growth and for the promotion of local identities.

A major role in this process will be played by the capability of European research to shape and to qualify structural activities, by providing models, standards, plans and best practices that can be quickly and effectively deployed throughout the Objective areas. TRIS will provide hints, contacts and active support to promote and to follow-up this process, helping consortia and local policy makers leverage the convergence between the results of the projects and the operational programmes of selected regions.

TRIS is currently planning the first major TRIALS event during the EVA Florence conference (25-29.3.2002). For details about this event, and for further information and news about the TRIALS and TRIS, visit the Web site [1].

References

  1. TRIS
    URL: <http://www.trisweb.org>

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Monika Segbert MBE FLA(hon) Dipl.Bibl.
Project Management and Consultancy
Via Fondiglie 5-7
60030 Rosora (An)
Italy

tris@monikasegbert.com
<http://www.monikasegbert.com>

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

For citation purposes:
Segbert, M. "The TRIAL projects and their accompanying measure TRIS", Cultivate Interactive, issue 6, 11 February 2002
URL: <http://www.cultivate-int.org/issue6/tris/>

-------------------------------------------------------------

At the Event:

Cultivate-Russia Kick off Meeting, 14th - 16th January 2002

By Marieke Napier - February 2002

Marieke Napier reports on the Cultivate-Russia Kick off Meeting, held in Moscow in January.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

In January I traveled to Moscow to participate in the Cultivate-Russia [1] kick off meetings. Getting Cultivate Russia off the ground had taken over 18 months of hard work by Monika Segbert and David Fuegi and I think the enthusiasm of everyone present was evident.

All the meetings and workshops were held in the Darwin Museum in central Moscow. Employees of the Russian Cultural Heritage Network (RCHN) are based at the museum and it was a great venue for the launch. We actually got a tour round the place on the Tuesday and I can honestly say I've never seen so many stuffed animals in such a small place!

The main participants at the meeting were the British Council team from Moscow (the British Council is the principal contractor and will be responsible for the administrative/financial co-ordination function), representatives from the five Russian partners, the European technical partners and members of the organising group.

Monday was just a general kick off meeting mainly for the partners and some representatives of regional museums, libraries and archives. Links with Cultivate-EU and Cultivate-CEE were also discussed. Having Russia participate in an EC funded IST project is a new experience and during the 3 days it became clear that there are going to be lots of financial, cultural and political differences to the usual set up, which will hopefully make Cultivate-Russia a really interesting project.

On Tuesday Walter Koch of CSC-Cultural Service Centre Austria talked about the obligations of a National Node in this type of networking project. He also demonstrated his Document server. In the afternoon Jorunn Hesjedal and Sigrid Tollefsen from RBT in Norway talked about the policy document they have been working on and gave some pointers to useful EC/IST Web sites. There was also a presentation from Kirill Nasedkin of RCHN on the www.museum.ru [2] Web site. Two articles have previously been published in Cultivate Interactive on RCHN but their Web site has been substantially updated recently and is worth revisiting [3]. The RCHN attempts to bring together information professionals and promote culture by the use of new technologies. The Web site gives information on over 3000 Russian museums, provides a database of 6000 museum professionals and hosts an impressive, highly active discussion group. All of us European 'experts' were very impressed; I know we would be very lucky to have such a resource in the UK. Throughout the day all presentations were translated into Russian or English by an excellent translator who was not only fluent in both languages, but also had a fantastic memory and did a great job on some very long speeches.

On Wednesday I started the morning with a presentation and demonstration of Cultivate Interactive. We (the technical partners from the Cultivate-CEE and Cultivate-EU projects) had been asked over to Russia to explain and demonstrate the work we have carried out so far because Cultivate-Russia is intending to do its own technical work. This means that they will be creating a Russian version of Cultivate Interactive in Cyrillic, which will probably be managed by Olga Puchnina (RCHN). We will be providing more information on the Russian magazine in future issues of Cultivate Interactive. There were lots of questions on all aspects of the Web magazine work and there are definite plans to translate some of our articles into Russian. In the afternoon Martin Belcher and Paul Smith from ILRT, based at the University of Bristol, showed the main Cultivate site.

At the end of the 3 days the Russians gave us all a hearty send off, made us all promise to come back and surprised us with presents of Cognac and wine.

Dining at O'Pirosmani Georgian Restaurant

Moscow was beautiful. I got a chance to walk all round Red Square and managed to see Swan Lake at the Bolshoi Theatre. The food was very good, and I particularly enjoyed the Georgian meal at O'Pirosmani, a Georgian restaurant opposite the Novodievitchi monastery, where we even had our own live musical accompaniment. The weather was a lot warmer than we'd all expected, at around 0 degrees, and Aeroflot got me there and back safely!

I would just like to thank all the members of Cultivate-Russia for making all of our trips to Moscow so enjoyable. Good luck with the project!

References

  1. Russia Joins the Cultivate Family, David Fuegi, Cultivate Interactive, issue 6, 11 February 2002
    URL: <http://www.cultivate-int.org/issue6/russia/>
  2. Russian Museums Online
    URL: <http://www.museum.ru/>
  3. Cultural Heritage Networking in Russia: Permanently Upcoming Perspectives by Dmitriy Luchkin, Cultivate Interactive, issue 2, 16 October 2000
    URL: <http://www.cultivate-int.org/issue2/russian/>
    Developing Russian Museums Online by Dmitriy Luchkin, Cultivate Interactive, issue 3, 29 January 2001
    URL: <http://www.cultivate-int.org/issue3/russian/>

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Author Details

Marieke Napier
Information Officer
UKOLN
University of Bath
Bath
England
BA2 7AY

m.napier@ukoln.ac.uk
<http://www.ukoln.ac.uk>

Marieke Napier is editor of Cultivate Interactive Web magazine.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

For citation purposes:
Napier, M. "Cultivate-Russia Kick off Meeting, 14th - 16th January 2002", Cultivate Interactive, issue 6, 11 February 2002
URL: <http://www.cultivate-int.org/issue6/moscow/>

-------------------------------------------------------------

Praxis

-------------------------------------------------------------

Content-Based Multimedia Information Handling: Should we Stick to Metadata?

By Paul Lewis, David Dupplaw and Kirk Martinez - February 2002

Paul Lewis, David Dupplaw and Kirk Martinez discuss retrieval and navigation as ways of accessing multimedia information and the use of content as an aid to these activities. They ask whether content-based techniques are really making a useful contribution or whether we should restrict ourselves to the use of metadata.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Introduction

First some definitions. Multimedia information is digital information which may be visual data (images or video, for example) or sound data (music or speech), and which may now include 3D visualisations or mixed reality experiences. Finally, it will almost certainly include the medium with which we are most familiar, that is, text.

In this article we use the word “document” to refer to a multimedia object. It may be a collection of text, an article or a book; it may be an image, a video or a frame of a video; it may be a mixture of these. In fact it may be any type of basic digital information object.

The issues we are going to discuss apply to text as much as to images or other media and it will be useful to establish the ideas by talking first about text on its own. Let us begin by distinguishing between retrieval and navigation. Retrieval is the business of extracting a document from a collection in order to satisfy some query. The query may take a variety of forms. For example, we may require documents by a particular author, or about a particular subject. This sort of retrieval has traditionally been achieved by using indexed metadata that is stored with the document. Key terms in the metadata may give a controlled vocabulary to aid the retrieval.

Content-based retrieval of text is retrieval that uses the text of the document rather than any added metadata. Free text searching is a good example of content-based text retrieval. The words making up the content of the document are indexed and used as the basis for retrieval, sometimes in conjunction with quite sophisticated “intelligent” software used to satisfy the query. Search engines like Google and AltaVista offer content-based text retrieval on the Web.
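As a rough illustration of free text searching, the sketch below indexes the words of each document and answers a query by intersecting the sets of documents that contain each query word. The three-document collection and the function names are invented for this example, and real search engines add ranking and much else on top of this basic idea.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Map each word to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical collection, invented for this example.
documents = {
    "doc1": "Colour histograms are widely used for image retrieval.",
    "doc2": "Free text searching indexes the words of a document.",
    "doc3": "Image retrieval can also rely on metadata.",
}
index = build_index(documents)
print(search(index, "image retrieval"))  # ['doc1', 'doc3']
```

No metadata is consulted here: the match is made purely on the words that make up the content of each document.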

By contrast, navigation is the process of moving from one document in the information collection to another because there is some useful association between them, and this is typically achieved by following pre-authored links. On the Web this is achieved by clicking on a highlighted source anchor of a link in one document in order to navigate to the destination document to which it points. Sometimes the distinction between navigation and retrieval is unclear. For example, following links that are stored in a bookmark file under a particular subject heading could be regarded in one sense as indexed retrieval and in another sense as link-based navigation. This is also true when using a search engine to retrieve documents on a particular subject. The documents are presented initially as links to be followed. In both these examples we will regard the process as retrieval rather than navigation, as the aim is to retrieve rather than to follow an association between documents.

On the Web, navigation is mainly based on fixed links that are embedded in the documents themselves. However, it is possible for hypermedia navigation to be content-based. By this we mean that the links offered are determined at link following time and selected on the basis of the content of the chosen source anchor. Link authoring for content-based navigation involves making an association between some chosen source anchor and the address of a destination document. The link information may be stored in a separate location from the document, typically a linkbase holding source anchors and destination addresses. With this content-based approach to navigation, multiple links may be made available for a given source anchor, previously authored links may be added to new documents on the fly with minimal effort and different viewers may see different link sets depending on the linkbases which are active at the time [1].
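A minimal sketch of the linkbase idea follows, assuming a simple exact (case-insensitive) match between the selected text and the stored source anchors. The class, the linkbase names and the example.org addresses are all hypothetical; this is not the open link service described in [1], only an illustration of links being resolved at link following time from whichever linkbases are active.

```python
from collections import defaultdict


class Linkbase:
    """Holds links apart from the documents: source anchor text -> destinations."""

    def __init__(self, name):
        self.name = name
        self._links = defaultdict(list)

    def add_link(self, anchor_text, destination):
        """Author a link from a source anchor to a destination address."""
        self._links[anchor_text.lower()].append(destination)

    def lookup(self, selection):
        """Return the destinations authored on this selection, if any."""
        return list(self._links.get(selection.lower(), []))


def follow(selection, active_linkbases):
    """Resolve links for a selection from the linkbases active right now."""
    destinations = []
    for linkbase in active_linkbases:
        for destination in linkbase.lookup(selection):
            if destination not in destinations:
                destinations.append(destination)
    return destinations


# Hypothetical linkbases and addresses, invented for illustration.
general = Linkbase("general")
general.add_link("colour histogram", "http://example.org/glossary#histogram")
curators = Linkbase("curators")
curators.add_link("colour histogram", "http://example.org/conservation/pigments")

# Different viewers see different link sets for the same selected text.
print(follow("Colour Histogram", [general]))
print(follow("Colour Histogram", [general, curators]))
```

Because the links live outside the documents, the same authored link becomes available in any new document that happens to contain the anchor text.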

In both content-based retrieval and content-based navigation for text, the process depends on matching content. In the case of retrieval, the textual content of the query is matched with text forming the content of the document, typically indexed in some way to accelerate the retrieval process. In content-based navigation, the query (which is typically a portion of text selected from the content of the document) is matched with the text making up the source anchors of links in the linkbase.

For text, these processes of content-based retrieval and navigation are sufficiently well established and widely used for us to conclude with some conviction that content-based retrieval and navigation are worthwhile and effective approaches for text information handling. Of course metadata-based searches with text are also widely used and the two approaches can complement each other well. The content matching on which text content-based processes depend is in many cases a straightforward exact match between words, although statistical matches between word sets, term switching or query expansion via thesauri, word stemming and other textual tricks can greatly enhance the processes to provide more powerful retrieval and navigation facilities.
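To make the effect of stemming and thesaurus-based query expansion concrete, here is a small sketch. The suffix stripper is deliberately crude and only stands in for a real stemming algorithm; the one-entry thesaurus is an assumption made for the example.

```python
def stem(word):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ings", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word


def expand(query, thesaurus):
    """Return the stemmed query terms plus stemmed thesaurus synonyms."""
    terms = set()
    for word in query.lower().split():
        stemmed = stem(word)
        terms.add(stemmed)
        for synonym in thesaurus.get(stemmed, ()):
            terms.add(stem(synonym))
    return terms


def matches(query, document, thesaurus):
    """True if any expanded query term occurs among the stemmed document words."""
    document_terms = {stem(word) for word in document.lower().split()}
    return bool(expand(query, thesaurus) & document_terms)


# Hypothetical thesaurus keyed by stemmed term.
thesaurus = {"vase": ["amphora"]}

print(matches("vases", "two amphoras from athens", {}))         # False
print(matches("vases", "two amphoras from athens", thesaurus))  # True
print(matches("painting", "early paintings of venice", {}))     # True
```

An exact word match fails in all three cases; stemming rescues the third, and the thesaurus rescues the second.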

Now let us turn our attention to content-based retrieval and navigation with non-text media. We will use images as our example, although many of the comments will apply equally to other non-text media. Can we say with the same conviction as we did for text that content-based image retrieval and navigation are worthwhile and effective approaches for image information handling? Well, in short, the answer is “No, certainly not with the same conviction”. But there are circumstances where content-based retrieval and content-based navigation may be worthwhile, particularly in conjunction with metadata-based techniques. And in the longer term, as research into media processing offers up more powerful approaches, the value of content-based techniques should increase.

In the following sections we look more closely at content-based image retrieval and navigation techniques, examine why they are currently less powerful than for text and examine specific efforts to make them more effective.

Content-based Image Retrieval

The basic reason why image retrieval is more difficult than text retrieval is that most images are represented digitally as a collection of pixels. The only information which is explicit in such a representation is the colour value at each pixel. When we look at images we, as humans, interpret them automatically: we see meaningful regions of colour, recognise objects and identify scenes, all of which could usefully form the basis of effective image matching. In doing so, however, we are performing substantial and sophisticated information processing which relies on a large volume of prior knowledge for its success. To achieve effective content-based image retrieval (CBIR), software systems must achieve some of this extraction and interpretation in order to find something meaningful to form the basis of the content matching. By contrast, in text documents, the words themselves are explicit in the digital document and it is these that form the basis of the matching process. Hence for text retrieval in its basic form, little additional processing is required.

There have been some excellent recent reviews of content-based image retrieval [2], [3] and the reader is encouraged to look at these for further details. Querying in CBIR can take many forms but the most common is probably the query by example paradigm where the user provides a query image and asks for images from the collection that are similar to it in some way. An alternative might be to ask explicitly for images containing some particular object using a text interface to provide a description of the required image. Such an approach requires that the CBIR system can perform object recognition or scene analysis in order to find the required image and at present this is only possible in specific highly constrained application domains.

General approaches to CBIR attempt to find representations of the image which make more information explicit than simply the pixel colour values. Unsurprisingly, many of the approaches have been based on colour. The colour histogram [4] has been a simple and popular representation which captures the relative amounts of each colour in an image. But it is a global measure and does not give information about colour variations at local positions in the image. Nevertheless it provides a useful measure of some aspects of similarity between images and has been widely used in CBIR systems.
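The following sketch shows one way a colour histogram might be computed and compared. It assumes pixels are (r, g, b) tuples with values from 0 to 255, quantises each channel into a few bins, and compares histograms by histogram intersection (the sum of bin-wise minima), a measure commonly associated with the colour indexing work cited as [4]. The tiny example images are invented.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Normalised colour histogram of an iterable of (r, g, b) pixels."""
    counts = [0] * (bins_per_channel ** 3)
    total = 0
    for r, g, b in pixels:
        # Quantise each 0-255 channel value into one of bins_per_channel bins.
        rq = r * bins_per_channel // 256
        gq = g * bins_per_channel // 256
        bq = b * bins_per_channel // 256
        counts[(rq * bins_per_channel + gq) * bins_per_channel + bq] += 1
        total += 1
    return [count / total for count in counts] if total else counts


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the sum of bin-wise minima of two histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


RED, BLUE = (255, 0, 0), (0, 0, 255)

# Hypothetical images given as flat pixel lists.
image_a = [RED] * 6 + [BLUE] * 2
image_b = [BLUE] * 2 + [RED] * 6   # same colours as image_a, rearranged
image_c = [RED] * 2 + [BLUE] * 6

hist_a = colour_histogram(image_a)
print(histogram_intersection(hist_a, colour_histogram(image_b)))  # 1.0
print(histogram_intersection(hist_a, colour_histogram(image_c)))  # 0.5
```

The first comparison illustrates the global nature of the measure: rearranging the pixels of an image leaves its histogram, and therefore its similarity score, unchanged.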

To overcome the global nature of the colour histogram the image has sometimes been divided into patches and the colour histogram calculated for each patch. This allows images to be retrieved from a collection when the query image is only similar to a sub-section of an image in the collection. This is taken further when the images are decomposed into patches hierarchically at decreasing resolutions.
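A sketch of the patch idea, under the assumption that an image is a list of rows of (r, g, b) tuples and is divided into a regular grid of equally sized patches. The query is compared with each patch separately, so a query that resembles only one corner of an image can still score highly. The image and grid size are made up for the example, and the hierarchical decomposition mentioned above is not attempted.

```python
def colour_histogram(pixels, bins_per_channel=2):
    """Normalised colour histogram of an iterable of (r, g, b) pixels."""
    counts = [0] * (bins_per_channel ** 3)
    total = 0
    for r, g, b in pixels:
        rq = r * bins_per_channel // 256
        gq = g * bins_per_channel // 256
        bq = b * bins_per_channel // 256
        counts[(rq * bins_per_channel + gq) * bins_per_channel + bq] += 1
        total += 1
    return [count / total for count in counts] if total else counts


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def patches(image, grid):
    """Yield ((row, col), pixels) for each cell of a grid x grid division."""
    patch_h, patch_w = len(image) // grid, len(image[0]) // grid
    for row in range(grid):
        for col in range(grid):
            pixels = [image[y][x]
                      for y in range(row * patch_h, (row + 1) * patch_h)
                      for x in range(col * patch_w, (col + 1) * patch_w)]
            yield (row, col), pixels


def best_patch(query_pixels, image, grid=2):
    """Return (score, (row, col)) for the patch most similar to the query."""
    query_hist = colour_histogram(query_pixels)
    scored = [(histogram_intersection(query_hist, colour_histogram(pixels)), position)
              for position, pixels in patches(image, grid)]
    return max(scored)


RED, BLUE = (255, 0, 0), (0, 0, 255)

# Hypothetical 4 x 4 image: blue everywhere except a red bottom-right corner.
image = [
    [BLUE, BLUE, BLUE, BLUE],
    [BLUE, BLUE, BLUE, BLUE],
    [BLUE, BLUE, RED, RED],
    [BLUE, BLUE, RED, RED],
]
query = [RED] * 4

whole_image = [pixel for row in image for pixel in row]
print(histogram_intersection(colour_histogram(query), colour_histogram(whole_image)))  # 0.25
print(best_patch(query, image))  # (1.0, (1, 1))
```

Against the whole-image histogram the red query scores poorly, while the patch-level comparison finds the matching corner exactly.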

A representation which also tries to capture some local colour information is the colour coherence vector representation [5] which counts separately pixels which belong to large (coherent) regions of the same colour and those which do not. We have developed an approach to sub-image matching which uses a pyramid of colour coherence vectors and which can locate details of high resolution art images in large collections of such images [6]. An example of a sub-image query is shown in figure 1 and the resulting match with the located sub-image is shown in figure 2.

Figure 1: An example of a sub-image query
Figure 2: The resulting match with the located sub-image
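To illustrate the colour coherence vector idea, the sketch below labels connected regions of identical colour and, for each colour, counts pixels in regions at or above a size threshold (coherent) separately from the rest (incoherent). It assumes the image has already been quantised to a small set of colour labels, uses 4-connectivity for simplicity, and omits any smoothing step; it is not the pyramid-based sub-image matcher of [6]. The two example images are invented.

```python
from collections import deque


def colour_coherence_vector(image, threshold):
    """Return {colour: [coherent_pixels, incoherent_pixels]} for a 2D grid of labels."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    ccv = {}
    for y in range(height):
        for x in range(width):
            if seen[y][x]:
                continue
            colour = image[y][x]
            # Breadth-first flood fill to measure this connected region.
            size = 0
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                size += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and image[ny][nx] == colour):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            entry = ccv.setdefault(colour, [0, 0])
            entry[0 if size >= threshold else 1] += size
    return ccv


def ccv_distance(a, b):
    """Sum over colours of the differences in coherent and incoherent counts."""
    total = 0
    for colour in set(a) | set(b):
        coherent_a, incoherent_a = a.get(colour, [0, 0])
        coherent_b, incoherent_b = b.get(colour, [0, 0])
        total += abs(coherent_a - coherent_b) + abs(incoherent_a - incoherent_b)
    return total


# Two hypothetical images with identical colour histograms (8 R and 8 B pixels).
blocks = ["RRBB", "RRBB", "RRBB", "RRBB"]
checker = ["RBRB", "BRBR", "RBRB", "BRBR"]

ccv_blocks = colour_coherence_vector(blocks, threshold=4)
ccv_checker = colour_coherence_vector(checker, threshold=4)
print(ccv_blocks)    # {'R': [8, 0], 'B': [8, 0]}
print(ccv_checker)   # {'R': [0, 8], 'B': [0, 8]}
print(ccv_distance(ccv_blocks, ccv_checker))  # 32
```

A plain colour histogram cannot tell these two images apart, whereas the coherence vectors differ completely.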

A representation which captures information about colour boundaries within the image has been proposed by Matas et al. [7] in an approach they call the multi-modal neighbourhood signature representation. This approach has the added benefit that sub-images can be matched directly without the need for a pyramid decomposition.

Colour is not the only basis for representations in CBIR. Texture, which in image processing refers to a measure of repeating patterns in an image, has also provided a useful basis for representations. Again the representations tend to be global and only appear useful for some particular image types where repeating patterns are a central characteristic.

For the ultimate CBIR system what we need is perfect image understanding software. We need to bridge the so-called semantic gap, even to be able to address queries like “Find me images in this collection containing a building”. A simple query by example would be inadequate for satisfying even this query without substantial knowledge of the variety of ways in which buildings may appear in images. An even greater challenge comes from queries like “Find me images in this collection which depict acts of kindness”. It is worth noting that the semantic gap also exists for text. The gap is not as wide, but until we have perfect natural language understanding software it will continue to exist at some level.

For CBIR, a starting point would be to represent explicitly any objects in the image. Shape is an important cue to object recognition and many attempts to use shape in CBIR systems have been reported, even in the early systems like QBIC from IBM [8]. The big problem with this is knowing what constitutes an object. It is possible to segment images into regions and represent the shapes of the regions but the software needs to be trained to match or recognise particular object shapes which will typically be composed of several regions from a segmentation of the image. Some approaches to this have been reported in particular domains but general purpose CBIR systems using objects as intermediate representations are still uncommon. A rather simple example of shape finding comes from the Artiste project [9], a European project to develop a distributed art retrieval, navigation and analysis system. It includes a facility to detect images of paintings in frames of a particular shape. Most frames are rectangular but some are circular, some are triptych etc. A border finder locates the boundary of the frame in the image and a neural net classifier has been trained to use the border to deliver the frame type.

Bridging the Semantic Gap

The search for approaches to the extraction of higher level representations from images is an active area of research. Associating features extracted from images with semantic concepts has been reported [10] and in Southampton we have developed the idea of a multimedia thesaurus in our MAVIS 2 multimedia information system as an attempt to bridge the semantic gap [11].

In a traditional thesaurus, different textual representations of the same concept are associated with one another. In the multimedia thesaurus (MMT), different multimedia representations of the same concept are associated with that concept. The MMT is a multi-layer data structure used for storing the multimedia information in the system. At the highest level there is a semantic layer which records concepts and the relationships between them in the application domain. At the next level down in the simplest form of the architecture are selections from media which in some way represent the concept. For example if the concept is a vase, an image selection containing a vase is a visual representation of the concept. Associated with the image selection are the extracted signatures, for example, giving shape, texture and colour information about the vase. Also at this level we may have textual representations of the vase, so the word vase may be stored and associated with the concept vase in the semantic layer. Textual synonyms such as “amphora” may also be stored in the second level as may other visual representations or sound clips of the word vase being spoken. At the lowest level we have the raw media from which the representations have been selected and by keeping pointers to the raw media for the selections rather than a duplicate we can minimise the storage requirements.

This structure provides some valuable additional functionality in the multimedia system. For example, if query by example is being used for content-based image retrieval, and the query can be matched with a representation in the MMT, the system may be able to identify the concept forming the basis of the query and from that it may find alternative representations of the concept which may enable it to retrieve images which would otherwise have been missed. Similarly, if content-based navigation is being used and a link has been authored on one view of an object, it may be possible to follow the link using a different view as the source anchor if both views are associated with the same concept in the MMT. It is also possible to follow a link authored on the text representation of a concept from an image representation of the same concept if the user so wishes.
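The layered structure and the two uses just described can be sketched roughly as follows. This is not the MAVIS 2 implementation: the class names, file paths, feature values and link address are hypothetical, and signatures are matched by exact equality here, where a real system would use similarity matching.

```python
from dataclasses import dataclass, field


@dataclass
class Representation:
    """Middle layer: a selection from some medium that represents a concept."""
    medium: str            # e.g. "image", "text", "sound"
    raw_media_ref: str     # pointer to the raw media (lowest layer), not a copy
    signature: tuple       # extracted features, or the word itself for text
    links: list = field(default_factory=list)  # links authored on this selection


@dataclass
class Concept:
    """Semantic layer: a concept and the representations associated with it."""
    name: str
    representations: list = field(default_factory=list)


class MultimediaThesaurus:
    def __init__(self):
        self.concepts = {}

    def add(self, concept_name, representation):
        concept = self.concepts.setdefault(concept_name, Concept(concept_name))
        concept.representations.append(representation)

    def concept_for(self, signature):
        """Identify the concept whose stored representation matches the query."""
        for concept in self.concepts.values():
            if any(r.signature == signature for r in concept.representations):
                return concept
        return None

    def alternatives(self, signature):
        """Other representations of the same concept, usable to widen a query."""
        concept = self.concept_for(signature)
        if concept is None:
            return []
        return [r for r in concept.representations if r.signature != signature]

    def follow_links(self, signature):
        """Links authored on any representation of the query's concept."""
        concept = self.concept_for(signature)
        if concept is None:
            return []
        return [link for r in concept.representations for link in r.links]


# Hypothetical content, invented for illustration.
mmt = MultimediaThesaurus()
mmt.add("vase", Representation("image", "media/img_001.tif", (0.8, 0.1),
                               links=["http://example.org/vase-catalogue"]))
mmt.add("vase", Representation("image", "media/img_002.tif", (0.3, 0.7)))
mmt.add("vase", Representation("text", "media/doc_017.txt", ("vase",)))
mmt.add("vase", Representation("text", "media/doc_017.txt", ("amphora",)))

# A link authored on one view can be followed from another view or from the word.
print(mmt.follow_links((0.3, 0.7)))
print(mmt.follow_links(("amphora",)))
print([r.signature for r in mmt.alternatives((0.3, 0.7))])
```

Routing both retrieval and link following through the concept is what lets one representation stand in for another.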

One of the problems with this approach is the building of the associations in the MMT between representations and the concepts they represent. Clearly a manual approach is possible but is time consuming in the extreme. In a prototype application [12], brief text descriptions associated with the concepts in the semantic layer were available and some of the images had sufficient metadata associated with them to allow the use of latent semantic indexing [13] to estimate the similarity between the concept description and the metadata description of the image. This facilitated automatic creation of some of the associations and others could be made by pattern matching between the images themselves. Images were then automatically associated with the concepts with which similar images were associated. Although not a fully automatic approach it enabled us to recognise this as a way of accelerating MMT building in particular application domains.
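The sketch below illustrates the general idea of proposing associations automatically by comparing concept descriptions with image metadata. Note the substitution: it uses a plain bag-of-words cosine similarity rather than latent semantic indexing [13], which would additionally project the term vectors into a reduced space. The descriptions, metadata and threshold are invented for the example.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def propose_associations(concepts, image_metadata, threshold=0.3):
    """Map each image id to its most similar concept, if similar enough."""
    proposals = {}
    for image_id, metadata in image_metadata.items():
        metadata_vector = term_vector(metadata)
        score, name = max((cosine(metadata_vector, term_vector(description)), name)
                          for name, description in concepts.items())
        if score >= threshold:
            proposals[image_id] = name
    return proposals


# Hypothetical concept descriptions and image metadata.
concepts = {
    "vase": "ceramic vessel used to hold flowers",
    "portrait": "painting of a person showing the face",
}
image_metadata = {
    "img1": "blue ceramic vessel with painted flowers",
    "img2": "oil painting showing the face of a young person",
    "img3": "landscape with river",
}

print(propose_associations(concepts, image_metadata))
```

Images whose metadata resembles no concept description, such as img3 here, are left unassociated and would still need manual attention or image-to-image matching.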

As the MMT evolves, it should be clear that a larger and larger number of representations associated with concept classes in the semantic layer will be available. To make an association between a query selection and a concept may take some considerable computation time as the representations extracted from the query are compared with representations in the MMT. However, at some stage in the evolution of the MMT it may be possible to develop a classifier which could allocate new representations to concepts more quickly than via brute force matching. We have made some preliminary investigations into the use of intelligent autonomous processes or agents for monitoring the MMT and clustering and classifying representations as their numbers become suitably large [14]. Existing associations in the MMT are used as the basis for learning by the classifiers.
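As a simple illustration of the trade-off, the sketch below contrasts brute force matching against every stored representation with a classifier learned from the existing associations. A nearest-centroid classifier is used purely as an assumption for the example; it is not the agent-based approach of [14], and the feature vectors are invented.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_concept(query, labelled):
    """Compare the query with every stored representation; cost grows with the MMT."""
    return min(labelled, key=lambda item: distance(query, item[0]))[1]


def train_centroids(labelled):
    """Learn one mean feature vector per concept from existing associations."""
    sums, counts = {}, {}
    for vector, concept in labelled:
        total = sums.setdefault(concept, [0.0] * len(vector))
        for i, value in enumerate(vector):
            total[i] += value
        counts[concept] = counts.get(concept, 0) + 1
    return {concept: [v / counts[concept] for v in total]
            for concept, total in sums.items()}


def classify(query, centroids):
    """Allocate the query to the concept with the nearest centroid."""
    return min(centroids, key=lambda concept: distance(query, centroids[concept]))


# Hypothetical (feature vector, concept) associations already in the MMT.
labelled = [
    ((0.9, 0.1), "vase"),
    ((0.8, 0.2), "vase"),
    ((0.1, 0.9), "frame"),
    ((0.2, 0.8), "frame"),
]
centroids = train_centroids(labelled)

query = (0.7, 0.3)
print(brute_force_concept(query, labelled))  # vase
print(classify(query, centroids))            # vase
```

Brute force needs one comparison per stored representation, whereas the classifier needs one per concept, so the gap widens as the number of representations grows.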

Conclusion

Although we, and others, have made tentative steps towards bridging the semantic gap in multimedia information handling, particularly in the area of content and concept based retrieval and navigation, many problems remain. One of the key difficulties is that the signatures or representations that we are working with are crude and little prior knowledge is being utilised. Until more powerful image understanding techniques can be developed and incorporated into the image processing functions we will be severely handicapped in our efforts. This is even more true for other non-text media. But even for text, it is clear that retrieval and navigation will benefit from enhanced text understanding facilities.

Another difficulty is the computational problem associated with content based media retrieval. Many of the representations are multidimensional feature vectors of high dimensionality and there are serious problems with indexing such features for rapid retrieval. Although novel indexing strategies have been published many of them collapse at very high dimensionality. Finally, it is worth mentioning that human-computer interface problems are also associated with multimedia information handling. For example, given a query image which contains a complex scene and wishing to use one of the objects in the scene as the query object, how do you indicate to the computer the limits of the object required? Interactive segmentation is a possibility but it is slow and inelegant compared with human capabilities for reasoning over images.

In spite of these continuing difficulties, significant strides have been made in recent years in the area of content-based retrieval and navigation and although metadata will continue to be an essential aid, the increasing value of content-based retrieval and content-based navigation should not be overlooked, particularly in constrained application domains and when metadata is sparse.

Acknowledgements

The authors are grateful to the European Commission for their support through grant IST-1999-11978 and to their collaborators (C2RMF (F), NCR (Dk), Giunti Interactive Labs (I), Uffizi Gallery (I), IT Innovation Centre (UK), The National Gallery (UK), The Victoria and Albert Museum (UK)) on the ARTISTE project for image data and useful conversations.

References

  1. Les A. Carr, David C. DeRoure, Hugh C. Davis and Wendy Hall (1998) Implementing an Open Link Service for the World Wide Web. World Wide Web Journal, 1, 1998.
  2. A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta and R. Jain (2000) Content-Based Image Retrieval at the End of the Early Years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1349-1380, 2000.
  3. J. Eakins and M. Graham. Content based image retrieval. Technical Report 39, U.K. JISC Technology Application Programme, Oct. 1999.
    URL: <http://www.jtap.ac.uk/>
  4. M. J. Swain and D. H. Ballard. Color Indexing. International Journal of Computer Vision, 7(1):11-32, 1991.
  5. Greg Pass, Ramin Zabih, and Justin Miller. Comparing Images Using Color Coherence Vectors. MultiMedia, pages 65-73. ACM, 1996.
  6. Stephen Chan, Kirk Martinez, Paul Lewis, C. Lahanier and J. Stevenson (2001) Handling Sub-Image Queries in Content-Based Retrieval of High Resolution Art Images. International Cultural Heritage Informatics Meeting p.157-163.
  7. J. Matas, D. Koubaroulis, and J. Kittler. Colour Image Retrieval and Object Recognition Using Multimodal Neighbourhood Signature. In D. Vernon, editor, Proceedings of the European Conference on Computer Vision, LNCS volume 1842, pages 48-64, Berlin, Germany, June 2000. Springer.
  8. M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker. Query by image and video content: The QBIC system IEEE Computer, 28(9):23-32, Sept. 1995.
  9. The Artiste Project Home Page
    URL: <http://www.artisteweb.org/>
  10. Carlo Colombo, Alberto Del Bimbo and Pietro Pala. Semantics in Visual Information Retrieval. IEEE Multimedia, 38-53, July 1999.
  11. M. Dobie, R. Tansley, D. Joyce, M. Weal, P. Lewis, and W. Hall. A flexible architecture for content and concept based multimedia information exploration. In Proceedings of the Challenge of Image Retrieval (CIR'99), pages 1-12, Newcastle, UK, Feb. 1999.
  12. Robert Tansley, Colin Bird, Wendy Hall, Paul Lewis and Mark Weal (2000) Automating the Linking of Content and Concept. Proceedings ACM Multimedia 2000 p.445-448.
  13. T. K. Landauer, P. W. Foltz, and D. Laham. An introduction to latent semantic analysis. Discourse Processes, 25:259-284, 1998.
  14. Dan W. Joyce, Paul H. Lewis, Robert H. Tansley, Mark R. Dobie and Wendy Hall (2000) Semiotics and Agents for Integrating and Navigating Through Media Representations of Concepts. Storage and Retrieval for Media Databases 2000 p.120-31.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Author Details

Paul Lewis
Department of Electronics and Computer Science
University of Southampton
Southampton
SO17 1BJ

Phone: +44 2380 593715

phl@ecs.soton.ac.uk
<http://www.ecs.soton.ac.uk/~phl>

Paul Lewis is a Senior Lecturer in the Intelligence, Agents and Multimedia Research Group in the Department of Electronics and Computer Science in the University of Southampton. His research interests are in image and video analysis and their applications to multimedia information handling. He has been an investigator on numerous EPSRC and EU grants, most recently working on the development of content and concept based retrieval and navigation tools in multimedia environments.

Kirk Martinez is a lecturer in the Intelligence, Agents, Multimedia Research Group in the Department of Electronics and Computer Science in the University of Southampton. He has a BSc in Physics from the University of Reading and a PhD in image processing from the University of Essex. When he was Arts Computing lecturer in The University of London he developed image processing applications and imaging for art. His current research is content-based retrieval and museum applications of augmented reality.

David Dupplaw is a research assistant in the Department of Electronics and Computer Science in the University of Southampton working on the Artiste European project. He graduated in Computer Science from the University of Southampton and he is nearing the completion of a PhD on image representations for content-based applications.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

For citation purposes:
Lewis, P., Dupplaw, D., and Martinez, K. "Content-Based Multimedia Information Handling: Should we Stick to Metadata?", Cultivate Interactive, issue 6, 11 February 2002
URL: <http://www.cultivate-int.org/issue6/retrieval/>

-------------------------------------------------------------

Metadata

-------------------------------------------------------------

The Historical Data Warehouse

By Frans Smit - February 2002

Frans Smit reports on adapting concepts from Information and Knowledge Management (IKM) and Information and Communication Technology (ICT) into the field of organizing and giving access to metadata about historical archives and collections [1].

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

In recent decades a dramatic change has occurred in the possibilities to provide access to archives, libraries and museums. This change is partly technologically driven and partly driven by the demands of society to provide for a greater transparency and accountability of governments. In various traditional fields of cultural heritage much effort is being made to provide integrated and instant access to information about their holdings.

This article proposes the creation of a generic model for storing data and delivering information. In this model concepts like metadata and Data Warehousing are used in an integrated way. All these seemingly new concepts are really just a new perspective on what archivists and librarians have been doing for a long time. This new perspective is necessary, however, in order for cultural heritage professionals to cope with the demands and challenges presented by ICT [2]. The technological implementation of this model requires a sound vision of the ICT architecture that is needed. In this ICT architecture some important issues must be defined, e.g. which databases must be used, how content should be managed and what role standards play.

The broad objective is to create well organised content of high quality in order to fulfil the needs of the public and for an organisation to enhance its surplus value in society. These issues must create the foundation for a truly effective and integrated service to the public. However the true foundation lies within the knowledge and the consciousness that is present in an organisation.

Many of the notions and concepts in this article still need to be worked through, but as a starting point it will hopefully contribute to improving the quality and quantity of services from cultural heritage institutes to the public.

An Information and Knowledge Management Model

A key issue for institutes like archives, museums and libraries is to define and to create the surplus value of a cultural heritage institute in society. This surplus value often consists primarily of giving accurate and accessible information to the public about the subjects that are the core business of the institute. This information is based upon the material that is kept in the institute, in combination with the data and knowledge about that material. In every organisation Information and Knowledge Management (IKM) is a main issue for managers. In cultural heritage institutes a good IKM is essential.

The scope of IKM includes data, information and knowledge. IKM is not a goal in itself; it is merely a necessary subject in managing an institute in order to produce accurate services to the public. As with other important issues in an organisation, like financial management and staffing policy, it is important for organisations to start off IKM with a model-like approach. This IKM-model may differ in each organisation. A useful model that benefits from notions from other branches may look like the one in Diagram 1.

Diagram 1

The basic notion of this model is that the surplus value of the organisation consists of extracting material out of society, filtering it and enhancing it with additional information. The result of this is that the users (whoever they may be; it is not up to the organisation to determine that) get better information about the subjects that the organisation is concerned with. In response to public demand, services may be added, changed or removed. Every layer in this model is connected with others and is able to influence them in a direct or indirect manner.

The model has three main entities: users, organisation and society. All entities are dynamic. The organisation should take care that it does not stand between society and the users as a barrier. Its reason to exist lies in the precondition that it must give back more to the users than it extracts from society. To realise this, every defined layer in the organisation should be designed and executed in the context of an IKM-model. Let us take a look at the organisation layers depicted above (especially focusing on the layers of Metadata and the Historical Data Warehouse).

First and foremost the products and services of the organisation show its surplus value to the public. These products may differ considerably and may include publications, exhibitions, catalogues, merchandising, and giving access to original archival material (or reproductions of that material) in a reading room. Of course a Web site should be counted among these products, creating a digital platform for a variety of services to the public. Adding, changing and removing products and services will always cause changes in the other layers in the organisation.

All the products and services are created and maintained by the work processes in the organisation. These work processes cover the whole range of activities that add value to the products created by the organisation. These activities are always interconnected with each other and with the other layers. Traditional work processes in institutes for cultural heritage are appraisal, management, description and services to the public. If you add a new service to the public, for example a virtual exhibit on the Internet, it will definitely have an impact on service to the public and on the metadata needed to support this function. However, the processes of management and cataloguing may be influenced as well. Even the appraisal of material may be affected if there is not enough original material present to create the new service.

Creating products and services for users always requires an accurate set of metadata. There is much confusion about the definition of metadata among various disciplines. In my view the concept of metadata comprises structured data about the original material that is kept in the institute (e.g. surveys, catalogues, inventories and genealogical indexes), unstructured data about that material (e.g. publications, guidelines etc.) as well as the knowledge that the people in the organisation have about that material [3]. This last category may be called mobile metadata, with all the risks mobility entails (e.g. sick leave, retirement, vacation etc.).

The accuracy of the metadata is determined by the way in which staff prepare the information that supports a particular work process. The more the metadata are structured and standardised, and the more they are kept in well-structured and connected databases, the better they can support the various work processes. In order to create and maintain a sound set of metadata it is necessary to have an active policy of IKM. IKM should ensure that as much data as possible is available in a way that benefits all work processes that may need the data.

It is important to make a distinction between metadata that can be structured and metadata that cannot be structured. To make this clear, the following cycle of knowledge is a useful tool [4].

Diagram 2

In this diagram the data can be viewed as metadata in its most structured form. The data are considered “facts” and are described as such. Modern ICT concepts and technologies are most helpful in maintaining the data in such a way that they can be used in every way possible. Information may be viewed as the result of combining data for a specific purpose. Accurate information enhances knowledge. This knowledge improves the understanding of a particular subject. This understanding results in more data or can lead to changes in existing data.

In practice it is impossible to empty everyone’s head about every subject in order to make a complete dataset of every relevant subject. The reason is that the human mind can connect and combine data and information in a way that no information system is presently capable of. The challenge for IKM is to record as much of the available information and data as possible in structured forms, so that they are easily transferable and do not depend on human intervention.

For these structured forms of metadata it is possible to lay down criteria in such a way that a healthy ICT-architecture can be realised. For every organisation those criteria may differ but the following ones may be universal:

Metadata can be divided into separate categories, depending on their meaning and the work process in which they are created and maintained. Possible categories for cultural heritage organisations include:

Several accepted international standards, such as Dublin Core, ISAD(G), ISAAR(CPF) and EAD, cover many or most of these categories. Not all of them, however, link these categories to aggregation levels and to the context of the metadata.

For institutes in the cultural field that have the task of appraising and keeping historical material, all the products, work processes and metadata have to relate to that material. Preserving this material is in the long term the most important layer in the IKM-model presented above. In order to represent this material in a contemporary IKM-model, I have labelled it a Historical Data Warehouse. The reason is that in ICT literature the concept of metadata is often linked to Data Warehousing. The archives and collections kept by an organisation responsible for cultural heritage operate in much the same way as a Data Warehouse does in a business enterprise. They are the basis for providing accurate and unalterable information in order to facilitate decision-making and to accumulate and enhance knowledge and understanding.

Marco provides the following useful definition of a Data Warehouse: “A data warehouse is a single, enterprise-wide collection of data”. This collection should fulfil the following four preconditions:

  1. A Data Warehouse is subject-oriented;
  2. A Data Warehouse provides an integrated view of an enterprise’s major subject areas;
  3. A Data Warehouse is non-volatile;
  4. A Data Warehouse holds historical views of data [6].

The concept of Data Warehousing was developed in the ICT world for data kept on digital platforms. Most of the data kept in cultural heritage organisations are not digital, but this difference does not make the comparison worthless. The preconditions of a Data Warehouse mentioned above can equally apply to archives and collections preserved on non-digital media, thus bridging a gap between the concepts used in two worlds that are too often kept separate.

Another advantage of labelling archives and collections as a Historical Data Warehouse for society is that the concept of metadata falls into place. The word metadata is widely used for the data that is necessary to maintain a Data Warehouse. Metadata is the data that you create, change and remove yourself; the data in a Data Warehouse should never be changed. As an expert on metadata and data warehousing, David Marco describes the concepts of metadata and Data Warehousing in a way that is very similar to that applied to archives and libraries: “Metadata is the card catalog in a data warehouse. By defining the contents of a data warehouse, meta data helps users locate relevant information for analysis. In addition, meta data enables users to trace data from the data warehouse to its operational source (i.e. drill-down) and to related data in other subject areas (i.e., drill-across). By managing the structure of the data over a broad spectrum of time, it provides a context for interpreting the meaning of information” [7].
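The distinction drawn above, between metadata that may be created, changed and removed and warehouse content that should never change, can be illustrated with a minimal Python sketch. The class names, identifiers and sample texts are hypothetical and serve only as an illustration; they do not describe any existing system.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class WarehouseItem:
    """A preserved item; frozen, so its fields cannot be reassigned."""
    item_id: str
    content: str


class Catalogue:
    """Metadata about warehouse items; entries may be created, changed and removed."""

    def __init__(self):
        self._descriptions = {}

    def describe(self, item, text):
        # Creates a description or overwrites an existing one.
        self._descriptions[item.item_id] = text

    def remove(self, item):
        self._descriptions.pop(item.item_id, None)

    def lookup(self, item):
        return self._descriptions.get(item.item_id)


# Hypothetical sample data.
letter = WarehouseItem("A-001", "Text of a letter")
catalogue = Catalogue()
catalogue.describe(letter, "Letter, sender unknown")
catalogue.describe(letter, "Letter, sender identified")  # metadata may change

try:
    letter.content = "altered"  # the preserved item itself may not
    item_was_changed = True
except FrozenInstanceError:
    item_was_changed = False
```

In this sketch the catalogue plays the role of the card catalogue in Marco's description: it can be corrected and extended at any time, while the items it describes stay as they were received.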

The comparison of traditional archives and collections with modern Data Warehousing is an interesting way of putting an IKM-model for cultural heritage in the context of the digital age. Archivists were not considered when this concept was developed. This is both surprising and regrettable because ICT-experts could have learned a lot from them about such concepts as authenticity, reliability, readability and the context and creation of data!

Information and Knowledge Management and ICT-architecture

The IKM-model described above is meant to give an overall, broad and consistent perspective on how to handle data and information in an organisation that manages historical data and material. It is like the design of the nervous system of an organisation. How well this nervous system functions nowadays depends on an appropriate use of ICT systems.

In recent decades some general shifts have occurred in the use of ICT tools. In almost all fields, public or private, the use of ICT started bottom-up, driven by enthusiastic specialists. This phase was characterised by experimenting, making mistakes and learning. As ICT grew in importance and its possibilities kept expanding, it became a matter of strategic importance in many fields, with a tendency to design and implement big monolithic systems. With the rise of client-server systems, and especially of the Internet, various systems are now being used that are interconnected through a corporate concept. This concept is commonly called an ICT-architecture. The pace at which this process has taken place varies a lot. In many cultural heritage institutes the phases described above occur at the same time.

What is an ICT-architecture? Applegate describes it as follows: “Just as the blueprint of a building’s architecture indicates not only the structure’s design but how everything –from plumbing and heating systems to the flow of traffic within the building- fits and works together, the blueprint of a firm’s information architecture defines the technical computing, information management, and communications platform. The IT Architecture provides an overall picture of the range of technical options available to a firm, and, as such, it also implies the range of business options. Decisions made in building the technical IT architecture must be closely linked to decisions made in designing the IT organisation that will manage the architecture, which, in turn, must be linked to the strategy and organisation design of the firm itself. Conversely, the organisation strategy, structure, incentives, and processes strongly influence how the technology will be designed, deployed, and used within a firm” [8].

When the IKM-model described above is used, the use of ICT in every layer should be carefully designed and implemented, and the layers should be connected to each other. ICT plays a role at every layer. In the layer of products and services one can think of Internet applications, e-commerce and applications that track the behaviour of the public. The work processes may need one or more applications to enter, change and remove metadata. The metadata itself, if kept digitally, will involve the use of databases and text files. The Historical Data Warehouse may contain digital material and digital reproductions (or even substitutes) of original, non-digital material.

A well-designed ICT-architecture must give an accurate answer to the question of how to use the various components of ICT in every layer of the IKM-model. There are various ways of distinguishing between ICT components. One of them is to present ICT tools in the hierarchical order shown below.

Diagram 3

As in the IKM-model, every part is interconnected. At every level a choice has to be made for the appropriate ICT products. The choices may differ according to the scope, budget and structure of the organisation.

One way to make those choices is to carry out an enterprise-wide information audit and to develop an ICT policy based on that audit. The result of such an audit differs from one organisation to another. I will describe some conclusions that I have drawn from my daily practice.

It is very important to use state-of-the-art tools regarding data structures. The metadata layer of the IKM-model is the key layer for being able to generate useful information. Connectivity of the data is perhaps the most essential precondition. This precondition can be met by using relational databases. They are the most modern and powerful tools for designing, implementing and securing good data structures and the data itself. The language for manipulating data in a relational database, SQL, is globally accepted and supported. Relational databases are also open, which means that they can always be connected to each other in real time. Of course the logical and technical data structures in the databases should be designed and implemented in a professional way.
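The connectivity that relational databases offer can be sketched with Python's built-in sqlite3 module. The table names, column names and sample rows below are assumptions made for the illustration; they are not the data structures of any particular institute.

```python
import sqlite3

# In-memory database; all names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE description (
        description_id INTEGER PRIMARY KEY,
        title          TEXT NOT NULL
    );
    -- Link table: a person can occur in many descriptions and vice versa.
    CREATE TABLE description_person (
        description_id INTEGER NOT NULL REFERENCES description(description_id),
        person_id      INTEGER NOT NULL REFERENCES person(person_id),
        PRIMARY KEY (description_id, person_id)
    );
""")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Person A"), (2, "Person B")])
conn.executemany("INSERT INTO description VALUES (?, ?)",
                 [(10, "Minutes"), (11, "Correspondence"), (12, "Accounts")])
conn.executemany("INSERT INTO description_person VALUES (?, ?)",
                 [(10, 1), (11, 1), (12, 2)])


def titles_for_person(name):
    """Return the titles of all descriptions linked to the named person."""
    rows = conn.execute(
        """
        SELECT d.title
        FROM description AS d
        JOIN description_person AS dp ON dp.description_id = d.description_id
        JOIN person AS p ON p.person_id = dp.person_id
        WHERE p.name = ?
        ORDER BY d.title
        """,
        (name,),
    ).fetchall()
    return [title for (title,) in rows]


print(titles_for_person("Person A"))  # prints ['Correspondence', 'Minutes']
```

Because each person is stored once and linked to descriptions through a separate table, a question about a person can be answered across all descriptions with a single SQL query.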

Standardization of metadata is essential, but it is also a misunderstood topic. To those who work with international standards regularly this may sound like nothing new, but a lot of institutes have not implemented them yet. One of the odd things is that standards like ISAD(G) are often considered totally new, whereas they are mostly an improved version of previous standards. It is not necessary to create a totally new set of metadata; often a conversion from one data structure to another (possibly combined with a conversion to modern, digital media) is the only thing that needs to be done. Another misunderstanding about standards concerns when they become important. Standardization is needed to create a common language structure among organisations so that information can be exchanged. It is not important whether the database you use is entirely structured according to a standard. As long as you are able to generate your data in the desired structure when needed, there is no need to worry. It is much more important to choose a state-of-the-art database platform, so that you can anticipate future developments in databases and metadata standards as well as possible.
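The idea that data only needs to be generated in a standard structure when it is exchanged can be illustrated with a small Python sketch. The internal field names and the sample record are hypothetical; the target names (identifier, title, creator, date) are elements of Dublin Core, and the mapping itself is an assumption made for the example.

```python
# Hypothetical internal field names mapped to Dublin Core element names.
FIELD_TO_DC = {
    "inventory_number": "identifier",
    "title": "title",
    "archive_creator": "creator",
    "period": "date",
}


def to_dublin_core(record):
    """Return a copy of the record re-keyed with Dublin Core element names.

    Fields without a mapping are left out of the export; the internal
    record itself is not modified.
    """
    return {dc: record[field] for field, dc in FIELD_TO_DC.items() if field in record}


# Hypothetical internal record, stored in the institute's own structure.
internal = {
    "inventory_number": "X-12",
    "title": "Example register",
    "archive_creator": "Example office",
    "period": "1650-1660",
    "shelf_location": "depot 3",  # internal only, no mapping
}

exported = to_dublin_core(internal)
```

The internal structure stays as it is; only the export follows the standard, and a second mapping table could serve a different standard without touching the stored data.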

One of the fundamental principles of the ICT-architecture is that maintaining metadata is different from presenting it. Metadata should never be redundant, and metadata that belongs together should be stored and maintained together, preferably in one database. You can, however, present the metadata in various ways. You can present a catalogue on the Internet through a search engine, but you can use the same metadata in a printed version, and you can deliver it to a Web site where it is merged with metadata from other institutes. A further important notion is that software for presenting metadata tends to change more often than software for maintaining metadata. An ICT-architecture for maintaining and presenting metadata that takes these aspects into account, and that can act as a frame of reference for developing good and efficient software, may look like this.

Diagram 4
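The separation between maintaining and presenting metadata can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the records, field names and the two presentation functions are hypothetical, and a real architecture would keep the maintained set in a database.

```python
from html import escape

# The single maintained set of metadata (hypothetical records).
RECORDS = [
    {"id": "b", "title": "Maps & drawings"},
    {"id": "a", "title": "Council minutes"},
]


def _sorted(records):
    """Return the records ordered by identifier, without changing the store."""
    return sorted(records, key=lambda r: r["id"])


def as_text(records):
    """Plain-text listing, for example for a printed catalogue."""
    return "\n".join(f"{r['id']}: {r['title']}" for r in _sorted(records))


def as_html(records):
    """HTML list, for example for a Web site; titles are escaped for display."""
    items = "".join(f"<li>{escape(r['title'])}</li>" for r in _sorted(records))
    return f"<ul>{items}</ul>"
```

Both functions read from the same records and neither stores a copy, so a presentation can be replaced or added without touching the maintained metadata.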

Closing remarks: what is the use of IKM in practice?

The described models and concepts are useless when the implementation does not improve the performance of an organisation. And the proof of this improvement can only lie in the improvement of services to the users. Models like this can contribute to the definition of new projects and their impact on the organisation. They can also show the connection between various projects. It is a way of showing all departments in an organisation what the consequences are for all work processes if for example you decide to make a virtual exhibit on the Internet or to create an online search engine through metadata about your archives and collections. A model may help to reduce the risk of modifying work processes or metadata structures in such a way that they cannot be used anymore by other work processes. It also helps to prevent the introduction of new ICT-systems that will prevent the organisation from reaching goals like integration, standardization and presentation.

An IKM-model and an ICT-architecture provide a very effective framework for managing work processes and projects. All work processes and projects can be linked to one another. It is possible to draw up a checklist for new projects to make sure that the results of each project meet the preconditions of the organisation concerning IKM and ICT.

The most important assumption is that the management of the organisation is aware of these instruments for improving services to the public. This awareness also makes it possible for all the work processes to work together in such a way that these services can be created and maintained. It is up to management to set a policy with mid-term and long-term goals and objectives, so that projects can be started in a way that is consistent with that vision.

To conclude this article, an example from the Municipal Archives of Amsterdam may illustrate an attempt to create new services to the public. The long-term aim is to create one single portal on the Internet to all descriptions of the content and context of the archives and collections. This portal should be designed in such a way that the questions that users ask most often (questions about a person or organisation, about a subject, a location or a period) are answered in the simplest way possible. This is a huge task that will involve all work processes.

The starting point is very different from the situation that should be created. This is a fate that many institutes now share. A lot of metadata is incomplete, not standardised, not kept in modern databases and not linked together. The Municipal Archives of Amsterdam hold not only archives but also vast collections of library material, audiovisual material, photos, drawings and maps. It is a huge task to integrate all the metadata related to this material, using appropriate standards. Not all work processes are structured to create integrated services to the public. In order to assemble metadata from different sources a series of projects was defined. The first step was to create a complete set of accurate metadata for the highest aggregation level, based on a survey of various access tools (finding aids). This survey is the first result of these projects [9]. Other search engines on the Web site are not yet linked to this survey. The next step is to present a set of metadata on lower aggregation levels for the archives. Until now this material was not stored in a database. To meet this precondition a massive data-entry project was initiated. The result of this project is a database with 350,000 records. In order to present the metadata as an integrated service to the public it is necessary to enhance, create and present indexes on persons, organisations, subjects, locations and time periods. This project has a twin brother in the back office, where existing ICT systems must be altered or replaced in order to create the necessary metadata in a standardised way.

In the next few years the intention is to integrate the metadata about the context and content of all archives and collections into this model. As a consequence, almost all existing ICT-systems must be reconsidered, strategic decisions must be made about the choice of metadata standards, a great deal of existing data must be converted or re-entered, and quality controls and different ways of working must be introduced. This cannot succeed without defining an appropriate IKM-model and an ICT-architecture.

A Dutch version of this article is also available.

References

  1. My special thanks go to Kent Haworth, York University Archivist and Head, Special Collections and Project Director and Secretary, ICA Committee on Descriptive Standards, for his comments on an earlier version of this article.
  2. In most literature the abbreviation “IT” is used; I prefer to use the more modern abbreviation “ICT” (Information and Communication Technology).
  3. Compare for example Marco, p. 5.
  4. Wurman, p. 27 and Milner, p. 3. I have replaced the word “wisdom” in this model with the more modest word “understanding”.
  5. Compare for example Gilliland-Swetland, Anne J., Defining metadata, in Baca, p. 3.
  6. Marco, p. 23-24. Marco cites Inmon, W.H.: Building the Data Warehouse, Wiley, 1996, p. 33.
  7. Marco, p. 48.
  8. Applegate et al., p. 139-140.
  9. Survey
    URL: <http://www.gemeentearchief.amsterdam.nl/archieven_en_collecties/overzicht/introductie/index.nl.html>

Literature and suggestions for further reading

  1. Applegate, Lynda, F. Warren McFarlan and James L. McKenney (1999). Corporate Information Systems Management. Irwin McGraw-Hill, Boston.
  2. Baca, Martha and others (1998). Introduction to metadata, pathways to digital information. Getty Information Institute, New York.
  3. Cook, Terry (2001). Archival Science and Postmodernism: New formulations for Old Concepts, in Archival Science (2001-1), ed. Horsman, P., E. Ketelaar and T. Thomassen, Kluwer Academic Publishers, Dordrecht, p. 3-24.
  4. Getty Information Institute, New York, Art and Architecture Thesaurus.
    URL: <http://www.getty.edu/research/tools/vocabulary/aat>
  5. International Council on Archives (ICA), ISAAR(CPF) standard.
    URL: <http://www.ica.org>
  6. International Council on Archives (ICA), ISAD(G) standard.
    URL: <http://www.ica.org>
  7. Marco, David (2000). Building and managing the meta data repository, a full life-cycle guide. John Wiley & Sons, New York.
  8. Menne-Haritz, A. (2001). Access: the reformulation of an archival paradigm, in Archival Science (2001-1), ed. Horsman, P., E. Ketelaar and T. Thomassen, Kluwer Academic Publishers, Dordrecht, p. 57-82.
  9. Milner, Eileen M. (2000). Managing Information and Knowledge in the Public Sector. Routledge, London.
  10. Records Continuum Research Group, Australia.
    URL: <http://rcrg.dstc.edu.au>
  11. Ribeiro, Christina and Gabriel David (2001). A Metadata Model for Multimedia Databases.
  12. Smit, F.P. (2000). Proposal for a Datamodel of Archival Descriptions, in: Atti del Summit DACE, Roma, 2000, p. 149-196.
  13. Smit, F.P. (2001). Het nieuwe Overzicht van Archieven en Collecties, in: Archievenblad (2001-1), Koninklijke Vereniging van Archivarissen, Amsterdam, p. 26-29.
  14. Society of American Archivists, Encoded Archival Description.
    URL: <http://www.loc.gov/ead>
  15. Svenonius, Elaine (2001). The Intellectual Foundation of Information Organization. The MIT Press, Cambridge, Massachusetts.
  16. Wurman, Richard Saul (2001). Information Anxiety 2. QUE, Indianapolis.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Author Details

Frans Smit
Manager of the section of archival descriptions and cataloguing
Municipal Archives of Amsterdam
PO Box 51140
1007 EC Amsterdam
The Netherlands

Phone: +31 20 5720227
Fax: +31 20 6750596

<fsmit@gaaweb.nl>
<franssmit@planet.nl>
<http://www.gemeentearchief.amsterdam.nl>

Frans Smit is Head of the Section of Archival Descriptions and Cataloguing at the Municipal Archives of Amsterdam. He is, and has been, engaged in various national and international projects on providing access to metadata about archives and collections through search engines on the Web.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

For citation purposes:
Smit, F. "The Historical Data Warehouse", Cultivate Interactive, issue 6, 11 February 2002
URL: <http://www.cultivate-int.org/issue6/warehouse/>

-------------------------------------------------------------

The Historical Data Warehouse

By Frans Smit - February 2002

Frans Smit applies concepts from Information and Knowledge Management (IKM) and from Information and Communication Technology (ICT) to the organisation and accessibility of metadata about historical archives and collections [1].

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Introduction

In recent decades a fundamental change has taken place in the possibilities for increasing access to archives, museums and libraries. This change stems partly from technological progress but, to an equally important degree, from society’s growing need for a transparent government. Many institutions in the field of cultural heritage are working to create more integrated and faster access to information about the historical material in their keeping.

This article proposes a generic model for keeping data and providing information. In that model, concepts such as metadata and Data Warehousing are applied in an integrated way. These seemingly new concepts are in reality a new perspective on what archivists and librarians have been doing for a long time. This new perspective is nevertheless needed if professionals in the field of cultural heritage are to meet the problems and challenges of the ICT era [2]. The technical implementation of the model requires a sound vision of the ICT-architecture needed. This architecture must settle important aspects, for example which databases are to be used, how the data are to be managed and what role international standards play.

The aim is to create and maintain good, well-organised data. These data can meet the public’s need for information, thereby safeguarding the organisation’s added value to society. The underlying foundation for those data always consists of the available knowledge about the historical material that is held.

The concepts described in this article still need to be worked out in detail. In my opinion, however, they are already very usable as a starting point for improving the quality and quantity of the services that cultural heritage institutions provide to the public.

An Information and Knowledge Management Model

For institutes such as archives, museums and libraries it is essential to define what added value the institute offers society, in order then to be able to create that added value. This added value usually consists of providing the public with accurate and accessible information about the subjects that count among the institute’s core tasks. That information is based on the material kept in the institute, the available data and the knowledge about that material. Information and Knowledge Management (IKM) is an important subject in every organisation. For archives, museums and libraries, good IKM is nothing less than a reason for existence.

IKM comprises data, information and knowledge. IKM is not a goal in itself; it is a necessary subject for management in order to be able to offer the right products and services. As with other important management subjects, for example financial management and personnel management, it is useful to adopt a model-based approach to IKM as well. A usable IKM-model that builds on experience in other sectors may look like this.

Diagram 1

The starting point of this model is that the institution’s added value to society consists of extracting material from that society, selecting that material and providing it with additional information. The result is that the institution’s customers (whoever they may be; that is not for the institution to determine) should receive more and better information about the subjects with which the organisation is concerned. Depending on demand, products and services can be renewed, added or removed. Every layer in the IKM-model is linked to the other layers and is able to influence them directly or indirectly.

The model has three dynamic entities: customers, institutions and society. The institution must take care not to stand as an obstacle between the customers and society. Its reason for existence lies in the presupposition that it gives back more to the customers than it extracts from society. To make good on this, every layer in the institution must be designed and realised within the framework of an IKM-model. The various layers of the IKM-model within the institution are described in more detail below, with special emphasis on the metadata and the Historical Data Warehouse.

The products and services show the institution’s added value to the customers. These products and services can be very varied, for example publications, exhibitions, finding aids, merchandising and facilities for consulting original or reproduced material in a reading room. A Web site is of course also a product, because it creates a digital platform for all kinds of services. New, changed and removed products and services will always cause changes in other layers within the organisation.

All products and services are created and maintained by the work processes within the institution. These work processes comprise all activities that add value to the products. These activities are always mutually dependent. Traditional work processes within cultural heritage institutions are acquisition, description, management and services to the public. If a new product is created for the customer, for example a virtual exhibition on the Internet, this will affect the services to the public and the metadata required. The work process of description may also be affected. It is even possible that new material has to be acquired in order to realise the product.

Realising products and services always requires a correct and complete set of metadata. There is much confusion about the definition of the term metadata across different disciplines. In my opinion, for cultural heritage institutions the concept of metadata comprises structured data about the historical material kept in an institution (for example surveys, catalogues, inventories and genealogical indexes), unstructured data about that material (for example publications and manuals) and the knowledge about that material held by the institution’s staff. This last category can also be described as mobile metadata, with all the attendant risks (illness, retirement, vacation etc.).

The correctness and completeness of the metadata are determined by the way in which staff compile information within a particular work process. The more the metadata are structured, standardised and kept in good, open databases, the better they can support the various work processes. An active IKM policy is necessary in order to compile and maintain a sound set of metadata. The IKM policy should ensure that as much data as possible is available at the times and places where a work process needs it.

It is important to distinguish clearly between metadata that can be structured and metadata that remain unstructured. To make this clear, the following representation of the cycle of knowledge is helpful [3].

Diagram 2

In this diagram, data are the most structured metadata. Data are regarded as facts and described as such. Modern ICT concepts and technologies are very helpful in storing and maintaining those data in such a way that they can be used in all kinds of ways. Information can be seen as a combination of data for a specific purpose. Correct information leads to knowledge. This knowledge increases insight into a subject. That insight can in turn lead to new data or to changes in existing data.

In practice it is impossible to extract all knowledge from people in order to compile a complete body of data on every relevant subject. Because people can link data and information in ways that no information system can match, a major challenge for IKM policy is to record as much data and information as possible in a structured way, so that they can be used easily and without human intervention.

For structured metadata it is possible to define criteria on which a sound ICT architecture can be built. These criteria may differ from one organisation to another, but the following can be regarded as universal:

Metadata can be divided into categories according to their meaning and the work process in which they are created and maintained. Possible categories for the metadata of cultural heritage institutions are:

Various international standards, such as Dublin Core, ISAD(G), ISAAR(CPF) and EAD, cover many or all of the categories listed above. They do not, however, offer every means of linking the categories to aggregation levels and to the context of the metadata.

In institutions that hold historical material, all products, work processes and metadata should relate to that material. Preserving it is, in the long term, the institution's most important task. In the IKM model I have described this material in contemporary terms as a Historical Data Warehouse. The reason is that much of the ICT literature links metadata to a Data Warehouse. Historical archives and collections often fulfil the same function as a Data Warehouse in a modern organisation: they form the basis for providing accurate and unalterable information in support of decision-making, knowledge and insight.

Marco gives the following useful definition of a Data Warehouse: “A data warehouse is a single, enterprise-wide collection of data”. This collection must satisfy the following four conditions:

  1. A Data Warehouse is subject-oriented;

  2. A Data Warehouse gives an integrated view of an organisation's areas of work;

  3. The data in a Data Warehouse are unalterable;

  4. A Data Warehouse contains historical data sets [5].
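
The four conditions can be made concrete with a small sketch. It is only an illustration under assumptions: the class names `WarehouseItem` and `HistoricalDataWarehouse`, the field names and the sample items are hypothetical and do not come from Marco or from any existing system.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date


@dataclass(frozen=True)
class WarehouseItem:
    """One item in a hypothetical Historical Data Warehouse."""
    identifier: str
    subject: str        # condition 1: items are organised by subject
    business_area: str  # condition 2: all areas of work share one collection
    created: date       # condition 4: every item carries its historical date


class HistoricalDataWarehouse:
    """Append-only collection: items can be added and read, never changed."""

    def __init__(self):
        self._items = {}

    def add(self, item):
        if item.identifier in self._items:
            # condition 3: data already in the warehouse are never overwritten
            raise ValueError(f"{item.identifier} already exists and cannot be changed")
        self._items[item.identifier] = item

    def by_subject(self, subject):
        return [i for i in self._items.values() if i.subject == subject]


# Invented sample data, for illustration only.
warehouse = HistoricalDataWarehouse()
warehouse.add(WarehouseItem("A-1", "harbour", "archives", date(1895, 3, 1)))
warehouse.add(WarehouseItem("P-7", "harbour", "photographs", date(1931, 6, 12)))
```

The frozen dataclass and the refusal to overwrite an existing identifier together stand in for the unalterable character of the warehouse; the metadata used to describe and find these items would be kept elsewhere and would remain editable.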

The Data Warehouse concept was developed within the ICT world for data that are stored digitally. Although that does not hold for most of the data kept by cultural heritage organisations, the comparison is very useful. The principles listed above for a Data Warehouse also apply to archives and collections, even if these were not compiled on a digital platform. The comparison can help to bridge the gap between two worlds that are far too separate from each other.

Another advantage of describing archives and collections as a Historical Data Warehouse for society is that the concept of metadata falls into place. The term metadata is often used for the data needed to maintain a Data Warehouse. Metadata are the data that are compiled, changed and deleted in the work processes, whereas the data in a Data Warehouse may never be changed. David Marco describes the concepts of metadata and Data Warehousing in a very recognisable way: “Meta data is the card catalog in a data warehouse. By defining the contents of a data warehouse, meta data helps users locate relevant information for analysis. In addition, meta data enables users to trace data from the data warehouse to its operational source (i.e. drill-down) and to related data in other subject areas (i.e., drill-across). By managing the structure of the data over a broad spectrum of time, it provides a context for interpreting the meaning of information” [6].

Comparing historical archives and collections with the modern concept of Data Warehousing offers an interesting perspective for developing a contemporary IKM model for cultural heritage institutions. Archival specialists, incidentally, were not involved in developing the concept of Data Warehousing. That is both surprising and regrettable, because ICT specialists could have learned a great deal about concepts such as authenticity, reliability, legibility, and the context and creation of data!

Information and Knowledge Management and ICT Architectures

The IKM model described above offers an abstract, integrated and consistent perspective on handling historical data and information within an organisation. It is the organisation's nervous system. How well that nervous system functions is nowadays largely determined by good use of ICT systems.

Over recent decades the use of ICT has changed in a number of ways. Everywhere, ICT began with enthusiastic specialists who mostly worked independently of one another. This phase often had an experimental character. As the importance and possibilities of ICT grew, the field became a matter of strategic importance. Large monolithic systems were built. With the arrival of client-server systems, and above all with the rise of the Internet, systems came to be developed on the basis of an integrated concept. This concept is usually called an ICT architecture. The speed at which this development has taken place varies greatly. In cultural heritage institutions the phases described often occur side by side.

What is an ICT architecture? Applegate describes it as follows: “Just as the blueprint of a building’s architecture indicates not only the structure’s design but how everything –from plumbing and heating systems to the flow of traffic within the building- fits and works together, the blueprint of a firm’s information architecture defines the technical computing, information management, and communications platform. The IT Architecture provides an overall picture of the range of technical options available to a firm, and, as such, it also implies the range of business options. Decisions made in building the technical IT architecture must be closely linked to decisions made in designing the IT organization that will manage the architecture, which, in turn, must be linked to the strategy and organization design of the firm itself. Conversely, the organization strategy, structure, incentives, and processes strongly influence how the technology will be designed, deployed, and used within a firm” [7].

If the IKM model described here is taken as the starting point, the use of ICT must be correctly designed, implemented and interconnected in every layer. ICT will play a role in each layer. For products and services one can think of Internet applications, e-commerce and software for recording customer behaviour. All work processes will need one or more applications to consult, enter, change or delete metadata. The metadata, provided they are stored and maintained digitally, will be kept in databases and text files. The Historical Data Warehouse will contain digital historical material as well as digital reproductions (or even substitutes) of original, non-digital material.

A good ICT architecture must answer the question of how different components are used in the different layers of the IKM model. ICT components can be distinguished in various ways. The diagram below makes a useful distinction.

Diagram 3

As in the IKM model, all components are connected to one another. For each part a considered choice must be made from the products available on the market. Those choices depend on the organisation's purpose, budget and structure.

A good way to make these choices is to carry out an information audit and to develop an ICT policy based on that audit. The outcome of such an audit naturally differs from one organisation to another. Below I describe some conclusions I have drawn from my current practice.

It is very important to use modern tools for data structures. Within the IKM model the metadata hold a key position in delivering useful and correct information. The most important precondition is the connectivity of those data, which can be achieved by using relational databases. They are the most important and powerful instrument for designing, implementing and safeguarding data structures and the data themselves. SQL, the language for manipulating data in a relational database, is accepted and supported worldwide. Relational databases are open, which means that they can always be linked directly to other databases. The logical and technical structures must of course be designed and built in a professional manner.
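
As a minimal sketch of what connectivity in a relational database can mean, the example below links descriptions to persons through a link table and queries them with SQL. It uses Python's built-in `sqlite3` module with an in-memory database; the table names, column names and sample rows are invented for illustration and do not reflect any actual data model.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE description (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Link table: a description can mention many persons and vice versa.
    CREATE TABLE description_person (
        description_id INTEGER NOT NULL REFERENCES description(id),
        person_id      INTEGER NOT NULL REFERENCES person(id),
        PRIMARY KEY (description_id, person_id)
    );
""")

# Invented sample data.
conn.execute("INSERT INTO description VALUES (1, 'Minutes of the harbour board')")
conn.execute("INSERT INTO description VALUES (2, 'Photograph of the town hall')")
conn.execute("INSERT INTO person VALUES (1, 'J. Jansen')")
conn.executemany("INSERT INTO description_person VALUES (?, ?)", [(1, 1), (2, 1)])


def titles_for_person(name):
    """Return the titles of all descriptions linked to the given person."""
    rows = conn.execute(
        """
        SELECT d.title
        FROM description AS d
        JOIN description_person AS dp ON dp.description_id = d.id
        JOIN person AS p ON p.id = dp.person_id
        WHERE p.name = ?
        ORDER BY d.id
        """,
        (name,),
    )
    return [title for (title,) in rows]
```

Because the person is stored once and only linked to the descriptions, a correction to the name is made in one place and is immediately visible for an archive description and a photograph alike.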

Standardisation of metadata is an important but often misunderstood aspect. For those who regularly work with international standards this may not be new, but many institutions do not use these standards. One of the curious things is that standards such as ISAD(G) are often seen as something entirely new, whereas they are usually an improved version of earlier conventions. There is no need to compile a completely new body of metadata. Often a conversion from the old to the new structure (and, where necessary, from an outdated to a modern, digital information carrier) is sufficient. Another misunderstanding about standards concerns when and how they should be implemented. Standardisation is needed to create a shared structure that makes information exchange possible. It does not in fact matter whether a database is structured exactly according to a standard. As long as the data can be generated in a standardised format, there is no cause for concern. It is far more important to ensure that a state-of-the-art database platform is used, one that makes it possible to anticipate future developments in database technology and metadata standards.
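
The point that a database need not mirror a standard, as long as standardised output can be generated from it, can be sketched as follows. The internal field names in `FIELD_MAP` and the wrapping `record` element are hypothetical; only the Dublin Core element names and namespace are taken from the standard. A real exchange format would need a fuller mapping than this.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical internal column names mapped to Dublin Core elements.
FIELD_MAP = {
    "title_text": "title",
    "records_creator": "creator",
    "period": "date",
    "inventory_number": "identifier",
}


def to_dublin_core(record):
    """Serialise an internal record (a dict) as simple Dublin Core XML."""
    root = ET.Element("record")
    for internal_name, dc_name in FIELD_MAP.items():
        value = record.get(internal_name)
        if value is not None:
            element = ET.SubElement(root, f"{{{DC_NS}}}{dc_name}")
            element.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Fields without a mapping (here "internal_note") simply stay internal.
example = to_dublin_core(
    {"title_text": "Harbour board", "period": "1895-1931", "internal_note": "x"}
)
```

If the standard changes, or a second standard has to be supported, only the mapping is adjusted; the database structure and the maintenance applications stay as they are.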

One of the fundamental principles of the ICT architecture is that maintaining metadata requires different applications from presenting them. Metadata must never be compiled redundantly. They should be stored in, preferably, a single database. They can, however, be presented in many different ways, for example through a search engine on the Internet, in a printed version, or via a website where they are combined with data from other institutions. Moreover, software for presenting the metadata usually has to change faster and more often than software for maintaining them. An ICT architecture that properly separates these two kinds of application can look like the diagram below. It provides a framework for developing good, purpose-built applications.

Diagram 4
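
The separation between maintenance and presentation can also be sketched in code. This is an illustration under assumptions, not the architecture of the diagram: `MetadataStore`, `present_as_text` and `present_as_html` are hypothetical names, and the single record is invented.

```python
import html


class MetadataStore:
    """The one place where metadata are maintained (the maintenance side)."""

    def __init__(self):
        self._records = {}

    def save(self, record_id, title, period):
        self._records[record_id] = {"id": record_id, "title": title, "period": period}

    def all_records(self):
        # Presentation code receives copies, so it cannot alter the store.
        return [dict(r) for r in self._records.values()]


def present_as_text(records):
    """Presentation 1: a plain listing, e.g. for a printed version."""
    return "\n".join(f"{r['id']}: {r['title']} ({r['period']})" for r in records)


def present_as_html(records):
    """Presentation 2: an HTML list, e.g. for a website."""
    items = "".join(
        f"<li>{html.escape(r['title'])}, {html.escape(r['period'])}</li>"
        for r in records
    )
    return f"<ul>{items}</ul>"


store = MetadataStore()
store.save("A-1", "Harbour board", "1895-1931")
```

Both presentation functions read from the same store, so a new presentation can be added or replaced without touching the maintenance code and without storing the metadata a second time.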

Concluding remarks: what is the use of IKM in practice?

The models and concepts described above are completely useless if they do not lead to an improvement in an institution's performance. Proof of that improvement can only come from an increase in the quality of the institution's products and services. Such models add great value when they are used in defining new projects and in assessing the impact of the intended project results on the organisation. They can also be useful for mapping the relationships between different projects. The models can be used to make clear, for all work processes, what the consequences are if, for example, a new virtual exhibition is created on the Internet or a new search engine for metadata about archives and collections is built. They can help to reduce the risk that work processes or metadata are changed in such a way that they no longer meet the needs of other work processes. The models can also be used to prevent the introduction of new information systems that would keep the organisation from achieving goals such as integration, standardisation and presentation.

An IKM model and an ICT architecture offer very effective frameworks for controlling projects and work processes. Every work process can be linked to the others. It becomes possible, for example, to draw up a checklist for new projects to ensure that they fit the organisation's principles for IKM and ICT.

The most important assumption is that the institution's management is aware of the value of using these instruments to raise the quality of products and services. This awareness at the same time ensures that those products and services are supported by work processes that are clearly coherent. It is the responsibility of management to develop and carry out a policy for the medium and long term. Goals can then be formulated and realised in such a way that projects can be initiated that are consistent with that policy.

To conclude this article I give an example from the Gemeentearchief Amsterdam concerning the creation of new products and services. The long-term goal is to make all available descriptions of the content and context of the archives and collections accessible through a single search entry point on the Internet. This search engine must be designed so that the questions customers ask most often (about a person or institution, a subject, a location or a period) can be answered as simply as possible. This is a very extensive task in which all work processes must be involved.

The starting point is very different from the desired situation, a fate shared by many institutions. Many metadata are incomplete, not standardised, not digitally available in modern databases, and not linked to one another. The Gemeentearchief Amsterdam holds not only archives but also large collections of library material, audiovisual material, photographs, drawings and cartographic material. Integrating the metadata about this material using the right standards is a major challenge. Not all work processes are structured in a way that allows integrated services to be offered. A number of projects have been defined to obtain the required metadata. The first project was to create a complete data set at the highest aggregation level in the form of an overview. This overview is now available on the Internet [8]. Other search engines on the website are not yet linked to that overview. The next step is to present metadata about archives at lower aggregation levels. These data were not available in a database, so a large data-entry project was carried out. The result is a database of 350,000 records. To offer these data as an integrated service, they need to be supplemented with access points for names of persons, organisations, subjects, locations and periods. This project has a twin in the back office, where existing information systems have to be changed or replaced so that the required metadata can be compiled and maintained in a standardised way.
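
The idea of access points on persons, organisations, subjects, locations and periods can be illustrated with a small index. This is a sketch under assumptions and says nothing about how the Gemeentearchief Amsterdam implemented its search engine: the class `AccessPointIndex`, the normalisation by lower-casing and the sample entries are all hypothetical.

```python
from collections import defaultdict

# The five kinds of access point named in the text.
ACCESS_POINT_TYPES = ("person", "organisation", "subject", "location", "period")


class AccessPointIndex:
    """Maps (access point type, term) to the descriptions that carry it."""

    def __init__(self):
        self._index = defaultdict(set)

    @staticmethod
    def _key(point_type, term):
        if point_type not in ACCESS_POINT_TYPES:
            raise ValueError(f"unknown access point type: {point_type}")
        # Simple normalisation so that 'Amsterdam' and 'amsterdam' match.
        return (point_type, term.strip().lower())

    def add(self, description_id, point_type, term):
        self._index[self._key(point_type, term)].add(description_id)

    def search(self, **criteria):
        """Return the ids that match every criterion, e.g. person=..., location=..."""
        results = None
        for point_type, term in criteria.items():
            ids = self._index.get(self._key(point_type, term), set())
            results = set(ids) if results is None else results & ids
        return sorted(results or [])


# Invented sample entries.
index = AccessPointIndex()
index.add(1, "person", "J. Jansen")
index.add(1, "location", "Amsterdam")
index.add(2, "location", "Amsterdam")
```

A question such as "what do you hold about J. Jansen in Amsterdam?" then becomes `index.search(person="J. Jansen", location="Amsterdam")`, regardless of whether the matching descriptions concern archives, photographs or library material.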

In the coming years the metadata about the context and content of the archives and collections will be integrated into one model. As a result, almost all existing ICT systems will have to be re-evaluated, strategic choices will have to be made about standards, a great deal of data will have to be converted, quality checks will have to take place, and other working methods will have to be introduced. None of this can succeed without having, and carrying out, a suitable IKM model and a matching ICT architecture.

An English version of this article is also available.

References

  1. My special thanks go to Kent Haworth, York University Archivist and Head, Special Collections, and Project Director and Secretary, ICA Committee on Descriptive Standards, for his comments on an earlier version of this article.
  2. Most of the literature uses the abbreviation “IT”; I prefer the abbreviation “ICT”.
  3. Wurman, p. 27 and Milner, p. 3. In this diagram I have replaced the term “wisdom” with the more modest “insight”.
  4. Compare, for example, Gilliland-Swetland, Anne J., Defining metadata, in Baca, p. 3.
  5. Marco, p. 23-24. Marco cites Inmon, W.H.: Building the Data Warehouse, Wiley, 1996, p. 33.
  6. Marco, p. 48.
  7. Applegate et al., p. 139-140.
  8. <http://www.gemeentearchief.amsterdam.nl/archieven_en_collecties/overzicht/introductie/index.nl.html>

Literature

  1. Applegate, Lynda, F. Warren McFarlan and James L. McKenney (1999). Corporate Information Systems Management. Irwin/McGraw-Hill, Boston.
  2. Baca, Murtha and others (1998). Introduction to metadata, pathways to digital information. Getty Information Institute, New York.
  3. Cook, Terry (2001). Archival Science and Postmodernism: New formulations for Old Concepts, in Archival Science (2001-1), ed. Horsman, P., E. Ketelaar and T. Thomassen, Kluwer Academic Publishers, Dordrecht, p. 3-24.
  4. Getty Information Institute, New York, Art and Architecture Thesaurus
    URL: <http://www.getty.edu/research/tools/vocabulary/aat>
  5. International Council on Archives (ICA), ISAAR(CPF) standard
    URL: <http://www.ica.org>
  6. International Council on Archives (ICA), ISAD(G) standard
    URL: <http://www.ica.org>
  7. Marco, David (2000). Building and managing the meta data repository, a full life-cycle guide. John Wiley & Sons, New York.
  8. Menne-Haritz, A. (2001). Access: the reformulation of an archival paradigm, in Archival Science (2001-1), ed. Horsman, P., E. Ketelaar and T. Thomassen, Kluwer Academic Publishers, Dordrecht, p. 57-82.
  9. Milner, Eileen M. (2000). Managing Information and Knowledge in the Public Sector. Routledge, London.
  10. Records Continuum Research Group, Australia.
    URL: <http://rcrg.dstc.edu.au>
  11. Ribeiro, Christina and Gabriel David (2001). A Metadata Model for Multimedia Databases.
  12. Smit, F.P. (2000). Proposal for a Datamodel of Archival Descriptions, in: Atti del Summit DACE, Roma, 2000, p. 149-196.
  13. Smit, F.P. (2001), Het nieuwe Overzicht van Archieven en Collecties, in: Archievenblad (2001-1), Koninklijke Vereniging van Archivarissen, Amsterdam, p. 26-29.
  14. Society of American Archivists, Encoded Archival Description
    URL: <http://www.loc.gov/ead>
  15. Svenonius, Elaine (2001). The Intellectual Foundation of Information Organization. The MIT Press, Cambridge Massachusetts.
  16. Wurman, Richard Saul (2001). Information Anxiety 2. QUE, Indianapolis.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Author Details

Frans Smit
Head of the Access Section (Sectie Ontsluiting)
Gemeentearchief Amsterdam
PO Box 51140
1007 EC Amsterdam
The Netherlands

Phone: +31 20 5720227
Fax: +31 20 6750596

<fsmit@gaaweb.nl>
<franssmit@planet.nl>
<http://www.gemeentearchief.amsterdam.nl>

Frans Smit is head of the Access Section (Sectie Ontsluiting) of the Gemeentearchief Amsterdam. He is (and has been) involved in various national and international projects on providing access to metadata about archives and collections through search engines on the Internet.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

For citation purposes:
Smit, F. "Het Historisch Data Warehouse", Cultivate Interactive, issue 6, 11 February 2002
URL: <http://www.cultivate-int.org/issue6/warehouse-d/>

-------------------------------------------------------------