By Carol Peters - February 2002
Carol Peters reports on the Cross-Language Evaluation Forum (CLEF), an initiative that provides an infrastructure for the testing and evaluation of information retrieval systems operating on European languages. This activity first began in 1997 in the United States as a track in TREC (the Text REtrieval Conference series) but since 2000 has been coordinated in Europe under the auspices of the IST programme of the European Commission. In this brief overview, she describes the organization of the annual CLEF evaluation campaigns, lists the main results, and outlines plans for the future.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
The popularity of the Internet and the consequent global availability of networked information sources to an ever larger public have led to a strong demand for tools that allow users to find information wherever and however it is stored, regardless of language boundaries. This demand has been stimulated by the fact that the Internet is no longer an English-only preserve: content in other languages is growing rapidly. It is estimated that by 2005 approximately 80% of the Internet population will be non-English speakers and well over 50% of content will be in languages other than English [1].
As English is the first language of only about 5% of the world's population, there is no doubt that these proportions will rise steadily over the next decade, with Asian languages accounting for a growing share. There is, therefore, a strong demand for efficient cross-language systems that allow users to search document collections in multiple languages and retrieve relevant information in a form that is useful to them, even when they have little or no linguistic competence in the target languages. However, such systems are not easy to develop, and work is generally still at an experimental stage. The approaches currently being tested involve integrating tools and methodologies from information retrieval, natural language processing and human-computer interaction, among other fields. An intensive process of system testing and tuning is thus needed before the separate components can be successfully implemented in end-user applications.
The aim of the Cross-Language Evaluation Forum (CLEF) is to assist this process by (i) providing an infrastructure for the testing and evaluation of systems operating on European languages, and (ii) creating test suites of reusable data that can be employed by system developers for benchmarking purposes. These objectives are being pursued through a series of annual evaluation campaigns.
CLEF has established strong links with two successful IR system evaluation campaigns held in the United States and Japan: the Text REtrieval Conference (TREC) series organized by the National Institute of Standards and Technology (NIST) [2], and the NACSIS Test Collection for Information Retrieval (NTCIR) sponsored by the National Institute of Informatics, Tokyo [3]. Both initiatives include tracks for cross-language retrieval system evaluation (NIST will focus on English-Arabic retrieval in the coming years; NTCIR is working on the evaluation of mono- and cross-language retrieval systems for Asian languages). The three initiatives (US, Asian and European) aim to create a network of complementary activities in the area of cross-language system evaluation.
In this report, we describe the organization of the CLEF evaluation campaigns and list the main results achieved so far. The final section gives an idea of our plans and hopes for future campaigns. For more details, the interested reader is referred to [4].
CLEF represents a continuation and expansion of an activity that first began in the United States within the TREC conference series and is now coordinated in Europe and conducted as an EU-US collaboration. The CLEF 2000 and 2001 campaigns were sponsored by the DELOS Network of Excellence for Digital Libraries [5]. Since October 2001, CLEF has been run as an independent project of the European Commission (IST-2000-31002). The consortium members are:
Typically, an information retrieval system evaluation campaign permits participating groups to compare the performance of their systems on given tasks under a set of controlled conditions: a reference document collection; a standard set of queries; assessment of the relevance of the ranked list of results submitted by each participating system; and a comparative analysis of the results. When the evaluation is performed over a number of successive campaigns, the accumulated data and assessed result sets become increasingly valuable to the participating research groups, providing them with material for independent system testing and tuning. CLEF aims to make this test data available to the wider R&D community as well.
CLEF provides a series of evaluation tracks designed to test different aspects of information retrieval system development. The intention is to encourage developers to progress from monolingual searching to the implementation of a full multilingual retrieval service.
Multilingual Information Retrieval: This is the main task in CLEF. It requires searching a multilingual collection of documents for relevant items, using a selected query language. This is a complex task, testing the capability of a system to handle a number of different languages simultaneously and to merge the results, ordering them according to relevance.
Bilingual Information Retrieval: In this track, any query language can be used to search just one of the CLEF target document collections. Many newcomers to cross-language system evaluation prefer to begin with the simpler bilingual task before moving on to tackle the more complex issues involved in truly multilingual retrieval.
Monolingual (non-English) Information Retrieval: Until recently, most IR system evaluation focused on English. However, many of the issues involved in IR are language dependent. CLEF provides the opportunity for monolingual system testing and tuning, and for building test suites in other European languages.
Domain-specific Mono- and Cross-Language Information Retrieval: The rationale for this task is to study CLIR on another type of collection, serving a different kind of information need. The information provided by domain-specific documents is far more targeted than that in news stories and contains much specialized terminology. It is claimed that users of this type of collection are typically interested in the completeness of results: they are generally not satisfied with finding just a few relevant documents in a collection that may contain many more. Developers of domain-specific cross-language retrieval systems need to be able to tune their systems to meet this requirement.
Interactive Cross-language Information Retrieval: An experimental interactive track focusing on the document selection problem was run successfully in CLEF 2001. The design of future interactive tracks will be determined on the basis of input from interested participants.
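The multilingual task requires a system to merge several per-language result lists into one ranking. Raw retrieval scores from different collections and languages are generally not directly comparable, so one simple strategy is to rescale each list's scores to [0, 1] (min-max normalization) before interleaving them. The sketch below illustrates that assumption only: the function names, document identifiers, scores and the default cut-off of 1000 are hypothetical, and it is not a description of how any particular CLEF participant merged results.

```python
from typing import Dict, List, Tuple

# A ranked result list: (document id, retrieval score) pairs, best first.
RankedList = List[Tuple[str, float]]


def min_max_normalize(results: RankedList) -> RankedList:
    """Rescale the scores of one result list into [0, 1]."""
    if not results:
        return []
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    if high == low:
        # Every document scored the same: give them all the top value.
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - low) / (high - low)) for doc, score in results]


def merge_runs(per_language: Dict[str, RankedList], depth: int = 1000) -> RankedList:
    """Merge per-language result lists into a single multilingual ranking."""
    merged: RankedList = []
    for results in per_language.values():
        merged.extend(min_max_normalize(results))
    # Highest normalized score first; ties are broken by document id so
    # that the output is deterministic.
    merged.sort(key=lambda pair: (-pair[1], pair[0]))
    return merged[:depth]


# Hypothetical per-language results for a single topic.
runs = {
    "EN": [("en-001", 12.0), ("en-002", 8.0), ("en-003", 4.0)],
    "DE": [("de-001", 0.8), ("de-002", 0.2)],
}
merged = merge_runs(runs, depth=4)
print([doc for doc, _ in merged])
# prints ['de-001', 'en-001', 'en-002', 'de-002']
```

Round-robin interleaving, which ignores scores altogether, is a common alternative; which strategy works better depends on how similar the underlying collections and retrieval models are.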
For each task, the participating systems construct their queries (automatically or manually) from a common set of statements of information needs (known as topics) and search for relevant documents in the collections provided, listing the results in a ranked list.
The main CLEF test collection consists of three components: sets of documents in different European languages with common features (same genre and time period, comparable content); a single set of topics rendered in a number of languages; and relevance judgments that determine the set of relevant documents for each topic.
Multilingual Corpus: The document collection currently consists of nearly 1,000,000 documents in six languages: Dutch, English, French, German, Italian and Spanish. It contains both newswires and national newspapers. The collection used for the multilingual task in 2001 contained documents in five of those languages (Dutch was excluded). Two target collections were used for the bilingual track in 2001: participants could query either the English or the Dutch newspaper documents, using their preferred topic language. Spanish and Dutch were introduced for the first time in CLEF 2001, for different reasons. Spanish was included because of its status as the fourth most widely spoken language in the world. Dutch was added not only to meet the demands of the considerable number of Dutch participants in CLEF (the largest group) but also because it provides a challenge for those who want to test the adaptability of their systems to a new, less well-known language. The domain-specific task uses a different collection: the GIRT database of approximately 80,000 German social science documents, which has controlled vocabularies for English-German and German-Russian. The interactive track used data from the CLEF 2000 campaign (French and English documents together with the results).
Topics: The participating groups derive their queries in their preferred language from a set of topics created to simulate user information needs. Following the TREC philosophy, each topic consists of three parts: a brief title statement; a one-sentence description; a more complex narrative specifying the relevance assessment criteria. The English version of a typical topic from CLEF 2001 is shown below:
Title: U.N./US Invasion Haiti
Description: Find documents on the invasion of Haiti by U.N./US soldiers.
Narrative: Documents report both on the discussion about the decision of the U.N. to send US troops into Haiti and on the invasion itself. They also discuss the direct consequences.
The title contains the main keywords, the description is a natural language expression of the concept conveyed by the keywords, and the narrative adds further syntax and semantics, stipulating the conditions for relevance assessment. The motivation behind these structured topics is to provide query input for all kinds of IR systems, ranging from simple keyword-based procedures to more sophisticated systems supporting morphological analysis, parsing, query expansion and so on. In the cross-language context, the transfer component must also be considered, whether it is dictionary- or corpus-based, a fully fledged MT system or some other approach. Different query structures may be more appropriate for testing one methodology or another.
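To make the topic structure concrete, the following sketch turns selected fields of the example topic into a bag of query terms and then applies naive dictionary-based transfer, one of the translation approaches mentioned above. It is a minimal illustration under stated assumptions: the "T"/"D"/"N" field codes, the stopword list and the tiny English-Italian dictionary are all hypothetical, and real systems add stemming, term weighting and full bilingual resources.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Topic:
    """A CLEF-style topic with title, description and narrative fields."""
    title: str
    description: str
    narrative: str


# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"a", "and", "by", "documents", "find", "in", "of", "on", "the", "to"}


def build_query(topic: Topic, fields: str = "TD") -> List[str]:
    """Build query terms from the chosen fields: any combination of
    "T" (title), "D" (description) and "N" (narrative)."""
    text_by_field = {"T": topic.title, "D": topic.description, "N": topic.narrative}
    terms: List[str] = []
    for field in fields:
        for token in text_by_field[field].lower().split():
            token = token.strip(".,;:()\"'")
            # Skip stopwords and duplicates, keeping first-occurrence order.
            if token and token not in STOPWORDS and token not in terms:
                terms.append(token)
    return terms


def translate_query(terms: List[str], dictionary: Dict[str, List[str]]) -> List[str]:
    """Replace each term by all of its listed translations; terms missing
    from the dictionary (often proper names) pass through unchanged."""
    translated: List[str] = []
    for term in terms:
        translated.extend(dictionary.get(term, [term]))
    return translated


haiti = Topic(
    title="U.N./US Invasion Haiti",
    description="Find documents on the invasion of Haiti by U.N./US soldiers.",
    narrative=(
        "Documents report both on the discussion about the decision of the "
        "U.N. to send US troops into Haiti and on the invasion itself. "
        "They also discuss the direct consequences."
    ),
)

# Toy English-to-Italian dictionary (hypothetical).
en_it = {"invasion": ["invasione"], "soldiers": ["soldati", "militari"]}

query = build_query(haiti, "TD")
print(query)
# prints ['u.n./us', 'invasion', 'haiti', 'soldiers']
print(translate_query(query, en_it))
# prints ['u.n./us', 'invasione', 'haiti', 'soldati', 'militari']
```

Keeping every listed translation is only one option; choosing the first translation, or weighting alternatives using corpus statistics, are others. Passing unknown terms through unchanged is a crude way of handling proper names such as "Haiti".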
For CLEF 2001, 50 such topics were developed on the basis of the contents of the multilingual collection and topic sets were produced in all six document languages. Additional topic sets were then created for Finnish, Swedish, Russian, Japanese, Chinese and Thai. Participants could thus choose to formulate their queries in any one of nine European or three Asian languages.
Results assessment: The number of documents in large test collections such as CLEF makes it impractical to judge every document for relevance. Instead, approximate recall figures are calculated using pooling techniques. The results submitted by the participating groups are used to form a "pool" of documents for each topic and each language by collecting the highly ranked documents from all the submissions. The assumption is that if a sufficient number of diverse systems contribute results to a pool, a large percentage of all relevant documents is likely to be included. All documents not included in the pool remain unjudged and are therefore assumed to be irrelevant. A main concern with this pooling strategy is that if the number of relevant documents that go undetected exceeds a certain (low) threshold, the resulting test collection will be of limited future use for testing systems that did not contribute to the pool: a grossly incomplete pool would unfairly penalize such systems when precision and recall are calculated. This pooling strategy was first adopted by TREC and has subsequently been employed by both NTCIR and CLEF. A number of studies have tested its validity [6], [7], [8], [9]. Relevance assessment of the documents in the pool is distributed over a number of different sites and is always performed by native speakers. The results are then analyzed centrally using recall and precision measures, and run statistics are produced and distributed.
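The pooling and scoring steps can be sketched as follows. This is a simplified illustration under stated assumptions: each run is a plain list of document identifiers per topic, and the topic identifier "41", the pool depth of 2 and the relevance judgments are invented toy data (real campaigns use much deeper pools, and per-language pools would simply add a language key). Average precision is the standard non-interpolated measure; the sketch does not reproduce any official evaluation tool.

```python
from typing import Dict, List, Set, Tuple

# topic id -> document ids ranked best first
Run = Dict[str, List[str]]


def build_pool(runs: List[Run], depth: int) -> Dict[str, Set[str]]:
    """Per topic, the union of the top-`depth` documents of every run."""
    pool: Dict[str, Set[str]] = {}
    for run in runs:
        for topic, ranking in run.items():
            pool.setdefault(topic, set()).update(ranking[:depth])
    return pool


def precision_recall_at(ranking: List[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Precision and recall over the top k documents of a ranking."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    return hits / k, recall


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision. Any document outside `relevant`,
    including documents that were never judged, counts as non-relevant."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Two hypothetical runs that contribute to the pool for topic "41".
run_a: Run = {"41": ["d1", "d2", "d3", "d4"]}
run_b: Run = {"41": ["d3", "d5", "d1", "d6"]}
pool = build_pool([run_a, run_b], depth=2)  # {"41": {"d1", "d2", "d3", "d5"}}

# Hypothetical assessor judgments: only pooled documents were examined.
qrels = {"41": {"d1", "d3"}}

# A later run that did not contribute to the pool.
run_c: Run = {"41": ["d6", "d1"]}

for name, run in [("A", run_a), ("C", run_c)]:
    print(name, round(average_precision(run["41"], qrels["41"]), 3))
# prints A 0.833
# prints C 0.25
```

Run C illustrates the concern raised above: if d6 were in fact relevant, C would still receive no credit for it, because no contributing run ranked d6 highly enough for it to enter the pool and be judged.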
This activity has grown in popularity and complexity since its beginnings in 1997, when a bilingual track was offered at TREC. Participation in CLEF 2001 was up more than 50% on the previous year, with 34 groups submitting results in one or more of the five tasks offered: 9 from North America, 21 from Europe and 4 from Asia. Runs were submitted for all five tasks and for all twelve topic languages. Twenty-one groups tried one of the cross-language tasks, while ten preferred to remain with the monolingual track. Only eight groups were brave enough to attempt the multilingual track (processing a document collection in five languages is certainly a challenging task), and just two of these were newcomers to CLEF. An additional three groups tackled the experimental interactive task.
The results of CLEF 2001 were reported at a Workshop in Darmstadt, Germany, held immediately before ECDL 2001, the European Conference on Digital Libraries. Both traditional and innovative approaches to CLIR were presented, and various query expansion techniques were described. All kinds of source-to-target transfer mechanisms were employed, including both query and document translation. Both commercial and in-house resources were used, including machine translation, dictionary-based and corpus-based methods. The search strategies ranged from traditional IR to extensive use of natural language processing techniques. Different groups focused on different aspects of the overall problem, from the development of language-independent tools such as stemmers to extensive work on language-specific features such as morphology and compounding. A number of groups compared different techniques in different runs in order to evaluate the effect of a given technique on performance. In particular, many groups were testing systems that integrated more than one translation method, e.g. MT or bilingual dictionary look-up combined with data extracted from comparable or parallel corpora. Overall, the CLEF 2001 Workshop offered a very good picture of current issues and of the approaches now being adopted in CLIR. Recent CLEF Proceedings provide an excellent overview of both the state of the art and the latest experiments in this field [10], [11], [12].
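Compounding matters for languages such as Dutch and German, where a single word such as German "Friedensvertrag" (peace treaty) corresponds to several English query terms. A minimal, hypothetical way to split such words is a longest-prefix-first search over a lexicon with backtracking, allowing simplified linking elements such as the German "-s-" and "-es-". The lexicon and the linking-element list below are toy assumptions; practical decompounders usually rely on much larger lexicons and on corpus frequencies to choose among alternative splits, and this is not presented as the method of any CLEF participant.

```python
from typing import List, Optional, Set

# Simplified German linking elements ("" means no linking element).
LINKING_ELEMENTS = ("", "s", "es")

# Toy lexicon (hypothetical); real systems use large lexicons.
LEXICON = {"bund", "fiets", "frieden", "pad", "regierung", "vertrag"}


def split_compound(word: str, lexicon: Set[str], min_len: int = 3) -> Optional[List[str]]:
    """Split `word` into lexicon entries, trying the longest prefix first
    and backtracking; returns None if no complete split exists."""
    word = word.lower()
    if word in lexicon:
        return [word]
    # Try prefixes from longest to shortest, keeping both the head and
    # the remainder at least `min_len` characters long.
    for end in range(len(word) - min_len, min_len - 1, -1):
        head = word[:end]
        if head not in lexicon:
            continue
        for link in LINKING_ELEMENTS:
            if not word.startswith(link, end):
                continue
            rest = word[end + len(link):]
            if len(rest) < min_len:
                continue
            tail = split_compound(rest, lexicon, min_len)
            if tail is not None:
                return [head] + tail  # the linking element is dropped
    return None


for compound in ["Friedensvertrag", "Bundesregierung", "fietspad"]:
    print(compound, split_compound(compound, LEXICON))
# prints Friedensvertrag ['frieden', 'vertrag']
# prints Bundesregierung ['bund', 'regierung']
# prints fietspad ['fiets', 'pad']
```

The resulting parts can then be indexed or translated separately, for example with a dictionary look-up like the one used for query translation.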
The organization of the CLEF 2002 campaign will be very similar to that of CLEF 2001. There will be five tracks:
There will be seven languages in the multilingual collection, as Finnish documents have been added to the existing set of Dutch, English, French, German, Italian and Spanish newspaper and news agency texts. Topics will again be provided in a large number of languages, according to demand. The multilingual task will remain unchanged. The greatest change will be in the bilingual task: to encourage work on European languages other than English, it will be possible to query any of the document collections bilingually except the English one. The only exception to this rule will be for newcomers, who may choose English as their target collection. The domain-specific task will be expanded, with more use being made of the controlled vocabulary. The conditions of the interactive task are still to be decided.
More information will be available from the CLEF Web site [13].
It remains to be seen how long it will be possible to continue CLEF. The current funding from the Commission covers the 2002 and 2003 campaigns. Our aim is to add more languages to the document collections and to include new tasks. We would like to cover not only the major European languages but also some representative samples of minority languages, including members of each major group, e.g. Germanic, Romance, Slavic and Ugro-Finnic languages. We also hope to include tracks to evaluate CLIR systems working on media other than text. In particular, we are beginning to examine the feasibility of organizing a spoken CLIR track in which systems would have to process and match spoken queries in more than one language against a spoken document collection.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Carol Peters
Researcher
IEI-CNR
Area di Ricerca CNR, Via Moruzzi, 1
56124 Pisa
Italy
Phone: +39 050 3152897
Fax: +39 050 3152810
Carol Peters is a researcher at the Istituto di Elaborazione della Informazione, an institute of the Italian National Research Council in Pisa. Her current research interests focus on multilingual information access. She has collaborated on numerous European Commission projects and is now working on the implementation of the multilingual interface for two digital library projects: ECHO (European CHronicles On-line), collections of historical film archives from four European countries, and SCHOLNET, a digital library to support virtual scholarly communities. She is also Coordinator of the Cross-Language Evaluation Forum (IST-2000-31002), a cross-language information retrieval system evaluation activity sponsored by the Information Society Technologies programme of the EC in collaboration with the US National Institute of Standards and Technology.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
For citation purposes:
Peters, C. "Evaluating Cross-Language Systems the CLEF Way", Cultivate Interactive, issue 6, 11 February 2002
URL: <http://www.cultivate-int.org/issue6/clef/>
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Copyright ©2000 - 2001 Cultivate. Published by UKOLN. Design by ILRT.