ARION: An Advanced Lightweight Architecture for accessing Scientific Collections

By Catherine Houstis and Spyros Lalis - May 2001

Catherine Houstis and Spyros Lalis describe the work of Project Arion. ARION, an advanced lightweight architecture for accessing scientific collections, aims to provide a new generation of Digital Library services for the searching and retrieval of digital scientific collections that reside within research and consultancy organisations.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Introduction

Scientific data and programs have long been treated as ‘private’ resources, to be used only by the people or organisation that created them. This ‘private ownership’, however, usually arises from inaction rather than from policies restricting data reuse. Data and models are collected or developed as part of a scientific study, but once the study ends it is not a priori clear what should happen to them. Literally thousands of scientific collections and data sets are lost at the end of the study that produced them. This tremendously valuable information is lost because it was never catalogued (no metadata), or becomes unreachable because of the heterogeneity of the software and hardware on which it is stored, poor documentation and so on, all at great expense to the taxpayer. Research is very expensive, as it requires specialised expertise to carry through; a poor return on this investment could be avoided if scientific collections were shared and reused. This is the premise of a digital library: making such resources electronically available to a large number of (possibly remote) users.

Internet-based techniques have been developed to make scientific resources available to the wider scientific community and so improve this situation. However, even state-of-the-art systems typically come with four main flaws, which make them unattractive to both resource providers and users. First, the procedure for exporting a scientific data resource remains complicated, involving programming effort and expertise that is alien to the data providers. Second, users are offered a simple search interface with little guidance on how to track down or create specific information. Third, once a resource is found there is little support for flexible reuse: one can either use the resource as is or not at all. Dynamically combining several resources that belong to different providers in order to create new resources is therefore virtually impossible. Last but not least, current solutions do not fit the existing practices and financing methods of the organisations that produce data, and so they are regarded as a ‘burden’ rather than as ‘assistance’.

ARION, a recently funded international research and development project, is aiming to provide a new generation of Digital Library services for the searching and retrieval of digital scientific collections that reside within research and consultancy organisations. This functionality will be achieved via an appropriate distributed system that can be easily installed and administered by the various participants.

ARION advances the findings of previous studies in areas such as the management of networked scientific repositories, metacomputing, intelligent information integration and digital libraries. ARION is a federated open system and is being developed in association with national data providers, scientific researchers and SMEs to ensure that the project meets their needs. The ARION consortium is composed of research organisations: the Institute of Computer Science-Foundation for Research and Technology (GR) as the leader, the National Technical University of Athens (GR), the Consiglio Nazionale delle Ricerche CNR-IMA (IT), the Commission of the European Communities, Joint Research Centre (IT), the University of Crete (GR); and the SMEs HR Wallingford Ltd (UK), the Oceanographic Company of Norway ASA and the Enterprise LSE Limited (UK). ARION started in January 2001 and is scheduled to run for three years.

Digital Libraries: State of the Art

The rapid development of distributed computing infrastructures and the growth of the Internet and the WWW have revolutionised the management, processing, and dissemination of scientific information. Repositories that have traditionally evolved in isolation are now connected to global networks. In addition, with common data exchange formats, standard database access interfaces, and the information mediation and brokering technologies developed in the context of the Digital Libraries Initiatives and Intelligent Information Integration (I3), emerging data repositories can be accessed without knowledge of their internal syntax and storage structure. Furthermore, search engines enable users to locate distributed resources by indexing appropriate metadata descriptions. Open communication architectures support language-independent remote invocation of legacy code, thereby paving the way towards a globally distributed library of scientific programs. Finally, workflow management systems exist for coordinating and monitoring the execution of scientific computations. Standardisation and interoperability are pursued by the W3C.

This technology has so far been used successfully to address system, syntactic, and structural interoperability of distributed heterogeneous scientific repositories [1]. However, interoperability at the semantic level is needed to overcome the problem of identifying the scientific resources that can be combined in a meaningful way to produce new data [2]. This is of key importance for providing widely diversified user groups with advanced, value-added information services.

Another body of work addresses the integration of heterogeneous information across a number of networked, distributed repositories [3]. In this context the aim has been to build global environmental systems. Integration has also benefited from workflow technology, which was originally used for business processes.

Solutions for Scientific Collections

A Digital Library of Scientific Collections: Concept Innovation

A Digital Library of scientific collections is a new and unprecedented concept. It encompasses the characteristics of a traditional library and, in addition, creates new content online. In traditional libraries, humans create new knowledge after having used the library content. In the case of scientific content (and in the ARION digital library), new content is created continuously upon user demand. A scientific area is represented not only by multimedia document information but also by data sets, programs and tools, which can produce new information interactively, either by analysing data or by predicting physical phenomena through the simulation of physical processes. Data analysis can be, for instance, statistical analysis, the extraction of information from satellite pictures, or the acquisition of data from databases belonging to the library content via data mining tools.

Another difference from traditional libraries is that the content of such a library is not kept within the walls of a building, nor can it be stored on a single centralised computer system. Scientific objects such as programs are in general not portable, and they may need specialised software or hardware to execute. In ARION they reside on the servers of the provider’s organisation and are remotely invoked via the ARION system. Thus, the content of the digital library is distributed over the providers’ servers. In addition, the library documents not only the scientific object descriptions (metadata) but also scientific expertise, in the form of data production rules (workflows), to make reuse possible for the users. Visualisation tools, statistical tools, and any other tools that scientists use with their data sets and programs are supplied by the provider’s organisation and are used to convey information to the users. A WWW interface makes the library services accessible from anywhere via a web browser and an Internet connection. The library thus provides an international collaborative environment, which adds tremendous value for a worldwide community of users.

ARION has the potential to become an international forum for scientific content and to lead the effort of creating digital libraries of scientific objects worldwide. To the best of our knowledge, the generalisation of ideas presented in ARION has not been put forward previously. Previous work has addressed the management of scientific information for specific scientific areas, and so has in every case dealt with a much simpler or very specific context. In ARION, the scale of the problem, the generality of the content, and automated (and therefore attractive) ways of adding new content are all dealt with in the architecture.

A Digital Library of Scientific Collections: Technical Innovation

The ARION Digital Library provides repository providers with lightweight and straightforward tools to automate the publication and export of their repository collections. It provides users with an automated, fast and accurate system to locate, retrieve and visualise data on demand. In scientific collections, the existence of scientific programs provides the possibility of computing data on demand by making complex combinations of data and programs that exist in various heterogeneous, geographically distributed and autonomous collections. The ARION advanced architecture supports these functions. Support is based on the coupling of ontologies with metadata and workflows, in order to address the needs of multiple scientific collections.

This functionality yields several technical innovations, which are described in the following section.

ARION: An Advanced Lightweight Architecture for a Digital Library of Scientific Collections

ARION promotes advanced features of Digital Library technology and, in addition, features that take into account the content and characteristics of scientific collections. Specifically, it is based on an advanced middleware architecture that seamlessly integrates Digital Library, Intelligent Information Integration, and Workflow technologies. It comprises three main modules: the Metadata Search Engine, the Knowledge Base System, and the Workflow Runtime System, which co-operate to provide the user with the desired functionality. The architecture is shown in Figure 1. The functionality of each component is briefly described below.

The Metadata Search Engine is responsible for locating external resources, either data sets or programs. It may also retrieve complementary information stored in the repositories, e.g. user documentation on the available resources. The Search Engine accepts metadata queries on the properties of resources and returns a list of metadata descriptions and references. References point to repository wrappers, which provide an access and invocation interface to the underlying legacy systems (repositories) where the data and programs reside.

The Knowledge Base System (KBS) accepts queries regarding the availability of ontology concepts. It generates and returns the corresponding data productions based on the available resources and the constraints imposed by the ontology rules. These productions provide all the information that is needed to construct workflow specifications. The KBS regularly communicates with the Metadata Search Engine to update its database.

The Workflow Runtime System monitors and coordinates the execution of workflows. It executes each intermediate step of a workflow specification, accessing data and invoking programs through the repository wrappers. Checkpoint and recovery techniques are employed to enhance fault tolerance.
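The interplay between data productions and workflow execution can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `Production`, `KnowledgeBase` and `WorkflowRuntime`, the representation of a wrapper as a plain Python callable, and the in-memory dictionary used as a checkpoint are hypothetical and are not taken from the ARION design. The sketch also assumes that productions form an acyclic dependency graph.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a repository wrapper: a callable that receives
# the already computed inputs and returns the data for one concept.
Wrapper = Callable[[Dict[str, object]], object]


@dataclass
class Production:
    """One data production: how to obtain `output` from `inputs`."""
    output: str
    inputs: List[str]
    wrapper: Wrapper


class KnowledgeBase:
    """Toy knowledge base mapping each concept to the production that yields it."""

    def __init__(self, productions: List[Production]):
        self._by_output = {p.output: p for p in productions}

    def plan(self, concept: str) -> List[Production]:
        """Return the productions needed for `concept`, dependencies first."""
        ordered: List[Production] = []
        seen = set()

        def visit(name: str) -> None:
            if name in seen:
                return
            if name not in self._by_output:
                raise KeyError(f"no production yields concept {name!r}")
            seen.add(name)
            production = self._by_output[name]
            for dependency in production.inputs:
                visit(dependency)
            ordered.append(production)

        visit(concept)
        return ordered


class WorkflowRuntime:
    """Runs workflow steps in order and remembers every finished step."""

    def __init__(self) -> None:
        # Assumed checkpoint store; a real system would persist this.
        self.checkpoint: Dict[str, object] = {}

    def run(self, workflow: List[Production]) -> object:
        for step in workflow:
            if step.output in self.checkpoint:
                continue  # finished in an earlier (possibly failed) run
            args = {name: self.checkpoint[name] for name in step.inputs}
            self.checkpoint[step.output] = step.wrapper(args)
        return self.checkpoint[workflow[-1].output]


# Usage with toy integer data (the combination rule is arbitrary).
kb = KnowledgeBase([
    Production("wind", [], lambda _: [5, 7]),
    Production("depth", [], lambda _: 20),
    Production("wave_index", ["wind", "depth"],
               lambda a: [w + a["depth"] for w in a["wind"]]),
])
workflow = kb.plan("wave_index")
runtime = WorkflowRuntime()
result = runtime.run(workflow)
```

If a wrapper raises an exception part-way through, calling `run` again on the same `WorkflowRuntime` resumes from the first unfinished step, because completed outputs remain in the checkpoint. This is only one simple way to realise the checkpoint and recovery idea.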

In addition, a user interface that runs in a web browser on the user’s computer (with Internet access) is reached via a web address and provides access to the ARION system. A number of tools are being developed so that providers can publish and install scientific collections into a scientific digital library in a provider-friendly manner. These tools are part of the ARION architecture.

This architecture ensures the scalability and extensibility required in large systems of scientific collections. It allows operationally autonomous and geographically dispersed organisations to selectively “export” their resources. Publishing or installing a new resource in the system merely requires supplying appropriate metadata and ontology descriptions, workflow descriptions, and wrappers.
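As a rough illustration of how publishing and metadata search fit together, the following sketch keeps published records in memory and answers exact-match queries on metadata properties. The class and method names (`ResourceRecord`, `MetadataSearchEngine`, `publish`, `search`), the metadata keys and the `example.org` wrapper references are all hypothetical; the article does not specify a metadata schema or a query language.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResourceRecord:
    """Metadata description of a resource plus a reference to its wrapper."""
    metadata: Dict[str, str]
    wrapper_ref: str  # hypothetical address of the repository wrapper


class MetadataSearchEngine:
    """In-memory stand-in for a metadata index (assumed design)."""

    def __init__(self) -> None:
        self._records: List[ResourceRecord] = []

    def publish(self, metadata: Dict[str, str], wrapper_ref: str) -> ResourceRecord:
        # A resource is exportable only with both a description and a wrapper.
        if not metadata or not wrapper_ref:
            raise ValueError("publishing needs metadata and a wrapper reference")
        record = ResourceRecord(dict(metadata), wrapper_ref)
        self._records.append(record)
        return record

    def search(self, **query: str) -> List[ResourceRecord]:
        """Return records whose metadata match every queried property."""
        return [
            record for record in self._records
            if all(record.metadata.get(key) == value
                   for key, value in query.items())
        ]


# Usage: two providers export resources, a user queries by property.
engine = MetadataSearchEngine()
engine.publish({"kind": "dataset", "theme": "wind", "region": "Aegean"},
               "http://provider-a.example.org/wrappers/wind")
engine.publish({"kind": "program", "theme": "waves", "region": "Aegean"},
               "http://provider-b.example.org/wrappers/wave-model")
hits = engine.search(kind="dataset", region="Aegean")
```

The search result carries only descriptions and references; the data itself stays with the provider and is reached through the wrapper.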

Figure 1: A middleware architecture for distributed scientific repositories.
The system consists of interoperable Knowledge Base, Metadata Search, and Workflow Runtime components.

To enhance performance and fault tolerance, the Metadata Search Engine can be distributed across several machines. Also, several knowledge units adhering to different domains of scientific knowledge can be plugged into the Knowledge Base System to support a wide variety of scientific applications and user groups.

Efficient execution and administration of the system are achieved via special data and program export wizards for wrapper generation, automated use of filters to transform data between different formats, and use of mobile code that is downloaded and used at the user’s request.
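The article does not say how filters are selected. One plausible approach, sketched here purely as an assumption, is to treat data formats as nodes and filters as directed edges, and to search breadth-first for the shortest chain of filters between two formats. The format names and the string-based filters are toy placeholders, and `find_filter_chain` and `convert` are hypothetical helper names.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Filter = Callable[[str], str]


def find_filter_chain(filters: Dict[Tuple[str, str], Filter],
                      source: str, target: str) -> List[Filter]:
    """Breadth-first search for the shortest filter chain source -> target."""
    if source == target:
        return []
    came_from: Dict[str, Tuple[str, Filter]] = {}
    visited = {source}
    queue = deque([source])
    while queue:
        fmt = queue.popleft()
        for (src, dst), flt in filters.items():
            if src != fmt or dst in visited:
                continue
            visited.add(dst)
            came_from[dst] = (fmt, flt)
            if dst == target:
                # Walk back from the target to rebuild the chain in order.
                chain: List[Filter] = []
                node = target
                while node != source:
                    previous, step = came_from[node]
                    chain.append(step)
                    node = previous
                chain.reverse()
                return chain
            queue.append(dst)
    raise LookupError(f"no filter chain from {source!r} to {target!r}")


def convert(data: str, chain: List[Filter]) -> str:
    """Apply each filter of the chain in turn."""
    for flt in chain:
        data = flt(data)
    return data


# Usage with two toy filters: comma-separated -> tab-separated -> space-separated.
filters: Dict[Tuple[str, str], Filter] = {
    ("csv", "tsv"): lambda text: text.replace(",", "\t"),
    ("tsv", "ssv"): lambda text: text.replace("\t", " "),
}
chain = find_filter_chain(filters, "csv", "ssv")
converted = convert("1,2,3", chain)
```

With such a scheme a provider would only need to supply filters to and from a few common formats, and the system could compose the rest.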

Conclusion

The ARION architecture has been presented. It forms a library of data sets, programs and tools, all of which are components of scientific collections. This library is a federation of heterogeneous systems, which interoperate to provide data services to its users. These services are access to data sets that are stored in the system archives, and dynamic production of data sets that can be computed on the fly, upon user demand. Retrieved data sets can be visualised or statistically analysed with special tools.

Due to its modularity, participants may install only parts of the system on their premises, depending on their needs and limitations, both organisational and commercial. A provider may install the entire system architecture in order to organise an in-house collection, or a scaled-down version of it, such as the search engine or the metadata storage alone. The system architecture supports different versions with a variety of capabilities at the provider’s end, in addition to a system-wide server featuring all architectural components for everyone’s use. This important architectural feature of the ARION system addresses the scalability problem of global (Internet-accessible) digital libraries of scientific collections.

This work is supported by the EU 5th Framework Programme (IST-2000-25289).

References

  1. THETIS: A Data Management and Data Visualization System for Coastal Zone Management of the Mediterranean Sea. Contact person C. Houstis
    URL: <http://kos.ics.forth.gr:8000/>
  2. V. Christophides, C. Houstis, S. Lalis, H. Tsalapata. (1999) Ontology-driven Integration of Scientific Repositories, NGITS’99, New Generation Information Technologies, Lecture Notes in Computer Science, Elsevier, Habart Habaron, Israel, July 1999.
  3. C. Houstis, S. Lalis, N.M. Patrikalakis, W. Cho. (1999) Federated Scientific Information Systems, position paper for the invitational workshop for the EU-NSF cooperation on Large Scientific Database Systems.
    URL: <http://www.cacr.caltech.edu/euus/documents/houstis.html>

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Author Details

Catherine Houstis
Institute of Computer Science-Foundation for Research and Technology
Heraklion Greece

Houstis@ics.forth.gr

Catherine Houstis received her Ph.D. from the Electrical Engineering Department of Purdue University, USA, in 1977. In 1978 she was a postdoctoral associate at the EE Dept. of Purdue University. In 1979 she joined the National Cash Register (NCR) corporation as a research scientist in the Advanced System Research and Development department. From 1980 to 1983 Catherine worked as an assistant professor at the Electrical and Computer Engineering Department of the University of South Carolina. In 1984 she became an associate professor. From 1984 to 1987 she was a visiting associate professor at the EE Dept. of Purdue University.

In 1987 Catherine joined the Computer Science Department of the University of Crete. She was also a research associate at the Institute of Computer Science of FORTH. She is now a full professor and the Leader of the Distributed Systems Laboratory at the Institute of Computer Science, FORTH. She has led and participated in research projects funded by NSF in the USA, and by ESPRIT, AIM, RACE, Telematics and Digital Libraries for scientific data collections in the EC. Her main research interests are in Internet-based scientific information systems, metacomputing, commercial aspects of scientific information systems and performance evaluation of global distributed systems.

Spyros Lalis
Institute of Computer Science-Foundation for Research and Technology
Heraklion Greece

lalis@ics.forth.gr

Spyros Lalis received a Diploma in Computer Engineering and a doctorate in Technical Sciences from the Swiss Federal Institute of Technology Zurich, in 1989 and 1994 respectively. Since 1997 he has been a Research Associate of the Institute for Computer Science at the Foundation for Research and Technology Hellas and an Adjunct Professor of Computer Science at the University of Crete.

Currently Spyros is a Visiting Assistant Professor at the Computer and Communications Engineering department at the University of Thessaly. He is actively involved in the design of distributed systems, two of them developed through funded European projects. He is also leading a European research project in the area of ubiquitous computing. His interests include Programming Languages and Systems, Software Engineering, Distributed and Parallel Systems, Metacomputing, Ubiquitous and Pervasive Computing, and Economies of Electronic Services.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

For citation purposes:
Houstis, C and Lalis, S. "ARION: An Advanced Lightweight Architecture for accessing Scientific Collections", Cultivate Interactive, issue 4, 7 May 2001
URL: <http://www.cultivate-int.org/issue4/arion/>