Chinese-European workshop on Digital Preservation


Andreas Aschenbrenner

Significant Properties and File Format Characteristics
The presentation will introduce the notion of 'significant properties': the features of digital objects that need to be preserved. The definition of significant properties at the inception of a digital preservation programme will guide an organisation in choosing the preservation method suited to its specific requirements. The preservation of an object's significant properties is constrained by the characteristics inherent in its file format. The presentation will therefore also discuss file format characteristics, and their role in a successful long-term preservation effort.
Cooperation has always been essential in the digital preservation community, both for knowledge exchange and for collaboration in research activities. As initiatives increasingly address implementation, cooperation also gains practical significance: initiatives embark on collaboratively building services that are required by various preservation systems. This presentation addresses file format registries. To come to terms with the myriad of file formats, the preservation community jointly calls for a registry that identifies and documents them. Activities towards building such a registry are already emerging, and some preservation initiatives already rely on this future service in their current approaches.

Reinhard Altenhoner
Persistent Identifiers
Electronic publishing is generally associated with certain attributes: "fast, cost-saving, worldwide accessible". Seen from the author's or the user's perspective, however, are those attributes sufficient to ensure permanent access to the electronic publication? Everyday experience with the Internet suggests they are not: URLs offer no mechanism that allows Internet-based publications to be identified unequivocally and located at any time. One solution is provided by Persistent Identifiers such as Uniform Resource Names (URN), whose implementation Die Deutsche Bibliothek is consolidating within the scope of the EPICUR project.
A local implementation cannot by itself guarantee the persistence of such an addressing scheme. For example, if an institution ceases to maintain and provide a digital collection, its references, even when expressed in a persistent addressing scheme, would be just as volatile as URLs. To ensure the long-term usability of a permanent addressing scheme such as URN, it is necessary to create an infrastructure with institutional backing.
The talk focusses on a general description of Persistent Identifier activities worldwide and on implementations in production use. Based on the EPICUR project at Die Deutsche Bibliothek, an outlook on follow-up activities will be given.
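The separation between a stable name and a changeable location can be illustrated with a minimal sketch. It is not the EPICUR implementation: the class `UrnResolver`, its methods, the example URN and the example URLs are all hypothetical, and the in-memory registry merely stands in for an institutionally maintained resolution service.

```python
from dataclasses import dataclass, field


@dataclass
class UrnResolver:
    """Toy resolver mapping a stable identifier to its current locations.

    Hypothetical illustration only; a real resolver is a maintained service.
    """

    _registry: dict[str, list[str]] = field(default_factory=dict)

    def register(self, urn: str, url: str) -> None:
        # Add a location for the identifier, ignoring duplicates.
        urls = self._registry.setdefault(urn, [])
        if url not in urls:
            urls.append(url)

    def relocate(self, urn: str, old_url: str, new_url: str) -> None:
        # The publication moved: only the location changes, not the URN.
        urls = self._registry[urn]
        urls[urls.index(old_url)] = new_url

    def resolve(self, urn: str) -> list[str]:
        # Unknown identifiers resolve to an empty list.
        return list(self._registry.get(urn, []))


resolver = UrnResolver()
example_urn = "urn:nbn:de:example-0001"  # made-up identifier
resolver.register(example_urn, "https://old.example.org/thesis.pdf")
resolver.relocate(
    example_urn,
    "https://old.example.org/thesis.pdf",
    "https://archive.example.org/thesis.pdf",
)
print(resolver.resolve(example_urn))
```

Citations keep using the URN throughout; only the registry entry is updated when the publication moves, which is why the registry itself needs a long-lived institution behind it.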

René van Horik
Preservation of image formats
Issues covered in the presentation:
- Definition and description of image formats.
- Types of image formats.
- Overview of theories (and the assumptions on which they are based) on digital preservation relevant to the preservation of image formats.
  - Metadata
  - File format standards
  - Role of XML
  - Registries
  - Etc.
- Overview of available digital preservation solutions relevant to the preservation of digital images.
  - Format registry
  - Format identification
  - Digital archiving
  - Distributed storage
  - Emulation
  - Etc.
- Importance of an evolutionary approach. Digital data formats are relatively young. Only the future can judge which assumptions were right.
- 'Building blocks' for the long-term preservation of digital raster images
  - Based on the following assumptions:
    - Graphics file format standards are durable
    - Digital data encoded in the XML data format is durable
    - Metadata on digital objects is essential in order to understand and process digital images in the future
- Building block 1: Graphics file format standards.
  - Discussion of graphics file formats used in the period 1994 - 2004
  - The TIFF format seems the best format for long-term access.
  - Discussion of the TIFF file format
- Building block 2: XML data format for durable encoding of the bitstream of digital images.
  - How to express the bitstream of a digital image in XML?
    - Expression of content model in XML
    - Binary to XML conversion
    - XML to binary conversion (in the future)
  - Methods available to express a digital image in XML
    - Bitstream Syntax Description Language (BSDL)
    - Universal Virtual Computer (UVC)
    - Formal Language for Audio-Visual Object Representation (Flavor/XFlavor)
  - Comparison of the three methods
- Building block 3: Preservation metadata element sets for digital images.
  - Methods to create and store preservation metadata on digital images (e.g. "automatic metadata exposure" project of RLG).
  - Some important metadata element sets for digital images:
    - NISO Z39.87 (technical metadata for digital still images)
    - EXIF (for digital images created by digital cameras)
    - SepiaDES (for digital surrogates of historical photographs)
    - Etc.
- Conclusions:
  - (Baseline) TIFF seems the best format for long-term storage of digital master images
  - Expression of bitstream in XML: more research required
  - Preservation metadata: application profiles and registries help to 'discriminate exactly what we know vaguely'.
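The idea behind building block 2 (binary to XML conversion now, XML to binary conversion in the future) can be illustrated with a deliberately naive round trip. This sketch is none of the three methods named above (BSDL, UVC, Flavor/XFlavor): it simply wraps the raw bytes in base64 inside an XML element. The element name `image`, its attributes and both function names are assumptions made for illustration.

```python
import base64
import xml.etree.ElementTree as ET


def bitstream_to_xml(data: bytes, width: int, height: int) -> str:
    """Wrap a raw pixel bitstream in a small, self-describing XML document."""
    root = ET.Element(
        "image",  # hypothetical element name
        {"width": str(width), "height": str(height), "encoding": "base64"},
    )
    root.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def xml_to_bitstream(document: str) -> bytes:
    """Recover the original bitstream from the XML document."""
    root = ET.fromstring(document)
    if root.get("encoding") != "base64":
        raise ValueError("unsupported encoding")
    return base64.b64decode(root.text or "")


# A 2 x 2 greyscale checkerboard, one byte per pixel.
pixels = bytes([0, 255, 255, 0])
document = bitstream_to_xml(pixels, width=2, height=2)
print(document)
assert xml_to_bitstream(document) == pixels
```

A wrapper of this kind only records the bytes and their dimensions; describing the internal syntax of the bitstream is the harder problem that the methods listed above address.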

Case study: Preservation of image documents at the Netherlands Institute for Scientific Information Services (NIWI-KNAW)
Issues covered in the presentation:
- Task and mission of NIWI-KNAW
  - Archiving of scientific data (Netherlands Historical Data Archive for archiving data sets created by scholars in the Humanities)
  - Research & Development of ICT applications in the Humanities (e.g. the historical discipline)
  - Digital data collection creation projects (historical censuses, GIS, visual material)
- Creation and archiving of image documents at NIWI-KNAW
  - Project oriented
  - Focus on digitisation of historical sources
  - Relation analogue original - digital surrogate
  - Benchmarking the digitisation chain
  - Examples (mainly digitisation of historical photographic collections)
- How to guarantee long-term access to images?
  - Risk management (assessment of the risks that threaten long-term access to digital images and of their impact)
  - In some situations microfilm is the best archival medium!
    - 'Film based imaging'
    - Preservation microfilming vs. preservation imaging
  - Access = preservation.
  - OAIS reference model
    - Function as "checklist"
    - Used in practice to implement a data archive (of image documents)

Preservation of Scientific Data in the Humanities
Issues covered in the presentation:
- Scientific data archives in the Humanities and Social Sciences:
  - Social science data archives. (Survey based. Archiving methods developed in the 1970s).
  - Electronic text archives. (Importance of TEI for describing content, context and structure of electronic texts).
  - Historical data archives (both structured and unstructured data. Archiving routines based on social science data archives).
- Source oriented computing vs. problem oriented computing and its impact on data archiving methods.
- Importance of relation dataset - historical source
  - Public record offices. (Relatively recent interest in digital archiving, definition issues, legal context, adoption of the principle of provenance in the digital environment).
- International situation:
  - Social Science and Historical data archives in Europe (Organisation and situation in a number of countries)
  - International collaboration
    - IFDO (International Federation of Data Organizations)
    - CESSDA (Council of European Social Science Data Archives)
  - User oriented organisations
    - AHC: Association for History and Computing.
    - ACH/ALLC: Association for Computers and the Humanities / Association for Literary and Linguistic Computing.
- Important standards:
  - OAIS (helps to establish common vocabulary)
  - DDI (Data Documentation Initiative)
  - XML based standards, e.g. METS
- Changing research practice and influence on data archiving
  - Scholars working together in a networked environment ("Collaboratories" / "Sharium")
  - Data archives must be active at the beginning of the data life cycle
  - Central vs. distributed models for storage
  - Open Access

Andreas Rauber
DELOS: The EU FP6 Network of Excellence on Digital Libraries,
with a specific focus on its Preservation Cluster activities
Digital Libraries (DL) have been made possible through the integration and use of a number of IC technologies, the availability of digital content on a global scale and strong demand from users who are now online. They are destined to become an essential part of the information infrastructure in the 21st century. The DELOS network intends to conduct a joint programme of activities aimed at integrating and coordinating the ongoing research activities of the major European teams working in DL-related areas, with the goal of developing the next generation of DL technologies.
This talk will provide an overview of the seven Research Clusters within the DELOS Network, with a specific focus on activities of the DELOS Preservation Cluster.

Using Utility Analysis to Evaluate and Compare Preservation Strategies
Long-term preservation solutions become critical as an increasing amount of information is digitized or created digitally and thus exists only in electronic form.
While different approaches, such as Emulation, Migration, or Computer Museums, are being proposed as solutions to this challenge, none of them excels in all circumstances. Selecting the most appropriate strategy and tools becomes a non-trivial task. In this talk we present an adapted version of Utility Analysis, which can be applied to selecting the optimal preservation solution for each individual situation. This analysis method, which is usually applied in infrastructure projects such as highways, airports, or city district development, is used here to combine the wide range of requirements that have to be considered in order to select a suitable preservation strategy.
Additionally, we present a framework for identifying and defining the criteria influencing the choice of a particular preservation solution, such as a specific migration tool.
The evaluation metric is explained theoretically and demonstrated via case-studies performed for different application domains.
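The core of such an evaluation can be sketched as a weighted aggregation of per-criterion scores. This is only a simplified illustration and not the adapted method presented in the talk: the criteria names, the weights, the 0 to 5 scoring scale and the scores themselves are all invented for the example, and a plain weighted average is assumed as the aggregation rule.

```python
def utility(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-criterion scores (weights need not sum to 1)."""
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


def rank(
    alternatives: dict[str, dict[str, float]], weights: dict[str, float]
) -> list[tuple[str, float]]:
    """Return (alternative, utility) pairs, best first."""
    scored = ((name, utility(s, weights)) for name, s in alternatives.items())
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical criteria and weights, as agreed by the stakeholders.
weights = {"content_fidelity": 5, "cost": 3, "tool_stability": 2}

# Hypothetical scores on a 0 (unacceptable) to 5 (excellent) scale.
alternatives = {
    "migration": {"content_fidelity": 4, "cost": 3, "tool_stability": 5},
    "emulation": {"content_fidelity": 5, "cost": 2, "tool_stability": 3},
}

for name, value in rank(alternatives, weights):
    print(f"{name}: {value:.2f}")
```

With these made-up numbers migration ranks ahead of emulation, but shifting weight towards content fidelity changes the outcome, which is why the criteria and weights have to be defined per situation.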

Preservation of Scientific Data (in Natural Sciences)
Preservation of "primary data" is of very high relevance in science. While letter publications are superseded by review publications over the years, and while review articles are remembered by many readers, primary data often cannot be reconstructed. Primary data, such as weather data, accelerator data and space observation data, form the backbone of scientific research and publication activities. They are essential for reconstructing experiments, recalculating the final results in scientific publications and checking their correctness. The existence of primary data makes the difference between fiction and science. Primary data are often open for reuse in other research activities, e.g. measurement of the radius of the proton at CERN.
This talk will offer an overview of the relevance of primary data in natural science, of its preservation requirements, and how it is preserved today.

Preservation Planning, Institutional Strategies and Policies
Should we preserve everything or only a selection of the available information? This talk will examine that question from different points of view. Institutions all around the world are developing strategies and policies for preservation. Many developments proceed redundantly in parallel, and some questions are still open. This talk will give a brief overview of selected activities and of the solutions they have developed or the state of their discussion. This overview is part of the project "nestor", supported by the German Ministry for Education and Research.

Hilde van Wijngaarden

Different approaches to digital preservation (Migration, Emulation, UVC, etc)
Digital preservation consists of three subjects: safe storage, preservation metadata and permanent access. First we have to make sure digital objects are stored on secure storage media and are maintained by proper procedures for safety, back-up and refreshment. In order to be able to retrieve the stored objects, we have to register information on the objects in preservation metadata and work on the technical possibilities to render the stored objects, now and in the future. To work on permanent access solutions, a number of questions have to be answered about what it is we want to view and use in the future. Different strategies can be deployed, each with its own advantages and disadvantages. In this presentation these strategies will be explained and linked to their intended use and their possibilities, including examples. Apart from existing strategies, new procedures and especially tools will have to be tested and developed to keep our digital archives accessible. Research and development on permanent access requires continuous effort and international co-operation.
Case study: Preservation Strategies of the National Library of The Netherlands
As a deposit library, the National Library of the Netherlands (KB) was first faced with having to store digital publications more than ten years ago. As the number of digital publications grew, the KB decided to make digital preservation one of its main concerns. This resulted, among other things, in an operational digital archive (the e-Depot) and in projects to develop preservation functionality. In this presentation two projects will be explained in more detail: the Preservation Manager, a tool for the monitoring of technical metadata, and the Universal Virtual Computer for JPEG. This UVC is a new approach for the rendering of digital objects, without depending on current platforms or formats. Together with IBM, we developed a first working UVC, which will be demonstrated.
Case study: Preservation of scientific e-journals at the National Library of The Netherlands
The digital archiving system of the KB, the e-Depot, stores e-journals of major international publishers automatically and for the long term. This amounts to a total of over 2 million articles stored today, just after the first year that the system has been operational. Two major publishers, Elsevier and Kluwer, deposit their world production of e-journals at the e-Depot. Since they publish mainly in the field of Science, Technology and Medicine, the e-Depot now holds about 20% of everything that is recently published in this field, world-wide. This presentation will explain what led to this result, how the e-Depot works, what we have agreed with the publishers and what we plan for the future. We call the e-Depot a 'safe place', working towards international co-operation (safe place strategy) and towards certification as a so-called trusted depository.

Michael Day

Metadata for preservation
In recent years there have been a range of metadata specifications and frameworks developed to support digital preservation activities. These range from formats that are intended to be specific to certain types of resources to generic frameworks based on the information model defined by the Reference Model for an Open Archival Information System (OAIS). Those specifications that exist have been developed from the perspective of a variety of different professional domains and world-views. The presentation will attempt to define preservation metadata, introduce some of the most important schemas and standards being developed, and outline some of the problems that result from the differing perspectives that inform their development.
The OAIS Reference Model: current implementations
The OAIS Reference Model (ISO 14721:2003) is an important part of the current digital preservation landscape. Initially developed by the Consultative Committee for Space Data Systems, the OAIS establishes a common framework of terms and concepts, identifies the basic functions of an archival system, and provides an information model for managing digital objects and information packages. The OAIS information model has proved to be extremely influential on the development of preservation metadata schemas. While the OAIS Reference Model does NOT specify any implementation, it has informed the implementation of some preservation systems, including the National Library of the Netherlands deposit system. This presentation will provide an introduction to the OAIS Reference Model and highlight some recent implementations that have been informed by it.