A. Hollier E-archiving WG/41

7 March 2000

Fourth Report of the Electronic Archiving Working Group to the Archive Committee

Contents

Appendix 1 - Draft Operational Circular on Electronic Records

Appendix 2 - Examples of electronic records management outside CERN

Appendix 3 - Glossary of terms

1. Introduction

The International Council on Archives defines a record as "…a specific piece of recorded information generated, collected or received in the initiation, conduct or completion of an activity, and that comprises sufficient content, context and structure to provide proof or evidence of that activity." This definition highlights some of the problems with electronic records. In many cases only the content of the electronic document is kept, without the structural and contextual information necessary to ensure that it constitutes a valid ‘record’ or that it will continue to be usable in the long term, or even the medium term.

2. The Electronic Archiving Working Group

The Working Group’s mandate was to "explore the size and the typology of the electronic documents at CERN, [and] their current status, and prepare recommendations for an archiving policy to preserve their historical and economical value in future".

The group comprised a small number of individuals within CERN who have experience in information technology and information management. This experience was supplemented by desk-based research using printed and on-line sources. Time constraints on the group members and lack of funding for external contact (visits, etc.) have limited what could be achieved. However, it has been a valuable exercise in increasing the group members’ understanding of these issues.

The group drew up a short list of requirements that a system must meet in order to manage electronic records adequately (such a system is called a Certified Information System). This list formed the basis of a questionnaire, and structured interviews were held with the persons responsible for the major electronic information management systems identified by the group within CERN.

It is recommended (see section 6 below) that a new Working Group be set up for the next stage in the process, and that this group should include at least some of the responsible persons mentioned above. Raising the awareness of these key people emerged as one of the major requirements. It is important that they should understand the issues involved and also that they should have an opportunity to give their input at the planning stage. In this way they are more likely to be fully behind any policy decisions that are eventually taken.

3. Computer systems examined by the Working Group

3.1 CEDAR (CERN EDMS for Detectors and Accelerator)

This system aims to organise engineering data for LHC and its experiments from design through to disassembly. Consequently it has been created with medium-term to long-term storage in mind. A document management system was purchased (CADIM/EDB by Eigner & Partner). A wide range of formats is accepted and access is controlled by a system of user privileges and passwords. Some metadata are kept, as is a log of accesses and changes made to documents. Incremental back-ups are made twice a day and a full back-up once a week. CEDAR now incorporates the CERN Drawings Directory (CDD).

The Working Group’s report on this system was written on 23 July 1998.

3.2 CFU (Contract Follow-Up)

The role of this system is to control the follow-up of CERN contracts for material and services. CFU is still under development. It is intended that the final version will also contain scanned versions of relevant documents coming from outside CERN. The system is based on an Oracle database, with a TUOVI WDM user interface and a web interface for database management. A header is kept containing some metadata, and a log is preserved with the intention of guaranteeing the authenticity of the documents. A system of user privileges and passwords is set up to protect documents against unauthorised use.

The Working Group’s report on this system was written on 12 February 1999.

3.3 E-mail

The system provides an e-mail service for almost 9000 users, and can be accessed remotely by users outside CERN. The server is UNIX based, with a variety of public domain software, and a number of different client systems are accepted. Limited metadata are stored, embedded in the documents.

The system is constantly evolving. New plans mean that users will have the ability to file e-mails in a searchable database. List-box archives will be made more visible to encourage their use.

The Working Group’s report on this system was written on 17 June 1998.

3.4 CERN Document Server

This is primarily a server for the Library, but also includes other resources such as photographs and internal notes. It contains mainly bibliographic records with some full-text documents. The system is based on a UNIX server, using ALEPH Integrated Library System software with a web interface.

The Working Group’s report on this system was written on 15 June 1998.

3.5 NICE (Network Integrated Computer Environment)

This system provides storage, retrieval and back-up for the PC network, and facilitates information sharing. Limited metadata are attached to the documents. Version control, classification, retention and migration of files are left to individual users.

The Working Group’s report on this system was written on 17 July 1998.

3.6 WWW (World Wide Web)

This system allows the easy diffusion of information through CERN and world-wide. It is based on a Sun server, and uses Apache Webserver and a number of different client browsers. There is effectively no mechanism for long-term storage. Back-ups are kept for a limited period. The keeping of metadata is left up to the authors; almost nothing is stored, although the use of Dublin Core is recommended in the guidelines.

The Working Group’s report on this system was written on 25 September 1998.

3.7 AIS - Administrative Information Services (previously EDH - Electronic Document Handling)

This system facilitates the creation and processing of CERN’s administrative documents. It is available via the web to collaborators based outside CERN. The AIS application server is currently running on a Sun server using the Solaris operating system. It is backed up every night. Only types of documents that can be stored on the underlying Oracle database are accepted. Metadata are kept, as is a log of document usage. The AIS application server was the first application at CERN to use a secure connection to the login server, and electronic signatures.

The Working Group’s report on this system was written on 27 October 1999.

Experimental databases were deliberately excluded from this study, but there remain other systems which should be examined, for example the Human Resources (HR) database and the Andrew File System (AFS).

4. Criteria of a Certified Information System

4.1 The system must preserve all the aspects of a genuine record:

Content: what the document "says"

Structure: the internal format and arrangement of the document

Context: where, when, why and by whom the record was created and its role in the activities of CERN.
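
As an illustration only, the three aspects listed in 4.1 could be modelled as a simple data structure, together with a check that reports which aspects a stored document lacks. This is a minimal sketch, not a description of any existing CERN system: the class names, the field names and the use of a format label for "structure" are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Context:
    """Hypothetical context fields: where, when, why and by whom."""
    creator: str          # by whom the record was created
    created_at: datetime  # when it was created
    origin_system: str    # where it was created
    purpose: str          # why: its role in the organisation's activities


@dataclass(frozen=True)
class Record:
    """Hypothetical record holding the three aspects named in 4.1."""
    content: bytes              # what the document "says"
    structure: Optional[str]    # internal format (assumed here to be a format label)
    context: Optional[Context]  # circumstances of creation


def missing_aspects(record: Record) -> List[str]:
    """Return the names of the aspects that are absent from the record."""
    missing = []
    if not record.content:
        missing.append("content")
    if not record.structure:
        missing.append("structure")
    if record.context is None:
        missing.append("context")
    return missing
```

A document stored with its content alone would be reported as lacking both structure and context, which is the situation described in the Introduction.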

4.2 The system must preserve the integrity of the records; this includes:

Version control: the identification of revised documents and safeguarding of obsolete documents

Inviolability: the prevention of unauthorised access, alteration or removal of documents

Authenticity: preservation of a history of the document’s creation, transmission and use.
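
The integrity criteria in 4.2 can be sketched in the same spirit. The example below keeps every version of a document (version control), stores a digest of each version so that later alteration can be detected (one part of inviolability) and logs each storage and read operation (authenticity). It is a sketch under stated assumptions: the class and method names are hypothetical, the choice of SHA-256 is ours and is not prescribed by this report, and the prevention of unauthorised access, which a real system would also need, is not shown.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class VersionedDocument:
    """Hypothetical store for one document and its integrity data."""
    versions: List[bytes] = field(default_factory=list)  # obsolete versions are kept
    digests: List[str] = field(default_factory=list)     # one SHA-256 digest per version
    history: List[str] = field(default_factory=list)     # log of creation and use

    def add_version(self, data: bytes, user: str) -> int:
        """Store a new version and return its number (starting at 1)."""
        self.versions.append(data)
        self.digests.append(hashlib.sha256(data).hexdigest())
        number = len(self.versions)
        self.history.append(f"{user} stored version {number}")
        return number

    def read(self, number: int, user: str) -> bytes:
        """Return a stored version and log the access."""
        data = self.versions[number - 1]
        self.history.append(f"{user} read version {number}")
        return data

    def altered_versions(self) -> List[int]:
        """Return the numbers of versions whose data no longer match their digest."""
        return [
            index + 1
            for index, (data, digest) in enumerate(zip(self.versions, self.digests))
            if hashlib.sha256(data).hexdigest() != digest
        ]
```

In practice the digests and the log would have to be held separately from the documents they protect; otherwise anyone able to alter a document could alter its digest as well.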

4.3 The system must preserve access to the records. There are various ways in which this may be achieved; one is to ensure that the record can be migrated without loss of essential characteristics and to carry out such migrations as required.

5. Some weaknesses found in existing CERN systems

Poor search capability.

Insufficient metadata kept.

No consideration of long-term archiving needs.

Insufficient back-up.

Lack of version control.

No clear policy on document formatting (this could make migration difficult).

Lack of clear classification schemes.

Lack of retention planning - in most cases the current practice is to keep everything ‘forever’.

Lack of security to ensure the integrity and authenticity of documents.

6. Recommendations

6.1 Find a way to make system owners aware of long-term archiving requirements. Raise awareness of CERN-specific issues at the highest level.

6.2 Establish a cross-functional group (the new Working Group) to advise on further requirements and to assist with and monitor the implementation of agreed policy. This Working Group should include at least some of the persons responsible for systems named in this report. It would be beneficial to find an outside speaker who has practical experience in this field to come and address the (potential) new Working Group members at an early stage.

6.3 The WG has worked largely in isolation so far. Before finalising any policy, the new Working Group should visit and learn more about solutions already implemented elsewhere. An experienced consultant should be invited to CERN to give an external viewpoint. Additional funding should be provided for these purposes.

6.4 The new Working Group should carry out pilot studies to develop the tools necessary for the practical implementation of an electronic archiving policy. It should also carry out a consultation exercise to ensure the suggested operational circular on electronic records management is of practical value to CERN. It should gather information on requirements from end-users, and from other bodies within CERN who see the bigger picture, e.g. Directors, Divisional Secretaries, Council, Legal group, Audit group.

6.5 Agree a final text for the operational circular and present it to the Standing Concertation Committee.