A. Hollier E-archiving WG/40

7 March 2000

Appendix 1 to the Fourth Report of the Electronic Archiving Working Group

Draft Operational Circular on Electronic records

 

1. Requirements for Record-keeping at CERN

CERN is responsible for ensuring that its information and documentation are adequate to meet operational needs and legal requirements, both now and in the future. Increasingly, much of this information will exist only in electronic form. This poses new challenges since the electronic recording of information in the course of a transaction does not necessarily create a valid record that can be accepted as evidence of that transaction or activity. In order to create such a record, systems and procedures must be in place that:

Long-term storage of, and access to, valid electronic records is expensive, so it is important that costs are minimised by good management. Any piece of recorded information produced in the course of CERN’s activities may be a CERN record, regardless of its medium and regardless of whether it was produced by a CERN employee. Not all of these records require long-term preservation, but for those that do, CERN must be sure that they are preserved as valid records.

2. The challenge of electronic records

Technological change means that electronic documents quickly become unreadable. In some cases it is impossible to read any of the content; in other cases formatting and structure are lost to an extent that may give a misleading impression of the record. Furthermore, the context and provenance of an electronic document are not immediately apparent, so these must be preserved deliberately if the record is to be fully understood. Electronic documents are easy to change, and these changes can be difficult to detect, leading to problems of authenticity and reliability. They are also easily deleted.

Computer systems that allow the creation of electronic documents are not necessarily true record-keeping systems. A conscious effort is required to produce valid records and to ensure their preservation. If valid records are not kept, the organization faces a number of risks including:

3. Actions

Record keeping and accountability must be built into work processes and the electronic work environment in order to ensure that information remains accessible, understandable and reliable. This requires positive action to be taken immediately.

3.1 Identify the responsible persons

The leader of each Division and Experimental collaboration will work with the Divisional Records Officer (DRO) to oversee electronic records management in their area. They will seek advice as required from centres of expertise within CERN, such as the Legal and Audit Services.

Division leaders will ensure that electronic information-keeping systems maintained by their Divisions comply with the requirements outlined in this circular.

An advisory panel nominated by the Archive Committee will monitor these activities and offer assistance as required. DROs will make an annual progress report to the Archive Committee.

3.2 Identify and evaluate official records

The DRO will examine the business processes and functions of his/her Division or experiment in order to understand the flow of information. He/she will identify the points at which valid records are required, and survey the information that is currently held. (See Guideline 4.1.)

3.3 Establish procedures to ensure the preservation of official records

The DRO will produce an inventory of the significant records, and a retention schedule that specifies the length of time for which each group of records must be kept and in what format. As an interim measure this may just differentiate between records selected for permanent preservation and the rest. He/she will specify the confidentiality and access requirements for each group of records. If necessary, classification plans should also be produced. Relevant staff will be trained to maintain and update these tools and to implement the procedures. (See Guideline 4.2.)

Once basic procedures have been established, a further step is to identify vital records and arrange regular back-ups to be made and stored off-site.

3.4 Establish systems to ensure the preservation of official records

The leader of each Division and Experimental collaboration will ensure that significant records in his/her area are managed in a reliable electronic records management system, and that the relevant staff members are trained in its use.

Division leaders will ensure that electronic information-keeping systems maintained by their Divisions are capable of creating valid records and preserving them indefinitely:

3.4.1 The system must preserve all the aspects of a genuine record:

Content: what the document "says"

Structure: the internal format and arrangement of the document

Context: where, when, why and by whom the record was created and its role in the activities of the Organization.

3.4.2 The system must preserve the integrity of the records; this includes:

Version control: the identification of revised documents and safeguarding of obsolete documents

Inviolability: the prevention of unauthorised access, alteration or removal of documents

Authenticity: preservation of a history of the document’s creation, transmission and use.
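The three integrity requirements above can be illustrated with a minimal sketch. It assumes an append-only version log with a SHA-256 checksum per version; the names RecordHistory, Version, register and is_unaltered are hypothetical and are not part of any CERN system. It shows one possible approach to version control and the detection of alteration, not a complete records management system.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Version:
    """One registered version of a document (hypothetical structure)."""
    number: int
    sha256: str          # fingerprint of the content at registration time
    registered_at: str   # ISO 8601 timestamp in UTC
    author: str


@dataclass
class RecordHistory:
    """Append-only history: obsolete versions are kept, never overwritten."""
    record_id: str
    versions: list = field(default_factory=list)

    def register(self, content: bytes, author: str) -> Version:
        """Append a new version together with its checksum and creation details."""
        version = Version(
            number=len(self.versions) + 1,
            sha256=hashlib.sha256(content).hexdigest(),
            registered_at=datetime.now(timezone.utc).isoformat(),
            author=author,
        )
        self.versions.append(version)
        return version

    def is_unaltered(self, content: bytes, number: int) -> bool:
        """Compare content against the checksum stored for the given version."""
        stored = self.versions[number - 1]
        return hashlib.sha256(content).hexdigest() == stored.sha256
```

A checksum only detects alteration. Preventing unauthorised access or removal would still depend on access controls in the surrounding system.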

3.4.3 The system must preserve access to the records. There are various ways in which this may be achieved; one is to ensure that the record can be migrated without loss of essential characteristics and to carry out such migrations as required.

The key to long-term archiving of electronic records is the preservation of adequate metadata. This is the information that supplements and describes the actual content of the record. (See Guideline 4.3.) It is also advisable that open archive standards be considered in the system design. So far as possible the system should automate the process of creating and preserving a valid record, and purging records that are no longer required.

3.5 Capture of archival records

Records selected for permanent preservation must be kept securely, and the DRO will ensure a flow of metadata for all such records to the CERN archive database. This database should be capable of offering a CERN-wide search facility.

4. Guidelines

4.1 Information survey

Carry out a function-based analysis to identify the business procedures and information flow, and identify the points for which a valid record is required. Survey the information that is currently kept, both in paper and electronic form. Identify significant groups of records and for each group describe the purpose for which the records are kept, the current arrangement and reference system, the medium, the quantity and rate of growth, the office/individual of primary responsibility, and the location. Draw up a records inventory based on this information.

These procedures should be repeated and updated regularly.

4.2 Information management

Compare the inventory with the functional analysis and use the comparison as a basis for a discussion on information requirements with the information creators and users, and the responsible manager (division leader or experiment spokesperson). If necessary, draw up a classification plan for all records, identifying a hierarchy of classes of records based on the business procedures, and selecting the most suitable coding scheme to create a unique identifier for all records.

Draw up a retention schedule that identifies how long each class of records needs to be kept to meet legal, operational or historical requirements. Differentiate between the record copy and working copies. Decide what outputs should be preserved, for example whether ‘snapshots’ of databases are sufficient, and, if so, how frequently they should be produced. Define access privileges.
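As an illustration only, a retention schedule of this kind could be held in machine-readable form so that disposal dates are computed rather than remembered. In the sketch below the record classes, retention periods and access labels are invented for the example, and RetentionRule, SCHEDULE and disposal_date are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    record_class: str
    retain_years: Optional[int]  # None means permanent preservation
    access: str                  # access privilege label, e.g. "restricted"


# Hypothetical schedule: classes and periods are illustrative, not CERN policy.
SCHEDULE = {
    "committee-minutes": RetentionRule("committee-minutes", None, "restricted"),
    "purchase-orders": RetentionRule("purchase-orders", 10, "restricted"),
}


def disposal_date(record_class: str, closed_on: date) -> Optional[date]:
    """Return the earliest disposal date, or None for permanent records."""
    rule = SCHEDULE[record_class]
    if rule.retain_years is None:
        return None
    target_year = closed_on.year + rule.retain_years
    try:
        return closed_on.replace(year=target_year)
    except ValueError:
        # 29 February with a non-leap target year: fall back to 1 March.
        return date(target_year, 3, 1)
```

Under these assumed rules, disposal_date("purchase-orders", date(2000, 3, 7)) returns date(2010, 3, 7), while any record in the "committee-minutes" class returns None and is kept permanently.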

See the ‘Guidelines for Divisional Records Officers’ for further details on the types of records that should be considered for permanent preservation.

4.3 Metadata [to be used by the second Working Group as a basis for the production of simplified CERN-specific metadata guidelines]

No standard for metadata exists, though various models have been proposed. Currently one of the most widely accepted is based on a study carried out at the University of Pittsburgh on Functional Requirements for Evidence in Recordkeeping, and this is proposed as the basis for electronic archiving at CERN. Having identified the requirements, the Pittsburgh model proposes a six-layer structure of metadata that is briefly summarised below.

  1. Handle Layer - a unique identifier which identifies the data as a record and uniquely identifies the domain from which it originated (provenance) and the date and time; and descriptive information such as title, indexing terms, etc., to enable it to be retrieved.
  2. Terms & Conditions Layer - information on restrictions, if any, to access and use of the record, including retention and disposition.
  3. Structural Layer - information about the structure which will allow the record to retain its evidential value after hardware and software migrations, e.g. identification of each file in the record and of its encoding, compression or encryption. Identification of how the record appeared, e.g. application, software and hardware dependencies. Identification of the links between different files in a record. Identification of the source of the record (i.e. how the data was captured).
  4. Contextual Layer - information supporting the use of the record as evidence of a transaction. Context metadata that identifies the originator and recipient and the type of transaction; responsibility metadata that identifies the organizational unit responsible for the transaction; and system accountability metadata that certifies the procedures and systems logs.
  5. Content Layer - the actual data itself.
  6. Use History Layer - how, when and by whom it was used subsequent to creation.
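The six layers listed above might be represented as a simple data structure. The following is a sketch only: the class name LayeredRecord, the field names and the use of plain dictionaries are assumptions made for illustration and are not taken from the Pittsburgh model itself.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredRecord:
    """Hypothetical container mirroring the six metadata layers."""
    handle: dict      # 1. identifier, provenance, date and time, title, index terms
    terms: dict       # 2. access and use restrictions, retention, disposition
    structure: dict   # 3. files, encoding, software and hardware dependencies
    context: dict     # 4. originator, recipient, transaction type, responsibility
    content: bytes    # 5. the actual data
    use_history: list = field(default_factory=list)  # 6. uses after creation

    def log_use(self, who: str, when: str, how: str) -> None:
        """Append one entry to the use history layer."""
        self.use_history.append({"who": who, "when": when, "how": how})
```

Keeping the use history as an append-only list reflects the idea that layer 6 grows over the life of the record while the other layers are fixed at creation.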

At least on a basic level it should be possible to interpret the record without reference to external documentation, so it is suggested that the record be based on ASCII text with short descriptions of any more complex encoding. It is recommended that documents are encapsulated with their metadata to form a layered record with each layer verified by an electronic signature. The Public Record Office of Victoria, Australia offers a good model.
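The encapsulation idea can be sketched as follows. This is a simplified illustration under stated assumptions: each layer is serialised as ASCII-only JSON, and a keyed hash (HMAC-SHA256) stands in for the electronic signature. A production system would more likely use public-key signatures, and the function names seal_layer, verify_layer and encapsulate are hypothetical.

```python
import hashlib
import hmac
import json


def seal_layer(name: str, payload: dict, key: bytes) -> dict:
    """Serialise one layer as ASCII JSON and attach a keyed hash over it."""
    # json.dumps escapes non-ASCII characters by default, so body is pure ASCII.
    body = json.dumps(payload, sort_keys=True)
    message = f"{name}:{body}".encode("utf-8")
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {"layer": name, "body": body, "tag": tag}


def verify_layer(sealed: dict, key: bytes) -> bool:
    """Recompute the keyed hash and compare it with the stored tag."""
    message = f"{sealed['layer']}:{sealed['body']}".encode("utf-8")
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sealed["tag"])


def encapsulate(layers: dict, key: bytes) -> list:
    """Seal every layer independently, preserving the given layer order."""
    return [seal_layer(name, payload, key) for name, payload in layers.items()]
```

Because each layer carries its own tag, a later addition to one layer, for example the use history, does not invalidate the verification of the others.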

4.4 Technical Design Recommendations [to be established by the second Working Group]