The Hierarchical File Server (HFS) provides methods of storing data from a wide variety of sources, and for many different purposes. The HFS is a valuable University resource, with a finite, though large, capacity. The HFS Archive Policy is intended to ensure that the HFS Archive facility is used to best effect.
One of the goals in acquiring the Hierarchical File Server, in 1997, was to provide economical long-term storage of digital materials. At the beginning of 2013 the HFS Archive held over 80 terabytes of data, mainly from research and digital imaging projects. Over the past ten years the use of computers and the rate at which digital data is produced have changed substantially. Most, if not all, research projects have significant digital outputs, ranging from large-scale data from remote sensors to the digital materials collected in the making of a book. The HFS Archive Policy should, for example, now reflect the view that the writing of data to CD/DVD-ROM as a long-term data preservation strategy is unsustainable.
The HFS provides both backup and archiving services. The essential difference between archiving and backup is that the existence of an archive file is independent of the existence of the file from which it was copied; a backup copy is dependent on the existence of the file in the local filestore, and if the file disappears from the local filestore, its copies will, after a delay, disappear from the backup filestore.
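The distinction between the two services can be illustrated with a minimal sketch. It is a simplified model only, not a description of how the HFS software works: the function name and the use of plain sets of file names are assumptions made for illustration, and the expiry delay is treated as having already elapsed.

```python
def surviving_copies(local_files, backup_copies, archive_copies):
    """Return (backup, archive) once the backup expiry delay has passed.

    Hypothetical model: each argument is a set of file names.
    """
    # A backup copy survives only while its source file still exists locally.
    backup = {name for name in backup_copies if name in local_files}
    # An archive copy is independent of the local filestore, so nothing is dropped.
    archive = set(archive_copies)
    return backup, archive


# "b.dat" has been deleted from the local filestore.
local = {"a.dat"}
backup, archive = surviving_copies(local, {"a.dat", "b.dat"}, {"b.dat"})
```

In this model the backup copy of the deleted file is dropped, while the archived copy remains.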
There are other differences that make some data suitable or unsuitable for the HFS Archive Service - a simple table of what constitutes suitable candidate data for archive is listed for quick reference.
2. Data Storage and Curation
Archiving is the process of transferring information of value into a distinct repository to ensure its long-term safe-keeping (and therefore also accessibility). All archives, whether physical or digital, have policies which dictate the selection, retention and deletion of materials. Discrimination in the receipt and retention of material is always required since the cost of retention is never zero (and indeed is likely to increase over the longer term as various preservation actions are initiated). The University is therefore wary of "just in case" archiving or of attempts to use archives as an extension to day-to-day file storage.
The HFS Archive policy is intended to encourage uptake of the Archive service for the long-term storage of data considered to be of value to the University as a whole, or likely to be of value to our successors. Therefore, the decision on whether to make use of the HFS Archive should be based on qualitative rather than quantitative judgements (i.e. the value rather than simply the amount of data). Since, in practical terms, it is impossible to state with any firm conviction that a given data set will remain of value to the University for ever, it is important that the Archive incorporates retention and deletion policies.
The HFS Archive provides long-term file storage. The HFS Team has the expertise to maintain and develop the storage infrastructure, migrating data from one medium to another as required. However, the HFS Team does not provide a data curation service. The documenting and management of the data content is the responsibility of the Data Curator. Every dataset lodged with the HFS Archive must have a Data Curator, the contact details of whom must be kept up to date. The Data Curator is responsible for submitting data to the Archive; for ensuring the data is documented to agreed standards; and for reviewing on a regular basis with the HFS Team the need to retain or to delete the data. The HFS Team expects to work in collaboration with domain experts and OULS to ensure that good practice in data curation is implemented for all datasets submitted to the Archive, and that consequently the HFS Archive service continues to offer value for money in the long-term storage of the University's digital assets.
3. HFS Archive Policy
This Policy should be read in conjunction with the HFS Service Level Description.
Preserving files in the HFS Archive represents a major investment of University resources. As such it should be subject to proper control and its usage should be open to scrutiny by appropriate University bodies.
The HFS Archive Policy comprises the following elements:
- Project Definition
- Conditions of Use
- Chargeable Service
- Typical Usage Examples
Data Curator - the individual or unit responsible for the management and curation of data deposited with the HFS Archive. The Data Curator's details must include one or more named individuals, and these details must be kept current.
Data Documentation - a set of documentation and metadata accompanying datasets deposited with the HFS Archive, intended to provide sufficient information about the provenance and format of the data to help ensure its longer-term value and to assist in any decisions regarding data retention or deletion.
Data Provider - the Department or other recognised unit of the University that can be said to own the data.
Project - a discrete set of activities giving rise to a collection of digital objects deposited with the HFS. Once a project is defined, data may be added to (or deleted from) the collection periodically, though it is expected that a project has start and end dates.
The HFS Archive service is available to Senior Members, Staff, and Postgraduates. Applications from Postgraduate students should be sponsored by a Senior Member. A Postgraduate student cannot be named as the data curator.
3.3. Project Definition
The HFS Archive contains projects. For the purposes of this policy a Project is defined as a discrete, finite activity that gives rise to one or more datasets. Projects in this context need not be related to research, be externally funded, or involve teams. Projects do have start and end dates, and project data may similarly comprise a bounded collection of digital objects (including related sub-collections).
Each HFS Archive 'project' is allocated a maximum quota of 4 TB. Usage above this quota is subject to cost-recovery charging. The 4 TB limit reflects the total amount of data that may be stored on a single HFS tape (data is replicated on three tapes: online, local secure location, remote secure location).
3.5. Conditions of Use
The following requirements apply to the use of the HFS Archive service:
- The data submitted to the HFS Archive should have demonstrable future value to the University or those on whose behalf the University operates (e.g. unique, expensive to reproduce or obtain, subject to grant or statutory conditions). OUCS expects the data provider (e.g. Department or other recognised University unit) to assess the intellectual value of the data.
- An application to open or close a Project must be made to the HFS via the route specified in the HFS Service Level Definition. The application should include, amongst other things: details of the projected use of the Archive, including initial dataset size (and number of files); estimated overall file storage quota required; frequency and size of submissions; indicative Project lifespan.
- A Data Provider wishing to open a Project likely to require in excess of the 4 TB quota over its lifetime must consult with the HFS Team prior to making an application. Projects exceeding the 4 TB quota are subject to cost-recovery charging. Departments should ensure that applications to external funding bodies include, where applicable, a clear element for the long-term storage of data.
- Each HFS Archive project must have a Data Curator together with the contact details of one or more named individuals. It is the responsibility of the Data Provider to ensure the HFS has current details.
- Projects comprise both deposited data and Data Documentation. Each discrete data collection must be accompanied by Data Documentation. The submission and maintenance of Data Documentation is the responsibility of the Data Curator. The precise form that this documentation should take is not mandated. However, Data Curators are urged to seek advice from appropriate domain experts, whether experts within subject-based data archives or the relevant sections of OULS.
- Projects are created with an expected lifetime of five years in the first instance and the archive project expiry date will be set to reflect this. However, Data Curators are responsible for reviewing their respective projects on an annual basis. The HFS requires 5-yearly confirmation from Data Curators that a project and its corresponding data collections should be retained in the HFS Archive. In such cases an application for continued storage must be made at this anniversary. Please be aware that charges may become applicable at this, and subsequent, anniversaries. Data Curators are responsible for managing the data, including the addition and deletion of data, and the maintenance of accompanying Data Documentation.
- No guarantee can be given by the University that data will be retained beyond the current expiry date of any given project. Therefore, no guarantee of data preservation beyond this date should be communicated to third parties.
- The use of the HFS Archive is subject to compliance with the Client Responsibilities described in the Service Level Definition for the HFS.
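The five-year project lifetime and annual review cycle described in the conditions above can be sketched as simple date arithmetic. This is an illustrative sketch only: the function names are hypothetical, and it assumes that the expiry date falls exactly five calendar years after the project is created, which the policy does not state precisely.

```python
from datetime import date

LIFETIME_YEARS = 5  # expected project lifetime stated in the conditions of use


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February does not exist in the target year.
        return d.replace(year=d.year + years, day=28)


def expiry_date(created: date) -> date:
    """Assumed expiry date: creation date plus the five-year lifetime."""
    return add_years(created, LIFETIME_YEARS)


def annual_review_dates(created: date) -> list:
    """Anniversaries before expiry on which the Data Curator reviews the project."""
    return [add_years(created, n) for n in range(1, LIFETIME_YEARS)]


def needs_confirmation(created: date, today: date) -> bool:
    """True once the expiry date is reached and continued storage must be applied for."""
    return today >= expiry_date(created)
```

Under these assumptions, a project created on 15 January 2013 would have four annual review dates and would need confirmation of continued storage from 15 January 2018.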
4. Cost-Recovery Charging
The hardware infrastructure underpinning any reliable Archive service must be of high quality, be available 24/7, be scalable and have a degree of redundancy that avoids service disruption if any single element or path of elements fails. Such systems obviously entail an elevated level of expense. The University bears this expense partly via the funding of the Archive service as a core service, and this is reflected in there being no point-of-use charge for the archival storage of the first 1 TB of any approved project.
Increasingly, however, this level of 'free' storage is not adequate with some projects generating many terabytes of data requiring archival. All projects seeking funding should now include a defined element for long-term storage of any data and as a result the following charging models have been developed, based on the Full Economic Costs (FEC) model, to reflect the costs of large-scale, long-term archival of data within the HFS.
- For projects wholly or largely funded by bodies external to the University;
- For projects wholly or largely funded by sources internal to the University.
Charges are incurred at the TB boundary and in advance. A purchase order should be raised for "the archival of N TB of data for X years with the HFS", include the Project Name and be sent to OUCS. The HFS will monitor actual occupancy on a monthly basis and report this to the designated contact email address for the project. Additional purchases can be made for increased storage and applied on a yearly basis.
The charging models cover staff costs (management, systems administration, and user support), as well as the costs of media, hardware and software maintenance, licensing and support, all of which, with the exception of the client software licence, may be seen to increase generally in line with the amount of data stored. Media costs represent a significant element of the storage costs, and media have a reasonable lifetime of five years before the data should be rewritten. Charges are incurred at the TB boundary because the tape media have a capacity of 1 TB. Thus the same number of tape volumes (2) is required to store either 1.1 TB or 2 TB. Three copies of the data are written to separate tape volumes: one is stored online for ready data retrieval, one is stored on-site in a secure firesafe, and one is stored at a secure site outside Oxford. It is hoped that this represents an affordable yet high-quality, reliable archive service.
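The TB-boundary arithmetic can be illustrated with a short sketch. It takes the 1 TB tape capacity and the three copies from the text above; the function names are hypothetical, and no prices are modelled because the actual charges are documented in the HFS Service Level Description.

```python
import math

TAPE_CAPACITY_TB = 1.0  # tape capacity stated in the text
COPIES = 3              # online, on-site firesafe, and off-site copies


def tapes_per_copy(data_tb: float) -> int:
    """Whole tape volumes needed for one copy; partial tapes count as full."""
    return math.ceil(data_tb / TAPE_CAPACITY_TB)


def total_tapes(data_tb: float) -> int:
    """Tape volumes needed across all three copies."""
    return tapes_per_copy(data_tb) * COPIES


# 1.1 TB and 2 TB both round up to two tape volumes per copy.
same_boundary = tapes_per_copy(1.1) == tapes_per_copy(2.0) == 2
```

Because occupancy is rounded up to whole tapes, storing 1.1 TB consumes the same media as storing 2 TB, which is why charges step at each TB boundary rather than scaling continuously.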
The charges are reviewed on an annual basis and are documented as a Premium Service in the HFS Service Level Description.
5. Example HFS Archive Projects
The following are typical examples of how the HFS Archive service is used to support the management of research data.
- A collection of almost 400 electronic literary compositions derived from sources from ancient Mesopotamia (modern Iraq) had been stored on two standalone Mac desktop machines within the University. The data was unique and amounted to just 4GB in total and was clearly a candidate for archiving with the HFS.
- A large project analysing a time series of satellite imagery to describe environmental conditions and predict the spread of borne diseases was funded internally by the University. The source data was downloaded from an institution in the US over an unreliable link (the host systems in the US were not always available), yet this source data was the starting point for a number of labour- and time-intensive analyses which produced further large data sets. Both source and derived data were vulnerable because they were stored on a single data server. The project now stores close to 1 TB of data with the HFS.
- A study into the ways in which networks of brain cells interact to influence behaviour is part-funded by the MRC. The study involves the production of a number of electrophysiological recordings - large image files that are intrinsic to the project yet due to their size (6TB in total) cannot be stored locally. As data is accrued by this project it is archived with the HFS on a rolling chargeable basis, costed per TB per five years.