Digital Preservation Policy

 

  1. Purpose
  2. Objectives
  3. Relation to Institutional Mission
  4. Guiding Principles
  5. Challenges
  6. Categories of Commitment
  7. Scope
  8. Collection Development Priorities
  9. Levels of Preservation
  10. Roles and Responsibilities
  11. Policy Review Cycle
  12. Appendix A: Glossary/Definitions
  13. Appendix B: Recommended File Formats
  14. Appendix C: DAM Storage Architecture
  15. Appendix D: Sources/References

Last revision January 2021 

Purpose

The Library of the American Museum of Natural History was established in 1869 with the founding of the Museum and has grown to be one of the largest natural history libraries in the world, with topics spanning the full range of the natural sciences (except botany). The primary function of the AMNH Library is to serve and support the work of the Museum's scientific staff. The Library also serves scholars in natural history from around the world, as well as interested members of the general public. The holdings comprise general reference collections, special collections, and digital collections. (AMNH, About the Library)

To provide these services and preserve these collections, the Library has continually kept pace with technological advances in archival and library services and in conservation. Digitized and born-digital content is a growing component of the Library collection. This digital preservation policy and program describe how we will build a better understanding of the content and formats in the collection, and the means and measures by which we will retain this content and ensure the ongoing use of its digital assets.

Objectives

Digital preservation includes a set of actions, workflows, and tools to ensure long-term, continuous access to authentic, secure content in digital formats. Digital material is particularly at risk due to a variety of factors, including but not limited to dependence on external software and systems for access, media failure, and frequently changing file formats and packaging. This policy aims to define and communicate the extent of the AMNH Library’s commitment to sustaining digital preservation by providing a structure to:

  • Develop and communicate the AMNH Library’s commitment to digital preservation to its community of users, including donors, Library staff, AMNH scientific staff, and other researchers.
  • Identify and commit to a set of community-accepted standards for preservation activity that will be adhered to through the actions of the preservation program at the AMNH Library, including acquisition, preservation, storage, description, and access.
  • Identify the types of digital material that will be preserved and minimum levels of preservation for all material in the repository (bit-level preservation/redundancy).
  • Provide guidelines to recognize the need for advanced levels of preservation activity where applicable.
  • Establish workflows for all stages of preservation activity based upon recognized standards and norms.
  • Define the roles and responsibilities of those who engage with digital assets, including a commitment to maintaining in-house stewardship for digital assets as a function of the Library.

Relation to Institutional Mission

The preservation of digitally formatted content for future access, research and interpretation is a critical component in successfully fulfilling both the institutional and Library research and educational missions. The mission statement of the American Museum of Natural History is “To discover, interpret, and disseminate -- through scientific research and education -- knowledge about human cultures, the natural world, and the universe.” (AMNH, Mission statement) The Research Library’s own mission also supports this institutional mandate, specifically through the maintenance and preservation of materials: “The AMNH Library is a unique and extensive collection of natural science books, journals, archives, photographs, moving images, art and Museum memorabilia. The Library's mission is to foster intellectual growth and support the research, teaching, and educational activities of the Museum. The Library fulfills its mission by acquiring, organizing, preserving, and making available collections of scholarly materials in all formats to Museum staff, students, the wider scientific community and the public.” (AMNH, About the Library)

Guiding Principles

  • Comply with the Open Archival Information System (OAIS) reference model standard and work toward conforming to certification requirements for ISO Standard 16363 for Trusted Digital Repositories.
  • Establish and maintain local procedures to meet archival requirements pertaining to provenance, chain of custody, authenticity and integrity of content.
  • Comply with intellectual property, copyright, and ownership rights for preservation of and access to all content.
  • Capture and maintain high-quality and useful standards-based metadata of all types, including descriptive, technical and preservation metadata.
  • Adhere to prevailing standards for preserving access to digital content of long-term value so that it remains accessible, meaningful, and readable.
  • Commit to an interoperable, reliable, and scalable digital archive with appropriate storage management for content.
  • Establish adequate and secure backup and disaster recovery safeguards and seek to monitor threats to the accessibility of digital content.
  • Participate in consortia and collaborative digital preservation solutions when they are a good use of Library resources, and contribute back to the community whenever possible.
  • Partner with parties within the institution and external to the institution to support the goals of the Digital Preservation Policy, to further develop our digital preservation system, and to serve the collective desire to preserve digital content.
  • Consider the preservation implications of any systems designed or implemented to manage digital content and allocate adequate resources and infrastructure for sustained digital preservation, acknowledging the need for elasticity based upon content and resources.
  • Define and regularly review a sustainability plan that ensures the cost-effective, transparent, and auditable management of the digital archive over time.
  • Maintain all system and program components including hardware, software and storage media to be current with best practices, standards, and security needs.
  • Commit to a reliable and scalable digital archive while remaining aware of the potentially prohibitive cost of ongoing digital preservation maintenance. Commit to using open-source technology wherever possible.
  • Commit to maintaining an infrastructure and staff to manage the archive.

Challenges

  • Perception and training—Although we have been working with digitized assets for some time, this is the AMNH Library’s first effort to codify and document a commitment to digital collection management and preservation. As such, it is unfamiliar territory and will require continued flexibility, change, increased staffing, and training, both for Library staff and for other AMNH departmental stakeholders and users.
  • Technological change—Developing a program that is responsive to technological change, both in asset formats and in the tools used to capture, store, describe, preserve, and share the material.
  • Sustainability—Instituting a program that is robust but realistically achievable given the staff and resources at hand both currently and in the future.
  • Balance—Both preservation of and access to resources are at the heart of the AMNH Library’s mission. While remaining attentive to access needs, the program will strive to balance them against maintaining the integrity of digital objects.

Categories of Commitment

  • Born digital materials – Rigorous effort will be made to ensure preservation in perpetuity of material selected for preservation. This effort will include preservation strategies such as: migration, emulation, and geographically distributed and redundant bit-level replication. Digitized materials with no available analog versions will be treated with the same care as born digital materials.
  • Digitized materials (available analog version) – In most cases the analog version of these materials will be considered the preservation format, and preservation activities for the digital materials will be limited to local and non-distributed bit-level replication. Whenever possible, digitized materials will be created using file formats conducive to long-term preservation activities. The cost of re-digitizing materials should be weighed against the cost of long-term preservation. In cases where the analog carriers of information are obsolete or at great risk of obsolescence, such as audio/visual materials, the digitized materials will be considered the preservation copy and treated with the same care as born digital materials.
  • Commercially available digital resources – Due to copyright and other licensing issues, the AMNH Library will make no effort to preserve commercially available digital resources, including published content as well as proprietary software. To the extent that it is possible, the AMNH Library will attempt to preserve archival data stored within proprietary systems.
  • Legacy digital materials – Legacy digital materials are materials acquired by the AMNH Library prior to the development of this digital preservation system. Many of these materials were not officially appraised, accessioned, or evaluated for long-term value. Some of these materials are stored on obsolete media, encoded in obsolete file systems or formats, or are otherwise inaccessible. When possible, we will attempt to recover this data and evaluate it for inclusion into the digital preservation system. However, the AMNH Library makes no guarantee that recovery will be successful or that it will be able to provide the resources necessary to attempt recovery.

Scope

This policy addresses the preservation of digital material for which the American Museum of Natural History Library is the primary custodian. This includes both assets of digital format that represent material with an analog counterpart and assets that were created digitally and have no analog counterpart (born-digital):

  • Digital resources created by the AMNH Library which are deemed to have long-term archival value.
  • Digital surrogates created by the AMNH Library (either internally or by an outside vendor) which are deemed to have long-term archival value.
  • Unique digital material acquired by the AMNH Library (through gift or purchase) as part of an archival or manuscript collection which is deemed to have long-term archival value and is not likely to be held or preserved anywhere else.
  • Note on material that is commercially available or developed and maintained through a consortium agreement: the AMNH Library has a responsibility to ensure that appropriate preservation activity is performed by someone (possibly, but not necessarily, the AMNH Library) to ensure ongoing access to this material for Library users and staff.

Collection Development Priorities

  • Born-digital material will be selected for retention according to the same Collection Development Policy that governs all AMNH Library and Special Collections material.
  • Material that is digitized from an analog source will be selected based on several factors, including:
    • Copyright status
    • Significance
    • Current or potential use
    • Physical condition
    • Uniqueness
  • Digitization activities will fall into one of the following categories:
    • AMNH Library digitization projects. These comprise material in existing collections and are completed as part of the Department’s general collection management activities. Digitization is completed either internally or in some cases by an outside vendor. These projects are not subject to specific deadlines.
    • Specially funded digitization projects made possible through grants or gifts. These may involve additional short-term staff and will be subject to specific deadlines and special project parameters.
    • Digitization based on user requests of materials that are rare or unique, fit the selection criteria, and receive regular use. This includes material requested by researchers in the archives, by faculty for teaching purposes, and by users through Interlibrary Loan. Although single-purpose digitization is necessary, it is not the focus of our digital collection development, and these materials will be evaluated according to the same criteria as other digitization projects.
  • Organization and metadata: Regardless of the importance of a collection or set of items, it must be organized and described before it can be digitized. Books must be cataloged. Archives and manuscript materials must be processed and have a finding aid, though item-level metadata may be created as part of the development of the digital collection. Metadata must follow community-based standards as well as all guidelines developed through the AMNH Library digital programs workflows. 

Levels of Preservation

The National Digital Stewardship Alliance (NDSA) has identified and refined 5 primary functional areas for digital preservation concerns and activity: Storage, Integrity, Control, Metadata, and Content. To evaluate a repository’s current state of digital preservation as well as provide guidance for repositories to improve preservation activities, the NDSA has developed the Levels of Digital Preservation chart.

The AMNH Library is challenged to balance its mandate to preserve content integrity with limited resources of staff, time, and funding. As such, we are committing to achieve a minimum of Level 1 or Level 2 in each of these functional areas (highlighted below) before the next policy review (December 2021). We will reassess our levels with each review of this policy and identify areas to improve.

*Note: As of November 2020, the Library is implementing a new Digital Asset Management System (Cortex). This DAMS will allow us to better meet these goals. Material in our DSpace digital repository may eventually be ingested into Cortex, but for now the assets in DSpace will remain in their current location. The last column of this chart attempts to disambiguate how these different sets of material are currently treated.

NDSA's Levels of Digital Preservation chart with AMNH's compliance reflected in blue.

Roles and Responsibilities

  • Creators (donors or depositors, most typically AMNH Departmental staff): The donor or depositor of the material selected for inclusion in the collection is responsible for the management of the material prior to transfer, cooperation in the successful transfer of the material, and the provision of requested and appropriate levels of description and identification. The donor will also work with the AMNH Library archivists to discuss and complete transfer agreements, use restrictions, and embargoes.
  • AMNH Library—Collections development (Special collections and digital archivists, Library management): Collection development activities will include communication and outreach both with users as well as potential donors and content creators/contributors. This will include explanation of material and metadata requirements, use agreements, permissions, levels of expectation for long-term sustainability, and frequency of collection (records management scheduling and processes). This group will also be responsible for creating and reviewing policy and procedure.
  • AMNH Library—Collections management (Digital archivist, Cataloging): Asset and metadata management and preservation activities will include but not be limited to acquisition, capture/transfer, copying, processing, long-term preservation, storage, disposal, description and restriction/access. Although there will be a minimal level of preservation for all resources, activity beyond that level will be flexible and driven by the nature of the material.
  • AMNH Library—Systems management (Digital Systems Librarian): Systems integration and performance activities will ensure the effective maintenance and functionality of the various components and tools used to preserve our digital assets and provide a robust user experience.
  • AMNH Library—Reference (Reference services): Reference responsibilities include the use and promotion of the digital collections and the tools and systems in place as well as the effective communication to appropriate team members of any issues in service, metadata, or requests.
  • AMNH Information Technology (IT): IT will be responsible for technological and system-wide support as well as the provision of a multi-level storage solution as a repository for the digital collections. IT will also coordinate with the Digital Archivist to identify and facilitate the capture, transfer, and preservation of enterprise material (staff accounts and profiles, email).
  • External partners and collaborators: The AMNH Library partners with outside services to provide additional support for digital preservation activities, including the design, implementation, and administration of a proprietary Digital Asset Management System. The roles and responsibilities of these partners are contractually defined and may vary over time. The AMNH Library is also a founding member of, and provides digital content to, the Biodiversity Heritage Library (BHL). Although BHL is trusted to carry out adequate preservation activities, this policy does not cover content shared with BHL.
  • Consumers (AMNH staff and volunteers, external users): The users are responsible for communicating access preferences, issues and concerns.

Policy Review Cycle

This policy will be reviewed annually to ensure regular updates based on advancements in technology, preservation strategies, and tools, and as the AMNH Library further refines and strengthens its preservation workflows and capabilities. The next scheduled policy review will be December 2021.

Version history: v.1 12/19; v.2 1/21

Appendix A: Glossary/Definitions 

[from National Digital Stewardship Alliance (NDSA) glossary and Digital POWRR Workshop - Glossary of Digital Preservation Terms]

Accession: The processes of receiving, preparing, cataloguing and storing digital resources in a form suitable for digital preservation. See also “Ingest.”

Authenticity: A mechanical characteristic of any digital object that reflects the degree of trustworthiness in the object, in that the supportive metadata accompanying the object makes it clear that the possessed object is what it purports to be. 

Backup: Additional copies of a digital asset made to protect against loss due to unintended destruction or corruption of the primary set of digital assets. The essential attribute of a backup copy is that the information it contains can be restored in the event that access to the master copy is lost.

Bit Preservation: A baseline preservation approach that ensures the integrity of digital objects and associated metadata over time in their original form, even as the physical storage media which houses them evolves and changes.

Checksum: An algorithmically computed numeric value for a file or a set of files used to validate the state and content of the file for the purpose of detecting accidental errors that may have been introduced during its transmission or storage. The integrity of the data can be checked at any later time by recomputing the checksum and comparing it with the stored one. If the checksums match, the data was almost certainly not altered. See also “Fixity Check.”
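
As an illustration of the recompute-and-compare step in this definition, the following is a minimal Python sketch. It assumes SHA-256 as the algorithm (this policy does not name one) and uses a hypothetical file name; in practice the recorded value would be stored in preservation metadata at ingest rather than held in a variable.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks so large files never load whole."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, stored_checksum, algorithm="sha256"):
    """Recompute the file's checksum and compare it with the value recorded earlier."""
    return file_checksum(path, algorithm) == stored_checksum.strip().lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "scan_0001.tif"  # hypothetical file name
        sample.write_bytes(b"original bytes")
        recorded = file_checksum(sample)  # value stored at ingest
        print(checksum_matches(sample, recorded))  # True
        sample.write_bytes(b"altered bytes")  # simulate corruption in storage
        print(checksum_matches(sample, recorded))  # False
```

Reading in fixed-size chunks keeps memory use constant, which matters for large audiovisual preservation masters.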

Description: The process of capturing, analyzing, organizing and recording information that serves to identify, manage, locate and explain data resources and the contexts that produced them. See also “Metadata.” 

Derivative: A transformed version of an original source file, often called a "service," "access," "delivery," "viewing" or "output" file, used to facilitate access to or additional use of the content. 

Digital Preservation: The series of managed activities, policies, strategies and actions to ensure the accurate rendering of digital content for as long as necessary, regardless of the challenges of media failure and technological change. 

File Format: Packages of information that can be stored as data files consisting of a fixed byte serialized encoding of a specified information model, and/or a fixed encoding of that encoding in a tangible form on a physical storage structure.

Fixity Check: A mechanism to verify that a digital object has not been altered in an undocumented manner. Checksums, message digests and digital signatures are examples of tools to run fixity checks. Fixity information, the information created by these fixity checks, provides evidence for the integrity and authenticity of the digital objects and is essential to enabling trust. See also “Checksum” and “Digital Signature.”
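
A fixity check is usually run across a whole collection rather than one file. The sketch below assumes a plain-text manifest with one `<SHA-256 digest>  <relative path>` line per file, similar in layout to the payload manifests used by the BagIt packaging format; the function names, folder layout, and manifest name are hypothetical and not part of any system named in this policy.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_path):
    """Record one '<digest>  <relative path>' line for every file under root."""
    root = Path(root)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    lines = [f"{sha256_of(p)}  {p.relative_to(root).as_posix()}" for p in files]
    Path(manifest_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def audit_manifest(root, manifest_path):
    """Recompute each recorded digest and sort files into ok, altered, and missing."""
    root = Path(root)
    report = {"ok": [], "altered": [], "missing": []}
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # ignore blank lines
        recorded, relative = line.split("  ", 1)
        target = root / relative
        if not target.is_file():
            report["missing"].append(relative)
        elif sha256_of(target) != recorded.lower():
            report["altered"].append(relative)
        else:
            report["ok"].append(relative)
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        collection = Path(tmp) / "collection"  # hypothetical collection folder
        (collection / "images").mkdir(parents=True)
        (collection / "images" / "plate_01.tif").write_bytes(b"plate one")
        (collection / "notes.txt").write_text("field notes", encoding="utf-8")
        manifest = Path(tmp) / "manifest-sha256.txt"  # kept outside the collection
        write_manifest(collection, manifest)

        (collection / "notes.txt").write_text("edited notes", encoding="utf-8")
        (collection / "images" / "plate_01.tif").unlink()
        print(audit_manifest(collection, manifest))
        # {'ok': [], 'altered': ['notes.txt'], 'missing': ['images/plate_01.tif']}
```

The manifest is written outside the collection folder so that it is never listed in itself. A file reported as altered or missing would then be restored from a copy whose fixity has been confirmed.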

Format Migration: A means of overcoming technical obsolescence by preserving digital content in a succession of current formats or in the original format that is transformed into the current format for presentation. The purpose of format migration is to preserve the digital objects and to retain the ability for clients to retrieve, display, and otherwise use them in the face of constantly changing technology.

Ingest: The process through which digital objects are added into a managed environment.

Instance: Any particular instantiation of a digital file, object or collection.

Life Cycle: A set of iterative, modular processes that govern the creation, acquisition, selection, description, sustainability, access and preservation of digital content over time.

Metadata - Administrative: Administrative metadata comprises both technical and preservation metadata and is generally used for internal management of digital resources.

Metadata - Descriptive: Metadata that identifies a resource and describes its intellectual content for purposes such as discovery and identification.

Metadata - Preservation: The contextual information necessary to carry out, document, and evaluate the processes that support the long-term retention and accessibility of digital content. Preservation metadata documents the technical processes associated with preservation, specifies rights management information, establishes the authenticity of digital content, and records the chain of custody and provenance for a digital object.

Metadata - Rights Management: Administrative metadata that indicates the copyrights, user restrictions, and license agreements that might constrain the end use of digital content (including metadata files).

Metadata - Structural: Metadata used to describe the logical or physical types, versions, relationships or other characteristics of content files comprising a complex digital object.

Metadata - Technical: Metadata that describes the technical state of and process used to create a file. Often closely related either to its file format or the original software used to create the file, e.g. scanning equipment and settings used to create or modify a digital object.

Open Archival Information System (OAIS) Reference Model: Developed by the Consultative Committee for Space Data Systems (CCSDS), a conceptual framework and reference tool for defining a digital repository. It provides a model of the environment, functions, and data types for implementing a digital repository.

Permissions: The access available to system users attached to specific roles in a computing environment, as well as the mechanism for administering access to a specific object on a computer system. Depending on the system or application, permissions can be defined for a specific user, specific groups of users, or all users; or for a role, or groups of roles; or based on one or more user attributes.

Preservation Copy: Digital content targeted for preservation that is considered the master version of the intellectual content of any arbitrary digital resource. Preservation master files may capture additional information about the original beyond the content itself. Because they are created to high capture standards, preservation master files could take the place of the original record if the original was destroyed, damaged, or not retained. Preservation masters generally do not undergo significant processing or editing. Preservation masters are often used to make other copies including reproduction and distribution copies.

Provenance: Information on the origin of a digital object and also on any changes that may have occurred over the course of its life cycle.

Verify: The process of checking a copy of a data file to make sure that it is exactly equal to the original data file, or that a file remains unchanged over time.
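
For the first sense of “verify”, checking a copy against the original, Python's standard `filecmp` module can compare two files byte by byte. The sketch below assumes the copies are reachable as local paths (for example, mounted replica storage); the function name and file names are hypothetical.

```python
import filecmp
import tempfile
from pathlib import Path


def verify_copies(original, copies):
    """Return the copies that are missing or not byte-for-byte identical to the original."""
    failures = []
    for copy in map(Path, copies):
        # shallow=False makes filecmp compare file contents, not just os.stat() signatures.
        if not copy.is_file() or not filecmp.cmp(original, copy, shallow=False):
            failures.append(copy)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        original = folder / "master.wav"  # hypothetical preservation master
        original.write_bytes(b"RIFF audio payload")
        good = folder / "replica_a.wav"
        good.write_bytes(original.read_bytes())  # faithful copy
        bad = folder / "replica_b.wav"
        bad.write_bytes(b"RIFF audio payloaX")  # same size, one byte differs
        missing = folder / "replica_c.wav"  # never written
        print([p.name for p in verify_copies(original, [good, bad, missing])])
        # ['replica_b.wav', 'replica_c.wav']
```

Byte-by-byte comparison needs both files at hand. For replicas held in separate locations, comparing checksums recorded at ingest avoids moving whole files across the network.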

Appendix B: Recommended File Formats

[Adapted from Smithsonian Institution Archives Recommended Preservation Formats for Electronic Records]

Type | Primary Preservation Format (preferred) | Secondary Preservation Format (acceptable)
Text/word processing applications | PDF/A | PDF, RTF (text), TXT, XML with schema
Spreadsheet applications or structured data | PDF/A (must capture entire workbook, macros disabled) | PDF, CSV, tab-delimited, TXT, XML
Presentations | PDF/A | PDF, Original
Images | TIFF (uncompressed) | JPG, DNG, PNG, JP2
Graphics | TIFF | PDF
Video | .dv, MOV, AVI | MPEG-4
Audio | BWF (Broadcast WAV; .wav is the extension) | WAV, AIFF, FLAC
Email messages/account | tbd | tbd
Database Management Systems (DBMS) | Keep original | XML with schema
Web archived material | tbd | tbd
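
During accession, the table above can serve as a first-pass triage of a transfer. The sketch below maps file extensions to the formats named in the table; the extension choices (for example `.tsv` for tab-delimited data and `.mp4` for MPEG-4) are assumptions, and an extension alone cannot confirm PDF/A conformance, uncompressed TIFF, or Broadcast WAV, which require a format identification or validation tool.

```python
from collections import defaultdict
from pathlib import PurePath

# Assumed mapping from file extensions to formats named in the Appendix B table.
# An extension is only a hint: it cannot show whether a PDF conforms to PDF/A,
# whether a TIFF is uncompressed, or whether a .wav file is Broadcast WAV.
LISTED_EXTENSIONS = {
    ".pdf": "PDF/A or PDF",
    ".rtf": "RTF",
    ".txt": "TXT",
    ".xml": "XML",
    ".csv": "CSV",
    ".tsv": "Tab-delimited",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".jpg": "JPG",
    ".jpeg": "JPG",
    ".dng": "DNG",
    ".png": "PNG",
    ".jp2": "JP2",
    ".dv": "DV",
    ".mov": "MOV",
    ".avi": "AVI",
    ".mp4": "MPEG-4",
    ".wav": "BWF or WAV",
    ".aif": "AIFF",
    ".aiff": "AIFF",
    ".flac": "FLAC",
}


def triage_by_extension(filenames):
    """Group file names by the listed format their extension suggests; unlisted ones need review."""
    groups = defaultdict(list)
    for name in filenames:
        suffix = PurePath(name).suffix.lower()  # case-insensitive match, "" if no extension
        groups[LISTED_EXTENSIONS.get(suffix, "needs review")].append(name)
    return dict(groups)


if __name__ == "__main__":
    print(triage_by_extension(["report.PDF", "plate_01.tif", "slides.pptx", "interview.wav"]))
    # {'PDF/A or PDF': ['report.PDF'], 'TIFF': ['plate_01.tif'], 'needs review': ['slides.pptx'], 'BWF or WAV': ['interview.wav']}
```

Files grouped under “needs review” are not rejected: presentations and databases may be kept in their original formats, and the formats for email and web archives are still to be determined.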

 

Appendix C: DAM Storage Architecture

Cortex architecture diagram with containers.
Courtesy of Orange Logic
Cortex architecture diagram illustrating how user and API clients interface with primary and backup servers.
Courtesy of Orange Logic

Appendix D: Sources/References