The Evolving Archive: How Born-Digital Assets and AI are Changing the Landscape for the AMNH Research Library

by Jennifer Cwiok on


The American Museum of Natural History at the corner of 77th St. and Central Park West, photographer unknown.
The following article was developed to support an upcoming presentation that Jen Cwiok and Kendra Meyer will give at the 2022 Henry Stewart DAM and Museums conference. A version of the article was originally published on LinkedIn. 

Building a Born-Digital Archive Program with our DAMS

One of the more personally satisfying outcomes of implementing a Digital Asset Management system (DAMS) at the American Museum of Natural History (AMNH) Research Library has been establishing a born-digital archives collection program. The Archives of the AMNH Research Library are expansive, capturing our 150+ year institutional history. Because the Library was founded with the establishment of our institution in 1869, we are truly the Museum’s memory keeper. Our archival collections include a wide variety of formats, including over a million photographic negatives, films, field books, correspondence, manuscript collections, art, memorabilia, and institutional records.

Contrary to the trope of dusty attics and basements, an archive’s ultimate purpose is to be accessed. Over time the AMNH Library has systematically digitized selected material from these physical collections, with particular attention to visual material such as images and films. These surrogates of analog material substantially became what we called ‘Digital Special Collections’, our former image database. But what about born-digital materials – those which originate in digital format and have no physical source or surrogate? One might reasonably argue that an increased number of archival records will present themselves in this format in the future. Were we prepared for this? The answer was no, and we knew it.

This is not to say there was no born-digital content in our collections at all. Any images created by the AMNH Photo Studio in recent years were born-digital, and many were added to our image database; some digital content also existed in our pending accessions and backlog (one guess why these are in the backlog!). But with no viable management, storage, and preservation solution, we could not begin to address this material in good conscience, so we simply didn’t.

Enter the DAMS. In 2019, the AMNH Library received generous grant funding in the form of the Shelby White and Leon Levy Archive Initiative, allowing for a series of transformative projects, including the adoption of the Library's first Digital Asset Management system. The expanded services and capabilities supported by a DAMS would necessitate expanded direction around policy, workflows, standards, and permissions. Besides these foundational decisions determining how the DAMS would be governed, we needed to wrap our heads around anticipated collection growth. With the facility to incorporate born-digital collections looming on the horizon, we also had to strategize how precisely we would manage them. Knowing that a DAMS would make these collections possible, how could the DAMS work with our entire suite of tools and services to actually support them?

Although digital assets have additional preservation and technical needs, in essence, they are just another format for archivists to manage: nitrate film, acetate film, paper, digital. Theoretical archival tenets such as original order, description and arrangement would equally apply to this format. And just as there exists a common and ridiculous misconception that digitization is the answer to all archival problems, it is similarly tempting to fall into the mindset that a DAMS would be a single miracle cure. In reality, it is just another tool. Along with our existing suite of collection management and discovery applications, we needed to learn how best to use this tool to do our work.

We use the open-source system ArchivesSpace for our archival collection management, and it was determined that this would continue to be the catalog or database of record for our growing digital collections. So, while assets are described in the DAMS, the collections – their provenance, their arrangement, their context – are described in ArchivesSpace. However, unlike a shelf holding a collection in 5-inch document boxes, the DAMS can provide descriptive data and access to content along with storage. Why not rely on the DAMS for all descriptions? It was critical to clearly define the purpose of each tool, what assets and data would be held in each, where responsibility for maintaining those materials and data lies, and how we would share it. Stumbling blocks popped up at each stage of the archival workflow. How will we classify this format? Would it be accessioned like analog material? Can we create a backlog, or do we have the resources to process collections immediately? What about hybrid collections made up of both physical and digital assets? In our institution, we have a clear division of labor for these, so who would be responsible? Lastly, in the future, when our existing collections subtly shift format, how can we continue accruals?

There is no right or wrong answer to these questions, and we are continually growing, but at the heart of it, the DAMS is just one link in the chain, albeit a critical one. This way of thinking will help as we hope to someday integrate with other departments in the Science Divisions and across the Museum.

Using AI to Connect Collections Data to Digital Assets

As we expand the archival collection to include born-digital assets, we are also scaling the DAMS to meet the requirements of the Scientific Divisions. The photographic collection consists of both digitized and born-digital photographs of specimens and artifacts within the AMNH scientific collections. These photographs serve dual purposes: they are surrogates for the artifacts and specimens, and they are archival photographs that make up the archival photographic collection. Traditionally, these assets have been cataloged with the data essential to managing them within the photographic archival collection, but not as surrogates of artifacts and specimens. This means the data pertaining to the specimen or artifact is often incomplete or suspect because it did not originate from an authoritative source like a CMS (Collection Management System).

The missing link in this equation is a meaningful connection between the various collection management systems at play and the legacy databases we’ve used to catalog digital assets. This gap has led us to overburden digital asset records in an attempt to capture as much collection data as possible. Without an institutional DAMS, collections databases are also stretched doing double duty: representing data pertaining to the collection item and storing a high-resolution digital surrogate of the item. The DAMS relieves the CMS of having to manage digital assets and has forced us to reconsider our cataloging practices when it comes to photographs of artifacts and specimens.

Note the differences in the metadata between the two examples below:

Image in the DAMS with photographic metadata only assigned to the asset


Image in the Anthropology collection database with object metadata assigned to the asset

To remediate this problem, we’ve implemented a metadata schema loosely based on VRA Core (Visual Resources Association) where data pertaining to the artifact or specimen is captured in one panel and metadata corresponding to the digitized or born-digital asset is captured in another. Setting up our metadata in this way removes confusion about what is being cataloged: the photograph itself versus the artifact or specimen represented in the photograph. Re-cataloging these assets manually would be an insurmountable task, however, so we see artificial intelligence (AI) as a powerful tool for automating this process and connecting digital assets to collections data.
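To make the two-panel idea concrete, here is a minimal sketch of how such a record might be modeled. It is an illustration only: the class names, field names, and the sample asset ID are hypothetical and are not taken from the AMNH schema or from VRA Core itself.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class WorkPanel:
    """Describes the artifact or specimen shown in the image (hypothetical fields)."""
    department_catalog_number: Optional[str] = None
    object_name: Optional[str] = None
    source_system: Optional[str] = None  # where the values came from, e.g. a CMS


@dataclass
class ImagePanel:
    """Describes the photograph or digital file itself (hypothetical fields)."""
    asset_id: str
    photographer: Optional[str] = None
    date_created: Optional[str] = None
    born_digital: bool = False


@dataclass
class AssetRecord:
    """One DAMS record: image metadata in one panel, object metadata in another."""
    image: ImagePanel
    work: WorkPanel = field(default_factory=WorkPanel)

    def needs_collection_data(self) -> bool:
        # True until object metadata has been filled in from an authoritative source.
        return self.work.source_system is None


# A born-digital image cataloged with photographic metadata only.
record = AssetRecord(image=ImagePanel(asset_id="img-0001", born_digital=True))
print(record.needs_collection_data())
print(asdict(record)["work"])
```

Keeping the two panels as separate structures means that object data copied in later from a collections database never overwrites what is known about the photograph itself.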

Image records for digitized and born-digital assets may contain a Department Catalog Number, the unique identifier assigned to an artifact or specimen by the scientific departments, which may be used to connect the image to the item in the scientific department's CMS. Too often, the identifiers that this valuable connection hinges upon are not available. We propose that AI can be used to find matches between digital photographs of collection items whether the Department Catalog Number is in the metadata or not. At first, we proposed AI could be the solution for both specimens and artifacts, but we quickly realized that photographs of specimens are a hard nut to crack. AI may be used to identify the specimen, but is it the specimen in the AMNH scientific collection? This realization narrowed the scope to using photographs of artifacts within the Division of Anthropology’s collection.

1st Pass Objectives Using AI to Connect Digital Assets to Collections Data

For digital asset records where we have the Department Catalog Number, AI can be used to find matches in the Anthropology CMS. Where the metadata lacks a Department Catalog Number, we propose that AI image recognition could still find matches. Either way, a match allows us to integrate authoritative collections metadata into the digital asset records in the DAMS.
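The first-pass logic can be sketched as a two-step lookup. This is a rough illustration under stated assumptions, not a description of the system in use: the article does not name an image-recognition technique, so cosine similarity over precomputed feature vectors stands in for it here, and the catalog numbers, dictionary keys, vectors, and threshold are all hypothetical.

```python
import math
import re
from typing import Dict, List, Optional, Tuple


def normalize_catalog_number(raw: str) -> str:
    """Remove whitespace and upper-case so ' a/100 ' and 'A/100' compare equal."""
    return re.sub(r"\s+", "", raw).upper()


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_asset(
    asset: dict,
    cms_by_number: Dict[str, dict],
    cms_vectors: Dict[str, List[float]],
    threshold: float = 0.9,  # hypothetical cut-off; would need tuning on real data
) -> Tuple[Optional[str], str]:
    """Return (catalog number, method used), or (None, 'unmatched')."""
    # Step 1: exact match on the Department Catalog Number, if the record has one.
    raw = asset.get("catalog_number")
    if raw:
        key = normalize_catalog_number(raw)
        if key in cms_by_number:
            return key, "catalog_number"
    # Step 2: fall back to the most similar CMS image, if it is similar enough.
    vector = asset.get("vector")
    if vector:
        best_key, best_score = None, 0.0
        for key, candidate in cms_vectors.items():
            score = cosine_similarity(vector, candidate)
            if score > best_score:
                best_key, best_score = key, score
        if best_key is not None and best_score >= threshold:
            return best_key, "image_similarity"
    return None, "unmatched"


# Hypothetical CMS data: records keyed by catalog number, plus one vector per image.
cms_by_number = {"A/100": {"object_name": "basket"}, "B/200": {"object_name": "mask"}}
cms_vectors = {"A/100": [1.0, 0.0, 0.0], "B/200": [0.0, 1.0, 0.0]}

print(match_asset({"catalog_number": " a/100 "}, cms_by_number, cms_vectors))
print(match_asset({"vector": [0.1, 0.99, 0.0]}, cms_by_number, cms_vectors))
print(match_asset({"vector": [1.0, 1.0, 1.0]}, cms_by_number, cms_vectors))
```

Returning the matching method alongside the catalog number is a deliberate choice in this sketch: it would let staff treat identifier matches as reliable while routing similarity-based matches to a person for review before any collections metadata is copied into the DAMS record.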

2nd Pass Objectives Using AI to Identify Collection Artifacts within Archival Field Photographs

Items within the Division of Anthropology collections are a result of expeditions throughout the Museum's history. AI image recognition can identify artifacts within an archival field photograph. Adding the Department Catalog Number to the metadata in the record will enable a connection between the archival field photograph and the collection record describing the artifact.

An example of field photographs from the Jesup North Pacific Expedition (1897-1902) and the related items in the Division of Anthropology’s collection

The implementation of an institutional DAMS has catapulted the AMNH Archive into a new era. No longer are we restricted to focusing on analog photographs of the Museum’s past; we are now able to incorporate current born-digital assets, allowing us to create a living, breathing archive documenting all aspects of the Museum. Relating digital assets to their appropriate collections is as important as providing access through the DAMS. AI will be an important cornerstone in creating these data integrations between digital assets and collections data, making connections to tell the Museum’s story and helping write its future.

This entry was written by Jennifer Cwiok, Digital Systems Librarian, and Kendra Meyer, Shelby White & Leon Levy Digital Archivist.