AMNH’s Hackathon Brings Cutting-Edge Technology to Darwin Manuscripts

See the article in The New York Times – How Darwin Evolved

The third annual hackathon, Hack the Stacks, focused on the AMNH Library. The hackathon is a pro bono collaboration with the museum: an overnight, 24-hour challenge event that aims to advance open science and create open-source solutions to problems faced by scientists and researchers. Teams of computer science specialists applied their skills to challenges identified by the AMNH Library staff, developing tools that would enrich the Library’s digital content.

The project undertaken by the Hackathon Darwin team was to develop a computer prototype that reassembles pieces of cut or torn manuscript pages, reconstructing the notes as originally written. Darwin arranged and classified his notes according to the subjects that engaged him at the time. He would cut or tear off part of a page of notes and reuse it by moving it to a different location, perhaps another topical portfolio, relevant to a new idea at a later time. By reassembling the pages as they were originally written, researchers hope to understand the path of Darwin’s research and the gradual maturation of the thinking underlying his theories.

The fragments are difficult to unite by eye because the pieces can be so widely separated. In addition, the edges, whether cut or torn, may be curved, saw-toothed, or fuzzy, and they usually have no distinctive identifying characteristics, which makes it quite challenging to pair complementary edges visually.

Hackathon-6: single edge (cut)
Hackathon-7: single edge (saw-tooth)

The computer experts sought to solve this challenge by reassembling pieces of manuscript pages digitally. Coding and image processing were applied to differentiate the edges. The Fourier transform, a tool that represents a waveform as a sum of sinusoidal functions, was used to normalize the edge plots. Euclidean distances were calculated between the resulting edge plot curves, the closest candidates were ranked, and the top matches were identified. Their work will accomplish in months what would take years to do by hand.
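The matching steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the team's actual code: it assumes each edge has already been extracted from the page image as a one-dimensional profile (the deviation of the edge from a straight line, sampled along its length), and the names `edge_descriptor` and `top_matches`, the sample counts, and the synthetic profiles are all hypothetical. Using Fourier magnitudes is one possible normalization; it ignores a constant offset and a sign flip, so an edge and its complement get nearly the same descriptor.

```python
import numpy as np


def edge_descriptor(profile, n_samples=128, n_coeffs=16):
    """Turn an edge profile into a short, normalized Fourier descriptor.

    Assumption: `profile` is a 1-D sequence of edge offsets.
    """
    profile = np.asarray(profile, dtype=float)
    # Resample so edges of different lengths become comparable.
    x_old = np.linspace(0.0, 1.0, len(profile))
    x_new = np.linspace(0.0, 1.0, n_samples)
    resampled = np.interp(x_new, x_old, profile)
    # Keep the magnitudes of the low-frequency terms, skipping the
    # constant (DC) term so a uniform offset has no effect.
    mags = np.abs(np.fft.rfft(resampled)[1:n_coeffs + 1])
    norm = np.linalg.norm(mags)
    # Divide by the norm so overall scale does not matter.
    return mags / norm if norm > 0 else mags


def top_matches(query, candidates, k=3):
    """Rank candidate edges by Euclidean distance to the query edge."""
    q = edge_descriptor(query)
    scored = []
    for name, profile in candidates.items():
        distance = float(np.linalg.norm(q - edge_descriptor(profile)))
        scored.append((distance, name))
    scored.sort()
    return [(name, distance) for distance, name in scored[:k]]


# Synthetic example (made-up data, not real manuscript edges).
t = np.linspace(0.0, 1.0, 200)
saw = (8 * t) % 1.0                      # a saw-tooth edge
candidates = {
    "complement": 5.0 - saw,             # the mating edge of the same cut
    "curved": np.sin(np.pi * t),         # an unrelated curved edge
    "straight": np.zeros_like(t),        # an unrelated straight cut
}
ranked = top_matches(saw, candidates, k=2)
print(ranked)
```

In this toy example the complementary edge ranks first because its descriptor is almost identical to the query's. Real fragments would presumably need more care, for instance handling fuzzy torn edges and partial overlaps, which this sketch does not attempt.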

The first two matches were found by computer in January 2017. In one, the bottom of a page had been cut off and moved 39 pages away, to a section on the same genus within the same portfolio of notes for The Power of Movement in Plants, 1880. In the other, two fragments from a single page of notes, which had come to reside in two different portfolios (Sexual Selection in Birds and Sexual Selection in Mammals) for The Descent of Man, 1871, were brought together; the information on this page ended up in two different chapters of the published book.

Hackathon-3: Match 1
Hackathon-4: Match 2

The second match, especially, would be very difficult, if not impossible, to identify by eye as two complementary pieces of a once-intact page, and it highlights the truly exciting potential of the digital application developed by the Hackathon Darwin Team.

View PowerPoint presentation – Hacking Darwin

For more information about Hack the Stacks and Team Darwin, read the article in Library Journal