Building the OCP Digital Collections

Harvard’s Open Collections Program (OCP) works in careful collaboration with the University’s distinguished faculty, librarians, and curators to develop highly specialized "open collections," which are available to Internet users everywhere. In developing these collections, OCP produces digital objects and catalog records that are open, useful, and persistent. More detailed information may be found by following the links in the right panel to the selected best practices and technical standards OCP routinely uses.


OCP’s online collections increase the availability and use of historical resources for teaching and research. OCP works to meet research and learning needs for college students, faculty, independent researchers, and members of the public interested in historical perspectives on topics of contemporary relevance.

Through carefully developed web interfaces, users find thematically organized, easily searchable selections from Harvard’s libraries, archives, and museums. These collections do not, however, aggregate everything available at Harvard on a broad topic. In selecting materials, OCP applies common principles that guide each collection.

Broad topic ideas originate in the OCP Executive Committee and are finalized by the OCP Content Committee, project faculty advisory committees, and project working groups of librarians, curators, and archivists with collections expertise. Nominated topics are assessed for value and feasibility against the following criteria:

  • The subject should utilize and represent a number of Harvard collections.
  • The subject should have a broad appeal for teaching not just at Harvard, but at schools and colleges across the country and around the world.
  • The subject should utilize a wide range of materials—books, pamphlets, manuscripts, images—that represent global perspectives.
  • The subject should not be too general.
  • The subject should complement, not duplicate, the work of other digitization initiatives.


With subjects in place, OCP collection development specialists evaluate candidate items according to topic relevance and ownership (rights to digitize and distribute material). OCP identifies materials in the aggregate to maximize the number of Harvard holdings that can be digitized within project timelines and budgets.

In consultation with special collections conservators in the Weissman Preservation Center, OCP digital processing librarians also review candidate items for completeness and condition.

Digitization and Discovery

Project cataloging and descriptive metadata practices are designed to promote discovery of digitized items in the environments that students, teachers, and researchers use. OCP applies community standards for bibliographic description, assigns persistent links to digital objects, and stores metadata in centrally supported library systems. Open standards and protocols (the MODS metadata schema and the OAI-PMH harvesting protocol) facilitate discovery in major Internet search engines, as well as in library catalogs and project databases for OCP web sites.
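
To illustrate how MODS records exposed over OAI-PMH can be harvested, here is a minimal sketch in Python. It is not OCP's implementation: the base URL, the identifiers, the sample record, and the function names are all hypothetical and chosen only for illustration. Only the OAI-PMH request parameters (verb, metadataPrefix, set) and the two XML namespaces come from the published standards.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and MODS v3 specifications.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
MODS_NS = "http://www.loc.gov/mods/v3"
NS = {"oai": OAI_NS, "mods": MODS_NS}


def build_list_records_url(base_url, metadata_prefix="mods", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is a hypothetical repository endpoint; a real harvester
    would take it from the repository's own documentation.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# A made-up response containing one MODS record, used in place of a
# live network call so the sketch runs offline.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:item-1</identifier></header>
      <metadata>
        <mods xmlns="http://www.loc.gov/mods/v3">
          <titleInfo><title>Sample pamphlet</title></titleInfo>
          <location><url>https://example.org/persistent/item-1</url></location>
        </mods>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_titles_and_links(xml_text):
    """Return (identifier, title, persistent link) for each harvested record."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(
            "oai:metadata/mods:mods/mods:titleInfo/mods:title", namespaces=NS
        )
        link = record.findtext(
            "oai:metadata/mods:mods/mods:location/mods:url", namespaces=NS
        )
        results.append((identifier, title, link))
    return results


if __name__ == "__main__":
    print(build_list_records_url("https://example.org/oai"))
    for row in extract_titles_and_links(SAMPLE_RESPONSE):
        print(row)
```

A search engine or library catalog would issue a request like the one built above, then index the title and persistent link from each returned record. A production harvester would also need to handle resumption tokens and error responses, which this sketch omits.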

When supported by optical character recognition (OCR) software, machine-printed texts in a variety of languages are digitized to facilitate full-text as well as catalog searching. OCR-generated texts are not corrected to 100% accuracy, so they may contain character recognition errors and should not be treated as exact transcriptions of the original materials.

Digital imaging and structural metadata practices have evolved with technologies and institutional expertise—primarily in HCL Imaging Services—to produce complete, legible, navigable, citable, and portable electronic reproductions delivered by the centrally managed delivery systems of the Harvard University Library’s Office for Information Systems. Digitization processes and practices for materials preparation and quality control balance mandates for safe handling, high rates of throughput, and affordability.