{{Short description|Protocol to harvest metadata}}
[[File:OAI-PMH.jpg | thumb | right | Logo of the Open Archives Initiative]]
The [[Open Archives Initiative]] '''Protocol for Metadata Harvesting''' ('''OAI-PMH''') is a protocol developed for [[data harvesting|harvest]]ing [[metadata (computing)|metadata]] descriptions of records in an archive so that services can be built using metadata from many archives. An [[implementation]] of OAI-PMH must support representing metadata in [[Dublin Core]], but may also support additional representations.<ref name=":0">{{Cite journal |last=Lynch |first=Clifford A. |date=August 2001 |title=Metadata harvesting and the Open Archives Initiative |url=https://www.cni.org/wp-content/uploads/2001/08/Metadata-Harvesting-and-the-Open-Archives-Initiative.pdf |journal=ARL: A Bimonthly Report |issue=217 |archive-url=https://web.archive.org/web/20120525000130/http://www.arl.org/resources/pubs/br/br217/br217mhp.shtml |archive-date=25 May 2012}}</ref><ref name="Breeding">{{Cite journal |author=Marshall Breeding |date=September 2002 |title=Understanding the Protocol for Metadata Harvesting of the Open Archives Initiative |journal=Computers in Libraries |volume=22 |number=8 |pages=24–29 |url=https://librarytechnology.org/document/9944 |access-date=2021-02-08}}</ref>

The protocol is usually just referred to as the OAI Protocol.

OAI-PMH uses [[XML]] over [[HTTP]]. Version 2.0 of the protocol was released in 2002; the specification document was last updated in 2015 and is published under a [[Creative Commons license|Creative Commons BY-SA license]].

==History== <!--This summary was largely drawn from (Lynch, 2001).-->

In the late 1990s, [[Herbert Van de Sompel]] ([[Ghent University]]) was working with researchers and librarians at [[Los Alamos National Laboratory]] (US) and called a meeting to address [[interoperability]] problems among [[Print server|e-print servers]] and [[digital library|digital repositories]]. The meeting was held in [[Santa Fe, New Mexico]], in October 1999.<ref>{{Cite journal|last=Marshall|first=E.|year=1999|title=Researchers plan free global preprint archive|url=https://www.science.org/doi/10.1126/science.286.5441.887a|journal=Science|volume=286|issue=5441|pages=887a–887|doi=10.1126/science.286.5441.887a|pmid=10577235|s2cid=178990556 |via=|url-access=subscription}}</ref> A key development from the meeting was the definition of an interface that permitted e-print servers to expose [[metadata]] for the papers they held in a structured fashion, so that other repositories could identify and copy papers of interest. This interface/protocol was named the "Santa Fe Convention".<ref name=":0" /><ref name="Breeding"/><ref>{{Cite web |title=The Santa Fe Convention by the Open Archives Initiative |url=http://www.openarchives.org/sfc/sfc_entry.htm |date=February 15, 2000 |website=Open Archives Initiative |access-date=May 29, 2022}}</ref>

Several workshops were held in 2000 at the ACM Digital Libraries conference,<ref>{{Cite web|title=The Santa Fe Convention of the Open Archives Initiative|url=https://dspace.library.uu.nl/bitstream/handle/1874/3142/VandeSompelDLib2000SantaFe.htm?sequence=2&isAllowed=y|access-date=2021-02-10|website=dspace.library.uu.nl}}</ref> at the 1st ACM/IEEE-CS joint conference on Digital libraries<ref>{{Cite book|editor1=Edward A. Fox|editor2= Christine L. Borgman|language=en|location=Roanoke, Virginia, United States|publisher=ACM Press|volume=|pages=|doi=10.1145/379437|isbn=978-1-58113-345-5 |title=Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries |date=2001 }}</ref><ref>{{Cite book|last1=Lagoze|first1=Carl|last2=Van de Sompel|first2=Herbert|title=Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries |chapter=The open archives initiative |date=2001 |citeseerx=10.1.1.161.6800|chapter-url=http://portal.acm.org/citation.cfm?doid=379437.379449|language=en|location=Roanoke, Virginia, United States|publisher=ACM Press|volume=|pages=54–62|doi=10.1145/379437.379449|isbn=978-1-58113-345-5|s2cid=1315824 |via=}}</ref> and elsewhere to share the ideas from the Santa Fe Convention.<ref>{{Cite journal|last1=Van de Sompel|first1=Herbert|last2=Lagoze|first2=Carl|year=2000|title=The Santa Fe Convention of the Open Archives Initiative|url=http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html|journal=D-Lib Magazine|language=en|volume=6|issue=2|pages=|doi=10.1045/february2000-vandesompel-oai|doi-access=free|issn=1082-9873|via=}}</ref> It was discovered at the workshops that the problems faced by the e-print community were also shared by libraries, museums, journal publishers, and others who needed to share distributed resources. 
To address these needs, the [[Coalition for Networked Information]]<ref>{{cite web |url=http://www.cni.org/ |title=Homepage |website=Coalition for Networked Information |access-date=May 29, 2022}}</ref> and the [[Digital Library Federation]]<ref>{{cite web |url=http://www.diglib.org/ |title=Homepage |website=Digital Library Federation |access-date=May 29, 2022}}</ref> provided funding to establish an [[Open Archives Initiative]] (OAI) secretariat managed by Herbert Van de Sompel and Carl Lagoze. The OAI held a meeting at [[Cornell University]] ([[Ithaca, New York]]) in September 2000 that aimed to improve the interface developed at the Santa Fe Convention.<ref>{{Cite web|title=OAi-tech Meeting, Cornell University, September 7-8 2000|url=http://www.openarchives.org/meetings/tech-Cornell/oai-tech-cornell.htm|access-date=2021-02-10|website=www.openarchives.org}}</ref> The specifications were refined over e-mail.

OAI-PMH version 1.0 was introduced to the public in January 2001 at a workshop in [[Washington D.C.]],<ref>{{Cite web|last=|first=|date=|title=The Open Archives Initiative: Open Meeting Renaissance Hotel, Washington DC January 23, 2001|url=https://www.openarchives.org/meetings/DC2001/OpenMeeting.html|access-date=2021-02-10|website=www.openarchives.org}}</ref> and another in February in [[Berlin, Germany]].<ref>{{Cite web|last=|first=|date=|title=The Open Archives Initiative: Open Meeting Staatsbibliothek zu Berlin, Germany February 26, 2001|url=https://www.openarchives.org/meetings/Berlin2001/OpenMeeting-agenda.html|access-date=2021-02-10|website=www.openarchives.org}}</ref> Subsequent modifications to the [[XML]] standard by the [[W3C]] required making minor modifications to OAI-PMH resulting in version 1.1. The current version, 2.0, was released in June 2002. It contained several technical changes and enhancements and is not backward compatible.<ref>{{Cite journal|last1=Van de Sompel|first1=Herbert|last2=Young|first2=Jeffrey A.|last3=Hickey|first3=Thomas B.|year=2003|title=Using the OAI-PMH ... Differently|url=http://www.dlib.org/dlib/july03/young/07young.html|doi-access=free|journal=D-Lib Magazine|volume=9|issue=7/8|pages=|doi=10.1045/july2003-young|issn=1082-9873|via=}}</ref>

== OAI workshops ==

Since 2001, [[CERN]], later in collaboration with the [[University of Geneva]], has organized bi-annual OAI workshops,<ref>{{Cite web |title=Previous OAI Workshops – OAI |url=https://oai.events/previous-oai-workshops/ |access-date=2023-01-13 |website=The Geneva Workshop on Innovations in Scholarly Communication |language=en-US}}</ref> which over time have developed to cover most aspects of [[open science]]. Since 2021 the workshop series has been named the Geneva Workshop on Innovations in Scholarly Communication, with the nickname OAI reflecting its origin.<ref>{{Cite web |last=Azwa |first=Adnan Siti Norfateha |title=Library Guide: Open Access Guide: The Latest on OA |url=https://umlibguides.um.edu.my/OpenAccess/OANews |access-date=2023-01-13 |website=umlibguides.um.edu.my |language=en}}</ref>

==Uses==

Some commercial [[search engine]]s use OAI-PMH to acquire more resources. [[Google]] initially included support for OAI-PMH when launching sitemaps, but decided to support only the standard XML [[Sitemaps]] format in May 2008.<ref>{{cite web |url=https://googlewebmastercentral.blogspot.com/2008/04/retiring-support-for-oai-pmh-in.html |title=Retiring Support for OAI-PMH in Sitemaps |date=April 23, 2008 |website=Google Search Central Blog |access-date=May 29, 2022}}</ref> In 2004, [[Yahoo!]] acquired content from [[OAIster]] ([[University of Michigan]]) that was obtained through metadata harvesting with OAI-PMH. [[Wikimedia Foundation|Wikimedia]] uses an OAI-PMH repository to provide feeds of [[Wikipedia]] and related site updates for search engines and other bulk analysis/republishing endeavors.<ref>{{Cite web |title=Wikimedia update feed service |url=http://meta.wikimedia.org/wiki/Wikimedia_update_feed_service |publisher=Wikimedia Meta-Wiki |access-date=14 July 2013}}</ref> When thousands of files are harvested every day, OAI-PMH can reduce network traffic and other resource usage through incremental harvesting.<ref>{{cite web |url=http://www.dlxs.org/docs/13/ancil/harvest.html |title=OAI Harvesting System |website=DLXS |access-date=May 29, 2022}}</ref> NASA's [[Mercury (metadata search system)|Mercury]] metadata search system uses OAI-PMH to index thousands of metadata records from the Global Change Master Directory (GCMD) every day.<ref>{{Cite journal |title=Data sharing and retrieval using OAI-PMH |author1=R. Devarakonda |author2=G. Palanisamy |author3=J. Green |author4=B. Wilson |year=2010 |journal=Earth Science Informatics |volume=4 |issue=1 |pages=1–5 |publisher=Springer Berlin / Heidelberg |doi=10.1007/s12145-010-0073-0|s2cid=46330319 }}</ref>

The [[mod_oai]] project uses OAI-PMH to expose content served by [[Apache HTTP Server|Apache web servers]] to web crawlers.

OAI-PMH was later applied to the sharing of scientific data.<ref>{{Cite journal|last1=Devarakonda|first1=Ranjeet|last2=Palanisamy|first2=Giri|last3=Green|first3=James M.|last4=Wilson|first4=Bruce E.|year=2011|title=Data sharing and retrieval using OAI-PMH|url=http://link.springer.com/10.1007/s12145-010-0073-0|journal=Earth Science Informatics|language=en|volume=4|issue=1|pages=1–5|doi=10.1007/s12145-010-0073-0|s2cid=46330319 |issn=1865-0473|via=|url-access=subscription}}</ref>

==Software==

OAI-PMH is based on a [[client–server]] architecture, in which "harvesters" request information on updated records from "repositories". Requests for data can be based on a datestamp range, and can be restricted to named sets defined by the provider. Data providers are required to provide [[XML]] metadata in [[Dublin Core]] format, and may also provide it in other XML formats.
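
The selective harvesting described above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the base URL <code>https://repository.example.org/oai</code>, the set name <code>physics</code>, the sample response and the helper names <code>build_list_records_url</code> and <code>parse_records</code> are all hypothetical. The request parameters (<code>verb</code>, <code>metadataPrefix</code>, <code>from</code>, <code>until</code>, <code>set</code>) and the XML namespaces are those defined by OAI-PMH version 2.0.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, from_date=None, until_date=None, set_spec=None):
    """Build a ListRecords request URL asking for Dublin Core (oai_dc) metadata.

    The optional arguments narrow the harvest to a datestamp range and/or a set.
    """
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return a list of (identifier, datestamp, title) tuples from a response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        header = record.find("oai:header", NS)
        identifier = header.findtext("oai:identifier", default="", namespaces=NS)
        datestamp = header.findtext("oai:datestamp", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        results.append((identifier, datestamp, title))
    return results


# Hypothetical repository response, hand-written for this example.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T19:20:30Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">https://repository.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
        <datestamp>2002-05-01</datestamp>
        <setSpec>physics</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example record</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # Ask only for records in the "physics" set changed since 1 May 2002.
    print(build_list_records_url(
        "https://repository.example.org/oai",
        from_date="2002-05-01",
        set_spec="physics",
    ))
    for identifier, datestamp, title in parse_records(SAMPLE_RESPONSE):
        print(identifier, datestamp, title)
```

A real harvester would send the generated URL over HTTP and would also need to follow the <code>resumptionToken</code> that repositories return when a result list is split across several responses; both steps are omitted here to keep the sketch self-contained.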

A number of software systems support the OAI-PMH, including [[Fedora Commons|Fedora]], [[EThOS]] from the [[British Library]], [[EPrints|GNU EPrints]] from the [[University of Southampton]], [[Open Journal Systems]] from the [[Public Knowledge Project]], [[Desire2Learn]], [[DSpace]] from [[MIT]], HyperJournal from the [[University of Pisa]], Digibib from Digibis, [[MyCoRe]], [[Koha (software)|Koha]], Primo, DigiTool, Rosetta and MetaLib from [[Ex Libris Group|Ex Libris]], ArchivalWare from PTFS, DOOR <ref>{{cite web |url=https://door.sourceforge.net/index.html |title=Overview |website=DOOR |access-date=May 29, 2022}}</ref> from the eLab<ref>{{cite web |url=http://www.elearninglab.org/ |title=eLab |website=Universita della Svizzera italiana |language=Italian |access-date=May 29, 2022}}</ref> in Lugano, Switzerland, panFMP from the [[PANGAEA (data library)|PANGAEA data library]],<ref>{{cite web|url=http://www.panFMP.org/|title=PANGAEA® Framework for Metadata Portals|website=panfmp.org}}</ref> [[SimpleDL]] from Roaring Development, and jOAI from the [[National Center for Atmospheric Research]].<ref>{{Cite web |url=https://github.com/NCAR/joai-project|title=NCAR/joai-project|website=Github.com|date=31 May 2022 }}</ref>

==Archives==

A number of large archives support the protocol, including [[arXiv]] and the [[CERN]] Document Server.

==See also==

* [[Data format management]]
* [[Digital curation]]
* [[Digital preservation]]
* [[File format]]
* [[Dublin Core]], an ISO metadata standard
* [[National Digital Information Infrastructure and Preservation Program]] (NDIIPP)
* [[National Digital Library Program]] (NDLP)
* [[Metadata Encoding and Transmission Standard]] (METS) maintained by the Library of Congress
* [[Preservation Metadata: Implementation Strategies]] (PREMIS)
* [[LOCKSS]]
* [[Search as a service]]
* [[Web archiving]]
* [[Object Reuse and Exchange]] (OAI-ORE)
* [https://oai.events Geneva Workshop on Innovations in Scholarly Communication]

==References==

{{Reflist|colwidth=35em}}

==External links==

* [https://web.archive.org/web/20100314214736/http://oai.sdu.edu.tr/ Suleyman Demirel University Open Archives Harvester]
* [http://www.openarchives.org/OAI/openarchivesprotocol.html Protocol specification]
* [https://www.loc.gov/library/libarch-digital.html National Library of Congress, Digital Collections and Programs]
* [http://www.digitalpreservation.gov/ Library of Congress, National Digital Information Infrastructure and Preservation Program]
* [https://www.loc.gov/webcapture/ Library of Congress, Web Capture]

{{open access navbox}}

[[Category:Online archives]]
[[Category:Internet protocols]]
[[Category:Metadata]]
[[Category:Open access projects]]
[[Category:Archival science]]

[[de:OAI-PMH]]