David Cook, a Canadian librarian, talked about harvesting records from DSpace into Koha at KohaCon13. He originally used custom PHP and Perl scripts to pull records directly from the DSpace database. He now works only in Perl, and his script can be used to harvest from other repositories, not just DSpace.
The custom script requires adding a new metadata field to DSpace records that indicates whether each record has been pulled. If a record has not been pulled, the script pulls it in Dublin Core format and sends MARCXML to a Perl script, which marks the DSpace record as pushed and pushes it into Koha. The Perl script first checks whether the record already exists in Koha and ignores it if it does.
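The flag-based workflow can be sketched in a few lines. This is not David's Perl code: plain dicts stand in for the DSpace database and the Koha catalogue, `push_new_records` and `to_marcxml` are hypothetical names, and `to_marcxml` only carries the title over as a placeholder for the real Dublin Core to MARCXML transformation. Whether a record is flagged as pushed when Koha already holds it is also an assumption.

```python
from xml.sax.saxutils import escape


def to_marcxml(dc):
    """Hypothetical stand-in for the real Dublin Core to MARCXML step."""
    title = escape(dc.get("title", ""))
    return (
        '<record xmlns="http://www.loc.gov/MARC21/slim">'
        '<datafield tag="245" ind1="0" ind2="0">'
        f'<subfield code="a">{title}</subfield>'
        "</datafield></record>"
    )


def push_new_records(dspace_items, koha_catalog):
    """Push each DSpace item whose 'pushed' flag is unset into Koha.

    dspace_items: dicts with 'handle', 'dc' and 'pushed' keys; 'pushed'
                  models the extra metadata field added to DSpace.
    koha_catalog: dict of handle -> MARCXML standing in for Koha.
    Returns the handles that were newly added to Koha.
    """
    added = []
    for item in dspace_items:
        if item["pushed"]:
            continue  # flagged on an earlier run: never looked at again
        marcxml = to_marcxml(item["dc"])
        # The receiving script checks Koha first and skips existing records.
        if item["handle"] not in koha_catalog:
            koha_catalog[item["handle"]] = marcxml
            added.append(item["handle"])
        item["pushed"] = True  # assumption: flagged even if Koha had it
    return added


items = [
    {"handle": "123456789/1", "dc": {"title": "Thesis A"}, "pushed": False},
    {"handle": "123456789/2", "dc": {"title": "Thesis B"}, "pushed": True},
]
catalog = {}
print(push_new_records(items, catalog))  # ['123456789/1']
```

Because flagged items are skipped on every later run, editing an item in DSpace never changes the copy in Koha, which is exactly the "only additions" limitation of this approach.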
- Pro: powerful metadata transformation
- Con: hard-coded and inflexible
- Con: requires access to the DSpace database
- Con: scripts are not packaged with Koha
- Con: only additions are processed; changes to existing DSpace records are not
He switched to OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), which uses six verbs (requests) to harvest metadata: Identify, ListSets, ListMetadataFormats, ListIdentifiers, GetRecord and ListRecords. ListRecords is the verb David uses to harvest.
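A ListRecords harvest is a loop over pages. Each response may end with a `resumptionToken`, and the next request sends only that token until a page arrives with an empty or missing token. The sketch below shows this with Python's standard library. The base URL is a hypothetical placeholder, and `oai_dc` is used because every OAI-PMH repository must support it; the prefix a repository uses for qualified Dublin Core varies. The function names are illustrative, not part of any Koha or DSpace API.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, from_date=None, token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    sent, no other arguments besides the verb may accompany it.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (headers, resumption_token) for one ListRecords page."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        headers.append({
            "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
            "deleted": header.get("status") == "deleted",
        })
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty or missing resumptionToken marks the last page.
    token = (token_el.text or "").strip() if token_el is not None else ""
    return headers, token or None


def harvest(fetch, base_url, metadata_prefix, from_date=None):
    """Yield record headers from every page; fetch(url) returns the XML."""
    token = None
    while True:
        url = list_records_url(base_url, metadata_prefix, from_date, token)
        headers, token = parse_list_records(fetch(url))
        yield from headers
        if not token:
            break


print(list_records_url("https://repository.example.edu/oai/request",
                       "oai_dc", from_date="2013-10-01"))
# https://repository.example.edu/oai/request?verb=ListRecords&metadataPrefix=oai_dc&from=2013-10-01
```

Passing `fetch` in as a parameter keeps the paging logic testable offline; against a live repository it could be `lambda url: urllib.request.urlopen(url).read()`, since `ET.fromstring` accepts bytes.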
The first step is to generate a status for each harvested record. If a record is marked Add, it is checked to see whether it has already been added to Koha. If it is marked Ignore, it is skipped; Update modifies the existing bib record in Koha; and Delete marks the record for deletion in Koha.
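The talk does not say exactly how each status is derived. The sketch below assumes one plausible approach: a local table of OAI identifiers already loaded into Koha, each with the datestamp of the loaded version, combined with the `status="deleted"` flag that OAI-PMH puts on the record header. `Status`, `generate_status` and `apply_status` are hypothetical names, and the dict standing in for Koha is illustrative only.

```python
from enum import Enum


class Status(Enum):
    ADD = "add"
    UPDATE = "update"
    IGNORE = "ignore"
    DELETE = "delete"


def generate_status(header, imported):
    """Classify one harvested record header.

    header:   dict with 'identifier', 'datestamp' and 'deleted' keys.
    imported: hypothetical tracking table, OAI identifier -> datestamp
              of the version already loaded into Koha.
    Datestamps are compared as strings, which orders ISO 8601 values
    correctly as long as they share the same granularity.
    """
    seen = header["identifier"] in imported
    if header["deleted"]:
        return Status.DELETE if seen else Status.IGNORE
    if not seen:
        return Status.ADD  # not yet in Koha
    if header["datestamp"] > imported[header["identifier"]]:
        return Status.UPDATE  # the repository has a newer version
    return Status.IGNORE  # already loaded and unchanged


def apply_status(header, marcxml, koha, imported):
    """Act on one record; koha maps identifier -> {'marc', 'deleted'}."""
    status = generate_status(header, imported)
    ident = header["identifier"]
    if status in (Status.ADD, Status.UPDATE):
        koha[ident] = {"marc": marcxml, "deleted": False}
        imported[ident] = header["datestamp"]
    elif status is Status.DELETE:
        if ident in koha:
            koha[ident]["deleted"] = True  # mark for deletion, don't purge
        imported[ident] = header["datestamp"]
    return status
```

Keeping the tracking table on the harvester's side means the Add check does not have to search the Koha catalogue for every incoming record.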
There are some issues with “translating,” or crosswalking, between metadata standards, specifically from Dublin Core to MARC 21. Qualified Dublin Core is native to DSpace and is the best option for harvesting into Koha. For other metadata standards, you can write your own XSL to handle the translation. After the transformation, the record is passed to Koha and the “from” date in the OAI-PMH repository configuration is set. The “from” date allows for selective harvesting: later runs request only records added or changed since that date, which eliminates the need for full harvests and reloads.
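The talk recommends XSL for the crosswalk. The sketch below swaps in Python's standard `xml.etree.ElementTree` so the mapping logic can be read and run without an XSLT processor. The field choices roughly follow common Dublin Core to MARC 21 mappings but are an illustrative subset, not a complete or authoritative crosswalk, and `dc_to_marcxml` is a hypothetical name. A record Koha would actually import also needs a leader and control fields, which are omitted here.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)  # serialize as marc:record, etc.

# Illustrative DC element -> (MARC tag, ind1, ind2, subfield code).
# An assumed subset for this sketch, not a full crosswalk.
CROSSWALK = {
    "title": ("245", "0", "0", "a"),
    "date": ("260", " ", " ", "c"),
    "description": ("520", " ", " ", "a"),
    "subject": ("653", " ", " ", "a"),
    "creator": ("720", " ", " ", "a"),
}


def dc_to_marcxml(oai_dc_xml):
    """Map simple Dublin Core elements to MARCXML datafields."""
    src = ET.fromstring(oai_dc_xml)
    record = ET.Element(f"{{{MARC_NS}}}record")
    for element, (tag, ind1, ind2, code) in CROSSWALK.items():
        for node in src.iter(f"{{{DC_NS}}}{element}"):
            value = (node.text or "").strip()
            if not value:
                continue  # skip empty DC elements
            field = ET.SubElement(record, f"{{{MARC_NS}}}datafield",
                                  {"tag": tag, "ind1": ind1, "ind2": ind2})
            sub = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": code})
            sub.text = value
    return ET.tostring(record, encoding="unicode")


sample = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample Thesis</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:date>2013</dc:date>
  <dc:subject>Libraries</dc:subject>
</oai_dc:dc>"""
print(dc_to_marcxml(sample))
```

Keeping the mapping in one table mirrors what an XSL stylesheet does with one template per element: supporting another source format mostly means writing a new table (or stylesheet), not changing the harvesting loop.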