Koha News

KohaCon13: What is after MARC???

A major bit of fun at KohaCon is learning what the great minds in Koha are thinking about. I sat in on a session on non-MARC cataloging. Galen Charlton talked about plans he is considering for future versions (3.14 and 3.16) to add support for non-MARC storage and indexing. Currently MARC records are stored as MARCXML, and Zebra reads the MARCXML blob. But now, with DOM indexing, we are simply feeding it an XML document that “happens” to be MARC but does not have to be. Ideally, we could feed it any type of XML metadata: MODS, Dublin Core, EAD, etc. When you set up Koha, you would tell it which types of XML metadata you plan on using. The database would hold the XML and a flag that indicates which kind of metadata it is.
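To make the "XML plus a flag" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and format labels are all assumptions made for illustration; they are not Koha's actual schema, and nothing in the session specified them.

```python
import sqlite3

# Hypothetical table: one row per record, holding the raw XML plus a
# flag naming its metadata type. These names are illustrative only.
SCHEMA = """
CREATE TABLE metadata_record (
    record_id INTEGER PRIMARY KEY,
    format    TEXT NOT NULL,  -- the flag: 'marcxml', 'dc', 'mods', 'ead'
    xml       TEXT NOT NULL   -- the XML document, stored as-is
)
"""

def open_store():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn

def add_record(conn, fmt, xml_text):
    """Store one XML document together with its format flag."""
    cur = conn.execute(
        "INSERT INTO metadata_record (format, xml) VALUES (?, ?)",
        (fmt, xml_text),
    )
    return cur.lastrowid

def get_record(conn, record_id):
    """Return (format, xml) for a record, or None if it does not exist."""
    return conn.execute(
        "SELECT format, xml FROM metadata_record WHERE record_id = ?",
        (record_id,),
    ).fetchone()
```

Anything that reads a row back can check the flag first and then decide which rules, editor, or stylesheet applies to that XML.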

We would provide rules that tell Koha, for example, how to find the title in Dublin Core, how to find the title in MARC, and so on. This works well for XML that describes a single ‘bibliographic’ entity, e.g. a book in MARCXML or a photo with Dublin Core metadata. However, an EAD record can describe a whole collection, including each and every piece in the ‘box’. So EAD records may need to be broken up into smaller units.
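One way to picture such rules is a lookup table that maps each metadata type to a path expression for the title. The sketch below uses Python's xml.etree.ElementTree; the rule table, the format labels, and the function name are assumptions for illustration, not something Koha provides.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used by the rules below.
NS = {
    "marc": "http://www.loc.gov/MARC21/slim",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical rule table: where the title lives in each metadata type.
TITLE_RULES = {
    "marcxml": ".//marc:datafield[@tag='245']/marc:subfield[@code='a']",
    "dc": ".//dc:title",
}

def find_title(fmt, xml_text):
    """Apply the rule for the given format; return the title or None."""
    root = ET.fromstring(xml_text)
    node = root.find(TITLE_RULES[fmt], NS)
    if node is None or not node.text:
        return None
    return node.text.strip()
```

The same table could grow rules for author, date, and so on, with one entry per metadata type the library has enabled.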

Further issues include editing these types of XML records in Koha. Finding and/or writing an editor for non-MARC records would be required. Editor plugins? Then there is the question of how to search for these records (Search.pm). Something to consider: relevance ranking. Will photos be returned first, or will there be a mixed result list of books, photos, etc.? Lastly, how to display these records will also need to be addressed.

The Roadmap to get there….

  1. Adjust the database schema. An XML table with a flag and the XML data. There is the issue of pulling out bits to populate other database fields: Koha would need rules to find title, author, etc. in other metadata types.
  2. How to send the XML to the indexer. Send the indexer the XML wrapped in more XML that identifies it as a specific metadata type. Koha would need rules to read each type of XML data.
  3. Display of the XML data. Multiple stylesheets? Options other than XSLT?
  4. What do developers need to prepare for? This topic will continue to be discussed at the KohaCon Hackfest.
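The second step of the roadmap, wrapping the XML in more XML, could look roughly like this. The envelope element name and its attribute are invented for this sketch; the session did not specify what the wrapper would actually be.

```python
import xml.etree.ElementTree as ET

def wrap_for_indexer(fmt, xml_text):
    """Wrap a metadata document in an envelope that names its type.

    The <indexed_record format="..."> envelope is hypothetical; it only
    illustrates tagging the payload so the indexer can pick its rules.
    """
    envelope = ET.Element("indexed_record", {"format": fmt})
    envelope.append(ET.fromstring(xml_text))
    return ET.tostring(envelope, encoding="unicode")
```

The indexer would then read the format attribute on the outer element and choose the matching set of indexing rules for the document inside.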

The Benefits…..

Utilize multiple types of MARC data. UNIMARC and MARC21, sure thing!

Ability to crosswalk standards? If we have a MODS format record, can we export it as Dublin Core? YES!
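As a rough illustration of a crosswalk, the sketch below copies two fields from a MODS record into simple Dublin Core. It is a deliberately tiny mapping chosen for this example, not a complete or official MODS-to-DC crosswalk, and the function name and output wrapper element are assumptions.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Simplified, illustrative mapping: MODS path -> Dublin Core element.
CROSSWALK = [
    ("mods:titleInfo/mods:title", "title"),
    ("mods:name/mods:namePart", "creator"),
]

def mods_to_dc(mods_xml):
    """Convert a MODS record to a small Dublin Core document."""
    ET.register_namespace("dc", DC_NS)  # serialize with the dc: prefix
    source = ET.fromstring(mods_xml)
    target = ET.Element("metadata")  # hypothetical wrapper element
    for path, dc_name in CROSSWALK:
        for node in source.findall(path, {"mods": MODS_NS}):
            if node.text:
                child = ET.SubElement(target, f"{{{DC_NS}}}{dc_name}")
                child.text = node.text.strip()
    return ET.tostring(target, encoding="unicode")
```

A real crosswalk has to decide what to do with fields that have no equivalent in the target format, which is where metadata tends to get lost.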

Parsing arbitrary XML without worrying about the location of specific parts of data. Teaching Koha to read a schema. Could we ideally feed it a ‘new’ XML data format? Possibly, if there is the need.

This development will make it possible for Koha to move to *whatever* is after MARC. It could also mean that Koha *becomes* the discovery platform! It expands Koha’s capabilities. We can get to more metadata!! We don’t lose bits of metadata because of MARC’s limitations.

Read more by Joy Nelson

Tags kohacon13