Basic Overview of Match Points

Last week I had the chance to further my knowledge of how match points for Koha cataloging imports work.  I thought I would take a minute to share what I learned from my good friends Ruth Bavousett and Jared Camins-Esakov.

First and foremost, I want to go through setting up a match point. The first box you will see asks for a matching rule code, a description, and a match threshold.


The matching rule code is something you enter that uniquely identifies your matching rule. You can call it anything that is not already in use. I recommend something descriptive, not something like “George” or “Frank”.

The description is just a simple explanation of what your rule does. For example, it could be “Match rule for bib numbers”.

The match threshold is really important! It tells the matcher the total score that the match points we are about to define must reach for a record to be considered a match. I will explain this in more detail after we go over defining match points.

Next, let's take a look at how to define match points.

The first thing we have to define is the search index to match on. These are the index names loaded into the Zebra configuration, for example ISBN or biblio-number. For a complete list, see /etc/zebradb/biblios/etc/bib1.att in your Koha install.
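If you would rather not read through the whole file by eye, a short script can pull out the index names. This is a minimal sketch that assumes each index is declared on its own line in the form "att <number> <name>", which is the usual Zebra attribute-set layout; the function name list_indexes and the sample text are made up for illustration, so check the real file on your own system.

```python
import re

# Assumption: index declarations look like "att <number> <name>", e.g. "att 7 ISBN".
# Comment lines, "reference" lines and anything else are skipped.
ATT_LINE = re.compile(r"^\s*att\s+(\d+)\s+(\S+)")

def list_indexes(text):
    """Return (attribute number, index name) pairs found in bib1.att-style text."""
    indexes = []
    for line in text.splitlines():
        m = ATT_LINE.match(line)
        if m:
            indexes.append((int(m.group(1)), m.group(2)))
    return indexes

# Made-up excerpt for illustration, not the contents of a real Koha file.
sample = """# sample excerpt
reference Bib-1
att 4 Title
att 7 ISBN
"""

for number, name in list_indexes(sample):
    print(number, name)
```

To run it against your own install, pass it the contents of the real file, for example list_indexes(open("/etc/zebradb/biblios/etc/bib1.att").read()), adjusting the path if your install keeps the file somewhere else.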

We also need to give this match point a score. For each match that is found, we add this number to the total score. If the total score of all the matches reaches the threshold we defined in the previous step, then we have found a match. A common mistake is to set the threshold too high and the match point score too low, resulting in no matches, so beware!
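To make the threshold arithmetic concrete, here is a minimal sketch of the scoring idea for a single candidate record. It is a simplified model, not Koha's matcher code: MatchPoint and score_candidate are hypothetical names, and it assumes that "reaches the threshold" means the total is greater than or equal to it.

```python
from dataclasses import dataclass

@dataclass
class MatchPoint:
    index: str  # search index to match on, e.g. "ISBN"
    score: int  # added to the candidate's total when this point matches

def score_candidate(match_points, matched_indexes, threshold):
    """Return (is_match, total) for one candidate record.

    matched_indexes: the indexes on which the candidate agreed with the
    incoming record (a simplified, hypothetical model).
    """
    total = sum(mp.score for mp in match_points if mp.index in matched_indexes)
    return total >= threshold, total

# Two match points that only reach the threshold together.
rule = [MatchPoint("ISBN", 500), MatchPoint("Title", 500)]
print(score_candidate(rule, {"ISBN", "Title"}, threshold=1000))  # (True, 1000)
print(score_candidate(rule, {"ISBN"}, threshold=1000))           # (False, 500)

# The common mistake: a score that can never reach the threshold.
too_low = [MatchPoint("ISBN", 100)]
print(score_candidate(too_low, {"ISBN"}, threshold=1000))        # (False, 100)
```

The last case is the "threshold too high, score too low" trap: even a perfect ISBN match only adds 100 toward a threshold of 1000, so nothing is ever reported as a match.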

The sub-block is pretty self-explanatory: you just define which part of the field you would like to match on. There are also options for an offset, a length, and a normalization rule, but those are rarely used.
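As a rough illustration of what offset, length, and a normalization rule do to the value being compared, here is a minimal sketch. It assumes a field is a simple dict of subfield code to value and that offset and length cut a substring out of that value; match_key and strip_hyphens are hypothetical names, and strip_hyphens is only a stand-in, not one of Koha's actual normalization rules.

```python
def match_key(field, subfield, offset=0, length=None, normalize=None):
    """Build the value a match point would compare, from one subfield of a field."""
    value = field.get(subfield, "")
    # Apply the offset, and the length if one is given (assumed substring semantics).
    if length is None:
        value = value[offset:]
    else:
        value = value[offset:offset + length]
    # Apply an optional normalization function last.
    return normalize(value) if normalize else value

def strip_hyphens(s):
    """Stand-in normalization rule: drop hyphens."""
    return s.replace("-", "")

# Hypothetical 020 field with a qualifier after the ISBN.
field_020 = {"a": "0-306-40615-2 (pbk.)"}
print(match_key(field_020, "a", length=13, normalize=strip_hyphens))  # 0306406152
```

Here the length of 13 trims off the "(pbk.)" qualifier and the normalization removes the hyphens, so the same ISBN written in different styles would produce the same key.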

Keep in mind that you may have several match points, depending on the complexity of your desired match.

Lastly, we have the match checks. In this section we can define checks against data we may already have in our collection.

Match checks use the same selectors as the match points. This feature is rarely used and is typically removed in normal use cases. When it is used, it acts as a double check: it bypasses Zebra and runs the check against the database.

Pro-tip: this could be helpful if you have stale indexes and don’t want to rebuild before running your match points.
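The double-check idea can be sketched in a few lines. This is a simplified model, not Koha's implementation: it assumes records are dicts mapping (tag, subfield) to a stored value, and passes_match_checks is a hypothetical name. The point is that the comparison uses the stored values directly, so it does not depend on whether the search index is up to date.

```python
def passes_match_checks(incoming, candidate, checks):
    """Confirm a candidate by comparing stored values directly.

    incoming, candidate: dicts mapping (tag, subfield) -> value (hypothetical model).
    checks: (tag, subfield) pairs that must be present and equal in both records.
    """
    return all(
        incoming.get(key) is not None and incoming.get(key) == candidate.get(key)
        for key in checks
    )

incoming = {("020", "a"): "0306406152", ("245", "a"): "Example title"}
candidate = {("020", "a"): "0306406152", ("245", "a"): "Another title"}

print(passes_match_checks(incoming, candidate, [("020", "a")]))              # True
print(passes_match_checks(incoming, candidate, [("020", "a"), ("245", "a")]))  # False
```

Requiring the value to be present keeps two records that are both missing a field from counting as agreement.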


That’s pretty much it. You can now use your matching rule when staging your MARC records for import. Feel free to leave a comment below if you have any questions or comments.


[Originally posted by Elliott Davis]

4 Responses to Basic Overview of Match Points

  1. [...] Davis posted an overview of match points used in importing MARC [...]


  3. Are we able to use the OCLC number as a match point now? I created a match point for that when we first migrated in 2010, and was then told that OCLC match points won’t work, and that, except for the canned ones that came with the installation, match points don’t really work as advertised. I know that even with the canned ones there are still errors in matching sometimes, and we lose previously cataloged records. Is there a list of fields with unique information we can use as match points besides ISBN? Are there plans to improve this feature so that if the total *doesn’t* equal a given sum, for example an exact match on both ISBN and OCLC, the records won’t be replaced?

    • Nicole C. Engard says:

      Cathi,

      I’m not sure who told you that, but we have several partner libraries with OCLC number match points. It can be done, has been done, and does work. Just submit a ticket and ask us to set up an OCLC matching rule for you (let us know if you want the 035 or the 001 – both have OCLC numbers in them).

      Thanks
      Nicole
