Basic Overview of Match Points
Last week I had the chance to further my knowledge of how match points for Koha cataloging imports work. I thought I would take a minute to share what I learned from my good friends Ruth Bavousett and Jared Camins-Esakov.
First and foremost, I want to go through setting up a match point. The first box you will see is shown below.
The matching rule code is something you enter that will uniquely identify your matching rule. You can call this anything that is not currently in use. I recommend something descriptive and not something like “George” or “Frank”.
The description is just a simple explanation of what your rule does. For example, it could be “Match rule for bib numbers”.
The match threshold is really important! This is what tells the matcher exactly what threshold must be reached by the rules we are about to define in order to be considered a match. I will explain this in more detail after we go over defining match points.
Next, let's take a look at how to define match points.
The first thing we have to define is the search index to match on. These are the index names that are loaded into the Zebra config, for example ISBN or biblio-number. For a complete list, see /etc/zebradb/biblios/etc/bib1.att in your Koha install.
We also need to give this match point a score. This essentially means that for each match point that finds a hit, we add this number to the total score. If the total score of all the match points reaches the threshold we defined in the previous step, then we have found a match. A common mistake is to set the threshold too high and the match point score too low, which results in no matches, so beware!
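To make the arithmetic concrete, here is a minimal Python sketch of the scoring idea described above. It is not Koha's actual code; the function name is_match and the dictionary layout are made up for illustration, and the only assumption carried over from the post is that a record matches once the summed scores reach the threshold.

```python
def is_match(matchpoint_scores, matched_indexes, threshold):
    """Return True when the summed scores of the match points that hit reach the threshold.

    matchpoint_scores: hypothetical mapping of search index name -> score
    matched_indexes:   set of index names that found the candidate record
    threshold:         the match threshold from the matching rule
    """
    total = sum(
        score
        for index, score in matchpoint_scores.items()
        if index in matched_indexes
    )
    return total >= threshold


# A single ISBN match point scoring 1000 against a threshold of 1000 matches.
good_rule = is_match({"ISBN": 1000}, {"ISBN"}, threshold=1000)

# The common mistake: the score is too low to ever reach the threshold.
bad_rule = is_match({"ISBN": 50}, {"ISBN"}, threshold=1000)
```

In the second rule the ISBN is found, but 50 never reaches 1000, so nothing is ever reported as a match.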
The sub-block is pretty self-explanatory in that you just define what part of the field you would like to match on. There are a few extra settings in there for defining an offset, a length, and a normalization rule, but those are rarely used.
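If the offset and length settings are unclear, the rough idea can be sketched in Python as below. This is only an illustration under my own assumptions: extract_key is a hypothetical helper, treating a length of 0 as "take the rest of the value" is an assumption rather than documented Koha behavior, and the normalization step is just a placeholder callable.

```python
def extract_key(value, offset=0, length=0, normalize=None):
    """Cut the part of a subfield value that will be used for matching.

    offset:    number of leading characters to skip
    length:    number of characters to keep (0 is assumed to mean "the rest")
    normalize: optional callable applied to the extracted text
    """
    if length > 0:
        key = value[offset:offset + length]
    else:
        key = value[offset:]
    if normalize is not None:
        key = normalize(key)
    return key


# Keep only the first 13 characters of an ISBN subfield that has a trailing note.
isbn_key = extract_key("9780131103627 (pbk.)", offset=0, length=13)

# Skip a 3-character prefix and upper-case what is left.
code_key = extract_key("ocm12345", offset=3, normalize=str.upper)
```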
Keep in mind that you may have several match checks depending on the complexity of your desired match.
Lastly we have the match checks. In this section we are allowed to define match checks for data we may already have in our collection.
As you can see above, you have the same matching selectors as in the match rule. This feature is rarely used and is typically removed for normal use cases. When it is used, it acts as a double check: it ignores Zebra and does the check against the database.
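The double-check idea can be pictured with a small Python sketch. Everything here is hypothetical (the function name confirm_candidates, the record IDs, and the plain dictionary standing in for the database); it only illustrates comparing the incoming value against the stored value for each candidate that the index search returned.

```python
def confirm_candidates(incoming_value, candidates, stored_values):
    """Keep only the candidates whose stored value equals the incoming value.

    incoming_value: value taken from the record being imported
    candidates:     record IDs returned by the index search
    stored_values:  hypothetical mapping of record ID -> value read from the database
    """
    return [
        record_id
        for record_id in candidates
        if stored_values.get(record_id) == incoming_value
    ]


# The index returned three candidates, but only record 1 still holds the same value.
confirmed = confirm_candidates(
    "9780131103627",
    [1, 2, 3],
    {1: "9780131103627", 2: "9780201633610"},
)
```

Record 3 is dropped because it has no stored value at all, which is the kind of disagreement a stale index could produce.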
Pro-tip: this could be helpful if you have stale indexes and don’t want to rebuild before running your match points.
That’s pretty much it. You can now use your matching rule when staging your MARC records for import. Feel free to leave a comment below if you have any questions or comments.
[Originally posted by Elliott Davis]