A little bit about “Relevance” in Koha 3

To say that relevance ranking in Koha is a bit arcane is really putting it mildly; it’s a black box that isn’t very easy to understand, even for the turbo nerds (like me).

We get questions occasionally on the helpdesk about why this or that appears above (or below) others–how, exactly, is it choosing that? The answer, of course, is “it’s complex”.

The first thing that’s useful to note is that searching and relevance ranking are two different operations! Koha runs the search you’ve requested and then, across all the records returned, ranks them by relevance.
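If it helps to see that split spelled out, here is a minimal Python sketch of "search first, rank second". It is not how Koha or Zebra work internally: the tiny catalog, the function names (`search`, `rank`, `placeholder_score`), and the stand-in scoring rule are all made up for illustration.

```python
# Hypothetical mini-catalog (record id -> title); not a real Koha data structure.
CATALOG = {
    1: "Story of Cats That Went Wild",
    2: "Wild Cats Story",
    3: "Gardening for Beginners",
}


def search(catalog, query):
    """Phase 1: find every record containing all query words. No ordering is implied."""
    words = query.lower().split()
    return [rid for rid, title in catalog.items()
            if all(w in title.lower().split() for w in words)]


def placeholder_score(title, query):
    """Stand-in scoring rule (an assumption): the earlier the first match, the higher the score."""
    tokens = title.lower().split()
    first = min(tokens.index(w) for w in query.lower().split())
    return 1.0 / (1 + first)


def rank(catalog, hits, query):
    """Phase 2: order the records that phase 1 returned, best score first."""
    return sorted(hits,
                  key=lambda rid: placeholder_score(catalog[rid], query),
                  reverse=True)


hits = search(CATALOG, "wild cats")
ordered = rank(CATALOG, hits, "wild cats")
print(hits)     # [1, 2]
print(ordered)  # [2, 1]
```

Both titles match the search, so both come back; ranking only decides which one is shown first.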

Here are some general thoughts that may shed some light on relevance ranks:

  • Some fields in the MARC record are more “relevant” than others. The title (245$a) is a lot more relevant than, say, the contents note (505$a).
  • The more a word appears, the more relevance points it will get. It’s possible, if you have a large 500 note with the words repeated, that it could “out-rank” something with higher relevance, like a single appearance in the title.
  • The earlier that your word or words appear in the tag, the more relevant they are.
  • If you enter more than one word, records where the words appear in the same MARC tag are more relevant than records where they appear in different tags.
  • Even more relevant is when they appear in the order you gave them. So, a search for “wild cats” will return both “Wild Cats Story” and “Story of Cats That Went Wild”, but the first will get more relevance for that title.
  • The more words you enter, the smarter the relevance engine gets, since it knows more about what you’re looking for. There are no stopwords in Koha, so you can enter “the,” and it will do its level best to rank almost all of the catalog.
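To make those rules of thumb a little more concrete, here is a toy scorer in Python. Treat it as a sketch under loud assumptions: the field weights, the bonus values, and the `score_record` function are invented for illustration and are not Zebra's or Koha's actual algorithm. Records are simplified to a dictionary of tag to text, and words are split on whitespace only.

```python
# Made-up field weights for illustration; NOT real Zebra/Koha values.
FIELD_WEIGHTS = {"245a": 10.0, "500a": 1.0, "505a": 1.0}


def score_record(record, query):
    """Toy relevance score for a record given as {tag: text}."""
    words = query.lower().split()
    total = 0.0
    for tag, text in record.items():
        tokens = text.lower().split()
        weight = FIELD_WEIGHTS.get(tag, 1.0)   # some fields count for more
        found = [w for w in words if w in tokens]
        for w in found:
            total += weight * tokens.count(w)        # more occurrences, more points
            total += weight / (1 + tokens.index(w))  # earlier in the tag, more points
        if len(words) > 1 and len(found) == len(words):
            total += weight * 2                      # all words share one tag
            n = len(words)
            if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
                total += weight * 3                  # words appear in the order given
    return total


# Hypothetical records used only to exercise the rules above.
in_order = {"245a": "Wild Cats Story"}
scattered = {"245a": "Story of Cats That Went Wild"}
title_once = {"245a": "Cats"}
noisy_note = {"245a": "Feline Almanac", "500a": "cats " * 25}

print(score_record(in_order, "wild cats") > score_record(scattered, "wild cats"))  # True
print(score_record(noisy_note, "cats") > score_record(title_once, "cats"))         # True
```

The second comparison shows the "out-rank" effect: with these made-up weights, a note that repeats a word 25 times beats a single appearance in the title.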

For a really good example of this, in almost any Koha public library catalog, do a search for “help”. Somewhere in your results will be Kathryn Stockett’s The Help, but likely, it won’t be at the top; titles that start with the word “Help!” will be more relevant. Adding the “the” to the front of the search (“the help”) will cause Stockett’s book (and the 2011 movie based on it) to pop right to the top!

The overarching moral to this story is that a single-word search will almost always struggle with relevance ranking, particularly if it’s a very common word; giving more words will help Koha find the things you’re looking for, and put them at the top.

I hope these tips help!

[Originally posted by D. Ruth Bavousett]

3 thoughts on “A little bit about “Relevance” in Koha

  • Eli

    Thanks very much for this, Ruth. This is an area that has frustrated several staff with searching for items, including The Help. I’m going to pass this along to everyone here at Three Rivers.

  • Libranto

    When we think about what end users or patrons require, the information need is the top issue, so information discovery is the most important thing to focus on. On the other hand, Information Retrieval (IR) is complex, because IR performance depends on many factors: IR models, stemming, query expansion with synonyms, boosting, and query models are all important topics. As for Koha, it started out as a premature child, with a binary model that used MySQL as its IR tool. Later, Koha improved its IR performance by using Zebra, but unfortunately that is not enough for ideal IR performance; Zebra is not an “all in one” tool set. I think Koha has to use Lucene-based IR tools (Lucene, Elasticsearch, or Solr) to become mature. Lucene’s default IR model is part Vector Space and part Boolean model (or, you could say, a kind of Extended Boolean Model). I don’t want to describe IR models and IR tools here, or their advantages and disadvantages. I can say that if the Koha community doesn’t adopt Lucene-based IR tools (stemming, query expansion, faceted search, search completion …), users will continue seeking other systems.
    To sum up, this blog entry is so superficial because of Zebra. Relevance can’t really be handled with Zebra. On the other hand, information needs and information-seeking behaviors are also important, and Zebra is not a sufficient toolset for these concepts.