While many of our library patrons know that the most effective way to access the library catalog is through the library website or directly through Aspen Discovery, many users are used to doing all of their searching in Google, Bing, DuckDuckGo, or another traditional search engine. To reach these users where they are, Aspen Discovery publishes information that helps these search engines understand all of the content the library owns.
You can use Google Search Console to see how your site is performing in Google search results, and you can sign up for it here. If you choose the URL prefix method when signing up, we can add the required verification file to your server during the verification process.
Using the Search Console, we can see that our libraries are having some amazing results within Google, and those results keep getting better. The following graph shows the stats for Arlington Public Library over the past three months.
Arlington is getting great results for individual titles as well as for public lists that its librarians have created. For example, a Google search for “the fall of cabal” or “the fall of the house of cabal” shows Arlington’s record for The Fall of the House of Cabal by Jonathan Howard as result number 5 (this can change from day to day), just below the results for Amazon, Macmillan Publishers, and Audible. It currently ranks above results from Target, Walmart, Thriftbooks, and other sellers. Searches for list content are also ranking very well in Google. For example, a search for “books similar to Red White and Royal Blue” shows Arlington’s list Read-Alikes for Red White & Royal Blue as the seventh result.
So what are we doing to get these great results? All search engines use web crawlers (sometimes called bots) to discover all of the content on a website, learn what is being provided, and then rank it against other content so they can return relevant results for a patron’s search. This is a similar process to what Aspen Discovery does when it reads information from the ILS, OverDrive, Axis 360, Hoopla, Cloud Library, etc.; extracts relevant information like title, author, and subjects; adds the data to our Solr index; and then lets the patron search the index to display relevant results.
Because the primary way patrons interact with Aspen Discovery is through search, it can be difficult for Google and other search engines to reach all of our content, since their crawlers can’t run a bunch of searches on their own. To help search engines access all of our content, we provide them with a file called a sitemap. The sitemap lists all of the records that your catalog holds. Because our catalogs are so big, we actually have to divide the listing of records into multiple sitemap files, all gathered together by a single sitemap index, which can be found at yoursite.org/sitemapindex.xml. The index links to each of the individual sitemap files, and those files in turn link to each title within Aspen Discovery so that Google or any other search engine can find them.
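To make the sitemap structure more concrete, here is a minimal Python sketch of how a long list of record URLs could be split into several sitemap files and tied together by one sitemap index, following the sitemaps.org protocol (which allows at most 50,000 URLs per sitemap file). This is not Aspen Discovery’s actual code; the record URLs and the sitemap_N.xml file names are hypothetical placeholders.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# The sitemaps.org protocol allows at most 50,000 URLs per sitemap file.
MAX_URLS_PER_SITEMAP = 50_000


def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_urlset(urls):
    """Build one sitemap file (<urlset>) that lists record URLs."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<urlset xmlns="{SITEMAP_NS}">\n{entries}</urlset>\n')


def build_sitemaps(base_url, record_urls, per_file=MAX_URLS_PER_SITEMAP):
    """Return (index_xml, files), where files maps file name -> sitemap XML.

    File names like sitemap_1.xml are hypothetical placeholders.
    """
    files = {}
    for n, urls in enumerate(chunk(record_urls, per_file), start=1):
        files[f"sitemap_{n}.xml"] = build_urlset(urls)
    # The index points at each sitemap file, not at individual records.
    index_entries = "".join(
        f"  <sitemap><loc>{escape(base_url)}/{name}</loc></sitemap>\n"
        for name in files
    )
    index_xml = ('<?xml version="1.0" encoding="UTF-8"?>\n'
                 f'<sitemapindex xmlns="{SITEMAP_NS}">\n'
                 f'{index_entries}</sitemapindex>\n')
    return index_xml, files
```

For example, calling build_sitemaps("https://yoursite.org", urls, per_file=2) with five record URLs would produce three sitemap files (holding 2, 2, and 1 URLs) and an index with three sitemap entries.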
Within each record, we also add information that is visible to search engines and other servers. This information is generally known as linked data, and we provide it both in the schema.org format that Google recommends and as BibFrame works. You can view the data using Google’s Structured Data Testing Tool. For The Midnight Library, we see information as shown below:
Although we can never guarantee how Google will show our results, we can definitely influence how it sees our site and do everything we can to make sure our records rank as highly as possible.
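To give a rough idea of what schema.org linked data for a record can look like, here is a minimal Python sketch that builds a JSON-LD description of a book and wraps it in the script tag format commonly used to embed JSON-LD in a page. It is not the exact markup Aspen Discovery emits; real records carry many more properties, and the record URL is a hypothetical placeholder.

```python
import json


def book_json_ld(title, author, url):
    """Build a minimal schema.org Book description as a Python dict.

    Real records can carry many more properties (identifiers, formats,
    subjects, availability); only a few are shown here.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author},
        "url": url,
    }


def to_script_tag(data):
    """Serialize linked data into a JSON-LD <script> tag for a page's HTML."""
    # Escape "</" so a value containing "</script>" cannot end the tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


tag = to_script_tag(
    book_json_ld("The Midnight Library", "Matt Haig",
                 "https://yoursite.org/record/example")  # hypothetical URL
)
print(tag)
```

Escaping "</" as "<\/" keeps the JSON valid (JSON allows "\/" as an escaped slash) while preventing a stray closing tag inside a title or author from breaking the page.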
In the Aspen Usage Dashboard, we show how many pages search engine bots load each month. This number will ebb and flow over the course of a year and as we identify new search engines. Here is one example of search engine activity over the last year.
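Tallying bot activity like this generally comes down to recognizing crawler user agents and counting their page loads per month. The Python sketch below illustrates that idea under stated assumptions; it is not how the Aspen Usage Dashboard actually computes its numbers. The token list is a small illustrative sample, the (datetime, user agent) input format is hypothetical, and user agents can be spoofed, so this only approximates real crawler traffic.

```python
from collections import Counter
from datetime import datetime

# Substrings found in the user agents of a few well-known crawlers.
# Illustrative only: real lists are much longer, and user agents can be spoofed.
BOT_TOKENS = {
    "googlebot": "Google",
    "bingbot": "Bing",
    "duckduckbot": "DuckDuckGo",
    "yandexbot": "Yandex",
    "baiduspider": "Baidu",
}


def classify(user_agent):
    """Return the search engine name for a crawler user agent, or None."""
    ua = user_agent.lower()
    for token, engine in BOT_TOKENS.items():
        if token in ua:
            return engine
    return None


def monthly_bot_page_loads(requests):
    """Count crawler page loads per (YYYY-MM, engine).

    `requests` is an iterable of (datetime, user_agent) pairs, for
    example parsed from a web server access log (hypothetical input).
    """
    counts = Counter()
    for when, user_agent in requests:
        engine = classify(user_agent)
        if engine is not None:
            counts[(when.strftime("%Y-%m"), engine)] += 1
    return counts


# Made-up sample requests for illustration.
sample = [
    (datetime(2023, 1, 5), "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    (datetime(2023, 1, 9), "Mozilla/5.0 (compatible; bingbot/2.0)"),
    (datetime(2023, 2, 1), "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),  # a browser
]
print(monthly_bot_page_loads(sample))
```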
Sometimes a search engine crawler gets overly aggressive and indexes too many pages too quickly, or it only provides search results in locations that the library does not serve (e.g., crawlers that only serve Russia, Europe, or China).
If that is the case, we can block traffic from particular IP addresses or IP address ranges, but that is normally only needed if we notice page response times getting slower.
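Blocking like this may be handled by a web server or firewall rule, and the post doesn’t say where Aspen applies it. The Python sketch below only illustrates the underlying check: whether a client address falls inside any blocked address or range, using the standard ipaddress module. The block list is hypothetical and uses documentation-only ranges (RFC 5737) rather than real crawler addresses.

```python
import ipaddress

# Hypothetical block list. These are documentation-only ranges (RFC 5737),
# standing in for whatever addresses an aggressive crawler actually uses.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # a whole range
    ipaddress.ip_network("198.51.100.17/32"),  # a single address
]


def is_blocked(addr, blocked=BLOCKED_NETWORKS):
    """Return True if the client address falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    # Membership is simply False when IPv4 and IPv6 versions differ,
    # so a mixed list of networks is safe to check.
    return any(ip in network for network in blocked)
```

Expressing each entry as a network (a /32 for a single IPv4 address) lets one membership test cover both individual addresses and whole ranges.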
If you want to set up Google Search Console, we would be happy to help you so you can better track your results, and we’d love to discuss any ideas you may have for getting better results for your catalog in the search engines that patrons use every day.
Read more by Mark Noble