Why don’t we ever get around to looking at our UX?

5 11 2015

I’ve always wanted to do this, but there always seems to be something that trumps the promised project to improve our users’ online experience: that e-journal whose access has been lost and requires chasing up the publisher, the RFID project, and so on.

It is, however, critical to address the question of UX in order to move from delivering a base-level discovery service to something that can really improve the student experience, help students do better research and ultimately get better results. In some ways it’s blindingly obvious: big business has been doing this well for years and reaping the benefits (e.g. Amazon). Libraries have traditionally (with some notable exceptions) been rather slow to recognise the importance of work in this area.

But we have dipped our toe in the water, and that’s what this post is about. When we bought Primo in 2014, we were aware that, out of the box, the interface was not going to be everything we might desire. One of the things that interested us was that you could build on its API and that a certain degree of customised control over the interface was possible. We were determined to address this shortcoming.

IOE Library Search was launched using the perpetual beta model. We wanted something up and running to provide an alternative to our legacy systems, and we wanted users to be able to feed back the problems they were experiencing so that they could be resolved using an agile methodology. Admittedly the first cut was pretty basic. Anticipating this, we planned a user experience workshop.

My knowledge in this area was decidedly vague, and I had visions of setting up some sort of recording studio in a laboratory and getting software to track every mouse click, leaving us with a morass of data to interpret somehow. However, a little research suggested that it is perfectly possible to set up something fairly low-tech and still get some very useful data, as long as some basic rules are followed.

The first tip is to have a prepared script and stick to it. We used a popular one from Steve Krug’s book Don’t Make Me Think, Revisited: A Common Sense Approach to Web Usability, which has been adopted by many libraries, and adapted it to our local requirements. We chose a variety of scenarios to test in an attempt to capture different types of usability information. We wanted to know whether students could easily accomplish some basic tasks, such as finding a full-text article, finding a course reading cited on Moodle, and reserving a book.

The findings were quite interesting. We immediately spotted that two links listed on top of each other, “Advanced” and “Browse”, were being read as “advanced browse”, so adding an obvious separator and a tooltip would fix that howler quite easily. In fact, much of what we found was that we had made assumptions about terminology which either had no meaning for students or a different one than it did for librarians. The other key finding was that there were simply too many options at item level. In Primo these are presented as tabs, and having too many of them was confusing. One user asked a very good question: “What is the full record tab for?” This is the tab which shows the detailed MARC fields, but it appeared that many users were simply not interested in that level of metadata detail.

All in all, these findings were a very useful first step for us. Our strategy now is to make some changes based on them and monitor usage via analytics. We will also be running a follow-up session to check that we have not introduced any new problems while fixing the old ones.

That iffy EPrints full text search – getting help from Primo

17 07 2014

We run three EPrints instances here, and one thing EPrints is not strong on is phrase searching. That sounds like a minor issue, but when you have a corpus of born-digital material which has been ingested with only limited metadata, it becomes rather important. The default EPrints simply treats a phrase search as a plain boolean AND search. We decided we needed something better, more like what a Google searcher might expect. At the same time we were working on our Discovery project, and EPrints metadata was being ingested into our chosen discovery system, Primo from Ex Libris, branded here as IOE Librarysearch. This is a relatively trivial operation involving OAI-PMH. We had always intended to ingest our full-text documents as well: a discovery layer without some of your core content is something of a misnomer. We understood that this was achievable as long as we could expose the URLs of the full-text documents in the OAI metadata. That in itself was no problem.

However, it turned out that getting this content into Primo and indexed was rather more tricky. The initial approach was to give Primo admin access to EPrints and allow it to use the EPrints API to populate a special Oracle table set up for the purpose. Unfortunately, the indexing software being used to extract the content from the PDFs proved unreliable: we were unable to ascertain why certain files indexed fine whilst others produced unintelligible errors and were skipped. We finally decided that this approach was going to be neither sustainable nor complete, and therefore had to think of something different.

My next thought was: what about simply including the full text in the metadata by extracting it ourselves, using open source tools which we knew to work reliably? I started by creating a new EPrints field called fulltext and linking it to our single in-use content type. Here are the field settings (with a sketch of the equivalent configuration after the list):

Multiple values: Yes
Type: longtext
Required: No
Volatile field: Yes
Include in XML export: No
Index: No
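Purely as a sketch of what those settings amount to in configuration terms (assuming the field is declared by hand in a cfg.d file rather than through the admin interface, with property names that should be checked against your EPrints version), they would look something like this:

# Hypothetical entry in archives/<repoid>/cfg/cfg.d/eprint_fields.pl
push @{$c->{fields}->{eprint}}, {
    name          => 'fulltext',
    type          => 'longtext',
    multiple      => 1,   # Multiple values: Yes
    required      => 0,   # Required: No
    volatile      => 1,   # Volatile field: Yes
    export_as_xml => 0,   # Include in XML export: No
    text_index    => 0,   # Index: No
};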

Next, we needed a special Perl script (to be run as a one-off) which would extract text from all relevant existing documents in the repository. I considered creating an EPrints plugin, but that did not seem to fit the requirement, being more suited to actions performed on a single record than on a batch. The Perl script was called extract_fulltext.pl. A prerequisite was the open source PDF toolkit (pdftk), which was already installed on the test EPrints server I was using.

The basics of the script were to loop through the relevant records, call pdftk to extract the text, and then chunk it into portions for adding to the metadata record. The reason for chunking is that the maximum length of a longtext field is 65,000 bytes, which many of our PDF documents exceed. I successfully ran this on my small test set of records, and the result is a very ugly-looking long record. That in itself is easily fixed by configuring EPrints not to display the field; we would not want to display it anyway, as it would expose all the nasty OCR errors that are often present. There is presumably a loading implication when records grow this large, which may affect performance, but I have not had time to test that yet.
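To give a flavour of what the script does, here is a minimal sketch along the same lines. It is an illustration rather than our production code: the EPrints calls should be checked against your installation, the repository ID 'dera' is just an example, and pdftotext stands in here for whichever command-line extractor is available.

#!/usr/bin/perl -w
# extract_fulltext.pl (sketch): loop through the archive, extract text from
# each PDF and store it in the fulltext field in chunks below the
# 65,000-byte longtext limit.
use strict;
use EPrints;

my $session = EPrints::Session->new( 1, 'dera' )
    or die "Could not open repository\n";
my $ds = $session->get_repository->get_dataset( 'archive' );

$ds->map( $session, sub {
    my( $session, $dataset, $eprint ) = @_;
    my @chunks;
    foreach my $doc ( $eprint->get_all_documents )
    {
        my $path = $doc->local_path . '/' . $doc->get_main;
        next unless $path =~ /\.pdf$/i;
        # pdftotext used as a stand-in extractor
        my $text = `pdftotext -enc UTF-8 "$path" - 2>/dev/null`;
        push @chunks, unpack '(a64000)*', $text;   # 64,000-byte chunks
    }
    if( @chunks )
    {
        $eprint->set_value( 'fulltext', \@chunks );
        $eprint->commit;
    }
} );

$session->terminate;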

The next thing is to ensure that newly added PDFs similarly get their text extracted and added to the metadata record. In this case an EPrints plugin is going to be the answer: you could either have a button near the file upload control on the document editing screen, or automate the extraction after a save action. I have not had time to develop this yet.
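A shortcut while a proper plugin is developed, again only a sketch and untested here, would be to hang the same extraction off the standard automatic-fields hook, which runs whenever a record is committed (this follows the "after a save action" option above):

# Hypothetical addition to archives/<repoid>/cfg/cfg.d/eprint_fields_automatic.pl
$c->{set_eprint_automatic_fields} = sub {
    my( $eprint ) = @_;
    return if $eprint->is_set( 'fulltext' );   # do not re-extract on every save
    my @chunks;
    foreach my $doc ( $eprint->get_all_documents )
    {
        my $path = $doc->local_path . '/' . $doc->get_main;
        next unless $path =~ /\.pdf$/i;
        my $text = `pdftotext -enc UTF-8 "$path" - 2>/dev/null`;
        push @chunks, unpack '(a64000)*', $text;
    }
    $eprint->set_value( 'fulltext', \@chunks ) if @chunks;
};

A button on the document editing screen would need a proper Screen plugin, which is why the commit hook is the lower-effort option.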

Next, we need to adjust the Primo pipe for this resource in order to ingest the fulltext field and ensure it is searchable in Primo. At the time of writing, this is not resolved. It looks as though an enrichment plugin needs to be set up, as described on this Ex Libris Developers page, but we are awaiting advice from Ex Libris on how to achieve this.

Finally, the pièce de résistance: we want to improve searchability. We already run Primo, which offers the sort of retrieval we are looking for (phrase searching is supported, for example), so the idea was to use the Primo API to replace the search in EPrints. Could this be done?

I started by using the Ex Libris Developer Network to study the workings of the Primo API. Helpfully, they had an example application which I was able to download and play with in order to begin to understand it. I decided to try this on our test DERA repository. The mechanics were reasonably straightforward: I cloned the index.html file from /usr/share/eprints3/archives/<repoid>/html/en, called it primoindex.html and placed it in the same folder. I put the search part of the demo Primo API app in this page and limited it to search on the Primo scope “DERA”. I also placed the custom CSS and JS files from the demo app there.

I was able to get a working prototype running fairly quickly. It was trivial to point the result links at the records in the native EPrints repository, as the EPrints ID was also in the metadata. This still has some way to go before it can be called a fully viable solution, but it does show what can be achieved by getting the best parts of different systems to work together.