Specific Questions for Library Staff Endeca Experts

All of the examples refer to the example.html file available at (http://ils.unc.edu/bmh/tmp/endeca/example.html):

1. What causes long sessions (500+ steps) consisting of RSS searches or of exporting records in email or print format? That is, why do we see long sequences of repeated RSS actions or export services in the transaction logs?

For example, see Session 193900 (702 steps) and Session 200590 (389 steps).

ANSWER: Probably robots crawling for data. We suggest identifying robots via (a) a stop list of primary offenders (see Derek for the list he already has); and (b) a list we generate ourselves by extracting the URLs of offending search records and compiling a unique listing (i.e., removing duplicates). Then use these lists to filter out records prior to analysis. Coordinate this with Derek, as we may be able to update the list a student worked on for him last semester.
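A minimal Python sketch of the two-part stop list described above, assuming each parsed log record is a dict with hypothetical `session` and `ip` keys. Part (b) is approximated by collecting the client IPs of over-long sessions; the notes speak of extracting URLs, so keying on the client IP is our assumption, as is the 500-step threshold borrowed from question 1. Returning a set gives the "unique listing" for free.

```python
from collections import Counter

# Hypothetical threshold, borrowed from the "500+ steps" in question 1:
# sessions with more steps than this are treated as likely robots.
MAX_HUMAN_STEPS = 500

def build_stop_list(records, known_offenders, max_steps=MAX_HUMAN_STEPS):
    """Merge (a) a hand-maintained stop list with (b) IPs seen in over-long sessions.

    The result is a set, so duplicate IPs collapse into a unique listing.
    """
    steps_per_session = Counter(r["session"] for r in records)
    long_sessions = {s for s, n in steps_per_session.items() if n > max_steps}
    derived = {r["ip"] for r in records if r["session"] in long_sessions}
    return set(known_offenders) | derived

def filter_robots(records, stop_list):
    """Keep only records whose client IP is not on the stop list."""
    return [r for r in records if r["ip"] not in stop_list]
```

On toy data where session 1 has three steps and `max_steps=2`, session 1's IP joins the stop list alongside the hand-maintained entries, and only session 2's records survive the filter.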

2. Why are there many transaction logs like “/search?op=logs&………….”? What did these logs suggest?

These entries come from logging run by library staff (Ben). You can match any request containing “op=log” (which also catches “op=logs”) and remove it.

For example:

152.23.231.43 - - [06/May/2009:00:04:17 -0400] "GET /search?op=log&href=%3FR%3DUNCb5125184&ulabel=Development+as+freedom+%5Belectronic+resource%5D HTTP/1.1" 200 - "http://search.lib.unc.edu/search?Nty=1&Ntk=Keyword&Ntt=amartya+sen%2C+development+as+freedom" "Mozilla/4.0 (compatible MSIE 7.0 Windows NT 6.0 SU 3.21 SLCC1 .NET CLR 2.0.50727 InfoPath.2 .NET CLR 1.1.4322 .NET CLR 3.5.30729 .NET CLR 3.0.30618)"
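The filter suggested above could be sketched in Python as follows. It assumes the server writes Apache's "combined" log format, which the example line appears to follow; the regular expression and the function names are our own illustration. Parsing the query string checks the `op` parameter itself rather than any occurrence of the text, and it handles both "op=log" (as in the example) and "op=logs" (as in the question).

```python
import re
from urllib.parse import urlsplit, parse_qs

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def is_staff_log_request(path):
    """True if the request's query string sets op=log or op=logs."""
    query = parse_qs(urlsplit(path).query)
    return any(op in ("log", "logs") for op in query.get("op", []))

def parse_and_filter(lines):
    """Parse combined-format lines, skipping unparseable ones and staff logging requests."""
    kept = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and not is_staff_log_request(m.group("path")):
            kept.append(m.groupdict())
    return kept
```

Lines that do not match the pattern are silently skipped here; in practice it may be worth counting them to check that the format assumption holds.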

3. The server sometimes logs the same transaction two or more times. Why does this happen?

For example, see Session 25186 (322 steps) and Session 65429 (642 steps).

We are not sure why (and would like to know), but the duplicates should be eliminated, probably as part of a scheme that identifies duplicated entries as well as zero-length operations.
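One possible cleaning pass is sketched below in Python. Both definitions are assumptions, since the cause of the duplicates is unknown: a duplicate is taken to be an entry whose host, timestamp and request line exactly match an earlier one, and a zero-length operation is one whose logged response size is "-" or "0". Records are assumed to be dicts with hypothetical `host`, `time`, `request` and `bytes` keys.

```python
def drop_duplicates_and_empty(records):
    """Remove repeated log entries and zero-length operations.

    Assumed definitions (not confirmed by the server):
    - duplicate: same host, timestamp and request line as an earlier entry;
    - zero-length operation: response size logged as "-" or "0".
    """
    seen = set()
    cleaned = []
    for r in records:
        key = (r["host"], r["time"], r["request"])
        if key in seen:
            continue  # exact repeat of an entry already processed
        seen.add(key)
        if r["bytes"] in ("-", "0"):
            continue  # zero-length operation
        cleaned.append(r)
    return cleaned
```

Keying on the exact timestamp will miss repeats logged a second apart; widening the key (for example, dropping the seconds) is a judgment call to make after inspecting sessions 25186 and 65429.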

4. Are some robots using the catalog to run queries in batch mode?

For example, see Session 69721 (528 steps) and Session 218823 (332 steps).

Not positive, but probably some kind of robot that grabs everything off the page (which results in a set of requests, one for each item on the page). Again, we should try to filter these out.
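If a robot fetches every item on a results page, its requests should arrive in a tight burst. The Python sketch below flags hosts that exceed a request count inside a sliding time window. The 10-second window and 20-request limit are hypothetical values that would need tuning against sessions such as 69721 and 218823, and the input is assumed to be (host, Apache timestamp) pairs.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical thresholds: more than MAX_REQUESTS from one host within WINDOW
# is treated as batch-mode (robot-like) behaviour.
WINDOW = timedelta(seconds=10)
MAX_REQUESTS = 20

def parse_apache_time(ts):
    """Parse an Apache timestamp such as '06/May/2009:00:04:17 -0400'."""
    return datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")

def bursty_hosts(records, window=WINDOW, max_requests=MAX_REQUESTS):
    """Return hosts that ever exceed max_requests inside a sliding window.

    records: (host, timestamp_string) pairs, chronological at least per host.
    """
    recent = defaultdict(deque)  # host -> timestamps inside the current window
    flagged = set()
    for host, ts in records:
        t = parse_apache_time(ts)
        q = recent[host]
        q.append(t)
        while t - q[0] > window:
            q.popleft()  # forget requests that fell out of the window
        if len(q) > max_requests:
            flagged.add(host)
    return flagged
```

A sliding window per host keeps memory proportional to the burst size rather than the whole log, which matters for multi-month transaction logs.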

5. Are there librarians using the catalog who already know specific information about the items?

For example, see Session 65861 (528 steps).

Yes. Brad and Xi need to think about how to handle this, because we want to (1) filter out all non-humans, and then potentially (2) filter out non-general users (i.e., librarians or other specialty use).

6. Has the interface changed since the adoption of Endeca?

For example, not all of the export services now offered (email, text, cite, print, plain text, endnote, RefWorks, delicious…) appeared in the interface in February of this year.

Yes. Library staff will determine what changes they can and email us to keep us informed. Checking the Subversion code history would be very fine-grained (probably more than we need). Another option is the emails Jill sends to users about new features; she will look at these, let us know, and send us a first pass. Derek also suggested we could use wget (http://en.wikipedia.org/wiki/Wget) to quantitatively determine changes to a given web page (such as the library interface page).
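As a rough sketch of Derek's wget idea: save snapshots of the page at different dates (for example with `wget -O before.html URL`, where the file name is hypothetical), then compare them. The Python function below is our own illustration using the standard `difflib` module; it reports a line-level similarity ratio plus the added and removed lines, which would show, for instance, a new export link appearing.

```python
import difflib

def page_change(old_html, new_html):
    """Quantify how much a page changed between two saved snapshots.

    Returns (similarity, added, removed): similarity is difflib's ratio in
    [0, 1] computed over lines; added/removed list the differing lines.
    """
    old_lines = old_html.splitlines()
    new_lines = new_html.splitlines()
    similarity = difflib.SequenceMatcher(None, old_lines, new_lines).ratio()
    added, removed = [], []
    for line in difflib.ndiff(old_lines, new_lines):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
    return similarity, added, removed
```

Comparing by lines keeps the output readable, but dynamic content such as timestamps or session IDs in the HTML would register as changes, so it may need to be stripped before comparison.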

At our next meeting, Xi and Brad will present more detailed coverage for those interested. Jill, Ben and Derek will report on their organizations' perspectives after updating them on our progress, in particular on:

1) Whether there are analyses Xi and Brad can run that would help them.
2) Whether Brad’s suggestions for interface changes might be feasible for testing (we would do all the testing; we just need a version of the library interface put up so that users would use it and generate log data for analysis). See http://bioivlab.ils.unc.edu/wiki/index.php/Endeca#Ideas_for_Interactive_Use_of_LCC_and_LCSH_for_faceted_browsing (db1, sp02).

PS. Initial follow-up from Ben and Derek: Thanks to Derek's diligence, we caught one of the spurts of numerous simultaneous requests for movies shortly after it occurred and looked into its cause. We're pretty sure it's an administrative interface, written by one of our CALAs, that is used for checking out movies for classes. The requests originate from the web server that runs that interface, so the IPs to filter out are 152.2.176.85 and 152.2.176.91. I'm told these should remain static until at least sometime next year.
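A minimal Python sketch of the IP filter, assuming records are dicts with a hypothetical `host` field holding the client address as logged. Since the two addresses are only expected to stay static until sometime next year, keeping them in one named constant makes them easy to update.

```python
# Web server running the movie-checkout administrative interface
# (addresses from Ben and Derek's follow-up; review when they change).
ADMIN_INTERFACE_IPS = frozenset({"152.2.176.85", "152.2.176.91"})

def drop_admin_interface(records, blocked=ADMIN_INTERFACE_IPS):
    """Drop log records that originate from the admin interface server."""
    return [r for r in records if r["host"] not in blocked]
```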
