From 723e59e6ead3df92f6b04b55104829ab6d92870d Mon Sep 17 00:00:00 2001
From: Mike Taylor
Date: Mon, 15 Feb 2010 17:08:38 +0000
Subject: [PATCH] Complete

---
 archive/wiring-into-masterkey | 145 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 139 insertions(+), 6 deletions(-)

diff --git a/archive/wiring-into-masterkey b/archive/wiring-into-masterkey
index d79a7ad..19ee2af 100644
--- a/archive/wiring-into-masterkey
+++ b/archive/wiring-into-masterkey
@@ -1,13 +1,146 @@
-It seems wrong that in IRSpy we have a database of 6000 Z39.50
+It seems wrong that in IRSpy we have a database of 3000 Z39.50
 targets, but no easy way to make them available to MasterKey
 applications. Since Jason built the IRSpy toroid, which exposes
 IRSpy's ZeeRex database in Torus format, it seems that have had the
-pieces we need to present those 6000 databases to MasterKey
+pieces we need to present those 3000 databases to MasterKey
 administrators; but in reality there are several significant barriers
 to using the toroid output.
 
-data quality
-data richness -- more tests needed
-irspy overnight test fixed
-mkadmin handling many targets
+
+1. DATA QUALITY
+
+Of the 3000 or so records in the IRSpy database, some large proportion
+represent servers that no longer exist, or never existed, or for which
+we do not have access credentials. At present we cannot even
+identify which records fall into this category, because the records in
+the database do not contain a reliability score (and therefore
+searching by the score is not possible): the score is calculated when
+records are displayed in the web UI, and so is "display-only".
+
+    TASK: calculate reliability as part of the existing
+    record-transformation process, store the score and index it,
+    and make it available as a search criterion.
+
+Once bad servers have been identified, we will need to decide what to
+do with them. One option of course is just to throw the records away,
+but I am always reluctant to discard information.
It would probably
+be better to introduce a notion of status into the IRSpy database,
+mark the relevant records Dormant or similar, and arrange that the
+IRSpy toroid ignores all such records.
+
+    TASK: add a notion of record status to the IRSpy database,
+    supported in the Web UI and in Z39.50/SRU searches, and
+    arrange for regular record maintenance to modify this setting
+    for records as required.
+
+
+2. OVERNIGHT TEST RUNS
+
+Every night, the IRSpy host runs a large series of tests on registered
+servers to determine which are still alive, whether their capabilities
+have changed, etc. One seventh of the servers are tested each night,
+so that the database is traversed every week. For some time now,
+though, the overnight tests have been failing -- sometimes quickly,
+sometimes after running for a long time, and just occasionally not at all --
+due to an XML/XSLT problem:
+
+    runtime error: file ../lib/ZOOM/../../xsl/irspy2zeerex.xsl
+    line 174 element param
+    xsltApplyXSLTTemplate: A potential infinite template recursion
+    was detected.
+
+This error masks another that was starting to manifest with increasing
+frequency, to do with invalid characters in XML. These obviously need
+fixing, as the overnight runs are crucial for maintaining the quality
+of the data.
+
+    TASK: fix the "infinite recursion" XSLT problem in the
+    overnight run.
+
+    TASK: fix the invalid XML character problem as soon as it is
+    once more visible.
+
+
+3. DATA RICHNESS
+
+The overlap between the set of data generated by IRSpy and what is
+required by MasterKey is surprisingly small: at present, the IRSpy
+Toroid provides only two fields in the records that it propagates:
+ZURL and displayName.
Other MasterKey target description fields fall
+into four categories:
+
+* Some, such as authentication, could be propagated simply by
+  extending the toroid's zeerex2torus.xsl transformation.
+
+* Others pertain to information that is known to IRSpy, but which
+  requires additional business logic to extract: for example, cclmap_au
+  could be set to @attr 1=1003 for targets that have been determined
+  to support that access point, and @attr 1=1 for those that support
+  this but not 1003, and left blank for others. Similarly,
+  requestSyntax could be chosen from among those syntaxes that are
+  supported.
+
+* Other fields specify information which could in principle be
+  determined by IRSpy, but for which there are presently no tests --
+  for example, a carefully designed test could probably determine what
+  query encoding is in use in a given Z39.50 server, and what record
+  encoding is used in returned records -- but no such tests have been
+  created for IRSpy.
+
+* Finally, there are yet other MasterKey fields that IRSpy could not
+  even in principle hope to determine: for example, URL Recipe seems
+  to be a lost cause, as it relates to web-sites that are "parallel"
+  to Z39.50/SRU servers as well as to those servers themselves. Such
+  fields will need to be maintained by hand.
+
+Of the as-yet unsupported MasterKey fields, some have no equivalent in
+ZeeRex and are therefore not representable in ZeeRex records. IRSpy
+uses an extended ZeeRex scheme for its database, so this is not an
+immediate problem.
+
+In general, not only are more tests needed within IRSpy, but more
+intelligence is needed in transforming the data that IRSpy does
+discover into information that MasterKey can use (and that
+intelligence would perhaps not be best expressed in XSLT).
+
+
+4. ADMIN CONSOLE'S HANDLING OF LARGE NUMBERS OF TARGETS
+
+Up till now, the MasterKey Admin Console has been used exclusively
+with small sets of targets, not exceeding 30 or perhaps 40.
It has
+therefore been simplest to present all targets together on a single
+page -- a strategy that will certainly not work well when we start
+using the Admin Console to choose targets from IRSpy's much larger
+selection. (At a rough guess, perhaps half to two thirds of the
+registered servers in IRSpy are active and functional, so we are
+looking at a list of 1500-2000 targets.)
+
+To handle this, we would need to re-tool the Admin Console so that, as
+well as running in its current mode (which is still appropriate in
+many situations) it can also run in a mode where it does not show
+complete lists of targets, but invites administrators to search for
+specific targets; and in which it limits the number of results on each
+page and provides a means of stepping back and forth through the pages
+representing the full list.
+
+In short, it needs to present its data in a way that more closely
+resembles the way the IRSpy web UI works. Happily, some of the
+substrate code for this already exists, as I imagined a search model
+rather than a browse model when building the earliest Admin Console.
+The big missing area is paging through long result-lists.
+
+
+5. MISCELLANEOUS BUGS
+
+Finally, Bugzilla shows 18 open IRSpy bugs at
+    http://tinyurl.com/irspybugs
+These are of very variable importance and difficulty, but
+unfortunately all currently appear as P4s, as they have been
+downgraded through time to reflect their lack of urgency. (In
+retrospect, this demonstrates that we need to tweak our Bugzilla
+practices so as to distinguish between urgency and importance.)
+
+These bugs should be reviewed, and we should determine which of them
+ought to be solved as part of a MasterKey integration project.
+
-- 
1.7.10.4
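
The cclmap_au rule described under DATA RICHNESS is the kind of business logic that may be easier to express outside XSLT. What follows is only a minimal sketch in Python, not IRSpy code: the function name choose_cclmap_au and the idea that a target's test results are available as a set of supported Bib-1 use-attribute numbers are both assumptions made for illustration.

```python
def choose_cclmap_au(supported_use_attrs):
    """Pick a cclmap_au value for one target.

    supported_use_attrs is a hypothetical set of Bib-1 use-attribute
    numbers that the tests found the target to support, e.g. {1, 4, 1003}.
    """
    # Prefer access point 1003 when the target supports it.
    if 1003 in supported_use_attrs:
        return "@attr 1=1003"
    # Fall back to access point 1 when 1003 is not supported.
    if 1 in supported_use_attrs:
        return "@attr 1=1"
    # Neither access point is known to work: leave the field blank.
    return ""
```

Called as choose_cclmap_au({1, 4}), the function falls through to the second branch; a target with neither access point gets an empty string, matching the "left blank for others" case. The same shape would presumably suit requestSyntax, given a preference order over the supported syntaxes.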