It seems wrong that in IRSpy we have a database of 3000 Z39.50
targets, but no easy way to make them available to MasterKey
applications. Since Jason built the IRSpy toroid, which exposes
IRSpy's ZeeRex database in Torus format, it seems that we have had
the pieces we need to present those 3000 databases to MasterKey
administrators; but in reality there are several significant
barriers to using the toroid output.


1. DATA QUALITY

Of the 3000 or so records in the IRSpy database, a large proportion
represent servers that no longer exist, or never existed, or for
which we do not have access credentials. At present we cannot even
identify which records fall into this category, because the records
in the database do not contain a reliability score (and therefore
searching by the score is not possible): the score is calculated
only when records are displayed in the web UI, and so is
"display-only".

    TASK: calculate reliability as part of the existing
    record-transformation process, store the score and index it,
    and make it available as a search criterion.

Once bad servers have been identified, we will need to decide what
to do with them. One option, of course, is simply to throw the
records away, but I am always reluctant to discard information. It
would probably be better to introduce a notion of status into the
IRSpy database, mark the relevant records Dormant or similar, and
arrange for the IRSpy toroid to ignore all such records.
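
A minimal sketch of the scoring and status logic described above, in
Python and not taken from IRSpy itself. It assumes a hypothetical
record shape in which each record carries a list of past probe
outcomes, defines reliability as the percentage of probes that
succeeded, and marks a record Dormant when its stored score falls
below a hypothetical threshold. IRSpy's real display-time formula,
record format and any cut-off would need to be checked against the
actual code. Records that have never been probed get no score and
keep their current status.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical cut-off (percent); the real value is a policy decision.
DORMANT_THRESHOLD = 20


@dataclass
class TargetRecord:
    """Hypothetical stand-in for one IRSpy database record."""
    zurl: str
    # One entry per past test run: True if the server responded.
    probes: List[bool] = field(default_factory=list)
    reliability: Optional[int] = None  # stored score, so it can be indexed
    status: str = "Active"             # "Active" or "Dormant"


def reliability(probes: List[bool]) -> Optional[int]:
    """Percentage of successful probes (rounded down), or None if untested."""
    if not probes:
        return None
    return 100 * sum(probes) // len(probes)


def transform(record: TargetRecord) -> TargetRecord:
    """Run at record-transformation time rather than at display time:
    store the score on the record and derive the status from it."""
    record.reliability = reliability(record.probes)
    if record.reliability is not None:
        record.status = ("Dormant" if record.reliability < DORMANT_THRESHOLD
                         else "Active")
    return record


def search_by_reliability(records: List[TargetRecord],
                          minimum: int) -> List[TargetRecord]:
    """Reliability as a search criterion (untested records are excluded)."""
    return [r for r in records
            if r.reliability is not None and r.reliability >= minimum]


def toroid_view(records: List[TargetRecord]) -> List[TargetRecord]:
    """What the toroid would publish: everything except Dormant records."""
    return [r for r in records if r.status != "Dormant"]


if __name__ == "__main__":
    records = [transform(r) for r in [
        TargetRecord("alive.example.org:210/db", [True, True, True, False]),
        TargetRecord("gone.example.org:210/db", [False] * 8),
        TargetRecord("new.example.org:210/db"),
    ]]
    for r in records:
        print(r.zurl, r.reliability, r.status)
    print([r.zurl for r in toroid_view(records)])
```

Because the status is recomputed on every transformation, a Dormant
server that starts responding again is automatically returned to
Active, which is the kind of regular maintenance the status field
needs.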

    TASK: add a notion of record status to the IRSpy database,
    supported in the Web UI and in Z39.50/SRU searches, and
    arrange for regular record maintenance to modify this setting
    for records as required.


2. OVERNIGHT TEST RUNS

Every night, the IRSpy host runs a large series of tests on
registered servers to determine which are still alive, whether
their capabilities have changed, etc. One seventh of the servers
are tested each night, so that the whole database is traversed every
week. For some time now, though, the overnight tests have been
failing -- sometimes quickly, sometimes after running for a long
time, and just occasionally not at all -- due to an XML/XSLT
problem:

    runtime error: file ../lib/ZOOM/../../xsl/irspy2zeerex.xsl
    line 174 element param
    xsltApplyXSLTTemplate: A potential infinite template recursion
    was detected.

This error masks another that was starting to manifest with
increasing frequency, to do with invalid characters in XML. Both
obviously need fixing, as the overnight runs are crucial for
maintaining the quality of the data.

    TASK: fix the "infinite recursion" XSLT problem in the
    overnight run.

    TASK: fix the invalid XML character problem as soon as it is
    once more visible.


3. DATA RICHNESS

The overlap between the set of data generated by IRSpy and what is
required by MasterKey is surprisingly small: at present, the IRSpy
toroid provides only two fields in the records that it propagates:
ZURL and displayName.
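
To give a feel for the business logic involved, here is a Python
sketch that builds a MasterKey-style field set from facts IRSpy
already holds: the two fields the toroid propagates today, plus
cclmap_au and requestSyntax chosen by the rules given for the second
category of fields listed next. The IRSpyFacts structure, the syntax
preference order and the use of an empty string for "left blank" are
all assumptions made for illustration; the cclmap_au values follow
this memo's @attr notation rather than any confirmed MasterKey
format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical order of preference when several record syntaxes work.
SYNTAX_PREFERENCE = ["USMARC", "MARC21", "XML", "SUTRS"]


@dataclass
class IRSpyFacts:
    """Hypothetical summary of what IRSpy has discovered about a target."""
    host: str
    port: int
    database: str
    title: str
    # Bib-1 use attributes (e.g. 1003) that tests showed to be supported.
    use_attributes: Set[int] = field(default_factory=set)
    # Record syntaxes that tests showed the server will return.
    record_syntaxes: List[str] = field(default_factory=list)


def cclmap_au(facts: IRSpyFacts) -> str:
    """1=1003 if supported, else 1=1 if supported, else blank."""
    for use in (1003, 1):
        if use in facts.use_attributes:
            return f"@attr 1={use}"
    return ""


def request_syntax(facts: IRSpyFacts) -> str:
    """First preferred syntax that the server supports, else blank."""
    supported = {s.upper() for s in facts.record_syntaxes}
    for syntax in SYNTAX_PREFERENCE:
        if syntax in supported:
            return syntax
    return ""


def to_masterkey(facts: IRSpyFacts) -> Dict[str, str]:
    return {
        # The only two fields the toroid propagates at present.
        "ZURL": f"{facts.host}:{facts.port}/{facts.database}",
        "displayName": facts.title,
        # Fields that need extra business logic to derive.
        "cclmap_au": cclmap_au(facts),
        "requestSyntax": request_syntax(facts),
    }


if __name__ == "__main__":
    facts = IRSpyFacts("z.example.org", 210, "books", "Example Library",
                       use_attributes={1, 4, 1003},
                       record_syntaxes=["sutrs", "usmarc"])
    print(to_masterkey(facts))
```

Expressing each rule as an ordinary function rather than an XSLT
template keeps the rules small and easy to test in isolation.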

Other MasterKey target description fields fall into four
categories:

* Some, such as authentication, could be propagated simply by
  extending the toroid's zeerex2torus.xsl transformation.

* Others pertain to information that is known to IRSpy but requires
  additional business logic to extract: for example, cclmap_au could
  be set to @attr 1=1003 for targets that have been determined to
  support that access point, to @attr 1=1 for those that support 1=1
  but not 1=1003, and left blank for others. Similarly,
  requestSyntax could be chosen from among those syntaxes that are
  supported.

* Other fields specify information which could in principle be
  determined by IRSpy, but for which there are presently no tests.
  For example, a carefully designed test could probably determine
  what query encoding is in use in a given Z39.50 server, and what
  record encoding is used in returned records, but no such tests
  have been created for IRSpy.

* Finally, there are yet other MasterKey fields that IRSpy could not
  hope to determine even in principle: for example, URL Recipe seems
  to be a lost cause, as it relates to web sites running "in
  parallel" with Z39.50/SRU servers as well as to those servers
  themselves. Such fields will need to be maintained by hand.

Of the as-yet unsupported MasterKey fields, some have no equivalent
in ZeeRex and are therefore not representable in standard ZeeRex
records. IRSpy uses an extended ZeeRex schema for its database, so
this is not an immediate problem.

In general, not only are more tests needed within IRSpy, but more
intelligence is needed in transforming the data that IRSpy does
discover into information that MasterKey can use (and that
intelligence would perhaps not be best expressed in XSLT).


4. ADMIN CONSOLE'S HANDLING OF LARGE NUMBERS OF TARGETS

Until now, the MasterKey Admin Console has been used exclusively
with small sets of targets, not exceeding 30 or perhaps 40.
It has therefore been simplest to present all targets together on a
single page -- a strategy that will certainly not work well when we
start using the Admin Console to choose targets from IRSpy's much
larger selection. (At a rough guess, perhaps half to two thirds of
the registered servers in IRSpy are active and functional, so we are
looking at a list of 1500-2000 targets.)

To handle this, we would need to re-tool the Admin Console so that,
as well as running in its current mode (which is still appropriate
in many situations), it can also run in a mode where it does not
show complete lists of targets but instead invites administrators to
search for specific targets, limits the number of results on each
page, and provides a means of stepping back and forth through the
pages representing the full list.

In short, it needs to present its data in a way that more closely
resembles the way the IRSpy web UI works. Happily, some of the
substrate code for this already exists, as I imagined a search model
rather than a browse model when building the earliest Admin Console.
The big missing area is paging through long result lists.


5. MISCELLANEOUS BUGS

Finally, Bugzilla shows 18 open IRSpy bugs at
    http://tinyurl.com/irspybugs
These are of very variable importance and difficulty, but
unfortunately all currently appear as P4s, as they have been
downgraded over time to reflect their lack of urgency. (In
retrospect, this demonstrates that we need to tweak our Bugzilla
practices so as to distinguish between urgency and importance.)

These bugs should be reviewed, and we should determine which of them
ought to be solved as part of a MasterKey integration project.

On top of this is the wishlist:
http://twiki.indexdata.dk/cgi-bin/twiki/view/ID/IRSpyWishList
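
Returning to the paging requirement of section 4, the following
Python sketch shows the search-then-page model in its simplest form:
a query narrows the target list, and only one page of results is
returned, together with what the UI needs to step backwards and
forwards. The substring search stands in for a real IRSpy search, and
the names Page, search_targets and get_page, as well as the default
page size of 10, are hypothetical; they are not taken from the Admin
Console or the IRSpy web UI.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Page:
    items: List[str]
    number: int        # 1-based page number actually returned
    total_pages: int
    has_prev: bool
    has_next: bool


def search_targets(names: Sequence[str], query: str) -> List[str]:
    """Case-insensitive substring match; a stand-in for a real search."""
    q = query.lower()
    return [n for n in names if q in n.lower()]


def get_page(results: Sequence[str], number: int, page_size: int = 10) -> Page:
    """Return one page of results, clamping out-of-range page numbers."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    total_pages = max(1, -(-len(results) // page_size))  # ceiling division
    number = min(max(number, 1), total_pages)
    start = (number - 1) * page_size
    return Page(items=list(results[start:start + page_size]),
                number=number,
                total_pages=total_pages,
                has_prev=number > 1,
                has_next=number < total_pages)


if __name__ == "__main__":
    targets = [f"Library {i}" for i in range(1, 2001)]
    hits = search_targets(targets, "library 1")
    page = get_page(hits, 2)
    print(len(hits), page.total_pages, page.items[:3])
```

Slicing an in-memory list keeps the sketch short; with a real
back-end the paging would more naturally be pushed down into the
search itself, for example via SRU's startRecord and maximumRecords
parameters, so that only one page of records is ever fetched.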