From: Mike Taylor
Date: Wed, 24 May 2006 16:34:38 +0000 (+0000)
Subject: new
X-Git-Tag: CPAN-v1.02~54^2~1197
X-Git-Url: http://git.indexdata.com/?p=irspy-moved-to-github.git;a=commitdiff_plain;h=8b21f11a0cbfb9333641e12dfbf1f09602d8692b

new
---

diff --git a/archive/interface b/archive/interface
new file mode 100644
index 0000000..0ffee33
--- /dev/null
+++ b/archive/interface
@@ -0,0 +1,106 @@
+From mike Mon May 22 16:44:51 2006
+X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
+ ["7234" "Monday" "22" "May" "2006" "17:43:35" "+0200" "Per M. Hansen" "perhans@indexdata.dk" nil "158" "Re: Service description robot project status" "^X-Spam-Status:" nil nil "5" nil nil nil nil nil nil nil nil nil]
+ nil)
+Return-path:
+X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on bagel.indexdata.dk
+X-Spam-Level:
+Envelope-to: mike@miketaylor.org.uk
+Delivery-date: Mon, 22 May 2006 17:43:46 +0200
+Received: from localhost.localdomain [127.0.0.1]
+ by localhost with POP3 (fetchmail-6.2.5)
+ for mike@localhost (single-drop); Mon, 22 May 2006 16:44:51 +0100 (BST)
+Received: from user.indexdata.dk ([213.150.43.10] helo=[127.0.0.1])
+ by bagel.indexdata.dk with esmtp (Exim 3.35 #1 (Debian))
+ id 1FiCZL-00043Z-00; Mon, 22 May 2006 17:43:45 +0200
+Message-ID: <4471DC27.6000703@indexdata.dk>
+User-Agent: Thunderbird 1.5.0.2 (Windows/20060308)
+MIME-Version: 1.0
+References: <445B42DD.3040901@indexdata.dk> <17499.23342.465911.666143@localhost.localdomain> <445EF042.7070806@indexdata.dk> <17503.1546.23498.852784@localhost.localdomain>
+In-Reply-To: <17503.1546.23498.852784@localhost.localdomain>
+X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,HTML_MESSAGE
+ autolearn=ham version=3.1.1
+From: "Per M. Hansen"
+To: Mike Taylor
+CC: Sebastian Hammer ,
+ Adam Dickmeiss
+Subject: Re: Service description robot project status
+Date: Mon, 22 May 2006 17:43:35 +0200
+X-StripMime: Non-text section removed by stripmime
+Content-Type: text/plain; charset=ISO-8859-1; format=flowed
+
+
+
+Mike Taylor wrote:
+>>> What there is of it is in the "irspy" CVS module. I made the
+>>> asynchronous-operations enhancements to ZOOM-Perl for it, created a
+>>> Perl project framework and worked on the ZeeRex database setup
+>>> (Zebra configuration) that underlies it. At that stage, I got
+>>> diverted into Metaproxy documentation, Alvis work and various
+>>> marketing bits. I expect to spend the rest of today following up
+>>> the NPG and M25 leads and finishing up the description of how
+>>> multi-database searching works in Metaproxy. Then next week is all
+>>> for IRspy.
+>>>
+>>
+>> Ok, sounds good. I am looking forward to seeing the admin interface and
+>> be able to take it for a test spin.
+>>
+>
+> Actually, what would be _really_ helpful would be if you could dummy
+> up some HTML showing how you'd like the admin interface to work. Then
+> I can work to that rather than flying blind and hoping you like the
+> result.
+>
+I can make some HTML if you like but I don't think that I can make
+something that you can't make even better. Anyway, let me start by
+trying to describe the functionality I envision, if this thing is going
+to take over ZSpy's role today.
+
+We need a fairly simple interface for non-authenticated users to add new
+servers to the repository, something like the current Z-Spy interface:
+http://targettest.indexdata.com/newtarget.php, but nicer :-). In
+addition to the fields on the current page, I would like the ability to
+say what kind of organization is hosting the database, e.g. public
+library, academic library, corporate library and other. If we really
+want to make it fancy we should also add the ability to say what
+subjects are strongly represented in this database, like medicine,
+engineering, theology, etc., but I am just afraid that there will be so
+few servers where this info is available that it will be a waste of
+time to add this.
+
+When you have filled out the fields, where only the name, host name,
+port and database name are mandatory fields, a series of checks should
+happen before the server is added: First we should check if the server
+is already registered under that host name/IP (make a DNS lookup), port
+and database. If it is not, the second check should be a simple init and
+connect test. If this test fails I think that we should tell the user
+but it should still be possible to add the server.
+
+The administrator interface should give the ability to browse through
+the servers in the repository; a simple list with all servers beginning
+with a, b, c, ..., like the current Target directory interface, is fine
+by me. Under each server you should be able to view all the data that
+was entered and collected by the robot. You should also have the ability
+to edit and delete the servers.
+
+I am not sure how many people ever view the current target statistics
+http://targettest.indexdata.com/stat.php but personally I find it
+extremely interesting, and I would love it if we could reimplement that, but
+maybe it doesn't have to be in the first version.
+
+How is that for a first shot at a requirements spec?
+
+
+-- 
+Per
+
+
+
+--- StripMime Report -- processed MIME parts ---
+multipart/alternative
+ text/plain (text body -- kept)
+ text/html
+---
+
+
diff --git a/archive/tests b/archive/tests
new file mode 100644
index 0000000..81f76e1
--- /dev/null
+++ b/archive/tests
@@ -0,0 +1,358 @@
+From mike Tue May 23 08:49:36 2006
+X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil]
+ ["10552" "Tuesday" "23" "May" "2006" "09:43:08" "+0200" "marc" "marc@indexdata.dk" nil "326" "SRU Server lint/tester" "^X-Spam-Status:" nil nil "5" nil nil nil nil nil nil nil nil nil]
+ nil)
+Return-path:
+X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on bagel.indexdata.dk
+X-Spam-Level:
+Envelope-to: mike@miketaylor.org.uk
+Delivery-date: Tue, 23 May 2006 09:43:10 +0200
+Received: from localhost.localdomain [127.0.0.1]
+ by localhost with POP3 (fetchmail-6.2.5)
+ for mike@localhost (single-drop); Tue, 23 May 2006 08:49:36 +0100 (BST)
+Received: from user.indexdata.dk ([213.150.43.10] helo=[10.0.1.66])
+ by bagel.indexdata.dk with esmtp (Exim 3.35 #1 (Debian))
+ id 1FiRXp-0004lO-00; Tue, 23 May 2006 09:43:09 +0200
+Message-ID: <4472BD0C.1000102@indexdata.dk>
+User-Agent: Debian Thunderbird 1.0.7 (X11/20051017)
+X-Accept-Language: en-us, en
+MIME-Version: 1.0
+References: <17522.6764.89623.774386@localhost.localdomain>
+In-Reply-To: <17522.6764.89623.774386@localhost.localdomain>
+Content-Type: text/plain; charset=UTF-8; format=flowed
+Content-Transfer-Encoding: 8bit
+X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00 autolearn=ham
+ version=3.1.1
+From: marc
+To: Mike Taylor , adam@indexdata.dk,
+ Ralph LeVan
+Subject: SRU Server lint/tester
+Date: Tue, 23 May 2006 09:43:08 +0200
+
+Mike Taylor wrote:
+> Guys,
+>
+> Ralph's fixed his SRU-lint web-page. It takes a few minutes to get
+> its head around foo.indexdata.dk, but it does manage, and has some
+> useful things to say.
+>
+
+
+Hej Ralph
+
+Thanks for a fast fix of your SRU Server tester
+http://alcme.oclc.org/srw/SRUServerTester.html
+for the Alvis-Zebra SRU server URL
+http://foo.indexdata.dk
+
+I think your SRU checker does a nice job. And it did indeed find some
+errors in our implementation, so we all can improve. Thanks for that!
+
+I have a couple of small comments:
+
+
+1)
+You test scan like this:
+
+http://foo.indexdata.dk?version=1.1&scanClause=rec.id+=+dog&operation=scan&responsePosition=1&maximumTerms=5
+
+I think you should always add a responsePosition=3 to the mix, as there
+might be indexes (and there are here!) where the term 'dog' comes
+lexicographically after the last index entry, and you get a fat empty
+
+
+1.1
+
+
+But using
+
+http://foo.indexdata.dk/?version=1.1&operation=scan&scanClause=rec.id+%3D+%28dog%29&responsePosition=3&maximumTerms=5&stylesheet=
+
+gives you
+
+
+1.1
+
+
+FFFEBE6A0D7773AF401A728D5C818AEB
+1
+
+
+FFFF3A78648AC540304B1F50A2C0D644
+1
+
+
+
+
+By the way, an empty index is not that useful, but I think it's not
+necessarily an error to have one unpopulated index, so using a warning
+from your side is a good choice, I feel.
+
+
+2) scan relation 'exact'
+
+You try
+http://foo.indexdata.dk?version=1.1&scanClause=rec.id+exact+dog&operation=scan&responsePosition=1&maximumTerms=5
+
+with relation 'exact', but my explain never told you that the server
+supports relation 'exact'.
+
+I think my correct server response to this should be a fatal
+diagnostic, and your correct test result should have been 'diagnostic
+this-and-that expected'.
+
+Unless it's mandatory that any index supports 'exact', in which case an
+error should be reported (I need to look in the specs to be sure ..)
+
+3) test of search retrieve
+
+You are doing a decent job here. I have a suggestion for a slight
+improvement: you might want to use your information from a scan in the
+following way:
+
+- search for a term _not_ found in the index, and see that there are 0
+  hits (and the response is correct)
+
+- search for a term found in the scan response, and see that the number
+  of hits equals the number of hits claimed in the scan response
+
+I know, it's more work to check for numbers, but it's also a nice sanity
+check on top of a syntax/protocol check.
+
+4) huge records: some of the records are insanely huge (up to 5 MB of
+XML). For example you hit one here:
+
+http://foo.indexdata.dk?version=1.1&query=alvis.entity-disease+=+"dominant
+optic atrophy"&operation=searchRetrieve&maximumRecords=1
+
+You might want to test for XML response message size before doing
+anything else to it, and report a warning like 'response too huge to be
+tested, exceeds X MB of XML'
+
+(I know, I should use another default schema here, to give small
+records, for example the 'dc' schema. We have to improve too ..)
+
+5) recordSchema
+In these cases, one might want to try the other record schemas to see
+if one gets something useful there .. I did not see you testing any
+recordSchema of those I did mention, nor testing non-existent record
+schemas ..
+
+6) In general, testing with wrong arguments in almost any place one can
+is a good idea as well, as the hardest part of SRU/SRW is to get the
+diagnostics right .. for example record position too high, or
+non-existing relation or index, non-existing operation .. etc. ..
+(Yes, I know, this is huge work to do ..)
+
+7) Finally, usability: I think it's a nice idea to have this report
+formatted in XHTML tables, with nice links to click on for each test
+case, such that one can just execute the request which did produce
+errors/warnings. This is of course only eye-candy, but also a cheap
+improvement of usability.
+
+
+Still, I think you did a very decent job, and created a very useful
+service. Thank you for the problems discovered with my service. I have
+some programming to do as well ..
+
+
+Marc Cromme
+
+Index Data
+
+> ------- start of forwarded message -------
+> Return-path:
+> X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on bagel.indexdata.dk
+> X-Spam-Level:
+> Envelope-to: mike@miketaylor.org.uk
+> Delivery-date: Mon, 22 May 2006 21:07:05 +0200
+> Received: from localhost.localdomain [127.0.0.1]
+> by localhost with POP3 (fetchmail-6.2.5)
+> for mike@localhost (single-drop); Mon, 22 May 2006 20:56:19 +0100 (BST)
+> Received: from mshieldserver1.oclc.org ([132.174.29.209])
+> by bagel.indexdata.dk with smtp (Exim 3.35 #1 (Debian))
+> id 1FiFk9-0005kc-00
+> for ; Mon, 22 May 2006 21:07:05 +0200
+> Received: From OAEXCH2SERVER.oa.oclc.org ([132.174.29.222]) by mshieldserver1.oclc.org (WebShield SMTP v4.5 MR2);
+> id 1148324788949; Mon, 22 May 2006 15:06:28 -0400
+> X-MimeOLE: Produced By Microsoft Exchange V6.5
+> Content-class: urn:content-classes:message
+> MIME-Version: 1.0
+> Content-Type: text/plain;
+> charset="us-ascii"
+> Content-Transfer-Encoding: quoted-printable
+> Message-ID: <811A02A11096B343880D2EEF72C4C83202FCD5E9@OAEXCH2SERVER.oa.oclc.org>
+> X-MS-Has-Attach:
+> X-MS-TNEF-Correlator:
+> Thread-Topic: [Adam Dickmeiss: Re: [Tech-alert] SRU Server lint/tester]
+> thread-index: AcZ7Hk4E9wk02cosR0uT69PyXob4kACsz3fA
+> X-Spam-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham
+> version=3.1.1
+> From: "LeVan,Ralph"
+> To: "Mike Taylor"
+> Subject: RE: [Adam Dickmeiss: Re: [Tech-alert] SRU Server lint/tester]
+> Date: Mon, 22 May 2006 15:06:28 -0400
+>
+> Let's just pretend I didn't send that last email, okay?
+>
+> So, lovely Explain records we're having today!
+>
+> I've fixed the blow up. You've got a couple of searches that return
+> very large records that I think are peculiar. Your stylesheet can't
+> render them and I report that I'm getting an error 400 from them.
+>
+> Run the test against the server again and let me know if there's
+> something more I should be doing.
+>
+> Thanks, Mike!
+>
+> Ralph
+>
+>
+>>-----Original Message-----
+>>From: Mike Taylor [mailto:mike@miketaylor.org.uk]
+>>Sent: Friday, May 19, 2006 4:29 AM
+>>To: LeVan,Ralph
+>>Subject: [Adam Dickmeiss: Re: [Tech-alert] SRU Server lint/tester]
+>>=20
+>>Hi, Ralph. FYI, it seems that your SRU lint barfs on our server.
+>>=20
+>>------- start of forwarded message -------
+>>Return-path:
+>>X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on
+>>bagel.indexdata.dk
+>>X-Spam-Level:
+>>Envelope-to: mike@indexdata.com
+>>Delivery-date: Thu, 18 May 2006 18:38:14 +0200
+>>Received: from localhost.localdomain [127.0.0.1]
+>> by localhost with POP3 (fetchmail-6.2.5)
+>> for mike@localhost (single-drop); Thu, 18 May 2006 17:49:11
+>
+> +0100
+>
+>>(BST)
+>>Received: from kebab.indexdata.dk ([83.133.64.60])
+>> by bagel.indexdata.dk with esmtp (Exim 3.35 #1 (Debian))
+>> id 1FglVt-0007y3-00; Thu, 18 May 2006 18:38:13 +0200
+>>Received: from localhost ([127.0.0.1] helo=3Dkebab.indexdata.dk)
+>> by kebab.indexdata.dk with esmtp (Exim 4.50)
+>> id 1FglVU-0001I6-DK; Thu, 18 May 2006 18:37:48 +0200
+>>Received: from user.indexdata.dk ([213.150.43.10]
+>
+> helo=3Dbagel.indexdata.dk)
+>
+>> by kebab.indexdata.dk with esmtp (Exim 4.50) id 1FglUy-0001Hv-WF
+>> for tech-alert@lists.indexdata.dk; Thu, 18 May 2006 18:37:36
+>
+> +0200
+>
+>>Received: from dickmeiss.net ([213.173.244.115] helo=3D[10.0.0.18])
+>> by bagel.indexdata.dk with esmtp (Exim 3.35 #1 (Debian))
+>> id 1FglUj-0007QK-00
+>> for ; Thu, 18 May 2006 18:37:01
+>
+> +0200
+>
+>>Message-ID: <446CA2AC.1030200@indexdata.dk>
+>>User-Agent: Thunderbird 1.5.0.2 (X11/20060501)
+>>MIME-Version: 1.0
+>>References: <446C8BB1.6080206@indexdata.dk>
+>>In-Reply-To: <446C8BB1.6080206@indexdata.dk>
+>>Content-Type: text/plain; charset=3DISO-8859-1; format=3Dflowed
+>>Content-Transfer-Encoding: 7bit
+>>X-BeenThere: tech-alert@lists.indexdata.dk
+>>X-Mailman-Version: 2.1.5
+>>Precedence: list
+>>Reply-To: Announcements/discussion of interesting technology
+>>
+>>List-Id: Announcements/discussion of interesting technology
+>>
+>>List-Unsubscribe: >bin/mailman/listinfo/tech-alert>,
+>>
+>
+>
+>
+>>List-Archive: >alert>
+>>List-Post:
+>>List-Help: =
+>
+>
+>
+>>List-Subscribe:
+>
+>
+>>alert>,
+>>
+>>Errors-To: tech-alert-bounces@lists.indexdata.dk
+>>X-SA-Exim-Connect-IP: 127.0.0.1
+>>X-SA-Exim-Mail-From: tech-alert-bounces@lists.indexdata.dk
+>>X-SA-Exim-Scanned: No (on kebab.indexdata.dk); SAEximRunCond expanded
+>
+> to
+>
+>>false
+>>X-Spam-Status: No, score=3D-2.5 required=3D5.0 tests=3DAWL,BAYES_00,
+>> FORGED_RCVD_HELO autolearn=3Dham version=3D3.1.1
+>>From: Adam Dickmeiss
+>>Sender: tech-alert-bounces@lists.indexdata.dk
+>>To: Announcements/discussion of interesting technology
+>>
+>>Subject: Re: [Tech-alert] SRU Server lint/tester
+>>Date: Thu, 18 May 2006 18:37:00 +0200
+>>=20
+>>marc wrote:
+>>
+>>>SRU Server tester
+>>>
+>>>Try it: surf into
+>>>http://alcme.oclc.org/srw/SRUServerTester.html
+>>>
+>>>and give the Alvis-Zebra SRU server URL
+>>>http://foo.indexdata.dk
+>>>into the box.
+>>>
+>>>Have fun!
+>>
+>>Sort of. I get a big Java runtime exception. Nice.
+>>=20
+>>/ Adam
+>>=20
+>>
+>>>Marc
+>>>
+>>
+>>=20
+>>=20
+>>_______________________________________________
+>>Tech-alert mailing list
+>>Tech-alert@lists.indexdata.dk
+>>http://lists.indexdata.dk/cgi-bin/mailman/listinfo/tech-alert
+>>------- end of forwarded message -------
+>
+> ------- end of forwarded message -------
+
+
+-- 
+
+Marc Cromme, cand. polyt, Ph.D
+Senior Developer, Project Manager
+
+Index Data Aps
+Købmagergade 43, 2
+1150 Copenhagen K.
+Denmark
+
+tel: +45 3341 0100
+fax: +45 3341 0101
+
+http://www.indexdata.com
+
+INDEX DATA Means Business
+for Open Source and Open Standards
+
+
+
+