X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Fharvest.mbox;h=0f38a3ad6f59da2823591433041bae6454f614fd;hb=1cd729b5e40fe0bee16a0a2c073dffc8bcbd1d28;hp=4a24a6fe1901f0ba77d7c7b15dc80be2cc28e37d;hpb=fb6e87e036fa1b6d282a30140c202d49152a105b;p=idzebra-moved-to-github.git
diff --git a/doc/harvest.mbox b/doc/harvest.mbox
index 4a24a6f..0f38a3a 100644
--- a/doc/harvest.mbox
+++ b/doc/harvest.mbox
@@ -158,3 +158,203 @@ I am very happy to see such a nice software available under GPL.
 Thanks.
 
 kj
+
+From zebralist-admin@indexdata.dk Mon Nov 25 11:13:10 2002
+MIME-Version: 1.0
+Envelope-to: zebra@miketaylor.org.uk
+From: Pete
+X-X-Sender: qq15@uxa.liv.ac.uk
+To: Kang-Jin Lee
+cc: zebralist@indexdata.dk
+Subject: Re: [Zebralist] Some progress on Harvest's move to Zebra
+In-Reply-To: <200211242045.19196.lee@arco.de>
+Content-Type: TEXT/PLAIN; charset=US-ASCII
+X-Spam-Level:
+Sender: zebralist-admin@indexdata.dk
+X-BeenThere: zebralist@indexdata.dk
+X-Mailman-Version: 2.0.11
+Precedence: bulk
+List-Help:
+List-Post:
+List-Subscribe: ,
+
+List-Id: Zebra Information Server
+List-Unsubscribe: ,
+
+List-Archive:
+Date: Mon, 25 Nov 2002 10:19:37 +0000 (GMT)
+X-Spam-Status: No, hits=-4.4 required=5.0 tests=IN_REP_TO version=2.20
+X-Spam-Level:
+Content-Length: 2853
+
+On Sun, 24 Nov 2002, Kang-Jin Lee wrote:
+
+>Hi,
+>
+>I finished first steps to use Zebra as fulltext engine for Harvest
+>(http://harvest.sourceforge.net/). The performance boost after
+>some testing are quite impressive.
+
+Hi ... I'd almost forgotten that the Harvest project is still active.
+
+We had a heap of challenges with our Harvest setup and with the
+time taken to index and search ... we switched to using
+Harvest-NG as the "reaper/gatherer" and modified Zebra to
+work with SOIF and our own ranking algorithm - it's been in
+service for over 6 months now.
+
+We had challenges with both speed of gathering and with
+speed of indexing and searching but most seem to be
+"managable" now.
+
+We offered our modifications to Zebra to Indexdata who
+offered to look at them since the latest release of Zebra
+is sufficiently different at the code level to make it
+non-trivial for us to apply our code modifications to
+it.
+
+
+Cheers
+
+Pete Mallinson
+
+>
+>Here is my article I wrote for the Harvest mailinglist.
+>
+>Many thanks for Zebra.
+>
+>------------------------------------------------------
+>Hi,
+>
+>The first results after some testing with Zebra are very promising.
+>
+>The tests were done with around 220 000 SOIF files, which occupies
+>1.6GB of disk space.
+>
+>Building the index from scratch takes around one hour with Zebra where
+>Glimpse needs around five hours.
+>
+>While glimpse blocks search requests when updating its index, Zebra
+>can still answer search requests.
+>
+>While the search time of glimpse varies from some seconds to some
+>minutes depending how expensive the query is, Zebra usually takes
+>around one to three seconds, even for expensive queries.
+>
+>Glimpse' index occupies around 250MB of disk space, Zebra's index
+>takes around 570MB.
+>
+>Zebra supports incremental indexing which will speed up indexing even
+>further.
+>
+>There are still potential for faster searches when necessary, using
+>tweaks on apache.
+>
+>On the other hand, modeling data is not complete, yet.
+>
+>To sum it up:
+>- Zebra indexes data five times faster than Glimpse
+>- Zebra doesn't cause downtimes for indexupdate
+>- Zebra's search time doesn't jump from seconds to minutes for no
+>  obvious reason, but stays constant within a range of one to three
+>  seconds
+>- Zebra can search more than 100 times faster than Glimpse
+>- Zebra can process multiple search requests simultaneously
+>- Zebra can speed up indexing by using incremental indexing
+>- Glimpse's index size is only around half of the Zebra's index
+>
+>kj
+>------------------------------------------------------
+>
+>_______________________________________________
+>Zebralist mailing list
+>Zebralist@indexdata.dk
+>http://www.indexdata.dk/mailman/listinfo/zebralist
+>
+
+
+
+_______________________________________________
+Zebralist mailing list
+Zebralist@indexdata.dk
+http://www.indexdata.dk/mailman/listinfo/zebralist
+
+From zebralist-admin@indexdata.dk Mon Nov 25 21:39:59 2002
+MIME-Version: 1.0
+Envelope-to: zebra@miketaylor.org.uk
+Content-Type: text/plain;
+ charset="iso-8859-1"
+From: Kang-Jin Lee
+To: Pete
+Subject: Re: [Zebralist] Some progress on Harvest's move to Zebra
+User-Agent: KMail/1.4.3
+In-Reply-To:
+Cc: zebralist@indexdata.dk
+X-Spam-Level:
+Sender: zebralist-admin@indexdata.dk
+X-BeenThere: zebralist@indexdata.dk
+X-Mailman-Version: 2.0.11
+Precedence: bulk
+List-Help:
+List-Post:
+List-Subscribe: ,
+
+List-Id: Zebra Information Server
+List-Unsubscribe: ,
+
+List-Archive:
+Date: Mon, 25 Nov 2002 20:39:47 +0100
+X-Spam-Status: No, hits=-3.2 required=5.0 tests=IN_REP_TO,AWL version=2.20
+X-Spam-Level:
+X-MIME-Autoconverted: from quoted-printable to 8bit by localhost.localdomain id gAPLdwK18535
+
+Hi,
+
+On Monday 25 November 2002 11:19, Pete wrote:
+
+> On Sun, 24 Nov 2002, Kang-Jin Lee wrote:
+
+> >I finished first steps to use Zebra as fulltext engine for Harvest
+> >(http://harvest.sourceforge.net/). The performance boost after
+> >some testing are quite impressive.
+>
+> Hi ... I'd almost forgotten that the Harvest project is still active.
+
+It seems that everybody has forgotten Harvest. :-)
+
+> We had a heap of challenges with our Harvest setup and with the
+> time taken to index and search ... we switched to using
+> Harvest-NG as the "reaper/gatherer" and modified Zebra to
+> work with SOIF and our own ranking algorithm - it's been in
+> service for over 6 months now.
+
+I am very interested in your setup. Would it be possible to send
+your configuration files and modifications to me?
+I made some small modifications to soif.flt and am still wondering
+which query I should use. It would be very nice if I don't have to
+reinvent the wheel.
+
+> We had challenges with both speed of gathering and with
+> speed of indexing and searching but most seem to be
+> "managable" now.
+
+How big is your gatherer?
+
+> We offered our modifications to Zebra to Indexdata who
+> offered to look at them since the latest release of Zebra
+> is sufficiently different at the code level to make it
+> non-trivial for us to apply our code modifications to
+> it.
+
+I would like to take a look at the modifications, too.
+
+Thanks.
+
+kj
+
+
+_______________________________________________
+Zebralist mailing list
+Zebralist@indexdata.dk
+http://www.indexdata.dk/mailman/listinfo/zebralist
+