From zebralist-admin@indexdata.dk Sun Nov 24 23:16:24 2002
Envelope-to: zebra@miketaylor.org.uk
Content-Type: text/plain;
From: Kang-Jin Lee <lee@arco.de>
To: zebralist@indexdata.dk
User-Agent: KMail/1.4.3
Subject: [Zebralist] Some progress on Harvest's move to Zebra
Sender: zebralist-admin@indexdata.dk
X-BeenThere: zebralist@indexdata.dk
X-Mailman-Version: 2.0.11
List-Help: <mailto:zebralist-request@indexdata.dk?subject=help>
List-Post: <mailto:zebralist@indexdata.dk>
List-Subscribe: <http://www.indexdata.dk/mailman/listinfo/zebralist>,
    <mailto:zebralist-request@indexdata.dk?subject=subscribe>
List-Id: Zebra Information Server <zebralist.indexdata.dk>
List-Unsubscribe: <http://www.indexdata.dk/mailman/listinfo/zebralist>,
    <mailto:zebralist-request@indexdata.dk?subject=unsubscribe>
List-Archive: <http://www.indexdata.dk/pipermail/zebralist/>
Date: Sun, 24 Nov 2002 20:45:19 +0100
X-Spam-Status: No, hits=-1.0 required=5.0 tests=AWL version=2.20
X-MIME-Autoconverted: from quoted-printable to 8bit by localhost.localdomain id gAONGNK15639

I have finished the first steps towards using Zebra as the fulltext
engine for Harvest (http://harvest.sourceforge.net/). The performance
gains seen in testing are quite impressive.

Here is the article I wrote for the Harvest mailing list.

Many thanks for Zebra.

------------------------------------------------------

The first results after some testing with Zebra are very promising.

The tests were done with around 220 000 SOIF files, which occupy
1.6GB of disk space.

Building the index from scratch takes around one hour with Zebra,
whereas Glimpse needs around five hours.

While Glimpse blocks search requests when updating its index, Zebra
can still answer them.

While the search time of Glimpse varies from a few seconds to a few
minutes depending on how expensive the query is, Zebra usually takes
around one to three seconds, even for expensive queries.

Glimpse's index occupies around 250MB of disk space, Zebra's index

Zebra supports incremental indexing which will speed up indexing even

There is still potential for faster searches when necessary, using

On the other hand, the data modeling is not complete yet.

- Zebra indexes data five times faster than Glimpse
- Zebra doesn't cause downtime during index updates
- Zebra's search time doesn't jump from seconds to minutes for no
  obvious reason, but stays constant within a range of one to three
  seconds
- Zebra can search more than 100 times faster than Glimpse
- Zebra can process multiple search requests simultaneously
- Zebra can speed up indexing by using incremental indexing
- Glimpse's index is only around half the size of Zebra's index

------------------------------------------------------

_______________________________________________
Zebralist mailing list
Zebralist@indexdata.dk
http://www.indexdata.dk/mailman/listinfo/zebralist

From mike@miketaylor.org.uk Sun Nov 24 23:41:14 2002
Date: Sun, 24 Nov 2002 23:41:13 GMT
From: Mike Taylor <mike@miketaylor.org.uk>
X-Was-CC: zebralist@indexdata.dk
Cc: mike@localhost.localdomain
In-reply-to: <200211242045.19196.lee@arco.de> (message from Kang-Jin Lee on
    Sun, 24 Nov 2002 20:45:19 +0100)
Subject: Re: [Zebralist] Some progress on Harvest's move to Zebra

> Date: Sun, 24 Nov 2002 20:45:19 +0100
> From: Kang-Jin Lee <lee@arco.de>
>
> Here is the article I wrote for the Harvest mailing list.

It's nice to read all this good stuff about Zebra! I'm currently
working on changes to the documentation for the next Zebra release,
and I'd love to include a lightly-edited version of your message in
the new document. (Basically, I'd obscure the name of your old
engine, so it's clear that we're trying to say good things about Zebra
rather than score points off a competitor.) Would it be OK for me to
quote you? If yes in principle, then I'll run the actual wording past
you before submitting it.

 _/|_    _______________________________________________________________
/o ) \/  Mike Taylor <mike@miketaylor.org.uk> www.miketaylor.org.uk
)_v__/\  "You question the worthiness of my code? I should kill you
         where you stand!" -- Klingon Programming Mantra

From lee@arco.de Mon Nov 25 10:02:13 2002
Envelope-to: mike@miketaylor.org.uk
Content-Type: text/plain;
    charset="iso-8859-15"
From: Kang-Jin Lee <lee@arco.de>
To: Mike Taylor <mike@miketaylor.org.uk>
Subject: Re: [Zebralist] Some progress on Harvest's move to Zebra
Date: Mon, 25 Nov 2002 08:27:42 +0100
User-Agent: KMail/1.4.3
In-Reply-To: <200211242340.gAONefg15769@localhost.localdomain>
X-Spam-Status: No, hits=-4.4 required=5.0 tests=IN_REP_TO version=2.20
X-MIME-Autoconverted: from quoted-printable to 8bit by seatbooker.net id JAA28796

On Monday 25 November 2002 00:40, you wrote:
> > Date: Sun, 24 Nov 2002 20:45:19 +0100
> > From: Kang-Jin Lee <lee@arco.de>
> >
> > Here is the article I wrote for the Harvest mailing list.
>
> It's nice to read all this good stuff about Zebra! I'm currently
> working on changes to the documentation for the next Zebra release,
> and I'd love to include a lightly-edited version of your message in
> the new document. (Basically, I'd obscure the name of your old
> engine, so it's clear that we're trying to say good things about Zebra
> rather than score points off a competitor.) Would it be OK for me to
> quote you? If yes in principle, then I'll run the actual wording past
> you before submitting it.

You are welcome to do this.

I am very happy to see such nice software available under the GPL.

From zebralist-admin@indexdata.dk Mon Nov 25 11:13:10 2002
Envelope-to: zebra@miketaylor.org.uk
From: Pete <P.D.Mallinson@liverpool.ac.uk>
X-X-Sender: qq15@uxa.liv.ac.uk
To: Kang-Jin Lee <lee@arco.de>
cc: zebralist@indexdata.dk
Subject: Re: [Zebralist] Some progress on Harvest's move to Zebra
In-Reply-To: <200211242045.19196.lee@arco.de>
Content-Type: TEXT/PLAIN; charset=US-ASCII
Date: Mon, 25 Nov 2002 10:19:37 +0000 (GMT)
X-Spam-Status: No, hits=-4.4 required=5.0 tests=IN_REP_TO version=2.20

On Sun, 24 Nov 2002, Kang-Jin Lee wrote:

>I have finished the first steps towards using Zebra as the fulltext
>engine for Harvest (http://harvest.sourceforge.net/). The performance
>gains seen in testing are quite impressive.

Hi ... I'd almost forgotten that the Harvest project is still active.

We had a heap of challenges with our Harvest setup and with the
time taken to index and search ... we switched to using
Harvest-NG as the "reaper/gatherer" and modified Zebra to
work with SOIF and our own ranking algorithm - it's been in
service for over 6 months now.

We had challenges with both speed of gathering and with
speed of indexing and searching but most seem to be

We offered our modifications to Zebra to Indexdata who
offered to look at them since the latest release of Zebra
is sufficiently different at the code level to make it
non-trivial for us to apply our code modifications to

>Here is the article I wrote for the Harvest mailing list.
>
>Many thanks for Zebra.
>
>------------------------------------------------------
>
>The first results after some testing with Zebra are very promising.
>
>The tests were done with around 220 000 SOIF files, which occupy
>1.6GB of disk space.
>
>Building the index from scratch takes around one hour with Zebra,
>whereas Glimpse needs around five hours.
>
>While Glimpse blocks search requests when updating its index, Zebra
>can still answer them.
>
>While the search time of Glimpse varies from a few seconds to a few
>minutes depending on how expensive the query is, Zebra usually takes
>around one to three seconds, even for expensive queries.
>
>Glimpse's index occupies around 250MB of disk space, Zebra's index
>
>Zebra supports incremental indexing which will speed up indexing even
>
>There is still potential for faster searches when necessary, using
>
>On the other hand, the data modeling is not complete yet.
>
>- Zebra indexes data five times faster than Glimpse
>- Zebra doesn't cause downtime during index updates
>- Zebra's search time doesn't jump from seconds to minutes for no
>  obvious reason, but stays constant within a range of one to three
>  seconds
>- Zebra can search more than 100 times faster than Glimpse
>- Zebra can process multiple search requests simultaneously
>- Zebra can speed up indexing by using incremental indexing
>- Glimpse's index is only around half the size of Zebra's index
>
>------------------------------------------------------

From zebralist-admin@indexdata.dk Mon Nov 25 21:39:59 2002
Envelope-to: zebra@miketaylor.org.uk
Content-Type: text/plain;
From: Kang-Jin Lee <lee@arco.de>
To: Pete <P.D.Mallinson@liverpool.ac.uk>
Subject: Re: [Zebralist] Some progress on Harvest's move to Zebra
User-Agent: KMail/1.4.3
In-Reply-To: <Pine.GSO.4.44.0211251007060.15395-100000@uxa.liv.ac.uk>
Cc: zebralist@indexdata.dk
Date: Mon, 25 Nov 2002 20:39:47 +0100
X-Spam-Status: No, hits=-3.2 required=5.0 tests=IN_REP_TO,AWL version=2.20
X-MIME-Autoconverted: from quoted-printable to 8bit by localhost.localdomain id gAPLdwK18535

On Monday 25 November 2002 11:19, Pete wrote:

> On Sun, 24 Nov 2002, Kang-Jin Lee wrote:
>
> >I have finished the first steps towards using Zebra as the fulltext
> >engine for Harvest (http://harvest.sourceforge.net/). The performance
> >gains seen in testing are quite impressive.
>
> Hi ... I'd almost forgotten that the Harvest project is still active.

It seems that everybody has forgotten Harvest. :-)

> We had a heap of challenges with our Harvest setup and with the
> time taken to index and search ... we switched to using
> Harvest-NG as the "reaper/gatherer" and modified Zebra to
> work with SOIF and our own ranking algorithm - it's been in
> service for over 6 months now.

I am very interested in your setup. Would it be possible for you to
send me your configuration files and modifications?
I made some small modifications to soif.flt and am still wondering
which query I should use. It would be very nice if I didn't have to

> We had challenges with both speed of gathering and with
> speed of indexing and searching but most seem to be

How big is your gatherer?

> We offered our modifications to Zebra to Indexdata who
> offered to look at them since the latest release of Zebra
> is sufficiently different at the code level to make it
> non-trivial for us to apply our code modifications to

I would like to take a look at the modifications, too.