--- /dev/null
+zmbot: a simple Web-harvesting robot for Z'mbol.
+
+Introduction
+
+ zmbot is a simple web harvester written in Tcl. The following
+ summarizes its features:
+
+ o Simple administration. One script does the job and no external
+ database is required to operate.
+
+ o Interruptible. Harvesting may safely be stopped/interrupted at any
+ point.
+
+ o Gentle harvesting. By default a site is visited at most once per
+ minute, and robots.txt is honored.
+
+ o Concurrent harvesting (jobs) in one process and one thread.
+
+ o Inspects the Content-Type header to determine the structure of each page.
+
+ o Written in Tcl and quite portable. (Some may not consider this a
+ feature; a Perl version is welcome!)
+
+ o Creates simple XML output. One file per URL.
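The robots.txt handling mentioned above can be illustrated in Python with the standard urllib.robotparser module; this is only a conceptual sketch of the check a polite harvester performs, not zmbot's own Tcl code, and the rules and URLs below are made up for the example.

```python
from urllib import robotparser

# Hypothetical robots.txt content, parsed directly from a list of
# lines so the sketch needs no network access.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A harvester consults the parsed rules before fetching each URL;
# a disallowed URL would be recorded rather than retrieved.
print(rp.can_fetch("zmbot", "http://www.example.com/private/x.html"))  # False
print(rp.can_fetch("zmbot", "http://www.example.com/public/x.html"))   # True
```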
+
+ The robot is started from the command line and takes one or more URLs
+ as parameters. Options, prefixed with a minus sign, alter the behaviour
+ of the harvesting. The following options are supported:
+
+ -j jobs The maximum number of concurrent HTTP sessions; default 5 jobs.
+
+ -i idle Idle time in milliseconds between visits to the same site;
+ default 60000 = 60 seconds.
+
+ -c count Maximum distance from original URL as given from the command
+ line; default 50.
+
+ -d domain Only sites matching domain are visited. The domain given is
+ a Tcl glob expression (e.g. *.somewhere.com). Remember to
+ quote the domain on the command line so that your shell
+ doesn't expand it. This option may be repeated, allowing you
+ to specify several "allowed" domains.
+
+ Example 1: Harvest up to three links away from www.somewhere.com, world-wide:
+ ./robot.tcl -c 3 http://www.somewhere.com/
+
+ Example 2: Harvest the site www.somwhere.com only:
+ ./robot.tcl -d www.somewhere.com http://www.somewhere.com/
+
+ Example 3: Harvest up to two clicks from www.a.dk and www.b.dk within the dk domain:
+ ./robot.tcl -d '*.dk' -c 2 http://www.a.dk/ http://www.b.dk/
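Tcl's glob matching (as used by -d) behaves much like Python's fnmatch for simple patterns. The sketch below only illustrates which hostnames a pattern such as '*.dk' admits; the domain_ok helper is hypothetical and not part of zmbot.

```python
from fnmatch import fnmatch

# Hypothetical allow-list, as if given by repeated -d options.
allowed = ["*.dk"]

def domain_ok(host):
    # A host is visited only if it matches at least one allowed pattern.
    return any(fnmatch(host, pat) for pat in allowed)

print(domain_ok("www.a.dk"))   # True
print(domain_ok("www.b.dk"))   # True
print(domain_ok("www.a.com"))  # False
```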
+
+ The zmbot robot creates three directories: visited, unvisited, and bad,
+ for visited pages, unvisited pages, and bad pages respectively. The
+ visited area holds keywords and metadata for all successfully retrieved
+ pages. The unvisited area serves as a "todo" list of pages to be visited
+ in the future. The bad area holds pages that for some reason cannot be
+ retrieved: non-existent, permission denied, disallowed by robots.txt, etc.
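The visited/unvisited/bad bookkeeping together with the -c distance limit can be sketched as a bounded breadth-first crawl. This is a hypothetical in-memory illustration; zmbot itself keeps these sets as directories of files, and the fetch callback and toy link graph below are assumptions for the example.

```python
from collections import deque

def harvest(start_urls, fetch, max_distance=50):
    # visited: successfully retrieved; bad: could not be retrieved.
    visited, bad = set(), set()
    # unvisited is the "todo" list: (url, distance from a start URL).
    unvisited = deque((url, 0) for url in start_urls)
    while unvisited:
        url, dist = unvisited.popleft()
        if url in visited or url in bad:
            continue
        links = fetch(url)   # outgoing links, or None on failure
        if links is None:
            bad.add(url)     # non-existent, denied, disallowed, ...
            continue
        visited.add(url)
        if dist < max_distance:  # honor the -c count limit
            unvisited.extend((link, dist + 1) for link in links)
    return visited, bad

# Toy link graph standing in for real HTTP fetches; "x" fails.
graph = {"a": ["b", "x"], "b": ["c"], "c": []}
v, b = harvest(["a"], lambda u: graph.get(u), max_distance=2)
print(sorted(v))  # ['a', 'b', 'c']
print(sorted(b))  # ['x']
```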
+
+Installation
+
+ $ ./configure
+ $ make
+
+ The configure script looks for the Tcl shell, tclsh, to determine the
+ location of Tcl and of its configuration file tclConfig.sh. To specify
+ Tcl's location manually, add --with-tclconfig and give the directory
+ where tclConfig.sh is installed. For example:
+ ./configure --with-tclconfig=/usr/local/lib
+