tclrobot.git
21 years ago Fix check for content-type [ZMBOT.0.1]
Adam Dickmeiss [Mon, 13 Jan 2003 13:59:07 +0000 (13:59 +0000)]
Fix check for content-type

21 years ago Look for Tcl on Debian systems
Adam Dickmeiss [Fri, 20 Sep 2002 09:45:02 +0000 (09:45 +0000)]
Look for Tcl on Debian systems

21 years ago unset meta attributes (so they are reset for next meta)
Adam Dickmeiss [Tue, 18 Jun 2002 19:57:53 +0000 (19:57 +0000)]
unset meta attributes (so they are reset for next meta)

22 years ago Remove code that skips ?'s in URL
Adam Dickmeiss [Mon, 25 Mar 2002 16:13:21 +0000 (16:13 +0000)]
Remove code that skips ?'s in URL

22 years ago *** empty log message ***
Adam Dickmeiss [Mon, 25 Mar 2002 16:11:08 +0000 (16:11 +0000)]
*** empty log message ***

22 years ago Fix unvisited status
Adam Dickmeiss [Thu, 28 Feb 2002 14:04:11 +0000 (14:04 +0000)]
Fix unvisited status

22 years ago Robot honours robots meta tag
Adam Dickmeiss [Sun, 17 Feb 2002 09:29:18 +0000 (09:29 +0000)]
Robot honours robots meta tag

22 years ago File status written with counts of areas: unvisited, bad, visited.
Adam Dickmeiss [Wed, 14 Nov 2001 09:15:23 +0000 (09:15 +0000)]
File status written with counts of areas: unvisited, bad, visited.
Tag area src=.. used for relative links.

22 years ago MIME check when reading HTTP header (not when reading content).
Adam Dickmeiss [Tue, 13 Nov 2001 11:17:26 +0000 (11:17 +0000)]
MIME check when reading HTTP header (not when reading content).
File robots.txt always read - even when text/plain is denied.

22 years ago Robot follows <frame src=...>.
Adam Dickmeiss [Fri, 9 Nov 2001 13:26:50 +0000 (13:26 +0000)]
Robot follows <frame src=...>.

22 years ago Added tests script.
Adam Dickmeiss [Thu, 8 Nov 2001 14:22:21 +0000 (14:22 +0000)]
Added tests script.

22 years ago Fixed bug regarding relative URLs.
Adam Dickmeiss [Thu, 8 Nov 2001 13:49:06 +0000 (13:49 +0000)]
Fixed bug regarding relative URLs.

22 years ago Fixed bug in skipSpace (didn't check for null-byte).
Adam Dickmeiss [Thu, 8 Nov 2001 10:23:02 +0000 (10:23 +0000)]
Fixed bug in skipSpace (didn't check for null-byte).

22 years ago Use simpler regular expression to avoid Tcl regsub error (Tcl8.0.4-5).
Adam Dickmeiss [Wed, 7 Nov 2001 11:50:07 +0000 (11:50 +0000)]
Use simpler regular expression to avoid Tcl regsub error (Tcl8.0.4-5).

22 years ago Glob-expressions may be expressed as a list in rules (multi-OR).
Adam Dickmeiss [Wed, 7 Nov 2001 11:30:52 +0000 (11:30 +0000)]
Glob-expressions may be expressed as a list in rules (multi-OR).

22 years ago Robot saves metadata with unique names in directory "flat" (if it exists).
Adam Dickmeiss [Wed, 31 Oct 2001 08:51:49 +0000 (08:51 +0000)]
Robot saves metadata with unique names in directory "flat" (if it exists).

22 years ago Pattern may be negated in rules (! as first character does that)
Adam Dickmeiss [Tue, 30 Oct 2001 08:29:54 +0000 (08:29 +0000)]
Pattern may be negated in rules (! as first character does that)

22 years ago Implemented Allow/deny rules. Better Tcl autoconfig.
Adam Dickmeiss [Fri, 26 Oct 2001 13:26:11 +0000 (13:26 +0000)]
Implemented Allow/deny rules. Better Tcl autoconfig.

22 years ago Yet another fix regarding relative links.
Adam Dickmeiss [Fri, 29 Jun 2001 22:25:55 +0000 (22:25 +0000)]
Yet another fix regarding relative links.

22 years ago Added option to specify Accept-Language.
Adam Dickmeiss [Fri, 29 Jun 2001 21:47:31 +0000 (21:47 +0000)]
Added option to specify Accept-Language.

22 years ago Fixes for robots.txt handling (bug introduced by previous commit).
Adam Dickmeiss [Thu, 7 Jun 2001 08:17:00 +0000 (08:17 +0000)]
Fixes for robots.txt handling (bug introduced by previous commit).

22 years ago Bug fix for relative links.
Adam Dickmeiss [Thu, 7 Jun 2001 08:10:10 +0000 (08:10 +0000)]
Bug fix for relative links.

22 years ago Added some character entities for mapping.
Adam Dickmeiss [Wed, 6 Jun 2001 09:37:18 +0000 (09:37 +0000)]
Added some character entities for mapping.

22 years ago Added README. Ignore case in keywords in robots.txt.
Adam Dickmeiss [Wed, 6 Jun 2001 07:10:31 +0000 (07:10 +0000)]
Added README. Ignore case in keywords in robots.txt.

22 years ago maxDistance set to 50 by default.
Adam Dickmeiss [Tue, 5 Jun 2001 08:44:50 +0000 (08:44 +0000)]
maxDistance set to 50 by default.

22 years ago Remove characters after semicolon in header contents.
Adam Dickmeiss [Tue, 5 Jun 2001 07:46:00 +0000 (07:46 +0000)]
Remove characters after semicolon in header contents.

23 years ago Minor changes.
Adam Dickmeiss [Tue, 27 Feb 2001 10:45:44 +0000 (10:45 +0000)]
Minor changes.

23 years ago Added config for zebra/zmbol.
Adam Dickmeiss [Mon, 26 Feb 2001 22:51:51 +0000 (22:51 +0000)]
Added config for zebra/zmbol.

23 years ago Minor fix for anchor references.
Adam Dickmeiss [Tue, 23 Jan 2001 14:28:41 +0000 (14:28 +0000)]
Minor fix for anchor references.

23 years ago Removed YAZ dependency.
Adam Dickmeiss [Tue, 23 Jan 2001 12:05:06 +0000 (12:05 +0000)]
Removed YAZ dependency.

23 years ago Added options for the robot.
Adam Dickmeiss [Tue, 23 Jan 2001 11:26:43 +0000 (11:26 +0000)]
Added options for the robot.

23 years ago Multiple http connections. Bug fixes.
Adam Dickmeiss [Tue, 23 Jan 2001 09:20:32 +0000 (09:20 +0000)]
Multiple http connections. Bug fixes.

23 years ago Fixed problem with links having .. for root directory of web server.
Adam Dickmeiss [Mon, 11 Dec 2000 17:11:03 +0000 (17:11 +0000)]
Fixed problem with links having .. for root directory of web server.
Thank you FrontPage.

23 years ago Implemented robots.txt rules.
Adam Dickmeiss [Sun, 10 Dec 2000 22:27:48 +0000 (22:27 +0000)]
Implemented robots.txt rules.

23 years ago File robots.txt now read for each domain.
Adam Dickmeiss [Fri, 8 Dec 2000 22:46:53 +0000 (22:46 +0000)]
File robots.txt now read for each domain.
Pages are now fetched in a round-robin fashion.

23 years ago DCdot no longer relies on htmlSwitch.
Adam Dickmeiss [Fri, 8 Dec 2000 08:55:35 +0000 (08:55 +0000)]
DCdot no longer relies on htmlSwitch.

23 years ago Added -nonest for htmlSwitch statement. Robot puts reference to
Adam Dickmeiss [Thu, 7 Dec 2000 20:16:11 +0000 (20:16 +0000)]
Added -nonest for htmlSwitch statement. Robot puts reference to
bad URLs in bad area.

24 years ago Major speed improvement.
Adam Dickmeiss [Mon, 27 Dec 1999 11:49:30 +0000 (11:49 +0000)]
Major speed improvement.

25 years ago Updated configure script.
Adam Dickmeiss [Thu, 4 Feb 1999 21:32:00 +0000 (21:32 +0000)]
Updated configure script.

25 years ago Changed tags for the output.
Per M. Hansen [Thu, 4 Feb 1999 20:37:25 +0000 (20:37 +0000)]
Changed tags for the output.

25 years ago Minor changes.
Adam Dickmeiss [Thu, 15 Oct 1998 13:27:19 +0000 (13:27 +0000)]
Minor changes.

25 years ago Added configure script.
Adam Dickmeiss [Thu, 15 Oct 1998 12:31:25 +0000 (12:31 +0000)]
Added configure script.

25 years ago Bug fixes. Robot saves body of text without tags and JavaScript sections.
Adam Dickmeiss [Thu, 15 Oct 1998 12:30:59 +0000 (12:30 +0000)]
Bug fixes. Robot saves body of text without tags and JavaScript sections.

27 years ago Initial revision
Adam Dickmeiss [Tue, 6 Aug 1996 14:04:22 +0000 (14:04 +0000)]
Initial revision