# ============================================================================
# Zebra Perl API header
# ============================================================================
# ============================================================================
package IDZebra::Filter;
IDZebra::init(); # ??? Do we need that at all (this is just nmem init...)
# -----------------------------------------------------------------------------
# -----------------------------------------------------------------------------
my ($proto,$context) = @_;
my $class = ref($proto) || $proto;
$self->{context} = $context;
bless ($self, $class);
# -----------------------------------------------------------------------------
# -----------------------------------------------------------------------------
# This is ugly... could be passed as parameters... but didn't work.
my $dh = IDZebra::grs_perl_get_dh($self->{context});
my $mem = IDZebra::grs_perl_get_mem($self->{context});
my $d1 = IDZebra::Data1->get($dh,$mem);
my $rootnode = $self->process($d1);
IDZebra::grs_perl_set_res($self->{context},$rootnode);
my ($self, $buff) = @_;
$self->{_buff} = $buff;
# -----------------------------------------------------------------------------
# API Template - These methods should be overridden by the implementing class.
# -----------------------------------------------------------------------------
# This one is called once, when the module is loaded. It runs in
# class context, not in object context yet!
# Just going to return a root node.
return ($d1->mk_root('empty'));
# -----------------------------------------------------------------------------
# -----------------------------------------------------------------------------
my ($proto, $file, %args) = @_;
# print "Proto:$proto\n";
my $class = ref($proto) || $proto;
bless ($self, $class);
# Open the test input read-only and report the OS error on failure.
open ($th, '<', $file) || croak ("Cannot open $file: $!");
my $m = IDZebra::nmem_create();
my $d1=IDZebra::Data1->new($m,$IDZebra::DATA1_FLAG_XML);
if ($args{tabPath}) { $d1->tabpath($args{tabPath}); }
if ($args{tabRoot}) { $d1->tabroot($args{tabRoot}); }
my $rootnode = $self->process($d1);
$d1->pr_tree($rootnode);
$d1->free_tree($rootnode);
$self->{testh} = undef;
# -----------------------------------------------------------------------------
# -----------------------------------------------------------------------------
my ($self, $buff, $len, $offset) = @_;
if ($self->{testh}) {
return (read($self->{testh},$_[1],$len,$offset));
my $r = IDZebra::grs_perl_readf($self->{context},$len);
$buff = $self->{_buff};
$self->{_buff} = undef;
my ($self, $buffsize) = @_;
if ($self->{testh}) {
$r = read($self->{testh}, $self->{_buff}, $buffsize);
$r = IDZebra::grs_perl_readf($self->{context},$buffsize);
$res .= $self->{_buff};
$self->{_buff} = undef;
my ($self, $offset) = @_;
if ($self->{testh}) {
# Whence 0 (SEEK_SET): $offset is taken as an absolute position here.
return (seek ($self->{testh}, $offset, 0));
return (IDZebra::grs_perl_seekf($self->{context},$offset)) ;
if ($self->{testh}) {
return (IDZebra::grs_perl_seekf($self->{context}));
my ($self, $offset) = @_;
if ($self->{testh}) {
IDZebra::grs_perl_endf($self->{context},$offset);
IDZebra::Filter - A superclass of Perl filters for Zebra

  our @ISA=qw(IDZebra::Filter);
  my $rootnode=$d1->mk_root('meta');
When Zebra indexes or presents a record, it needs to extract information from its source. For some types of input, "hardcoded" procedures are defined, but you also have the opportunity to write your own filter code in Tcl or in Perl.

The Perl implementation is nothing but a package, deployed in a location available to Zebra (in I<profilePath>, or in PERL_INCLUDE (@INC)). This package is interpreted once it is needed, and a filter object is created, equipped with enough knowledge to read data from the source and to generate data1 structures as the result. The process method is called for each individual source "file".

All Perl filter classes are supposed to inherit from this class, as it provides the means of communication between the filter code and the indexing/retrieval process.
=head1 IMPLEMENTING FILTERS IN PERL

All you have to do is create a Perl package that uses and inherits from this one (IDZebra::Filter), and implement the "process" method. The parameter of this call is an IDZebra::Data1 object, representing a data1 handle. You can use it to reach the configuration information and to build your data structure. What you have to return is a data1 root node. To create it:

  my $rootnode=$d1->mk_root('meta');

where 'meta' is the abstract syntax identifier (in this case Zebra will try to locate meta.abs and apply it). Then just continue to build the structure. See I<IDZebra::Data1> for details.
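
A minimal sketch of such a package follows. The class name MyFilter and the 1024-byte read size are hypothetical; only mk_root and readall are taken from this documentation, and the tree-building step is left as a comment. In a deployed filter you would also load the superclass with "use IDZebra::Filter;" before setting @ISA.

```perl
package MyFilter;    # hypothetical filter class name
use strict;
use warnings;

# Inherit the virtual file operators (readf, readall, ...) from the superclass.
# A deployed filter would "use IDZebra::Filter;" first (omitted in this sketch).
our @ISA = qw(IDZebra::Filter);

sub process {
    my ($self, $d1) = @_;

    # Fetch the whole input stream, reading 1024 bytes at a time (assumed size).
    my $text = $self->readall(1024);

    # Create the data1 root node; 'meta' selects the meta.abs abstract syntax.
    my $rootnode = $d1->mk_root('meta');

    # ... build the rest of the tree from $text with IDZebra::Data1 methods ...

    return $rootnode;
}

1;
```

The method returns the root node, which is what Zebra expects back from process.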
In order to get the input stream, you can use "virtual" file operators (as the source is not necessarily a file):

=item readf($buf,$len,$offset)

Reads $len bytes of data into $buf, starting at offset $offset

=item readall($bufflen)

Reads the entire stream, $bufflen bytes at a time

Tells the current offset (?)
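
The chunked reading that readall performs can be illustrated outside Zebra with a plain filehandle. This is only a sketch of the idea: read_all_chunks is a hypothetical helper, not part of IDZebra::Filter, and it uses Perl's built-in read on an in-memory filehandle instead of the Zebra stream.

```perl
use strict;
use warnings;

# Hypothetical helper: collect a whole stream by reading fixed-size chunks,
# in the same spirit as readall($bufflen).
sub read_all_chunks {
    my ($fh, $buffsize) = @_;
    my $res = '';
    while (1) {
        my $r = read($fh, my $chunk, $buffsize);
        die "read failed: $!" unless defined $r;
        last if $r == 0;    # end of stream
        $res .= $chunk;
    }
    return $res;
}

# Use an in-memory filehandle as a stand-in for the record source.
my $data = "abcdefghij";
open(my $fh, '<', \$data) or die "Cannot open in-memory handle: $!";
my $all = read_all_chunks($fh, 4);
close($fh);
```

A smaller chunk size means more read calls, but the accumulated result is the same.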
Optionally, you can implement an init method for your class. It is called once, when the module is loaded, and it runs in class context rather than on an object. Stupid, eh?
=head1 TEST YOUR PERL FILTER

You can check the functionality of your filter code by writing a small test program like

  $res =pod->test($ARGV[0],
                  (tabPath=>'.:../../tab:../../../yaz/tab'));

This will try to apply the filter to the file provided as argument, and display the generated data1 structure. However, there are some differences when running a filter in test mode:

 - The include path is not applied from tabPath
 - The tellf and endf functions are not implemented (just ignored)
=head1 CONFIGURE ZEBRA TO USE A PERL FILTER

This is quite simple. Read the Zebra manual, and follow the instructions to create your zebra.cfg. For your I<recordType> choose 'grs.perl.<YourFilterClass>'.

Copy your filter module (YourFilterClass.pm) to a directory listed in I<profilePath>. I<profilePath> is added to @INC when your package is interpreted, so if you need to load modules from locations other than the default Perl include path, just add those directories to it.
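
As a sketch, the relevant zebra.cfg lines might look like the following. The filter class MyFilter and both directories are hypothetical, and the directive syntax should be checked against the Zebra manual for your version.

```
profilePath: /usr/local/zebra/filters:/usr/local/zebra/tab
recordType: grs.perl.MyFilter
```

With this setup, MyFilter.pm would be placed in /usr/local/zebra/filters.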
Peter Popovics, pop@technomat.hu

IDZebra, IDZebra::Data1, Zebra documentation