X-Git-Url: http://git.indexdata.com/?a=blobdiff_plain;f=doc%2Frecordmodel-domxml.xml;h=bf31b74c720b20a8224c259279db677b5cc8d430;hb=43e4297c07b9c8b29bfc1ea647fc27456198f6ce;hp=bb5b300578cbfe00549ae91bfa5757ba2419d80b;hpb=c99c50f588fb803362a47a933c988360ab1cd98c;p=idzebra-moved-to-github.git diff --git a/doc/recordmodel-domxml.xml b/doc/recordmodel-domxml.xml index bb5b300..bf31b74 100644 --- a/doc/recordmodel-domxml.xml +++ b/doc/recordmodel-domxml.xml @@ -1,7 +1,7 @@ - + &dom; &xml; Record Model and Filter Module - + The record model described in this chapter applies to the fundamental, structured &xml; @@ -365,13 +365,66 @@ - The unique record instruction - may have additional attributes id and - rank, where the value of the opaque ID - may be any string not containing the whitespace character - ' ', and the rank value must be a + + The unique record instruction + may have additional attributes id, + rank and type. + Attribute id is the value of the opaque ID + and may be any string not containing the whitespace character + ' '. + The rank attribute value must be a non-negative integer. See - + . + The type attribute specifies how the record + is to be treated. The following values may be given for + type: + + + insert + + + The record is inserted. If the record already exists, it is + skipped (i.e. not replaced). + + + + + replace + + + The record is replaced. If the record does not already exist, + it is skipped (i.e. not inserted). + + + + + delete + + + The record is deleted. If the record does not already exist, + it is skipped (i.e. nothing is deleted). + + + + + update + + + The record is inserted or replaced depending on whether the + record exists or not. This is the default behavior, but it may + effectively be changed from "outside" the scope of the DOM + filter by zebraidx commands or extended services updates. 
+ + + + + Note that the value of type is used to + determine the action only if the Zebra indexer is running + in "update" mode (i.e. zebraidx update) or if the specialUpdate + action of the + Extended + Service Update is used. + For this reason a specialUpdate may end up deleting records! @@ -400,9 +453,21 @@ for details. + + + &dom; input documents which do not result in both one + unique valid + record instruction and one or more valid + index instructions cannot be searched and + found. Therefore, + processing of invalid documents is aborted, and any content of + the <extract> and + <store> pipelines is discarded. + A warning is issued in the logs. + + - The examples work as follows: From the original &xml; file @@ -569,7 +634,7 @@ - + @@ -589,9 +654,109 @@ ]]> + + + +
+ &dom; Indexing &marcxml; + + The &dom; filter allows indexing of both binary &marc; records + and &marcxml; records, depending on its configuration. + A typical &marcxml; record might look like this: + + + 42 + 00366nam 22001698a 4500 + 11224466 + DLC + 00000000000000.0 + 910710c19910701nju 00010 eng + + 11224466 + + + DLC + DLC + + + 123-xyz + + + Jack Collins + + + How to program a computer + + + Penguin + + + 8710 + + + p. cm. + + + ]]> + + + - Notice also, - that the names and types of the indexes can be defined in the + String manipulation is easy to do in the &dom; + filter. For example, if you want to drop some leading articles + in the indexing of sort fields, you might want to pick out the + &marcxml; indicator attributes to chop off leading substrings. If + the above &xml; example had an indicator + ind2="8" in the title field + 245, i.e. + + + How to program a computer + + ]]> + + one could write a template that takes this information into account + to chop the first 8 characters from the + sorting index title:s like this: + + + + + 0 + + + + + + + + + + + + + + ]]> + + The output of the above &marcxml; and &xslt; excerpt would then be: + + How to program a computer + program a computer + ]]> + + and the record would be sorted in the title index under 'P', not 'H'. + +
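The chopping step described above can be sketched outside Zebra in Python. This is only an illustration under stated assumptions, not the stylesheet from this chapter: it assumes the template passes the ind2 value as the start position to the XPath substring() function, which counts characters from 1, so a value of 8 keeps the text from the eighth character onwards. The record fragment, the ind1 value and the helper names xpath_substring and title_sort_key are hypothetical; only the tag 245, ind2="8", the title text and the standard MARC21 slim namespace are taken as given.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace.
MARC_NS = {"m": "http://www.loc.gov/MARC21/slim"}

# Hypothetical reconstruction of the title field discussed above.
RECORD = """\
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="8">
    <subfield code="a">How to program a computer</subfield>
  </datafield>
</record>
"""


def xpath_substring(text, start):
    """Mimic XPath 1.0 substring(text, start) for an integer start.

    XPath positions are 1-based, so start=8 keeps the 8th character onwards;
    a start of 0 or 1 returns the whole string.
    """
    return text[max(start, 1) - 1:]


def title_sort_key(record_xml):
    """Return the value that would go into the sort index for field 245."""
    root = ET.fromstring(record_xml)
    field = root.find("m:datafield[@tag='245']", MARC_NS)
    title = field.findtext("m:subfield[@code='a']", default="",
                           namespaces=MARC_NS)
    ind2 = field.get("ind2", "")
    # A blank or non-numeric indicator means "chop nothing".
    chop = int(ind2) if ind2.isdigit() else 0
    return xpath_substring(title, chop)


print(title_sort_key(RECORD))  # prints "program a computer"
```

Under this assumption the indicator acts as a start position rather than a count of removed characters: with ind2="8" the seven characters of "How to " are dropped, which matches the title:s output shown above.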
+ + +
+ &dom; Indexing Wizardry + + The names and types of the indexes can be defined in the indexing &xslt; stylesheet dynamically according to content in the original &xml; records, which has opportunities for great power and wizardry as well as grande @@ -651,6 +816,25 @@
+
+ Debugging &dom; Filter Configurations + + It can be very hard to debug a &dom; filter setup due to the many + successive &marc; syntax translations, &xml; stream splitting and + &xslt; transformations involved. As an aid, you always have the + -s command line switch of the + zebraidx indexing command at hand: + + zebraidx -s -c zebra.cfg update some_record_stream.xml + + This command line simulates indexing and dumps a lot of debug + information in the logs, telling exactly which transformations + have been applied, what the documents look like after each + transformation, and which record ids and terms are sent to the indexer. + +
+ + + @@ -683,7 +867,7 @@ - + @@ -699,6 +883,7 @@ + -->