zookeeper/solr/collection1/conf/lang/userdict_ja.txt

#
# This is a sample user dictionary for Kuromoji (JapaneseTokenizer)
#
# Add entries to this file in order to override the statistical model in terms
# of segmentation, readings and part-of-speech tags.  Notice that entries do
# not have weights since they are always used when found.  This is by design
# in order to maximize ease of use.
#
# Entries are defined using the following CSV format:
#  <text>,<token 1> ... <token n>,<reading 1> ... <reading n>,<part-of-speech tag>
#
# Notice that a single half-width space separates tokens and readings, and
# that the number of tokens and the number of readings must match exactly.
#
# Also notice that the behavior of multiple entries with the same <text> is
# undefined.
#
# Whitespace-only lines are ignored.  Comments are not allowed on entry lines.
#
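# Illustration of the format described above.  The lines below are commented
# out, so they are not loaded as entries; they are hypothetical examples and
# not part of the original sample data.
#
# Well-formed (two tokens, two readings):
#  東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞
#
# Malformed (two tokens but only one reading, so the counts do not match):
#  東京スカイツリー,東京 スカイツリー,トウキョウスカイツリー,カスタム名詞
#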

# Custom segmentation for kanji compounds
日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞
関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,カスタム名詞

# Custom segmentation for compound katakana
トートバッグ,トート バッグ,トート バッグ,かずカナ名詞
ショルダーバッグ,ショルダー バッグ,ショルダー バッグ,かずカナ名詞

# Custom reading for former sumo wrestler
朝青龍,朝青龍,アサショウリュウ,カスタム人名