Analyzer configuration
Analyzers control how the content of a document is broken into 'terms'
(words) during indexing. For example, an analyzer can remove common words,
normalize plurals to their singular forms, and perform other
language-specific operations to improve search quality.
An analyzer is configured with the following node structure:
<analyzer>
    <class>...</class>
    <stemmer>...</stemmer>
    <locale>...</locale>
</analyzer>
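To make the kind of processing described in the introduction more concrete, the following is a toy sketch in plain Java. It is not how the Lucene analyzers are implemented; the class name ToyAnalyzer, the tiny stop word list, and the naive "strip a trailing s" rule are all assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy illustration only; this is not how Lucene's analyzers work internally. */
final class ToyAnalyzer {

    // Hypothetical, deliberately tiny stop word list.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "in");

    /** Splits text into lower-case terms, drops stop words, strips a trailing "s". */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Split on every run of characters that is neither a letter nor a digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            // Naive plural handling: "engines" -> "engine". Real stemmers are far more careful.
            if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
                token = token.substring(0, token.length() - 1);
            }
            terms.add(token);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Prints [engine, search, index]
        System.out.println(analyze("The engines of the search index"));
    }
}
```

Because the same processing is applied to indexed content and to search queries, a query for "engine" can match a document that only contains "engines". A real analyzer replaces the crude suffix rule with a proper language-specific stemmer.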
Configuration nodes
The following nodes are used to specify an analyzer:
- The <locale> node specifies the locale, such as "de" or "en". An index
configuration node refers to this locale to select the appropriate analyzer
for the contents of that index.
- The <class> node specifies the fully qualified (package and class)
name of the analyzer class.
- The <stemmer> node specifies the stemmer algorithm used by the
analyzer.
Available analyzers
Currently, these analyzers are part of the OpenCms search package:
- org.apache.lucene.analysis.de.GermanAnalyzer
Analyzer for German language content.
- org.apache.lucene.analysis.ru.RussianAnalyzer
Analyzer for Russian language content.
- org.apache.lucene.analysis.standard.StandardAnalyzer
Analyzer for English and other language content.
- org.apache.lucene.analysis.snowball.SnowballAnalyzer
Analyzer for various languages, see the Snowball homepage. For this analyzer,
the language is specified using the additional <stemmer> node, which takes one
of these values: Danish, Dutch, English, Finnish, French, German, Italian,
Lovins, Norwegian, Porter, Portuguese, Russian, Spanish, Swedish.
Example
This example shows how to configure an analyzer for French-language
content:
<analyzer>
    <class>org.apache.lucene.analysis.snowball.SnowballAnalyzer</class>
    <stemmer>French</stemmer>
    <locale>fr</locale>
</analyzer>
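As a rough illustration of how the three values of such a node can be read, the sketch below parses the French example with the standard Java DOM API. It is not the OpenCms configuration loader; the class AnalyzerNodeReader and its methods are hypothetical names introduced only for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper for illustration; not part of OpenCms or Lucene. */
final class AnalyzerNodeReader {

    /** Returns the trimmed text of the first descendant element with the given name, or null if absent. */
    static String childText(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagName(name);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** Parses one <analyzer> node and returns { class, stemmer, locale }. */
    static String[] read(String xml) {
        try {
            Element analyzer = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return new String[] {
                childText(analyzer, "class"),
                childText(analyzer, "stemmer"),
                childText(analyzer, "locale")
            };
        } catch (Exception e) {
            // Wrap parser errors so callers do not have to handle checked exceptions.
            throw new IllegalArgumentException("Not a well-formed <analyzer> node", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<analyzer>"
                + "<class>org.apache.lucene.analysis.snowball.SnowballAnalyzer</class>"
                + "<stemmer>French</stemmer>"
                + "<locale>fr</locale>"
                + "</analyzer>";
        String[] cfg = read(xml);
        System.out.println(cfg[2] + " -> " + cfg[0] + " (stemmer: " + cfg[1] + ")");
    }
}
```

In this sketch the locale acts as the lookup key: content with the locale "fr" would be handled by the analyzer class named in the same node, using the listed stemmer.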