OpenCms 6.0 documentation :: Search configuration: Analyzers

Analyzer configuration

Analyzers control how the content of a document is broken into 'terms' (words) during indexing. For example, an analyzer can remove common words, reduce plural forms to singular, and perform other language-specific operations to improve search quality.

<analyzer>
	<class>...</class>
	<stemmer>...</stemmer>
	<locale>...</locale>
</analyzer>

Configuration nodes

The following nodes are used to specify an analyzer:

The <locale> node specifies the locale, such as "de" or "en". An index configuration node refers to this locale to select the appropriate analyzer for the contents of that index.
The <class> node specifies the fully qualified package/class name of the analyzer class.
The <stemmer> node specifies the stemmer algorithm used by the analyzer.
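
As a rough illustration of how the <locale> value ties an analyzer to an index, the fragment below pairs an analyzer entry with an index entry that uses the same locale. The <index> element and its <name> child are assumptions based on a typical OpenCms search configuration file; they are not described in this section, and the index name is hypothetical.

```xml
<!-- Analyzer registered for the locale "en" -->
<analyzer>
	<class>org.apache.lucene.analysis.snowball.SnowballAnalyzer</class>
	<stemmer>English</stemmer>
	<locale>en</locale>
</analyzer>

<!-- Assumed index entry (other index settings omitted):
     its <locale> value selects the analyzer configured above -->
<index>
	<name>Example index</name>
	<locale>en</locale>
</index>
```

An index whose locale has no matching analyzer entry would need its own <analyzer> node with that locale.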

Available analyzers

Currently, these analyzers are part of the OpenCms search package:

org.apache.lucene.analysis.de.GermanAnalyzer
Analyzer for German-language content.
org.apache.lucene.analysis.ru.RussianAnalyzer
Analyzer for Russian-language content.
org.apache.lucene.analysis.standard.StandardAnalyzer
Analyzer for English and other language content.
org.apache.lucene.analysis.snowball.SnowballAnalyzer
Analyzer for various languages; see the Snowball homepage.
For this analyzer, the language is specified using the additional <stemmer> node, with one of the following values: Danish, Dutch, English, Finnish, French, German, Italian, Lovins, Norwegian, Porter, Portuguese, Russian, Spanish, Swedish. Lovins and Porter name English stemming algorithms rather than languages.
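
For comparison, here is a sketch of two entries built from the classes listed above: a language-specific analyzer for German and the Snowball analyzer with the Russian stemmer. It assumes that the <stemmer> node can be left out for analyzers that do not take a stemmer parameter, which this section does not state explicitly.

```xml
<!-- Language-specific analyzer; <stemmer> omitted (assumed to be optional) -->
<analyzer>
	<class>org.apache.lucene.analysis.de.GermanAnalyzer</class>
	<locale>de</locale>
</analyzer>

<!-- Snowball analyzer; the stemmer value is taken from the list above -->
<analyzer>
	<class>org.apache.lucene.analysis.snowball.SnowballAnalyzer</class>
	<stemmer>Russian</stemmer>
	<locale>ru</locale>
</analyzer>
```

Each entry uses a different locale, so an index configured for "de" or "ru" would pick up a different analyzer.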

Example

This example shows how to configure an analyzer for French-language content:

<analyzer>
	<class>org.apache.lucene.analysis.snowball.SnowballAnalyzer</class>
	<stemmer>French</stemmer>
	<locale>fr</locale>
</analyzer>