
OpenCms 6.0 interactive documentation:

Search configuration: Documenttypes


Documenttype configuration

A documenttype node specifies which document factory is used to extract the content of an OpenCms resource with a particular resource type and/or mimetype into a Lucene index document. The given document factory is used for every combination of one of the specified resource types with one of the specified mimetypes.

<documenttype>
	<name>...</name>
	<class>...</class>
	<mimetypes>
		<mimetype>...</mimetype>
		...
	</mimetypes>
	<resourcetypes>
		<resourcetype>...</resourcetype>
		...
	</resourcetypes>
</documenttype>
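The matching rule described above can be pictured as a lookup table keyed by resource type and mimetype. The following Java sketch is a hypothetical model, not the actual OpenCms implementation: the class DocumentTypeRegistry, the "resourcetype:mimetype" key format, and the fallback to a resource-type-only entry when a documenttype lists no mimetypes are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of how <documenttype> entries could be resolved.
// Class name, key format and fallback rule are illustrative assumptions, not OpenCms API.
final class DocumentTypeRegistry {

    // Maps "resourcetype:mimetype" (or just "resourcetype") to a factory class name.
    private final Map<String, String> factories = new HashMap<>();

    private static String key(String resourceType, String mimeType) {
        return mimeType == null ? resourceType : resourceType + ":" + mimeType;
    }

    // Registers one <documenttype>: every resource type is combined with every mimetype.
    void register(String factoryClass, List<String> mimeTypes, List<String> resourceTypes) {
        for (String resourceType : resourceTypes) {
            if (mimeTypes.isEmpty()) {
                // Assumption: an empty <mimetypes/> list matches on the resource type alone.
                factories.put(key(resourceType, null), factoryClass);
            } else {
                for (String mimeType : mimeTypes) {
                    factories.put(key(resourceType, mimeType), factoryClass);
                }
            }
        }
    }

    // Returns the factory class name for a resource, or null if no documenttype matches.
    String lookup(String resourceType, String mimeType) {
        String factory = factories.get(key(resourceType, mimeType));
        if (factory == null) {
            // Assumption: fall back to an entry registered without mimetypes.
            factory = factories.get(key(resourceType, null));
        }
        return factory;
    }
}
```

Under this model, the PDF configuration in Example 1 registers the two combinations binary:application/pdf and plain:application/pdf, so a PDF file stored as either resource type resolves to CmsDocumentPdf, while a binary resource with any other mimetype matches nothing.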

Configuration nodes

The following nodes are used to specify a documenttype:

  • the <name> node gives the documenttype a unique name
  • the <class> node specifies the fully qualified class name of the document factory
  • zero or more <mimetype> nodes each specify a mimetype of resource contents handled by the given document factory. When a resource is indexed, its mimetype is derived from the extension of the resource name.
  • one or more <resourcetype> nodes each specify an OpenCms resource type whose resources are handled by the given document factory

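Deriving the mimetype from the extension of the resource name, as described in the list above, amounts to a table lookup on the part of the name after the last dot. The Java sketch below is hypothetical: the class MimeTypeFromName and its small extension table are illustrative assumptions, not the mimetype configuration that OpenCms actually uses.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: derive a mimetype from the extension of a resource name.
// The table is a small illustrative subset, not the mimetypes configured in OpenCms.
final class MimeTypeFromName {

    private static final Map<String, String> BY_EXTENSION = Map.of(
        "pdf", "application/pdf",
        "txt", "text/plain",
        "rtf", "text/rtf",
        "doc", "application/msword",
        "xls", "application/vnd.ms-excel",
        "ppt", "application/vnd.ms-powerpoint",
        "html", "text/html");

    // Returns the mimetype for the name's extension, or null if there is none or it is unknown.
    static String mimeTypeOf(String resourceName) {
        int slash = resourceName.lastIndexOf('/');
        int dot = resourceName.lastIndexOf('.');
        // A dot inside a folder name, or a leading dot of a file name, is not an extension.
        if (dot <= slash + 1) {
            return null;
        }
        String extension = resourceName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(extension);
    }
}
```

Because the lookup depends only on the name, a PDF file saved without the .pdf extension would not be matched by a documenttype that lists application/pdf.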
Example 1

This example shows how to configure a documenttype for PDF documents:

<documenttype>
	<name>pdf</name>
	<class>org.opencms.search.documents.CmsDocumentPdf</class>
	<mimetypes>
		<mimetype>application/pdf</mimetype>
	</mimetypes>
	<resourcetypes>
		<resourcetype>binary</resourcetype>
		<resourcetype>plain</resourcetype>
	</resourcetypes>
</documenttype>

Example 2

This example shows how to configure a documenttype for a COS module:

<documenttype>
	<name>news</name>
	<class>com.opencms.legacy.CmsCosDocument</class>
	<mimetypes/>
	<resourcetypes>
		<resourcetype>com.alkacon.news.CmsNewsContent</resourcetype>
	</resourcetypes>
</documenttype>


Available document classes

Currently, the following document factories are available in OpenCms:

  • org.opencms.search.documents.CmsDocumentGeneric
    Extracts index data from a VFS resource. This factory extracts only property data such as title, description and keywords, not the content, and is used as the base class of the other document factories.
  • org.opencms.search.documents.CmsDocumentPlainText
    Extracts index data from a document in plain text format.
  • org.opencms.search.documents.CmsDocumentRtf
    Extracts index data from a document in Rich Text (RTF) file format.
  • org.opencms.search.documents.CmsDocumentPdf
    Extracts index data from a document in Adobe Portable Document Format.
  • org.opencms.search.documents.CmsDocumentMsExcel
    Extracts index data from a document in Microsoft Excel 97-2002 file format (BIFF8).
  • org.opencms.search.documents.CmsDocumentMsPowerPoint
    Extracts index data from a document in Microsoft PowerPoint file format.
  • org.opencms.search.documents.CmsDocumentMsWord
    Extracts index data from a document in Microsoft Word 97 file format.
  • org.opencms.search.documents.CmsDocumentXmlPage
    Extracts index data from a resource of type xmlpage.
    All markup tags are stripped from the content, so the xmlpage elements may contain both XML and HTML data.
  • org.opencms.search.documents.CmsDocumentXmlContent
    Extracts index data from a resource of type xmlcontent.
  • com.opencms.legacy.CmsPageDocument
    Extracts index data from a resource of type page (used by the former XML template mechanism).
  • com.opencms.legacy.CmsCosDocument
    Extracts index data from any COS resource based on the OpenCms CmsMasterDataSet class.
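The relationship between CmsDocumentGeneric and the content-extracting factories listed above can be pictured as a base class that indexes property data and subclasses that add extracted content. The Java sketch below is hypothetical: the class names GenericDocumentSketch and XmlPageDocumentSketch, the Map standing in for a Lucene document, and the regex-based tag stripping are simplifications, not the actual OpenCms or Lucene API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the factory hierarchy; a Map stands in for a Lucene document.
class GenericDocumentSketch {

    // Like CmsDocumentGeneric: index property data such as title, description and keywords.
    Map<String, String> createDocument(Map<String, String> properties, String content) {
        Map<String, String> doc = new HashMap<>();
        for (String name : new String[] {"Title", "Description", "Keywords"}) {
            String value = properties.get(name);
            if (value != null) {
                doc.put(name.toLowerCase(Locale.ROOT), value);
            }
        }
        // Subclasses may contribute extracted content; the generic factory adds none.
        String text = extractContent(content);
        if (text != null) {
            doc.put("content", text);
        }
        return doc;
    }

    // The generic factory does not extract content; subclasses override this method.
    String extractContent(String content) {
        return null;
    }
}

// Like CmsDocumentXmlPage: strip all tags so XML and HTML element data become plain text.
class XmlPageDocumentSketch extends GenericDocumentSketch {

    @Override
    String extractContent(String content) {
        // Naive regex tag removal for illustration; a real factory would use a proper parser.
        return content.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }
}
```

Keeping property extraction in the base class means every factory indexes title, description and keywords the same way, and each subclass only has to implement the format-specific content extraction.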

Available resource types

Currently, OpenCms provides the following resource types (implementing class in parentheses):

  • binary (org.opencms.file.types.CmsResourceTypeBinary)
  • folder (org.opencms.file.types.CmsResourceTypeFolder)
  • image (org.opencms.file.types.CmsResourceTypeImage)
  • jsp (org.opencms.file.types.CmsResourceTypeJsp)
  • page (com.opencms.legacy.CmsResourceTypePage)
  • plain (org.opencms.file.types.CmsResourceTypePlain)
  • pointer (org.opencms.file.types.CmsResourceTypePointer)
  • xmlpage (org.opencms.file.types.CmsResourceTypeXmlPage)
  • xmlcontent (org.opencms.file.types.CmsResourceTypeXmlContent)

©2005 Alkacon Software GmbH (http://www.alkacon.com) - The OpenCms experts