602SQL Documentation Index  

CREATE FULLTEXT Statement

statement_CREATE_FULLTEXT ::= CREATE FULLTEXT [ scheme. ] identifier LANGUAGE language_number 
    [ [ NOT ] LEMMATIZED ] [ [ NOT ] SUBSTRUCTURED ] [ [ NOT ] SEPARATED ] [ [ NOT ] BIGINT_ID ]
    [ LIMITS pattern=size { , pattern=size }... ]
    [ WORD_STARTERS word_starters ] 
    [ WORD_CHARS word_chars ]
    [ ( fulltext_trigger { , fulltext_trigger }... ) ]
fulltext_trigger ::= table_name document_id_column document_column [, mode ] [ , format ]
word_starters ::= string
word_chars ::= string

The CREATE FULLTEXT statement creates a new fulltext system in the specified schema with the specified identifier. It creates a word table, reference table and sequence for document numbering in the schema. Executing this statement creates all mentioned objects and relations, but also builds the fulltext system with data present from the specified table.

The language number is a number from the following table:

Number Language Number Language
0 Czech 3 English (US)
1 Slovak 4 English (UK)
2 German    

The LEMMATIZED flag designates a fulltext system that uses words in the basic form. The NOT LEMMATIZED flag designates words in the forms they are present in the document. If nothing is specified, LEMMATIZED is assumed.

The SUBSTRUCTURED flag designates a fulltext system that uses a package (e.g. ZIP). The NOT SUBSTRUCTURED flag does not index a package. If nothing is specified, SUBSTRUCTURED is assumed.

The SEPARATED flag designates a fulltext system that builds the index outside the database file to a separate file on the disk. The folder for these files is the same as the folder for the main database file by default (can be changed by the ExtFulltextDir server property or in the Runtime Parameters window). If this flag is not set, NOT SEPARATED is used.

The BIGINT_ID flag sets the range of the document numbers (document identification in the fulltext). If not specified, type INT is assumed.

The LIMITS flag restricts indexing of big external files. Description on the page File size restriction.

The WORD_STARTERS and WORD_CHARS clauses allow to define which character string is regarded as a word. WORD_STARTERS defines which character starts a word; WORD_CHARS determines which characters the word contains starting from the second character in a word. A word can start either with a letter or with a character contained within the word_starters string. A word then continues as long as some letters or characters contained within the word_chars string follow. If the WORD_STARTERS clause is not introduced during the fulltext system creation, a word can start with (apart from letters) characters '_@0123456789'. If the WORD_CHARS clause is not introduced during the fulltext system creation, a word can continue with (apart from letters) characters '_@0123456789-'.

A fulltext trigger is a request for automatic indexing of certain documents (this is not a normal database trigger). The table name and columns are set in the trigger. The document column ID must be INT or BIGINT type. The document column must be CLOB, BLOB or string type.

A fulltext trigger need not be specified - in such case, an empty fulltext system will be created that is not filled "automatically" but only after explicit invocation of the Fulltext_index_doc function. This is used in situations when on-line indexing of large documents takes so much server capacity that it slows down other client work; therefore indexing is postponed e.g. to the night hours; it is done together for multiple records - for more details, see Fulltext systems - indexing.

The mode is a number that specifies the document storage method:

Mode Storage method
0 Stored directly in the table using the Document column
1 Stored on the file system with the filename stored in the table using the Document column

If mode is not specified, 0 is assumed.

Format is a character string that defines the document format. If empty or not specified, the fulltext system will try to determine the document format based on the document contents. If specified, only documents in the that format are indexed. The supported formats are:

Format Description
PLAIN | TXT | TEXT Unformatted text
HTM | HTML HTML format
RTF RTF format
DOC DOC format (Microsoft Word or OpenOffice.org)
XLS XLS format (Microsoft Excel or OpenOffice.org)
PDF PDF format
PPT PPT format (Microsoft PowerPoint or OpenOffice.org)
OOO OpenOffice.org ver. 1 (*.sxw, *.sxc etc.) or 2 (*.odt, *.ods etc.) formats

Examples

The simplest case - indexing of values of a single column called DOC in the DOCTAB table, the column with unique values is called ID:

CREATE FULLTEXT FT_TEST LANGUAGE 4 NOT LEMMATIZED NOT SUBSTRUCTURED
(DOCTAB ID_DOC DOC,'TXT')

Indexing of binary documents of various formats, including the zipped ones, inserted into a BLOB called BINARYDOC, the fulltext will be external (i.e. it will be stored outside of the database file):

CREATE FULLTEXT FT_BINARYTEST LANGUAGE 4 NOT LEMMATIZED SUBSTRUCTURED SEPARATED
(DOCTAB ID_DOC BINARYDOC,'')

Indexing of questions and answers in a discussion panel (FORUM_TAB table), both columns use a single ID column - in the search result, it will not be possible to distinguish the columns.

CREATE FULLTEXT FT_FORUM LANGUAGE 3 NOT LEMMATIZED NOT SUBSTRUCTURED
(FORUM_TAB ID F_QUESTION,'TXT',
 FORUM_TAB ID F_ANSWER,'TXT')

Same as above, but each indexed table has its own ID column, both ID columns are filled with one sequence - in the search result, it will be possible to distinguish the columns.

CREATE FULLTEXT FT_FORUM LANGUAGE 3 NOT LEMMATIZED NOT SUBSTRUCTURED
(FORUM_TAB ID_Q F_QUESTION,'TXT',
 FORUM_TAB ID_A F_ANSWER,'TXT')

A fulltext system that is filled manually, i.e. not using the fulltext triggers, but using the Fulltext_index_doc function.

CREATE FULLTEXT FT_FORUM LANGUAGE 3 LEMMATIZED NOT SUBSTRUCTURED
()

See