Create Database Tables and Transfers Based on an XML Schema or a XSL-FO Form

XML schema (XSD) is often used to describe XML documents. An XML schema may specify what XML documents are used in a system and the contents of the XML documents may be verified against this XML schema. An XML schema may also be used as a basis for creating a database structure that stores XML documents in an SQL database.

XSL-FO forms (used in 602XML product) contain XML data structure, convertible to a XSD scheme that's why the further description can be common.

The 602SQL Client contains an interactive designer for developing XML transfers. The 602SQL Client also contains a wizard that will create one or more database tables, an analytic DAD connecting XML documents to these tables and other objects for XML data import and export (a choice Start by creating tables and mapping for a XML schema or Start by creating tables and mapping for a FO form). The rules that affect the created tables and DAD will be described later.

The wizard uses the SOM interface from Xerces in the current version. This tool is a part of the installation of the 602SQL, common for Windows and Linux as well.

XML schema is a tool for describing XML document classes. Not every XML document class is suitable for representation in a relational database. Documents with freely structured contents (e.g. documents similar to HTML pages) are usually not converted to data and stored in a relational database.

Simple Types

Simple type data is present in an XML file either as attribute values or as text contents of some element. Columns will be created in database tables for storing these values. Relations between data types described in the XSD and column types are described in the following table:

XSD (FO) type	Database type
Boolean	BOOLEAN
date	DATE
time	TIME
dateTime	TIMESTAMP
gYear, gMonth, gYearMonth, gDay, gMonthDay	CHAR(13)
duration	CHAR(14)
base64Binary	BLOB
hexBinary	BLOB
anyType (if the internal structure is not analyzed)	NCLOB
integer, int, long, short, byte, unsignedInt, unsignedLong, unsignedShort, unsignedByte, nonPositiveInteger, nonNegativeInteger, positiveInteger, negativeInteger, decimal with fractionDigits=0	INT, BIGINT, SMALLINT, TINYINT (chosen according the type and totalDigits)
decimal with fractionDigits>0	NUMERIC(n,d), where n, d is chosen according to totalDigits and fractionDigits
double, float, decimal without fractionDigits	REAL
string, normalizedString, anyUri, token (and inferred)	CHAR(n) or NCHAR(n), where n is chosen according to length, maxLength, enumeration or pattern

The database type for a numeric XML type attempts to contain all values without a loss of precision (if possible). The length of the character strings is chosen according to the restriction in XSD.

Compositors, Empty and Non-specified Types

Compositors (sequence, choice or all) are equivalent with columns for all items (particles) of this type in a database table. Only columns that have content specified in the XML file will be filled with data. Only non-NULL attributes and elements will be exported to XML.

If an element does not have content or attributes, it will not have an equivalent column in the database. An exception to this case is when the occurrence of such element is optional (minOccurs=0). In order to reflect this information, the database table must contains a column of the BOOLEAN type bound to the element using two conversion SQL functions.

If an element has unspecified content (type any), it will be equivalent to the database NCLOB type. The entire content of the element, including nested elements, is copied into/from the database during import/export.

Supplemental Parameters for DAD and Table Creation

The wizard that creates database tables and the initial DAD design from an XSD file (or from a FO file) allows you to specify the following supplemental parameters:

Prefix of the table names	Beginning prefix of the created table names. The remaining part of the table name is formed by the name of the element, whose occurrence in the XML document corresponds with a record in the table. This may be empty.
Default string length	Length of the string corresponding with the character entry that cannot be derived from the length, maxLength, enumeration or pattern XSD specification.
Use Unicode strings	Choice of the CHAR(n) or NCCHAR(n) database type for character data.
ID column type	Type of columns that are used for relational links of database tables.
DAD name	A unique name of the new DAD.
Referential integrity constraints in tables	If more than one table is created for data, then relational links between the tables can be reflected within referential integrity rules. Writing of these rules into the table designs can be influenced by setting one of the three options: Either the rules are not created, or they are created in all places where the optimized table structure allows it, or the table structure is subordinated to the possibility to define a maximum number of referential integrity rules. In the last case, multiple tables can come to being, since the occurrence of the same structures at different schema locations will not save to common table.

Repeating Occurrences of an Element in XML Document

One record in the highest level table corresponds with the XML document imported to the database. Other records in other tables may be linked to this record.

If some element occurs multiple times in an XML document (the maxOccurs flag in XSD is higher than 1 or unbounded), then the values contained in this element and attributes form one record in an child table for each occurrence of this element.

Records in the child and parent tables are connected with the key values in the columns with names ID and UPPER_ID. The values of the ID column are generated by a sequence object when importing an XML document. The wizard creates this sequence automatically and will name it <prefix>id_generator, where <prefix> is the specified prefix of the table names (see above).

XSD may specify identical repeatable substructures on multiple positions in the XML document. Contents of these substructures are stored in a single database table. If there are more such substructures linked to a single parent table stored in a single child table, the parent table will also have the ID2, ID3, etc. columns besides the ID column.

Recursive Structures in an XML Schema

Recursion occurs in a schema if some element may contain a subelement of the same name and type.

DAD supports recursion. You must check the Recursive occurrence with the same content model button in the parameter of the recursive occurrence of an element. Attributes and content are not described for a recursive element as they are considered identical to the occurrence of this element on a higher level.

Optional Occurrence of an Attribute and Voluntary Occurrence of an Element

The content of an XML file may be variable to a certain extent. An attribute bound to a database column may have the flag (use=) in XSD, meaning that it is not mandatory. An element, whose content is bound to a database column may have a minimum occurrence count (minOccurs) equal to zero or may be placed in the optional branch of the choice or all group.

The default value specified in the table definition will be placed in the database column when importing data in this case (usually the NULL value).

The relevant attribute or element should not appear when exporting data back to XML. The DAD creation wizard guarantees this by defining the condition for attribute or element output. These rules are as follows:

An optional attribute will not appear in XML if the value of the corresponding database column is NULL.
An element may have zero occurrences in an XML document:
- If it contains text, it will not appear in XML if the column value bound to the text is NULL.
- If it does not contain text, but has a mandatory subelement that contains text. The previous rule will be applied to this subelement.
- If it does not contain text and no mandatory subelement contains text, the element will not appear in XML if any subelement contains text or an attribute bound to a database value is not NULL.
- If no such subelement exists, the element output is governed by the value of the special Boolean column.

Additional DAD Modifications

The wizard creates the initial DAD and the tables according to the specified schema. These objects may be modified at anytime. A typical modification may be:

Modification of the string column lengths in tables based on additional information from XML data files, or by replacing the NChar(n) type columns with the NCLOB type (or vice versa).
Substitution of character strings with numeric entries using the transfer functions for data specified as strings with a list of possible values (enumeration).
Change column types from BLOB to binary string of a fixed length. If the length of binary data in the XML file is fixed or limited.
Adding further attributes to the root element (e.g. information about the XSD file location).

Tables created by the wizard use a common sequence for generating keys (in the ID column, ID2 etc.). You may use different sequences for different tables. However, a common sequence is necessary if recursion is used or if the same substructures are repeated on multiple positions in the schema (in the same table).

Conditional Output to XML

XML Support

XML Programming Interface