There are several levels to this, depending on whether you're using a DTD or a more powerful schema language like XML Schema (XSD):
1. DTD Data Types (Limited)
DTDs have very limited support for data types. Essentially, everything is treated as text, although there are a few special keywords that provide some constraints.
- #PCDATA (Parsed Character Data):
- This is the most common type for element content.
- It means the element can contain text. The XML processor parses that text, so entity references (like &lt; and &amp;) are replaced with their corresponding characters (< and &). An element declared as (#PCDATA) alone cannot contain child elements; mixed content has to be declared explicitly, for example (#PCDATA | em)*.
- Example: <!ELEMENT title (#PCDATA)>
- CDATA (Character Data):
- Used for attribute values (not element content).
- Means the attribute can contain any text. Entity references such as &amp; are still expanded in the value, and a literal < is not allowed. This is the most common attribute type.
- Example: <!ATTLIST book title CDATA #REQUIRED>
- EMPTY:
- Specifies that an element must be empty (have no content).
- Example: <!ELEMENT br EMPTY> (similar to the HTML <br> tag)
- ANY:
- Specifies that an element can contain text and any elements declared in the DTD, in any order. This is generally discouraged because it provides no constraints.
- Example: <!ELEMENT anything ANY>
- Enumerated Types (for Attributes):
- You can specify a list of allowed values for an attribute.
- Example: <!ATTLIST payment type (cash | creditcard | check) "cash">
- This means the type attribute of the payment element can only have the values "cash", "creditcard", or "check". The default value is "cash".
- ID:
- An attribute type whose value must be unique within the entire XML document. The value must also be a valid XML name, so it cannot start with a digit. Used to create identifiers for elements.
- Example: <!ATTLIST product id ID #REQUIRED>
- IDREF:
- An attribute type that references an ID attribute of another element within the same document. Used to create relationships between elements.
- Example: <!ATTLIST order itemRef IDREF #REQUIRED>
- IDREFS:
- An attribute type for a space-separated list of IDREF values.
- NMTOKEN:
- A name token. The value may contain only name characters: letters, digits, hyphens (-), underscores (_), colons (:), and full stops (.). Unlike an XML name, it may start with any of these characters, so a value such as 2024-A is allowed.
- NMTOKENS:
- A space-separated list of NMTOKEN values.
- ENTITY, ENTITIES, and NOTATION: These are rarely used in modern XML. ENTITY and ENTITIES refer to unparsed entities declared in the DTD, and NOTATION refers to a declared notation.
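As a rough illustration, the sketch below combines several of the declarations above in one small document with an internal DTD subset. The catalog element, the code and note attributes, and all values are hypothetical, chosen only for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE catalog [
  <!ELEMENT catalog (product+, order*)>
  <!ELEMENT product (title, payment)>
  <!ELEMENT title   (#PCDATA)>
  <!ELEMENT payment EMPTY>
  <!ELEMENT order   EMPTY>

  <!ATTLIST product id      ID      #REQUIRED
                    code    NMTOKEN #IMPLIED
                    note    CDATA   #IMPLIED>
  <!ATTLIST payment type    (cash | creditcard | check) "cash">
  <!ATTLIST order   itemRef IDREF   #REQUIRED>
]>
<catalog>
  <!-- id is an ID, so it must be an XML name and cannot start with a digit;
       code is an NMTOKEN, so a leading digit is fine -->
  <product id="p1" code="2024-A" note="Tom &amp; Jerry">
    <!-- &amp;, &lt; and &gt; are expanded to &, < and > by the parser -->
    <title>Fish &amp; Chips &lt;large&gt;</title>
    <!-- type is omitted, so the declared default "cash" applies -->
    <payment/>
  </product>
  <product id="p2">
    <title>Plain title</title>
    <payment type="check"/>
  </product>
  <!-- itemRef is an IDREF, so it must match an ID value in this document -->
  <order itemRef="p1"/>
</catalog>
```

A validating parser should accept this document as written and reject it if, say, itemRef is changed to a value that matches no id, or type is set to something outside the enumerated list. One way to check is xmllint --valid --noout catalog.xml, where catalog.xml is an assumed file name.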
2. XML Schema (XSD) Data Types (Rich and Powerful)
XML Schema (XSD) provides a much richer set of data types, allowing for precise control over the content of elements and attributes. This is a major advantage of XSD over DTDs.
- Built-in Data Types: XSD has a large number of built-in data types, categorized as:
- String Types: string, normalizedString, token, language, Name, NCName, ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS
- Numeric Types: integer, decimal, float, double, positiveInteger, nonPositiveInteger, negativeInteger, nonNegativeInteger, long, int, short, byte, unsignedLong, unsignedInt, unsignedShort, unsignedByte
- Date and Time Types: date, time, dateTime, duration, gYear, gMonth, gDay, gYearMonth, gMonthDay
- Boolean Type: boolean (values are true or false, or 1 or 0)
- Binary Types: base64Binary, hexBinary
- AnyURI: anyURI (for Uniform Resource Identifiers)
- Simple Types vs. Complex Types:
- Simple Types: Define the data type for the content of an element or the value of an attribute. The built-in types above are simple types.
- Complex Types: Define the structure of an element, including its child elements and attributes. Used to create more complex, nested structures.
- User-Defined Types: You can create your own custom data types by:
- Restriction: Limiting the allowed values of an existing type (e.g., creating a type that's a string but only allows certain values, or a number within a specific range).
- List: Creating a type that is a list of values of another type.
- Union: Creating a type that can be one of several different types.
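The three derivation methods above might look like the following sketch. The type names (paymentType, rating, ratingList, ratingOrUnrated) are hypothetical and exist only for this example.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Restriction by enumeration: a string limited to three values -->
  <xs:simpleType name="paymentType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="cash"/>
      <xs:enumeration value="creditcard"/>
      <xs:enumeration value="check"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Restriction by range: an integer from 1 to 5 inclusive -->
  <xs:simpleType name="rating">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="5"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- List: whitespace-separated ratings, e.g. "3 5 4" -->
  <xs:simpleType name="ratingList">
    <xs:list itemType="rating"/>
  </xs:simpleType>

  <!-- Union: either a rating or the literal string "unrated" -->
  <xs:simpleType name="ratingOrUnrated">
    <xs:union memberTypes="rating">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="unrated"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>

</xs:schema>
```

Each named type can then be used like a built-in one, for example type="rating" on an xs:element or xs:attribute declaration.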
- Example (XSD):
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="product">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="price" type="xs:decimal"/>
        <xs:element name="quantity" type="xs:positiveInteger"/>
        <xs:element name="available" type="xs:boolean"/>
        <xs:element name="releaseDate" type="xs:date"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
This XSD defines a product element with:
- A name (string).
- A price (decimal number).
- A quantity (positive integer).
- An available flag (boolean).
- A releaseDate (date).
- A required id attribute (unique identifier).
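For reference, an instance document that should validate against this schema could look like the sketch below. The values are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xs:ID values must be valid names, so "p-1001" works but "1001" would not -->
<product id="p-1001">
  <name>Desk lamp</name>
  <!-- xs:decimal: digits with an optional decimal point, no currency symbol -->
  <price>24.99</price>
  <!-- xs:positiveInteger: 1 or greater, so 0 would be rejected -->
  <quantity>3</quantity>
  <!-- xs:boolean: true, false, 1, or 0 -->
  <available>true</available>
  <!-- xs:date: YYYY-MM-DD, optionally followed by a time zone -->
  <releaseDate>2024-03-15</releaseDate>
</product>
```

Because the schema uses xs:sequence, the child elements must appear in exactly this order. Assuming the schema is saved as product.xsd and the instance as product.xml (both file names are hypothetical), one way to check it is xmllint --noout --schema product.xsd product.xml.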
3. RELAX NG Data Types
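RELAX NG also supports rich data typing, similar to XSD. Its own built-in types are minimal (string and token), but it can plug in external datatype libraries, and in practice it is usually paired with the XSD built-in data types. Those types can be narrowed further with parameters that correspond to XSD facets, such as minInclusive or pattern.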
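As a sketch of how this looks, the product element from the XSD example could be described in RELAX NG's XML syntax roughly as follows. The minInclusive parameter on price is an addition for illustration and is not part of the XSD example.

```xml
<element name="product"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- Attributes are unordered; child elements must appear in the order listed -->
  <attribute name="id">
    <data type="ID"/>
  </attribute>
  <element name="name">
    <data type="string"/>
  </element>
  <element name="price">
    <!-- A param narrows the datatype, much like an XSD facet -->
    <data type="decimal">
      <param name="minInclusive">0</param>
    </data>
  </element>
  <element name="quantity">
    <data type="positiveInteger"/>
  </element>
  <element name="available">
    <data type="boolean"/>
  </element>
  <element name="releaseDate">
    <data type="date"/>
  </element>
</element>
```

The datatypeLibrary attribute is what selects the XSD types; without it, only the built-in string and token types are available.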
Key Differences and Recommendations
- DTD: Very limited data typing (mostly just text).
- XSD: Rich and powerful data typing, allowing for precise control over data formats.
- RELAX NG: Similar data typing capabilities to XSD (usually by reusing the XSD data types), with a syntax that is often considered simpler to read and write.
Recommendations:
- For simple validation where data types aren't critical, DTDs can be used, but XSD is generally preferred.
- For any situation where you need to enforce specific data types (numbers, dates, booleans, etc.), use XSD or RELAX NG.
- For web APIs and data exchange that use JSON rather than XML, JSON Schema plays a comparable role, although its set of built-in types is smaller than XSD's.
In summary, understanding data types is essential for creating well-structured and valid XML documents. While DTDs offer basic type constraints, XML Schema (XSD) and RELAX NG provide far more powerful and flexible mechanisms for defining the precise data types of your elements and attributes, leading to better data quality and interoperability.