An XML document has a specific structure, composed of several key components. Understanding this anatomy is crucial for creating well-formed and valid XML. Here's a breakdown:
1. XML Declaration (Optional, but Recommended)
- Purpose: Provides information about the XML document itself.
- Placement: If present, it must be the very first thing in the document. Nothing may precede it, not even whitespace or a comment.
- Syntax:
XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
- Attributes:
- version: Specifies the XML version (usually "1.0").
- encoding: Specifies the character encoding used in the document (e.g., "UTF-8", "ISO-8859-1"). UTF-8 is highly recommended for broad compatibility.
- standalone: Indicates whether the document depends on external markup declarations (such as an external DTD). Values are "yes" (no such dependency) or "no". It is rarely needed in practice.
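The placement rule can be checked with a short sketch using Python's standard-library xml.etree.ElementTree parser. The <note> element is a made-up sample used only for illustration.

```python
import xml.etree.ElementTree as ET

# Declaration at the very start of the document: parses normally.
good = b'<?xml version="1.0" encoding="UTF-8"?><note>hi</note>'
root = ET.fromstring(good)

# A single space before the declaration makes the document ill-formed.
bad = b' <?xml version="1.0" encoding="UTF-8"?><note>hi</note>'
try:
    ET.fromstring(bad)
    declaration_error = None
except ET.ParseError as exc:
    declaration_error = exc

print(root.tag, root.text)
print("misplaced declaration rejected:", declaration_error is not None)
```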
2. Root Element (Required)
- Purpose: The top-level element that encloses all other elements in the document. It is the ancestor of every other element.
- Requirement: Every XML document must have exactly one root element.
- Example:
XML
<bookstore>
</bookstore>
In this case, <bookstore> is the root element.
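As a rough illustration of the single-root rule, the sketch below wraps Python's standard-library parser in a helper. The name is_well_formed is a hypothetical helper introduced here, not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


one_root = "<bookstore></bookstore>"
two_roots = "<bookstore></bookstore><bookstore></bookstore>"

print(is_well_formed(one_root))
print(is_well_formed(two_roots))
```

The second document is rejected because a second top-level element follows the root.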
3. Elements (Required)
- Purpose: The fundamental building blocks of an XML document. They contain the actual data.
- Structure:
- Start Tag: <elementName>
- Content: Can be text, other elements (nested), or a mixture of both.
- End Tag: </elementName>
- Empty Elements: Elements with no content can be represented in two ways:
- <elementName></elementName>
- <elementName /> (self-closing tag)
- Nesting: Elements can be nested within other elements, creating a hierarchical structure. Proper nesting is crucial for well-formedness.
- Example:
XML
<book>
<title>The Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<price>25.99</price>
</book>
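One way to see this structure from code is to parse the example and walk its children. This is a minimal sketch with Python's xml.etree.ElementTree; the badly nested string at the end is an invented counterexample.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<book>"
    "<title>The Lord of the Rings</title>"
    "<author>J.R.R. Tolkien</author>"
    "<price>25.99</price>"
    "</book>"
)

# Iterating an element yields its direct children in document order.
children = [(child.tag, child.text) for child in book]

# Both spellings of an empty element produce the same parsed result.
long_form = ET.fromstring("<elementName></elementName>")
short_form = ET.fromstring("<elementName />")
same_empty = (long_form.text, len(long_form)) == (short_form.text, len(short_form))

# Overlapping tags are not properly nested, so the parser rejects them.
try:
    ET.fromstring("<book><title>Oops</book></title>")
    nesting_ok = True
except ET.ParseError:
    nesting_ok = False

print(children)
print(same_empty, nesting_ok)
```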
4. Attributes (Optional)
- Purpose: Provide additional information about an element. They do not contain the primary data; they modify or describe the element.
- Placement: Attributes are placed within the start tag of an element.
- Syntax: attributeName="attributeValue"
- Rules:
- Attribute values must be enclosed in quotes (single or double).
- An element cannot have two attributes with the same name.
- Attribute names are case-sensitive.
- Example:
XML
<book category="fiction" isbn="978-0618260266">
<title>The Lord of the Rings</title>
</book>
Here, category and isbn are attributes of the book element.
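The attribute rules above can be exercised with a small sketch, again assuming Python's standard-library parser. The malformed strings are invented test inputs.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book category="fiction" isbn="978-0618260266">'
    "<title>The Lord of the Rings</title>"
    "</book>"
)

# Attributes are exposed as a plain dict of name -> string value.
attributes = dict(book.attrib)

# Names are case-sensitive: "Category" is not the same attribute as "category".
wrong_case = book.get("Category")

# Single quotes are as valid as double quotes.
single_quoted = ET.fromstring("<book category='fiction'/>").get("category")


def parses(text: str) -> bool:
    """Hypothetical helper: True if the text is accepted by the parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


duplicate_ok = parses('<book category="a" category="b"/>')
unquoted_ok = parses("<book category=fiction/>")

print(attributes, wrong_case, single_quoted, duplicate_ok, unquoted_ok)
```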
5. Text Content
- Purpose: The actual data within an element.
- Placement: Between the start and end tags of an element.
- Example:
XML
<title>The Lord of the Rings</title>
"The Lord of the Rings" is the text content of the title element.
6. Comments (Optional)
- Purpose: Add explanatory notes or temporarily disable parts of the XML document. Comments are ignored by XML parsers.
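- Syntax: <!-- comment text -->
- Placement: Comments can appear anywhere in the document after the XML declaration, except inside a tag.
- Restrictions: Comments cannot be nested, and the string -- (two hyphens) must not appear inside a comment.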
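A brief sketch of how comments behave in practice, assuming Python's xml.etree.ElementTree with its default settings (which discard comments). XML also forbids a double hyphen inside a comment, which is why the nested-looking input fails. The comment text is invented for illustration.

```python
import xml.etree.ElementTree as ET

with_comment = ET.fromstring(
    "<book><!-- checked by editor --><title>The Lord of the Rings</title></book>"
)

# With the default tree builder, the comment does not appear among the children.
child_tags = [child.tag for child in with_comment]


def parses(text: str) -> bool:
    """Hypothetical helper: True if the text is accepted by the parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# "--" inside a comment (which any nested comment would contain) is rejected.
nested_ok = parses("<book><!-- outer <!-- inner --> --></book>")
# A comment inside a tag is rejected as well.
inside_tag_ok = parses("<book <!-- note --> ></book>")

print(child_tags, nested_ok, inside_tag_ok)
```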
7. CDATA Sections (Optional)
- Purpose: Used to include blocks of text that contain characters that would otherwise be interpreted as XML markup (e.g., <, >, &). Everything within a CDATA section is treated as plain text. The one sequence it cannot contain is ]]>, because that ends the section.
- Syntax: <![CDATA[ ... ]]>
- Example:
XML
<description>
<![CDATA[
This is a description that includes < and > characters,
which would normally need to be escaped.
]]>
</description>
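To show that a CDATA section is only an alternative spelling of escaped text, the sketch below parses both forms and compares them. It assumes Python's xml.etree.ElementTree; the sample text is invented.

```python
import xml.etree.ElementTree as ET

cdata = ET.fromstring("<description><![CDATA[1 < 2 && 3 > 2]]></description>")
escaped = ET.fromstring("<description>1 &lt; 2 &amp;&amp; 3 &gt; 2</description>")

# Both forms yield exactly the same text once parsed.
same_text = cdata.text == escaped.text

# ElementTree does not remember the CDATA wrapper; it writes escaped text back out.
serialized = ET.tostring(cdata, encoding="unicode")

print(cdata.text)
print(serialized)
```

Because the wrapper is lost after parsing, an application that must preserve CDATA sections verbatim would need a different API.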
8. Processing Instructions (Optional)
- Purpose: Provide instructions to applications that process the XML document. They are not part of the document's data.
- Syntax: <?target instructions?>
- Example: <?xml-stylesheet type="text/css" href="style.css"?> (a common processing instruction that links an XML document to a CSS stylesheet).
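An application can read a processing instruction's target and data through a DOM API. The following is a minimal sketch with Python's standard-library xml.dom.minidom; the empty <bookstore/> root is only there to make the document complete.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="style.css"?>'
    "<bookstore/>"
)

# The processing instruction is a child of the document, ahead of the root element.
pi = doc.firstChild
is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE

print(pi.target)  # the application the instruction is addressed to
print(pi.data)    # the raw instruction text, left for the application to interpret
print(doc.documentElement.tagName)
```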
9. Entity References (Optional)
- Purpose: Used as shorthand for longer pieces of text, or to represent characters that need to be escaped.
- There are 5 predefined entities in XML:
- &lt; represents <
- &gt; represents >
- &amp; represents &
- &apos; represents '
- &quot; represents "
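The sketch below shows the predefined entities being decoded by a parser and produced by an escaping helper. It assumes Python's xml.etree.ElementTree and xml.sax.saxutils.escape; the sample strings are invented.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The parser replaces each predefined entity reference with its character.
elem = ET.fromstring(
    "<t>&lt;b&gt; &amp; &apos;single&apos; &quot;double&quot;</t>"
)
decoded = elem.text

# Going the other way: escape raw text before placing it in a document.
escaped = escape("Tom & Jerry <3")

# Entities beyond the five predefined ones must be declared (e.g. in a DTD).
try:
    ET.fromstring("<t>&nbsp;</t>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(decoded)
print(escaped)
print(undeclared_ok)
```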
Complete Example:
XML
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="fiction">
<title>The Hitchhiker's Guide to the Galaxy</title>
<author>Douglas Adams</author>
<year>1979</year>
<price>12.99</price>
<description>
<![CDATA[
This book contains < and > characters, and & (ampersand) characters.
]]>
</description>
<isbn>&lt;ISBN&gt;</isbn>
</book>
<book category="science">
<title>A Brief History of Time</title>
<author>Stephen Hawking</author>
<year>1988</year>
<price>18.99</price>
</book>
</bookstore>
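As a rough usage sketch, a document like the one above can be loaded and queried with Python's xml.etree.ElementTree. The copy embedded here is abbreviated, and it assumes the <isbn> placeholder is written with entity references (&lt;ISBN&gt;) so that the document is well-formed.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="fiction">
<title>The Hitchhiker's Guide to the Galaxy</title>
<price>12.99</price>
<description><![CDATA[
This book contains < and > characters, and & (ampersand) characters.
]]></description>
<isbn>&lt;ISBN&gt;</isbn>
</book>
<book category="science">
<title>A Brief History of Time</title>
<price>18.99</price>
</book>
</bookstore>
"""

# Parse from bytes so the encoding named in the declaration is honored.
root = ET.fromstring(DOCUMENT.encode("utf-8"))

titles = [book.findtext("title") for book in root.findall("book")]
by_category = {book.get("category"): book.findtext("title") for book in root}
total_price = sum(float(book.findtext("price")) for book in root.iter("book"))
first = root.find("book")
description = first.findtext("description").strip()
isbn = first.findtext("isbn")

print(root.tag, titles)
print(by_category)
print(round(total_price, 2), isbn)
```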
Key Takeaways:
- XML Declaration: Provides metadata about the XML document.
- Root Element: The single top-level element that contains everything else.
- Elements: The building blocks, containing data and nested elements.
- Attributes: Provide additional information about elements.
- Text Content: The actual data within the elements.
- Comments: For notes and documentation.
- CDATA Sections: For including blocks of text with special characters.
- Well-Formedness: Following the syntax rules strictly is essential, because a conforming parser rejects any document that breaks them.
By understanding these components and their rules, you can create well-formed XML documents that can be reliably processed by applications. This structure is fundamental to how XML is used for data storage, exchange, and configuration.