This is not an exhaustive list of all the constructs that appear in XML; it provides an introduction to the key constructs most often encountered in day-to-day use.
Except for a small number of specifically excluded control characters, any character defined by Unicode may appear within the content of an XML document.
XML includes facilities for identifying the encoding of the Unicode characters that make up the document, and for expressing characters that, for one reason or another, cannot be used directly.
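Both facilities can be seen in a short document. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the `note` element and its text are invented for illustration. The encoding declaration names the encoding of the bytes, while numeric character references and predefined entity references express characters that cannot be typed directly.

```python
import xml.etree.ElementTree as ET

# The declaration states that the bytes are US-ASCII, so U+00E9 cannot be
# written directly. &#233; (decimal) and &#xE9; (hexadecimal) are numeric
# character references for it; &lt; and &amp; are predefined entity
# references for characters that would otherwise be read as markup.
document = (
    b'<?xml version="1.0" encoding="US-ASCII"?>'
    b'<note>caf&#233; / caf&#xE9; / 1 &lt; 2 &amp; 3</note>'
)

root = ET.fromstring(document)
note_text = root.text

# ascii() keeps the output printable on any terminal encoding.
print(ascii(note_text))
```

After parsing, the application sees the plain characters; the references exist only in the serialized form.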
Although the design of XML focuses on documents, the language is widely used for the representation of arbitrary data structures such as those used in web services.
Several schema systems exist to aid in the definition of XML-based languages, while programmers have developed many application programming interfaces (APIs) to aid the processing of XML data. XML-based formats have become the default for many office-productivity tools, including Microsoft Office (Office Open XML), OpenOffice.org and LibreOffice (OpenDocument), and Apple's iWork. Apple has an implementation of a registry based on XML.
An example of a valid comment: <!--no need to escape <code> & such in comments-->
XML 1.0 (Fifth Edition) and XML 1.1 support the direct use of almost any Unicode character in element names, attributes, comments, character data, and processing instructions (other than the ones that have special symbolic meaning in XML itself, such as the less-than sign, "<").
The XML specification defines an XML document as a well-formed text, meaning that it satisfies a list of syntax rules provided in the specification.
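Some key points in the fairly lengthy list include: the document contains only properly encoded legal Unicode characters; the special syntax characters "<" and "&" appear only in their markup roles; start-tags and end-tags are correctly nested and match exactly, including letter case; and a single root element contains all the other elements.
The definition of an XML document excludes texts that contain violations of well-formedness rules; they are simply not XML.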
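In practice, a conforming parser reports a well-formedness violation as an error rather than attempting to recover. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the helper `is_well_formed` and the sample strings are hypothetical and chosen only to illustrate a few common violations.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Illustrative samples: one well-formed text and four violations.
samples = {
    "nested correctly": "<a><b>text</b></a>",
    "overlapping tags": "<a><b>text</a></b>",
    "two root elements": "<a/><b/>",
    "bare ampersand": "<a>fish & chips</a>",
    "case mismatch": "<a>text</A>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'not XML'}")
```

This checks well-formedness only; it says nothing about validity against a schema.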
XML also provides a mechanism whereby an XML processor can reliably, without any prior knowledge, determine which encoding is being used.
Encodings other than UTF-8 and UTF-16 are not necessarily recognized by every XML parser.
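The sketch below illustrates encoding detection, assuming Python's standard `xml.etree.ElementTree` module, which is built on the Expat parser; the `greeting` element is invented for illustration. The same text is serialized in three encodings and handed to the parser as raw bytes, which it decodes using the byte order mark and the encoding declaration. ISO-8859-1 happens to be supported by this parser, but another parser may reject it.

```python
import xml.etree.ElementTree as ET

# "Gr\u00fc\u00dfe" is the German word for "greetings"; the escapes keep
# this source file pure ASCII.
body = "<greeting>Gr\u00fc\u00dfe</greeting>"

encoded_documents = {
    # No declaration: UTF-8 is assumed by default.
    "UTF-8": body.encode("utf-8"),
    # Python's "utf-16" codec prepends a byte order mark, which the
    # parser uses to recognize the encoding.
    "UTF-16": ('<?xml version="1.0" encoding="UTF-16"?>' + body).encode("utf-16"),
    # Other encodings must be named in the declaration.
    "ISO-8859-1": (
        '<?xml version="1.0" encoding="ISO-8859-1"?>' + body
    ).encode("iso-8859-1"),
}

decoded = {
    name: ET.fromstring(data).text for name, data in encoded_documents.items()
}
for name, text in decoded.items():
    print(name, len(encoded_documents[name]), "bytes ->", ascii(text))
```

The byte sequences differ, yet each parse yields the same Unicode string.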
The characters making up an XML document are divided into markup and content, which may be distinguished by the application of simple syntactic rules.
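An event-based parser makes this division visible, because it reports tags and character data through separate callbacks. The following is a minimal sketch, assuming Python's standard `xml.parsers.expat` module; the sample document and the handler names `on_start`, `on_end`, and `on_text` are illustrative.

```python
import xml.parsers.expat

document = '<greeting lang="en">Hello, <b>world</b>!</greeting>'

markup = []   # tags, in document order
content = []  # character data found between the tags


def on_start(name, attrs):
    markup.append(("start", name, attrs))


def on_end(name):
    markup.append(("end", name))


def on_text(data):
    content.append(data)


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = on_start
parser.EndElementHandler = on_end
parser.CharacterDataHandler = on_text
# Deliver each run of character data in one piece where possible.
parser.buffer_text = True
parser.Parse(document, True)

print("markup: ", markup)
print("content:", "".join(content))
```

Here the start-tags, end-tags, and the `lang` attribute are markup, and the text "Hello, world!" is content.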