Introduction to XML Schemas
Posted on 8-Dec-2009 at 09:00 AM
New to Visual DataFlex 15.1 is XML Schema support, which is cool, but what if you don't know the first thing about XML Schemas?
Your first exposure to XML Schemas will most likely be a schema created by someone else. Perhaps you're expected to produce XML documents conforming to the specified schema. If you're lucky you will have been given an example of the document you're supposed to produce from your program, and maybe you can cheat a little and follow that instead of understanding the schema. But eventually you have to try to read and understand that gobbledygook of a schema.
Unlike the older DTD technology, a schema is itself an XML document. You can read, edit and process it with the same tools you use for any other XML. Schemas are also far more expressive than DTDs, with real data types and support for namespaces. If you've ever worked with DTDs, you'll appreciate the difference immediately. If you've never been exposed to DTDs, lucky you: they are rarely used anymore, and for the most part you're better off forgetting all about them.
XML Schemas are also intricately tied to XML namespaces. It's possible to use schemas without namespaces, but that's not very common. The Visual DataFlex 15.1 documentation has an excellent primer on XML namespaces, which I highly recommend. Keep in mind that the term namespace often refers to both the actual namespace URI and the prefix (if any). The real name is the URI. The prefix is arbitrary and is just a shortcut for the URI, with no meaning outside the document where it was declared.
The XML Schema namespace URI is http://www.w3.org/2001/XMLSchema. Remember that the URI itself is what identifies the namespace, not what it points to. Because schemas are themselves XML, there is even a schema for XML Schema.
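Because only the URI matters, two documents that bind different prefixes to the same URI produce the same element names once parsed. The sketch below uses Python's standard xml.etree.ElementTree, which reports names in {uri}local form and discards the prefix. The second URI (urn:example:not-xsd) is a made-up placeholder.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# The same root element written with two different prefixes for the same URI.
with_xs = f'<xs:schema xmlns:xs="{XS}"/>'
with_xsd = f'<xsd:schema xmlns:xsd="{XS}"/>'
# Same prefix as with_xs, but bound to a different (hypothetical) URI.
other_uri = '<xs:schema xmlns:xs="urn:example:not-xsd"/>'

def expanded_name(xml_text: str) -> str:
    """Return the root element's name as {namespace-uri}local-name; the prefix is not kept."""
    return ET.fromstring(xml_text).tag

for doc in (with_xs, with_xsd, other_uri):
    print(expanded_name(doc))
```

The first two documents yield the same name, {http://www.w3.org/2001/XMLSchema}schema. The third has the same xs prefix but a different URI, so it names a completely different element.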
An XML Schema starts like this:
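The sketch below shows what such an opening could look like. It is reconstructed from the description that follows and assumes the target namespace http://www.dataaccess.com/schemas/WebServiceMetaData, which the article's <webServiceObject> example uses later. The schema text is embedded in a short Python script that reads back its namespace declarations and key attributes with ElementTree.

```python
import io
import xml.etree.ElementTree as ET

# Assumed opening of the example schema (target namespace taken from the later example).
SCHEMA_START = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:tns="http://www.dataaccess.com/schemas/WebServiceMetaData"
           targetNamespace="http://www.dataaccess.com/schemas/WebServiceMetaData"
           elementFormDefault="qualified">
  <!-- simpleType, complexType and top-level element definitions go here -->
</xs:schema>"""

def namespace_declarations(xml_text: str) -> dict:
    """Collect prefix -> URI bindings, which ElementTree reports as 'start-ns' events."""
    stream = io.BytesIO(xml_text.encode("utf-8"))
    return {prefix: uri for _event, (prefix, uri) in ET.iterparse(stream, events=("start-ns",))}

root = ET.fromstring(SCHEMA_START)
decls = namespace_declarations(SCHEMA_START)
print(root.tag)                                     # prefix replaced by the namespace URI
print(decls)                                        # both the xs and the tns prefixes
print(decls["tns"] == root.get("targetNamespace"))  # tns points at the target namespace
print(root.get("elementFormDefault"))
```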
As you can see, this example declares the namespace prefix xs to refer to the namespace URI http://www.w3.org/2001/XMLSchema. I will use the prefix xs in this article to refer to the schema namespace, but remember that the prefix can be anything. By convention, xs or xsd is used for the XML Schema namespace.
The top element of a schema is aptly named <xs:schema>. The most important attribute is targetNamespace, which specifies the namespace that this schema applies to. An additional namespace prefix tns is also defined in the example, which points to the target namespace. This allows us to easily refer to our own target namespace within the schema. Again, the prefix is arbitrary.
The elementFormDefault attribute specifies whether local (nested) elements are in the target namespace or not. It's a somewhat advanced option. In most cases it's set to qualified, meaning that nested elements are also in the target namespace. In rare circumstances it is set to unqualified instead, which is also the default if the attribute is omitted.
In some cases you may also see attributeFormDefault, which can also be set to qualified or unqualified. It is usually unqualified, which is also the default if attributeFormDefault is not specified. Note that this is the opposite of the common choice of qualified for elementFormDefault. It's a little confusing, so just remember that typically elementFormDefault is qualified and attributeFormDefault is unqualified.
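The sketch below shows what these settings mean for an instance document. It uses a hypothetical target namespace (urn:example:addresses) and hypothetical <address>, <zip> and kind names, parses three documents with ElementTree, and prints the expanded name of the nested <zip> element and the attribute names. An unprefixed attribute is never in a namespace, even when a default namespace is declared. That fits with unqualified being the usual setting for attributeFormDefault.

```python
import xml.etree.ElementTree as ET

TNS = "urn:example:addresses"  # hypothetical target namespace

# elementFormDefault="qualified": nested elements are in the target namespace as well.
qualified = f'<a:address xmlns:a="{TNS}" kind="home"><a:zip>12345</a:zip></a:address>'
# elementFormDefault="unqualified": only the top-level element is in the namespace.
unqualified = f'<a:address xmlns:a="{TNS}" kind="home"><zip>12345</zip></a:address>'
# Qualified again, but written with a default namespace instead of a prefix.
default_ns = f'<address xmlns="{TNS}" kind="home"><zip>12345</zip></address>'

for label, doc in (("qualified", qualified), ("unqualified", unqualified), ("default_ns", default_ns)):
    root = ET.fromstring(doc)
    # root[0] is <zip>; attributes without a namespace keep plain names like 'kind'
    print(label, root[0].tag, list(root.attrib))
```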
Looking at the entire example schema in collapsed form, you can see that it essentially has three separate sections. Remember that this is just an example, so don't be confused by the content. This image is only meant to demonstrate the basic structure of a schema.
Schemas describe the structure and content types of XML documents. The goal is to reduce misunderstandings and errors in how the data is interpreted by both the producing and the consuming endpoints. Schemas describe the structure with <xs:complexType>, which specifies the names of elements and their hierarchy, as well as attribute names and where they are allowed. They describe the actual data with <xs:simpleType>, similar to Visual DataFlex data types such as Date, Number, String etc. When you combine these two, you can express rules such as "<zip> must be a child of <address>, and the content of <zip> must be a number."
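Python's standard library cannot validate against an XML Schema. A validating parser, such as lxml's XMLSchema class, would normally do that job. The rule quoted above is simple enough to check by hand, though. The sketch below assumes un-namespaced element names and a hypothetical <customer> wrapper element. It only illustrates what combining a structural rule (complexType) with a data rule (simpleType) means in practice.

```python
import re
import xml.etree.ElementTree as ET

def check_zip_rule(xml_text: str) -> list:
    """Hand-coded check of the example rule: <zip> may only appear as a child of
    <address>, and its content must be a number (here: one or more digits 0-9)."""
    errors = []
    root = ET.fromstring(xml_text)
    if root.tag == "zip":
        errors.append("<zip> cannot be the document element")
    # ElementTree has no parent pointers, so look at every parent/child pair.
    for parent in root.iter():
        for child in parent:
            if child.tag != "zip":
                continue
            if parent.tag != "address":
                errors.append(f"<zip> found inside <{parent.tag}>")
            text = (child.text or "").strip()
            if not re.fullmatch(r"[0-9]+", text):
                errors.append(f"<zip> content {text!r} is not a number")
    return errors

good = "<customer><address><zip>12345</zip></address></customer>"
bad = "<customer><zip>ABC</zip></customer>"
print(check_zip_rule(good))
print(check_zip_rule(bad))
```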
There are a bunch of built-in schema types, such as <xs:string>. Schema <xs:simpleType> definitions are typically derived from another type in a hierarchy. This is somewhat backwards compared to OOP class inheritance: instead of augmenting and adding to the base type, a derived type typically restricts it further. Put another way, the valid data for a particular type is usually a subset of the valid data of its base type.
The root of the hierarchy is <xs:anyType>, which has no constraints other than that the content must be well-formed XML. Basically, every well-formed XML document in the world conforms to <xs:anyType>. The built-in type <xs:decimal>, for example, is restricted to numeric values. It allows digits, a decimal point and a leading + or - sign, but only in the form and order of a valid number. <xs:integer> is then further restricted to whole numbers only. In this way, each derived type narrows the data allowed by the type it is based on.
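Derivation by restriction can be sketched as a chain of checks: each derived type first applies its base type's check, then adds its own constraint. The regular expressions below are simplified approximations of the lexical rules for <xs:decimal> and <xs:integer>, not the full definitions from the specification. For instance, xs:decimal has no exponent notation, so "1e3" is rejected.

```python
import re

# Simplified lexical forms; the real spec also covers whitespace handling and other facets.
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
INTEGER_PATTERN = re.compile(r"[+-]?[0-9]+")

def is_any(value: str) -> bool:
    """Stand-in for the unrestricted base type: any string is accepted."""
    return True

def is_decimal(value: str) -> bool:
    # A decimal must first satisfy its base type, then its own pattern.
    return is_any(value) and DECIMAL.fullmatch(value) is not None

def is_integer(value: str) -> bool:
    # An integer is a decimal with no fractional part: a further restriction.
    return is_decimal(value) and INTEGER_PATTERN.fullmatch(value) is not None

samples = ["42", "-7", "3.14", "+.5", "1e3", "abc"]
for s in samples:
    print(s, "decimal:", is_decimal(s), "integer:", is_integer(s))
```

Every value accepted by is_integer is also accepted by is_decimal, but not the other way around. That mirrors the "subset of the base type" relationship described above.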
In the same way as you can restrict numeric data, you can derive from <xs:string> and restrict the valid data to a specific set of strings. The above example defines a new simple type based on <xs:string> and restricts the valid data to a specified set of strings, referred to as an enumeration.
Again, don't be confused by the enumeration values, which in this case are VDF data type names. If it helps, pretend they are "car", "boat" etc. instead of the VDF type names "integer" and so on. I carefully considered whether to use this real schema example or just make one up. I decided a real example is better than a made-up one, even if some of the names could be confusing. As you might have guessed, the example schema is used in Visual DataFlex for web services.
This simpleType definition can then be used with both attributes and elements to define the data represented.
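The sketch below defines such an enumeration with the "car", "boat" style values suggested above, not the real VDF list. The type name vehicleType and the <trip> element with its vehicle attribute are hypothetical. The Python part reads the allowed values out of the schema with ElementTree and checks an attribute value against them. Enumeration values are matched exactly, so "Car" is not the same as "car".

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment: a string type restricted to an enumeration.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="vehicleType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="car"/>
      <xs:enumeration value="boat"/>
      <xs:enumeration value="bicycle"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

def enumeration_values(schema_text: str, type_name: str) -> set:
    """Return the allowed values of a named simpleType defined by enumeration facets."""
    root = ET.fromstring(schema_text)
    for simple in root.iter(XS + "simpleType"):
        if simple.get("name") == type_name:
            return {e.get("value") for e in simple.iter(XS + "enumeration")}
    raise KeyError(type_name)

allowed = enumeration_values(SCHEMA, "vehicleType")
# The type could be used for an attribute, e.g. a hypothetical vehicle attribute on <trip>.
trip = ET.fromstring('<trip vehicle="boat"/>')
print(trip.get("vehicle") in allowed)
print("Car" in allowed, "plane" in allowed)
```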
Whereas a simpleType definition describes a data value, a complexType definition describes the structure of elements and attributes. A complexType is most commonly defined by a set of elements in a fixed order, which is done with <xs:sequence>. You can also use <xs:all>, which specifies a set of elements in any order, or <xs:choice>, which allows one of several alternatives. There are other, more advanced techniques for defining a complexType as well, using groups etc.
The most important attributes of <xs:element> are name (which defines the element name), minOccurs, maxOccurs and type. minOccurs and maxOccurs specify how many instances of the element are allowed. Both default to 1, and maxOccurs="unbounded" allows any number. The type attribute refers to a simpleType or complexType schema definition, and thereby associates a type with the element name. Typically the type referred to is either a built-in type or a type defined within the same schema, but it can also refer to types in external schemas.
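The sketch below makes minOccurs and maxOccurs concrete. It checks a list of child element names against a content model, once as an <xs:sequence> and once as an <xs:all>. The address/street/city/zip models are hypothetical. The checker assumes every element name appears in only one particle, which keeps greedy matching correct. It is not a general schema validator.

```python
from typing import List, Optional, Tuple

# (element name, minOccurs, maxOccurs); None stands for maxOccurs="unbounded".
Particle = Tuple[str, int, Optional[int]]

def matches_sequence(children: List[str], particles: List[Particle]) -> bool:
    """xs:sequence: the elements must appear in the declared order.
    Greedy matching is enough because each name appears in only one particle."""
    pos = 0
    for name, min_occurs, max_occurs in particles:
        count = 0
        while (pos < len(children) and children[pos] == name
               and (max_occurs is None or count < max_occurs)):
            pos += 1
            count += 1
        if count < min_occurs:
            return False
    return pos == len(children)  # no unexpected elements left over

def matches_all(children: List[str], particles: List[Particle]) -> bool:
    """xs:all: the same elements in any order (XSD 1.0 allows each at most once)."""
    known = {name for name, _, _ in particles}
    if any(child not in known for child in children):
        return False
    for name, min_occurs, max_occurs in particles:
        count = children.count(name)
        if count < min_occurs or (max_occurs is not None and count > max_occurs):
            return False
    return True

# Hypothetical address content models.
ADDRESS_SEQ = [("street", 1, 2), ("city", 1, 1), ("zip", 0, 1)]
ADDRESS_ALL = [("street", 1, 1), ("city", 1, 1), ("zip", 0, 1)]

print(matches_sequence(["street", "street", "city"], ADDRESS_SEQ))  # zip is optional
print(matches_sequence(["city", "street"], ADDRESS_SEQ))           # wrong order
print(matches_all(["city", "street"], ADDRESS_ALL))                # order does not matter
```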
Attributes are of course defined with <xs:attribute>, and the most important attributes besides name are type and use. The type attribute refers to a simpleType schema definition, similarly to type with <xs:element>, and specifies what kind of values are allowed for the attribute. The use attribute specifies whether the attribute must be present or not. It can be optional (the default), prohibited or required.
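The sketch below illustrates the three use values by checking an element's attributes against hypothetical declarations. The attribute names and the <item> element are made up. As a simplification, it also reports attributes that are not declared at all. A real schema would reject those too, unless it allows them with xs:anyAttribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarations: attribute name -> value of its use attribute.
DECLARED = {"name": "required", "title": "optional", "legacy": "prohibited"}

def check_attributes(present, declared):
    """Return a list of problems; an empty list means the attributes are acceptable."""
    problems = []
    for name, use in declared.items():
        if use == "required" and name not in present:
            problems.append(f"missing required attribute {name!r}")
        elif use == "prohibited" and name in present:
            problems.append(f"prohibited attribute {name!r} is present")
    # Simplification: anything not declared is reported (no xs:anyAttribute support).
    for name in present:
        if name not in declared:
            problems.append(f"undeclared attribute {name!r}")
    return problems

item = ET.fromstring('<item title="Demo" legacy="yes"/>')
print(check_attributes(item.attrib, DECLARED))
print(check_attributes({"name": "x"}, DECLARED))
```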
Top Level Element
Finally, the schema typically also defines the top-level elements allowed, along with their types and structure. Often only one top-level element is allowed, which contains the entire document.
As can be seen in this example, complexType and simpleType definitions can also be anonymous types defined inline within an <xs:element> for example. This is typically used when the type is only used by one element in one place.
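The sketch below shows the difference between a named type referenced by the type attribute and an anonymous inline type. The schema fragment is entirely hypothetical: namespace urn:example:orders, elements order and line, type lineType and attribute sku are made up. The function lists each top-level element declaration and how its type is given. A top-level element declaration with neither a type attribute nor an inline type defaults to <xs:anyType>.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:tns="urn:example:orders" targetNamespace="urn:example:orders"
           elementFormDefault="qualified">
  <xs:complexType name="lineType">
    <xs:attribute name="sku" type="xs:string" use="required"/>
  </xs:complexType>
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="line" type="tns:lineType" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

def describe_top_level_elements(schema_text):
    """Map each top-level element name to how its type is specified."""
    root = ET.fromstring(schema_text)
    result = {}
    for elem in root.findall(XS + "element"):  # direct children only = top-level declarations
        name = elem.get("name")
        if elem.get("type"):
            result[name] = "named type " + elem.get("type")
        elif elem.find(XS + "complexType") is not None or elem.find(XS + "simpleType") is not None:
            result[name] = "anonymous inline type"
        else:
            result[name] = "no type given (defaults to xs:anyType)"
    return result

print(describe_top_level_elements(SCHEMA))
```

Here the anonymous complexType belongs to <order> alone. The named lineType could be reused by any number of element declarations.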
Reading the Schema
When reading a schema you typically start with one of the top level elements defined, and then follow type references.
In order to conform to this example schema there must be an element <webServiceObject> with the specified namespace http://www.dataaccess.com/schemas/WebServiceMetaData. This element contains a sequence of <documentation>, <types> and <operation> child elements in that order. It also has the name, title, namespace and soapBodyStyle attributes, which are all required.
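The reading above can be turned into a quick structural check of an instance document. This sketch only encodes what the description states: the root element and its namespace, the three children in order (simplified here to exactly one of each), and the four required attributes. The attribute values and empty child elements are placeholders. They are not checked against the types the real schema assigns to them.

```python
import xml.etree.ElementTree as ET

NS = "http://www.dataaccess.com/schemas/WebServiceMetaData"
EXPECTED_CHILDREN = ["documentation", "types", "operation"]
REQUIRED_ATTRS = ["name", "title", "namespace", "soapBodyStyle"]

# Placeholder instance: attribute values and empty children are made up.
DOC = f"""<webServiceObject xmlns="{NS}" name="Demo" title="Demo service"
    namespace="urn:example:demo" soapBodyStyle="placeholder">
  <documentation/>
  <types/>
  <operation/>
</webServiceObject>"""

def check_web_service_object(xml_text):
    """Return a list of structural problems (empty if none were found)."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{NS}}}webServiceObject":
        problems.append(f"unexpected root element {root.tag}")
    # elementFormDefault="qualified": the children must be in the same namespace.
    children = [child.tag for child in root]
    if children != [f"{{{NS}}}{name}" for name in EXPECTED_CHILDREN]:
        problems.append(f"unexpected children {children}")
    problems += [f"missing attribute {a}" for a in REQUIRED_ATTRS if a not in root.attrib]
    return problems

print(check_web_service_object(DOC))
print(check_web_service_object("<webServiceObject/>"))
```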
If we look at the <types> element in detail, we find that it contains an unbounded number of <struct> child elements. Each is of type struct, a named complexType defined earlier in the schema. You can then follow the type references, elements and attributes in that manner.
Notice that type references use the qualified name of the type (e.g. type="tns:struct"). Types defined in the schema itself use the tns prefix, which refers to the schema's own target namespace. Built-in types use the XML Schema prefix (e.g. xs:string). Similarly, you can refer to types in other namespaces defined in other schemas. Type references are probably the most important thing to understand when reading and interpreting a schema.
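ElementTree resolves prefixes in element names but leaves attribute values such as type="tns:struct" untouched. A tool reading a schema therefore has to resolve those QNames itself, using the namespace declarations in scope. The sketch below shows that step on a hypothetical fragment modelled on the example. The typesList and note names are made up, and the content of the struct type is omitted.

```python
import io
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:tns="http://www.dataaccess.com/schemas/WebServiceMetaData"
           targetNamespace="http://www.dataaccess.com/schemas/WebServiceMetaData"
           elementFormDefault="qualified">
  <xs:complexType name="struct"><!-- content omitted --></xs:complexType>
  <xs:complexType name="typesList">
    <xs:sequence>
      <xs:element name="struct" type="tns:struct" maxOccurs="unbounded"/>
      <xs:element name="note" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

def prefix_map(xml_text):
    """prefix -> URI for every xmlns declaration.
    Simplification: assumes no prefix is re-bound to another URI in a nested scope."""
    events = ET.iterparse(io.BytesIO(xml_text.encode("utf-8")), events=("start-ns",))
    return {prefix: uri for _event, (prefix, uri) in events}

def resolve_qname(qname, prefixes):
    """Turn a QName such as 'tns:struct' into '{uri}struct'."""
    prefix, _, local = qname.rpartition(":")
    if prefix:
        return f"{{{prefixes[prefix]}}}{local}"  # KeyError here means an undeclared prefix
    default = prefixes.get("")
    return f"{{{default}}}{local}" if default else local

def classify_type_references(schema_text):
    prefixes = prefix_map(schema_text)
    root = ET.fromstring(schema_text)
    target = root.get("targetNamespace")
    type_tags = (f"{{{XS_NS}}}complexType", f"{{{XS_NS}}}simpleType")
    # Named top-level type definitions live in the schema's target namespace.
    local_types = {f"{{{target}}}{d.get('name')}" for d in root if d.tag in type_tags}
    result = {}
    for decl in root.iter(f"{{{XS_NS}}}element"):
        qname = decl.get("type")
        if qname is None:
            continue  # anonymous inline types have no reference to resolve
        resolved = resolve_qname(qname, prefixes)
        if resolved.startswith(f"{{{XS_NS}}}"):
            kind = "built-in XML Schema type"
        elif resolved in local_types:
            kind = "type defined in this schema"
        else:
            kind = "type from another schema"
        result[decl.get("name")] = (resolved, kind)
    return result

for name, (resolved, kind) in classify_type_references(SCHEMA).items():
    print(name, "->", resolved, f"({kind})")
```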
This is in no way a complete description of XML Schema. Hopefully it is enough of an introduction that, even if you previously didn't know anything about schemas, you can now look at a schema and get a rough idea of what it describes.
The XML Schema specification is the definitive source of information about XML schemas. It can be difficult to read, but it's where you go when you need answers. It's divided into three separate sections as follows:
XML Schema Primer: http://www.w3.org/TR/xmlschema-0/
XML Schema Structures: http://www.w3.org/TR/xmlschema-1/
XML Schema Datatypes: http://www.w3.org/TR/xmlschema-2/