University of Oregon
Computer & Information Science

CIS 410/510 - Domain-Specific Languages
XML and XSLT
Fall 2000

XML is relevant to domain-specific languages in two distinct ways: as an example of a domain-specific language for representing structured data, and as a representation with associated tools and methods for supporting domain-specific languages. It is like Yacc in both respects. Yacc source files describe concrete syntax in the form of a context-free grammar, while XML document type descriptions describe abstract syntax. With Yacc (or Bison, or JavaCC, or CUP, or ...) you can fairly easily describe a concrete syntax that is customized to your domain, but building an internal representation (abstract syntax graph, or whatever) is left to you. With XML, you describe the internal representation, which is equivalent to an abstract syntax tree, and you get a kind of concrete syntax (tagged text) that looks more like the debugging dump of a tree than like the kind of concrete syntax you might have described with Yacc.

XML as a DSL

From the perspective of DSL design, the history of XML is at least as interesting as its current form. SGML was designed for exchange of structured documents among organizations. The mantra of SGML is separation of content from presentation, the idea being that each organization would run a tagged document through its own tools to obtain the kind of presentation (printed, online, etc.) that it wanted. Key to this was the ability to define document types with semantically meaningful tags. Interoperability was achieved to the extent that several organizations could agree on a set of tags appropriate to their domain. DocBook, for technical manuals, is a well-known example of a standardized SGML document design.

HTML kept the tagged concrete structure of SGML, but for the sake of simplicity and performance it imposed a simple, fixed set of tags and took away the ability to define new sets of tags. XML gives back that ability, and is a direct descendant of SGML (and not really a direct descendant of HTML), but with lots of house-cleaning to remove the SGML features that made it difficult to process. Interestingly, XML is not easier for humans to process --- it takes away SGML features that made human reading and writing easier, in order to make machine reading and writing easier. The most obvious example is removing the optional end-tag capability (like the optional </p> in HTML). Requiring end-tags makes it possible to unambiguously parse documents that lack DTDs, but makes documents more verbose.
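To make the end-tag point concrete, here is a small sketch using Python's standard xml.dom.minidom parser. Neither sample document has a DTD. The sample strings and the helper name is_well_formed are made up for illustration. The first document closes every paragraph and parses; the second omits </p> in the SGML/HTML manner, and a generic XML parser has no way to guess where the paragraphs end, so it rejects the document.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if text parses as XML without any DTD (hypothetical helper)."""
    try:
        xml.dom.minidom.parseString(text)
        return True
    except ExpatError:
        # minidom's expat back end raises ExpatError on mismatched tags
        return False


# Every <p> is explicitly closed: verbose, but unambiguous without a DTD.
with_end_tags = "<doc><p>one</p><p>two</p></doc>"

# SGML/HTML style with the end-tags left out: not well-formed XML.
without_end_tags = "<doc><p>one<p>two</doc>"

print(is_well_formed(with_end_tags))
print(is_well_formed(without_end_tags))
```

An SGML parser could accept the second form only because the DTD tells it that one paragraph cannot nest inside another.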

XML as a DSL Construction Tool

When organizations agree on a standard XML document type definition for data exchange, they are essentially designing a language for representing data within a particular domain. Usually it is appropriate to focus on abstract syntax for designing data exchange languages, just as one would concentrate on database schemata and not on the particular displays that would be generated from a database. In this sense, XML is a (representational) tool for designing domain-specific languages.

Part of the leverage of using XML as the concrete representation of an abstract syntax for a given domain is that we can re-use any applicable tools that operate on generic XML, independent of document type. Nothing new here: We're just using "layering" of representations. We do the same thing when we use the Unix tool "wc" on both technical manuals and program source code, exploiting the fact that both are represented as Unix text files.
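As a rough illustration of the "wc" analogy, the sketch below counts element tags in any XML document, with no knowledge of the document type. It uses Python's standard xml.dom.minidom; the two sample documents and the function name count_elements are invented for this example.

```python
from collections import Counter
import xml.dom.minidom


def count_elements(xml_text):
    """Count how often each element tag occurs, whatever the document type."""
    doc = xml.dom.minidom.parseString(xml_text)
    # "*" matches every element in the document, including the root.
    return Counter(el.tagName for el in doc.getElementsByTagName("*"))


# Two unrelated (made-up) document types, handled by the same tool.
manual = "<manual><chapter><para>a</para><para>b</para></chapter></manual>"
program = "<program><function><stmt/><stmt/><stmt/></function></program>"

print(count_elements(manual))
print(count_elements(program))
```

The tool works at the XML layer only, just as "wc" works at the text-file layer.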

The main tools that we can leverage by using XML as a representation framework are:

XSLT: Deja vu all over again

In principle, the DOM API gives you everything you need to manipulate and transform XML trees. As we've discussed over and over again, a DSL is often an alternative to an API, with some advantages and some disadvantages. We can write Java or C++ code to transform XML trees using the DOM interface, or we can write tree transformation code in XSLT.

One of the advantages that a DSL often has over an API is a customized concrete syntax. XSLT doesn't quite fit the mold here: The concrete syntax of XSLT is XML (tagged text), and it isn't pretty. (There is at least one attempt to define a more concise "surface language" for XSL, including XSLT. It is called xslscript.) Some other advantages of DSLs over APIs typically result from interpretation and late binding, and in this regard XSLT is typical.

XSLT is particularly well-suited to systematically walking over an XML tree and producing an HTML document, a formatted document using the other part of XSL, or even a text document whose content is derived from the XML document. You could do the same with the DOM interface, but the amount of Java or C or C++ code you would write would be larger, and the code/test/debug cycle would probably be a lot slower.
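For comparison, here is roughly what the API route looks like for a very small job: turning a list of recipes into an HTML bulleted list. This is only a sketch, written in Python with the standard xml.dom.minidom module rather than Java or C++; the recipes document and the names text_of and recipes_to_html are hypothetical.

```python
import xml.dom.minidom
from xml.sax.saxutils import escape


def text_of(element):
    """Concatenate the text nodes that are direct children of element."""
    return "".join(n.data for n in element.childNodes
                   if n.nodeType == n.TEXT_NODE)


def recipes_to_html(xml_text):
    """Walk the DOM tree by hand and emit an HTML list of recipe titles."""
    doc = xml.dom.minidom.parseString(xml_text)
    items = []
    for recipe in doc.getElementsByTagName("recipe"):
        title = recipe.getElementsByTagName("title")[0]
        # Re-escape markup characters, since we are building HTML as text.
        items.append("<li>" + escape(text_of(title)) + "</li>")
    return "<ul>" + "".join(items) + "</ul>"


SAMPLE = ("<recipes>"
          "<recipe><title>Tea</title></recipe>"
          "<recipe><title>Toast &amp; jam</title></recipe>"
          "</recipes>")

print(recipes_to_html(SAMPLE))
```

In XSLT the same job is two short templates: one matching recipes that emits ul and applies templates to its children, and one matching recipe that emits li containing xsl:value-of with select="title". The traversal, text extraction, and escaping that are spelled out by hand above come for free.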

Resources

XML Pocket Reference by Robert Eckstein (O'Reilly, 1999) is a good, compact reference. XSLT was changing as that book went to press, so it isn't completely up to date in that respect.

Transforming XML with XSLT, Chapter 7 of Building Oracle XML Applications by Steve Muench (O'Reilly, 2000) is very helpful, and it's available as a sample chapter on-line. In fact, this is so good I've decided to forgive O'Reilly for the dreadful UML in a Nutshell, which had put me off the whole Nutshell series. I don't have any particular interest in Oracle databases, but I may buy this book just for the lucid presentation of XSLT programming. (Thanks to Michael Richard for finding this.)

XSL Transformations, Chapter 14 of the XML Bible, also looks useful, though I haven't read much of it yet. (Thanks to Ze6ke* Wander for finding this.)

Another tutorial on using XSL is at http://nwalsh.com/docs/tutorials/xsl/ (I think this is the one I printed and handed out.)

Look in /cs/classes/cis510dsl/xml for XT (an XSL processor). xt.sh is a driver script.

I have also downloaded xslscript into the xml subdirectory; this is supposedly a human-readable surface syntax for XSL transforms. I have not tested it yet --- if you do, please drop me a note with your observations.

The xml/packages subdirectory of /cs/classes/cis510dsl contains zipped archives of xt and xslscript to make it a little easier for you to install them on a different machine or in a different location.

Your Assignment

Here's what I have in mind ... subject to discussion and negotiation of course.

GXL is an emerging standard XML representation for directed, attributed graphs. The "dot" representation has been used for many years for roughly the same purposes that GXL is likely to be used for. There are tools that produce printable and editable diagrams from the "dot" representation (in fact, "dot" is really the name of the primary graph layout program, and the file format is just named after the tool). Can we transform GXL graphs into dot graphs using XSLT? It sounds relatively straightforward, but I don't really know yet. Shall we try?

[Here] is my latest cut at a conversion script --- it has grown quite a bit since the version I walked through in class, and it has been adjusted to some, but not all, of the recent changes to GXL. Can you improve it?

A particularly cool extension would be to let users supply a few parameters to determine how the graph is drawn, e.g., mapping the "type" attribute of nodes to various shapes and choosing which attributes are used as node and edge labels. There are two approaches one might use, which are not exclusive. One would be to structure the transformations so that it is relatively easy to write a new style sheet that imports this one but overrides a few pieces. (This has the feel of inheritance in object-oriented programming.) The other would be to use template parameters and variables as supported by XSLT.
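To pin down what the parameterized output might look like, here is a sketch of the GXL-to-dot mapping written against the DOM in Python (standard xml.dom.minidom). It is not the XSLT solution and not my conversion script, just a reference point for checking a style sheet's output. Everything about the input is an assumption: I assume node and edge elements inside a graph element, edges with from and to attributes, the node type as an XML attribute named type, and other attributes as attr elements wrapping a string element. GXL is still changing, so the real layout may differ. The names gxl_attr, gxl_to_dot, shape_by_type, and label_attr are hypothetical and play the role that template parameters would play in XSLT.

```python
import xml.dom.minidom


def gxl_attr(element, name):
    """Return the text inside a direct <attr name="..."> child, or None.

    Assumes the value is wrapped in one element such as <string>.
    """
    for child in element.childNodes:
        if (child.nodeType == child.ELEMENT_NODE
                and child.tagName == "attr"
                and child.getAttribute("name") == name):
            parts = []
            for value in child.childNodes:
                if value.nodeType == value.ELEMENT_NODE:
                    parts.extend(t.data for t in value.childNodes
                                 if t.nodeType == t.TEXT_NODE)
            return "".join(parts).strip()
    return None


def gxl_to_dot(gxl_text, shape_by_type=None, label_attr=None):
    """Convert the first <graph> in gxl_text to dot text.

    shape_by_type maps node types to dot shapes; label_attr names the
    GXL attribute to use as the node label. Both are optional.
    Labels are not escaped, so this sketch assumes they contain no quotes.
    """
    shape_by_type = shape_by_type or {}
    doc = xml.dom.minidom.parseString(gxl_text)
    graph = doc.getElementsByTagName("graph")[0]
    lines = ["digraph %s {" % graph.getAttribute("id")]
    for node in graph.getElementsByTagName("node"):
        options = []
        shape = shape_by_type.get(node.getAttribute("type"))
        if shape:
            options.append("shape=%s" % shape)
        label = gxl_attr(node, label_attr) if label_attr else None
        if label:
            options.append('label="%s"' % label)
        suffix = " [%s]" % ", ".join(options) if options else ""
        lines.append("  %s%s;" % (node.getAttribute("id"), suffix))
    for edge in graph.getElementsByTagName("edge"):
        lines.append("  %s -> %s;" % (edge.getAttribute("from"),
                                      edge.getAttribute("to")))
    lines.append("}")
    return "\n".join(lines)


SAMPLE_GXL = """<gxl>
  <graph id="g">
    <node id="n1" type="process">
      <attr name="name"><string>parse</string></attr>
    </node>
    <node id="n2" type="file"/>
    <edge from="n1" to="n2"/>
  </graph>
</gxl>"""

print(gxl_to_dot(SAMPLE_GXL, shape_by_type={"process": "box"},
                 label_attr="name"))
```

Called with no parameters, the function falls back to bare node names and default shapes, which is the behavior an importing style sheet would override.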

Simpler extensions are acceptable ... have fun with this and don't let it take too much time away from your work on the email transaction processing DSL.

*In case you're a non-local reading this, "Ze6ke" is not a typo. The 6 is silent.


Last change Fri, Nov 3, 2000 by Michal Young