Serving SOAP through XSLT

Arjen Baart <arjen@andromeda.nl>

April 15, 2003

Abstract:
Web services involve one machine calling a function on another machine through the internet. Although there is nothing new about the idea, there is now a standard for exchanging the data involved. That standard is called SOAP: Simple Object Access Protocol. Software that implements the SOAP standard, both on the client and on the server side, is now emerging, with the objective of making it easy for programmers to create SOAP applications. Yet it is not hard to write your own code that implements a SOAP application: by using some readily available open source tools, it is actually quite simple to make a SOAP server. In this paper, I will show a SOAP server that creates an actual function call in an interpreted language by means of an XSLT transformation. The SOAP server is written in PHP, because PHP is readily available in the Apache web server, but any other interpreted language, like Perl, may be used as well. The remote procedure call that arrives as a SOAP message from the client is transformed into a piece of PHP code by an XSLT style sheet. The generated code is subsequently executed by the PHP interpreter.

1 Introduction

The concept of network services has been around for a few decades. Ever since Sun came up with its implementation of RPC (Remote Procedure Call), one machine has been able to call a function on another machine across a network. Although the idea is simple enough, using a remote procedure or serving one across the internet has never been a trivial job. Especially when different platforms or programming languages are involved, having one machine call a function on another machine invariably creates more problems than it is worth. The situation may have improved a bit with Java and CORBA, but neither seems to provide the ultimate solution.

The latest step in the right direction is a standard based on XML which allows different machines to exchange information in a platform independent manner. That standard is called SOAP: Simple Object Access Protocol. To call a function on another computer, a programmer puts the name of the function, along with its parameters, into a prescribed XML message and sends that message to the destination. The destination must of course understand the message and must be prepared to execute the requested function. If all works out, the programmer receives a reply, also in an XML message, from the remotely executed function. Any protocol can be used to deliver SOAP messages, but the most popular are HTTP and SMTP.

The web service server, i.e. the computer that performs its little trick on behalf of other computers on the internet, needs to dissect the SOAP message to determine which function to execute and what the parameters are. Once this information is extracted, the function can be called with the appropriate parameters. The result of this operation is packed into a SOAP message that is returned as a reply to the calling client. As a look at, for example, http://www.xmethods.net/ shows, a long list of web services is available and the list is growing at a steady pace.

As stated before, SOAP is a standard that uses XML as its base syntax. The next section will therefore briefly explain the basic syntactical elements of XML. And, since we're talking about XML anyway, we will also discuss XSLT at an elementary level. XSLT (eXtensible Stylesheet Language Transformations) is another XML-based standard that this paper relies on. We get to the heart of the matter in section 3, in which I will briefly explain what a SOAP message looks like. In section 4 we will shift our attention to actually creating web services, and I will discuss a few of the existing web service programming environments, particularly those from the open source community. Apart from those existing projects, a different way to provide a web service is presented in section 5, where we go into the details of building a SOAP based web service by generating a piece of script that is subsequently executed. And, of course, any respectable paper needs a few conclusions and recommendations for future work, so I reserved the final section for those.

2 XML and XSLT

The syntax of XML looks a bit like that of HTML, although the syntactical rules are stricter. An XML document is a collection of self-describing data, organized in a tree structure. The tree is built from elements, which may contain other elements, attributes and textual content. An element is any piece of the document that is enclosed in corresponding tags. Just like in HTML, there is an opening tag and a closing tag to accompany it. Unlike HTML, every element must be closed, either with a closing tag or, for an empty element, with the shorthand form <empty/>. Here is a small example:

   <?xml version='1.0'?>
   <root>
      <element attribute='example'>
         Some content...
      </element>
      <element>
         Content of the second element.
      </element>
   </root>
As you can see from the example above, an XML document usually starts with the XML declaration <?xml ...?> (it is optional, but recommended), and an XML document has exactly one root element. Elements can also have attributes, which are put inside the opening tag; the example clearly shows the syntax of an attribute. The value of an attribute must always be present and must be enclosed in single or double quotes.
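To make the tree structure concrete, here is a small sketch that loads the example document with PHP's DOM extension and lists the children of the root element. It assumes PHP 7 or later with the DOM extension enabled; the function name list_elements() is hypothetical. Note that the whitespace between elements forms text nodes of its own, which the sketch skips.

```php
<?php
// List the child elements of the root, with their 'attribute' value and
// trimmed text content. Assumes PHP 7+ with the DOM extension enabled.

function list_elements(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('Document is not well-formed XML');
    }

    $result = [];
    foreach ($doc->documentElement->childNodes as $node) {
        // Whitespace between elements is a text node of its own: skip it.
        if ($node->nodeType !== XML_ELEMENT_NODE) {
            continue;
        }
        $result[] = [
            'name'      => $node->nodeName,
            'attribute' => $node->getAttribute('attribute'), // '' if absent
            'content'   => trim($node->textContent),
        ];
    }
    return $result;
}

$xml = <<<XML
<?xml version='1.0'?>
<root>
   <element attribute='example'>
      Some content...
   </element>
   <element>
      Content of the second element.
   </element>
</root>
XML;

print_r(list_elements($xml));
```

The second entry has an empty attribute value, because getAttribute() returns an empty string when the attribute is absent.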

2.1 Namespaces in XML

The names of the tags in XML do not have any meaning until the designer of the XML application gives them a meaning. For example, in HTML, the <H1> tag has the meaning of 'level 1 heading' and is usually rendered in big bold letters. If you write the same <H1> tag in XML, it may very well denote the chemical formula for a single hydrogen atom, or whatever is relevant to your application. In XML, everyone can create their own tags, which leads to problems when XML applications from different designers are combined: chances are that tags with the same name but entirely different semantics will be used, leading to a clash of names.

To prevent name clashes when XML documents from different sources are combined, the W3C invented the concept of namespaces. Each tag and attribute name can be preceded by a namespace prefix, separated from the name by a colon (:). The prefix must be bound to a globally unique namespace name, which is a URI, by using the reserved attribute xmlns:prefix. The XML example below shows the previous example, extended with namespace declarations:

   <?xml version='1.0'?>
   <root xmlns:first='http://www.andromeda.nl/'
            xmlns:second='http://some.unique.url/'>
      <first:element first:attribute='example'>
         Some content...
      </first:element>
      <second:element>
         Content of the second element.
      </second:element>
   </root>
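By using namespaces, the uniqueness of the tag and attribute names is guaranteed: an XML processor identifies an element or attribute by its namespace name together with its local name, not by its prefix, so the two elements named element above remain distinct.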
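The following sketch, again assuming PHP 7 or later with the DOM extension, shows how a processor tells the two elements apart: getElementsByTagNameNS() selects on the namespace URI and the local name, whatever prefix the document happens to use. The helper elements_by_namespace() is hypothetical.

```php
<?php
// The two elements share the local name "element" but not the namespace.
// getElementsByTagNameNS() matches on (namespace URI, local name), so the
// prefix used in the document does not matter. PHP 7+ with DOM assumed.

function elements_by_namespace(string $xml, string $ns): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('Document is not well-formed XML');
    }

    $texts = [];
    foreach ($doc->getElementsByTagNameNS($ns, 'element') as $el) {
        $texts[] = trim($el->textContent);
    }
    return $texts;
}

$xml = <<<XML
<?xml version='1.0'?>
<root xmlns:first='http://www.andromeda.nl/'
      xmlns:second='http://some.unique.url/'>
   <first:element first:attribute='example'>
      Some content...
   </first:element>
   <second:element>
      Content of the second element.
   </second:element>
</root>
XML;

print_r(elements_by_namespace($xml, 'http://www.andromeda.nl/'));
print_r(elements_by_namespace($xml, 'http://some.unique.url/'));
```

Replacing the prefix first by any other prefix, while keeping the URI, selects the same element.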

2.2 Transformation style sheets

An XML document can be turned into just about anything by running it through an XSLT (eXtensible Stylesheet Language Transformations) processor. An XSLT processor takes two inputs: the XML document to transform and the style sheet that specifies the transformation. The output will be whatever the style sheet defines. In the style sheet, patterns (actually XPath expressions) are used to match specific parts of the source XML document and replace each matched part with something else in the output. The following example shows how you might replace each element named element from the first example with the text Replaced text:

   <xsl:template match='element'>
      Replaced text
   </xsl:template>
The style sheet declares what output should be produced when a pattern in the XML document is matched. The XSLT transformation process is rule based, with the templates being the basis of the rule system. In that sense, XSLT is a bit like awk, in which the input is matched against a set of rules and the actions occur when a rule 'fires'. Inside an xsl:template element, several constructs are available to control the transformation process. For example, the xsl:apply-templates element instructs the XSLT processor to process the child nodes of the matched node, applying the matching template to each of them. Suppose you have a simple XML document with book as the root element and heading elements to separate your chapters. The following XSLT templates will transform that structure into HTML:
   <xsl:stylesheet version="1.0"
         xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:template match="book">
      <html>
         <head></head>
         <body>
            <xsl:apply-templates/>
         </body>
      </html>
   </xsl:template>

   <xsl:template match="heading">
      <h2><xsl:apply-templates/></h2>
   </xsl:template>

   </xsl:stylesheet>
There is a lot more to tell about XSLT but that would go beyond the scope of this paper. For a complete coverage of XSLT, see [5].

3 SOAP

SOAP (Simple Object Access Protocol) is a communication protocol for exchanging information in a platform independent way. Most web services available on the internet work by exchanging messages in the SOAP format, usually through the HTTP protocol. A client application uses the POST method to send a SOAP message to the server application and receives a SOAP response message. The SOAP specification (see [6]) describes a SOAP message as having three elements. Figure 1 shows the structure of a SOAP message. The root element is the Envelope, which contains a Header and a Body element. The Header element is optional, while the Envelope and Body elements are mandatory. The SOAP elements must be in the namespace with the namespace name http://schemas.xmlsoap.org/soap/envelope/ (the SOAP 1.1 envelope namespace), as shown in the example below:

      <SOAP:Envelope
            xmlns:SOAP='http://schemas.xmlsoap.org/soap/envelope/'>
        <SOAP:Header>
        </SOAP:Header>
        <SOAP:Body>
           <!-- The payload of the message comes here -->
        </SOAP:Body>
      </SOAP:Envelope>
The Header element can be used to provide additional information about the message, such as authentication or transaction information. The part we are interested in is the Body element. For a remote procedure call, the Body contains an element named after the function or procedure the client wishes to call, and that element in turn contains the arguments passed to the function. Here is a simplified example of a SOAP message that requests the currency exchange rate from US dollars into euros:
   <SOAP:Envelope
         xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
      <SOAP:Body>
        <ns1:getRate xmlns:ns1="urn:xmethods-CurrencyExchange">
         <country1>usa</country1>
         <country2>euro</country2>
        </ns1:getRate>
      </SOAP:Body>
   </SOAP:Envelope>
This is a web service that is actually available on services.xmethods.net. The function we're trying to call here is:
   getRate("usa", "euro");
The two arguments to this function are country1 = "usa" and country2 = "euro". The name of the function is given by the immediate child element of the SOAP Body element: ns1:getRate. Note the namespace prefix, which is declared on that same element with the xmlns:ns1 attribute. The arguments to the function are the child elements of the function element: the name of each element is the name of the parameter, and the value of the parameter is the content of that element.
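The dissection described above can be sketched with PHP's DOM extension (PHP 7 or later assumed). parse_soap_call() is a hypothetical helper, not part of any SOAP library. It finds the Body by the SOAP 1.1 envelope namespace rather than by prefix, takes the first element child as the function and turns its child elements into a name => value array. It ignores SOAP encoding attributes such as xsi:type and treats every argument as a string.

```php
<?php
// Dissect an RPC-style SOAP 1.1 request: the first element child of Body
// names the function, its element children are the named arguments.
// parse_soap_call() is a hypothetical helper for illustration.

const SOAP_ENV_NS = 'http://schemas.xmlsoap.org/soap/envelope/';

function parse_soap_call(string $soap): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($soap)) {
        throw new InvalidArgumentException('SOAP message is not well-formed');
    }

    $bodies = $doc->getElementsByTagNameNS(SOAP_ENV_NS, 'Body');
    if ($bodies->length !== 1) {
        throw new InvalidArgumentException('Expected exactly one SOAP Body');
    }

    // The first element child of Body is the requested function.
    $call = null;
    foreach ($bodies->item(0)->childNodes as $node) {
        if ($node->nodeType === XML_ELEMENT_NODE) {
            $call = $node;
            break;
        }
    }
    if ($call === null) {
        throw new InvalidArgumentException('SOAP Body is empty');
    }

    // Each child element is one argument: local name => text content.
    $args = [];
    foreach ($call->childNodes as $node) {
        if ($node->nodeType === XML_ELEMENT_NODE) {
            $args[$node->localName] = $node->textContent;
        }
    }

    return [
        'namespace' => $call->namespaceURI,
        'function'  => $call->localName,
        'args'      => $args,
    ];
}

$soap = <<<XML
<SOAP:Envelope
      xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
   <SOAP:Body>
     <ns1:getRate xmlns:ns1="urn:xmethods-CurrencyExchange">
      <country1>usa</country1>
      <country2>euro</country2>
     </ns1:getRate>
   </SOAP:Body>
</SOAP:Envelope>
XML;

print_r(parse_soap_call($soap));
```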

4 SOAP in many languages

Several tools, mainly libraries of classes and functions, are available to create SOAP based client or server applications. Most of these SOAP development platforms provide support for Java and C++, although libraries are also available for other programming languages, such as Perl, Tcl, Python and PHP.

4.1 Web services in open source

The best-known SOAP project is probably DotGNU (http://dotgnu.org/). DotGNU is a Free Software project to create a platform for web services in a wide variety of programming languages. The core component of DotGNU is DGEE, the DotGNU Execution Environment, which provides the basic functionality for accepting web service requests. Portable.NET is another project under the DotGNU umbrella, which provides a suite of web services software tools.

Another prominent platform for web services is Apache AXIS, the successor to the Apache SOAP project. Both the client side and the server side of AXIS are written entirely in Java, and you will need a Java servlet container such as Tomcat to run a web service created with AXIS. Apache AXIS is available at http://ws.apache.org/axis/.

Running a quick search on freshmeat or SourceForge will reveal a few more projects related to SOAP.

5 Using XSLT to create a web service

From the previous sections, you may have gathered that SOAP messages are relatively simple pieces of XML text. Then, referring back to section 2, XML can be turned into any other kind of text by transforming the XML data through an XSLT style sheet. And since a program's source code is just another piece of text, it should be rather easy to transform the SOAP message into a program or a program fragment. As you will see in this section, this is actually the case. All you need to build a web service with this recipe are three packages of readily available open source software. Here are the ingredients:

  1. An XSLT processor (e.g. gnome-XSLT)
  2. An interpreted language (e.g. PHP)
  3. A web server (e.g. Apache)
Each of these is installed almost by default by most Linux distributions. Other UNIX vendors may offer them as optional packages, or you may build your own version from the source code. Figure 2 shows a diagram of our SOAP server that transforms the SOAP message into a PHP function call.

The key ingredient here is the XSLT processor. Without an XSLT processor, we wouldn't be able to transform the SOAP message into anything, let alone a program's source code. The example presented here uses xsltproc, the processor packaged with the XSLT C library for Gnome (libxslt) from http://xmlsoft.org/, but any other XSLT processor, like Xalan from http://xml.apache.org/, will do just fine as well. The second ingredient is an interpreter for some programming language. Using an interpreted language allows us to generate a program (or program fragment) and execute it all in one go. Nearly all popular scripting languages are capable of doing this job, so we could use Perl, Python, Tcl or even an ordinary shell script. In this example, I use PHP because it integrates nicely into the Apache web server. Strictly speaking, the web server is not necessary, but it is the easiest way to make a web service available on the internet, and HTTP is the protocol most commonly used by web service clients.

5.1 Capturing the SOAP message

When a web services client wishes to invoke a function on a server, the client packs the name of the function and its parameters into a SOAP message. The client then sends the SOAP message to the server by using the POST method of the HTTP protocol. Fortunately, all the mucking about with the HTTP protocol is handled quite nicely by the web server and the PHP module. All the web service PHP script needs to do is access the data in the global PHP variable $HTTP_RAW_POST_DATA. This is where PHP stores the POSTed data if the data did not come from an HTML form (i.e. the data is not form-encoded). If you implement a web service as a program that is invoked through CGI, you read the SOAP message from standard input (stdin) instead.
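A note for current PHP versions: $HTTP_RAW_POST_DATA was deprecated in PHP 5.6 and removed in PHP 7.0, and the php://input stream gives access to the same raw request body. Below is a minimal sketch of the capture step with a hypothetical helper, read_soap_request(). The $source parameter is an assumption added so that the same code can read php://stdin in a CGI-style program, or an ordinary file when testing.

```php
<?php
// Read the raw body of a SOAP request. Under mod_php or PHP-FPM the POSTed
// data is available from php://input; a CGI-style program could pass
// 'php://stdin' instead. read_soap_request() is a hypothetical helper.

function read_soap_request(string $source = 'php://input'): string
{
    $body = file_get_contents($source);
    if ($body === false || $body === '') {
        throw new RuntimeException('No SOAP message in the request body');
    }
    return $body;
}

// Typical use at the top of the web service script:
//   $soap = read_soap_request();
```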

5.2 Transforming SOAP into a script

Now that we have our SOAP message in a string variable in our script, we need to transform the content of this string into a piece of script. There are three steps to accomplish this task:

  1. Create an XSLT style sheet that defines the transformation.
  2. Start an XSLT processor and feed it the SOAP message, together with that style sheet.
  3. Capture the output of the XSLT processor. This will be the script we want to execute.

The objective of the style sheet is to extract the name of the requested function and to construct a list of arguments to pass to that function. The name of the function is easy to find. It is the name of the element which is the immediate child node of the Body node in the SOAP message. So, we match a template on the Body's children (of which there is only one) and call another template to extract the list of arguments:

   <xsl:template match="SOAP-ENV:Body/*">
      <xsl:call-template name='method'/>
   </xsl:template>

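The prefix SOAP-ENV in the match pattern must be declared in the style sheet and bound to the envelope namespace http://schemas.xmlsoap.org/soap/envelope/. Because XSLT compares namespace names rather than prefixes, the template also matches messages that use a different prefix, such as SOAP in the examples of section 3. The called template outputs the name of the function and creates the list of arguments. Generating the argument list is a slightly more complicated operation: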

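   <xsl:template name='method'>
      <xsl:value-of select='local-name()'/>
      (
         <!-- '*' selects element children only; node() would also select
              the whitespace text between them and yield empty arguments -->
         <xsl:for-each select='*'>
            '<xsl:apply-templates/>'
            <xsl:if test='position()!=last()'>,</xsl:if>
         </xsl:for-each>
      );
   </xsl:template>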
Note that the context node of this template is the immediate child of the Body node. The name of this node is the name of the function requested by the client, and the template starts its output with this name through the xsl:value-of construct. The hard part in this template is to create a comma-separated list of the arguments' values. The values we want are in the textual content of the child elements. An xsl:for-each construct scans these children and outputs the content of each one between single quotes. Copying a node's text to the output is what xsl:apply-templates does by default, through XSLT's built-in templates, so we don't need a special template for that. The comma that separates each argument from the next one is created by the xsl:if statement on the next line, which prevents the transformation from emitting a comma after the last argument.
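The style sheet copies each value verbatim between single quotes, so a value that itself contains a quote (say, O'Reilly) yields broken PHP, or worse, code injected by the client. For comparison, here is a sketch of the same mapping done with PHP's DOM and DOMXPath classes instead of XSLT, using var_export() to turn each value into a correctly escaped PHP string literal. This is an alternative to the paper's style sheet, not a copy of soap2php.xsl; soap_to_php_call(), the parameter names format and offset, and the namespace urn:example-date are hypothetical. PHP 7 or later with the DOM extension is assumed.

```php
<?php
// Build the same "function('arg1', 'arg2');" statement that the style
// sheet generates, but quote each argument with var_export() so quotes
// and backslashes in the SOAP data cannot escape the string literal.

function soap_to_php_call(string $soap): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($soap)) {
        throw new InvalidArgumentException('SOAP message is not well-formed');
    }

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('SOAP-ENV', 'http://schemas.xmlsoap.org/soap/envelope/');

    // Corresponds to the template pattern "SOAP-ENV:Body/*".
    $calls = $xpath->query('/SOAP-ENV:Envelope/SOAP-ENV:Body/*');
    if ($calls->length !== 1) {
        throw new InvalidArgumentException('Expected one element in the SOAP Body');
    }
    $call = $calls->item(0);

    $args = [];
    foreach ($xpath->query('*', $call) as $arg) {   // element children only
        $args[] = var_export($arg->textContent, true);
    }

    return $call->localName . '(' . implode(', ', $args) . ');';
}

$soap = <<<XML
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <m:getthedate xmlns:m="urn:example-date">
      <format>l d M Y</format>
      <offset>1</offset>
    </m:getthedate>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
XML;

echo soap_to_php_call($soap), "\n";   // getthedate('l d M Y', '1');
```

The statement produced this way can be passed to eval() exactly like the captured xsltproc output.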

The next step is to apply this style sheet to the SOAP message by invoking an XSLT processor. Recent versions of PHP include classes and functions to do just that. However, XSLT support in PHP is still in an experimental stage, and most distributions, such as Red Hat Linux, leave it out of the PHP module. To perform the transformation in a PHP script, we either have to compile our own PHP module or spawn a separate process for an external XSLT processor. The latter is probably the easiest solution until XSLT support is included in the mainstream distributions. The script fragment below stores the SOAP message in a temporary file and then invokes xsltproc, the XSLT processor from the Gnome XSLT library:

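   //  Create a temporary file and store the SOAP message.

   $xmlfile = tempnam("/tmp", "PHP-SOAP");

   $fd = fopen($xmlfile, "w");
   fwrite($fd, $HTTP_RAW_POST_DATA);
   fclose($fd);

   $xslfile = 'soap2php.xsl';

   //  Invoke the external XSLT processor and capture its output, one
   //  array element per line.  escapeshellarg() protects the command
   //  line; $status receives the exit code of xsltproc.

   exec('/usr/bin/xsltproc ' . escapeshellarg($xslfile) . ' '
        . escapeshellarg($xmlfile), $output, $status);

   unlink($xmlfile);      //  The temporary file is no longer needed.

   if ($status != 0)
   {
      //  The transformation failed, e.g. the message was not well-formed.
      header("HTTP/1.0 500 Internal Server Error");
      exit;
   }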
The XSLT transformation generates the PHP function call statement in a few lines of output. Here is an example:
getthedate
   (
         'l d M Y'
         ,
         '1'
   );
This output is captured in an array of strings, one array element for each line.

5.3 Executing the script and returning results

One of the nifty features of PHP (and most other scripting languages as well) is its ability to execute a piece of script from a text string inside the script. So, we crunch the $output array into a single string and pass that string to the eval() function, as shown below:

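   //  Join the lines with newlines, so that tokens on adjacent lines
   //  cannot run together, and evaluate the generated function call.
   $output = implode("\n", $output);
   $result = eval("return " . $output);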
eval() returns the value of the return statement in the evaluated code, which is whatever our SOAP function returns. The web service client expects this result wrapped inside a proper SOAP message. The simplest way to accomplish this is to have the called function create the whole SOAP message and return it as a string value. The final phase of the SOAP server sends the HTTP headers, followed by the SOAP message itself:
   header("HTTP/1.0 200 OK");
   header("Content-Type: text/xml; charset=\"utf-8\"");

   echo $result;
The full code for this example is available at
http://www.andromeda.nl/WebEngineering/soap-xslt.tar.bz2
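Rather than requiring every service function to build its own SOAP message, the server could wrap a plain return value itself. The sketch below (PHP 7 or later with DOM, assumed) follows the SOAP 1.1 convention of naming the response element after the method with Response appended; the accessor name return, the ns1 prefix and the helper soap_response() are illustrative choices, not requirements. Building the message with DOM rather than by string concatenation means characters such as < and & in the value are escaped when the document is serialized.

```php
<?php
// Wrap a plain return value in a SOAP 1.1 response envelope, so that the
// service function does not have to build the XML itself.
// soap_response() is a hypothetical helper for illustration.

function soap_response(string $method, string $ns, $value): string
{
    $env = 'http://schemas.xmlsoap.org/soap/envelope/';
    $doc = new DOMDocument('1.0', 'UTF-8');

    $envelope = $doc->appendChild($doc->createElementNS($env, 'SOAP-ENV:Envelope'));
    $body = $envelope->appendChild($doc->createElementNS($env, 'SOAP-ENV:Body'));

    // SOAP 1.1 convention: the response element is named <method>Response.
    $response = $body->appendChild(
        $doc->createElementNS($ns, 'ns1:' . $method . 'Response'));

    // Text nodes are escaped on serialization, so < and & are safe.
    $return = $response->appendChild($doc->createElement('return'));
    $return->appendChild($doc->createTextNode((string) $value));

    return $doc->saveXML();
}

// 1.23 is a made-up value, not a real exchange rate.
echo soap_response('getRate', 'urn:xmethods-CurrencyExchange', '1.23');
```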

6 Conclusions and future work

In contrast to some other protocols that start with an 'S', the 'S' in SOAP actually does stand for 'Simple'. Once you discover how simple SOAP is, you may conclude that it is not so hard to use web services. A moderately skilled programmer can quickly build a SOAP application, either as a server or as a client. The example in the previous section shows that an XSLT style sheet with just a few lines of code can create a program fragment from a SOAP message. Theoretically, this style sheet can generate any PHP function call with any list of arguments. However, this example is a bit simplified and suffers from a few drawbacks. Certainly, the flexibility and robustness need a bit of work.

The first problem you may want to solve before using this method in a real-life application is the order of the arguments. PHP requires a fixed order of arguments to functions, while in SOAP the arguments are identified by their names. If the SOAP message passes the arguments in a different order, the function will probably not work as expected. A second problem is that the response to the client, which contains the return value of the function, must be another SOAP message. This requires the function that provides the actual web service to return a complete SOAP message in a string. As you may have noticed, there are not many 'standard' functions that return a SOAP message, so this limits the functions that can act as web services to a few specially crafted ones. On the other hand, this may be a good thing, as long as the server also refuses to call any function that was not crafted for the purpose. After all, you wouldn't want a client application to request the execution of exec('rm -rf *')!
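Both problems, the argument order and the question of which functions a client may call, can be tackled without generating code at all. The sketch below is an alternative to the eval() approach, not the method of this paper: it calls a function only if it appears in an explicit whitelist, and it uses PHP's Reflection API to put the named SOAP arguments in the order of the function's declared parameters. It assumes the function name and a name => value array of arguments have already been extracted from the SOAP Body; dispatch_soap_call() and the example getRate() implementation are hypothetical.

```php
<?php
// Call a whitelisted service function with named SOAP arguments, matched
// to the function's parameters by name rather than by position.
// dispatch_soap_call() and getRate() below are hypothetical.

function dispatch_soap_call(string $function, array $args, array $allowed)
{
    // Refuse anything not explicitly published as a web service.
    if (!in_array($function, $allowed, true)) {
        throw new InvalidArgumentException("Function $function is not available");
    }

    $ref = new ReflectionFunction($function);
    $positional = [];
    foreach ($ref->getParameters() as $param) {
        $name = $param->getName();
        if (array_key_exists($name, $args)) {
            $positional[] = $args[$name];
            unset($args[$name]);
        } elseif ($param->isDefaultValueAvailable()) {
            $positional[] = $param->getDefaultValue();
        } else {
            throw new InvalidArgumentException("Missing argument $name");
        }
    }
    if ($args) {   // anything left over was not a declared parameter
        throw new InvalidArgumentException(
            'Unknown argument(s): ' . implode(', ', array_keys($args)));
    }
    return $ref->invokeArgs($positional);
}

// Example service function (hypothetical).
function getRate($country1, $country2)
{
    return "rate from $country1 to $country2";
}

// The arguments arrive in a different order than the PHP signature.
echo dispatch_soap_call('getRate',
    ['country2' => 'euro', 'country1' => 'usa'], ['getRate']), "\n";
// rate from usa to euro
```

Each exception could then be turned into a SOAP Fault reply instead of a normal response.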

The most important issue that is yet to be resolved is detecting and properly handling anomalous situations. Before actually executing a function on behalf of a web service client, we have to ask ourselves if we want to make this function available to that client. Is this particular client authorized to have the function executed? Do we have the correct number of arguments, and are the arguments passed with their proper names? Do the arguments contain sensible information? Finally, how do we react if we discover a wrong answer to any of these questions?

To start with the last question, the SOAP specification provides a means to respond to erroneous situations. Instead of a response from the function, the SOAP reply contains a Fault element, which states the nature of the error. Discovering the error should start by validating the contents of the SOAP message sent by the client, for example with XML Schema validation. After validating the message itself, the content must still be checked against application-specific criteria. Before making a web service available on the internet, you would at least have to add validation and a fault response to the web service scripts discussed in this paper.
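For reference, a SOAP 1.1 fault reply can be built in a few lines. In SOAP 1.1 the Fault element has unqualified faultcode and faultstring children, where faultcode is a qualified name such as SOAP-ENV:Client (the request was wrong) or SOAP-ENV:Server (the service failed), and the HTTP binding requires status 500 Internal Server Error for a fault. The helper soap_fault() is hypothetical and omits the optional faultactor and detail elements; PHP 7 or later with DOM is assumed.

```php
<?php
// Build a SOAP 1.1 Fault message. faultcode is a qualified name in the
// envelope namespace; the Fault's children are unqualified in SOAP 1.1.
// soap_fault() is a hypothetical helper for illustration.

function soap_fault(string $code, string $message): string
{
    $env = 'http://schemas.xmlsoap.org/soap/envelope/';
    $doc = new DOMDocument('1.0', 'UTF-8');

    $envelope = $doc->appendChild($doc->createElementNS($env, 'SOAP-ENV:Envelope'));
    $body = $envelope->appendChild($doc->createElementNS($env, 'SOAP-ENV:Body'));
    $fault = $body->appendChild($doc->createElementNS($env, 'SOAP-ENV:Fault'));

    $fault->appendChild($doc->createElement('faultcode'))
          ->appendChild($doc->createTextNode('SOAP-ENV:' . $code));
    $fault->appendChild($doc->createElement('faultstring'))
          ->appendChild($doc->createTextNode($message));

    return $doc->saveXML();
}

// In the server, send the fault with HTTP status 500:
//   header("HTTP/1.0 500 Internal Server Error");
//   header("Content-Type: text/xml; charset=\"utf-8\"");
echo soap_fault('Client', 'Function exec is not available');
```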

7 References

[1] Mark Birbeck et al., Professional XML, 2nd Edition, Wrox Press, 2001.

[2] Michael Kay, XSLT Programmer's Reference, 2nd Edition, Wrox Press, 2001.

[3] XML Base, W3C Recommendation, http://www.w3.org/TR/xmlbase/

[4] Namespaces in XML, http://www.w3.org/TR/REC-xml-names/

[5] XSL Transformations (XSLT) Version 1.0, W3C Recommendation, http://www.w3.org/TR/xslt

[6] SOAP Version 1.2, W3C Candidate Recommendation, http://www.w3.org/TR/soap12-part0/