<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>ASF: XML Security Overview</title>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<link rel="stylesheet" type="text/css" href="resources/apache-xalan.css" />
</head>
<!--
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the  "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 -->
<body>
<div id="title">
<table class="HdrTitle">
<tbody>
<tr>
<th rowspan="2">
<a href="../index.html">
<img alt="Trademark Logo" src="resources/XalanC-Logo-tm.png" width="190" height="90" />
</a>
</th>
<th text-align="center" width="75%">
<a href="index.html">Xalan-C/C++ Version 1.11</a>
</th>
</tr>
<tr>
<td valign="middle">XML Security Overview</td>
</tr>
</tbody>
</table>
<table class="HdrButtons" align="center" border="1">
<tbody>
<tr>
<td>
<a href="http://www.apache.org">Apache Foundation</a>
</td>
<td>
<a href="http://xalan.apache.org">Xalan Project</a>
</td>
<td>
<a href="http://xerces.apache.org">Xerces Project</a>
</td>
<td>
<a href="http://www.w3.org/TR">Web Consortium</a>
</td>
<td>
<a href="http://www.oasis-open.org/standards">Oasis Open</a>
</td>
</tr>
</tbody>
</table>
</div>
<div id="navLeft">
<ul>
<li>
<a href="resources.html">Resources</a>
<br />
</li>
<li>
<a href="../index.html">Home</a>
</li></ul><hr /><ul>
<li>
<a href="index.html">Xalan-C++ 1.11</a>
</li>
<li>
<a href="whatsnew.html">What's New</a>
</li>
<li>
<a href="license.html">Licenses</a>
</li></ul><hr /><ul>
<li>
<a href="overview.html">Overview</a>
</li>
<li>
<a href="charter.html">Charter</a>
</li></ul><hr /><ul>
<li>
<a href="download.html">Download</a>
</li>
<li>
<a href="buildlibs.html">Build Libraries</a>
</li>
<li>
<a href="install.html">Installation</a>
</li>
<li>
<a href="builddocs.html">Build Documents</a>
</li></ul><hr /><ul>
<li>
<a href="samples.html">Sample Apps</a>
</li>
<li>
<a href="commandline.html">Command Line</a>
</li>
<li>
<a href="usagepatterns.html">Usage Patterns</a>
</li></ul><hr /><ul>
<li>
<a href="programming.html">Programming</a>
</li>
<li>
<a href="extensions.html">Extensions</a>
</li>
<li>
<a href="extensionslib.html">Extensions Library</a>
</li>
<li>
<a href="apiDocs/index.html">API Reference</a>
</li></ul><hr /><ul>
<li>
<a href="faq.html">Xalan-C FAQs</a>
</li></ul><hr /><ul>
<li>
<a href="whatsnew.html#bugs">Bugs</a>
</li>
<li>
<a href="http://xalan.apache.org/old/xalan-j/test/run.html#how-to-run-c">Testing</a>
</li>
<li>Web Security<br />
</li>
</ul>
</div>
<div id="content">
<h2>XML Security Overview</h2>
<ul>
<li>
<a href="#xsov_xmlParser">XML Parser Threats</a>
</li>
<li>
<a href="#xsov_resolvEntity">Resolving External Entities</a>
</li>
<li>
<a href="#xsov_trustEntity">Trusted External Entities</a>
</li>
<li>
<a href="#xsov_piThreat">Processing Instruction (PI) Threats</a>
</li>
<li>
<a href="#xsov_soapThreat">SOAP Simple Object Access Protocol</a>
</li>
<li>
<a href="#xsov_wsdlThreat">WSDL Web Service Description Language</a>
</li>
<li>
<a href="#xsov_uriThreat">URI Uniform Resource Identifiers</a>
</li>
<li>
<a href="#xsov_urlThreat">URL Uniform Resource Locators</a>
</li>
<li>
<a href="#xsov_malUtfStrings">Malformed UTF-8 and UTF-16 Strings</a>
</li>
<li>
<a href="#xsov_canonicalXML">Canonical XML Issues</a>
</li>
<li>
<a href="#xsov_xhtmlWorkaround">XHTML Output Mode - Workaround</a>
</li>
</ul>

<br />
<p>
<b>This document goes well beyond XSLT. Use it as a general reference.</b>
</p>
<p>There are numerous security issues and problems that are 
endemic to the XML architecture. 
I will try to identify some of the most common issues and threats 
and describe some mitigation strategies.
</p>
<p>The biggest threat issue is a matter of trust. 
How well do you trust your sources of XML data? 
What are the tools that can help increase the trust?
</p>
<p>Most Web Service communications uses HTTP over standard TCP ports. 
The HTTP protocol on standard TCP ports has free access through business firewalls. 
How well do your proxy servers handle the Web Service security issues 
required for your applications?
</p>
<p>How well are your resource identifiers protected? 
How well do your applications cope with resource identifier spoofing? 
Can your resource identifiers be trusted by outside clients? 
Can you trust the credentials of your clients?
</p>
<p>Will the SOAP interface for your Web Service send error messages 
to an untrusted Web Service address?
</p>
<p>Is your WSDL interface description file readily available for download, 
thus enabling persons with malicious intent to create targeted attacks on your Web Services?
</p>
<p>Can you trust the client credentials that use your Web Service application?
</p>
<p>There are numerous security issues that are not directly involved in 
the markup of XML or its processing. 
These issues relate to infrastructure.
</p>
<p>Can you trust your DNS (Domain Name Service) and reduce its vulnerability to hijacking?
</p>
<p>Are your web servers hardened against known application vulnerabilities?
</p>
<p>Are your applications hardened against 
cross site scripting and SQL injection?
</p>
<p>Can your client applications trust the scripts 
that are transmitted as web pages?
</p>
<p>Can your web server trust the scripts that are submitted?
</p>
<p>Is application data sanitized before being consumed by your applications?
</p>

<a name="xsov_xmlParser">‌</a>
<p align="right" size="2">
<a href="#content">(top)</a>
</p>
<h3>XML Parser Threats</h3>

<p>This list will help you find the XML threat vectors that need to be addressed.  
Some vectors cannot be easily resolved.
</p>
<ul>
<li>Resolving External Entities</li>
<li>Implicit Trust of Internal DTD</li>
<li>Resource Identifier Spoofing</li>
<li>Malformed UTF-8 and UTF-16</li>
<li>Secure the trust of external DTD descriptions</li>
<li>Secure the trust of external Schema definitions</li>
<li>Secure the trust of entity import and include constructs</li>
<li>Configuration of Entity Resolver Catalogs</li>
</ul>


<a name="xsov_resolvEntity">‌</a>
<p align="right" size="2">
<a href="#content">(top)</a>
</p>
<h3>Resolving External Entities</h3>

<p>The XML1.0 and XML1.1 standards specify a <code>DOCTYPE</code> format. 
The processing may uncover significant entity resolver deficiencies.
</p>

<p>
<code>&lt;!DOCTYPE name PUBLIC "public-id" "system-id" [internal-DTD]&gt;</code>
<br />
<code>&lt;!DOCTYPE name SYSTEM "system-id" [internal-DTD]&gt;</code>
</p>
<p>XML Parsers MUST process the <code>[internal-DTD]</code> if it exists.
</p>
<p>XML Parsers MAY process the external <code>"system-id"</code> if it can be found.
</p>
<p>XML Parsers MAY process the external <code>"public-id"</code> if it can be found.
</p>
<p>XML Parsers MAY prefer either the <code>"public-id"</code> or <code>"system-id"</code> 
if both are specified.
</p>
<p>XML Parsers MAY ignore both the <code>"public-id"</code> and <code>"system-id"</code> 
if present.
</p>
<p>Declaring a parameter entity notation <code>"%entity;"</code> 
in the <code>[internal-DTD]</code> and expanding the content within the 
<code>[internal-DTD]</code> will force the XML parser to import the content 
referenced by the <code>"%entity;"</code> notation.
</p>
<p>Declaring a general entity notation <code>"&amp;entity;"</code> in the 
<code>[internal-DTD]</code> and expanding the content within the body of 
the XML document will force the XML parser to import the content referenced 
by the <code>"&amp;entity"</code> notation.
</p>
<p>The default method of resolving external entities is by resolving entity 
name strings relative to DNS named hosts and/or path names relative to the 
local computer system.  When receiving XML documents from an outside source, 
these entity reference locations may be unreachable, unreliable, or untrusted.
</p>
<p>Web Service SOAP XML documents MUST NOT have <code>DOCTYPE</code> definitions. 
SOAP processors should not process DOCTYPE definitions. 
The conformance is implementation dependent.
</p>
<p>
<a href="http://www.w3.org/TR/soap">http://www.w3.org/TR/soap</a>
</p>


<a name="xsov_trustEntity">‌</a>
<p align="right" size="2">
<a href="#content">(top)</a>
</p>
<h3>Trusted External Entities</h3>

<p>The <b>
<i>OASIS XML Catalogs</i>
</b> specification, if implemented by an application, 
can specify a set of external entities that can be trusted by mapping known 
identifiers to local or trusted resources.  A secure application should 
not trust entity identifiers whose resources cannot be localized and secured.
</p>
<p>
<a href="http://www.oasis-open.org/committees/entity">http://www.oasis-open.org/committees/entity</a>
</p>
<p>A similar method can be designed specifically for each application.
</p>
<p>A trusted application may need to pre-screen any entity definitions in XML 
before passing the information into the core of the application.
</p>
<p>A trusted application should install some type of entity resolving catalog 
or database that can be trusted.
</p>


<a name="xsov_piThreat">‌</a>
<p align="right" size="2">
<a href="#content">(top)</a>
</p>
<h3>Processing Instruction (PI) Threats</h3>

<p>Processing instructions are a mechanism to send specific information 
into an application.  A common processing instruction is a 
stylesheet declaration.  
This information is part of an XML document and comes usually 
after the XML header and before the root element.
</p>
<p>A stylesheet declaration may cause an application to look for an 
untrusted XSLT stylesheet to use for transformation of the 
following root element.  A standard exists for associating style sheets with XML documents.
</p>
<p>
<a href="http://www.w3.org/TR/xml-stylesheet">http://www.w3.org/TR/xml-stylesheet</a>
</p>
<p>Examples in the xml-stylesheet recommendation describes how to use the 
processing instruction to associate CSS stylesheets for XHTML.  
Applications that use XSLT transformations will interpret the 
xml-stylesheet processing instruction as the location of a 
XSLT transformation stylesheet.
</p>
<p>As more processing instructions become standardized and in common use, 
their threat of misuse increases.
</p>


<a name="xsov_soapThreat">‌</a>
<p align="right" size="2">
<a href="#content">(top)</a>
</p>
<h3>SOAP Simple Object Access Protocol</h3>

<p>The SOAP specification explicitly forbids the transport of 
DOCTYPE definitions and PI processing instructions.
</p>
<p>The SOAP specifies a transport envelope that encapsulates 
an XML message for transport. SOAP can also handle various 
transmission status indicators implying confirmation of delivery, 
error messages, and queue status messages. 
SOAP transports can be loosely coupled and intermittent. 
SOAP is used extensively in the design and deployment of Web Service architectures. 
A companion Web Service specification is WSDL, the Web Service Definition Language.
</p>
<p>The SOAP protocol as widely deployed by Microsoft and other vendors 
is based on specifications that predate the adoption 
by the <a href="http://www.w3.org">World Wide Web Consortium (W3C)</a>. 
SOAP is not based on Microsoft technology. 
It is an open standard drafted by UserLand, Ariba, Commerce One, Compaq, 
Developmentor, HP, IBM, IONA, Lotus, Microsoft, and SAP. 
<a href="http://www.w3.org/TR/2000/NOTE-SOAP-20000508">SOAP 1.1</a> 
was presented to the W3C in May 2000 as an official Internet standard. 
</p>
<p>The original <a href="http://www.w3.org/TR/soap11">SOAP 1.1</a> standard 
is associated with this URI namespace prefix.
</p>
<p>
<code>http://schemas.xmlsoap.org/soap/</code>
</p>
<p>There are significant changes in naming conventions since SOAP 1.1 
was adopted by W3C as a recommended standard. 
The current iteration is <a href="http://www.w3.org/TR/soap12">SOAP 1.2</a> 
and is associated with this URI namespace prefix.
</p>
<p>
<code>http://www.w3.org/2003/05</code>
</p>
<p>The basic security threat to the SOAP architecture is 
the ability to spoof Web Service addresses and telling a 
SOAP server to respond to a rogue Web Service address 
when a <code>mustUnderstand</code> attribute is processed 
and an error indication is raised.
</p>
<p>Other intelligence that can be obtained might be the 
location of a public accessible WSDL definition 
of the messages being transported by SOAP, 
thus allowing additional malware attacks to be automatically generated.
</p>


<a name="xsov_wsdlThreat">‌</a>
<p align="right" size="2">
<a href="#content">(top)</a>
</p>
<h3>WSDL Web Service Description Language</h3>

<p>WSDL is known as the Web Service Description Language. 
The WSDL XML document is a an interface description that can be transformed 
into various programming languages. 
Such transformed interface descriptions are recognized as 
Java Interfaces and C++ Virtual Classes.
</p>
<p>The original <a href="http://www.w3.org/TR/wsdl">WSDL 1.1</a> standard 
is associated with this URI namespace prefix.
</p>
<p>
<code>http://schemas.xmlsoap.org/wsdl/</code>
</p>
<p>The current <a href="http://www.w3.org/TR/wsdl20">WSDL 2.0</a> standard 
is maintained by W3C in their namespace with prefix.
</p>
<p>
<code>http://www.w3.org/</code>
</p>
<p>The WSDL can provide a template for generating a compliant Web Service systems 
for multiple and hetrogeneous platforms.
</p>
<p>A WSDL document that can benefit developers can also be used by malware 
and hackers to taylor specific threats against targeted Web Services.
</p>
<p>The SOA (Service Oriented Architecure), 
SAAS (Software As A Service), 
PAAS (Platform As A Service) are families of 
Web Services used as interfaces into what is 
generally known as Cloud Computing.
</p>


<a name="xsov_uriThreat">‌</a>
<p align="right" size="2">
<a href="#content">(top)</a>
</p>
<h3>URI Uniform Resource Identifiers</h3>

<p>The URI does not need to specify the location of a resource. 
It merely provides a resource name. A catalog, database, 
or other mechanism is used to map URIs to resource locations.
</p>
<p>The security issue here is that most URIs are used with a 
DNS (Domain Name Service) to find a host and path to a resource. 
The URI is then treated as a URL (Uniform Resource Locator).
</p>
<p>The mitigation of these threats requires diligence of the 
application architects to ensure an appropriate level of trust 
for the URIs and URLs used in their applications.
</p>
<p>The transmission media is inherently untrusted. 
Often SOAP bindings and HTTP transports are used. 
Web Service addressing is readily spoofed.
</p>


<a name="xsov_urlThreat">‌</a>
<p align="right" size="2">
<a href="#content">(top)</a>
</p>
<h3>URL Uniform Resource Locators</h3>

<p>See: <a href="#xsov_uriThreat">URI Uniform Resource Identifiers</a>
</p>


<a name="xsov_malUtfStrings">‌</a>
<p align="right" size="2">
<a href="#content">(top)</a>
</p>
<h3>Malformed UTF-8 and UTF-16 Strings</h3>

<p>Public Key Infrastructure (X.509) certificates are leased from a 
certificate authority or are self-signed. 
The distinguished names and parts thereof are usually rendered in unicode.
</p>
<p>The value of zero is not a valid Unicode character. 
It is possible to create non-zero UTF-8 and UTF-16 sequences that equate to zero, 
which is not allowed. 
Some rogue hackers have successfully obtained wild-card PKI (X.509) certificates 
by prepending a UTF-8(zero) in a distinguished name when applying for a certificate. 
Such a certificate could be used to successfully sign anything.
</p>
<p>Applications should not blindly accept UTF-8 and UTF-16 strings 
without verifying the proper encoding for those strings. 
Contents that equate to bad Unicode character values should be denied.
</p>


<a name="xsov_canonicalXML">‌</a>
<p align="right" size="2">
<a href="#content">(top)</a>
</p>
<h3>Canonical XML Issues</h3>

<p>Canonical XML is a tranformation of an XML document into a 
canonical form useful for signing. 
This is used in some Web Service security implementations.
</p>
<p>There are several areas where Canonical XML will create XML documents 
that have severe application problems.
</p>
<p>The number values are rendered in Base-10 as decimal fractions. 
The computations performed by computers are usually in Base-2 floating point arithmetic. 
You therefore have truncation or roundoff issues when converting between 
decimal fractions and Base-2 fractions.
</p>
<p>The canonical process may collapse whitespace and transform 
multi-character line endings to single-character line endings. 
When whitespace is significant, the canonical issues for signing can cause problems.
</p>
<p>It is possible to create XHTML documents that will not work with some browsers. 
The empty &lt;a/&gt; anchor element is not allowed by many browsers, 
therefore &lt;a&gt;&lt;/a&gt; is required. 
A standard XML canonical process may collapse elements with no content into empty elements. 
The empty paragraph&lt;p/&gt; is disallowed.  The &lt;p&gt;&lt;/p&gt; is supported.
</p>
<p>The World Wide Web Consortium (W3C) has additional detailed discussion of 
<a href="http://www.w3.org/TR/C14N-issues/">canonicalization issues</a>.
</p>


<a name="xsov_xhtmlWorkaround">‌</a>
<p align="right" size="2">
<a href="#content">(top)</a>
</p>
<h3>XHTML Output Mode - Workaround</h3>

<p>The Xalan-C/C++ library currently has no XHTML output mode.
Since XHTML is to be well-formed XML, the desire is to use the XML output method.
</p>
<p>XHTML is based on HTML version 4.
</p>
<p>Empty elements declared by HTML-4 should have a space before the 
trailing '/&gt;' markup (i.e. &lt;br /&gt; and &lt;hr /&gt;). 
XML output mode does not normally have this space when using 
the &lt;xsl:element name="br" /&gt; in your stylesheet. 
Most modern browsers are ok with no space, but viewing the 
browser source shows a warning condition.
</p>
<p>Non-empty elements declared by HTML-4 should not be rendered as empty XML elements. 
If there is no content, the elements should be rendered with both a start-tag and end-tag 
(i.e. &lt;a name="xxx"&gt;&lt;/a&gt;) instead of an XML empty-element. 
XSLT processors usually create an empty-element 
(i.e. &lt;a name="xxx"/&gt;) when the element being defined has no content 
other than attributes.
</p>
<p>For XSLT processors creating XML documents for XHTML, 
you can create what looks like an element with no content by including 
the &amp;#8204; character 
(a zero-width non-joining character often known as &amp;zwnj;) 
as the element text content. 
This also allows transitional browsers the ability to find the end tag.
</p>
<p>
<blockquote class="source">
<pre>  DTD    &lt;!ENTITY zwnj    "&amp;#8204;"&gt;

  &lt;a name="marker"&gt;&amp;zwnj;&lt;/a&gt;</pre>
</blockquote>
</p>
<p>Transitional XHTML is not usually well-formed XML. 
It becomes a mix of HTML version 4 and XML markup. 
Strict XHTML is required to be well-formed XML.
</p>

<p align="right" size="2">
<a href="#content">(top)</a>
</p>
</div>
<div id="footer">Copyright © 1999-2012 The Apache Software Foundation<br />Apache, Xalan, and the Feather logo are trademarks of The Apache Software Foundation<div class="small">Web Page created on - Tue 2012-10-09</div>
</div>
</body>
</html>
