Security World: [NEWS] Java Runtime UTF-8 Decoder Smuggling Vector

The following security advisory is sent to the securiteam mailing list, and can be found at the SecuriTeam web site: http://www.securiteam.com

Java Runtime UTF-8 Decoder Smuggling Vector
------------------------------------------------------------------------

SUMMARY

On July 15, OuTian reported a vulnerability in Apache Tomcat [1] whereby
overlong byte sequences in UTF-8 could bypass both Apache Tomcat access
control restrictions and its path decoding logic.

On July 17, Simon Ryeo reported [2] a variation of the same vulnerability in
the Apache httpd server when proxying content generated by Tomcat.

Remy Maucherat wrote a patch addressing this particular expression of the
vector for Tomcat 6.0.x [4], which also mitigates any similar but as yet
undiscovered decoding vulnerabilities. This patch has also been ported to
5.5.x [5] and 4.1.x [6]. On July 31, the Apache Software Foundation
published a mitigation for this vulnerability as Apache Tomcat release
6.0.18 [7] and added this vulnerability to the Apache Tomcat security
pages [8]. Releases for 5.5.x and 4.1.x will follow shortly. The Tomcat
vulnerability had been announced by Ryeo [9], but its full implications
remained undisclosed.

During the course of research, the Glassfish implementation was determined
not to be vulnerable to the specific exploit identified and reported by
OuTian and Ryeo. However, all implementations which accept overlong UTF-8
sequences in paths, including Glassfish, remain vulnerable insofar as any
access control is implemented at the proxy or gateway layer of an HTTP
service. Apache Tomcat 6.0.18 is no longer vulnerable with respect to its
URI path, since it rejects every request whose decoded value changes the
path representation, but it remains exposed to this vector in other parts
of the request.

That said, the underlying vector for this vulnerability, as identified by
Rowe, lies within the UTF-8 charset implementation of
java.nio.charset.CharsetDecoder. The malformed-input CodingErrorAction set
through onMalformedInput() is not triggered by overlong UTF-8 octet
sequences in a number of vulnerable Java runtime implementations, including
Sun's JRE, OpenJDK, HP's RTE, BEA's JRockit, IBM's SDK, Apple's SDK and
Apache Harmony. Other implementations were not tested.
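A minimal probe, assuming only the standard java.nio.charset API named above, can show whether a given runtime's UTF-8 decoder reports overlong sequences when the malformed-input action is REPORT. The class name OverlongDecoderProbe and the two sample sequences are illustrative choices, not part of the advisory; on a corrected runtime both calls would be expected to return true.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Illustrative probe (hypothetical class name): does this runtime's UTF-8
// decoder report overlong octet sequences as malformed input?
final class OverlongDecoderProbe {

    /** Returns true if decoding {@code input} as UTF-8 raises a coding error. */
    static boolean reportsMalformed(byte[] input) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(input));
            return false; // decoded without complaint: the overlong form was accepted
        } catch (CharacterCodingException e) {
            return true;  // typically a MalformedInputException: the form was rejected
        }
    }

    public static void main(String[] args) {
        // 0xC0 0xAF: overlong two-octet encoding of '/' (U+002F).
        byte[] overlongSlash = {(byte) 0xC0, (byte) 0xAF};
        // 0xE0 0x80 0xAE: overlong three-octet encoding of '.' (U+002E).
        byte[] overlongDot = {(byte) 0xE0, (byte) 0x80, (byte) 0xAE};
        System.out.println("2-octet overlong rejected: " + reportsMalformed(overlongSlash));
        System.out.println("3-octet overlong rejected: " + reportsMalformed(overlongDot));
    }
}
```

A false result on either line would indicate the overlong acceptance described above.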

On July 18, Rowe and Maucherat confirmed this flaw in Apache Harmony, Sun's
JRE and OpenJDK, and began distributing this information to the authors of
affected Java runtimes so that each could prepare appropriate fixes.

On August 13, this information was made available to various framework
authors and vendors, such as Spring, BEA and IBM, and to other affected
developers identified by US-CERT, so that they could address their specific
exposure and potential vulnerabilities. The author intended this limited
announcement to coincide with Sun's Synchronized Security Release [3] of
the Java platform in October, with parallel releases by HP, Apple, OpenJDK,
Apache Harmony, etc. within that time frame.

DETAILS

Immune Systems:
* Sun Java version 6u11
* Sun Java version 1.5.0_17
* Sun Java version 1.4.2_19
* IBM J2SE 5.0 SR9
* IBM J2SE 1.4.2 SR13
* IBM Java SE 6 SR4

In RFC 3629, "UTF-8, a transformation format of ISO 10646" [10], and as
early as its predecessor RFC 2279 [11] of January 1998, F. Yergeau et al.
clearly identified, under section 6 "Security Considerations", the impact
of overlong byte sequences, and declared such sequences invalid. These
security considerations were not discussed in the earlier RFC 2044 [12],
published in October 1996.

Limiting consideration for the moment to the original vulnerability report
and the HTTP/1.1 URI syntax, one point becomes immediately clear: HTTP/1.1
(RFC 2616 [13] and RFC 2396 [14]) does not specify an encoding for the URI,
and treats it as an octet stream known to the client and origin server and
otherwise transparent to intervening proxies. Specific characters in the
HTTP URI are significant, all of them within the US-ASCII character set
(which is a deliberate subset of UTF-8 and the first 128 code points of
Unicode). Many implementers and applications use UTF-8 encoding for their
URI patterns, as permitted (but not required) by HTTP/1.1.

However, high octets (0x80 and above) have no specific meaning within
RFC 2616 or RFC 2396. A sequence of two or more high octets that a lenient
UTF-8 decoder would map to a US-ASCII code point must therefore be passed
through uninterpreted by proxies, since such bytes are entirely appropriate
in other character sets and HTTP/1.1 attributes no UTF-8 properties to the
URI string. Non-conforming implementations which treat the entire URI as
UTF-8, and which incorrectly decode overlong octet sequences into the
US-ASCII range, will behave differently than their conforming cousins.
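The divergence can be sketched with a deliberately flawed decoder. The lenientDecode method below is hypothetical: it models the non-conforming behaviour described above by decoding two-octet sequences without checking that the result is at least U+0080, so 0xC0 0xAE becomes '.' because ((0xC0 & 0x1F) << 6) | (0xAE & 0x3F) = 0x2E. The proxyAllows method stands in for a conforming intermediary that sees only opaque octets. The path, the "../" rule and the class name are illustrative assumptions, and percent-decoding is assumed to have happened already.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

final class OverlongMismatchDemo {

    /**
     * HYPOTHETICAL lenient decoder (not a JDK API) modelling the flaw: it
     * decodes one- and two-octet sequences but never rejects a two-octet
     * form whose value falls below 0x80.
     */
    static String lenientDecode(byte[] in) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length; i++) {
            int b = in[i] & 0xFF;
            if (b < 0x80) {
                out.append((char) b);
            } else if ((b & 0xE0) == 0xC0 && i + 1 < in.length) {
                int low = in[++i] & 0x3F;
                // Missing overlong check: the result may land in 0x00..0x7F.
                out.append((char) (((b & 0x1F) << 6) | low));
            } else {
                out.append('\uFFFD'); // longer forms are out of scope for this sketch
            }
        }
        return out.toString();
    }

    /** Conforming intermediary: treats the path as opaque octets, checks ASCII only. */
    static boolean proxyAllows(byte[] path) {
        String octets = new String(path, Charset.forName("ISO-8859-1")); // one char per octet
        return !octets.contains("../");
    }

    /** "/app/" + two overlong '.' (0xC0 0xAE each) + "/WEB-INF/web.xml". */
    static byte[] craftedPath() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] head = "/app/".getBytes(Charset.forName("US-ASCII"));
        byte[] tail = "/WEB-INF/web.xml".getBytes(Charset.forName("US-ASCII"));
        out.write(head, 0, head.length);
        out.write(0xC0);
        out.write(0xAE);
        out.write(0xC0);
        out.write(0xAE);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] path = craftedPath();
        System.out.println("proxy allows: " + proxyAllows(path));   // true
        System.out.println("endpoint sees: " + lenientDecode(path)); // /app/../WEB-INF/web.xml
    }
}
```

The proxy's rule never fires because no raw 0x2E octet precedes the slash, yet the endpoint acts on a path containing "../".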

This mismatch of behavior results, yet again, in the same class of vectors
identified three years ago by Linhart, Klein, Heled and Orrin. The
essential premise of their HTTP Request Smuggling whitepaper [15] is that
subtle differences in request parsing between tiers can yield surprisingly
disastrous results. The same holds wherever a CR-LF line terminator, a
delimiter, or similar can be tunneled through conforming proxy layers into
a nonconforming endpoint.

The risks of this vector are not, however, limited in any way to the HTTP
request line. Any multi-tier service may be at risk provided that 1) the
endpoint accepts invalid UTF-8 sequences, 2) an intermediate transport
layer performs no UTF-8 decoding, and 3) that intermediate layer performs
parsing, routing, or access control functions based on US-ASCII
assumptions about such invalid strings. Such services might be external
interfaces, or firewalled interfaces such as SQL query strings and similar.

The authors of this note point out that the vulnerability is not to be
confused with the issue of normative canonical forms for string
comparison. Since no valid multi-octet UTF-8 sequence maps to a code point
in the range 0..127, any code point in that range should be available for
parsing without an awareness that the resulting string will be UTF-8,
provided all UTF-8 high-bit octets are passed unmodified and in the same
sequence. Full string comparisons for access control involving code points
above 127 require a normative form common to the input and reference
strings. Authors must take this into consideration when implementing any
UTF-8-based access control where non-normative forms can pass through an
intermediate access control but are accepted by the endpoint and then
transformed into another representation.
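For this separate canonical-form concern, one common approach (an assumption here; the advisory does not name a normalization form) is to normalize both the input and the reference string to the same Unicode normalization form before comparing them. The sketch uses java.text.Normalizer with NFC; the class name and sample paths are hypothetical.

```java
import java.text.Normalizer;

final class NormalizedCompare {

    /** Compares two strings after converting both to Normalization Form C. */
    static boolean sameName(String input, String reference) {
        String a = Normalizer.normalize(input, Normalizer.Form.NFC);
        String b = Normalizer.normalize(reference, Normalizer.Form.NFC);
        return a.equals(b);
    }

    public static void main(String[] args) {
        String reference = "/caf\u00E9/admin"; // e-acute as the single code point U+00E9
        String input = "/cafe\u0301/admin";    // 'e' followed by U+0301 COMBINING ACUTE ACCENT
        System.out.println(input.equals(reference));    // false: different code point sequences
        System.out.println(sameName(input, reference)); // true: identical after NFC
    }
}
```

A rule written against the raw reference string would miss the decomposed spelling; comparing normalized forms makes both spellings match the same rule.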

Mitigating Abuse:
There are a number of layers a service author must be concerned with. At
the simplest, if the request is read as UTF-8 for HTTP or similar request
protocols, yet the protocol does not define the request stream as UTF-8
(or handles it as essentially ASCII for transport purposes), CR-LF line
delimiters embedded as overlong sequences may be abused for smuggling
attacks.

Any delimiters within the input must then be considered. For example, the
colon of a header line may be rendered invisible, permitting headers that
would otherwise be rejected, or the comma and similar delimiters between
fields may be hidden, rendering multiple tokens into a single apparent
value.

Finally, the text itself may be encoded with apparently unknown values. In
the case of HTTP, a proxy must forward unrecognized header fields as
end-to-end headers, rather than treating them as hop-by-hop (connection
level) headers, and otherwise ignore them. So a field such as
Transfer-Encoding: chunked or Content-Length: value can be passed without a
proxy or service provider recognizing it for what it is, even where the two
together form a disallowed combination. The impact upon the HTTP URI has
already been disclosed, but it is not difficult to identify other nefarious
effects this can have.
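The delimiter cases above can be sketched with a hypothetical lenient decoder that models the flaw (it is not a JDK API). The crafted header line contains no raw CR, LF or colon octets, so an intermediary scanning octets sees one unparseable line; the lenient endpoint, however, decodes it into two header fields, the second being Transfer-Encoding: chunked. Header names, values and class names are illustrative assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

final class HiddenDelimiterDemo {

    /** HYPOTHETICAL lenient decoder: two-octet forms decoded with no overlong check. */
    static String lenientDecode(byte[] in) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length; i++) {
            int b = in[i] & 0xFF;
            if (b < 0x80) {
                out.append((char) b);
            } else if ((b & 0xE0) == 0xC0 && i + 1 < in.length) {
                out.append((char) (((b & 0x1F) << 6) | (in[++i] & 0x3F)));
            } else {
                out.append('\uFFFD'); // other forms are out of scope for this sketch
            }
        }
        return out.toString();
    }

    private static void ascii(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(Charset.forName("US-ASCII"));
        out.write(b, 0, b.length);
    }

    /** Overlong ':' is 0xC0 0xBA, CR is 0xC0 0x8D, LF is 0xC0 0x8A. */
    static byte[] craftedHeaderLine() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ascii(out, "X-Note");
        out.write(0xC0); out.write(0xBA); // hidden ':'
        ascii(out, " a");
        out.write(0xC0); out.write(0x8D); // hidden CR
        out.write(0xC0); out.write(0x8A); // hidden LF
        ascii(out, "Transfer-Encoding");
        out.write(0xC0); out.write(0xBA); // hidden ':'
        ascii(out, " chunked");
        return out.toByteArray();
    }

    /** What an octet-level intermediary sees: the number of raw CR, LF and ':' octets. */
    static int rawDelimiterCount(byte[] line) {
        int n = 0;
        for (byte b : line) {
            if (b == '\r' || b == '\n' || b == ':') n++;
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] line = craftedHeaderLine();
        System.out.println("raw delimiters: " + rawDelimiterCount(line)); // 0
        for (String field : lenientDecode(line).split("\r\n")) {
            System.out.println("endpoint field: " + field);
        }
    }
}
```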

If the application cannot be migrated to a corrected Java VM, the author
should examine each UTF-8 conversion, component by component, and be very
careful to reject and terminate any connection in which overlong UTF-8
sequences are identified. It is necessary to probe for these explicitly if
the VM will not reject them. Invalid patterns begin with the octet 0xC0 or
0xC1, with 0xE0 followed by a value below 0xA0, or with 0xF0 followed by a
value below 0x90. Since code points above U+10FFFF (four-octet sequences
beginning with 0xF5 through 0xF7, and all five- and six-octet sequences)
cannot be represented in UTF-16, lead octets 0xF5 and higher should be
rejected out of hand.
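A narrow probe for exactly the patterns listed above might look like the following. It is a sketch, not a complete UTF-8 validator: it does not check continuation octets, truncated sequences, surrogate code points, or 0xF4 sequences above U+10FFFF, all of which a full RFC 3629 validator would also reject. The class and method names are hypothetical.

```java
final class OverlongProbe {

    /**
     * Returns the index of the first octet that begins one of the forbidden
     * patterns listed in the advisory, or -1 if none is present.
     */
    static int findForbidden(byte[] in) {
        for (int i = 0; i < in.length; i++) {
            int b = in[i] & 0xFF;
            int next = (i + 1 < in.length) ? (in[i + 1] & 0xFF) : -1;
            if (b == 0xC0 || b == 0xC1) return i;                 // overlong two-octet form
            if (b == 0xE0 && next >= 0 && next < 0xA0) return i; // overlong three-octet form
            if (b == 0xF0 && next >= 0 && next < 0x90) return i; // overlong four-octet form
            if (b >= 0xF5) return i;                              // beyond U+10FFFF, or 5/6-octet lead
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] request = {'/', 'a', (byte) 0xC0, (byte) 0xAE, '/'};
        int at = findForbidden(request);
        if (at >= 0) {
            // Per the advisory: reject and terminate rather than attempt repair.
            System.out.println("reject: forbidden sequence at offset " + at);
        }
    }
}
```

Continuation octets (0x80 to 0xBF) never match any of these lead values, so scanning every offset does not produce false positives on valid multi-octet text.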

Finally, if these overlong sequences are not explicitly checked for, in any
sort of application beyond HTTP, note the following statement of fact from
RFC 3629:

   o  US-ASCII octet values do not appear otherwise in a UTF-8 encoded
      character stream. This provides compatibility with file systems or
      other software (e.g., the printf() function in C libraries) that
      parse based on US-ASCII values but are transparent to other values.

Contrast this with an errant implementation such as those found in the
affected JVMs, where this assumption is turned on its head. Multiply the
cases affected by this error, both into and out of the filesystem and other
resources used by a given Java-based service. It becomes critical that all
evaluation occur after that translation, and none before the string
becomes Unicode.
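The decode-first ordering can be sketched as follows, assuming a strict java.nio decoder (CodingErrorAction.REPORT) and a simple ".." segment rule standing in for real access control; DecodeThenAuthorize, RequestRejected and authorize are hypothetical names. Because the traversal check runs on the decoded string, the same string the endpoint will act on, an overlong '.' that slipped through a flawed runtime's decoder would still be caught by the segment check instead of being evaluated as raw octets.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

final class DecodeThenAuthorize {

    /** Hypothetical exception signalling that the connection should be terminated. */
    static final class RequestRejected extends RuntimeException {
        RequestRejected(String reason) { super(reason); }
    }

    /** Step 1: translate octets to Unicode, refusing anything reported as malformed. */
    static String decodeStrict(byte[] raw) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            throw new RequestRejected("malformed UTF-8");
        }
    }

    /** Step 2: evaluate only the decoded string, never the raw octets. */
    static String authorize(byte[] rawPath) {
        String path = decodeStrict(rawPath);
        if (path.indexOf('\r') >= 0 || path.indexOf('\n') >= 0) {
            throw new RequestRejected("control character in path");
        }
        for (String segment : path.split("/")) {
            if (segment.equals("..")) {
                throw new RequestRejected("path traversal");
            }
        }
        return path;
    }

    public static void main(String[] args) {
        byte[] ok = {'/', 'a', 'p', 'p', '/'};
        byte[] overlong = {'/', (byte) 0xC0, (byte) 0xAE, (byte) 0xC0, (byte) 0xAE, '/'};
        System.out.println("allowed: " + authorize(ok));
        try {
            authorize(overlong);
        } catch (RequestRejected e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

On a corrected runtime the overlong request fails in step 1; on an affected runtime it would decode to "/../" and fail in step 2. Either way no decision is taken on the undecoded bytes.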

References:
[1] OuTian, "Tomcat - Unicode decoding directory traversal vulnerability"
<http://outian.org/tomcat.pdf>

[2] Ryeo, S., "Directory Traversal Vulnerability"
<https://issues.apache.org/bugzilla/show_bug.cgi?id=45417>

[3] Sun Microsystems, "Java SE 6 Update 11 Release Notes"
<http://java.sun.com/javase/6/webnotes/6u11.html>

[4] Maucherat, R., "Additional normalization check"
<http://svn.apache.org/viewvc?rev=678137&view=rev>

[5] Thomas, M., "Additional normalization check"
<http://svn.apache.org/viewvc?rev=681029&view=rev>

[6] Thomas, M., "Additional normalization check"
<http://svn.apache.org/viewvc?rev=681065&view=rev>

[7] Maucherat, R., "[ANN] Apache Tomcat 6.0.18 released"

[8] "Tomcat Security Pages"
<http://tomcat.apache.org/security.html>

[9] Ryeo, S., "Apache Tomcat Directory Traversal Vulnerability"
<http://www.securityfocus.com/archive/1/495318/30/0/threaded>

[10] Yergeau, F., "UTF-8, a transformation format of ISO 10646" (RFC 3629)
<http://www.ietf.org/rfc/rfc3629.txt>

[11] Yergeau, F., "UTF-8, a transformation format of ISO 10646" (RFC 2279)
<http://www.ietf.org/rfc/rfc2279.txt>

[12] Yergeau, F., "UTF-8, a transformation format of ISO 10646" (RFC 2044)
<http://www.ietf.org/rfc/rfc2044.txt>

[13] Fielding, R., et al., "HTTP/1.1" (RFC 2616)
<http://www.ietf.org/rfc/rfc2616.txt>

[14] Berners-Lee, T., R. Fielding, and L. Masinter, "URI Generic Syntax" (RFC 2396)
<http://www.ietf.org/rfc/rfc2396.txt>

[15] Linhart, C., A. Klein, R. Heled, and S. Orrin, "HTTP Request Smuggling"
<http://www.cgisecurity.com/lib/HTTP-Request-Smuggling.pdf>

ADDITIONAL INFORMATION

The information has been provided by William A. Rowe, Jr.
<mailto:wrowe@rowe-clan.net>.

========================================

DISCLAIMER:
The information in this bulletin is provided "AS IS" without warranty of any kind.
In no event shall we be liable for any damages whatsoever including direct, indirect, incidental, consequential, loss of business profits or special damages.

Saturday, January 10, 2009
