When using the URIs
String myUri = "https://evil.example.com\\.good.example.org/";
// or
String myUri = "https://evil.example.com\\@good.example.org/";
in Java on Android, the backslash in the host or user information of the authority part of the URI causes a mismatch between how Android’s android.net.Uri and android.webkit.WebView parse the URI with regard to its host.
- The Uri class (and cURL) treat evil.example.com\.good.example.org (first example) or even good.example.org (second example) as the URI’s host.
- The WebView class (and Firefox and Chrome) treat evil.example.com (both examples) as the URI’s host.
Is this known, expected or correct behavior? Do the two classes simply follow different standards?
Looking at the specification, it seems neither RFC 2396 nor RFC 3986 allows for a backslash in the user information or authority.
Is there any workaround to ensure a consistent behavior here, especially for validation purposes? Does the following patch look reasonable (to be used with WebView and for general correctness)?
Uri myParsedUri = Uri.parse(myUri);
if ((myParsedUri.getHost() == null || !myParsedUri.getHost().contains("\\"))
        && (myParsedUri.getUserInfo() == null || !myParsedUri.getUserInfo().contains("\\"))) {
    // valid URI
} else {
    // invalid URI
}
One possible flaw is that this workaround may not catch all the cases that cause inconsistent hosts to be parsed. Do you know of anything else (apart from a backslash) that causes a mismatch between the two classes?
2 Answers
Answer 1
It is known that the Android 4.4 WebView converts some URLs; the linked issue describes some steps to prevent that. It is not completely clear from your question whether your need is based on that issue or on something else.
You can mask backslashes and other characters with their number from the character table. In URLs, the number is written in hexadecimal.
Hexadecimal: 5C Decimal: 92 Character: \
The code is then prefixed with a % for each character in the URL. Because the Java literal "\\" denotes a single backslash, each one becomes a single %5C, and your code looks like this after replacement:
String myUri = "https://evil.example.com%5C.good.example.org/";
// or
String myUri = "https://evil.example.com%5C@good.example.org/";
It might still be necessary to add a slash to separate the domain from the path:
String myUri = "https://evil.example.com/%5C.good.example.org/";
// or
String myUri = "https://evil.example.com/%5C@good.example.org/";
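The replacement described above can be sketched in Java as follows. This is only a minimal illustration: the class and method names are hypothetical, and it handles nothing but the backslash. Whether Uri and WebView then agree on the host would still need to be checked on the target devices.

```java
final class BackslashEscaper {
    private BackslashEscaper() {}

    // Replaces every literal backslash with its percent-encoded form, %5C.
    // Assumption: no other characters need masking for this illustration.
    static String escapeBackslashes(String rawUri) {
        return rawUri.replace("\\", "%5C");
    }
}
```

Note that the Java literal "\\" passed to replace is a single backslash character, so one backslash in the input yields exactly one %5C in the output.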
Is it possible that the backslashes are never meant to be used for network communication at all, but only serve as escaping for steps such as regular expressions or output in JavaScript (JSON)?
Bonus ;-)
Below is a PHP script that prints table rows for the code points 0000 to FFFF with the corresponding numbers in hex and dec (it should still be wrapped in an HTML template, perhaps including CSS):
<?php
// Hex digits used to build every four-digit code from 0000 to FFFF.
$chs = array('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F');
$chs2 = $chs;
$chs3 = $chs;
$chs4 = $chs;
foreach ($chs as $ch) {
    foreach ($chs2 as $ch2) {
        foreach ($chs3 as $ch3) {
            foreach ($chs4 as $ch4) {
                echo '<tr>';
                echo '<td>';
                echo $ch.$ch2.$ch3.$ch4;              // hexadecimal code
                echo '</td>';
                echo '<td>';
                echo hexdec($ch.$ch2.$ch3.$ch4);      // decimal value
                echo '</td>';
                echo '<td>';
                echo '&#x'.$ch.$ch2.$ch3.$ch4.';';    // the character as an HTML entity
                echo '</td>';
                echo '</tr>';
            }
        }
    }
}
?>
Answer 2
Is this known, expected or correct behavior?
IMO, it is not, for both URI and WebView. Because the RFCs do not allow a backslash, they could have warned about it. However, this matters less, because it does not affect normal operation at all if the input is as expected.
Do the two classes simply follow different standards?
The URI class and WebView strictly follow the same standards. But because they are different implementations, they may behave differently on unexpected input.
For example, "^(([^:/?#]+):)?((//([^/?#]*))?([^?#]*)(\\?([^#]*))?)?(#(.*))?" is the regular expression in URI that is used to parse URIs. The URI parsing of WebView is done by native C++ methods. Even though they follow the same standards, there is a chance that they produce different results (at least for unexpected input).
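To see why a purely pattern-based split accepts the backslash, the expression quoted above can be applied directly with java.util.regex. The sketch below is only an illustration: the class and method names are hypothetical, and taking the host as "everything after the last @" is a simplifying assumption (ports are ignored), not a description of how any Android class is actually implemented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GenericUriSplit {
    // The expression quoted in the answer; group 5 captures the authority.
    private static final Pattern GENERIC = Pattern.compile(
            "^(([^:/?#]+):)?((//([^/?#]*))?([^?#]*)(\\?([^#]*))?)?(#(.*))?");

    private GenericUriSplit() {}

    // Returns the authority component, or null if there is none.
    static String authorityOf(String uri) {
        Matcher m = GENERIC.matcher(uri);
        return m.matches() ? m.group(5) : null;
    }

    // Assumption for illustration: the host is whatever follows the last '@'.
    static String naiveHostOf(String uri) {
        String authority = authorityOf(uri);
        if (authority == null) {
            return null;
        }
        return authority.substring(authority.lastIndexOf('@') + 1);
    }
}
```

Because the authority group only stops at /, ? or #, the backslash stays inside it. With the two URIs from the question, naiveHostOf yields evil.example.com\.good.example.org and good.example.org, which matches what the question reports for Uri, whereas a parser that treats the backslash like a slash would stop at evil.example.com.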
Does the following patch look reasonable?
Not really (see the answer to the next question).
Do you know of anything else (apart from a backslash) that causes a mismatch between the two classes?
Because you are so concerned about consistent behavior, I won't suggest manual validation. Even the programmers who wrote these classes can't list all such scenarios.
The solution
If I understand correctly, you need to load URLs that are supplied by untrusted external sources (which attackers can exploit if there is a loophole), but you need to identify their host correctly.
In that case, you can parse the URL with the URI class itself and use URI#getHost() to identify the host. For WebView, instead of passing the original URL string, pass URI#toString().
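A minimal sketch of this parse-once idea, assuming that "URI" here means java.net.URI (on Android, android.net.Uri offers getHost() and toString() as well, but is more lenient). The class name, the method name and the allowlist parameter are hypothetical and only serve the illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

final class SingleParseGate {
    private SingleParseGate() {}

    // Parses the URL once, checks the parsed host against an allowlist and
    // returns the string form of the parsed URI, or null if it is rejected.
    static String urlToLoadOrNull(String rawUrl, Set<String> allowedHosts) {
        final URI uri;
        try {
            uri = new URI(rawUrl); // throws on illegal characters such as '\'
        } catch (URISyntaxException e) {
            return null;
        }
        String host = uri.getHost(); // null if no server-based host was parsed
        if (host == null || !allowedHosts.contains(host.toLowerCase(Locale.ROOT))) {
            return null;
        }
        return uri.toString();
    }
}
```

On Android, the non-null result would then be handed to WebView.loadUrl(...) instead of the original string. With java.net.URI, both URIs from the question are rejected outright, because a backslash is not a legal character in the authority.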