When using the URIs

    String myUri = "https://evil.example.com\\.good.example.org/";
    // or
    String myUri = "https://evil.example.com\\@good.example.org/";

in Java on Android, the backslash in the host or user information of the authority part of the URI causes a mismatch between how Android’s android.net.Uri and android.webkit.WebView parse the URI with regard to its host.

The Uri class (and cURL) treat evil.example.com\.good.example.org (first example) or even good.example.org (second example) as the URI’s host.
The WebView class (and Firefox and Chrome) treat evil.example.com (both examples) as the URI’s host.
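To make the two readings concrete, here is a rough sketch in plain Java (no Android classes) of two ways of splitting off the host. The class and method names are hypothetical, and neither method is the actual android.net.Uri or WebView code. genericHost treats the backslash as an ordinary character and takes the user info to end at the last '@'. browserStyleHost assumes, as the WHATWG URL Standard implemented by browsers does for special schemes such as http and https, that a backslash acts like a slash and therefore ends the authority.

```java
public final class HostReadings {

    // Cuts "scheme://authority..." down to the authority: everything up to the first '/', '?' or '#'.
    private static String authorityOf(String url) {
        int start = url.indexOf("://");
        if (start < 0) {
            throw new IllegalArgumentException("expected scheme://authority: " + url);
        }
        start += 3;
        int end = start;
        while (end < url.length() && "/?#".indexOf(url.charAt(end)) < 0) {
            end++;
        }
        return url.substring(start, end);
    }

    // Drops "userinfo@" (split at the last '@') and ":port". Sketch only: ignores IPv6 literals.
    private static String hostOf(String authority) {
        String hostAndPort = authority.substring(authority.lastIndexOf('@') + 1);
        int colon = hostAndPort.indexOf(':');
        return colon < 0 ? hostAndPort : hostAndPort.substring(0, colon);
    }

    // Generic reading: the backslash is just another character in the authority.
    static String genericHost(String url) {
        return hostOf(authorityOf(url));
    }

    // Browser-style reading (http/https assumed): a backslash is treated like '/'.
    static String browserStyleHost(String url) {
        return hostOf(authorityOf(url.replace('\\', '/')));
    }

    public static void main(String[] args) {
        String[] inputs = {
            "https://evil.example.com\\.good.example.org/",
            "https://evil.example.com\\@good.example.org/"
        };
        for (String url : inputs) {
            System.out.println(genericHost(url) + "  vs  " + browserStyleHost(url));
        }
    }
}
```

For the two inputs from the question, the generic reading yields evil.example.com\.good.example.org and good.example.org, while the browser-style reading yields evil.example.com both times, which matches the split described above.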

Is this known, expected or correct behavior? Do the two classes simply follow different standards?

Looking at the specification, it seems neither RFC 2396 nor RFC 3986 allows for a backslash in the user information or authority.
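As a point of comparison outside Android, the standard java.net.URI class, whose Javadoc says it follows RFC 2396 (as amended by RFC 2732), refuses a raw backslash in the authority with a URISyntaxException. A minimal sketch of using that as a strict pre-check; the class and method names are made up for illustration, and whether passing this check implies that WebView agrees on the host is an assumption, not something verified here.

```java
import java.net.URI;
import java.net.URISyntaxException;

public final class StrictUriCheck {

    // True only if java.net.URI (an RFC 2396 parser) accepts the string
    // AND finds a server-based authority, i.e. getHost() is not null.
    static boolean isStrictlyParseable(String candidate) {
        try {
            URI uri = new URI(candidate);
            return uri.getHost() != null;
        } catch (URISyntaxException e) {
            // e.g. "Illegal character in authority" for a raw backslash
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isStrictlyParseable("https://evil.example.com\\.good.example.org/"));
        System.out.println(isStrictlyParseable("https://evil.example.com\\@good.example.org/"));
        System.out.println(isStrictlyParseable("https://good.example.org/"));
    }
}
```

The getHost() check matters because java.net.URI falls back to a registry-based authority (with a null host) for inputs such as https://evil.example.com%5C.good.example.org/ instead of throwing.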

Is there any workaround to ensure a consistent behavior here, especially for validation purposes? Does the following patch look reasonable (to be used with WebView and for general correctness)?

    Uri myParsedUri = Uri.parse(myUri);

    if ((myParsedUri.getHost() == null || !myParsedUri.getHost().contains("\\"))
            && (myParsedUri.getUserInfo() == null || !myParsedUri.getUserInfo().contains("\\"))) {
        // valid URI
    } else {
        // invalid URI
    }

One possible flaw is that this workaround may not catch all the cases that cause inconsistent hosts to be parsed. Do you know of anything else (apart from a backslash) that causes a mismatch between the two classes?
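One way to generalize the backslash check, rather than enumerating bad characters one by one, is an allow-list on the raw string before any parser sees it: accept the authority only if every character is one RFC 3986 permits there (unreserved, percent-encoded, sub-delims, plus ':', '@', '[' and ']', the last two loosely allowed anywhere) and it contains at most one '@', since user info itself may not contain '@'. The helper below is hypothetical; it does not prove that Uri and WebView will then agree, it only rules out inputs outside the grammar both are meant to implement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class AuthorityAllowList {

    // scheme "://" authority, where the authority runs up to the first '/', '?' or '#'.
    private static final Pattern AUTHORITY =
            Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]*://([^/?#]*)");

    // RFC 3986 characters for userinfo, host and port:
    // unreserved / pct-encoded / sub-delims / ":" / "@" / "[" / "]".
    private static final Pattern ALLOWED_CHARS =
            Pattern.compile("(?:[A-Za-z0-9\\-._~!$&'()*+,;=:@\\[\\]]|%[0-9A-Fa-f]{2})*");

    static boolean hasCleanAuthority(String uri) {
        Matcher m = AUTHORITY.matcher(uri);
        if (!m.find()) {
            return false; // no "scheme://authority" prefix at all
        }
        String authority = m.group(1);
        // More than one '@' makes the user info / host split ambiguous.
        long atSigns = authority.chars().filter(c -> c == '@').count();
        return atSigns <= 1 && ALLOWED_CHARS.matcher(authority).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "https://evil.example.com\\.good.example.org/",
            "https://evil.example.com\\@good.example.org/",
            "https://a@b@good.example.org/",
            "https://user:pw@good.example.org:8443/x"
        };
        for (String input : inputs) {
            System.out.println(hasCleanAuthority(input) + "  " + input);
        }
    }
}
```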

2 Answers

Answer 1

It is known that the Android 4.4 WebView converts some URLs; the linked issue describes some steps to prevent that. From your question it is not completely clear whether your need is based on that issue or on something else.

You can escape backslashes and other characters with their corresponding number in the character table. In URLs, that number is written in hexadecimal.

Hexadecimal: 5C, decimal: 92, character: \

In the URL, each code is prefixed with a % sign. Note that the Java literal "\\" is a single backslash, so each one becomes a single %5C. After the replacement, your code looks like this:

    String myUri = "https://evil.example.com%5C.good.example.org/";
    // or
    String myUri = "https://evil.example.com%5C@good.example.org/";

It might still be necessary to add a slash to separate the domain from the path:

    String myUri = "https://evil.example.com/%5C.good.example.org/";
    // or
    String myUri = "https://evil.example.com/%5C@good.example.org/";
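The encoding and the replacement can be sketched in plain Java. The class and helper names are hypothetical, and java.net.URI stands in for android.net.Uri (an assumption, so the sketch runs off-device). It derives %5C from the backslash's code point rather than hard-coding it, then checks where the escaped backslash ends up once the extra slash is in place.

```java
import java.net.URI;

public final class EscapeBackslash {

    // '\\' is one character with code point 92 (0x5C), so "%%%02X" formats it as "%5C".
    static final String ENCODED_BACKSLASH = String.format("%%%02X", (int) '\\');

    // Replaces every literal backslash with its percent-encoded form.
    static String escapeBackslashes(String s) {
        return s.replace("\\", ENCODED_BACKSLASH);
    }

    public static void main(String[] args) {
        System.out.println((int) '\\');
        System.out.println(ENCODED_BACKSLASH);

        // Variant with the extra slash between host and path.
        String escaped = escapeBackslashes("https://evil.example.com/\\.good.example.org/");
        URI uri = URI.create(escaped); // unchecked IllegalArgumentException on bad input
        System.out.println(escaped);
        System.out.println(uri.getHost() + " | " + uri.getRawPath());
    }
}
```

Run as is, it prints 92, %5C, the escaped URL, and evil.example.com | /%5C.good.example.org/. Without the extra slash, java.net.URI cannot read evil.example.com%5C.good.example.org as a host name and reports no host at all, which may be one reason the slash is needed.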

Is it possible that the backslashes are never meant to be used for network communication at all, but only serve as escapes in other steps, such as regular expressions or output in JavaScript (JSON)?

Bonus ;-)
Below is a PHP script that prints table rows for every code point from U+0000 to U+FFFF (the Basic Multilingual Plane), with the corresponding numbers in hex and decimal. It still needs to be wrapped in an HTML template with a table element, and perhaps some CSS:

    <?php
    // One row per code point U+0000..U+FFFF: hex value, decimal value, rendered character.
    for ($i = 0; $i <= 0xFFFF; $i++) {
        $hex = sprintf('%04X', $i);
        echo '<tr>';
        echo '<td>' . $hex . '</td>';
        echo '<td>' . $i . '</td>';
        echo '<td>&#x' . $hex . ';</td>';
        echo '</tr>';
    }
    ?>

Answer 2

Is this known, expected or correct behavior?

IMO, it is not, for both Uri and WebView. Because the RFCs don't allow a backslash, they could have warned about it. However, this matters less, because it doesn't affect their behavior at all when the input is as expected.

Do the two classes simply follow different standards?

The Uri class and WebView strictly follow the same standards. But because they are different implementations, they may behave differently on unexpected input.

For example, "^(([^:/?#]+):)?((//([^/?#]*))?([^?#]*)(\\?([^#]*))?)?(#(.*))?" is the regular expression the Uri class uses to parse URIs, whereas WebView parses URIs in native C++ code. Even though both follow the same standards, they can give different results, at least for unexpected input.
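To see what that regular expression does with the question's input, the sketch below runs it unchanged through java.util.regex. Group 5 is the authority; because [^/?#]* happily matches a backslash, all of evil.example.com\.good.example.org lands there. This only illustrates the pattern quoted above; the class and method names are hypothetical, and it is not the Uri class's own code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class UriRegexDemo {

    // The pattern quoted above, as a Java string literal.
    static final Pattern URI_PATTERN =
            Pattern.compile("^(([^:/?#]+):)?((//([^/?#]*))?([^?#]*)(\\?([^#]*))?)?(#(.*))?");

    // Returns the authority (group 5), or null if the input has none.
    static String authority(String uri) {
        Matcher m = URI_PATTERN.matcher(uri);
        return m.matches() ? m.group(5) : null;
    }

    public static void main(String[] args) {
        String uri = "https://evil.example.com\\.good.example.org/";
        Matcher m = URI_PATTERN.matcher(uri);
        if (m.matches()) {
            System.out.println("scheme    = " + m.group(2));
            System.out.println("authority = " + m.group(5));
            System.out.println("path      = " + m.group(6));
        }
    }
}
```

Any host extraction built on top of this split (for example, cutting the user info at an '@') therefore starts from an authority that still contains the backslash.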

Does the following patch look reasonable?

Not really (see the answer to the next question).

Do you know of anything else (apart from a backslash) that causes a mismatch between the two classes?

Because you are so concerned about consistent behavior, I wouldn't suggest manual validation. Even the programmers who wrote these classes couldn't list all such scenarios.

The solution

If I understand correctly, you need to load URLs supplied by untrusted external sources (which attackers can exploit if there is a loophole), and you need to identify their host correctly.

In that case, parse the URL with the Uri class itself and use Uri#getHost() to identify the host. Then, instead of passing the original URL string to WebView, pass Uri#toString().
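A minimal sketch of that idea, using java.net.URI as a stand-in for android.net.Uri (an assumption, so it runs on a plain JVM). The hypothetical helper goes one step further than echoing the string: it rebuilds it from the parsed scheme, host, port, path, query and fragment, so the string handed on can only contain the host that was actually checked. Dropping the user info entirely is a choice of this sketch, not something the answer prescribes.

```java
import java.net.URI;
import java.net.URISyntaxException;

public final class RebuildForWebView {

    // Parses the untrusted input and rebuilds it from its components.
    // Returns null if the input does not parse or has no server-based host.
    static String rebuild(String untrusted) {
        try {
            URI parsed = new URI(untrusted);
            if (parsed.getHost() == null) {
                return null;
            }
            // User info is deliberately dropped; the multi-argument constructor
            // re-quotes any characters that are illegal in each component.
            URI rebuilt = new URI(parsed.getScheme(), null, parsed.getHost(), parsed.getPort(),
                    parsed.getPath(), parsed.getQuery(), parsed.getFragment());
            return rebuilt.toString();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] inputs = {
            "https://evil.example.com\\@good.example.org/",
            "https://user@good.example.org/path?q=1"
        };
        for (String input : inputs) {
            String safe = rebuild(input);
            System.out.println(input + " -> " + (safe == null ? "rejected" : safe));
            // On Android, one would check the host of `safe` and then call webView.loadUrl(safe).
        }
    }
}
```

The first input is rejected outright; the second comes back as https://good.example.org/path?q=1, with the user info removed.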

Coding Question

Sunday, June 17, 2018

URI and WebView classes parsing URLs containing backslashes differently
