Showing posts with label dotnet-httpclient. Show all posts

Thursday, October 4, 2018

HttpClient.GetAsync times out when connected to VPN


In C# on .NET Framework 4.5.2, the HttpClient.GetAsync() method works fine on Windows 10 when the system is not using a VPN.

When the VPN is connected, the HttpClient.GetAsync() call to the same address just blocks until it times out. Both Edge and Chrome have no issues accessing that same address.

Is there a way to see what is happening? What is HttpClient doing differently?

Update: I got some interesting clues by calling Dns.GetHostEntry(). Without the VPN, this call returned only IPv4 addresses, all of which could be connected to. With the VPN client connected, Dns.GetHostEntry() returned additional IPv6 addresses at the top of the list. Connections to all IPv6 addresses timed out, but all IPv4 ones still worked OK. Now, is there a way to figure out, without trying to connect, which addresses work and which ones do not?

2 Answers

Answers 1

In my experience, this sounds like a VPN/firewall issue. One quick thing to toggle in Windows: under your VPN adapter properties, try unchecking "Use default gateway on remote network". I know it sounds like a long shot, but I have had this problem in the past.

Answers 2

I have to answer this myself, as this problem has a simple cause but very confusing symptoms.

The root cause:

DNS reports only IPv4 addresses for the host when the system is not connected to the VPN. All IPv4 addresses are usable.

When the VPN connection is active, DNS returns IPv6 addresses in addition to IPv4. The IPv4 addresses are still accessible, but the IPv6 ones are not.

The cause of such an invalid network configuration is still a mystery that deserves its own separate post.
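
To see this on a given machine, the resolved addresses can be listed together with their address family. The following is only a minimal sketch: the class and method names (DnsReport, Describe, DescribeHost) are made up for illustration, and the code only labels what DNS returns; it does not test whether an address is reachable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper, not part of the original answer.
public static class DnsReport
{
    // Labels each address with its family so IPv6 entries stand out.
    // Anything that is not IPv6 is labelled IPv4, which is an assumption
    // that holds for the usual results of Dns.GetHostEntry().
    public static IList<string> Describe(IEnumerable<IPAddress> addresses)
    {
        return addresses
            .Select(a => (a.AddressFamily == AddressFamily.InterNetworkV6 ? "IPv6 " : "IPv4 ") + a)
            .ToList();
    }

    // Resolves the host and returns the labelled addresses in DNS order.
    public static IList<string> DescribeHost(string host)
    {
        return Describe(Dns.GetHostEntry(host).AddressList);
    }
}
```

Running DescribeHost for the same host with and without the VPN should make the extra IPv6 entries visible.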

Confusing part:

Some apps work no matter what the VPN connection status is.

"But a web browser can connect to the same host with or without the VPN." True. Browsers may use the Happy Eyeballs approach, attempting to connect using both IPv4 and IPv6 at the same time.

"But my old app has no problems connecting." Also true. Some older and not-so-old apps use IPv4 by default. Support for IPv6 or IPv4+IPv6 has to be explicitly implemented.

"But it works sometimes." This happens when VPN connections are not reliable. It leads to all sorts of "solutions" that are mere coincidences.

What exactly is happening:

HttpClient.GetAsync() uses the default DNS resolution and can connect using both IPv4 and IPv6 addresses. It does not discriminate between them, and there is no direct way to influence protocol selection. If DNS returns an inaccessible address, HttpClient may try to connect to it, resulting in a timeout.

Possible workarounds:

The best: ask IT to fix the IPv6 DNS issues. DNS should not report inaccessible addresses.

Good: implement the Happy Eyeballs approach. Connect to both the IPv6 and IPv4 host addresses using a numeric IP instead of automatic resolution by host name.

OK: always connect over IPv4 using a numeric IP.

Here is a piece of code that shows how to connect to a specific IP address:

    using System;
    using System.Linq;
    using System.Net;
    using System.Net.Sockets;

    // Get DNS entries for the host.
    var hostEntry = Dns.GetHostEntry(uri.Host);

    // Get IPv4 address (First() throws if DNS returned none)
    var ip4 = hostEntry.AddressList.First(addr => addr.AddressFamily == AddressFamily.InterNetwork);
    // Build URI with numeric IPv4
    var uriBuilderIP4 = new UriBuilder(uri);
    uriBuilderIP4.Host = ip4.ToString();
    var uri4 = uriBuilderIP4.Uri;

    // Get IPv6 address (First() throws if DNS returned none)
    var ip6 = hostEntry.AddressList.First(addr => addr.AddressFamily == AddressFamily.InterNetworkV6);
    // Build URI with numeric IPv6
    var uriBuilderIP6 = new UriBuilder(uri);
    uriBuilderIP6.Host = $"[{ip6}]";
    var uri6 = uriBuilderIP6.Uri;

For HTTPS connections, numeric addresses work only if the request carries a "Host" header containing the name of the host (not an IP address). Here is the way to add it:

    var client = new HttpClient();
    // Add "Host" header with the real host name, e.g. stackoverflow.com
    client.DefaultRequestHeaders.Add("Host", uri.Host);
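
The "Good" workaround can be sketched roughly as follows. This is a simplified race in the spirit of Happy Eyeballs, not an implementation of the full algorithm (there is no staggered start or address-family preference): every address is tried at once and the first one that succeeds wins. The names AddressRace, FirstReachableAsync and TcpAttempt are hypothetical, and using a plain TCP connect with a timeout as the reachability check is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original answer.
public static class AddressRace
{
    // Starts attempt(address) for every address at once and returns the first
    // address whose attempt reports success, or null if none succeed.
    public static async Task<IPAddress> FirstReachableAsync(
        IEnumerable<IPAddress> addresses,
        Func<IPAddress, Task<bool>> attempt)
    {
        var pending = new Dictionary<Task<bool>, IPAddress>();
        foreach (var address in addresses)
            pending[TryAsync(attempt, address)] = address;

        while (pending.Count > 0)
        {
            var done = await Task.WhenAny(pending.Keys);
            if (await done) return pending[done];
            pending.Remove(done);
        }
        return null;
    }

    // Treats any exception from the attempt as "not reachable".
    private static async Task<bool> TryAsync(Func<IPAddress, Task<bool>> attempt, IPAddress address)
    {
        try { return await attempt(address); }
        catch (Exception) { return false; }
    }

    // Builds an attempt that opens a TCP connection to the given port and
    // gives up after the timeout (assumed check; adjust as needed).
    public static Func<IPAddress, Task<bool>> TcpAttempt(int port, TimeSpan timeout)
    {
        return async address =>
        {
            using (var tcp = new TcpClient(address.AddressFamily))
            {
                var connect = tcp.ConnectAsync(address, port);
                var finished = await Task.WhenAny(connect, Task.Delay(timeout));
                if (finished != connect) return false; // timed out
                await connect; // rethrows a connection failure, handled in TryAsync
                return true;
            }
        };
    }
}
```

A call such as AddressRace.FirstReachableAsync(hostEntry.AddressList, AddressRace.TcpAttempt(443, TimeSpan.FromSeconds(3))) would return an address that can then be placed into the numeric URI as in the snippet above. Passing the attempt as a delegate keeps the race logic independent of how reachability is actually checked.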

Tuesday, March 15, 2016

How to handle reCaptcha on a third-party site in my client application


I was curious about how people build third-party apps for sites with NO public APIs, but I could not really find any tutorials on this topic. So I decided to just give it a try. I created a simple desktop application, which uses HttpClient to send GET requests to the site I frequently use, and then parses the response and displays the data in my WPF window. This approach worked pretty well (probably because the site is fairly simple).

However, today I tried to run my application from a different place, and I kept getting 403 errors in response to my application's requests. It turned out that the network I was using went through a VPN server, while the site I was trying to access used CloudFlare as a protection layer, which apparently forces VPN users to solve a reCaptcha in order to access the target site.

    var baseAddress = new Uri("http://www.cloudflare.com");
    using (var client = new HttpClient() { BaseAddress = baseAddress })
    {
        var message = new HttpRequestMessage(HttpMethod.Get, "/");
        // this line returns the CloudFlare home page if I use a regular network
        // and the reCaptcha page when I use the VPN
        var result = await client.SendAsync(message);
        // this line throws if I use the VPN (403 Forbidden)
        result.EnsureSuccessStatusCode();
    }

Now the question is: what is the proper way to deal with CloudFlare protection in a client application? Do I have to display the reCaptcha in my application just like the web browser does? Do I have to set any particular headers in order to get a proper response instead of 403? Any tips are welcome, as this is a completely new area to me.

P.S. I write in C# because this is the language I'm most comfortable with, but I don't mind answers using any other language as long as they answer the question.

2 Answers

Answers 1

I guess one way to go about it is to handle the captcha in a web browser, outside the client application.

  1. Parse the response to see if it is a captcha page.
  2. If it is, open this page in a browser.
  3. Let the user solve the captcha there.
  4. Fetch the CloudFlare cookies from the browser's cookie storage. You are going to need __cfduid (user ID) and cf_clearance (proof of solving the captcha).
  5. Attach those cookies to the requests sent by the client application.
  6. Use the application as normal for the next 24 hours (until the CloudFlare cookies expire).

Now the hard part here is (4). It's easy to manually copy-paste the cookies to make the code snippet in my question work with VPN:

    var baseAddress = new Uri("http://www.cloudflare.com");
    var cookieContainer = new CookieContainer();
    using (var client = new HttpClient(new HttpClientHandler() { CookieContainer = cookieContainer }, true) { BaseAddress = baseAddress })
    {
        var message = new HttpRequestMessage(HttpMethod.Get, "/");
        // I've also copy-pasted all the headers from the browser;
        // some of those might be optional
        message.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0");
        message.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        message.Headers.Add("Accept-Encoding", "gzip, deflate");
        message.Headers.Add("Accept-Language", "en-US;q=0.5,en;q=0.3");
        // adding CloudFlare cookies
        cookieContainer.Add(new Cookie("__cfduid", "copy-pasted-cookie-value", "/", "cloudflare.com"));
        cookieContainer.Add(new Cookie("cf_clearance", "copy-pasted-cookie-value", "/", "cloudflare.com"));
        var result = await client.SendAsync(message);
        result.EnsureSuccessStatusCode();
    }

But I think it's going to be a tricky task to automate the process of fetching the cookies, because different browsers store cookies in different places and/or formats. Not to mention the fact that you need an external browser for this approach to work, which is really annoying. Still, it's something to consider.
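
Steps 1 and 2 of the list above could look roughly like the sketch below. It is a heuristic under stated assumptions: the 403 status comes from the question, but the "captcha" marker in the body is only a guess at what the challenge page contains, and the names ChallengeDetector, LooksLikeCaptchaPage and OpenInBrowser are made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Net;

// Hypothetical helper, not part of the original answer.
public static class ChallengeDetector
{
    // Step 1: guess whether a response is the captcha page.
    // Assumption: the page is served with 403 and mentions "captcha" in its HTML.
    public static bool LooksLikeCaptchaPage(HttpStatusCode status, string body)
    {
        return status == HttpStatusCode.Forbidden
            && body != null
            && body.IndexOf("captcha", StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Step 2: open the page in the user's default browser so the captcha
    // can be solved there.
    public static void OpenInBrowser(Uri page)
    {
        Process.Start(new ProcessStartInfo(page.AbsoluteUri) { UseShellExecute = true });
    }
}
```

In the snippet from the question, the check would go before EnsureSuccessStatusCode(), using result.StatusCode and the string read from result.Content.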

Answers 2

The answer to "build third-party apps for sites with NO public APIs" is that even though some software vendors don't have public APIs, they have partner programs.

A good example is Netflix, which used to have a public API. Some of the apps developed while the public API was available were allowed to continue using it.

In your scenario, your client app acts as a web crawler (downloading HTML content and trying to parse information from it). What you are trying to do is crawl CloudFlare data that is not meant to be crawled by a third-party app (bot). From the CloudFlare side, they have done the correct thing by having a captcha that prevents automated requests.

Further, if you try to send requests at a high frequency (requests/sec), and if CloudFlare has threat detection mechanisms, your IP address will be blocked. I assume that they have already identified the IP address of the VPN server you are trying to use and blacklisted it, and that is why you are getting a 403.

Basically, you depend solely on security holes in the CloudFlare pages you try to access via the client app. This is a sort of hacking of CloudFlare (doing something CloudFlare has restricted), which I would not recommend.

If you have a cool idea, it is better to contact their developer team and discuss it with them.
