Tuesday, March 15, 2016

How to handle reCAPTCHA on a third-party site in my client application


I was curious about how people build third-party apps for sites with NO public API, but I could not find any tutorials on the topic, so I decided to just give it a try. I created a simple desktop application that uses HttpClient to send GET requests to a site I use frequently, then parses the response and displays the data in my WPF window. This approach worked pretty well (probably because the site is fairly simple).

However, today I tried to run my application from a different place, and I kept getting 403 errors in response to my application's requests. It turned out that the network I was using went through a VPN server, while the site I was trying to access used CloudFlare as a protection layer, which apparently forces VPN users to solve a reCAPTCHA before they can access the target site.

using System;
using System.Net.Http;

var baseAddress = new Uri("http://www.cloudflare.com");
using (var client = new HttpClient() { BaseAddress = baseAddress })
{
    var message = new HttpRequestMessage(HttpMethod.Get, "/");
    // This returns the CloudFlare home page on a regular network,
    // and the reCAPTCHA page when I use the VPN
    var result = await client.SendAsync(message);
    // This throws when I use the VPN (403 Forbidden)
    result.EnsureSuccessStatusCode();
}

Now the question is: what is the proper way to deal with CloudFlare protection in a client application? Do I have to display the reCAPTCHA in my application, just like a web browser does? Do I have to set any particular headers to get a proper response instead of a 403? Any tips are welcome, as this is a completely new area to me.

P.S. I write in C# because it is the language I'm most comfortable with, but I don't mind answers in any other language as long as they answer the question.

2 Answers

Answer 1

I guess one way to go about it is to handle the captcha in a web browser, outside the client application:

  1. Parse the response to see whether it is a captcha page.
  2. If it is, open this page in a browser.
  3. Let the user solve the captcha there.
  4. Fetch the CloudFlare cookies from the browser's cookie storage. You are going to need __cfduid (user ID) and cf_clearance (proof of solving the captcha).
  5. Attach those cookies to the requests sent by the client application.
  6. Use the application as normal for the next 24 hours (until the CloudFlare cookies expire).
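Steps 1 and 2 could be sketched roughly as follows. This is a minimal sketch under assumptions: the detection is a hypothetical heuristic (a 403 or 503 status plus a few marker strings in the body), and the marker strings are guesses that you would need to adjust after inspecting a real challenge page, since the markup CloudFlare serves is not documented here and may change. `CaptchaHelper`, `LooksLikeChallengePage` and `OpenInBrowser` are made-up names.

```csharp
using System;
using System.Diagnostics;
using System.Net;

public static class CaptchaHelper
{
    // Hypothetical marker strings (assumption): inspect a real challenge page and adjust
    private static readonly string[] Markers =
    {
        "cf-captcha-container",
        "g-recaptcha",
        "Attention Required! | CloudFlare"
    };

    // Step 1: heuristic check for a challenge page (not an official CloudFlare API)
    public static bool LooksLikeChallengePage(HttpStatusCode status, string body)
    {
        // Challenge pages are assumed to come back as 403 or 503, never as 200
        if (status != HttpStatusCode.Forbidden && status != HttpStatusCode.ServiceUnavailable)
            return false;
        if (string.IsNullOrEmpty(body))
            return false;
        foreach (var marker in Markers)
        {
            if (body.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0)
                return true;
        }
        return false;
    }

    // Step 2: hand the page over to the user's default browser
    public static void OpenInBrowser(Uri pageUri)
    {
        // UseShellExecute = true lets the OS choose the default browser for the URL
        Process.Start(new ProcessStartInfo(pageUri.AbsoluteUri) { UseShellExecute = true });
    }
}
```

In the code from the question, you would read the body with `await result.Content.ReadAsStringAsync()` before calling `EnsureSuccessStatusCode()`, and call `OpenInBrowser` only when the check returns true.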

The hard part here is step 4. Copy-pasting the cookies by hand is easy, and it makes the code snippet from my question work over the VPN:

using System;
using System.Net;
using System.Net.Http;

var baseAddress = new Uri("http://www.cloudflare.com");
var cookieContainer = new CookieContainer();
var handler = new HttpClientHandler()
{
    CookieContainer = cookieContainer,
    // Decompress gzip/deflate responses; the handler then sends the matching
    // Accept-Encoding header itself, so it is not added manually below
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
using (var client = new HttpClient(handler, true) { BaseAddress = baseAddress })
{
    var message = new HttpRequestMessage(HttpMethod.Get, "/");
    // I've also copy-pasted all the headers from the browser;
    // some of them might be optional
    message.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0");
    message.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    message.Headers.Add("Accept-Language", "en-US;q=0.5,en;q=0.3");
    // Adding the CloudFlare cookies; the leading dot makes them apply
    // to subdomains such as www.cloudflare.com
    cookieContainer.Add(new Cookie("__cfduid", "copy-pasted-cookie-value", "/", ".cloudflare.com"));
    cookieContainer.Add(new Cookie("cf_clearance", "copy-pasted-cookie-value", "/", ".cloudflare.com"));
    var result = await client.SendAsync(message);
    result.EnsureSuccessStatusCode();
}

But I think it's going to be tricky to automate fetching the cookies, because different browsers store cookies in different places and/or formats. Not to mention that this approach requires an external browser, which is really annoying. Still, it's something to consider.
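A semi-manual middle ground for steps 4 and 5 is to let the user copy the `Cookie` request header from the browser's developer tools and paste it into the app, which then turns it into a `CookieContainer`. This is a minimal sketch, assuming the pasted text has the usual `name=value; name2=value2` shape; `CookieImport` and `FromCookieHeader` are hypothetical names, and it does not read any browser's cookie storage.

```csharp
using System;
using System.Net;

public static class CookieImport
{
    // Parses a pasted "Cookie" request header such as
    // "__cfduid=abc; cf_clearance=def" into a CookieContainer for siteUri
    public static CookieContainer FromCookieHeader(Uri siteUri, string cookieHeader)
    {
        var container = new CookieContainer();
        if (string.IsNullOrWhiteSpace(cookieHeader))
            return container;

        foreach (var part in cookieHeader.Split(';'))
        {
            var pair = part.Trim();
            var eq = pair.IndexOf('=');
            if (eq <= 0)
                continue; // skip empty or malformed fragments
            var name = pair.Substring(0, eq).Trim();
            var value = pair.Substring(eq + 1).Trim();
            // Add(Uri, Cookie) scopes the cookie to the host of siteUri
            container.Add(siteUri, new Cookie(name, value, "/"));
        }
        return container;
    }
}
```

The result can be plugged into `new HttpClientHandler { CookieContainer = ... }` in place of the hand-built container. Sending the same User-Agent as the browser, as in the snippet above, is probably wise too; that the clearance cookie may be tied to it is an assumption I have not verified.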

Answer 2

As for how to "build third-party apps for sites with NO public APIs": even though some software vendors don't offer a public API, they do have partner programs.

A good example is Netflix, which used to have a public API. Some of the apps developed while the public API was available were allowed to keep using it.

In your scenario, your client app acts as a web crawler (downloading HTML content and trying to parse information out of it). What you are trying to do is crawl CloudFlare-protected content, which is not meant to be crawled by a third-party app (a bot). From CloudFlare's side, showing a captcha to block automated requests is the correct thing to do.

Furthermore, if you send requests at a high rate (requests per second) and CloudFlare has threat-detection mechanisms in place, your IP address will be blocked. I assume they have already identified the IP address of the VPN server you are using and blacklisted it, which is why you are getting a 403.
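If you still go ahead with a client like this, one way to lower the chance of tripping rate-based detection is to space requests out on the client side. This is a minimal sketch, assuming a fixed minimum interval between requests is good enough; `RequestThrottle` is a hypothetical class, and CloudFlare does not publish a threshold that any particular interval is guaranteed to stay under.

```csharp
using System;
using System.Threading.Tasks;

public sealed class RequestThrottle
{
    private readonly TimeSpan _minInterval;
    private readonly object _gate = new object();
    private DateTime _lastRequestUtc = DateTime.MinValue;

    public RequestThrottle(TimeSpan minInterval)
    {
        _minInterval = minInterval;
    }

    // How long to wait so that at least minInterval separates two requests
    public static TimeSpan ComputeDelay(DateTime lastUtc, DateTime nowUtc, TimeSpan minInterval)
    {
        var elapsed = nowUtc - lastUtc;
        return elapsed >= minInterval ? TimeSpan.Zero : minInterval - elapsed;
    }

    // Await this before each HttpClient call
    public async Task WaitAsync()
    {
        TimeSpan delay;
        lock (_gate)
        {
            var now = DateTime.UtcNow;
            delay = ComputeDelay(_lastRequestUtc, now, _minInterval);
            // Reserve this slot so concurrent callers queue up behind it
            _lastRequestUtc = now + delay;
        }
        if (delay > TimeSpan.Zero)
            await Task.Delay(delay);
    }
}
```

Usage would be `await throttle.WaitAsync();` right before `client.SendAsync(message)`, with one `RequestThrottle` shared by all requests to the same site.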

Basically, your approach depends entirely on gaps in the protection of the CloudFlare pages your client app accesses. This amounts to a sort of hacking of CloudFlare (doing something CloudFlare has deliberately restricted), which I would not recommend.

If you have a cool idea, it is better to contact their developer team and discuss it with them.
