I was curious about how people build third-party apps for sites with NO public APIs, but I could not really find any tutorials on this topic. So I decided to just give it a try. I created a simple desktop application, which uses HttpClient to send GET requests to the site I frequently use, and then parses the response and displays the data in my WPF window. This approach worked pretty well (probably because the site is fairly simple).
However, today I tried to run my application from a different place, and I kept getting 403 errors in response to my application's requests. It turned out that the network I was using went through a VPN server, while the site I was trying to access used CloudFlare as a protection layer, which apparently forces VPN users to solve a reCaptcha in order to access the target site.

    var baseAddress = new Uri("http://www.cloudflare.com");
    using (var client = new HttpClient() { BaseAddress = baseAddress })
    {
        var message = new HttpRequestMessage(HttpMethod.Get, "/");
        // this line returns the CloudFlare home page if I use a regular network
        // and the reCaptcha page when I use a VPN
        var result = await client.SendAsync(message);
        // this line throws if I use a VPN (403 Forbidden)
        result.EnsureSuccessStatusCode();
    }

Now the question is: what is the proper way to deal with CloudFlare protection in a client application? Do I have to display the reCaptcha in my application just like a web browser does? Do I have to set any particular headers in order to get a proper response instead of a 403? Any tips are welcome, as this is a completely new area to me.
P.S. I write in C# because this is the language I'm most comfortable with, but I don't mind answers using any other language as long as they answer the question.
2 Answers
Answer 1
I guess one way to go about it is to handle the captcha in a web browser, outside the client application:
1. Parse the response to see if it is a captcha page.
2. If it is, open this page in a browser.
3. Let the user solve the captcha there.
4. Fetch the CloudFlare cookies from the browser's cookie storage. You are going to need __cfduid (user ID) and cf_clearance (proof of solving the captcha).
5. Attach those cookies to requests sent by the client application.
6. Use the application as normal for the next 24 hours (until the CloudFlare cookies expire).
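
The first two steps above could be sketched roughly as follows. This is only a heuristic under stated assumptions: it treats any 403 whose Server header mentions "cloudflare" as a challenge page, which may not hold for every site, and the names CloudFlareChallenge, LooksLikeChallenge and OpenInBrowser are hypothetical helpers, not part of .NET.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Net;
using System.Net.Http;

public static class CloudFlareChallenge
{
    // Assumption: a challenge shows up as 403 Forbidden with a Server header
    // whose product name contains "cloudflare". This is a guess about the
    // response shape, not a documented contract.
    public static bool LooksLikeChallenge(HttpResponseMessage response)
    {
        if (response.StatusCode != HttpStatusCode.Forbidden)
            return false;

        return response.Headers.Server.Any(p =>
            p.Product != null &&
            p.Product.Name.IndexOf("cloudflare", StringComparison.OrdinalIgnoreCase) >= 0);
    }

    // Hands the URL to the operating system so the default browser opens it.
    public static void OpenInBrowser(Uri uri)
    {
        Process.Start(new ProcessStartInfo(uri.AbsoluteUri) { UseShellExecute = true });
    }
}
```

In the snippet from the question, the check would go before EnsureSuccessStatusCode(): if LooksLikeChallenge(result) is true, call OpenInBrowser(baseAddress) and ask the user to solve the captcha instead of letting the request throw.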
Now the hard part here is (4). It's easy to manually copy-paste the cookies to make the code snippet in my question work with VPN:

    var baseAddress = new Uri("http://www.cloudflare.com");
    var cookieContainer = new CookieContainer();
    using (var client = new HttpClient(new HttpClientHandler() { CookieContainer = cookieContainer }, true) { BaseAddress = baseAddress })
    {
        var message = new HttpRequestMessage(HttpMethod.Get, "/");
        // I've also copy-pasted all the headers from the browser;
        // some of those might be optional
        message.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0");
        message.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        message.Headers.Add("Accept-Encoding", "gzip, deflate");
        message.Headers.Add("Accept-Language", "en-US;q=0.5,en;q=0.3");
        // adding CloudFlare cookies
        cookieContainer.Add(new Cookie("__cfduid", "copy-pasted-cookie-value", "/", "cloudflare.com"));
        cookieContainer.Add(new Cookie("cf_clearance", "copy-pasted-cookie-value", "/", "cloudflare.com"));
        var result = await client.SendAsync(message);
        result.EnsureSuccessStatusCode();
    }

But I think it's going to be a tricky task to automate the process of fetching the cookies, because different browsers store cookies in different places and/or formats. Not to mention the fact that you need to use an external browser for this approach to work, which is really annoying. Still, it's something to consider.
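
One possible way to avoid reading each browser's cookie store is to have the user copy the Cookie request header from the browser's developer tools and paste it into the application. The sketch below assumes that workflow and a simple "name=value; name=value" format; CookieImport and FromHeader are hypothetical names, not part of .NET.

```csharp
using System;
using System.Net;

public static class CookieImport
{
    // Builds a CookieContainer from a pasted Cookie header such as
    // "__cfduid=...; cf_clearance=...". Segments without a name or
    // without '=' are skipped rather than treated as errors.
    public static CookieContainer FromHeader(string cookieHeader, string domain)
    {
        var container = new CookieContainer();
        var pairs = cookieHeader.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var rawPair in pairs)
        {
            var pair = rawPair.Trim();
            var separator = pair.IndexOf('=');
            if (separator <= 0)
                continue;

            var name = pair.Substring(0, separator).Trim();
            var value = pair.Substring(separator + 1).Trim();
            if (name.Length == 0)
                continue;

            container.Add(new Cookie(name, value, "/", domain));
        }

        return container;
    }
}
```

The returned container can be assigned to HttpClientHandler.CookieContainer in place of the two hard-coded cookieContainer.Add calls above.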
Answer 2
The answer to "build third-party apps for sites with NO public APIs" is that even though some software vendors don't have a public API, they do have partner programs.
A good example is Netflix: they used to have a public API, and some of the apps developed while the public API was available were allowed to continue using it.
In your scenario, your client app acts as a web crawler (downloading HTML content and trying to parse information from it). What you are trying to do is crawl CloudFlare data that is not meant to be crawled by a third-party app (bot). From CloudFlare's side, they have done the correct thing by putting up a captcha that prevents automated requests.
Further, if you try to send requests at a high frequency (requests/sec) and CloudFlare has threat detection mechanisms, your IP address will be blocked. I assume that they have already identified the IP address of the VPN server you are using and blacklisted it, which is why you are getting a 403.
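
If the client does send several requests in a row, spacing them out is one way to keep the request rate low. The sketch below is only an illustration: RequestThrottle is a hypothetical helper, and the interval to use is an assumption, since no specific rate limit is known here.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public sealed class RequestThrottle
{
    private readonly TimeSpan _minInterval;
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private readonly Stopwatch _sinceLast = new Stopwatch();

    public RequestThrottle(TimeSpan minInterval)
    {
        _minInterval = minInterval;
    }

    // Completes immediately on the first call; later calls wait until at
    // least minInterval has passed since the previous call completed.
    public async Task WaitAsync(CancellationToken cancellationToken = default(CancellationToken))
    {
        await _gate.WaitAsync(cancellationToken);
        try
        {
            if (_sinceLast.IsRunning)
            {
                var remaining = _minInterval - _sinceLast.Elapsed;
                if (remaining > TimeSpan.Zero)
                    await Task.Delay(remaining, cancellationToken);
            }
            _sinceLast.Restart();
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

A caller would create one shared instance, for example new RequestThrottle(TimeSpan.FromSeconds(2)), and await throttle.WaitAsync() before each client.SendAsync(message).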
Basically, you depend solely on security holes in the CloudFlare pages you try to access via the client app. This is a sort of hacking of CloudFlare (doing something CloudFlare has restricted), which I would not recommend.
If you have a cool idea, it is better to contact their developer team and discuss it.