Thursday, August 11, 2016

Proxy+Selenium+PhantomJS can't change User-Agent

When I use a proxy with PhantomJS, it uses the default Python user-agent instead of the one I set.

Running: Python 3.5.1 on Ubuntu 14.04

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    # Build the PhantomJS command-line flags for the proxy, if one is set.
    service_args = []

    if self.proxy:
        service_args.extend([
            '--proxy={}:{}'.format(self.proxy.host, self.proxy.port),
            '--proxy-type={}'.format(self.proxy.proto),
        ])

        if self.proxy.username and self.proxy.password:
            service_args.append(
                '--proxy-auth={}:{}'.format(self.proxy.username, self.proxy.password)
            )

    # Override the user agent through the PhantomJS page settings.
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap["phantomjs.page.settings.userAgent"] = (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
        "(KHTML, like Gecko) Chrome/15.0.87"
    )

    self.webdriver = webdriver.PhantomJS(service_args=service_args, desired_capabilities=dcap)
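The argument-building part of the snippet above can be pulled into a small standalone function, which makes it easy to print and inspect exactly which flags PhantomJS would receive. This is only a sketch: the Proxy named tuple and build_service_args are hypothetical names standing in for whatever self.proxy is in the real code, and the --proxy, --proxy-type and --proxy-auth flags are the ones already used above.

```python
from collections import namedtuple

# Hypothetical stand-in for the object stored in self.proxy.
Proxy = namedtuple("Proxy", ["proto", "host", "port", "username", "password"])


def build_service_args(proxy=None):
    """Return the PhantomJS command-line flags for an optional proxy."""
    args = []
    if proxy is None:
        return args
    args.append("--proxy={}:{}".format(proxy.host, proxy.port))
    args.append("--proxy-type={}".format(proxy.proto))
    # Only add credentials when both parts are present.
    if proxy.username and proxy.password:
        args.append("--proxy-auth={}:{}".format(proxy.username, proxy.password))
    return args


if __name__ == "__main__":
    print(build_service_args(Proxy("http", "5.135.176.41", 3123, None, None)))
```

Printing the list before handing it to webdriver.PhantomJS is a cheap way to rule out a formatting mistake in the proxy flags.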

And the error:

Message: Error Message => 'Unable to find element with css selector '#navcnt td.cur'' caused by Request => {"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"105","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:39281","User-Agent":"Python-urllib/3.5"}...

A similar question concluded that the problem was caused by the proxy provider setting the user-agent at the server level. However, I doubt that is the case here, since I can modify it when using a proxy with Chrome.
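One thing that may be worth checking first: the request quoted in the error has Host 127.0.0.1:39281, which suggests it is the command the Python Selenium client sent to the local GhostDriver server, not a request PhantomJS made to the target site. If so, the Python-urllib/3.5 value there would say nothing about what the page sees. A minimal sketch for asking the browser itself, using Selenium's execute_script (the helper name page_user_agent is hypothetical):

```python
def page_user_agent(driver):
    """Return the user agent string as reported by the page's JavaScript."""
    # execute_script runs inside the browser, so the result reflects the
    # browser's own setting rather than the headers of the Python client.
    return driver.execute_script("return navigator.userAgent;")
```

Called with the driver created above, this would be expected to return the Chrome/15.0.87 string if the capability was applied.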

1 Answer

This is what worked for me:

In my case I took a closer look at the capabilities of the PhantomJS driver:

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap["phantomjs.page.settings.userAgent"] = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 (KHTML, like Gecko) Chrome/15.0.87"

    service_args = [
        '--proxy=5.135.176.41:3123',
        '--proxy-type=http',
    ]
    # js_path is the path to the phantomjs executable, defined elsewhere.
    phantom = webdriver.PhantomJS(js_path, desired_capabilities=dcap, service_args=service_args)
    print(phantom.capabilities)

The output was:

{'databaseEnabled': False, 'handlesAlerts': False, 'rotatable': False, 'browserConnectionEnabled': False, 'browserName': 'phantomjs', 'takesScreenshot': True, 'nativeEvents': True, 'locationContextEnabled': False, 'phantomjs.page.settings.userAgent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 (KHTML, like Gecko) Chrome/15.0.87', 'platform': 'linux-unknown-64bit', 'version': '2.1.1', 'applicationCacheEnabled': False, 'driverName': 'ghostdriver', 'webStorageEnabled': False, 'javascriptEnabled': True, 'cssSelectorsEnabled': True, 'proxy': {'proxyType': 'direct'}, 'acceptSslCerts': False, 'driverVersion': '1.2.0'} 

This means the userAgent was actually set correctly ('phantomjs.page.settings.userAgent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 (KHTML, like Gecko) Chrome/15.0.87'), but somehow the driver did not pick up the proxy I set with service_args (the output shows 'proxy': {'proxyType': 'direct'}). Manipulating the capabilities manually like this worked out quite nicely though:

    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap["phantomjs.page.settings.userAgent"] = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 (KHTML, like Gecko) Chrome/15.0.87"

    phantom = webdriver.PhantomJS(js_path, desired_capabilities=dcap)

    phantom.capabilities["acceptSslCerts"] = True
    phantom.capabilities["proxy"] = {"proxy": "5.135.176.41:3123",
                                     "proxy-type": "http"}
    max_wait = 20

    phantom.set_window_size(1024, 768)
    phantom.set_page_load_timeout(max_wait)
    phantom.set_script_timeout(max_wait)
    phantom.get(url)
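The check done by eye on the printed capabilities can also be written as a small function. This is a sketch only: describe_capabilities is a hypothetical helper, and it reads just the two keys visible in the output shown earlier (phantomjs.page.settings.userAgent and the proxyType entry inside proxy).

```python
def describe_capabilities(caps):
    """Return (user_agent, proxy_type) from a capabilities dictionary."""
    user_agent = caps.get("phantomjs.page.settings.userAgent")
    # The proxy entry is itself a dictionary, e.g. {'proxyType': 'direct'};
    # fall back to an empty one when it is missing or None.
    proxy_type = (caps.get("proxy") or {}).get("proxyType")
    return user_agent, proxy_type


if __name__ == "__main__":
    sample = {
        "phantomjs.page.settings.userAgent": (
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
            "(KHTML, like Gecko) Chrome/15.0.87"
        ),
        "proxy": {"proxyType": "direct"},
    }
    print(describe_capabilities(sample))
```

Passing phantom.capabilities to it gives a quick view of both settings. A proxy type of direct, as in the output above, suggests the session is not reporting any proxy.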

Thanks for this question. I had been looking into proxies with PhantomJS for quite a while, and this question put me on the right track. I hope this helps!
