Sunday, July 8, 2018

Using scrapy-splash to click a button


I am trying to use Scrapy-splash to click a button on a page that I'm being redirected to.

I have tested manually clicking on the page, and I am redirected to the correct page after I have clicked the button that gives my consent. I have written a small script to click the button when I am redirected to the page, but this is not working.

I have included a snippet of my spider below. Am I missing something in my code?

import scrapy
from scrapy_splash import SplashRequest

# Lua script run by Splash's `execute` endpoint.
script = """
function main(splash)
    splash:go(splash.args.yahoo_url)
    splash:wait(1)
    splash:runjs('document.querySelector("input.btn.btn-primary.agree").click()')
    splash:wait(1)
    return {
        html = splash:html(),
    }
end
"""


class FoobarSpider(scrapy.Spider):
    name = "foobar"

    def start_requests(self):
        urls = ['https://finance.yahoo.com/quote/IBM/']

        for url in urls:
            yield SplashRequest(url=url, callback=self.parse,
                                endpoint='render.html',
                                args={'wait': 3},
                                meta={'yahoo_url': url})

    def parse(self, response):
        url = response.url

        if 'guce.oath.com/collectConsent' in url:
            print('About to attempt to authenticate ...')
            yield SplashRequest(
                url,
                callback=self.get_price,
                endpoint='execute',
                args={'lua_source': script,
                      'yahoo_url': response.meta.get('yahoo_url'),
                      'timeout': 3600},
                meta=response.meta,
            )
        else:
            self.get_price(response)

    def get_price(self, response):
        yahoo_price = None

        try:
            # Get Price ...
            # convert_to_float is a helper defined elsewhere (not shown).
            temp1 = response.css(r'div.D\(ib\).Mend\(20px\)')
            if temp1 and len(temp1) > 1:
                temp2 = temp1[1].css('span')
                if len(temp2) > 0:
                    yahoo_price = convert_to_float(
                        temp2[0].xpath('.//text()').extract_first().replace(',', ''))

            if not yahoo_price:
                val = response.css(
                    r'span.Trsdu\(0\.3s\).Trsdu\(0\.3s\).Fw\(b\).Fz\(36px\).Mb\(-4px\).D\(b\)'
                ).xpath('.//text()').extract_first().replace(',', '')
                yahoo_price = convert_to_float(val)

        except Exception as err:
            pass

    def handle_error(self, failure):
        pass

How do I fix this so that I can correctly give consent and be redirected to the page I want?

1 Answer


Rather than clicking the button, try submitting the form:

document.querySelector("form.consent-form").submit() 

When I ran document.querySelector("input.btn.btn-primary.agree").click() in my browser console, I got the error message "Oops, Something went Wrong", but the page loads when I use the code above to submit the form instead.

Because I'm not in Europe, I can't fully recreate your setup, but I believe this should get you past the issue. My guess is that a script on the consent page is interfering with the click method.
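
As a rough sketch of how the form-submit suggestion might fit into the spider from the question, the Lua script could be changed as shown below. The names CONSENT_SCRIPT and build_consent_args are hypothetical helpers introduced here for illustration, and the wait durations and the 60-second timeout are assumptions, not tested values. The form.consent-form selector comes from the answer above and may not match the live page.

```python
# Lua script for Splash's `execute` endpoint. It mirrors the script in the
# question, but submits the consent form instead of clicking the button.
CONSENT_SCRIPT = """
function main(splash)
    splash:go(splash.args.yahoo_url)
    splash:wait(1)
    -- Submit the form directly rather than clicking the "agree" button.
    splash:runjs('document.querySelector("form.consent-form").submit()')
    -- Assumed delay to give the redirect time to finish.
    splash:wait(3)
    return {
        html = splash:html(),
    }
end
"""


def build_consent_args(yahoo_url, timeout=60):
    """Build the `args` dict for a SplashRequest that runs CONSENT_SCRIPT.

    `timeout` is an assumed value; it has to stay within the limit
    configured on the Splash server.
    """
    return {
        "lua_source": CONSENT_SCRIPT,
        "yahoo_url": yahoo_url,
        "timeout": timeout,
    }
```

In the spider's parse method, the result would be passed as args=build_consent_args(response.meta.get('yahoo_url')) to the SplashRequest that uses endpoint='execute', in place of the inline dict from the question.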
