I am trying to use Scrapy-splash to click a button on a page that I'm being redirected to.
When I click the consent button manually in a browser, I am redirected to the correct page. I have written a small script to click the button whenever I am redirected to the consent page, but it is not working.
I have included a snippet of my spider below. Am I missing something in my code?
    script = """
    function main(splash)
        splash:go(splash.args.yahoo_url)
        splash:wait(1)
        splash:runjs('document.querySelector("input.btn.btn-primary.agree").click()')
        splash:wait(1)
        return {
            html = splash:html(),
        }
    end
    """


    class FoobarSpider(scrapy.Spider):
        name = "foobar"

        def start_requests(self):
            urls = ['https://finance.yahoo.com/quote/IBM/']
            for url in urls:
                yield SplashRequest(url=url, callback=self.parse,
                                    endpoint='render.html', args={'wait': 3},
                                    meta={'yahoo_url': url})

        def parse(self, response):
            url = response.url
            if 'guce.oath.com/collectConsent' in url:
                print('About to attempt to authenticate ...')
                yield SplashRequest(
                    url,
                    callback=self.get_price,
                    endpoint='execute',
                    args={'lua_source': script,
                          'yahoo_url': response.meta.get('yahoo_url'),
                          'timeout': 3600},
                    meta=response.meta
                )
            else:
                self.get_price(response)

        def get_price(self, response):
            yahoo_price = None
            try:
                # Get Price ...
                temp1 = response.css('div.D\(ib\).Mend\(20px\)')
                if temp1 and len(temp1) > 1:
                    temp2 = temp1[1].css('span')
                    if len(temp2) > 0:
                        yahoo_price = convert_to_float(temp2[0].xpath('.//text()').extract_first().replace(',', ''))
                if not yahoo_price:
                    val = response.css('span.Trsdu\(0\.3s\).Trsdu\(0\.3s\).Fw\(b\).Fz\(36px\).Mb\(-4px\).D\(b\)').xpath('.//text()').extract_first().replace(',', '')
                    yahoo_price = convert_to_float(val)
            except Exception as err:
                pass

        def handle_error(self, failure):
            pass

How do I fix this so that I can correctly give consent and be directed to the page I want?
1 Answer
Rather than clicking the button, try submitting the form:
    document.querySelector("form.consent-form").submit()

I tried running the JavaScript command document.querySelector("input.btn.btn-primary.agree").click() in my console and got the error message "Oops, Something went Wrong", but the page loads when the form is submitted with the code above.
Because I'm not in Europe, I can't fully recreate your setup, but I believe that should get you past the issue. My guess is that this script is interfering with the other method.
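For reference, here is a rough sketch of how the Lua script from the question might look with the form submission swapped in for the click. I have not been able to run it against the consent page, so treat it as a starting point. The helper name build_consent_args, the redirect_wait argument, and the 60-second timeout are my own assumptions, not part of the original spider or of scrapy-splash.

```python
# Lua script for Splash's "execute" endpoint. It mirrors the script in the
# question, but submits the consent form instead of clicking the button.
CONSENT_SCRIPT = """
function main(splash)
    assert(splash:go(splash.args.yahoo_url))
    assert(splash:wait(1))
    -- Submit the form directly, as suggested in the answer above.
    splash:runjs('document.querySelector("form.consent-form").submit()')
    -- Assumed delay so the redirect after consent has time to finish.
    assert(splash:wait(splash.args.redirect_wait))
    return {
        html = splash:html(),
        url = splash:url(),
    }
end
"""


def build_consent_args(yahoo_url, redirect_wait=2.0, timeout=60):
    """Build the `args` dict for a SplashRequest sent to the 'execute' endpoint.

    Hypothetical helper: `redirect_wait` and `timeout` are assumed values
    that will probably need tuning for the real page.
    """
    return {
        "lua_source": CONSENT_SCRIPT,
        "yahoo_url": yahoo_url,
        "redirect_wait": redirect_wait,
        "timeout": timeout,
    }


if __name__ == "__main__":
    args = build_consent_args("https://finance.yahoo.com/quote/IBM/")
    print(sorted(args))
```

In the spider's parse method you would then pass args=build_consent_args(response.meta.get('yahoo_url')) to the SplashRequest that uses endpoint='execute', leaving the rest of the request unchanged. Returning splash:url() alongside the HTML makes it easier to check whether the redirect away from the consent page actually happened.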