Friday, October 27, 2017

How to locate the four elements using Selenium in Python


I am trying to post several parameters to this URL and press 'Submit' to download the generated CSV file.

As the picture below shows, I think at least four steps are needed.

1 select 'Proprietary' under 'Availability'

2 pick 'EPIC exposures' instead of the default 'pointed observations'

3 click the 'Submit' button and wait one or two minutes

4 on the next page, when the search is done and all columns appear, press 'Save table as' and choose CSV

For example, here is what I tried for the first one:

find_element_by_class_name('ufa-gwt-DropdownList-TextBox').send_keys("proprietary")
find_element_by_xpath(".//*[@title='Observation Availability'][@type='text']").send_keys("Proprietary")

Its structure is like:

<input title="Observation Availability" style="width: 100%;" readonly="" class="ufa-gwt-DropdownList-TextBox" type="text"> 

picture

A new page appears after clicking 'Submit'. Then 'Save table as' needs to be clicked. Clicking 'Back to search' returns to the search form.
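Since the input is readonly, send_keys will not change its value; the dropdown has to be opened with a click and the entry then picked from the list that appears. Below is only a rough sketch of that pattern, assuming the filters panel is already expanded and that clicking the input opens the list (see the answers below for a tested variant):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import ui, expected_conditions as EC

driver = webdriver.Chrome()  # assumes chromedriver is installed and on PATH
driver.get('http://nxsa.esac.esa.int/nxsa-web/#search')
# open the readonly "Observation Availability" dropdown by clicking it
# (assumption: the surrounding filters panel is already unfolded)
ui.WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//input[@title="Observation Availability"]'))).click()
# then pick the "Proprietary" entry from the list that appears
ui.WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//div[text()="Proprietary"]'))).click()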

3 Answers

Answers 1

Unfortunately, I don't think you're going to be able to do this via requests. As far as I can tell, there is no POST being made when you click "Submit". It appears as though all the data is being generated by JavaScript, which requests can't deal with.

You could try using something like Selenium to automate a browser (which can handle the JS) and then scrape data from there.
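A minimal sketch of that idea, assuming chromedriver is installed and that the rendered page contains the text "Observation and Proposal filters" (as the XPaths in the other answers suggest):

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()  # assumes chromedriver is on your PATH
driver.get('http://nxsa.esac.esa.int/nxsa-web/#search')
# give the JavaScript time to build the UI before scraping anything
WebDriverWait(driver, 15).until(
    lambda d: 'Observation and Proposal filters' in d.page_source)
html = driver.page_source  # fully rendered page, ready for parsing
driver.quit()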

Answers 2

Try this. You will need to process the rest according to your needs; this is just the gist of it. It produces the results below:

import requests

url = "http://nxsa.esac.esa.int/nxsa-sl/servlet/observations-metadata?RESOURCE_CLASS=OBSERVATION&ADQLQUERY=SELECT%20DISTINCT%20OBSERVATION.OBSERVATION_OID,OBSERVATION.MOVING_TARGET,OBSERVATION.OBSERVATION_ID,EPIC_OBSERVATION_IMAGE.ICON,EPIC_OBSERVATION_IMAGE.ICON_PREVIEW,RGS_FLUXED_OBSERVATION_IMAGE.ICON,RGS_FLUXED_OBSERVATION_IMAGE.ICON_PREVIEW,EPIC_MOVING_TARGET_OBSERVATION_IMAGE.ICON,EPIC_MOVING_TARGET_OBSERVATION_IMAGE.ICON_PREVIEW,RGS_FLUXED_MOVING_TARGET_OBSERVATION_IMAGE.ICON,RGS_FLUXED_MOVING_TARGET_OBSERVATION_IMAGE.ICON_PREVIEW,OM_OBSERVATION_IMAGE.ICON_PREVIEW_V,OM_OBSERVATION_IMAGE.ICON_PREVIEW_B,OM_OBSERVATION_IMAGE.ICON_PREVIEW_L,OM_OBSERVATION_IMAGE.ICON_PREVIEW_U,OM_OBSERVATION_IMAGE.ICON_PREVIEW_M,OM_OBSERVATION_IMAGE.ICON_PREVIEW_S,OM_OBSERVATION_IMAGE.ICON_PREVIEW_W,OM_OBSERVATION_IMAGE.ICON_V,OM_OBSERVATION_IMAGE.ICON_B,OM_OBSERVATION_IMAGE.ICON_L,OM_OBSERVATION_IMAGE.ICON_U,OM_OBSERVATION_IMAGE.ICON_M,OM_OBSERVATION_IMAGE.ICON_S,OM_OBSERVATION_IMAGE.ICON_W,OBSERVATION.REVOLUTION,OBSERVATION.PROPRIETARY_END_DATE,OBSERVATION.RA_NOM,OBSERVATION.DEC_NOM,OBSERVATION.POSITION_ANGLE,OBSERVATION.START_UTC,OBSERVATION.END_UTC,OBSERVATION.DURATION,OBSERVATION.TARGET,PROPOSAL.TYPE,PROPOSAL.CATEGORY,PROPOSAL.AO,PROPOSAL.PI_FIRST_NAME,PROPOSAL.PI_SURNAME,TARGET_TYPE.DESCRIPTION,OBSERVATION.LII,OBSERVATION.BII,OBSERVATION.ODF_VERSION,OBSERVATION.PPS_VERSION,OBSERVATION.COORD_OBS,OBSERVATION.COORD_TYPE%20FROM%20FIELD_NOT_USED%20%20WHERE%20OBSERVATION.PROPRIETARY_END_DATE%3E%272017-10-18%27%20%20AND%20%20(PROPOSAL.TYPE=%27Calibration%27%20OR%20PROPOSAL.TYPE=%27Int%20Calibration%27%20OR%20PROPOSAL.TYPE=%27Co-Chandra%27%20OR%20PROPOSAL.TYPE=%27Co-ESO%27%20OR%20PROPOSAL.TYPE=%27GO%27%20OR%20PROPOSAL.TYPE=%27HST%27%20OR%20PROPOSAL.TYPE=%27Large%27%20OR%20PROPOSAL.TYPE=%27Large-Joint%27%20OR%20PROPOSAL.TYPE=%27Triggered%27%20OR%20PROPOSAL.TYPE=%27Target-Opportunity%27%20OR%20PROPOSAL.TYPE=%27TOO%27%20OR%20PROPOSAL.TYPE=%27Triggered-Joint%27)%20%20%20ORDER%20BY%20OBSERVATION.OBSERVATION_ID&PAGE=1&PAGE_SIZE=100&RETURN_TYPE=JSON"

res = requests.get(url)
data = res.json()
result = data['data']

for item in result:
    ID = item['OBSERVATION__OBSERVATION_ID']
    Surname = item['PROPOSAL__PI_SURNAME']
    Name = item['PROPOSAL__PI_FIRST_NAME']
    print(ID, Surname, Name)

Partial results (ID, surname and first name):

0740071301 La Palombara Nicola
0741732601 Kaspi Victoria
0741732701 Kaspi Victoria
0741732801 Kaspi Victoria
0742150101 Grosso Nicolas
0742240801 Roberts Timothy

Btw, when you reach the target page you will notice two tabs there. These results are derived from the OBSERVATIONS tab. The link I used above can be found in the Chrome developer tools as well.
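If you also want the output as a CSV file (the original goal), here is a minimal sketch using the standard csv module; it assumes you only need the three fields printed above and reuses result from the snippet:

import csv

with open('NXSA-Results.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['OBSERVATION_ID', 'PI_SURNAME', 'PI_FIRST_NAME'])  # header row
    for item in result:
        writer.writerow([item['OBSERVATION__OBSERVATION_ID'],
                         item['PROPOSAL__PI_SURNAME'],
                         item['PROPOSAL__PI_FIRST_NAME']])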

Answers 3

Since no one has posted a solution yet, here you go. You won't get far with requests, so selenium is your best choice here. If you want to use the script below without any modification, check that:

  • you are on linux or macos, or change dl_dir = '/tmp' to some directory you want
  • you have chromedriver installed, or change the driver to firefox in the code (and adapt the download dir configuration to what Firefox expects; a sketch follows this list)
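For the Firefox route, a rough sketch of that download configuration might look like this (these are standard Firefox preference keys; the MIME type is an assumption and may need adjusting to whatever the server actually sends):

from selenium import webdriver

profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2)      # 2 = use a custom download dir
profile.set_preference('browser.download.dir', '/tmp')        # same dl_dir as in the script below
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/csv')  # skip the save dialog for CSVs
driver = webdriver.Firefox(firefox_profile=profile)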

Here is the environment this was tested with:

$ python -V
Python 3.5.3
$ chromedriver --version
ChromeDriver 2.33.506106 (8a06c39c4582fbfbab6966dbb1c38a9173bfb1a2)
$ pip list --format=freeze | grep selenium
selenium==3.6.0

I commented almost every line, so let the code do the talking:

import os
import time
from selenium import webdriver
from selenium.webdriver.common import by
from selenium.webdriver.remote.webelement import WebElement
from selenium.webdriver.support import ui, expected_conditions as EC


def main():
    dl_dir = '/tmp'  # temporary download dir so I don't spam the real dl dir with csv files
    # check what files are downloaded before the scraping starts (will be explained later)
    csvs_old = {file for file in os.listdir(dl_dir) if file.startswith('NXSA-Results-') and file.endswith('.csv')}

    # I use chrome so check if you have chromedriver installed
    # pass custom dl dir to browser instance
    chrome_options = webdriver.ChromeOptions()
    prefs = {'download.default_directory': '/tmp'}
    chrome_options.add_experimental_option('prefs', prefs)
    driver = webdriver.Chrome(chrome_options=chrome_options)
    # open page
    driver.get('http://nxsa.esac.esa.int/nxsa-web/#search')

    # wait for search ui to appear (abort after 10 secs)
    # once there, unfold the filters panel
    ui.WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((by.By.XPATH, '//td[text()="Observation and Proposal filters"]'))).click()
    # toggle observation availability dropdown
    driver.find_element_by_xpath('//input[@title="Observation Availability"]/../../td[2]/div/img').click()
    # wait until the dropdown elements are available, then click "proprietary"
    ui.WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((by.By.XPATH, '//div[text()="Proprietary" and @class="gwt-Label"]'))).click()
    # unfold display options panel
    driver.find_element_by_xpath('//td[text()="Display options"]').click()
    # deselect "pointed observations"
    driver.find_element_by_id('gwt-uid-241').click()
    # select "epic exposures"
    driver.find_element_by_id('gwt-uid-240').click()

    # uncomment if you want to go through the activated settings and verify them
    # when commented, the form is submitted immediately
    #time.sleep(5)

    # submit the form
    driver.find_element_by_xpath('//button/span[text()="Submit"]/../img').click()
    # wait until the results table has at least one row
    ui.WebDriverWait(driver, 10).until(EC.presence_of_element_located((by.By.XPATH, '//tr[@class="MPI"]')))
    # click on save
    driver.find_element_by_xpath('//span[text()="Save table as"]').click()
    # wait for dropdown with "CSV" entry to appear
    el = ui.WebDriverWait(driver, 10).until(EC.element_to_be_clickable((by.By.XPATH, '//a[@title="Save as CSV, Comma Separated Values"]')))
    # somehow, the clickability does not suffice - selenium still whines about the wrong element being clicked
    # as a dirty workaround, wait a fixed amount of time to let js finish ui update
    time.sleep(1)
    # click on "CSV" entry
    el.click()

    # now, selenium can't tell whether the file is being downloaded
    # we have to do it ourselves
    # this is a quick-and-dirty check that waits until a new csv file appears in the dl dir
    # replace with watchdogs or whatever
    dl_max_wait_time = 10  # secs
    seconds = 0
    while seconds < dl_max_wait_time:
        time.sleep(1)
        csvs_new = {file for file in os.listdir(dl_dir) if file.startswith('NXSA-Results-') and file.endswith('.csv')}
        if csvs_new - csvs_old:  # new file found in dl dir
            print('Downloaded file should be one of {}'.format([os.path.join(dl_dir, file) for file in csvs_new - csvs_old]))
            break
        seconds += 1

    # we're done, so close the browser
    driver.close()


# script entry point
if __name__ == '__main__':
    main()

If everything is fine, the script should output:

Downloaded file should be one of ['/tmp/NXSA-Results-1509061710475.csv'] 
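If the fixed ten-second polling loop turns out to be too fragile, a slightly more robust sketch (it additionally relies on Chrome marking unfinished downloads with a .crdownload suffix) could look like this:

import os
import time

def wait_for_csv(dl_dir='/tmp', timeout=30):
    """Return the path of the newest finished NXSA CSV in dl_dir, or None on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        files = os.listdir(dl_dir)
        # Chrome stores in-progress downloads as *.crdownload; wait until none are left
        if not any(name.endswith('.crdownload') for name in files):
            csvs = [name for name in files
                    if name.startswith('NXSA-Results-') and name.endswith('.csv')]
            if csvs:
                newest = max(csvs, key=lambda name: os.path.getmtime(os.path.join(dl_dir, name)))
                return os.path.join(dl_dir, newest)
        time.sleep(1)
    return None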