Monday, November 13, 2017

Python - Manipulate and read browser from current browser


I am struggling to find a way in Python to read data from a web browser that is already open. Effectively, I am trying to download a massive table of data from a locally controlled company webpage and load it into a pandas dataframe. The issue is that the website has a fairly complex authentication token process that I have not been able to get past with Selenium (using a slew of webdrivers), Requests, urllib, or cookielib, with a variety of user parameters. I have given up on this front entirely, as I am almost positive that there is more to the authentication process than can be achieved easily with these libraries.

However, I did manage to get past the token process when I quickly tested opening a new tab, using the webbrowser module, in a browser that was already logged in. The webbrowser module does not offer a read function, though, so even though the page can be opened, the data on it cannot be read into a pandas dataframe. This got me thinking I could use win32com to open a browser, log in, and then run the rest of the script, but again, the Internet Explorer dispatch object has no general read ability, so I can't send the information I want to pandas. I'm stumped. Any ideas?

I could request the necessary authentication token scripts, but I am sure it would take a week or two before anything happened on that front. I would obviously prefer to get something working in the meantime while I wait for the actual auth scripts from the company.

Update: I received authentication tokens from the company; however, using them requires a Python package on another server that I do not have access to, mostly because it's an oddity that I am using Python in my department. Thus the above still applies: I need a method for reading and manipulating an open browser.

1 Answer

1) Start the browser with Selenium.
2) The script starts waiting for a certain element that indicates you have reached the required page.
3) You use this new browser window to log in to the page.
4) The script detects that you are logged in.
5) The script processes the page.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Start the webdriver.
    chrome = webdriver.Chrome()

    # Initialise a waiter that gives up after 300 seconds.
    waiter = WebDriverWait(chrome, 300)

    # Wait for the #logout element to appear.
    # I assume its presence shows that you are logged in.
    # The locator must be passed as a single (By, value) tuple.
    waiter.until(EC.presence_of_element_located((By.ID, "logout")))

    # Extract data etc.
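The wait step is what lets you log in by hand while the script idles. As a rough illustration of that behaviour only (this is not Selenium's implementation), the sketch below polls a condition until it returns something truthy or a timeout expires. The names `wait_until` and `logged_in` are hypothetical, and the fake condition stands in for "the logout element is present".

```python
import time


def wait_until(condition, timeout=300.0, poll=0.5):
    """Call condition() every `poll` seconds until it is truthy.

    Returns the truthy result, or raises TimeoutError once `timeout`
    seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll)


# Hypothetical stand-in for "the logout element exists":
# it only becomes true on the third check.
checks = {"count": 0}


def logged_in():
    checks["count"] += 1
    return checks["count"] >= 3


found = wait_until(logged_in, timeout=5.0, poll=0.01)
```

In the real script the condition would be the Selenium expected condition, and the 300-second timeout is the window you have to finish logging in.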

It might be easier to log in if you use the user's existing browser profile.

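    options = webdriver.ChromeOptions()
    # Reuse an existing Chrome profile so its cookies and sessions carry over.
    options.add_argument("user-data-dir=FULL_PATH__TO_PROFILE")
    # Current Selenium releases take `options=` rather than the older `chrome_options=`.
    chrome = webdriver.Chrome(options=options)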

You may also need to navigate to your page explicitly. If the profile's session is still valid, you might already be logged in.

    chrome.get("https://your_page_here")
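Once the logged-in page is loaded, its HTML is available from the driver's `page_source` property, which covers the missing "read" step. Below is a minimal sketch, assuming the data sits in a plain HTML table; the class `TableRows`, the function `rows_from_html`, and the sample markup are all hypothetical, and a real company page may well need different handling.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_from_html(html):
    """Return the table rows found in `html` as lists of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample standing in for chrome.page_source.
sample = (
    "<table><tr><th>id</th><th>name</th></tr>"
    "<tr><td>1</td><td>Ada</td></tr></table>"
)
rows = rows_from_html(sample)
```

With the browser from the answer, you would call `rows_from_html(chrome.page_source)` instead of using the sample string. If the first row is a header, `pandas.DataFrame(rows[1:], columns=rows[0])` turns the result into a dataframe. `pandas.read_html` is an alternative that parses tables directly, provided a supported HTML parser such as lxml is installed.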