Sunday, December 31, 2017

Retrieve data from a table of aspx page using Excel VBA

Leave a Comment

I am trying to retrieve table data from aspx page using excel vba.I know how to get table data from a URL but below is the main problem.

Problem

There is an aspx page (say www.abc.aspx). I am currently on this page.Let this page be page1.

Now I click a page2 link on the current page. What is worth noticing is that after clicking this link, the old URL (www.abc.aspx) doesn't change but the content changes.( Content is of page2 )

If you view page1 source code it has

<form method="post" action="page1 url" id="Form1"> 

Whatever is the action on page1 (page2 click) , it posts back the same page1 url.

So how can I get page2 table data in excel VBA since I don't know its URL?

Code

This is what I had used to fetch table data.

I used internet explorer object.Then navigated to the link and saved the document in htmldoc.

ie.navigate "url"  Do While ie.READYSTATE <> READYSTATE_COMPLETE Application.StatusBar = "Fetching data..." DoEvents Loop  Set htmldoc = ie.document  'Column headers Set eleColth = htmldoc.getElementsByTagName("th") j = 0 'start with the first value in the th collection         For Each eleCol In eleColth 'for each element in the td collection             ThisWorkbook.Sheets(1).Range("A1").Offset(i, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time             j = j + 1 'move to next element in td collection         Next eleCol 'rinse and repeat   'Content Set eleColtr = htmldoc.getElementsByTagName("tr")  'This section populates Excel     i = 0 'start with first value in tr collection     For Each eleRow In eleColtr 'for each element in the tr collection         Set eleColtd = htmldoc.getElementsByTagName("tr")(i).getElementsByTagName("td") 'get all the td elements in that specific tr         j = 0 'start with the first value in the td collection         For Each eleCol In eleColtd 'for each element in the td collection             ThisWorkbook.Sheets(1).Range("D3").Offset(i, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time             j = j + 1 'move to next element in td collection         Next eleCol 'rinse and repeat         i = i + 1 'move to next element in td collection     Next eleRow 'rinse and repeat  ie.Quit Set ie = Nothing 

EDIT:

Example

If we click on questions in Stack Overflow (https://stackoverflow.com/questions) and now click on page2 of questions (new link is https://stackoverflow.com/questions?page=2&sort=newest)

In my case, if we click on page2, the new link is not updated.It is the same old link.

EDIT: I have found a similar question here

How do I get url that is hidden by javascript on external website?

Thanks.

2 Answers

Answers 1

Ok, I sympathise, there is a school of thought (including Tim Berners-Lee) that says every separate page should have its own URI and that these don't change.

But webmasters can and do mess you around. They can redirect you HTTP request and can obfuscate the navigation as in your case. They can rewrite HTTP requests.

You have two options

Option 1 - Let Internet Explorer resolve the new content for you

So, if the content is visible on the screen then it must be in the Document Object Model (DOM). In IE, or indeed in Chrome, one can right-click and get the context menu and then choose Inspect to see where in the DOM that element resides.

I think your code demonstrates enough expertise to drill in. However, sometimes some websites like to disable the Inspect menu option to avoid programers poking around. (EDIT: As in your case now that I have read the comments)

Option 2 - Use an HTTP sniffing Tool like Fiddler to detect the HTTP redirect/rewrite

As I said above, HTTP requests can be rewritten and redirected by the web-server but the HTTP protocol does give notifications of redirects. There are tools to detect this. A popular tool is Fiddler, today I have discovered that there is a specific IE Fiddler add-on.

To be honest though the developer tools that ship with the browsers themselves, particularly Chrome (Ctrl+Shift+I, then Network tab), show network traffic to a level of detail increasingly on a par with any sniffing tool.

Sorry you got down-voted, this seems like a perfectly reasonable question.

Answers 2

A bird's eye view on the problem:

You have a requirement that you seem to not be able to let go: Use Excel VBA. I emphasize this point since often times answers provide solutions satisfying alternative premises from what is posted in the OP.

A possible solution:

So you have to Interface Excel VBA with another tool having the capability for revealing the contents of html redirects or obfuscated URLs.

Google Chrome Developer Tools reveals all contents, and you can interface very nicely Google Chrome with Excel VBA, using the Selenium VBA Wrapper. Download here.

It is quite versatile, e.g., you can see how to scrape web data.

As for getting obfuscated contents, there are a few items that may help

how to get innerHTML of whole page in selenium driver? (not VBA but useful)

Selenium + VBA to Control Chrome

(Note: the author of the wrapper is usually eager to answer in SO, and precise in his answers).

I guess YMMV, there are always people trying to obfuscate their data, with various techniques, and often for good reasons...

If you have a real example for your http://www.abc.aspx, it may help.

If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment