Saturday, April 30, 2016

Obtain child anchor element within WebBrowser control

Preamble

I'm using the WebBrowser control, which a user will interact with, so a solution will need to work with a visible WebBrowser control.

Question

How do I check whether an element has an anchor as a child? All browsers can tell that an element contains an anchor (<a href=""...) and offer "open in new tab" functionality. That is what I am attempting to replicate. However, when I right-click on an HtmlElement I'm only able to obtain the parent element.

Example

Taking the BBC website as an example, when I right-click on the highlighted element (picture below), my output is DIV, but the source code shows an anchor element as a child of this div.

[Image: BBC homepage example]

SSCCE

using System;
using System.Diagnostics;
using System.Windows.Forms;

namespace BrowserLinkClick
{
    public partial class Form1 : Form
    {
        private WebBrowser wb;
        private bool firstLoad = true;

        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            wb = new WebBrowser();
            wb.Dock = DockStyle.Fill;
            Controls.Add(wb);
            wb.Navigate("http://bbc.co.uk");
            wb.DocumentCompleted += wb_DocumentCompleted;
        }

        private void Document_MouseDown(object sender, HtmlElementEventArgs e)
        {
            if (e.MouseButtonsPressed == MouseButtons.Right)
            {
                HtmlElement element = wb.Document.GetElementFromPoint(PointToClient(MousePosition));
                //I assume I need to check if this element has child elements that contain a TagName "A"
                if (element.TagName == "A")
                    Debug.WriteLine("Get link location, open in new tab.");
                else
                    Debug.WriteLine(element.TagName);
            }
        }

        private void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            if (firstLoad)
            {
                wb.Document.MouseDown += new HtmlElementEventHandler(Document_MouseDown);
                firstLoad = false;
            }
        }
    }
}

Please test any proposed solution using the BBC website and the highlighted headline (the headline changes, but the DOM remains the same).

4 Answers

Answers 1

You have to get the child elements of element before checking if it's an anchor:

HtmlElement element = wb.Document.GetElementFromPoint(PointToClient(MousePosition));
foreach (HtmlElement child in element.Children)
{
    if (child.TagName == "A")
        Debug.WriteLine("Get link location, open in new tab.");
}
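Note that Children only exposes the direct children of an element, so an anchor nested one level deeper would be missed. The following is a minimal sketch, assuming the link may sit at any depth below the clicked element; it relies on HtmlElement.GetElementsByTagName, which searches all descendants. The class and method names (AnchorSearch, FindDescendantAnchor) are hypothetical and not part of the original code.

```csharp
using System.Windows.Forms;

internal static class AnchorSearch
{
    // Hypothetical helper: returns the first <a> element found anywhere
    // below 'element', or null when there is none.
    public static HtmlElement FindDescendantAnchor(HtmlElement element)
    {
        if (element == null)
            return null;

        // GetElementsByTagName searches every descendant, not only direct children.
        HtmlElementCollection anchors = element.GetElementsByTagName("A");
        return anchors.Count > 0 ? anchors[0] : null;
    }
}
```

Inside Document_MouseDown this could be called as AnchorSearch.FindDescendantAnchor(element), treating a null result as "no link under this element".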

Answers 2

To access the needed properties, cast the HtmlElement's DomElement to one of the unmanaged MSHTML interfaces, e.g. IHTMLAnchorElement.

You have to add a COM reference to the Microsoft HTML Object Library to your project.
(The file name is mshtml.tlb.)

foreach (HtmlElement child in element.Children)
{
    if (String.Equals(child.TagName, "a", StringComparison.OrdinalIgnoreCase))
    {
        var anchorElement = (mshtml.IHTMLAnchorElement)child.DomElement;
        Console.WriteLine("href: [{0}]", anchorElement.href);
    }
}

There are plenty of such interfaces, but MSDN will help you choose. :) See "Scripting Object Interfaces (MSHTML)".

Answers 3

I propose the following solution.
The url variable will hold the URL of the link; you'll be able to see it in the debug output.

private void Document_MouseDown(object sender, HtmlElementEventArgs e)
{
    if (e.MouseButtonsPressed == MouseButtons.Right)
    {
        HtmlElement element = wb.Document.GetElementFromPoint(PointToClient(MousePosition));
        //I assume I need to check if this element has child elements that contain a TagName "A"
        if (element.TagName == "A")
        {
            Debug.WriteLine("Get link location, open in new tab.");
            var urlRaw = element.OuterHtml;
            string hrefBegin = "href=";
            var idxHref = urlRaw.IndexOf(hrefBegin) + hrefBegin.Length + 1;
            var idxEnd = urlRaw.IndexOf("\"", idxHref + 1);
            var url = urlRaw.Substring(idxHref, idxEnd - idxHref);
            Debug.WriteLine(url);
        }
        else
            Debug.WriteLine(element.TagName);
    }
}
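Parsing OuterHtml by hand assumes a double-quoted href that is always present; it misbehaves when the attribute is single-quoted, unquoted, or missing. As a hedged alternative, HtmlElement.GetAttribute can read the attribute directly. This is only a sketch: the class and method names (LinkReader, GetHref) are hypothetical, and the exact form of the returned URL (relative or resolved) may depend on the underlying browser engine.

```csharp
using System;
using System.Windows.Forms;

internal static class LinkReader
{
    // Hypothetical helper: returns the href of an anchor element,
    // or null when the element is not an anchor or has no href.
    public static string GetHref(HtmlElement element)
    {
        if (element == null ||
            !string.Equals(element.TagName, "A", StringComparison.OrdinalIgnoreCase))
        {
            return null;
        }

        // GetAttribute reads the attribute from the DOM, so no string slicing is needed.
        string href = element.GetAttribute("href");
        return string.IsNullOrEmpty(href) ? null : href;
    }
}
```

In the handler, the substring logic could then be replaced with something like var url = LinkReader.GetHref(element); followed by a null check.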

Answers 4

There has to be something else wrong with your program. On the BBC website your code works for the news articles (although I see the non-UK version of the site). On other websites, where there are anchor elements as children, the code below works:

private void Document_MouseDown(object sender, HtmlElementEventArgs e)
{
    if (e.MouseButtonsPressed == MouseButtons.Right)
    {
        HtmlElement element = wb.Document.GetElementFromPoint(PointToClient(MousePosition));
        if (element.Children.Count > 0)
        {
            foreach (HtmlElement child in element.Children)
            {
                if (child.TagName == "A")
                    Debug.WriteLine("Get link location, open in new tab.");
            }
        }
        else
        {
            //I assume I need to check if this element has child elements that contain a TagName "A"
            if (element.TagName == "A")
                Debug.WriteLine("Get link location, open in new tab.");
            else
                Debug.WriteLine(element.TagName);
        }
    }
}
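One case the check above does not cover is a clicked element that sits inside an anchor (for example an image or span wrapped by a link), where the anchor is an ancestor and not a child. A minimal sketch, assuming that case matters for the page in question: walk up through HtmlElement.Parent until an A element is found. The class and method names (AnchorLookup, FindEnclosingAnchor) are hypothetical.

```csharp
using System;
using System.Windows.Forms;

internal static class AnchorLookup
{
    // Hypothetical helper: returns 'element' itself if it is an <a>,
    // otherwise its nearest enclosing <a>, or null if there is none.
    public static HtmlElement FindEnclosingAnchor(HtmlElement element)
    {
        for (HtmlElement current = element; current != null; current = current.Parent)
        {
            if (string.Equals(current.TagName, "A", StringComparison.OrdinalIgnoreCase))
                return current;
        }
        return null;
    }
}
```

It could be called from the handler as AnchorLookup.FindEnclosingAnchor(wb.Document.GetElementFromPoint(e.ClientMousePosition)). Using e.ClientMousePosition here is an assumption on my part: it is the event's own document-relative position, so it does not depend on the form and the control sharing the same client origin.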