Extract or Get data from HTML Element in Excel using VBA

You can easily extract the contents of a Web Page from Excel. Web queries are one simple feature in Excel that you can use to import data from any website. You can further automate this process by writing a simple macro using VBA. The example here shows how to extract or read the contents inside an HTML element (any element) from Excel using a simple VBA macro.

Using this example, you can extract specific contents from any websites or a webpage. For example, you can extract news headlines from a news portal, or get stock quotes from a web page etc. This process is also known by the term Screen Scraping.

The contents of a webpage are inserted inside HTML elements. Every HTML element has a tag and you can identify each element by its tag. To read the content of an element (from Excel using VBA), you will first have to locate the element by its tag on the webpage. In-addition, some elements will also have a unique id.

For example, I want to read all the headings from my blog’s home page. I have inserted the headings (with <h2> tags) inside a <ul> tag (it’s also called the unordered list element) and it has an id (ids are unique).

See this image!

Extract HTML Element Contents in WebPage using VBA in Excel

How do you know which element on the webpage? Simple, go to this site and you will see the headings (in bold) for each post on the home page. Press Ctrl+U keys. This will open the view source page on your browser. Now, find ulpost in the source page. You can even right click the header text and select the inspect option (for chrome and fire fox).

Here’s the code that will locate the <ul> tag (with its id) on the webpage and find the <h2> tags (the header elements) that has the texts.

Open an Excel file and save the file in .xlsm format (macro format). Then, press Ctrl+F11 to open the VBA editor.

Add Microsoft HTML Object Library Reference

To make the macro work with an external source such as a webpage, we’ll need the Microsoft’s Internet Explore (to open the page and will remain hidden) and to read and extract contents of HTML elements, we’ll have to create few objects using a library.

Therefore, first add the Microsoft HTML Object Library reference. From the top menu of your VBA editor, click Tools -> References…. In the References window, find and select Microsoft HTML Object Library and click OK.

Option Explicit

Const sSiteName = "https://www.encodedna.com/"

Private Sub GetHTMLContents()
    ' Create Internet Explorer object.
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = False          ' Keep this hidden.
    
    IE.Navigate sSiteName
    
    ' Wait till IE is fully loaded.
    While IE.ReadyState <> 4
        DoEvents
    Wend
    
    Dim oHDoc As HTMLDocument     ' Create document object.
    Set oHDoc = IE.Document
    
    Dim oHEle As HTMLUListElement     ' Create HTML element (<ul>) object.
    Set oHEle = oHDoc.getElementById("ulPost")   ' Get the element reference using its ID.
    
    Dim iCnt As Integer
    
    ' Loop through elements inside the <ul> element and find <h2>, which has the texts we want.
    With oHEle
        For iCnt = 0 To .getElementsByTagName("h2").Length - 1
            Debug.Print .getElementsByTagName("h2").Item(iCnt).getElementsByTagName("a").Item(0).innerHTML
        Next iCnt
    End With
    
    ' Clean up.
    IE.Quit
    Set IE = Nothing
    Set oHEle = Nothing
    Set oHDoc = Nothing
End Sub

In the beginning of this marco, I have defined the URL of the site. You can define any other site’s URL (and follow the procedures I have explained above).

I have not added any reference of the Internet Explorer in my application, rather I am just created an object of IE using the CreateObject() method.

Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")

You can keep the Internet Explore hidden. There is no need open the page physically on a browser. Then it will wait the page loads completely.

While IE.ReadyState <> 4
    DoEvents
Wend

Next, I’ve create two HTML objects from properties namely, HTMLDocument and HTMLUListElement.

Dim oHDoc As HTMLDocument
Dim oHEle As HTMLUListElement

The second property HTMLUListElement provides methods to manipulate the unordered list elements. Since, I have used <ul> tag to add headers on my webpage. Similarly, you can use the other properties such as the HTMLDivElement to extract contents inside a <div> element.

Finally, I’ll loop through each element inside the <ul> tag and find the <h2> tags. Each <h2> tag has a <a> tag (or the anchor link), which actually has the texts. The innerHTML property will return the text of an element.

.getElementsByTagName("h2").Item(iCnt).getElementsByTagName("a").Item(0).innerHTML

Don’t forget to clean the memory and close the Internet Explorer (which is still running while the macro fetches the content).

Note: I writing the data in my VBA’s immediate window using Debug.print method. You can write the data in your Excel’s worksheet like this …

Cells(iCnt + 1, 1) = .getElementsByTagName("h2").Item(iCnt).getElementsByTagName("a").Item(0).innerHTML

That’s it. Thanks for reading.

Previous - Using VBA FileSystemObject to Create, Write and Read text file in Excel



Like this Article? Subscribe now, and get all the latest articles and tips, right in your inbox.

Enter your email id

Delivered by FeedBurner
Tweet this article Google+

Related Posts:

Join our Google Plus Community and be a part of a discussion!