Home

SiteMap

Extract or Get data from HTML Element in Excel using VBA

← PrevNext →

Last updated: 8th May 2024

You can easily extract the contents of a Web Page from Excel. Web queries are one simple feature in Excel that you can use to import data from any website. You can further automate this process by writing a simple macro using VBA. The example here shows how to extract or read the contents inside an HTML element (any element) from Excel using a simple macro.

What this macro will do?

Using the script (below), you can extract specific contents from any websites or a webpage. For example, you can extract news headlines from a news portal, or get stock quotes from a web page etc. This process is popularly known by the term Screen Scraping.

Contents in webpage, are embeded inside HTML elements. For example, to add a "paragraph" in a webpage, you can use the <p></p> element. Every HTML element has a tag and you can identify each element by its tag. To read the content of an element (from Excel using VBA), you will first have to locate the element by its tag on the webpage. In-addition, some elements will also have a unique id.

For example, I want to read all the headings under "Recent Articles" from this web page. The header texts are inside a <p> tag (or paragraph). All the "headers" are embeded inside a DIV element, which is the parent. The parent has an id (a unique id). The script will use this id to locate the elements that we need.

See the below image.

Extract HTML Element Contents in WebPage using VBA in Excel

How do you know which HTML element it is? Its simple. Go to this page, scroll down and find "Recent Articles" section. Set the cursor on the header and "right click" the mouse, choose "Inspect" option and click it. (for Chrome and Edge browsers). This will open the browsers "console" window, highlighting the element.

So, lets get on with the example.

Open an Excel file and save the file in .xlsm format (macro format). Then, press Ctrl+F11 to open the VBA editor.

Add Microsoft HTML Object Library Reference

To make the macro work with an external source such as a webpage, we’ll need Internet Explorer reference. In-addition, to read and extract contents of HTML elements, we’ll have to create few objects using a library.

Microsoft has ended support for IE. However, the example that I am sharing here still works in Windows 11.

Therefore, first add the Microsoft HTML Object Library reference to the application. From the top menu of your VBA editor, click Tools -> References…. In the References window, find and select Microsoft HTML Object Library and click OK.

Option Explicit

Const sSiteName = "https://www.encodedna.com/"

Private Sub getHTMLContents()
    ' create internet explorer object.
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = False          ' keep this hidden.
    
    IE.navigate sSiteName
    
    ' wait till ie is fully loaded.
    While IE.readyState <> 4
        DoEvents
    Wend
    
    Dim oHDoc As HTMLDocument     ' create document object.
    Set oHDoc = IE.document
    
    Dim oHEle As HTMLDivElement     ' create html element (<div>) object.
    Set oHEle = oHDoc.getElementById("latest")   ' get the element reference using its id. this is the parent element.
    
    Dim iCnt As Integer
    
    ' loop through elements inside the parent <div> element and find <p>,
        ' which has the content we want.
    With oHEle
        For iCnt = 0 To .getElementsByTagName("p").Length - 1
            Debug.Print .getElementsByTagName("p").Item(iCnt).innerHTML   ' show the result in immediate window.
        Next iCnt
    End With
    
    ' clean up.
    IE.Quit
    Set IE = Nothing
    Set oHEle = Nothing
    Set oHDoc = Nothing
End Sub

You can see the output in your VBA editors immediate window.

In the beginning of the code, I have defined the URL of the website. You can define any other web page URL.

I have not added any reference of the Internet Explorer in my application. Instead, I have just created an object of IE using the CreateObject() method.

Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")

You can keep the Internet Explorer hidden. There is no need to open the page physically on a browser. Then it will wait till the page loads completely.

While IE.ReadyState <> 4
    DoEvents
Wend

Next, I’ve create two HTML objects of properties namely, HTMLDocument and HTMLUListElement.

Dim oHDoc As HTMLDocument
Dim oHEle As HTMLUListElement

The second property HTMLUListElement provides methods to manipulate the DOM elements. Similarly, you can use the other properties such as the HTMLDivElement to extract contents inside a <div> element.

Finally, it loops through each element inside the <div> tag (the parent element) and looks for the <p> tags. The innerHTML property will return the content inside the element.

.getElementsByTagName("p").Item(iCnt).innerHTML

Don’t forget to clean the memory and close the Internet Explorer (which is still running while the macro fetches the content).

The script above prints the data in the immediate window using Debug.print method. However, you can write the data in your Excel’s worksheet like this …

Cells(iCnt + 1, 1) = .getElementsByTagName("p").Item(iCnt).innerHTML

Hope you find the macro useful. Let me know if you have any query.

Happy coding. 🙂

← PreviousNext →