Extract or Get data from HTML Element in Excel using VBA

← PrevNext →

You can easily extract the contents of a Web Page from Excel. Web queries are one simple feature in Excel that you can use to import data from any website. You can further automate this process by writing a simple macro using VBA. The example here shows how to extract or read the contents inside an HTML element (any element) from Excel using a simple VBA macro.

Using this example, you can extract specific contents from any websites or a webpage. For example, you can extract news headlines from a news portal, or get stock quotes from a web page etc. This process is also known by the term Screen Scraping.

The contents of a webpage are inserted inside HTML elements. Every HTML element has a tag and you can identify each element by its tag. To read the content of an element (from Excel using VBA), you will first have to locate the element by its tag on the webpage. In-addition, some elements will also have a unique id.

For example, I want to read all the headings from my blog’s home page. I have inserted the headings (with <h1> tags) inside a <ul> tag (it’s also called the unordered list element) and it has an id (ids are unique).

See this image!

Extract HTML Element Contents in WebPage using VBA in Excel

How do you know which element on the webpage? Simple, go to this site and you will see the headings (in bold) for each post on the home page. Press Ctrl+U keys. This will open the view source page on your browser. Now, find ulpost in the source page. You can even right click the header text and select the inspect option (for chrome and firefox).

The example code here, will locate the <ul> tag (with its id) on the webpage and will find the <h1> tags (the header elements) that has the texts.

Open an Excel file and save the file in .xlsm format (macro format). Then, press Ctrl+F11 to open the VBA editor.

Add Microsoft HTML Object Library Reference

To make the macro work with an external source such as a webpage, we’ll need Microsoft’s Internet Explore (to open the page and will remain hidden). In-addition, to read and extract contents of HTML elements, we’ll have to create few objects using a library.

Therefore, first add the Microsoft HTML Object Library reference to the application. From the top menu of your VBA editor, click Tools -> References…. In the References window, find and select Microsoft HTML Object Library and click OK.

👉 Web Scrapping – How to fill a Web Form from Excel using VBA
Fill a Web Form automatically from Excel

Option Explicit

Const sSiteName = "https://www.encodedna.com/"

Private Sub getHTMLContents()
    ' Create Internet Explorer object.
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = False          ' Keep this hidden.
    
    IE.Navigate sSiteName
    
    ' Wait till IE is fully loaded.
    While IE.ReadyState <> 4
        DoEvents
    Wend
    
    Dim oHDoc As HTMLDocument     ' Create document object.
    Set oHDoc = IE.Document
    
    Dim oHEle As HTMLUListElement     ' Create HTML element (<ul>) object.
    Set oHEle = oHDoc.getElementById("ulPost")   ' Get the element reference using its ID.
    
    Dim iCnt As Integer
    
    ' Loop through elements inside the <ul> element and find <h1>, which has the texts we want.
    With oHEle
        For iCnt = 0 To .getElementsByTagName("h1").Length - 1
            Debug.Print .getElementsByTagName("h1").Item(iCnt).getElementsByTagName("a").Item(0).innerHTML
        Next iCnt
    End With
    
    ' Clean up.
    IE.Quit
    Set IE = Nothing
    Set oHEle = Nothing
    Set oHDoc = Nothing
End Sub

Note: You can see the output of this example in your VBA editors immediate window.

In the beginning of the code, I have defined the URL of the website. You can define any other site’s URL (and follow the procedures I have explained above).

I have not added any reference of the Internet Explorer in my application, rather I have just created an object of IE using the CreateObject() method.

Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")

You can keep the Internet Explorer hidden. There is no need to open the page physically on a browser. Then it will wait till the page loads completely.

While IE.ReadyState <> 4
    DoEvents
Wend

Next, I’ve create two HTML objects of properties namely, HTMLDocument and HTMLUListElement.

Dim oHDoc As HTMLDocument
Dim oHEle As HTMLUListElement

The second property HTMLUListElement provides methods to manipulate the unordered list elements. Since, I have used <ul> tag to add headers on my webpage. Similarly, you can use the other properties such as the HTMLDivElement to extract contents inside a <div> element.

Finally, I’ll loop through each element inside the <ul> tag and find the <h1> tags. Each <h1> tag has a <a> tag (or the anchor link), which actually has the texts. The innerHTML property will return the text of an element.

.getElementsByTagName("h1").Item(iCnt).getElementsByTagName("a").Item(0).innerHTML

Don’t forget to clean the memory and close the Internet Explorer (which is still running while the macro fetches the content).

Note: I writing the data in my VBA’s immediate window using Debug.print method. You can write the data in your Excel’s worksheet like this …

Cells(iCnt + 1, 1) = .getElementsByTagName("h1").Item(iCnt).getElementsByTagName("a").Item(0).innerHTML

That’s it. Thanks for reading.

← PreviousNext →