Using this example, you can extract specific contents from any websites or a webpage. For example, you can extract news headlines from a news portal, or get stock quotes from a web page etc. This process is also known by the term Screen Scraping.
The contents of a webpage are inserted inside HTML elements. Every HTML element has a tag and you can identify each element by its tag. To read the content of an element (from Excel using VBA), you will first have to locate the element by its tag on the webpage. In-addition, some elements will also have a unique id.
For example, I want to read all the headings from my blog’s home page. I have inserted the headings (with <h1> tags) inside a <ul> tag (it’s also called the unordered list element) and it has an id (ids are unique).
See this image!
The example code here, will locate the <ul> tag (with its id) on the webpage and will find the <h1> tags (the header elements) that has the texts.
Open an Excel file and save the file in .xlsm format (macro format). Then, press Ctrl+F11 to open the VBA editor.
Add Microsoft HTML Object Library Reference
To make the macro work with an external source such as a webpage, we’ll need Microsoft’s Internet Explore (to open the page and will remain hidden). In-addition, to read and extract contents of HTML elements, we’ll have to create few objects using a library.
Therefore, first add the Microsoft HTML Object Library reference to the application. From the top menu of your VBA editor, click Tools -> References…. In the References window, find and select Microsoft HTML Object Library and click OK.
Option Explicit Const sSiteName = "https://www.encodedna.com/" Private Sub getHTMLContents() ' Create Internet Explorer object. Dim IE As Object Set IE = CreateObject("InternetExplorer.Application") IE.Visible = False ' Keep this hidden. IE.Navigate sSiteName ' Wait till IE is fully loaded. While IE.ReadyState <> 4 DoEvents Wend Dim oHDoc As HTMLDocument ' Create document object. Set oHDoc = IE.Document Dim oHEle As HTMLUListElement ' Create HTML element (<ul>) object. Set oHEle = oHDoc.getElementById("ulPost") ' Get the element reference using its ID. Dim iCnt As Integer ' Loop through elements inside the <ul> element and find <h1>, which has the texts we want. With oHEle For iCnt = 0 To .getElementsByTagName("h1").Length - 1 Debug.Print .getElementsByTagName("h1").Item(iCnt).getElementsByTagName("a").Item(0).innerHTML Next iCnt End With ' Clean up. IE.Quit Set IE = Nothing Set oHEle = Nothing Set oHDoc = Nothing End Sub
In the beginning of the code, I have defined the URL of the website. You can define any other site’s URL (and follow the procedures I have explained above).
I have not added any reference of the Internet Explorer in my application, rather I have just created an object of IE using the CreateObject() method.
Dim IE As Object Set IE = CreateObject("InternetExplorer.Application")
You can keep the Internet Explorer hidden. There is no need to open the page physically on a browser. Then it will wait till the page loads completely.
While IE.ReadyState <> 4 DoEvents Wend
Next, I’ve create two HTML objects of properties namely, HTMLDocument and HTMLUListElement.
Dim oHDoc As HTMLDocument Dim oHEle As HTMLUListElement
The second property HTMLUListElement provides methods to manipulate the unordered list elements. Since, I have used <ul> tag to add headers on my webpage. Similarly, you can use the other properties such as the HTMLDivElement to extract contents inside a <div> element.
Finally, I’ll loop through each element inside the <ul> tag and find the <h1> tags. Each <h1> tag has a <a> tag (or the anchor link), which actually has the texts. The innerHTML property will return the text of an element.
Don’t forget to clean the memory and close the Internet Explorer (which is still running while the macro fetches the content).
Note: I writing the data in my VBA’s immediate window using Debug.print method. You can write the data in your Excel’s worksheet like this …
Cells(iCnt + 1, 1) = .getElementsByTagName("h1").Item(iCnt).getElementsByTagName("a").Item(0).innerHTML
That’s it. Thanks for reading. ☺