How to get Meta and Title description of live URL in C# and Vb - HtmlAgilityPack

Last updated: 20th April 2024

In this tutorial I will show you how to get the "title" and "meta" description of a live URL in ASP.NET using C# and VB. Automatically parsing or extracting information (such as the title and meta tags) from a web page is often known as Screen Scraping. To parse the web page I am using a library called HtmlAgilityPack.

What is Screen Scraping?

Screen Scraping (similar to web scraping) is not a new concept. It is the process of extracting information from a web page, such as the title, the meta descriptions and other details like a current stock price. Facebook has this feature too, and it delights its users by extracting not just the title and meta descriptions, but images as well.

The image below shows what the code extracts after you provide a live URL: the page "title" and meta tags such as "keywords", "author" and "viewport".

Get title and meta description of a webpage using C#

See this demo

What is HtmlAgilityPack?

HtmlAgilityPack is a library (.dll) for .NET that provides the methods and properties a developer needs to extract information from a web page. One thing I found very useful, and worth sharing, is that it can extract data even if the page has bad markup. In HTML, most elements have an opening tag and a closing tag. If the page is missing a closing tag, the library will still extract the data of that element.
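
To make the point about bad markup concrete, here is a minimal sketch that parses a string instead of a live URL. The class name, the helper name and the sample HTML are made up for illustration; the sample deliberately leaves the p, body and html elements unclosed.

```csharp
using System;
using HtmlAgilityPack;

public static class BrokenMarkupDemo
{
    // Hypothetical helper: returns the trimmed text of the first node
    // matched by the XPath expression, or null when nothing matches.
    public static string FirstText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses a string, no network access needed
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        // Sample input (an assumption): <p>, <body> and <html> are never closed.
        string html = "<html><head><title>Broken page</title></head><body><p>Hello";

        Console.WriteLine(FirstText(html, "//title"));
        Console.WriteLine(FirstText(html, "//p"));
    }
}
```

Loading from a string like this is also handy for trying out XPath expressions before pointing the code at a real page.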

How Do I Install HtmlAgilityPack?

First, you need the HtmlAgilityPack.dll library file on your computer.

As mentioned above, the library provides the methods and properties for data extraction. If you are using .NET 4 or later, the simplest way to get it is through the NuGet package manager in Visual Studio.

Follow these steps.

Create a new website using Visual Studio. Open "Solution Explorer", right-click the solution and choose the Manage NuGet Packages… option.

Manage Nuget Package

In the NuGet packages window, type HtmlAgilityPack in the search box and click the Install button.

Search HtmlAgility with Nuget

If you could not install the library using NuGet, you can download it directly from its website. You will get a zip file. Extract it and copy the library (version HtmlAgilityPack.1.4.6 – Net20) into the "bin" folder of your project. If you don’t find a bin folder, create it in the root directory of your project.
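
Whichever route you take, a quick way to confirm that the project really picks up the library is to ask the loaded assembly for its version. This is only a sketch; the class and method names are my own and not part of HtmlAgilityPack.

```csharp
using System;
using HtmlAgilityPack;

public static class InstallCheck
{
    // Reads the version of whichever HtmlAgilityPack assembly was loaded.
    // If the reference is missing, this file will not even compile.
    public static Version LoadedVersion()
    {
        return typeof(HtmlDocument).Assembly.GetName().Version;
    }

    public static void Main()
    {
        Console.WriteLine("HtmlAgilityPack version: " + LoadedVersion());
    }
}
```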

Well, that’s it. You got the library. Now, let’s code.

The Markup

In the markup section, I have added a few basic controls. There is a textbox control with "AutoPostBack" set to "true", because I want to extract the data as soon as I enter a URL in the box. For that, I have added the ontextchanged event, which calls a code-behind procedure named "parseWeb".

The extracted data (info) will be displayed in a DIV element.

<!DOCTYPE html>
<html>
<body>
    <form runat="server">
    <div class="page">
        <div class="main">

            <div style="line-height:18px; clear:both;">
                <div>
                    <asp:TextBox ID="tbEditor" 
                        placeholder="Enter the URL"
                        AutoPostBack="true" 
                        ontextchanged="parseWeb"
                        Width="400px" 
                        Height="40px"
                        TextMode="MultiLine" 
                        runat="server">
                    </asp:TextBox>
                </div>

                <%--show extracted data here--%>
                <div id="divPageDescription" 
                    style="width:400px; padding:10px 0;" 
                    runat="server">
                </div>
            </div>
        </div>
    </div>
    </form>
</body>
</html>
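
Since the textbox accepts anything the user types, it may help to check the input before handing it to the parser. The sketch below is one possible approach and not part of the original example: the UrlInput class and TryNormalize method are hypothetical names, and treating input without a scheme as http is my assumption.

```csharp
using System;

public static class UrlInput
{
    // Hypothetical helper: turns raw textbox input into an absolute
    // http or https URI. Returns false when the input is not usable.
    public static bool TryNormalize(string input, out Uri uri)
    {
        uri = null;
        if (string.IsNullOrWhiteSpace(input)) return false;

        // A multi-line textbox can carry trailing spaces or line breaks.
        string candidate = input.Trim();

        // Assumption: input without a scheme (e.g. "example.com") means http.
        if (!candidate.Contains("://")) candidate = "http://" + candidate;

        Uri parsed;
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out parsed)) return false;

        // Accept web addresses only; reject file:, ftp: and other schemes.
        if (parsed.Scheme != Uri.UriSchemeHttp && parsed.Scheme != Uri.UriSchemeHttps)
            return false;

        uri = parsed;
        return true;
    }
}
```

Inside "parseWeb" you could call UrlInput.TryNormalize(tbEditor.Text, out uri) first and only load uri.AbsoluteUri when it returns true.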

Screen Scraping in C# using HTMLAgilityPack

Reference the HtmlAgilityPack namespace in your code-behind by adding a "using" statement.

using HtmlAgilityPack;

Here’s the complete code.

using System;
using HtmlAgilityPack;

public partial class SiteMaster : System.Web.UI.MasterPage
{
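    protected void parseWeb(object sender, EventArgs e)
    {
        // Trim the input; a multi-line textbox can carry stray whitespace.
        string url = tbEditor.Text.Trim();
        if (string.IsNullOrEmpty(url)) return;

        HtmlWeb web = new HtmlWeb();
        HtmlDocument doc = web.Load(url);

        // First get the title of the web page (a page may not have one).
        var titleNode = doc.DocumentNode.SelectSingleNode("//title");
        string title = (titleNode != null) ? titleNode.InnerText.Trim() : "(no title)";
        divPageDescription.InnerHtml = "<b> Page title </b>: " + Safe(title) + "<br />";

        // Now, parse <META> tag details. SelectNodes returns null when nothing matches.
        var metaTags = doc.DocumentNode.SelectNodes("//meta");

        if (metaTags != null)
        {
            foreach (var tag in metaTags)
            {
                // Show only the tags that have both a "name" and a "content" attribute.
                if (tag.Attributes["name"] != null && tag.Attributes["content"] != null)
                {
                    divPageDescription.InnerHtml += "<br /> " +
                        "<b> Page " + Safe(tag.Attributes["name"].Value) + " </b>: " +
                        Safe(tag.Attributes["content"].Value) + "<br />";
                }
            }
        }
    }

    // The text comes from a remote page, so decode its entities and then
    // HTML-encode it before writing it into our own markup.
    private static string Safe(string text)
    {
        return System.Web.HttpUtility.HtmlEncode(HtmlEntity.DeEntitize(text));
    }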
}
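
The handler above lists every meta tag that has a name. If you only want one particular value, such as the description, or the image that sites like Facebook pick up, you can ask for it directly with an XPath filter. The following is a minimal sketch under a few assumptions: the MetaLookup class and Content method are hypothetical names, the sample HTML is invented, and images are assumed to be published through an "og:image" meta tag that uses a "property" attribute instead of "name".

```csharp
using System;
using HtmlAgilityPack;

public static class MetaLookup
{
    // Hypothetical helper: returns the content of the first <meta> tag whose
    // given attribute equals key, or null when there is no such tag.
    // Assumes key contains no single quote, because it is placed inside
    // an XPath string literal. The comparison is case-sensitive.
    public static string Content(HtmlDocument doc, string attribute, string key)
    {
        string xpath = "//meta[@" + attribute + "='" + key + "']";
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node == null) return null;

        // Decode entities such as &amp; in the attribute value.
        return HtmlEntity.DeEntitize(node.GetAttributeValue("content", string.Empty));
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<html><head>" +
            "<meta name='description' content='A sample page'>" +
            "<meta property='og:image' content='http://example.com/a.png'>" +
            "</head><body></body></html>");

        Console.WriteLine(Content(doc, "name", "description"));
        Console.WriteLine(Content(doc, "property", "og:image"));
    }
}
```

With a live URL you would pass the document returned by HtmlWeb.Load instead of the one built from a string.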

Screen Scraping in VB using HtmlAgilityPack

Use the Imports statement to get access to the "HtmlAgilityPack" methods and properties.

Imports HtmlAgilityPack

Here’s the complete code.

Option Explicit On

Imports HtmlAgilityPack

Partial Class Site
    Inherits System.Web.UI.MasterPage

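    Protected Sub parseWeb(sender As Object, e As EventArgs)

        ' Trim the input; a multi-line textbox can carry stray whitespace.
        Dim url As String = tbEditor.Text.Trim()
        If String.IsNullOrEmpty(url) Then Return

        Dim web As New HtmlWeb()
        Dim doc As HtmlDocument = web.Load(url)

        ' First get the title of the web page (a page may not have one).
        Dim titleNode As HtmlNode = doc.DocumentNode.SelectSingleNode("//title")
        Dim title As String = If(titleNode IsNot Nothing, titleNode.InnerText.Trim(), "(no title)")
        divPageDescription.InnerHtml = "<b> Page title </b>: " & Safe(title) & "<br />"

        ' Now, parse <META> tag details. SelectNodes returns Nothing when nothing matches.
        Dim metaTags As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//meta")

        If metaTags IsNot Nothing Then
            For Each tag As HtmlNode In metaTags
                ' Show only the tags that have both a "name" and a "content" attribute.
                If tag.Attributes("name") IsNot Nothing AndAlso tag.Attributes("content") IsNot Nothing Then
                    divPageDescription.InnerHtml &= "<br /> " & _
                        "<b> Page " & Safe(tag.Attributes("name").Value) & " </b>: " & _
                        Safe(tag.Attributes("content").Value) & "<br />"
                End If
            Next
        End If

    End Sub

    ' The text comes from a remote page, so decode its entities and then
    ' HTML-encode it before writing it into our own markup.
    Private Shared Function Safe(text As String) As String
        Return System.Web.HttpUtility.HtmlEncode(HtmlEntity.DeEntitize(text))
    End Function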
End Class
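
Both versions call HtmlWeb.Load directly, so a malformed or unreachable URL surfaces as an unhandled exception on the page. One way to soften that, sketched here in C# with hypothetical names (SafeFetch, TryLoad), is to wrap the call and report the failure to the caller. The catch is deliberately broad because the exact exception type depends on what went wrong and on the library version; narrow it if you know which failures you want to handle.

```csharp
using System;
using HtmlAgilityPack;

public static class SafeFetch
{
    // Hypothetical wrapper: returns the parsed document, or null when
    // loading fails. The reason for the failure is passed back in error.
    public static HtmlDocument TryLoad(string url, out string error)
    {
        error = null;
        try
        {
            var web = new HtmlWeb();
            return web.Load(url);
        }
        catch (Exception ex)
        {
            // Broad on purpose for this sketch (bad URL, DNS failure, timeout...).
            error = ex.Message;
            return null;
        }
    }
}
```

In the handler you could then show the error text in "divPageDescription" when TryLoad returns null, instead of letting the request fail.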

I hope you find this article and example useful for your project. You now have a library with which you can conveniently parse a web page and extract information for analysis and other purposes.

Happy coding.