This article shows how to scrape web page content in C# using a console application and the AngleSharp NuGet package.
First, create a new console application project in Visual Studio. I am using VS 2022 with .NET 5, but you can also use .NET 6 or the older .NET Framework.
Once Visual Studio has generated the console application template, install the AngleSharp NuGet package by navigating to Tools -> NuGet Package Manager -> Package Manager Console and running:
Install-Package AngleSharp
Next, we will scrape data from this page: http://books.toscrape.com/catalogue/category/books/mystery_3/index.html
From it, we will extract each book's name and price using AngleSharp.
Here is the C# console application code:
using AngleSharp;
using System;
using System.Linq;
using System.Threading.Tasks;

namespace WebScrapingCsharp
{
    public class Program
    {
        static async Task Main(string[] args)
        {
            // Default configuration plus a loader, so documents can be fetched over HTTP
            var config = Configuration.Default.WithDefaultLoader();
            var address = "http://books.toscrape.com/catalogue/category/books/mystery_3/index.html";

            // Download and parse the page in a new browsing context
            var context = BrowsingContext.New(config);
            var document = await context.OpenAsync(address);

            var cellSelector = "ol.row li h3 a"; // HTML element holding each book name
            var cells = document.QuerySelectorAll(cellSelector);
            var titles = cells.Select(m => m.TextContent).ToList();

            var cellSelector2 = "ol.row li p.price_color"; // HTML element holding each book price
            var cells2 = document.QuerySelectorAll(cellSelector2);
            var prices = cells2.Select(m => m.TextContent).ToList();

            // Iterate only over indexes present in both lists to avoid an out-of-range error
            var count = Math.Min(titles.Count, prices.Count);
            for (var i = 0; i < count; i++)
            {
                Console.WriteLine(titles[i] + " : " + prices[i]);
            }
        }
    }
}
The above code produces the following output. Long titles end in "..." because the link text on the page is itself abbreviated.
Sharp Objects : £47.82
In a Dark, Dark ... : £19.63
The Past Never Ends : £56.50
A Murder in Time : £16.64
The Murder of Roger ... : £44.10
The Last Mile (Amos ... : £54.21
That Darkness (Gardiner and ... : £13.92
Tastes Like Fear (DI ... : £10.69
A Time of Torment ... : £48.35
A Study in Scarlet ... : £16.73
Poisonous (Max Revere Novels ... : £26.80
Murder at the 42nd ... : £54.36
Most Wanted : £35.28
Hide Away (Eve Duncan ... : £11.84
Boar Island (Anna Pigeon ... : £59.48
The Widow : £27.26
Playing with Fire : £13.71
What Happened on Beale ... : £25.37
The Bachelor Girl's Guide ... : £52.30
Delivering the Truth (Quaker ... : £20.89
In the above code, we:
- Set up a configuration that supports document loading
- Asynchronously load the document in a new browsing context that uses this configuration
- Use QuerySelectorAll with CSS selectors to get all elements that hold the content of interest
- Use LINQ to extract the text of each element into a list
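The steps above collect titles and prices in two separate lists and match them by index. As a sketch of an alternative, the lookup can be scoped to each book's container element so that a title and its price always come from the same element. The `article.product_pod` selector, the `title` attribute on the link, and the inline sample markup (hypothetical books and prices, used so the sketch runs without network access) are assumptions about the page structure; adjust them to the markup you actually see.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;

public static class ProductPairingSketch
{
    // Hypothetical markup standing in for the downloaded page (assumed structure)
    private const string SampleHtml = @"
<ol class=""row"">
  <li><article class=""product_pod"">
    <h3><a href=""#"" title=""An Example Book With a Long Title"">An Example Book ...</a></h3>
    <p class=""price_color"">£10.00</p>
  </article></li>
  <li><article class=""product_pod"">
    <h3><a href=""#"" title=""Short Title"">Short Title</a></h3>
    <p class=""price_color"">£25.50</p>
  </article></li>
</ol>";

    public static async Task Main()
    {
        var context = BrowsingContext.New(Configuration.Default);

        // Parse the string directly instead of requesting a URL
        var document = await context.OpenAsync(req => req.Content(SampleHtml));

        var books = document.QuerySelectorAll("article.product_pod")
            .Select(item =>
            {
                var link = item.QuerySelector("h3 a");
                return new
                {
                    // Prefer the title attribute (assumed to hold the full name),
                    // falling back to the visible link text
                    Title = link?.GetAttribute("title") ?? link?.TextContent,
                    Price = item.QuerySelector("p.price_color")?.TextContent
                };
            })
            // Skip containers where either piece of data is missing
            .Where(book => book.Title != null && book.Price != null);

        foreach (var book in books)
        {
            Console.WriteLine(book.Title + " : " + book.Price);
        }
    }
}
```

Scoping both queries to one container means a book with a missing price is skipped rather than shifting every later price onto the wrong title.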
Advantages of using AngleSharp
There are other popular web-scraping NuGet packages, such as HtmlAgilityPack, but AngleSharp has some benefits:
- It can also handle CSS and SVG
- Supports LINQ with DOM elements
- The performance of AngleSharp is better than HtmlAgilityPack in most cases.
- Extensible (extend with your own services)
- Allows Form submission (easily log in everywhere)
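To illustrate the LINQ point from the list above, here is a minimal sketch that converts the price text into numbers and then filters and sorts it with ordinary LINQ operators. The inline markup, the prices in it, and the threshold of 20 are hypothetical values chosen for illustration; the sketch also assumes every price is written as a pound sign followed by a decimal number.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;

public static class LinqOverDomSketch
{
    public static async Task Main()
    {
        // Hypothetical markup; a real page would be loaded with WithDefaultLoader() and a URL
        var html = @"
<ol class=""row"">
  <li><p class=""price_color"">£30.00</p></li>
  <li><p class=""price_color"">£12.50</p></li>
  <li><p class=""price_color"">£8.75</p></li>
</ol>";

        var context = BrowsingContext.New(Configuration.Default);
        var document = await context.OpenAsync(req => req.Content(html));

        // QuerySelectorAll returns an enumerable of elements, so LINQ applies directly
        var cheapPrices = document.QuerySelectorAll("p.price_color")
            .Select(p => p.TextContent.Trim().TrimStart('£'))
            // Invariant culture so that '.' is always read as the decimal separator
            .Select(text => decimal.Parse(text, CultureInfo.InvariantCulture))
            .Where(price => price < 20m)   // hypothetical threshold
            .OrderBy(price => price)
            .ToList();

        foreach (var price in cheapPrices)
        {
            Console.WriteLine(price.ToString("0.00", CultureInfo.InvariantCulture));
        }
    }
}
```

Parsing with the invariant culture keeps the result independent of the regional settings of the machine that runs the scraper.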
Using AngleSharp, you can scrape data from most server-rendered web pages in C# with a few lines of code.