How to convert HTML text into plain text using C#?

Question

I need to render/display plain text from HTML content, and I would like to know how I can do this using C#. For example, below is my HTML code

vikas_jk · Accepted Answer

You can create a helper class to convert your HTML content into plain text, using the method below, written in C#:

    // Requires: using System.Text; using System.Text.RegularExpressions;
    public static string HTMLToText(string HTMLCode)
    {
        // Remove new lines since they are not visible in HTML
        HTMLCode = HTMLCode.Replace("\n", " ");
        // Remove tab spaces
        HTMLCode = HTMLCode.Replace("\t", " ");
        // Remove multiple white spaces from HTML
        HTMLCode = Regex.Replace(HTMLCode, @"\s+", " ");
        // Remove HEAD tag
        HTMLCode = Regex.Replace(HTMLCode, "<head.*?</head>", "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        // Remove any JavaScript
        HTMLCode = Regex.Replace(HTMLCode, "<script.*?</script>", "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        // Replace special characters like &, <, >, " etc.
        StringBuilder sbHTML = new StringBuilder(HTMLCode);
        // Note: There are many more special characters, these are just the
        // most common. You can add new characters to these arrays if needed
        string[] OldWords = { "&nbsp;", "&amp;", "&quot;", "&lt;", "&gt;", "&reg;", "&copy;", "&bull;", "&trade;", "&#39;" };
        string[] NewWords = { " ", "&", "\"", "<", ">", "®", "©", "•", "™", "'" };
        for (int i = 0; i < OldWords.Length; i++)
        {
            sbHTML.Replace(OldWords[i], NewWords[i]);
        }
        // Check if there are line breaks (<br>) or paragraphs (<p>)
        sbHTML.Replace("<br>", "\n<br>");
        sbHTML.Replace("<br ", "\n<br ");
        sbHTML.Replace("<p ", "\n<p ");
        // Finally, remove all remaining HTML tags and return plain text
        return Regex.Replace(sbHTML.ToString(), "<[^>]*>", "");
    }

The above method takes the HTML content, removes all the HTML tags and code, and returns the output as plain text. You can check the online sample here: http://rextester.com/AKMG13869

Go to the above link and run it; you should see the following output:

Some text here Some more text

done.
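The comment in the method above notes that many more special characters exist than the ten it lists. As a minimal sketch of one way around that, assuming you are free to restructure the helper, the variant below strips tags first and then decodes entities with System.Net.WebUtility.HtmlDecode from the .NET base class library. The class and method names (HtmlTextSketch, ToPlainText) are hypothetical and not part of the original answer, and regex-based stripping remains a best-effort approach for reasonably well-formed HTML.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class HtmlTextSketch
{
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Drop <head>, <script> and <style> blocks together with their content.
        string text = Regex.Replace(html, @"<(head|script|style)\b.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Collapse source whitespace, which HTML does not render.
        text = Regex.Replace(text, @"\s+", " ");

        // Turn <br> tags and paragraph ends into real line breaks.
        text = Regex.Replace(text, @"<br\s*/?>|</p\s*>", "\n", RegexOptions.IgnoreCase);

        // Remove all remaining tags.
        text = Regex.Replace(text, @"<[^>]*>", "");

        // Trim spaces left around the inserted line breaks.
        text = Regex.Replace(text, @"[ ]*\n[ ]*", "\n");

        // Decode entities last, so escaped markup such as &lt;b&gt; survives as text.
        text = WebUtility.HtmlDecode(text);

        // &nbsp; decodes to U+00A0; map it to an ordinary space.
        text = text.Replace('\u00A0', ' ');

        return text.Trim();
    }
}
```

With this sketch, HtmlTextSketch.ToPlainText("<p>Some text here</p><p>Some &amp; more</p>") returns "Some text here", a line break, and then "Some & more". Decoding after the tag-stripping step is a deliberate choice: text that was escaped in the source (for example &lt;b&gt;) is kept as literal characters instead of being removed as if it were a tag.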

OR

You can also use the HtmlAgilityPack library to convert HTML to text in C#.

Example:

var sampleText = HtmlUtilities.ConvertToPlainText(html);
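I cannot confirm that HtmlUtilities ships inside the HtmlAgilityPack NuGet package; it may be a helper class you have to add to your own project. As a hedged alternative, the sketch below uses only types I am confident the library provides (HtmlDocument, HtmlNode, HtmlNodeType, HtmlEntity). The class and method names (HapPlainText, ToPlainText) are hypothetical, and the code assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper, not part of the original answer or of the library.
public static class HapPlainText
{
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html ?? string.Empty);

        // Remove elements whose content is never displayed as text.
        // ToList() materializes the query before the tree is modified.
        var hidden = doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script" || n.Name == "style" || n.Name == "head")
            .ToList();
        foreach (var node in hidden)
        {
            node.Remove();
        }

        // Collect the remaining text nodes, decode entities, and join with spaces.
        var parts = doc.DocumentNode.DescendantsAndSelf()
            .Where(n => n.NodeType == HtmlNodeType.Text)
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .Where(t => t.Length > 0);

        return string.Join(" ", parts);
    }
}
```

Calling HapPlainText.ToPlainText("<p>Some text here</p><p>Some more text</p>") should yield the two paragraph texts separated by a single space. Joining text nodes with a space is a simplification: it keeps words from adjacent elements apart but does not preserve line breaks.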

Thanks.
