How to compare two PDF files in C#?


How can I compare two PDF files in ASP.NET C#? Any examples or links would be appreciated, thanks.


Asked by:- VaibhavKamble
0
: 301 At:- 8/16/2021 5:34:45 AM
C# Compare PDF using C#






1 Answer
Answered by:- vikas_jk

You can follow these steps:

  • Install the NuGet package 'iTextSharp':
    Install-Package iTextSharp
  • Once the package is installed, extract the text from both PDF files:
     var ExtractedPDFToString = ExtractTextFromPdf(@"C:\Users\CT\Desktop\pdf-sample.pdf");
  • The ExtractTextFromPdf function is shown below (or see "read pdf file in C#"):

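     public static string[] ExtractTextFromPdf(string path)
     {
         using (PdfReader reader = new PdfReader(path))
         {
             StringBuilder text = new StringBuilder();

             for (int i = 1; i <= reader.NumberOfPages; i++)
             {
                 // AppendLine keeps the last line of one page separate from the first line of the next
                 text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i));
             }

             // Return the text split into lines so the two files can be compared line by line
             return text.ToString().Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
         }
     }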
  • Once you have the text from both PDF files, you can compare it in C# and get the differences:
                var ExtractedPDFToString1 = ExtractTextFromPdf(@"E:\samplepdf\sample1.pdf");
                var ExtractedPDFToString2 = ExtractTextFromPdf(@"E:\samplepdf\sample2.pdf");
    
                // Except is a set difference: lines of sample2.pdf that do not appear anywhere in sample1.pdf
                IEnumerable<string> onlyB = ExtractedPDFToString2.Except(ExtractedPDFToString1).ToList();
    
                foreach (var lin in onlyB)
                {
                    Console.WriteLine(lin); // each line found only in the second PDF
                }

Complete code

using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ReadFromPDF
{
    class Program
    {
        static void Main(string[] args)
        {

            var ExtractedPDFToString1= ExtractTextFromPdf(@"E:\samplepdf\sample1.pdf");
            var ExtractedPDFToString2 = ExtractTextFromPdf(@"E:\samplepdf\sample2.pdf");

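            // Except is a set difference: lines of sample2.pdf that do not appear anywhere in sample1.pdf
            IEnumerable<string> onlyB = ExtractedPDFToString2.Except(ExtractedPDFToString1).ToList();

            foreach (var lin in onlyB)
            {
                Console.WriteLine(lin); // each line found only in the second PDF
            }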
        }


        public static string[] ExtractTextFromPdf(string path)
        {
            using (PdfReader reader = new PdfReader(path))
            {
                StringBuilder text = new StringBuilder();

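                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    // AppendLine keeps the last line of one page separate from the first line of the next
                    text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i));
                }

                // Split into lines and drop the empty entries produced by "\r\n" pairs
                return text.ToString().Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);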
            }
        }
    }
}

Output:

difference-pdf-csharp-min.gif

The image above shows both PDFs (pdf1 and pdf2) and the differences printed to the console.
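
Note that the code above only reports lines that are in the second PDF and missing from the first. If you also need the lines that were removed, one option is to run Except in both directions. Below is a minimal sketch of that idea; PdfLineDiff and Compare are hypothetical names made up for this example (they are not part of iTextSharp), and it assumes you already have the lines of each PDF as strings, for example from ExtractTextFromPdf.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; it is not part of iTextSharp.
public static class PdfLineDiff
{
    // Returns the lines that appear only in the first input and only in the second input.
    public static (List<string> OnlyInFirst, List<string> OnlyInSecond) Compare(
        IEnumerable<string> firstLines, IEnumerable<string> secondLines)
    {
        // Trim each line and drop blank ones so that whitespace noise
        // from text extraction is not reported as a difference.
        var first = firstLines.Select(l => l.Trim()).Where(l => l.Length > 0).ToList();
        var second = secondLines.Select(l => l.Trim()).Where(l => l.Length > 0).ToList();

        // Except is a set difference: duplicates are collapsed and the
        // position of a line inside the document is not compared.
        return (first.Except(second).ToList(), second.Except(first).ToList());
    }
}
```

You could call it as `var (removed, added) = PdfLineDiff.Compare(ExtractedPDFToString1, ExtractedPDFToString2);` and print both lists. Because this is still a set comparison, a line that only moved to another page, or that is repeated a different number of times, will not show up as a difference; for that you would need a real diff algorithm.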

2
At:- 8/16/2021 7:04:43 AM
Excellent, detailed example, thanks for this.
By : bhanu - at :- 8/29/2021 3:09:55 PM





