How can I compare two PDF files in ASP.NET C#? Any examples or links would be appreciated, thanks.
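You can perform the following steps:
Install the iTextSharp package from the NuGet Package Manager Console:
Install-Package iTextSharp
Extract the text of a PDF into a string:
var ExtractedPDFToString = ExtractTextFromPdf(@"C:\Users\CT\Desktop\pdf-sample.pdf");
The ExtractTextFromPdf function is shown below (see also "read pdf file in C#"):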
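// Returns the text of every page in the PDF as one string.
public static string ExtractTextFromPdf(string path)
{
    using (PdfReader reader = new PdfReader(path))
    {
        StringBuilder text = new StringBuilder();
        // iTextSharp page numbers start at 1
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            // AppendLine keeps the last line of one page from merging with the first line of the next
            text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i));
        }
        return text.ToString();
    }
}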
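Extract the text of both PDFs, split it into lines, and list the lines that appear only in the second file:
var lineBreaks = new[] { '\r', '\n' };
var ExtractedPDFToString1 = ExtractTextFromPdf(@"E:\samplepdf\sample1.pdf").Split(lineBreaks, StringSplitOptions.RemoveEmptyEntries);
var ExtractedPDFToString2 = ExtractTextFromPdf(@"E:\samplepdf\sample2.pdf").Split(lineBreaks, StringSplitOptions.RemoveEmptyEntries);
// Except is a set difference: it yields the distinct lines of PDF 2 that do not occur anywhere in PDF 1.
// Without the Split above it would compare the two strings character by character.
IEnumerable<string> onlyB = ExtractedPDFToString2.Except(ExtractedPDFToString1).ToList();
foreach (var lin in onlyB)
{
    Console.WriteLine(lin); // a line found only in PDF 2
}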
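Text extracted from two visually identical PDFs may still differ in spacing, so it can help to normalize each line before comparing. The following is only a minimal sketch under that assumption: LineNormalizer and Normalize are hypothetical names, not part of iTextSharp, and the sample string stands in for text returned by ExtractTextFromPdf.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class LineNormalizer
{
    // Hypothetical helper: splits text into lines, collapses runs of
    // whitespace to a single space, trims, and drops empty lines.
    public static string[] Normalize(string text)
    {
        return text
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => Regex.Replace(line, @"\s+", " ").Trim())
            .Where(line => line.Length > 0)
            .ToArray();
    }

    static void Main()
    {
        // Stand-in for text extracted from a PDF
        string raw = "Invoice   42 \r\n\r\n  Total:\t100\n";
        foreach (var line in Normalize(raw))
        {
            Console.WriteLine("[" + line + "]");
        }
        // prints:
        // [Invoice 42]
        // [Total: 100]
    }
}
```

Whether collapsing whitespace is appropriate depends on the documents: if spacing itself is meaningful in your PDFs, compare the raw lines instead.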
Complete code:
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ReadFromPDF
{
    class Program
    {
        static void Main(string[] args)
        {
            var ExtractedPDFToString1 = ExtractTextFromPdf(@"E:\samplepdf\sample1.pdf");
            var ExtractedPDFToString2 = ExtractTextFromPdf(@"E:\samplepdf\sample2.pdf");
            // Except is a set difference: distinct lines of PDF 2 that do not occur anywhere in PDF 1
            IEnumerable<string> onlyB = ExtractedPDFToString2.Except(ExtractedPDFToString1).ToList();
            foreach (var lin in onlyB)
            {
                Console.WriteLine(lin); // a line found only in PDF 2
            }
        }

        // Returns the text of every page in the PDF, split into non-empty lines.
        public static string[] ExtractTextFromPdf(string path)
        {
            using (PdfReader reader = new PdfReader(path))
            {
                StringBuilder text = new StringBuilder();
                // iTextSharp page numbers start at 1
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    // AppendLine keeps the last line of one page from merging with the first line of the next
                    text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i));
                }
                return text.ToString().Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
            }
        }
    }
}
Output:
The image above shows both PDFs (pdf1 and pdf2) and the differences between them printed to the console.
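Because Except is a set operation, it reports each differing line only once and only in one direction (lines in PDF 2 that are missing from PDF 1). If you also want the lines that exist only in PDF 1, with duplicates and original order kept, something along these lines may work. This is a sketch, not part of iTextSharp: LineDiff and OnlyIn are hypothetical names, and the two sample arrays stand in for the string[] results of ExtractTextFromPdf.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LineDiff
{
    // Hypothetical helper: lines of 'source' that never occur in 'other',
    // in their original order and including repeats.
    public static List<string> OnlyIn(IEnumerable<string> source, IEnumerable<string> other)
    {
        var seen = new HashSet<string>(other, StringComparer.Ordinal);
        return source.Where(line => !seen.Contains(line)).ToList();
    }

    static void Main()
    {
        // Stand-ins for the lines extracted from each PDF
        string[] pdf1Lines = { "Title", "Hello world", "Page 1" };
        string[] pdf2Lines = { "Title", "Hello there", "Page 1", "Hello there" };

        foreach (var line in OnlyIn(pdf1Lines, pdf2Lines))
        {
            Console.WriteLine("- " + line); // only in PDF 1
        }
        foreach (var line in OnlyIn(pdf2Lines, pdf1Lines))
        {
            Console.WriteLine("+ " + line); // only in PDF 2
        }
        // prints:
        // - Hello world
        // + Hello there
        // + Hello there
    }
}
```

Note that this still treats lines as unordered membership checks: a line that merely moved to another position is not reported. For a positional diff you would need a proper diff algorithm or library.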