Sometimes we need to convert a PDF into HTML and extract text from it using C#. In this article, I explain how you can convert a PDF into HTML using C# in ASP.NET MVC with the help of an external .exe file.

Before we begin, you will have to download pdftohtml.exe.

Once you have downloaded the executable, create a new project in Visual Studio (File -> New -> Project -> select "Web" in the left pane -> select "ASP.NET Web Application" in the right pane).

    a) Enter a name and click "Ok"

    b) Select the "MVC" template and click "Ok"

Once the project is created from the template, create a folder named "Output" in the root of the project, and a second folder named "pdftohtml" next to it (you can add a new folder by right-clicking the project and selecting "Add", then "New Folder").

After adding the folders, copy "pdftohtml.exe" into the "pdftohtml" folder.

Now go to HomeController.cs inside the "Controllers" folder and create an action method that uploads the PDF file and converts it to HTML using "pdftohtml.exe".

Here is the code for it in C#:

        [HttpPost]
        public ActionResult Index(HttpPostedFileBase file)
        {
            if (file != null && file.ContentLength > 0)
            {
                try
                {
                    //Get the File Name without any client-side folder. Remove space characters from File Name.
                    string fileName = Path.GetFileName(file.FileName).Replace(" ", string.Empty);

                    //Server.MapPath returns the absolute path of folder 'Output'
                    string outputFolder = Server.MapPath("~/Output");

                    //Save the PDF file under the cleaned-up name, so that the converter receives the file that was actually saved.
                    string inputPath = Path.Combine(outputFolder, fileName);
                    file.SaveAs(inputPath);

                    //Set the HTML file path.
                    string outputPath = Path.Combine(outputFolder, Path.GetFileNameWithoutExtension(fileName) + ".html");

                    ProcessStartInfo startInfo = new ProcessStartInfo();

                    //Set the PDF File Path and HTML File Path as arguments.
                    //Each path is quoted in case the site folder contains spaces.
                    startInfo.Arguments = string.Format("\"{0}\" \"{1}\"", inputPath, outputPath);

                    //Set the Path of the PdfToHtml exe file.
                    startInfo.FileName = Server.MapPath("~/pdftohtml/pdftohtml.exe");

                    //Hide the Command window. CreateNoWindow is ignored unless UseShellExecute is false.
                    startInfo.UseShellExecute = false;
                    startInfo.WindowStyle = ProcessWindowStyle.Hidden;
                    startInfo.CreateNoWindow = true;

                    //Execute the PdfToHtml exe file.
                    using (Process process = Process.Start(startInfo))
                    {
                        process.WaitForExit();
                    }
                    ViewBag.Message = "File uploaded successfully";
                }
                catch (Exception ex)
                {
                    ViewBag.Message = "ERROR:" + ex.Message;
                }
            }
            else
            {
                ViewBag.Message = "You have not specified a file.";
            }
            return View();
        }

In the above code, we pass the PDF file to the Index action method using POST, and the uploaded PDF file is then saved to the Output folder.
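Since the action accepts whatever file the browser sends, it can help to check that the upload really looks like a PDF before saving it. The following is only a minimal sketch: the class and method names (PdfUploadCheck, LooksLikePdf) are hypothetical and not part of the original project, and the check relies only on the ".pdf" extension and on the "%PDF-" header that PDF files begin with.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original project.
public static class PdfUploadCheck
{
    // PDF files begin with the ASCII header "%PDF-".
    private static readonly byte[] PdfMarker = Encoding.ASCII.GetBytes("%PDF-");

    // Returns true when the name ends in .pdf and the content starts with "%PDF-".
    public static bool LooksLikePdf(string fileName, Stream content)
    {
        if (!string.Equals(Path.GetExtension(fileName), ".pdf",
                           StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        // Remember where the stream was so it can be restored afterwards.
        long start = content.CanSeek ? content.Position : 0;

        // Read the first few bytes; Read may return fewer bytes than requested.
        var header = new byte[PdfMarker.Length];
        int read = 0;
        while (read < header.Length)
        {
            int n = content.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream
            read += n;
        }

        // Rewind so the caller can still process the whole file.
        if (content.CanSeek) content.Seek(start, SeekOrigin.Begin);

        if (read < header.Length) return false;
        for (int i = 0; i < header.Length; i++)
        {
            if (header[i] != PdfMarker[i]) return false;
        }
        return true;
    }
}
```

Inside the action method this could be called as PdfUploadCheck.LooksLikePdf(file.FileName, file.InputStream) before file.SaveAs. It is a basic sanity check, not a full validation of the PDF.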

We also pass the path of the input PDF and the path of the output HTML as arguments to the ProcessStartInfo object, together with the path to "pdftohtml.exe", and use it to start the process.
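The process-launching step can also be pulled out into a small helper that quotes the paths, applies a timeout, and reports whether the converter succeeded. This is a sketch under assumptions: the names PdfToHtmlRunner, BuildArguments and Convert are made up for illustration, the "input output" argument order is taken from the code above, and treating exit code 0 as success is an assumption about this particular pdftohtml.exe.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of the original project.
public static class PdfToHtmlRunner
{
    // Wraps each path in double quotes so that a space in a folder name
    // does not split the path into two arguments.
    public static string BuildArguments(string inputPath, string outputPath)
    {
        return string.Format("\"{0}\" \"{1}\"", inputPath, outputPath);
    }

    // Runs the converter and returns true only if it exits with code 0
    // within the timeout (assumption: 0 means success for this tool).
    public static bool Convert(string exePath, string inputPath,
                               string outputPath, int timeoutMs)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = BuildArguments(inputPath, outputPath),
            UseShellExecute = false, // needed for CreateNoWindow to be honoured
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            if (!process.WaitForExit(timeoutMs))
            {
                // Do not leave a hung converter running on the server.
                process.Kill();
                return false;
            }
            return process.ExitCode == 0;
        }
    }
}
```

With a helper like this, the action could set ViewBag.Message according to the returned value instead of always reporting success.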

Once pdftohtml.exe has run, the PDF file is converted into HTML and the HTML file is saved in the Output folder.
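If the goal is to extract plain text, the generated HTML can be read back and stripped of its tags. The sketch below is a rough, regex-based approach and not a real HTML parser; the class and method names (HtmlTextExtractor, ToPlainText) are hypothetical.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original project.
public static class HtmlTextExtractor
{
    public static string ToPlainText(string html)
    {
        // Drop <script> and <style> blocks together with their content.
        string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
                                    RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Replace every remaining tag with a space.
        text = Regex.Replace(text, @"<[^>]+>", " ");

        // Decode entities such as &amp; into their characters.
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace into single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

In the controller it could be used as HtmlTextExtractor.ToPlainText(System.IO.File.ReadAllText(outputPath)). The System.IO prefix matters there, because inside a controller the name File refers to the Controller.File method.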

Here is the complete HomeController.cs code

using System;
using System.Diagnostics;
using System.IO;
using System.Web;
using System.Web.Mvc;

namespace PDFtoHTMLinMVC.Controllers
{
    public class HomeController : Controller
    {
        public ActionResult Index()
        {
            return View();
        }

        [HttpPost]
        public ActionResult Index(HttpPostedFileBase file)
        {
            if (file != null && file.ContentLength > 0)
            {
                try
                {
                    //Get the File Name without any client-side folder. Remove space characters from File Name.
                    string fileName = Path.GetFileName(file.FileName).Replace(" ", string.Empty);

                    //Server.MapPath returns the absolute path of folder 'Output'
                    string outputFolder = Server.MapPath("~/Output");

                    //Save the PDF file under the cleaned-up name, so that the converter receives the file that was actually saved.
                    string inputPath = Path.Combine(outputFolder, fileName);
                    file.SaveAs(inputPath);

                    //Set the HTML file path.
                    string outputPath = Path.Combine(outputFolder, Path.GetFileNameWithoutExtension(fileName) + ".html");

                    ProcessStartInfo startInfo = new ProcessStartInfo();

                    //Set the PDF File Path and HTML File Path as arguments.
                    //Each path is quoted in case the site folder contains spaces.
                    startInfo.Arguments = string.Format("\"{0}\" \"{1}\"", inputPath, outputPath);

                    //Set the Path of the PdfToHtml exe file.
                    startInfo.FileName = Server.MapPath("~/pdftohtml/pdftohtml.exe");

                    //Hide the Command window. CreateNoWindow is ignored unless UseShellExecute is false.
                    startInfo.UseShellExecute = false;
                    startInfo.WindowStyle = ProcessWindowStyle.Hidden;
                    startInfo.CreateNoWindow = true;

                    //Execute the PdfToHtml exe file.
                    using (Process process = Process.Start(startInfo))
                    {
                        process.WaitForExit();
                    }
                    ViewBag.Message = "File uploaded and converted";
                }
                catch (Exception ex)
                {
                    ViewBag.Message = "ERROR:" + ex.Message;
                }
            }
            else
            {
                ViewBag.Message = "You have not specified a file.";
            }
            return View();
        }
    }
}

We are done with the C# part. Next, we need to create the Index.cshtml view to upload the PDF file; here is the code for it.

@{
    ViewBag.Title = "Home Page";
}

@using (Html.BeginForm("Index",
                        "Home",
                        FormMethod.Post,
                        new { enctype = "multipart/form-data" }))
{
    <label for="file">Upload PDF File:</label>
    <input type="file" name="file" id="file" /><br><br>
    <input type="submit" value="Upload" />
    <br><br>
    @ViewBag.Message
}

That's it, we are done. Build your project and run it in the browser.

Here is the sample PDF file which I have used in the demo below. It is a simple two-page PDF file.

Output HTML

Sample Gif Image of complete demo.


You can download the sample project.
