Extracting Values from PDFs in .NET Core 8 without ASP.NET

Extracting data from PDF files is a common requirement for tasks such as data analysis, content indexing, and information retrieval. PDF processing does not depend on ASP.NET Core: a console application, background service, or class library can do it just as well. In this article, we'll explore how to extract values from PDF files in .NET 8 (still often called .NET Core 8) without relying on ASP.NET, using the PdfSharpCore library. We'll walk through the steps with examples in C#.

Understanding PdfSharpCore: PdfSharpCore is a popular .NET library for PDF document manipulation, a port of PDFsharp for .NET Standard. It can create and modify PDF files and read the content of existing ones. In this guide, we'll focus on using PdfSharpCore to extract text from PDF documents.

Installing PdfSharpCore: Before we can use PdfSharpCore in a .NET application, we need to install the PdfSharpCore NuGet package, either from the NuGet Package Manager Console or with the .NET CLI.

Using the NuGet Package Manager Console:

Install-Package PdfSharpCore

Using the .NET CLI:

dotnet add package PdfSharpCore

Extracting Text from PDFs in C#: Now that PdfSharpCore is installed, let's look at how to extract text from PDF files using C#.

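using PdfSharpCore.Pdf;
using PdfSharpCore.Pdf.Content;
using PdfSharpCore.Pdf.Content.Objects;
using PdfSharpCore.Pdf.IO;
using System;
using System.Text;

public class PdfTextExtractor
{
    public static string ExtractTextFromPdf(string filePath)
    {
        // ReadOnly is enough here because the document is never modified.
        using (PdfDocument document = PdfReader.Open(filePath, PdfDocumentOpenMode.ReadOnly))
        {
            var text = new StringBuilder();
            foreach (PdfPage page in document.Pages)
            {
                // PdfPage has no GetText() method, so parse the page's content
                // stream and collect the strings drawn by its text operators.
                AppendText(ContentReader.ReadContent(page), text);
                text.AppendLine();
            }
            return text.ToString();
        }
    }

    // Walks the parsed content stream and appends the string operands of the
    // Tj and TJ (show text) operators. Fonts with custom encodings can yield
    // raw glyph codes rather than readable characters.
    private static void AppendText(CObject obj, StringBuilder text)
    {
        if (obj is COperator op)
        {
            if (op.OpCode.Name == OpCodeName.Tj.ToString() || op.OpCode.Name == OpCodeName.TJ.ToString())
                foreach (CObject operand in op.Operands) AppendText(operand, text);
        }
        else if (obj is CSequence sequence) // TJ arrays are sequences too
            foreach (CObject item in sequence) AppendText(item, text);
        else if (obj is CString str)
            text.Append(str.Value);
    }

    // Example usage:
    public static void Main(string[] args)
    {
        string pdfText = ExtractTextFromPdf("sample.pdf");
        Console.WriteLine(pdfText);
    }
}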

In this example, the PdfTextExtractor class exposes a static method, ExtractTextFromPdf, that takes the path of a PDF file and returns its text. The method opens the file with PdfSharpCore, iterates through the pages, extracts the text of each page, and returns the concatenated result. Note that a PDF made up only of scanned images contains no text to extract; such files need OCR first.
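Once the text is available as a string, individual values can be pulled out of it with ordinary string processing. The following is a minimal sketch of one way to do that with a regular expression, not part of PdfSharpCore. It assumes the extracted text keeps each field on its own line in a "Label: value" layout; the PdfValueParser class, the FindValue helper, and the sample invoice text are all hypothetical and only illustrate the idea. Real PDFs often split or reorder text, so the pattern may need adjusting.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, shown as a standalone program.
public static class PdfValueParser
{
    // Returns the text that follows "label:" on the same line,
    // or null when the label does not occur in the text.
    public static string? FindValue(string text, string label)
    {
        // Regex.Escape keeps characters such as '.' or '(' in the label literal.
        string pattern = Regex.Escape(label) + @"\s*:\s*(?<value>[^\r\n]+)";
        Match match = Regex.Match(text, pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups["value"].Value.Trim() : null;
    }

    public static void Main()
    {
        // Assumed sample text, standing in for the output of ExtractTextFromPdf.
        string pdfText = "Invoice Number: INV-1042\nTotal: 199.50\n";

        string? invoiceNumber = FindValue(pdfText, "Invoice Number");
        string? total = FindValue(pdfText, "Total");

        Console.WriteLine(invoiceNumber ?? "(invoice number not found)");

        // Parse with the invariant culture so "199.50" is read the same way
        // regardless of the machine's regional settings.
        if (decimal.TryParse(total, NumberStyles.Number, CultureInfo.InvariantCulture, out decimal amount))
        {
            Console.WriteLine(amount.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

Returning null for a missing label, rather than throwing, lets the caller decide whether an absent field is an error or simply optional in that document.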
