Zarko's Delphi Programming Blog

By Zarko Gajic, About.com Guide to Delphi Programming

How to Get Info for a PDF Document: Is It Encrypted? What's the Version? Does It Have ANY Text?

Friday October 3, 2008
I'm working on a project where I need to process some existing PDF documents. For each PDF document I need to know the following:
  • Is the PDF encrypted?
  • What is the version of the PDF?
  • Does it have any text (or only images)? The actual text is irrelevant.

Note: I have found the solution: Gnostice PDFtoolkit does the trick! Details at the end of the post...

Now, there are quite a few Delphi VCL components that can create or print a PDF document from code. There are far fewer that can load an existing PDF document and report information about it. So far, I have found only one library (VersyPDF) that can answer the first two questions.

To answer the third question I have tried using the IFilter interface and the code provided by ShorterPath (function GetFileContentsFromIFilter in the spFilter.pas unit). Note: ShorterPath provides a free Delphi component set - the zip file contains two units related to the IFilter interface and its Delphi implementation. The SPFilter.pas file contains a function that can extract text from a document using the IFilter interface.

For the IFilter DLL that handles PDF documents I've used Foxit PDF IFilter - and everything worked well. I can easily grab the entire text from a PDF!

Here's the problem I'm having - and I need your help!
I cannot install a PDF IFilter on users' machines - do not ask why.

If anybody knows a way to just check whether there's *any* text in a PDF document... please share.
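For readers who cannot install anything extra either, here is a rough, dependency-free sketch of what can be guessed from the raw bytes of a file. It is not the Gnostice or VersyPDF approach, and all function names are made up for this illustration. It relies on two facts about the PDF format: a file starts with a "%PDF-x.y" header, and an encrypted file has an /Encrypt entry in its trailer. The text check is only a hint: it looks for an uncompressed /Font key, which can be missing when a PDF stores its dictionaries in compressed object streams, and a font can be declared without any text being drawn. Also note that the document catalog may carry a /Version entry that overrides the header, which this sketch ignores.

```pascal
uses
  SysUtils, Classes;

// Hypothetical helper: loads the raw bytes of a file, one byte per character.
function LoadFileBytes(const FileName: string): AnsiString;
var
  fs: TFileStream;
  len: Integer;
begin
  Result := '';
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    len := Integer(fs.Size);
    SetLength(Result, len);
    if len > 0 then
      fs.ReadBuffer(Result[1], len);
  finally
    fs.Free;
  end;
end;

// Returns e.g. '1.4' taken from a '%PDF-1.4' header, or '' if no header
// is found within the first 1024 bytes.
function PdfHeaderVersion(const Data: AnsiString): string;
const
  Marker: AnsiString = '%PDF-';
var
  p, q: Integer;
begin
  Result := '';
  p := Pos(Marker, Copy(Data, 1, 1024));
  if p = 0 then
    Exit;
  q := p + Length(Marker);
  while (q <= Length(Data)) and (Data[q] in ['0'..'9', '.']) do
  begin
    Result := Result + Char(Data[q]);
    Inc(q);
  end;
end;

// Heuristic: True if an /Encrypt key appears anywhere in the file.
function PdfLooksEncrypted(const Data: AnsiString): Boolean;
begin
  Result := Pos(AnsiString('/Encrypt'), Data) > 0;
end;

// Rough hint only: True if an uncompressed /Font key is present.
// False does NOT prove that the document has no text.
function PdfMayContainText(const Data: AnsiString): Boolean;
begin
  Result := Pos(AnsiString('/Font'), Data) > 0;
end;
```

To try it, load a file once with LoadFileBytes and pass the result to the three functions. For anything that has to be reliable (the text question in particular), a real PDF parser is still the safer choice.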


Edit: We have a "winner"!

Gnostice PDFtoolkit VCL has the answer to my questions! Here's the code:

//is the PDF encrypted?
if gtPDFDocument1.IsEncrypted then
begin
  //
end;

//PDFVersion property reveals the PDF version
gtPDFDocument1.PDFVersion

//here's how to check for text
//(PDFDoc is a TgtPDFDocument, like gtPDFDocument1 above)
var
  pel : TgtPDFPageElementList;
begin
  //do nothing if the user cancels the dialog
  if not OpenDialog1.Execute then Exit;
  PDFDoc.LoadFromFile(OpenDialog1.FileName);
  //1 = page number, so only the first page is checked
  pel := PDFDoc.GetPageElements(1,[etText],muPoints);
  try
    //Similarly etImage can be used to find Images in the PDF
    if pel.Count > 0 then
      ShowMessage('The PDF has searchable text in it')
    else
      ShowMessage('The PDF has no text in it');
  finally
    //the returned list is ours to free
    FreeAndNil(pel);
  end;
end;

Comments

October 3, 2008 at 7:42 am
(1) Quentin says:

What I do in cases like this is build the necessary files into my Delphi exe as resources and extract them when the app loads. If your situation requires you to ‘Install’ the app, then it might be a bit tricky…

October 3, 2008 at 8:18 am
(2) Zarko Gajic says:

Quentin, thanks for the idea - but I’m looking for some “code-way” to see if there’s any text in a PDF document…

October 3, 2008 at 8:25 am
(3) Jan Derk says:

I recently purchased Gnostice pdfToolkit to extract and insert pages into pdf documents. But it also has isencrypted and pdfversion properties and an extracttext method. So it should be able to do what you are looking for without having to install anything on the customer's computer.

October 3, 2008 at 8:53 am
(4) Zarko Gajic says:

Jan,

Hm, I’ve contacted (just a few days ago) Gnostice and got the following answer regarding the pdfVersion:

“Currently there are no functionalities in PDFtoolkit by which you can retrieve the version of the DPF been used”.

Strange. Ok, I’ll download the trial and try…

October 3, 2008 at 9:04 am
(5) Zarko Gajic says:

Jan, thanks again!

It turns out that I had a misunderstanding with Gnostice support - they do have pdfVersion!

They do not have a property that will tell me what Adobe version was used to create the PDF.

Great!

October 8, 2008 at 10:32 am
(6) didier says:

Hi all,
related to this (though not exactly the same subject): which architecture would you use to index a folder of PDFs? More precisely: a user types a search keyword, and the Delphi app retrieves all PDFs containing that keyword.
Thanks for your advice

October 8, 2008 at 1:25 pm
(7) Zarko Gajic says:

Didier,

Use the IFilter implementation from ShorterPath to extract text from a PDF; to locate all PDFs in a folder you can use Delphi’s standard “find file” code: http://delphi.about.com/od/vclusing/a/findfile.htm

~zarko
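
A minimal sketch of the approach suggested in the reply above, using only the standard FindFirst / FindNext / FindClose routines from SysUtils. The names ListPdfFiles, FindPdfsWithKeyword and TExtractText are hypothetical and exist only for this illustration. The text extraction itself is left as a function you pass in (for example, one that wraps the IFilter code), since how it is done depends on what is available on the machine.

```pascal
uses
  SysUtils, Classes;

type
  // Hypothetical hook: returns the plain text of one PDF file,
  // for example by calling an IFilter-based extractor.
  TExtractText = function(const FileName: string): string;

// Adds the full path of every *.pdf file directly inside Folder to Files.
// Subfolders are not searched.
procedure ListPdfFiles(const Folder: string; Files: TStrings);
var
  sr: TSearchRec;
  dir: string;
begin
  dir := IncludeTrailingPathDelimiter(Folder);
  if FindFirst(dir + '*.pdf', faAnyFile, sr) = 0 then
  try
    repeat
      // Skip directories and names that merely start with '.pdf' as extension
      if ((sr.Attr and faDirectory) = 0) and
         SameText(ExtractFileExt(sr.Name), '.pdf') then
        Files.Add(dir + sr.Name);
    until FindNext(sr) <> 0;
  finally
    FindClose(sr);
  end;
end;

// Adds to Matches every PDF in Folder whose extracted text contains Keyword
// (case-insensitive). Every file is re-read on each call; nothing is cached.
procedure FindPdfsWithKeyword(const Folder, Keyword: string;
  Extract: TExtractText; Matches: TStrings);
var
  files: TStringList;
  i: Integer;
  needle: string;
begin
  needle := AnsiLowerCase(Keyword);
  files := TStringList.Create;
  try
    ListPdfFiles(Folder, files);
    for i := 0 to files.Count - 1 do
      if Pos(needle, AnsiLowerCase(Extract(files[i]))) > 0 then
        Matches.Add(files[i]);
  finally
    files.Free;
  end;
end;
```

This scans every file on every search, which is fine for a small folder. For a large one, the extracted text could be stored once (keyed by file name and modification time) so that later searches only look at the stored text.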

