How to Get Info for a PDF Document: Is It Encrypted? What's the Version? Does It Have ANY Text?
- Is a PDF encrypted?
- What is the version of the PDF?
- Does it have any text (or only images)? The actual text is irrelevant.
Note: I have found the solution! Gnostice PDFtoolkit does the trick. Details are at the end of the post.
There are quite a few Delphi VCL components for creating or printing a PDF document from code, but not many that can load an existing PDF document and report information about it. So far, I have found only one library (VersyPDF) that can answer the first two questions.
To answer the third question, I tried the IFilter interface and the code provided by ShorterPath (the GetFileContentsFromIFilter function in the SPFilter.pas unit). ShorterPath provides a free Delphi component set; its zip file contains two units related to the IFilter interface and its Delphi implementation, and SPFilter.pas contains the function that extracts text from a document through IFilter.
For the PDF IFilter DLL I used Foxit PDF IFilter, and everything worked well: I can easily grab the entire text from a PDF!
Here's the problem I'm having, and I need your help!
I cannot install the PDF IFilter on users' machines (do not ask why).
If anybody knows a way to simply check whether there's *any* text in a PDF document, please share!
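Without installing anything, a rough answer to all three questions can come from scanning the raw bytes of the file. The sketch below is a heuristic, not a PDF parser, and is not code from this post; the program and routine names (PdfQuickInfo, LoadFileBytes, FindToken, HeaderVersion) are hypothetical. It reads the version from the %PDF-x.y header (a /Version entry in the document catalog can override the header in PDF 1.4 and later), treats the /Encrypt trailer key as the encryption marker, and takes the presence of /Font as a hint that some page draws text. A font resource does not guarantee visible text, and in PDF 1.5+ files page dictionaries may sit inside compressed object streams that a byte scan cannot see, so expect both false positives and false negatives.

```pascal
program PdfQuickInfo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Reads the whole file into memory (fine for a quick check, not for huge files).
function LoadFileBytes(const FileName: string): TBytes;
var
  fs: TFileStream;
begin
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, Integer(fs.Size));
    if Length(Result) > 0 then
      fs.ReadBuffer(Result[0], Length(Result));
  finally
    fs.Free;
  end;
end;

// Returns the offset of the first occurrence of Token in Data that starts
// at or before MaxStart, or -1 if there is none. Plain byte comparison.
function FindToken(const Data: TBytes; const Token: AnsiString; MaxStart: Integer): Integer;
var
  i, j, m: Integer;
begin
  Result := -1;
  m := Length(Token);
  if MaxStart > Length(Data) - m then
    MaxStart := Length(Data) - m;
  for i := 0 to MaxStart do
  begin
    j := 0;
    while (j < m) and (Data[i + j] = Ord(Token[j + 1])) do
      Inc(j);
    if j = m then
    begin
      Result := i;
      Exit;
    end;
  end;
end;

// Reads the version number that follows the "%PDF-" header, e.g. '1.4'.
// Returns '' if no header is found near the start of the file.
function HeaderVersion(const Data: TBytes): string;
var
  p: Integer;
begin
  Result := '';
  p := FindToken(Data, '%PDF-', 1024);
  if p < 0 then
    Exit;
  Inc(p, 5);
  while (p < Length(Data)) and (AnsiChar(Data[p]) in ['0'..'9', '.']) do
  begin
    Result := Result + Char(Data[p]);
    Inc(p);
  end;
end;

var
  Data: TBytes;
begin
  if ParamCount < 1 then
    Writeln('Usage: PdfQuickInfo file.pdf')
  else
  begin
    Data := LoadFileBytes(ParamStr(1));
    Writeln('Header version: ', HeaderVersion(Data));
    // PDF encryption applies only to strings and streams, so the /Encrypt
    // key in the trailer stays readable in an encrypted file
    Writeln('Encrypted:      ', FindToken(Data, '/Encrypt', High(Integer)) >= 0);
    // heuristic only: a font resource hints that some page draws text
    Writeln('Has fonts:      ', FindToken(Data, '/Font', High(Integer)) >= 0);
  end;
end.
```

Running it as `PdfQuickInfo some.pdf` prints the three answers; when the answer really matters, a real PDF library (such as the one found below) is the reliable route.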
Edit: We have a "winner"! Gnostice PDFtoolkit VCL has the answer to my questions! Here's the code:
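// Assumes a TgtPDFDocument component named PDFDoc and a TOpenDialog named
// OpenDialog1 on the form; Button1Click is just an example event handler
procedure TForm1.Button1Click(Sender: TObject);
var
  pel: TgtPDFPageElementList;
begin
  if not OpenDialog1.Execute then
    Exit;
  PDFDoc.LoadFromFile(OpenDialog1.FileName);

  // is the PDF encrypted?
  if PDFDoc.IsEncrypted then
  begin
    // handle the encrypted document here
  end;

  // the PDFVersion property (PDFDoc.PDFVersion) reveals the PDF version

  // here's how to check for text: GetPageElements returns the text elements
  // found on page 1 (only the first page is checked here);
  // similarly, etImage can be used to find images in the PDF
  pel := nil;
  try
    pel := PDFDoc.GetPageElements(1, [etText], muPoints);
    if pel.Count > 0 then
      ShowMessage('The PDF has searchable text in it')
    else
      ShowMessage('Page 1 of the PDF has no text in it');
  finally
    FreeAndNil(pel);
  end;
end;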


Comments
What I do in cases like this is build the necessary files into my Delphi EXE as resources and extract them when the app loads. If your situation requires you to ‘install’ the app, then it might be a bit tricky…
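Quentin's suggestion can be sketched as follows. This is a minimal sketch assuming the file was compiled into the EXE as an RCDATA resource; the resource name PDFFILTER, the file names PdfFiles.rc, PdfFiles.res and PDFFilter.dll, and the ExtractResourceToFile routine are all hypothetical. Keep in mind that an IFilter DLL is normally a COM server that must be registered before IFilter clients can find it, so extracting the DLL alone may not be enough under the "no install" restriction described in the post.

```pascal
program ExtractRes;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical setup: a PdfFiles.rc script containing the line
//   PDFFILTER RCDATA "PDFFilter.dll"
// is compiled with brcc32 into PdfFiles.res and linked into the EXE by
// adding {$R PdfFiles.res} to the project (left out here so the sketch
// compiles without the .res file).

uses
  Windows, SysUtils, Classes;

// Writes the RCDATA resource ResName, compiled into this module, to FileName.
// TResourceStream raises EResNotFound if the module has no such resource.
procedure ExtractResourceToFile(const ResName, FileName: string);
var
  rs: TResourceStream;
begin
  rs := TResourceStream.Create(HInstance, ResName, RT_RCDATA);
  try
    rs.SaveToFile(FileName);
  finally
    rs.Free;
  end;
end;

var
  DllPath: string;
begin
  // extract the file next to the EXE on start-up, unless it is already there
  DllPath := ExtractFilePath(ParamStr(0)) + 'PDFFilter.dll';
  if not FileExists(DllPath) then
  try
    ExtractResourceToFile('PDFFILTER', DllPath);
  except
    on E: EResNotFound do
      Writeln('Resource missing: ', E.Message);
  end;
end.
```

Writing next to the EXE can fail under Program Files for users without admin rights, so a per-user folder is usually a safer target.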
Quentin, thanks for the idea, but I’m looking for some “code-way” to see if there’s any text in a PDF document…
I recently purchased Gnostice PDFtoolkit to extract and insert pages in PDF documents. It also has IsEncrypted and PDFVersion properties and an ExtractText method, so it should be able to do what you are looking for without having to install anything on the customer’s computer.
Jan,
Hm, just a few days ago I contacted Gnostice and got the following answer regarding pdfVersion:
“Currently there are no functionalities in PDFtoolkit by which you can retrieve the version of the PDF being used”.
Strange. Ok, I’ll download the trial and try…
Jan, thanks again!
It turns out that I had a misunderstanding with Gnostice support: they do have pdfVersion!
They do not have a property that will tell me what Adobe version was used to create the PDF.
Great!
Hi all,
Related to this (though not exactly the same subject): which architecture would you implement to index a folder of PDFs? More precisely: a user types a search keyword, and the Delphi app retrieves all PDFs containing that keyword.
Thanks for your advice
Didier,
Use the IFilter implementation from ShorterPath to extract text from a PDF; to locate all PDFs in a folder, you can use Delphi’s standard “find file” code: http://delphi.about.com/od/vclusing/a/findfile.htm
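The "find file" part can be sketched with the SysUtils routines FindFirst, FindNext and FindClose. FindPdfFiles is a hypothetical helper written for illustration, not the code from the linked article.

```pascal
program ListPdfs;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: adds the full path of every .pdf file in Folder to
// Files; when Recurse is True, subfolders are searched as well.
procedure FindPdfFiles(const Folder: string; Recurse: Boolean; Files: TStrings);
var
  sr: TSearchRec;
  base: string;
begin
  base := IncludeTrailingPathDelimiter(Folder);
  if FindFirst(base + '*.pdf', faAnyFile, sr) = 0 then
  try
    repeat
      // skip folders named *.pdf; the extension check also guards against
      // Windows matching longer extensions through 8.3 short names
      if ((sr.Attr and faDirectory) = 0) and SameText(ExtractFileExt(sr.Name), '.pdf') then
        Files.Add(base + sr.Name);
    until FindNext(sr) <> 0;
  finally
    FindClose(sr);
  end;

  if Recurse and (FindFirst(base + '*', faDirectory, sr) = 0) then
  try
    repeat
      if ((sr.Attr and faDirectory) <> 0) and (sr.Name <> '.') and (sr.Name <> '..') then
        FindPdfFiles(base + sr.Name, True, Files);
    until FindNext(sr) <> 0;
  finally
    FindClose(sr);
  end;
end;

var
  Files: TStringList;
  i: Integer;
begin
  Files := TStringList.Create;
  try
    // search the folder given on the command line, or the current one
    if ParamCount > 0 then
      FindPdfFiles(ParamStr(1), True, Files)
    else
      FindPdfFiles(GetCurrentDir, True, Files);
    for i := 0 to Files.Count - 1 do
      Writeln(Files[i]);
  finally
    Files.Free;
  end;
end.
```

For Didier's keyword search, one possible design is to run each file found through the text extractor once and store the words in an index that maps each keyword to the files containing it, so that a search does not have to reopen every PDF.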
~zarko