2008-08-05

QTP - How to get the number of pages in a PDF file?

Extracting data from a PDF file is not a trivial task in automated testing with QTP.
Recently I was faced with a question:
How to get the number of pages in a PDF file?

In other words, how can we get the number of pages for any given PDF file?
As an example, take the QuickTest Professional 9.5 User's Guide, located in \help\QTUsersGuide.pdf. It contains 1418 pages.
Let's see how we can get this number from a QTP script.

I created the following simple VBScript for QuickTest Professional to extract the number of pages in a PDF file:
' Function GetNumPagesInPDF returns the number of pages in a PDF file
' FileName - full path to the given PDF file
' If the file cannot be opened (for example, it isn't found), then -1 will be returned

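Function GetNumPagesInPDF(FileName)
    Dim oPDFDoc
    Set oPDFDoc = CreateObject( "AcroExch.PDDoc" )

    ' Open returns True when the document was opened successfully
    If oPDFDoc.Open( FileName ) Then
        GetNumPagesInPDF = oPDFDoc.GetNumPages()
        oPDFDoc.Close    ' close the document so the file is not left open
    Else
        GetNumPagesInPDF = -1
    End If

    ' Release the OLE object in both cases
    Set oPDFDoc = Nothing
End Function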


I call the GetNumPagesInPDF function with the full path of a PDF file:
numPages = GetNumPagesInPDF("C:\Program Files\Mercury\QuickTest Professional\help\QTUsersGuide.pdf")
MsgBox "Number of pages: " & numPages


The script displays a message box with the number of pages.
Our QTP script works correctly and extracts the number of pages in the PDF file.


How does the above QTP script work?

We use the Acrobat OLE object "AcroExch.PDDoc":
Set oPDFDoc = CreateObject( "AcroExch.PDDoc" )
It provides an interface for common Acrobat document operations, such as opening and closing a document, working with pages, etc.

To use the "AcroExch.PDDoc" object, you have to install Adobe Acrobat (do not confuse it with Acrobat Reader!) on your computer.
You can check whether the "AcroExch.PDDoc" object is available on your computer. To do that, open the registry and look for the "AcroExch.PDDoc" key.
If the "AcroExch.PDDoc" key exists, then you can use Acrobat OLE Automation in your QTP scripts.
If not, then Adobe Acrobat should be installed.
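
The same availability check can also be done from a script instead of looking in the registry by hand. The following is only a minimal sketch: it simply tries to create the object and treats any run-time error as "not available". The function name IsAcroExchAvailable is my own illustrative name, not part of QTP or Acrobat.

```vbscript
' Returns True if the "AcroExch.PDDoc" OLE object can be created,
' False otherwise (for example, when Adobe Acrobat is not installed).
' IsAcroExchAvailable is an illustrative name, not part of any API.
Function IsAcroExchAvailable()
    Dim oDoc

    On Error Resume Next            ' do not stop on the error raised by CreateObject
    Set oDoc = CreateObject("AcroExch.PDDoc")
    IsAcroExchAvailable = (Err.Number = 0)
    Err.Clear
    On Error GoTo 0                 ' restore default error handling

    Set oDoc = Nothing
End Function

If IsAcroExchAvailable() Then
    MsgBox "Acrobat OLE Automation is available"
Else
    MsgBox "AcroExch.PDDoc cannot be created - Adobe Acrobat should be installed"
End If
```

A check like this can be placed at the start of a test, so that a missing Acrobat installation is reported with a clear message rather than a run error in the middle of the script.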

Important note:
The above code can work without QuickTest Professional!
Just save the code into a .vbs file and run it. It will return the same result: the number of pages in the PDF file.
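
As a rough sketch of such a standalone script, assuming it is saved as GetNumPages.vbs (a hypothetical file name) and run with the Windows Script Host, the PDF path can be taken from the command line. The WScript object used here exists only under Windows Script Host, not inside QTP.

```vbscript
' GetNumPages.vbs (hypothetical file name)
' Usage: cscript //nologo GetNumPages.vbs "C:\full\path\to\file.pdf"
Option Explicit

' Returns the number of pages, or -1 if the file cannot be opened
Function GetNumPagesInPDF(FileName)
    Dim oPDFDoc
    Set oPDFDoc = CreateObject("AcroExch.PDDoc")

    If oPDFDoc.Open(FileName) Then
        GetNumPagesInPDF = oPDFDoc.GetNumPages()
        oPDFDoc.Close
    Else
        GetNumPagesInPDF = -1
    End If

    Set oPDFDoc = Nothing
End Function

Dim numPages

' Exactly one argument is expected: the full path to the PDF file
If WScript.Arguments.Count <> 1 Then
    WScript.Echo "Usage: cscript //nologo GetNumPages.vbs <full path to PDF file>"
    WScript.Quit 1
End If

numPages = GetNumPagesInPDF(WScript.Arguments(0))

If numPages = -1 Then
    WScript.Echo "Cannot open PDF file: " & WScript.Arguments(0)
    WScript.Quit 1
End If

WScript.Echo "Number of pages: " & numPages
```

Pass a full path, as in the QTP example above. Adobe Acrobat still has to be installed for the script to work.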



Do you have interesting topics and ideas to be explained and shown on this blog?
Please, let me know! Send them to my email: Dmitry Motevich's email

12 comments:

  1. Hey Dmitri...this whole thing looked simple and sweet when you know that you are supposed to use the type AcroExch.PDDoc to create the object. But how did you know that you had to use this object? Where can we get this information?

  2. I have a request Dmitri... Could you please cover topics on practical examples of Recovery Scenario and Library Functions ?

  3. 2George,
    "Where can we get this information?"
    In Google we trust :)

  4. 2George,
    "Could you please cover topics on practical examples of Recovery Scenario and Library Functions?"

    Sure! I do it, but only after you send me your visual tutorials on automated testing!

    Why don't you contribute, Mr. George? :)

  5. This code will work if we have the full version of Adobe. I was trying to get the text from a PDF (I have only Adobe Reader). I'm getting the whole text from the PDF, but I need to get only a selective part of the text in the PDF. Is there any way to get the required text?

  6. Anonymous,
    1. Please provide the code you use to get "the whole text from PDF".
    At least, the present article does not cover this topic :)
    2. What is "selective part of the text in PDF"?

  7. Dmitry..I will also contribute to this forum....I dont know how to do a video recording like you do...

  8. @ George,
    Write me email and I will explain how to do it :)
    It's not difficult.

  9. how to read text from pdf file through qtp

  10. @Barani,
    http://www.codeproject.com/KB/cpp/ExtractPDFText.aspx

  11. Dmitri

    I have QTP 10.0 with the ActiveX, VB and Web add-ins associated with the test. When I try to retrieve the number of pages in a PDF, I get a run error: ActiveX component can't create AcroExch.PDDoc.

    Do u have any idea on this


    Regards,

    Kandas
