Question

Looking for a Widget and/or PlugIn to search keywords or phrases in all searchable PDFs that are stored in a certain folder or web location

Posted July 25, 2020 222 views
DigitalOcean, API, System Tools, Elasticsearch

Hello. I am looking for a widget and/or plugin to search keywords or phrases in all searchable PDFs that are stored in a certain folder or web location on Digital Ocean. Thanks.

1 answer

Hi there @AtlInq,

It depends on how exactly you need to use the tool. For example, if you just want to run the search from a script, you could use Python with PyPDF2.

The script itself would look something like this:

# import packages
import re

from PyPDF2 import PdfReader

# open the PDF file
reader = PdfReader("test.pdf")

# get the number of pages
num_pages = len(reader.pages)

# define the search term
search_term = "Social"

# extract the text of each page and search it
for i in range(num_pages):
    page = reader.pages[i]
    print("this is page " + str(i + 1))
    # extract_text() may return an empty string for pages without a text layer
    text = page.extract_text() or ""
    # re.search returns a match object, or None if the term is not on the page
    result = re.search(search_term, text)
    print(result)

Source.
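
To search every PDF in a folder and get back the filenames that match, the same idea could be wrapped in a small function. This is only a rough sketch under a few assumptions: it expects PyPDF2 2.0 or newer, the names `pdf_text_pages` and `find_matching_files` are made up for illustration, and the search is a plain case-insensitive substring match on whatever text PyPDF2 manages to extract.

```python
import re
from pathlib import Path


def pdf_text_pages(path):
    """Yield the extracted text of each page of one PDF (assumes PyPDF2 >= 2.0)."""
    # Imported here so the search logic below has no hard dependency on PyPDF2.
    from PyPDF2 import PdfReader

    reader = PdfReader(str(path))
    for page in reader.pages:
        # extract_text() can come back empty for pages without a text layer.
        yield page.extract_text() or ""


def find_matching_files(folder, phrase, page_texts=pdf_text_pages):
    """Return the sorted names of the PDFs in `folder` whose text contains `phrase`.

    `page_texts` is a hypothetical hook: any callable that takes a path and
    returns the page texts, so a different extractor can be swapped in.
    """
    # re.escape treats the phrase literally; IGNORECASE lets "social" match "Social".
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    matches = []
    for path in sorted(Path(folder).glob("*.pdf")):
        # any() stops reading a file at the first page that matches.
        if any(pattern.search(text) for text in page_texts(path)):
            matches.append(path.name)
    return matches
```

Calling `find_matching_files("pdfs", "Social")`, where `pdfs` is a placeholder folder name, would give you a list of filenames. Keep in mind that this rereads every PDF on each search, so with a large number of files it will be slow, and a phrase that is split across a line break in the extracted text may be missed.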

Regards,
Bobby

  • Thanks very much for your response.

    You can check out what my site looks like at: https://ATLINQ.com/archived-newspapers/.

    I will have about 4,000 searchable PDF documents stored.

    I want to have a text bar for searching keywords or phrases across all of the PDF documents. It should then return the filenames of the documents in which the keywords or phrases are found.

    I can do this in my old desktop version of Adobe Acrobat via the “Edit” -> “Search” menu.

    What’s my best method?

    John
