Creating Text Based PDFs for Hypothesis and Perusall

This resource walks you through converting a scanned, image-only PDF into a searchable PDF with recognized text. PDF documents can be used for learning activities with tools such as Hypothesis and Perusall, but only if the PDF is readable as text. This requires that the document be scanned using Optical Character Recognition (OCR) software, or converted to text afterward using Adobe Acrobat OCR or similar software. Adobe Acrobat is part of Adobe Creative Cloud and is available to all UO faculty and staff. Log into the UO Software Center to download it.

How do I know if my document is an image or text?

To determine if a document is an image or searchable text, open the document with Adobe Acrobat Pro and try to select a group of words. If the document is text, you will be able to highlight and select the words themselves.

An image scan will not allow the reader to select text. Instead, the cursor appears as a "+" and dragging it draws a dotted box around the selected area.

Hypothesis and Perusall will "read" this as an image, not as text, which will frustrate students.

Additionally, Perusall's "Read Aloud" feature will not read this as text since it is an image.
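
If you have many PDFs to check, the manual test above can be roughly approximated with a short script. The following is a minimal sketch, not an official UO or Adobe tool. It assumes that a PDF with a text layer references at least one font, while an image-only scan usually does not. The function name looks_text_based is made up for this example, and the check is only a heuristic: encrypted or unusual files can fool it, so opening the file in Acrobat remains the reliable test.

```python
import re
import zlib

# Raw bytes between each "stream" keyword and the following "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_text_based(pdf_bytes: bytes) -> bool:
    """Rough guess: True if the PDF seems to reference at least one font."""
    # Assumption: text drawn on a page needs a font resource,
    # and an image-only scan usually has none.
    if b"/Font" in pdf_bytes:
        return True
    # Some PDFs store their dictionaries inside Flate-compressed streams,
    # so try to inflate each stream and search the result as well.
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG scan); skip it
        if b"/Font" in data:
            return True
    return False
```

To try it on a file, read the file's bytes (for example with Path("reading.pdf").read_bytes() from Python's pathlib module, where reading.pdf is a placeholder name) and pass them to the function. A result of False suggests the document still needs OCR.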

How can a document that is an image be converted to text?

Using Adobe Acrobat! You can download Adobe Acrobat from the UO Software Center.

Convert image document to a PDF

If the document is an image file rather than a PDF, open it in image viewing or editing software, select Save As (or Export), and save the file as a PDF.

Enable Adobe Acrobat Scan & OCR Tool

Open the .pdf file with Adobe Acrobat.
Click on the Tools tab.

Next, under the Create and Edit tools, find Scan & OCR and click "Add." If you have used the Scan & OCR tool before, select "Open" instead.

Screenshot: Adobe Acrobat Create and Edit menu with the Scan & OCR tool highlighted.

Apply OCR Tool

Make sure the document is open and that the Scan & OCR tool is open.
On the Scan & OCR ribbon, select the arrow next to Recognize Text to view the dropdown menu.

Select In This File from the dropdown menu.

Screenshot: Adobe Acrobat Recognize Text dropdown menu.

This will display the Recognize Text ribbon below the Scan & OCR ribbon. Select the Recognize Text button.

Screenshot: Adobe Acrobat Recognize Text button on the Scan & OCR ribbon.

Edit PDF

OCR is not a perfect process, and it is often necessary to correct the recognized text. Use the Edit PDF tool to make any necessary edits.
Remember to save your new editable PDF.

Finally, if this document will be used with an annotation tool such as Hypothesis or Perusall, the PDF must contain recognized, editable text and be as clean a copy as possible. This will give students a much better experience with the tool.

For further assistance, UO Online & Canvas Support are available.