Creating Text-Based PDFs for Hypothesis and Perusall

This resource walks you through converting a scanned, image-only PDF into a searchable PDF with recognized text. PDF documents can be used for learning activities with tools such as Hypothesis and Perusall, but only if the PDF is readable as text. To meet that requirement, either scan the document with Optical Character Recognition (OCR) software or convert an existing scan to text with Adobe Acrobat's OCR tool or similar software. Adobe Acrobat is part of Adobe Creative Cloud and is available to all UO faculty and staff. Log into the UO Software Center to download it.

How do I know if my document is an image or text?

To determine whether a document is an image or searchable text, open it in Adobe Acrobat Pro and try to select a group of words. If the document contains text, you will be able to highlight and select the words, as shown in the image below.

[Image: selecting text in a text-editable PDF]

An image scan does not allow the reader to select text. Instead, the cursor appears as a "+" and dragging it draws a dotted box around the selected area, as in the image below.

[Image: text in a scanned page that cannot be selected as text]

Hypothesis and Perusall will "read" this as an image, not as text, which will frustrate students.

Additionally, Perusall's "Read Aloud" feature cannot read the document aloud, because an image contains no text for it to read.
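
The same check can be roughly approximated without opening Acrobat. The following is a minimal Python sketch, not part of the Acrobat workflow, and it rests on one assumption: a scan that contains only page images usually defines no fonts, while a PDF with typed or OCR-recognized text does. The function names `looks_like_text_pdf` and `file_looks_like_text_pdf` are made up for this illustration, and the result is a hint rather than a guarantee.

```python
import re
import zlib
from pathlib import Path

# Captures the raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_like_text_pdf(data: bytes) -> bool:
    """Heuristic: True if the PDF bytes reference at least one font."""
    if b"/Font" in data:
        return True
    # Some PDFs keep their dictionaries inside Flate-compressed streams,
    # so also search every stream that zlib is able to decompress.
    for match in STREAM_RE.finditer(data):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG scan); skip it
        if b"/Font" in inflated:
            return True
    return False


def file_looks_like_text_pdf(path: str) -> bool:
    """Apply the heuristic to a PDF file on disk."""
    return looks_like_text_pdf(Path(path).read_bytes())
```

For example, if `file_looks_like_text_pdf("reading.pdf")` returns `False` (the file name is hypothetical), the document is probably an image scan and should go through the OCR steps below. Trying to select text in Acrobat remains the more reliable test.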

How can a document that is an image be converted to text?

Use the Scan & OCR tool in Adobe Acrobat. You can download Adobe Acrobat from the UO Software Center.

Convert image document to a PDF

  • If the document is an image file rather than a PDF, open it in your image viewing or editing software, select Save As (or Export), and save the file as a PDF.

Enable Adobe Acrobat Scan & OCR Tool

  • Open the PDF file in Adobe Acrobat.
  • Click the Tools tab.
[Image: Adobe Acrobat Tools menu]
  • In the Create and Edit tools, find Scan & OCR and click "Add." If you have used the Scan & OCR tool before, click "Open" instead.
[Image: Adobe Acrobat Create and Edit menu with the Scan & OCR tool highlighted]

Apply OCR Tool

  • Make sure the document is open and that the Scan & OCR tool is open.
  • On the Scan & OCR ribbon, select the arrow next to Recognize Text to view the dropdown menu.
[Image: Adobe Acrobat Scan & OCR ribbon menu]
  • Select In This File from the dropdown menu.
[Image: Adobe Acrobat Recognize Text dropdown menu]
  • The Recognize Text ribbon appears below the Scan & OCR ribbon. Select the Recognize Text button.
[Image: Adobe Acrobat Recognize Text button on the Scan & OCR ribbon]
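
The Recognize Text step can also be scripted with other OCR software when Acrobat is not practical. The sketch below assumes OCRmyPDF, a separate open-source command-line tool that this guide does not otherwise cover and that you would need to install yourself; it is an alternative to Acrobat, not the same process. The helper names `build_ocr_command` and `run_ocr` and the file names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(source: Path, target: Path, language: str = "eng") -> list[str]:
    """Build an OCRmyPDF command that adds a text layer to a scanned PDF."""
    return [
        "ocrmypdf",
        "--language", language,  # language of the document text
        "--skip-text",           # leave pages that already contain text alone
        str(source),             # input: the image-only PDF
        str(target),             # output: the searchable PDF
    ]


def run_ocr(source: Path, target: Path) -> None:
    """Run OCRmyPDF, assuming it is installed and on the PATH."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on the PATH")
    subprocess.run(build_ocr_command(source, target), check=True)
```

Calling `run_ocr(Path("scan.pdf"), Path("scan-ocr.pdf"))` would write a searchable copy and leave the original scan untouched. As with Acrobat, the recognized text should still be reviewed afterward.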

Edit PDF

  • OCR is not a perfect process, and the recognized text often needs correction. Use the Edit PDF tool to make any necessary edits.
  • Remember to save your new editable PDF.

Finally, if this document will be used with an annotation tool such as Hypothesis or Perusall, the PDF must contain selectable, editable text and be as clean a copy as possible. This gives students a much better experience with the tool.