Creating Text-Based PDFs for Hypothesis and Perusall
How do I know if my document is an image or text?
To determine if a document is an image or searchable text, open the document with Adobe Acrobat Pro and try to select a group of words in the document. If it is text, you will be able to highlight and select the text as shown here.
An image scan will not let you select text. Instead, the cursor appears as a "+" and dragging draws a dotted box around the selected area.
Hypothesis and Perusall will "read" this as an image, not as text, which will frustrate students.
Additionally, Perusall's "Read Aloud" feature will not read this as text since it is an image.
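For a batch of files, the same image-or-text question can be roughly approximated with a short script. The sketch below is only a heuristic and not how Hypothesis, Perusall, or Acrobat decide: it assumes that a PDF with selectable text references at least one font, while a bare scan usually does not. The function name looks_like_text_pdf is made up for this illustration, and the result can be wrong (for example, a scan with a stamped page number still contains a font), so the manual selection test above remains the reliable check.

```python
import re
import zlib

# Matches the raw bytes between "stream" and "endstream" in a PDF file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_like_text_pdf(data: bytes) -> bool:
    """Guess whether PDF bytes contain a text layer by looking for fonts.

    Heuristic only (an assumption of this sketch): selectable text needs
    a /Font resource, and an image-only scan usually has none.
    """
    if b"/Font" in data:
        return True
    # Some PDFs keep their dictionaries inside Flate-compressed streams,
    # so try to inflate each stream and look again.
    for match in STREAM_RE.finditer(data):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG image); skip it
        if b"/Font" in inflated:
            return True
    return False
```

To try it, read a file in binary mode and pass the bytes in, for example looks_like_text_pdf(open("reading.pdf", "rb").read()), where reading.pdf is a placeholder file name. A result of False suggests the document is an image scan that needs the OCR steps described below.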
How can a document that is an image be converted to text?
Using Adobe Acrobat! You can download Adobe Acrobat from the UO Software Center.
Convert image document to a PDF
- Open the image in image viewing or editing software, select Save As (or Export), and save the file as a PDF.
Enable Adobe Acrobat Scan & OCR Tool
- Open the .pdf file with Adobe Acrobat.
- Click on the Tools tab.
- Next, in the Create and Edit tools, click "Add" for Scan & OCR. If you have used the Scan & OCR tool before, select "Open" instead.
Apply OCR Tool
- Make sure the document is open and that Scan & OCR is "Open."
- On the Scan & OCR ribbon, select the arrow next to Recognize Text to view the dropdown menu.
- Select In This File from the dropdown menu.
- This will display the Recognize Text ribbon below the Scan & OCR ribbon. Select the Recognize Text button.
- OCR is not a perfect process, and the recognized text often needs correcting. Use the Edit PDF tool to make any necessary edits.
- Remember to Save your new editable PDF.
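When proofreading a long document after OCR, it can help to get a rough list of words that are likely misreads before opening the Edit PDF tool. The sketch below is one possible aid and not part of Acrobat: it assumes you have copied the recognized text out of the PDF into a string, and it flags words that mix letters with digits or contain symbols that rarely occur inside words. The function name suspicious_words and both patterns are choices made for this illustration. Expect false alarms (for example "1st" or "3D"), and expect it to miss errors that happen to form real words.

```python
import re
from typing import List

# Letters directly next to digits, as in "c0py" (assumed sign of a misread).
LETTER_DIGIT_MIX = re.compile(r"[A-Za-z]\d|\d[A-Za-z]")
# Symbols that rarely occur inside ordinary words (assumed sign of a misread).
ODD_SYMBOL = re.compile(r"[|~^\\{}<>]")


def suspicious_words(text: str) -> List[str]:
    """Return words from OCR output that look like recognition errors."""
    flagged = []
    for word in text.split():
        # Drop surrounding punctuation so "text." is checked as "text".
        token = word.strip(".,;:!?\"'()[]")
        if not token:
            continue
        if LETTER_DIGIT_MIX.search(token) or ODD_SYMBOL.search(token):
            flagged.append(token)
    return flagged


sample = "The stu|dents read the c0py of the text in 2023."
print(suspicious_words(sample))  # → ['stu|dents', 'c0py']
```

Each flagged word can then be located in Acrobat with Find and corrected using the Edit PDF tool.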
Finally, if this document will be used with an annotation tool such as Hypothesis or Perusall, the PDF must contain editable text and be as clean a copy as possible. This will give students a much better experience with the tool.