Translation technology is great when you work with digital files, but I’m often asked at translation events how to translate hard copies of documents using SDL Trados Studio. This seems to be particularly common in the legal domain, where clients often fax or send hard copies of texts for security reasons. Good news! We’ve made scanned documents even easier to handle with SDL Trados Studio 2015.
Handling editable PDFs has been possible since Studio 2011. Studio 2015, however, includes a new PDF file type with three options under ‘Recognize PDF text’. These options make it possible to process scanned PDF documents, recognize the text within them using optical character recognition (OCR), and make it available for translation in Studio 2015.
So imagine you’ve received a hard copy of a document from a client. What happens next?
With this new file type you can scan the documents to PDF format, then configure the file type options in SDL Trados Studio 2015 under File > Options > File Types > PDF > Converter.
There are three different options under this file type when it comes to recognizing PDF text:
- Every character – designed for completely scanned PDFs
- Problem characters only – designed for PDFs containing both editable and scanned text
- None – designed for editable PDFs
You can also fine-tune the other conversion settings, including ‘Layout’, ‘Image recovery’, ‘Headers and footers’ and ‘Table detection’. You may find that it’s worth trying the different options to see which works best with your particular file.
It’s important to bear in mind that the quality of the OCR conversion will largely depend on the quality of the scanned PDF. If the resolution is low or the scan contains photocopy lines, for example, the OCR engine will struggle to recognize the text accurately.
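If you want a rough check of scan resolution before importing a file, the arithmetic is simple: divide the pixel width of a scanned page by the physical page width in inches. The sketch below is only an illustration. The function names are hypothetical, and the 300 dpi threshold is an assumed rule of thumb that is often quoted for OCR, not a documented requirement of Studio or Solid Documents.

```python
def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Return the scan resolution in dots per inch across the page width."""
    if page_width_inches <= 0:
        raise ValueError("page_width_inches must be positive")
    return pixel_width / page_width_inches


def is_probably_ocr_friendly(
    pixel_width: int,
    page_width_inches: float,
    min_dpi: float = 300.0,  # assumption: common rule of thumb, not a Studio setting
) -> bool:
    """Return True if the scan meets the assumed minimum resolution."""
    return effective_dpi(pixel_width, page_width_inches) >= min_dpi


if __name__ == "__main__":
    # A page 8.5 inches wide, scanned to an image 1275 pixels wide.
    print(effective_dpi(1275, 8.5))
    print(is_probably_ocr_friendly(1275, 8.5))
```

If the result is well below the threshold you choose, rescanning at a higher resolution is likely to help more than adjusting the conversion settings.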
Once you’ve configured your settings, open your scanned PDF for translation in the Editor (either start a ‘New Project’ or use ‘Translate Single Document’). Depending on the complexity of your source document, you can make any necessary minor corrections to the recognized source text within Studio. Just ensure that ‘Allow source editing’ is enabled in your Project Settings.
Another useful tip is to preview the target file in Microsoft Word to see if the output is accurate. You can either use the keyboard shortcut Ctrl+Shift+P or click on the preview icon in the Quick Access Toolbar.
If you need to fix formatting or word recognition errors, I would recommend doing so in Word, then saving the document and using it as your new source file in Studio. Once you have generated your target file, it’s simple to save the Word file back to PDF format.
The OCR technology in this new file type is powered by Solid Documents. Compared to our legacy PDF filter, it not only makes character recognition more precise but also extracts text from good-quality images.
The supported languages are currently Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish and Turkish. We are working on integrating more languages in the future. You can find out more about Solid OCR here.
I hope this feature will enhance the way you’re able to work with PDFs in Studio 2015. Please watch our YouTube video below to learn more.