But what if your original is in PDF format? Convert it… Here’s how.
A brief review of the PDF format
PDF stands for Portable Document Format. Originally developed by Adobe® Systems, it enabled users to view documents in electronic format whatever the software, hardware or operating system they were using. It has since become an international standard for sharing documents and information.
The benefits of PDF
- Graphic integrity
A PDF file shows exactly the same content and layout regardless of the operating system, device or software application in which it is viewed. The document thus appears exactly as the creator imagined.
- Independence from the creation software
A PDF document can be read universally, even if the reader does not have the software that was used to create it.
PDF files are easy to create, read and use for everyone.
PDF files provide options for configuring different levels of access to protect the content and the entire document, such as watermarks, passwords, and digital signatures.
The limitations of PDF
- Sometimes difficult to convert
PDF was developed as a format for sharing documents. The initial aim was to preserve and protect document content and layout – regardless of the platform or computer programme in which it was viewed. This is why PDF files are difficult to modify; sometimes, even extracting information from them can prove a real challenge.
- Adapting the method
Not all PDF files are the same when it comes time to work with them. Different types of PDF files require different working methods, for example when searching for information, as opposed to extracting it.
The concept behind PDF-to-editable document converters
OCR and PDF file conversion technology are required to search, extract and reuse information from these files.
What is OCR? Optical Character Recognition (OCR) or text recognition unlocks the information “trapped” in a scanned / photographed image of a document. OCR software “reads” the content of a document (text and structure) by interpreting the images of the characters and assigning them an electronic equivalent, thereby making it possible to convert the document’s content and layout into searchable and modifiable formats.
PDF-to-Word conversion tools
The market leaders are Adobe (which created the PDF format) and ABBYY. They offer only paid on-line and off-line solutions. See their sites for current offers.
At Ubiqus IO, we use the free version of the on-line tool, Smallpdf.
Many other sites exist, though. The free on-line version of Smallpdf allows only two PDF files to be converted per hour. We feel this is entirely sufficient for one-off use.
Free software is also available for download to your workstation. A word of caution, however: these programs often install third-party marketing software.
The limits of PDF-to-Word converters
When working with digitally created PDFs, i.e., “normal” PDF files
These PDFs are created directly from other applications such as word processing software, spreadsheet software, graphic design tools, etc. This type of PDF document contains only text and images. In most cases, the converter’s OCR software will be able to re-create the Word document very easily.
That said, the quality of the resulting document may vary greatly. Even if the words are recognised correctly, the layout, spacing or punctuation may be altered.
Using this type of document for translation is often risky and requires analysis before use.
When working with “image only” or scanned PDF files
When you digitise paper documents on an office scanner, or convert an image (jpg, tiff) or screenshot into a PDF, the content is “locked” into a static image. These “image only” PDF documents contain only the photographed or digitised images of pages, and have no text layer. A converter may be able to reconstruct the text, but success is not guaranteed.
It will be difficult to use this type of document in a translation context.
Why do I have to convert my PDF into Word on Ubiqus IO?
Ubiqus IO makes it easy for you to initiate translation, 24/7, of any document in Microsoft® Word format.
Why doesn’t this work with PDF?
- An editable document
When converting a PDF file into a Microsoft® Word document, you move from a locked document to one that can be edited. The Word document can be easily modified without any time lost retyping or reformatting it.
- Word count
Our autonomous process requires that an estimate be drawn up instantly by our on-line calculator. Our prices depend on how much work our translators will be putting in, i.e., the number of words. It is much easier to calculate the number of words in a Word document than in a PDF.
- The original layout
The original layout is reproduced accurately, including the images, tables and columns. This will enable our translators to work directly in Word and return you a translated document that will have maintained the formats of the original.
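To illustrate the word-count point above, here is a minimal sketch in Python of how a word count and a rough estimate could be derived once the text is available as plain text. It is not the Ubiqus IO calculator: the function names and the per-word rate are hypothetical, chosen only for illustration.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain a letter or a digit.

    Isolated punctuation (for example a lone hyphen) is not counted as a word.
    """
    return sum(1 for token in text.split() if re.search(r"\w", token))


def estimate_price(text: str, rate_per_word: float) -> float:
    """Return a rough estimate; rate_per_word is a hypothetical rate."""
    return round(count_words(text) * rate_per_word, 2)


if __name__ == "__main__":
    sample = "Hello, world! - 42"
    print(count_words(sample))
    print(estimate_price("one two three", 0.10))
```

With an editable document, the text can be read directly and counted this way. With a PDF, the text first has to be extracted, and any extraction error changes the count.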
Will converted PDFs work with Ubiqus IO?
Before using a converted PDF on Ubiqus IO, we recommend saving the text from the converter in a plain-text format (using Notepad, for example) and then saving it in Word format.
As mentioned above, Word documents produced by PDF converters have many flaws when it comes to layout and word sequence.
Very often, tabs, non-breaking spaces or other stray characters are inserted between words, which makes the document difficult to use on our platform.
Our new machine translation service (no editing, no human intervention) is not compatible with a Word document obtained from a PDF converter.
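For readers comfortable with scripting, the stray tabs and non-breaking spaces described above can also be cleaned up automatically once the text has been saved as plain text. The following Python sketch is one possible approach, not a tool provided by Ubiqus IO; the function name and the list of characters treated as stray are assumptions and are not exhaustive.

```python
import re


def normalize_spacing(text: str) -> str:
    """Replace tabs and non-breaking spaces with single ordinary spaces.

    The set of characters handled here is an assumption, not a complete list.
    """
    # Tabs, no-break space, figure space and narrow no-break space become spaces.
    cleaned = re.sub(r"[\t\u00a0\u2007\u202f]", " ", text)
    # Zero-width spaces and soft hyphens are removed entirely.
    cleaned = re.sub(r"[\u200b\u00ad]", "", cleaned)
    # Runs of several spaces collapse to one.
    cleaned = re.sub(r" {2,}", " ", cleaned)
    # Leading and trailing spaces are stripped from every line.
    return "\n".join(line.strip() for line in cleaned.split("\n"))


if __name__ == "__main__":
    print(normalize_spacing("Hello\u00a0\u00a0world\tagain "))
```

Note that this flattens tabs that were intentional, for instance in tabular content, so the result still needs to be checked by eye.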
How to use PDFs on Ubiqus IO?
Like any translation agency, Ubiqus IO prefers to work with content that has been professionally extracted from the PDF. This often means sending the PDF to a specialised agency (such as a graphic design agency). In turn, they provide us with a Word document that we can translate directly.
In general, this paid service costs only a few euros per page. Contact us!
A few last helpful reminders
If you are using an on-line platform to convert your PDF to Word, make sure that the resulting document is free of viruses or corruption. Have the Word file scanned by your anti-virus software before opening it.
If the content of the PDF is confidential, sensitive or personal, make sure that the website to which you upload your document applies appropriate confidentiality rules.
Conformity to the original
Ensure that the Word document resulting from the conversion is in every way identical to the original.
- It would be a pity to have incorrect or incomplete information translated.
- It is your responsibility as the principal to see to the integrity of the document. It will never be the responsibility of Ubiqus IO or any other service provider to ensure the quality of the converted document.
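Part of this check can be assisted by a script, although it does not replace a visual review. The Python sketch below assumes that the text of the original and the text of the converted document are both already available as plain strings (how they are obtained is outside its scope). The function names are hypothetical. It uses the standard difflib module to compare the two word sequences.

```python
from difflib import SequenceMatcher
from typing import List


def word_similarity(original: str, converted: str) -> float:
    """Return a ratio between 0 and 1; 1.0 means identical word sequences."""
    matcher = SequenceMatcher(None, original.split(), converted.split(),
                              autojunk=False)
    return matcher.ratio()


def missing_words(original: str, converted: str) -> List[str]:
    """List words of the original that were dropped or replaced."""
    a, b = original.split(), converted.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    missing: List[str] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            missing.extend(a[i1:i2])
    return missing


if __name__ == "__main__":
    source = "The total is 42 euros"
    result = "The total is euros"
    print(word_similarity(source, result))
    print(missing_words(source, result))
```

A ratio below 1.0 or a non-empty list of missing words is a signal to inspect the converted document more closely. Layout differences are not detected by this kind of comparison.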
Pay particular attention with scans and images
If the source is a scan, an image or a photo, the conversion is likely to be unreliable, so check the result carefully before using it.
For the sake of simplicity in this article, Word is used to refer to Microsoft® Word.
Source on PDF: https://www.abbyy.com/fr-fr/finereader/what-is-pdf/