Translations are needed in many contexts, from contract documentation and software user interfaces to websites and brochures. Depending on the situation, source texts also arrive in many different file formats.
Which file formats are most suitable for translation, and which are least suitable?
The first requirement for an efficient translation process is an editable (“open”) file format. Only editable formats can be processed directly in a translation memory tool, which avoids time lost on file format conversions and the additional work they entail.
In most cases, translating Word or text files causes very few problems. XML is also easy to translate, because it separates content from formatting and follows a clearly defined syntax. Other file types may require more extensive preparatory work.
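To illustrate why the separation of content and markup helps, here is a minimal Python sketch. It is not how any particular translation memory tool works internally. The element names, the sample document and the German translations are all invented for this example. The point is only that the text nodes can be replaced while tags and attributes stay untouched.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; element and attribute names are invented.
SOURCE_XML = """<manual lang="en">
  <title>Quick start</title>
  <step id="s1">Switch on the device.</step>
  <step id="s2">Press the green button.</step>
</manual>"""

# Hypothetical translations, standing in for what a translator would supply.
TRANSLATIONS = {
    "Quick start": "Schnellstart",
    "Switch on the device.": "Schalten Sie das Gerät ein.",
    "Press the green button.": "Drücken Sie die grüne Taste.",
}


def translate_xml(xml_text, translations):
    """Replace the text of each element that has a known translation.

    Tags and attributes are never modified, so the structure of the
    translated document matches the source.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        text = (element.text or "").strip()
        if text in translations:
            element.text = translations[text]
    return ET.tostring(root, encoding="unicode")


translated_xml = translate_xml(SOURCE_XML, TRANSLATIONS)
print(translated_xml)
```

In this sketch the `id` attributes and the element structure come through unchanged, which is what allows a translated file to be fed back into the system it came from.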
As a rule of thumb, the most suitable files are the source files, or exports, from the application in which the source text was produced. When a file is translated in a translation memory tool, the source text is overwritten with the translation, so the formatting is retained. This saves time and money, and the translated document mirrors the layout of the source document.
Talk to us about recommended file formats
A client wanted to lighten our workload by copying the entire text of their website into a Word file. That was unnecessary: as with most modern content management systems, the content to be translated could easily have been exported into a usable format (such as XML) at the click of a mouse, which would also have allowed the translation to be imported directly back in.
In another case, the text for the user interface of a piece of software was manually transferred into Excel for translation. Many software resource formats (such as .resx, Java .properties or JSON) can be imported into translation memory tools with an appropriately configured filter and edited without difficulty. In most cases, therefore, there is no need to copy content from the software files into Word or Excel.
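As a rough illustration of what such a filter does with a JSON resource file, the following Python sketch pulls out the translatable strings and writes translations back under the same keys. It is a simplified assumption of the principle, not the implementation of any real tool. The resource keys, the strings and the German translations are invented for this example.

```python
import json

# Hypothetical UI resource file; keys and strings are invented.
SOURCE_JSON = """{
  "menu": {"open": "Open file", "save": "Save"},
  "dialog": {"confirm": "Are you sure?"}
}"""


def extract_strings(node, prefix=""):
    """Yield (key_path, text) for every string value in nested dicts."""
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from extract_strings(value, path)
        elif isinstance(value, str):
            yield path, value


def apply_translations(node, translations, prefix=""):
    """Return a copy of node with translated strings and unchanged keys."""
    result = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            result[key] = apply_translations(value, translations, path)
        else:
            # Fall back to the source value if no translation is available.
            result[key] = translations.get(path, value)
    return result


resources = json.loads(SOURCE_JSON)
segments = dict(extract_strings(resources))

# Hypothetical translations, keyed by the same paths as the source strings.
ui_translations = {
    "menu.open": "Datei öffnen",
    "menu.save": "Speichern",
    "dialog.confirm": "Sind Sie sicher?",
}
translated_resources = apply_translations(resources, ui_translations)
print(json.dumps(translated_resources, ensure_ascii=False, indent=2))
```

Because the keys are never touched, the translated file can be dropped back into the software without any manual copying.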
These examples from our past experience demonstrate that it makes sense to agree on suitable file formats for the transfer of source content in advance.
File formats that create extra work
All non-editable file formats create work before the translation itself can begin. They have to be converted into an editable format and may require further editing, as information can be lost during the conversion. For example, a scanned document is not an editable text file but an image of the text. The text must first be extracted from this image, for example into Word or Excel, before it can be translated with a translation memory tool.
PDF (Portable Document Format) is probably the most frequently used non-editable file format. PDFs cannot be edited directly. Extracting the text from a PDF document may require some effort, depending on how the file was created and what the original file format was. Various software programs can convert PDFs into translatable text files. However, formatting such as headings, lists and cross-references, and sometimes the overall layout of the document, then has to be added or recreated manually after translation.
This is why we always advise our clients to provide us with the original file and not the PDF version.
Even images can contain text
Brochures, user manuals and handbooks often contain images, and these may include text. In these cases, the text forms part of the image but still has to be translated. It is often helpful to be able to use the “open” image files for the translation (in Photoshop or Illustrator format, for example).
Avoid PDFs and scanned documents. Instead, send us the original files, if possible.
Talk to us before you copy content from the original files into Excel or Word. It may be possible to translate the source format significantly more efficiently, dispensing with time-consuming copying in and out of different file formats.