Video localization like clockwork. These points make it easier and better.

Global companies are increasingly working with video content*, which needs to be localized into a variety of languages. To ensure that localization runs smoothly, we would like to share some of our experiences with you:

Keeping an eye on localization while making the video

  1. Make sure you get the raw data

What to consider at the development stage: Video content is data-heavy. Even short, one-minute videos can be gigabytes in size. Finished videos are delivered in closed formats such as MP4, but editing requires the raw data, and localization involves editing too. If the video is to be used in other markets at a later date, secure the rights to the raw data while the video is being made and ensure quick, uncomplicated access to it. The more accessible and editable the individual components of the video are, the faster and cheaper localization will be.

  2. Separate audio tracks

Background music, sound effects, and spoken text should each have their own audio track. This makes it possible to remove the spoken word during localization without affecting the other audio elements and to replace it with translated voice-overs.
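As a rough illustration of why separate tracks pay off, the following sketch builds one ffmpeg command per target language. It assumes ffmpeg is installed and that the stems exist as separate files; the file names and the helper `build_mix_command` are hypothetical, and level balancing between the stems is left out.

```python
from typing import List


def build_mix_command(music: str, effects: str, voice_over: str, output: str) -> List[str]:
    """Return an ffmpeg argument list that mixes three separate stems into one track."""
    return [
        "ffmpeg",
        "-i", music,
        "-i", effects,
        "-i", voice_over,
        # amix mixes the three audio inputs into a single output stream
        "-filter_complex", "[0:a][1:a][2:a]amix=inputs=3[mix]",
        "-map", "[mix]",
        output,
    ]


# Hypothetical file names: the music and effects stems are reused unchanged,
# only the voice-over stem differs per language.
for lang in ("de", "fr", "ja"):
    cmd = build_mix_command("music.wav", "effects.wav", f"voice_{lang}.wav", f"mix_{lang}.wav")
    print(" ".join(cmd))
```

The command is only printed here, not executed. With a single mixed-down track, none of this would be possible without first separating the speech from the rest.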

  3. Third-party plugins

Effects and presets from third-party providers should only be used sparingly. Plugins that are used should be available in a version supported by the manufacturer when the project is delivered. If, for licensing reasons, certain elements of the video cannot be supplied to you by the creator, make note of this with an exact description and the manufacturer’s details so that you can license them later if required. Effects that aren’t really necessary should generally not be included in the data delivered to you.

  4. Securing licenses & rights

As early as the project planning stage, ensure that all the necessary rights (image, speaker, music, effects) are acquired in order to avoid expensive buyouts later on. Ideally, royalty-free music should always be used in order to avoid licensing problems with international use.

Strategies for localizing video components

Once the points about video production have been clarified, it’s time to plan the optimal workflow for localization – including how to handle spoken word, on-screen text, subtitles, etc.

When people speak in a video, they are either visible (on-screen) or not visible (off-screen), and this often alternates. A narrator adds a further layer of speech. The spoken content can be dubbed into the target languages in one of two ways: with lip sync (for example, for interviews) or as a voice-over, in which the translated voice is laid over the original, which remains audible in the background.

Speech synthesis can also be a cost-effective and efficient alternative. You can decide on a case-by-case basis whether to use it, although speech synthesis cannot yet replace a human voice in every respect. But first, let’s look at subtitles.

Subtitling and dubbing: Lots of air, lots of space, and scripts

Subtitles need space. Take this into account from the outset so that the image is covered as little as possible. Avoid putting elements essential to the message at the bottom of the screen, since they will be obscured if subtitles are added later. The original texts should have as much “play” as possible. If the original language is English, in our experience translations can be up to 30% longer. Many languages need considerably more syllables than English to say the same thing, and therefore more room. The same applies to titles and captions. On-screen texts should be positioned with plenty of space around them. Subtitles should stay visible for as long as possible; otherwise there won’t be enough time to read them.
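The 30% rule of thumb can be turned into a quick budget check. The sketch below is only an estimate: the expansion figure comes from the paragraph above, while the line length, line count, and reading speed are illustrative assumptions that should be replaced with the values from your own subtitle style guide.

```python
import math

EXPANSION_PERCENT = 30    # "up to 30% longer", taken from the text above
MAX_CHARS_PER_LINE = 42   # assumption: illustrative line length, not from this article
MAX_LINES = 2             # assumption: two-line subtitles
CHARS_PER_SECOND = 17     # assumption: illustrative reading speed


def subtitle_budget(source_text: str, display_seconds: float) -> dict:
    """Estimate whether a translated subtitle is likely to fit in space and time."""
    # Worst-case length of the translation, rounded up to whole characters
    expected_chars = math.ceil(len(source_text) * (100 + EXPANSION_PERCENT) / 100)
    # Minimum time a viewer needs to read that many characters
    min_seconds = expected_chars / CHARS_PER_SECOND
    return {
        "expected_chars": expected_chars,
        "fits_space": expected_chars <= MAX_CHARS_PER_LINE * MAX_LINES,
        "min_seconds": round(min_seconds, 1),
        "fits_time": display_seconds >= min_seconds,
    }


print(subtitle_budget("Press the green button to start the machine.", display_seconds=3.0))
```

A source subtitle that only just fits its time slot in English will usually fail this check, which is a signal to shorten the original or extend the shot before localization starts.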

Play and timing are also important for dubbing: If care is taken during production to ensure that the speakers do not speak too quickly and pause regularly, the localization results are much better.

Another must for efficient localization: Ask for the scripts of the spoken texts, including confirmation that they match the texts in the final version of the video. (Text changes are often made during the studio recordings.) Working from the scripts saves time and money compared with creating new scripts by manual transcription or auto-transcription.
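If a transcript of the final video is available (for example, from auto-transcription), a simple word-level comparison can flag places where the script and the recording diverge. This is a minimal sketch using Python's standard `difflib` module; the helper names and the sample sentences are hypothetical, and a real check would still need a human review.

```python
import difflib
import re


def normalize(text: str) -> list:
    """Lowercase and split into words so punctuation and casing do not count as changes."""
    return re.findall(r"\w+", text.lower())


def script_differences(script: str, transcript: str) -> list:
    """Return (operation, script_words, transcript_words) for every mismatch."""
    a, b = normalize(script), normalize(transcript)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


script = "Press the green button to start."
transcript = "Press the green button twice to start."
for difference in script_differences(script, transcript):
    print(difference)
```

An empty result means the two texts match word for word; any other result lists the passages to check against the video before translation begins.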

And one more tip: Too much text, whether spoken or on screen, distracts from the picture’s message. Video production is expensive, so it’s better to invest in meaningful images and sparse text so that the localized versions are also convincing.

Videos are like a box of chocolates. You never know what you’re going to get.

Videos should be carefully analyzed and the following points clarified before localization:

What is the video’s objective and audience?

Different standards apply to an in-house employee training course than to an elaborate marketing video designed to appeal to prospective customers in their native language. The type of localization should therefore be selected so that it best suits the objective.

Which video components need to be localized?

An apt localization strategy takes all components into consideration. For example, is there enough room for longer subtitles or captions?

Can the concept of the video even be localized?

Let’s take an explanatory video in which the operation of software is demonstrated. If the language of the software interface in the original video is English and the voiceover is also English: no problem. If the whole thing is to be translated into German, the challenge arises of explaining an English-language software interface in German. This will certainly lead to confusion. The English-language video may therefore be more effective in this case than a partially translated version.

Voice recording, speech synthesis, or subtitles? A cost-benefit analysis

Sure, subtitles cost less than voice recordings. But do they also fulfill the intended communication goal?

When subtitles and text overload the picture, the viewer is overwhelmed. Spoken language in combination with on-screen texts, on the other hand, works much better. A human voice also offers emotional added value.

The question therefore always arises as to when dubbing is indispensable, and whether speech synthesis could be a good alternative.

Speech synthesis: A serious alternative

Many voices generated with AI already sound very natural and professional. They are particularly suitable when the voice recordings involve factual, neutral, informative content.

If, on the other hand, emotionality, pronunciation, intonation, and melody are important, then speech synthesis isn’t on equal footing with a real person yet. This is the case in advertising content such as product, brand, or image videos.

This is why specialist content such as multilingual e-learning and technical explanatory videos is the prime beneficiary of the shorter production times of speech synthesis, which also makes re-recordings for corrections or updates uncomplicated and cost-effective.

The markup language SSML (Speech Synthesis Markup Language) offers a particular advantage: It allows individual aspects of the spoken output, such as pauses, speaking rate, and the pronunciation of specific terms, to be fine-tuned in combination with trained AI language models. But there are limits: In many languages, AI language models are not yet available, or there are only a few voices to choose from. Also, the pronunciation of certain words and terms cannot be adapted in every language if the desired pronunciation is too specific or company-specific.
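To give an idea of what such fine-tuning looks like, the sketch below assembles a small SSML document using three elements from the W3C SSML standard: `sub` (speak a term differently from how it is written), `break` (insert a pause), and `prosody` (change the speaking rate). The product name, the sample sentences, and the helper `build_ssml` are hypothetical, and which elements and values a given speech engine actually honors is an assumption to verify per engine and language.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_ssml(product_name: str, spoken_as: str, lang: str = "en-US") -> str:
    """Build a small SSML document with a spoken alias, a pause, and a slower passage."""
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang={quoteattr(lang)}>'
        # sub: the engine reads the alias instead of the written product name
        f'Welcome to the <sub alias={quoteattr(spoken_as)}>{escape(product_name)}</sub> tutorial.'
        # break: a half-second pause before the next sentence
        '<break time="500ms"/>'
        # prosody: the enclosed sentence is spoken more slowly
        '<prosody rate="slow">Please read the safety notes first.</prosody>'
        '</speak>'
    )


ssml = build_ssml("XR-200", "X R two hundred")
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed XML
print(ssml)
```

Because the markup is plain text, a correction or update only means editing the SSML and regenerating the audio, which is where the savings on re-recordings come from.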

The advantages of speech synthesis

  • Shorter production times
  • Lower costs
  • Simple and cost-effective re-recording, especially for subsequent updates

The disadvantages of speech synthesis

  • Quality not yet equivalent (e.g., emotionality)
  • Limited possibility of adapting pronunciation, intonation, timing
  • Technical limitations often only become apparent during the course of the project

Our conclusion

During the development and production stage, keep the possibility of video localization in mind. Secure access to the raw data and make it available to your language service provider. This reduces processing time and costs. Please also provide details about the intended use.

Carefully considering which localization concept fits best is crucial to making the videos effective in the target markets and for the intended user groups.

Whether speech synthesis is the better option depends on the content, the languages required, and the language varieties.

But most importantly: Trust the power of images. A lot of text distracts from what is being shown and can overwhelm the audience. Therefore, use as little text as possible. This makes the videos better and localization into other languages easier.

 

* Statista forecasts that sales in the video marketing market in Germany will grow by 3.17% to €1.79 billion by 2030. https://de.statista.com/outlook/amo/werbung/tv-video-werbung/digitale-videowerbung/deutschland?currency=EUR