Automated translation software relies on databases that store previously translated text fragments. Such a database is called a translation memory (TM). A TM lets a translator reuse previously completed translations: it stores sentences and sentence fragments (segments) together with their translations.
Translation memories work at the software level. When a TM is used, the source document is divided into segments. The term “segment” is used because some parts of a text, e.g. headings, don’t form complete sentences. A segment is the smallest unit of text that can be reused when working with a TM. Smaller units, such as individual words, aren’t used because a word can appear in various contexts and therefore be translated differently; a word-for-word translation rarely produces usable results.
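The segmentation step described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `split_into_segments`, the punctuation-based splitting rule and the sample TM entry are all hypothetical, and real tools use far more elaborate segmentation rules.

```python
import re

def split_into_segments(text: str) -> list[str]:
    """Split text into segments: one per line, then after ., ! or ?."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split where sentence-ending punctuation is followed by whitespace.
        # A heading without final punctuation stays a single segment.
        for part in re.split(r"(?<=[.!?])\s+", line):
            if part:
                segments.append(part)
    return segments

# A TM reduced to its core idea: source segment -> stored translation.
# (Illustrative entry, not taken from any real memory.)
translation_memory = {
    "Save the file.": "Speichern Sie die Datei.",
}

text = "Getting started\nOpen the editor. Save the file."
segments = split_into_segments(text)
for segment in segments:
    # Look up each segment; None means the TM has no stored translation.
    print(segment, "->", translation_memory.get(segment))
```

The heading "Getting started" comes out as a segment of its own even though it is not a sentence, which is why the unit is called a segment rather than a sentence.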
Repetitions, complete and incomplete matches
Each segment of the text being translated is compared with the segments stored in the TM. A complete match (100%) means that the segment to be translated is identical to a TM segment, that is, the segment has been encountered before, translated and added to the TM. If the TM contains segments that resemble the segment being translated but are not identical to it, the match is incomplete. In each such case an overlap percentage between 0% and 99% is determined. At 99%, the segments differ by a single letter or punctuation mark; at 75%, by a few words. Matches below 70% are usually of little use in translation.
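To make the idea of an overlap percentage concrete, here is a minimal sketch that scores a segment against TM entries. It assumes character-level similarity from Python's standard `difflib.SequenceMatcher` as a stand-in; actual translation programs use their own scoring formulas, so the percentages will differ. The names `match_percentage` and `best_match` are hypothetical.

```python
from difflib import SequenceMatcher

def match_percentage(segment: str, tm_segment: str) -> int:
    """Similarity of two segments as a whole percentage (rounded down)."""
    # Rounding down guarantees that only identical segments reach 100.
    return int(SequenceMatcher(None, segment, tm_segment).ratio() * 100)

def best_match(segment: str, tm: dict[str, str]):
    """Return (score, source, translation) for the closest TM entry, or None."""
    best = None
    for source, translation in tm.items():
        score = match_percentage(segment, source)
        if best is None or score > best[0]:
            best = (score, source, translation)
    return best

# Illustrative TM content.
tm = {"Save the file.": "Speichern Sie die Datei."}

print(best_match("Save the file.", tm))   # complete match
print(best_match("Save the file!", tm))   # incomplete match, one character differs
```

A tool built this way would offer the stored translation as is for a complete match and as an editable suggestion for an incomplete one, typically hiding suggestions that fall below a chosen threshold such as 70%.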
Identical segments that occur several times in the text but are not yet in the TM are called repetitions. Many modern automated translation programs search for repetitions before translation starts. The advantage of repetitions is that once the first occurrence is translated, the rest automatically become complete matches. As the translator works, every newly translated segment is added to the TM and can therefore serve as a complete or incomplete match for later segments.
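The behaviour of repetitions can be illustrated with a short sketch. It assumes the TM is a plain dictionary and uses a hypothetical `translate` callback in place of the human translator; `find_repetitions` and `translate_all` are illustrative names, not functions of any real tool.

```python
from collections import Counter

def find_repetitions(segments: list[str], tm: dict[str, str]) -> dict[str, int]:
    """Segments missing from the TM that occur more than once in the text."""
    counts = Counter(s for s in segments if s not in tm)
    return {s: n for s, n in counts.items() if n > 1}

def translate_all(segments: list[str], tm: dict[str, str], translate) -> int:
    """Translate segments in order, growing the TM; return how many were reused."""
    reused = 0
    for segment in segments:
        if segment in tm:
            # Already in the TM: a complete match, no new work needed.
            reused += 1
        else:
            # New segment: translate it once and store it for later reuse.
            tm[segment] = translate(segment)
    return reused

segments = ["Click OK.", "Click Cancel.", "Click OK.", "Click OK."]
tm = {}

print(find_repetitions(segments, tm))
# str.upper stands in for the translator's work in this sketch.
print(translate_all(segments, tm, str.upper))
```

Here "Click OK." is translated once; its second and third occurrences are served from the TM as complete matches.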
Before the translator begins working on the text, the automated translation program analyses the file against the TM. The resulting statistics include the total number of words, repetitions, and complete and incomplete matches in the file. Such statistics usually look like this:
- Complete matches (100%)
- 95-99% matches
- 85-94% matches
- 75-84% matches
- Unique words (74% or below)
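A pre-translation analysis of this kind could be approximated as follows. This is a rough sketch, not how any particular program computes its report: it assumes `difflib.SequenceMatcher` as the similarity measure, counts words by splitting on whitespace, and treats a later occurrence of a non-matching segment as a repetition. The names `BANDS`, `best_score` and `analyse` are hypothetical.

```python
from difflib import SequenceMatcher

# Lower bound of each band and its label, highest band first.
BANDS = [(100, "100%"), (95, "95-99%"), (85, "85-94%"), (75, "75-84%")]

def best_score(segment: str, tm_sources: list[str]) -> int:
    """Highest similarity (0-100, rounded down) against any TM source segment."""
    return max(
        (int(SequenceMatcher(None, segment, s).ratio() * 100) for s in tm_sources),
        default=0,
    )

def analyse(segments: list[str], tm_sources: list[str]) -> dict[str, int]:
    """Count the words of a file that fall into each match band."""
    stats = {label: 0 for _, label in BANDS}
    stats["Repetitions"] = 0
    stats["Unique"] = 0
    seen = set()
    for segment in segments:
        words = len(segment.split())
        score = best_score(segment, tm_sources)
        if score == 100:
            label = "100%"
        elif segment in seen:
            # Same segment appeared earlier in this file and is not in the TM.
            label = "Repetitions"
        else:
            # First band whose lower bound the score reaches, else unique.
            label = next((name for floor, name in BANDS if score >= floor), "Unique")
        seen.add(segment)
        stats[label] += words
    return stats

tm_sources = ["Save the file."]
segments = ["Save the file.", "Print the report now.", "Print the report now."]
for label, words in analyse(segments, tm_sources).items():
    print(f"{label}: {words} words")
```

Counting words rather than segments per band reflects the statement above that the statistics report the number of words in the file.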
TMs speed up the translation process and help keep terminology consistent when several translators work as a team. A TM on a particular subject also helps with large projects that have a high percentage of repeated terms and grammatical constructions. However, this requires a TM prepared in advance: building a new one takes a certain amount of time spent working on texts in the subject in question.