OmegaT 4.3.1 - User's Guide

Windows and dialogs

The difference is thin: we call "Window" a window that can stay open next to the editor and interact with it, and "dialog" a window where you must select parameters before closing it and pursuing your translation activity.

Project Properties

This dialog is accessible by selecting Project > Properties...

Languages

You can either enter the source and target languages by hand or use the drop-down menus. Bear in mind that changing the languages may render the currently used translation memories useless, since their language pair may no longer match the new languages.

Tokenizers corresponding to the selected languages are displayed.

Options

Enable Sentence-level segmentation

The segmentation settings only affect the way OmegaT handles the source files. Sentence-level segmentation is the predominant way of segmenting sources, so this check box should normally be left checked.

In rare cases the alternative, i.e. segmenting by paragraphs, may be preferred. Changing this flag does not modify the segmentation of already existing translation memories. If you decide mid-translation to switch from sentence to paragraph translation, the internal translation memory of the project will not be changed (OmegaT may upgrade old translation memories that did not use sentence segmentation, but not vice versa), but OmegaT will attempt to create paragraph fuzzy matches by gluing together existing sentence translations.

Changing segmentation settings may cause some already translated segments to be split or merged. This will effectively return them to the "untranslated" status, as they will no longer match segments recorded in the project memory, even though their original translation is still there.

Auto-propagation of Translations

If the source documents contain non-unique segments, the Auto-propagation check box determines how they are translated automatically. If it is checked, the first translated segment is taken as the default translation, and its target text is used automatically for later hits during translation. Mistranslated segments can be corrected manually later using Create Alternative Translation. If it is not checked, segments with alternative translations are left untranslated until the user has decided which translation to use.

Remove Tags

When enabled, all formatting tags are removed from source segments. This is especially useful when dealing with texts where inline formatting is not really useful (e.g., OCRed PDF, badly converted .odt or .docx, etc.). In a normal case it should always be possible to open the target documents, as only inline tags are removed. Non-visible formatting (i.e., formatting which doesn't appear as tags in the OmegaT editor) is retained in target documents.

External Post-processing Command

This area allows entering an external post-processing command (for instance, a script to rename files) that will be applied each time Create Translated Documents is used. This external command cannot include "pipes", etc., which is why calling a script is recommended.

Segmentation...

The segmentation rules are generally valid across all the projects. The user, however, may need to generate a set of rules, specific to the project in question. Use this button to open a dialog, activate the check box Project specific segmentation rules, then proceed to adjust the segmentation rules as desired. The new set of rules will be stored together with the project and will not interfere with the general set of segmentation rules. To delete project specific segmentation rules, uncheck the check box. See Segmentation Setup preferences for more information on segmentation rules.

Hint: the set of segmentation rules for a given project is stored as project/omegat/segmentation.conf.

File Filters...

In a similar fashion, the user can create project-specific file filters, which are stored together with the project and are valid for the current project only. To create a project-specific set of file filters, click the File Filters... button, then activate the Enable project specific filters check box in the window that opens. A copy of the changed filter configuration will be stored with the project. To delete project-specific file filters, uncheck the check box. Note that the menu Options > File Filters changes the global user filters, not the project filters. See File Filters preferences for more on the subject.

Hint: the set of file filters for a given project is stored as project/omegat/filters.xml.

Repository Mapping...

When working on a team project, this window allows you to define the mapping between remote folders and local folders (see examples here).

External Search...

Defines the project-specific external search resources.

File locations

Here you can select different subfolders, for instance the subfolder with source files, subfolder for target files etc. If you enter names of folders that do not exist yet, OmegaT creates them for you. In case you decide to modify project folders, keep in mind that this will not move existing files from old folders to the new location.

Click Exclusions... to define the files or folders that will be ignored by OmegaT. An ignored file or folder:

  • is not displayed in the Editor pane,

  • is not taken into account in statistics,

  • is not copied to the target folder when the translated files are created.

In the Exclusion patterns dialog, it is possible to Add or Remove a pattern, or edit one by selecting a line and pressing F2. Wildcards can be used, following the Ant syntax.
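As an illustration of Ant-style wildcards, the sketch below converts a pattern into a regular expression in Python. It is a simplified approximation under stated assumptions, not OmegaT's or Ant's implementation: paths are taken to be relative and to use forward slashes, and the function names and sample patterns are hypothetical.

```python
import re


def ant_to_regex(pattern):
    """Convert a simplified Ant-style pattern into a compiled regex (illustrative only)."""
    # In Ant, a pattern ending with "/" is shorthand for "everything below it".
    if pattern.endswith("/"):
        pattern += "**"
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")           # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")        # any characters within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")         # exactly one character within a segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def ant_matches(pattern, path):
    """Return True if the whole relative path matches the pattern."""
    return ant_to_regex(pattern).fullmatch(path) is not None
```

With this sketch, `*.bak` matches `notes.bak` but not `dir/notes.bak`, whereas `**/*.bak` matches both.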

Project Files

This window is displayed automatically when OmegaT loads a project, and can be opened at any time by selecting Project > Project Files....

Note: it is possible to prevent the window from being displayed on opening by setting project_files_show_on_load to false in the omegat.prefs file (accessible via the Options > Access Configuration Folder menu).

Use Ctrl + L to open and Esc to close it. The Project Files Window displays the following information:

  • the total number of translatable files in the project. These are the files present in the source folder in a format that OmegaT is able to recognize. This number is displayed in brackets, next to the "Project file" title

  • the list of all translatable files in the project. Clicking on any file will open it for translation.

    Typing any text will open a Filter field where parts of filenames can be entered. You can select a file with Up and Down keys, and open it for translation by pressing Enter

    Note: filenames (in the first column) can be sorted alphabetically by clicking on the header. It is also possible to change the position of a filename by clicking on it and pressing the Move ... buttons.

    Right-clicking on a filename opens a popup that allows you to open the source file and (if it exists) the target file.

  • File entries include their names, file filter types, their encoding and the number of segments each file contains

  • the total number of segments, the number of unique segments in the whole project, and the number of unique segments already translated are shown at the bottom

The set of Unique segments is computed by taking all the segments and removing all duplicate segments. (The definition of “unique” is case-sensitive: "Run" and "run" are treated as being different)
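The relation between the two counts can be sketched in a few lines of Python. This is only an illustration of the definition above, with a hypothetical function name and made-up sample data; it is not how OmegaT computes its statistics internally.

```python
def segment_stats(segments):
    """Return (number of segments, number of unique segments)."""
    # A set keeps one copy of each distinct string; the comparison is
    # case-sensitive, so "Run" and "run" count as two unique segments.
    return len(segments), len(set(segments))


total, unique = segment_stats(["Run", "run", "Run", "Stop"])
print(total, unique, total - unique)  # the last figure hints at repetitions
```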

The difference between "Number of segments" and "Number of unique segments" provides an approximate idea of the number of repetitions in the text. Note however that the numbers do not indicate how relevant the repetitions are: they could mean relatively long sentences repeated a number of times (in which case you are fortunate) or it could describe a table of keywords (not so fortunate). The project_stats.txt located in the omegat folder of your project contains more detailed segment information, broken down by file.

Modifying the segmentation rules may have the effect of modifying the number of segments/unique segments. This, however, should generally be avoided once you have started translating the project. See the chapter Segmentation rules for more information.

Adding files to the project: You can add source files to the project by clicking on the Import Source Files... button. This copies the selected files to the source folder and reloads the project to import the new files. You can also add source files from Internet pages, written in MediaWiki, by clicking on Import from MediaWiki button and providing the corresponding URL.

Text Search

Open the Search window with Ctrl + F and enter the word or phrase you wish to search for in the Search for box.

Alternatively, you can select a word or phrase in the Editor pane, Fuzzy matches pane or Glossary pane and hit Ctrl + F . The word or phrase is entered in the Search for box automatically. You can have several Search windows open at the same time, but close them when they are no longer needed so that they do not clutter your desktop.

Click the dropdown arrow of the Search for box to access the last 10 searches.

The Search window has its own menus:

  • File > Search for selection ( Ctrl + F ): refocus on the search field and select all its contents.

  • File > Close ( Ctrl + W ): close the search window (in the same way as Esc )

  • Edit > Insert source ( Ctrl + Shift + I ): insert current segment source.

  • Edit > Replace with source ( Ctrl + Shift + R ): replace with current segment source.

  • Edit > Create Glossary Entry ( Ctrl + Shift + G ): add a new glossary item.

Using wild cards

In both exact and keyword searches, the wild card search characters '*' and '?' can be used. They have the meaning, familiar to Word users:

  • '*' matches zero or more characters, from the current position in a given word to its end. The search term 'run*' for example would match words 'run', 'runs' and 'running'.

  • '?' matches any single character. For instance, 'run?' would match the word 'runs' and 'runn' in the word 'running'.

The matches will be displayed in bold blue. Note that '*' and '?' have special meaning in regular expressions, so wild card search, as described here, applies to exact and keyword search only (see below).
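The wild card behaviour described above can be pictured as a translation into a regular expression. The following Python sketch is only an illustration, not OmegaT's actual code: the function names are hypothetical, and it assumes that a "character" means any non-whitespace character and that the search is case-insensitive.

```python
import re


def wildcard_to_regex(term):
    """Translate '*' and '?' wild cards into an equivalent regular expression."""
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(r"\S*")   # zero or more characters up to the end of the word
        elif ch == "?":
            parts.append(r"\S")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is taken literally
    return re.compile("".join(parts), re.IGNORECASE)


def find_matches(term, text):
    """Return every substring of text matched by the wild card term."""
    return [m.group(0) for m in wildcard_to_regex(term).finditer(text)]
```

For the text `run runs running`, the term `run*` yields the three words in full, while `run?` yields `runs` and `runn`.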

Search methods and options

Select the method using the radio buttons. The following search methods are available:

exact search

Search for segments containing the exact string you specified. An exact search looks for a phrase, i.e. if several words are entered, they are found only if they occur in exactly that sequence. Searching for 'open file' will thus find all occurrences of the string 'open file', but not 'file opened' or 'open input file'.

keyword search

Search for segments containing all the keywords you specified, in any order. You can enter any number of individual words; OmegaT displays a list of all segments containing all of them. Keyword searches are similar to a search "with all of the words" in an Internet search engine such as Google (AND logic). Using keyword search with 'open file' will thus find all occurrences of the string 'open file', as well as 'file opened', 'open input file', 'file may not be safe to open', etc.

regular expressions

The search string will be treated as a regular expression. For instance, the search string [a-zA-Z]+[öäüqwß] looks for words containing questionable characters from a German keyboard. Regular expressions are a powerful way to look for instances of a string.
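The difference between the three search methods can be sketched as follows. This is a Python illustration only, with hypothetical function names; it assumes case-insensitive matching for exact and keyword searches (the default when case sensitive is not selected) and uses Python's regular expression engine, whose syntax may differ in details from the one OmegaT uses.

```python
import re


def exact_search(query, segment):
    """True if the segment contains the query as one contiguous string."""
    return query.lower() in segment.lower()


def keyword_search(query, segment):
    """True if the segment contains every keyword, in any order (AND logic)."""
    text = segment.lower()
    return all(word in text for word in query.lower().split())


def regex_search(pattern, segment):
    """Return all substrings of the segment matching the regular expression."""
    return [m.group(0) for m in re.finditer(pattern, segment)]


for segment in ["open file", "file opened", "open input file"]:
    print(segment, exact_search("open file", segment), keyword_search("open file", segment))
```

Only the first sample segment satisfies the exact search, while all three satisfy the keyword search.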

Additionally to one of the methods above you can select the following:

  • case sensitive: the search will be performed for the exact string specified; i.e. capitalization is observed.

  • Space matches nbsp: when this option is checked, a space character in the search entry can match either a normal space character or a non-breaking space (\u00A0) character.

  • in source: search in the source segments

  • in translations: search in the target segments

  • in notes: search in notes to segments

  • in comments: search in comments to segments

  • Translated or untranslated: search in both translated and untranslated segments.

  • Translated: search only in translated segments.

  • Untranslated: search only in untranslated segments.

  • Display: all matching segments: if checked, all the segments are displayed one by one, even if they are present several times in the same document or in different documents.

  • Display: file names: if checked, the name of the file where each segment is found is displayed above each result.

  • Search in Project: check Memory to include the project memory (project_save.tmx file) in the search. Check TMs to include the translation memories located in the tm folder in the search. Check Glossaries to include the glossaries located in the glossary folder in the search.

  • Search in Files: search in a single file or a folder containing a set of files. When searching through files (as opposed to translation memories), OmegaT restricts the search to files in source file formats. Consequently, although OmegaT is quite able to handle tmx files, it does not include them in the Search files search.

If you click the Advanced options button, additional criteria (author or changer of the translation, date translated, excluding orphan segments, etc.) can be selected. When the Full/Half width char insensitive option is checked, searches for fullwidth forms (CJK characters) will match halfwidth forms and vice versa.
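Two of the options above, Space matches nbsp and Full/Half width char insensitive, can be pictured with the Python sketch below. It is an approximation under stated assumptions rather than OmegaT's implementation: the function names are hypothetical, and Unicode NFKC normalization is used as a stand-in for width folding (it also folds some other compatibility characters, so it is broader than a pure width comparison).

```python
import re
import unicodedata


def space_matches_nbsp(term):
    """Build a regex in which each space matches a normal or a non-breaking space."""
    parts = ["[ \u00A0]" if ch == " " else re.escape(ch) for ch in term]
    return re.compile("".join(parts))


def fold_width(text):
    """Map fullwidth and halfwidth forms to their standard equivalents."""
    return unicodedata.normalize("NFKC", text)


def width_insensitive_contains(query, segment):
    """True if the segment contains the query once both are width-folded."""
    return fold_width(query) in fold_width(segment)
```

For example, `space_matches_nbsp("10 km")` finds the term whether the digits and the unit are separated by a normal or a non-breaking space.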

Search results display

Pressing the search button after entering a string in the search field displays all the segments in the project that include the entered string. As OmegaT handles identical segments as one single entity, only the first unique segment is shown. The segments are displayed in order of appearance in the project. Translated segments are displayed with the original text at the top and the translated text at the bottom, untranslated segments are displayed as the source only.

Double-clicking on a segment opens it in the Editor for modifications (one single click does it when Auto-sync with Editor option is checked). You can then switch back to the Search window for the next segment found, for instance to check and, if necessary, correct the terminology.

In the Search window, you can use standard shortcuts ( Ctrl + N , Ctrl + P ) to move from one segment to another.

You may have several Search windows open at the same time. You can quickly see their contents by looking at their title: it will contain the search term used.

Filter entries in editor according to search

For easier navigation in the search result set, you can apply the search to the editor. Press the Filter button on the bottom to limit the shown entries in the editor window to those that match the current search. You can use normal navigation to go to e.g. the next (untranslated) segment that matches the search criteria.

NB:

  • A search may be limited to 1000 items, so if you search on a common phrase, the editor then shows only those 1000 matching entries, and not all entries that match the search criteria.

  • A file might have no matching entries, so it will show empty.

  • If a search removes duplicates, those duplicates will not be in the Editor.

To remove a filter, press the Remove filter button, or reload a project.

Text Replace

Open the Search and replace window with Ctrl + K and enter the word or expression you wish to replace in the Search for box.

Then click on Search button to display all the corresponding occurrences.

Enter the new word or phrase (regular expressions are not supported) in the Replace with box, then click one of the following options:

  • Replace All: replaces all occurrences (after displaying a confirmation window where the number of occurrences is shown).

  • Replace: performs a one-by-one replacement by means of buttons in the header of the Editor pane. Click Replace Next or Skip, then end the replacement session with Finish.

  • Close: close the window without any change.

Search options

Search options are similar to the ones displayed in the Search window, with one exception: check Untranslated to perform Search and replace on segments that have not been translated yet as well.

To make this possible (although Search and replace operates only on memory), OmegaT copies the source segment to the target segment before the replace operation occurs. If no replacement is made in a given segment, the target segment is emptied again, i.e. it remains untranslated.
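The handling of untranslated segments can be summarised in a short Python sketch. It is a simplified model of the behaviour described here, with a hypothetical function name and plain string replacement; `None` stands for "untranslated".

```python
def replace_in_segment(source, target, search, replacement):
    """Return the new target text, or None if the segment stays untranslated."""
    # For an untranslated segment, work on a copy of the source text.
    working = target if target is not None else source
    replaced = working.replace(search, replacement)
    # If nothing changed in an untranslated segment, empty the target again.
    if target is None and replaced == working:
        return None
    return replaced


print(replace_in_segment("Open the file", None, "file", "document"))
print(replace_in_segment("Close the window", None, "file", "document"))
```

The first call produces a target derived from the source text, while the second leaves the segment untranslated.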

Aligner

Alignment involves creating a bilingual translation memory from monolingual documents that have already been translated.

To access this window, select Tools > Align Files....

Step 1: Adjust the alignment parameters

If the alignment looks as if it could be improved, try changing the parameters. In most cases, the lower the average score, the better the alignment.

In Heapwise comparison mode, the texts are evaluated globally. In Parsewise comparison mode, they are evaluated segment by segment. The option only appears when such a selection is possible.

Use the ID comparison mode to align Key=Value texts. This works even if the keys are not in the same order in the two files, and/or if the two files do not contain the same amount of information. This option only appears if both the selected files are recognised as Key=Value files.
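The idea behind ID comparison can be sketched in Python as follows. This only illustrates pairing by key; the function names, the simple `key=value` parsing and the sample data are assumptions, and the sketch is not the Aligner's actual algorithm.

```python
def parse_key_value(text):
    """Parse simple key=value lines into a dict, skipping blanks and # comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        pairs[key.strip()] = value.strip()
    return pairs


def align_by_id(source_text, target_text):
    """Pair source and target values that share a key, in source order."""
    source = parse_key_value(source_text)
    target = parse_key_value(target_text)
    # Keys missing from either file are simply left out of the alignment.
    return [(key, source[key], target[key]) for key in source if key in target]


pairs = align_by_id(
    "greeting=Hello\nfarewell=Goodbye\nextra=Only in source",
    "farewell=Au revoir\ngreeting=Bonjour",
)
print(pairs)
```

The keys appear in a different order in the two samples and one key has no counterpart, yet the two shared entries are still paired correctly.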

The Viterbi and Forward-Backward algorithms are two different calculation methods. Choose the one that provides the best results.

Click Continue to access the next step.

Step 2: Make manual corrections

After the automatic process, the alignment of two files generally requires manual corrections.

Translation units are located in cells in the two last columns.

To align two segments on the same line:

  1. Select the first segment.

  2. Press the space bar (shortcut for Edit > Start Pinpoint Align).

  3. Click in the other column on the translation corresponding to the first segment.

After several of these operations, select Edit > Realign pending to update the alignment of the other segments.

To modify the position of one or more segments individually, select the segment(s) and press U (Move Up) or D (Move Down).

Only rows with the Keep box ticked in the first column will be included when the translation memory is created.

When the two columns are sufficiently aligned, click Save TMX... to create the resulting translation memory.

Scripts

This window is accessible by selecting Tools > Scripting...

Use

The Scripting window allows you to load an existing script into the text area and run it against the current opened project. To customize the scripting feature, do the following:

  • Load a script into the editor by clicking its name in the list in the left-hand panel.

  • Right-click a button from <1> to <12> in the bottom panel and select Add Script.

  • When you left-click the number, the selected script will run. You can also start the selected macros from the main menu by using their entries in the Tools menu or by pressing Ctrl + Alt + F# (# 1 to 12).

By default, scripts are stored in the scripts folder located in the OmegaT installation folder (the folder that contains the OmegaT.jar file).

If you add new scripts there, they will appear in the list of available scripts in the Scripting window.

Some additional scripts can be found here: OmegaT Scripts

Scripting languages

The following scripting languages have been implemented:

  • Groovy (http://groovy.codehaus.org): a dynamic language for the Java Virtual machine. It builds upon the strengths of Java but has additional power features inspired by languages like Python, Ruby and Smalltalk.

  • JavaScript (sometimes abbreviated JS, not to be confused with Java): a prototype-based scripting language that is dynamic, weakly typed and has first-class functions. It is a multi-paradigm language, supporting object-oriented, imperative, and functional programming styles. Being the language behind popular software such as Firefox, it is a familiar and preferred programming tool in the open-source domain.

All the languages have access to the OmegaT object model, with the project as the top object. For example, the following Groovy code snippet scans through all the segments in all the files in the current project and, if a translation exists, prints out the source and the target of the segment:

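    // Walk through every file and every segment of the current project.
    files = project.projectFiles;
    for (i in 0 ..< files.size())
    {
        for (j in 0 ..< files[i].entries.size())
        {
            currSegment = files[i].entries[j];
            // Look the translation up once and reuse the result.
            info = project.getTranslationInfo(currSegment);
            // Skip segments without a translation (no entry or no translation text).
            if (info && info.translation)
            {
                source = currSegment.getSrcText();
                target = info.translation;
                console.println(source + " >>>> " + target);
            }
        }
    }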

General Preferences

This dialog is accessible by selecting Options > Preferences...

It allows parameters to be set for all translation projects.

General

Use TAB to Advance

Sets the segment validation key to Tab instead of the default Enter. This option is useful for some Chinese, Japanese or Korean character input systems.

Always Confirm Quit

The program will seek confirmation before closing down.

Machine Translation

Automatically fetch translations

For confidentiality reasons, you may want to not send all the segments to the Machine Translation engine. If you uncheck this option, machine translations will be fetched only when you press Ctrl + M ( Cmd + M on OS X) in the current segment. You must then press Ctrl + M again to insert the suggestion.

Untranslated segments only

Check this box to send only untranslated segments to the machine translation services.

Select a supplier from the list and, if necessary, click Configure to enter the identification details provided by the supplier.

The procedures for configuring access to the Microsoft Translator and Google Translate services are described here.

Glossary

Display context description for TBX glossaries

Uncheck this option if the context description shown for each glossary entry is unnecessary or too long.

Use terms appearing separately in the source text

When this option is checked, the glossary will display pairs or groups of words (expressions) even if the words within them appear separately in the source text.

Uncheck this option if the glossary displays too many false positives.

Use stemming for glossary entries

Select this option if you want the glossary to display words that have the same root.

Replace glossary hits when inserting source text

If both this option and the Insert the source text option are selected, all words with a corresponding glossary entry will be translated automatically when the source text is inserted.

Ignore hits with very different case (e.g. FOO vs foo)

If this option is checked, the glossary will only display one entry, even if the same word exists in several forms (e.g. with and without capital letters) in the glossary.

Glossary layout

You can select a glossary pane contents layout. Layout variations can be added as plugins.

Merge alternate definition for the same term

If a glossary item has alternate definitions, they will be displayed on the same line.

TaaS Terminology

Get API Key

Click this button to access the TaaS project site and create a user account.

You can then create an access key on the page https://term.tilde.com/account/keys/create?system=omegaT.

Store for this session only

If this option is selected, OmegaT will not remember the access key between sessions.

Browse TaaS Collections...

This button enables you to browse and download the collections that exist for the project's source and target languages. Private collections are displayed in bold. The collections are downloaded as TBX glossaries and stored in the current glossary folder.

Select TaaS Terminology Lookup Domain...

If necessary, you can select a specific domain to limit the volume of data sent and received.

Dictionary

Automatically search segment text in dictionary

Clear this option to deactivate automatic searching – if dictionaries are too long, for example.

Use fuzzy matching for dictionary entries

Select this option if you want dictionaries to display words that have the same root.

Appearance

Theme

You can select a theme for OmegaT's user interface. Themes can also be added as plugins.

Restore Main Window

Restores the components of the main OmegaT window to their default state. Use this feature when you have undocked, moved or hidden one or more components and you are unable to restore the desired arrangement. It can also be used when panes do not appear as expected following an OmegaT upgrade.

Font

Shows the dialog to modify the text display font. Users of old computers who feel window resizing is very slow can try changing the font. See font settings in Miscellanea.

Colours

This page allows you to choose different colours for each part of the user interface.

Pre-defined themes can be set using scripts. A script bundled with OmegaT called Switch Colour Themes provides a default "Dark" theme.

File Filters

This dialog lists the file filters available. The filters used by the current project are displayed in bold. If you prefer not to use OmegaT to translate files of a certain type, you can turn off the corresponding filter by deactivating the check box beside its name. OmegaT will then omit the corresponding files when loading projects, and will copy them unmodified to the target folder when creating target documents. When you wish to use the filter again, just tick the check box. Click Defaults to reset the file filters to the default settings. To edit the files and encodings a filter is used for, select the filter in the list and click Edit.

The dialog allows you to enable or disable the following options:

  • Remove leading and trailing tags: uncheck this option to display all the tags, including tags at the beginning and end of the segment. Warning: in Microsoft Open XML formats (docx, xlsx, etc.), if all tags are displayed, DO NOT place any text before the first tag – it is a technical tag that must always begin the segment.

  • Remove leading and trailing whitespace in non-segmented projects: by default, OmegaT removes leading and trailing whitespace. In non-segmented projects, it is possible to keep it by unchecking this option.

  • Preserve spaces for all tags: check this option if the source documents contain significant spaces used to control the layout that must not be ignored.

  • Ignore file context when identifying segments with alternate translations: by default, OmegaT uses the source file name as part of the identification of an alternative translation. If the option is checked, the source file name will not be used, and alternative translations will take effect in any file as long as the other context (previous/next segments or some sort of segment identifier, depending on the file format) matches.

Filter options

Several filters (text files, XHTML files, HTML and XHTML files, OpenDocument files and Microsoft Open XML files) have one or more specific options. To modify the options, select the filter in the list and click Options.... The available options are:

Text files

  • Segment source text into paragraphs on:

    if sentence segmentation rules are active, the text will be segmented further according to the option selected here.

PO files

  • Allow blank translations in the target file:

    If selected, when a segment in a PO file (which may be a whole paragraph) is not translated, the translation will be empty in the target file. Technically speaking, the msgstr segment in the PO target file, if created, will be left empty. As this is the standard behaviour for PO files, it is selected by default. If the option is off, the source text will be copied to the target segment.

  • Skip PO header

    The PO header will be skipped and left unchanged if this option is checked.

  • Auto replace 'nplurals=INTEGER; plural=EXPRESSION;' in header

    This option allows OmegaT to override the specification in the PO file header and use the default for the selected target language.

XHTML Files

  • Translate the following attributes: the selected attributes will appear as segments in the Editor window.

  • Start a new paragraph on: the <br> HTML tag will constitute a paragraph break for segmentation purposes.

  • Skip text matching regular expression: any text matching the regular expression is skipped. It is shown in red in the tag validator. Text in source segments that matches is shown in italics.

  • Do not translate the content attribute of meta-tags ...: the meta-tags in the box will not be translated.

  • Do not translate the content of tags with the following attribute key-value pairs (separate with commas): if a tag matches the list of key-value pairs, its content will be ignored.

    It is sometimes useful to be able to make certain tags untranslatable based on the values of their attributes. For example, <div class="hide"> <span translate="no">. You can define key-value pairs for tags to be left untranslated. For the example above, the field would contain: class=hide, translate=no.
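How such a list of key-value pairs might be applied can be sketched in Python. The function names are hypothetical and the logic is an assumption based on the description above (a tag is skipped as soon as one of its attributes matches one listed pair); the real filter may differ.

```python
def parse_rules(field):
    """Parse a comma-separated 'key=value' field into a set of (key, value) pairs."""
    rules = set()
    for item in field.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            rules.add((key.strip(), value.strip()))
    return rules


def is_untranslatable(attributes, rules):
    """True if any attribute of the tag matches one of the key-value pairs."""
    return any((key, value) in rules for key, value in attributes.items())


rules = parse_rules("class=hide, translate=no")
print(is_untranslatable({"class": "hide"}, rules))  # content of <div class="hide"> is ignored
print(is_untranslatable({"class": "show"}, rules))  # content of <div class="show"> is translated
```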

Microsoft Office Open XML files

You can select which elements are to be translated. They will appear as separate segments in the translation.

  • Word: non-visible instruction text, comments, footnotes, endnotes, footers

  • Excel: comments, sheet names

  • PowerPoint: slide comments, slide masters, slide layouts

  • Global: charts, diagrams, drawings, WordArt

  • Other Options:

    • Aggregate tags: if checked, tags with no translatable text between them will be aggregated into a single tag.

    • Preserve spaces for all tags: if checked, "white space" (i.e. spaces and newlines) will be preserved, even if this option is not defined in the document.

HTML and XHTML files

  • Add or rewrite encoding declaration in HTML and XHTML files: the target files often need to have a different character set encoding from the one in the source file (whether it is explicitly defined or implied). Using this option, the translator can specify whether the target files should have the encoding declaration included. For instance, if the file filter specifies UTF8 as the encoding scheme for the target files, selecting Always will ensure that this information is included in the translated files.

  • Translate the following attributes: the selected attributes will appear as segments in the Editor window.

  • Start a new paragraph on: the <br> HTML tag will constitute a paragraph break for segmentation purposes.

  • Skip text matching regular expression: any text matching the regular expression is skipped. It is shown in red in the tag validator. Text in source segments that matches is shown in italics.

  • Do not translate the content attribute of meta-tags ...: The meta-tags in the box will not be translated.

  • Do not translate the content of tags with the following attribute key-value pairs (separate with commas): if a tag matches the list of key-value pairs, its content will be ignored.

    It is sometimes useful to be able to make certain tags untranslatable based on the values of their attributes. For example, <div class="hide"> <span translate="no">. You can define key-value pairs for tags to be left untranslated. For the example above, the field would contain: class=hide, translate=no.

  • Compress whitespace in translated document: multiple continuous whitespace characters will be converted into one single whitespace in the translated document.

  • Remove HTML comments in translated document: commented parts (between <!-- and -->) will not be copied into the translated document.
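The effect of the last two options can be approximated with two regular expression substitutions. The Python sketch below is only an illustration with hypothetical function names; the actual filter works on the parsed document and may treat some elements (for example preformatted text) differently.

```python
import re


def remove_html_comments(html):
    """Drop everything between <!-- and -->, including multi-line comments."""
    return re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)


def compress_whitespace(html):
    """Collapse each run of whitespace characters into a single space."""
    return re.sub(r"\s+", " ", html)


sample = "<p>Hello<!-- internal\nnote -->   world</p>"
print(compress_whitespace(remove_html_comments(sample)))
```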

Open Document Format (ODF) files

  • You can select which of the following items are to be translated:

    index entries, bookmarks, bookmark references, notes, comments, presentation notes, links (URL), sheet names

Edit filter dialog

This dialog enables you to specify the source filename patterns for files to be processed by the filter, customize the filenames of translated files and select which encodings should be used for loading the source file and saving the translation. To modify a file filter pattern, either modify the fields directly or click Edit. To add a new file filter pattern, click Add. The same dialog is used to add a pattern or to edit a particular pattern. The dialog includes a special target filename pattern editor, which you can use to customize the names of output files.

Source file type, filename pattern

When OmegaT encounters a file in its source folder, it attempts to select the filter based upon the file's extension. More precisely, OmegaT attempts to match each filter's source filename patterns against the filename. For example, the pattern *.xhtml matches any file with the .xhtml extension. If the appropriate filter is found, the file is assigned to it for processing. For example, by default, the XHTML filter is used to process files with the .xhtml extension. You can change or add filename patterns for files to be handled by each file filter. Source filename patterns use wildcard characters similar to those used in Searches. The '*' character matches zero or more characters. The '?' character matches exactly one character. All other characters represent themselves. For example, if you wish the text filter to handle readme files (readme, read.me, and readme.txt) you should use the pattern read*.
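The wildcard rules above can be tried out with a short Python sketch. It is a hedged illustration only: the helper names are hypothetical, the pattern is translated into a Python regular expression rather than handled the way OmegaT does internally, and case-sensitive matching is an assumption.

```python
import re


def pattern_to_regex(pattern):
    """Translate a filename pattern: '*' = zero or more characters,
    '?' = exactly one character, everything else stands for itself."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, filename):
    """True if the whole filename matches the pattern."""
    return pattern_to_regex(pattern).fullmatch(filename) is not None


for name in ["readme", "read.me", "readme.txt", "index.xhtml"]:
    print(name, matches("read*", name), matches("*.xhtml", name))
```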

Source and Translated file encoding

Only a limited number of file formats specify a mandatory encoding. File formats that do not specify their encoding will use the encoding you set up for the extension that matches their name. For example, by default .txt files will be loaded using the default encoding of your operating system. You can change the source encoding for each different source filename pattern. Target files can also be written in any encoding. By default, the translated file encoding is the same as the source file encoding. The source and target encoding fields use drop-down menus containing all the supported encodings. <auto> leaves the choice of encoding to OmegaT. This is how it works:

  • OmegaT identifies the source file encoding by using its encoding declaration, if present (HTML files, XML based files).

  • OmegaT is instructed to use a mandatory encoding for certain file formats (Java properties etc).

  • OmegaT uses the default encoding of the operating system for text files.

Translated filename

Sometimes you may wish to rename the files you translate automatically, for example adding a language code after the file name. The target filename pattern uses a special syntax, so if you want to edit this field, you must click Edit... and use the Edit Pattern Dialog. If you want to revert to the filter's default configuration, click Defaults. You can also modify the name directly in the target filename pattern field of the file filters dialog. The Edit Pattern Dialog offers among others the following options:

  • Default is ${filename} - full filename of the source file with extension: in this case the name of the translated file is the same as that of the source file.

  • ${nameOnly} - the name of the source file without the extension.

  • ${extension} - the original file extension

  • ${targetLocale} - target locale code (of the form "xx_YY").

  • ${targetLanguage} - the target language and country code together (of the form "XX-YY").

  • ${targetLanguageCode} - the target language only ("XX")

  • ${targetCountryCode} - the target country only ("YY")

  • ${timestamp-????} - system date and time at generation time, in various patterns

    See the Oracle documentation for examples of the "SimpleDateFormat" patterns.

  • ${system-os-name} - operating system of the computer used

  • ${system-user-name} - system user name

  • ${system-host-name} - system host name

  • ${file-source-encoding} - source file encoding

  • ${file-target-encoding} - target file encoding

  • ${targetLocaleLCID} - Microsoft target locale

Additional variants are available for the variables ${nameOnly} and ${extension}. If the file name contains more than one period, so that it is ambiguous where the extension starts, you can use variables of the form ${nameOnly-extension number} and ${extension-extension number}. If, for example, the original file is named Document.xx.docx, the variables give the following results:

  • ${nameOnly-0} Document

  • ${nameOnly-1} Document.xx

  • ${nameOnly-2} Document.xx.docx

  • ${extension-0} docx

  • ${extension-1} xx.docx

  • ${extension-2} Document.xx.docx
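The numbered variants listed above follow a simple rule that can be reproduced in Python. The sketch below is an illustration derived only from the Document.xx.docx example. The function names are hypothetical, and the small expand helper handles only the numbered variables plus values passed in explicitly, leaving anything else untouched.

```python
import re


def name_only(filename, n):
    """${nameOnly-n}: the name up to and including the n-th inner extension."""
    parts = filename.split(".")
    return ".".join(parts[: n + 1])


def extension(filename, n):
    """${extension-n}: the last n + 1 dot-separated parts of the name."""
    parts = filename.split(".")
    return ".".join(parts[-(n + 1):])


def expand(pattern, filename, values):
    """Expand ${nameOnly-n}, ${extension-n} and any variable found in values."""
    def replace(match):
        name, number = match.group(1), match.group(2)
        if number is not None and name == "nameOnly":
            return name_only(filename, int(number))
        if number is not None and name == "extension":
            return extension(filename, int(number))
        return values.get(name, match.group(0))  # unknown variables stay as-is
    return re.sub(r"\$\{(\w+)(?:-(\d+))?\}", replace, pattern)


print(expand("${nameOnly-0}_${targetLocale}.${extension-0}",
             "Document.xx.docx", {"targetLocale": "fr_FR"}))
# prints "Document_fr_FR.docx"
```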

Segmentation Setup

Translation memory tools work with textual units called segments. OmegaT has two ways to segment a text: by paragraph or by sentence segmentation (also referred to as “rule-based segmentation”). In order to select the type of segmentation, select Project → Properties... from the main menu and tick or untick the check box provided. Paragraph segmentation is advantageous in certain cases, such as highly creative or stylistic translations in which the translator may wish to change the order of entire sentences; for the majority of projects, however, sentence segmentation is the preferred choice, since it delivers better matches from previous translations. If sentence segmentation has been selected, you can set up the rules by selecting Options → Segmentation... from the main menu.

Dependable segmentation rules are already available for many languages, so it is likely that you will not need to get involved with writing your own segmentation rules. On the other hand this functionality can be very useful in special cases, where you can increase your productivity by tuning the segmentation rules to the text to be translated.

Warning: because the text will segment differently after filter options have been changed, you may have to start translating from scratch. At the same time, the previously valid segments in the project translation memory will turn into orphan segments. If you change segmentation options while a project is open, you must reload the project for the changes to take effect.

OmegaT uses the following sequence of steps:

Structure level segmentation

OmegaT first parses the text for structure-level segmentation. During this process it is only the structure of the source file that is used to produce segments.

For example, text files may be segmented on line breaks, empty lines, or not be segmented at all. Files containing formatting (ODF documents, HTML documents, etc.) are segmented on the block-level (paragraph) tags. Translatable object attributes in XHTML or HTML files can be extracted as separate segments.

Sentence level segmentation

After segmenting the source file into structural units, OmegaT will segment these blocks further into sentences.

Segmentation rules

The process of segmenting can be pictured as follows: the cursor moves along the text, one character at a time. At each cursor position rules, consisting of a Before and After pattern, are applied in their given order to see if any of the Before patterns are valid for the text on the left and the corresponding After pattern for the text on the right of the cursor. If the rule matches, either the cursor moves on without inserting a segment break (for an exception rule) or a new segment break is created at the current cursor position (for the break rule).

The two types of rules behave as follows:

Break rule

Separates the source text into segments. For example, "Did it make sense? I was not sure." should be split into two segments. For this to happen, there should be a break rule for "?" when it is followed by spaces and a capitalized word. To define a rule as a break rule, tick the Break/Exception check box.

Exception rule

Specifies which parts of the text should NOT be separated. In spite of the period, "Mrs. Dalloway" should not be split into two segments, so an exception rule should be established for Mrs (and for Mr, Dr, prof, etc.), followed by a period. To define a rule as an exception rule, leave the Break/Exception check box unticked.

The predefined break rules should be sufficient for most European languages and Japanese. In view of the flexibility, you may consider defining more exception rules for your source language in order to provide more meaningful and coherent segments.
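The cursor model described above, with break and exception rules tried in order, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not OmegaT's implementation: rules are given as hypothetical (is_break, before, after) tuples, the first matching rule at a position decides, and surrounding white space is simply stripped from the resulting segments.

```python
import re


def segment(text, rules):
    """Split text using (is_break, before, after) rules in priority order."""
    compiled = [
        (is_break, re.compile("(?:" + before + r")\Z"), re.compile(after))
        for is_break, before, after in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):          # the cursor moves one character at a time
        left = text[:pos]
        for is_break, before, after in compiled:
            # Before must match the text ending at the cursor,
            # After must match the text starting at the cursor.
            if before.search(left) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break                         # an exception rule just moves on
    segments.append(text[start:])
    return [s.strip() for s in segments if s.strip()]


rules = [
    (False, r"Mrs\.", r"\s"),        # exception rule, listed first
    (True, r"\?", r"\s+[A-Z]"),      # break after "?" + spaces + capital letter
    (True, r"\.", r"\s"),            # break after a period followed by white space
]
for s in segment("Mrs. Dalloway left. Did it make sense? I was not sure.", rules):
    print(s)
```

Because the exception rule comes before the break rules, the period after "Mrs" never reaches the generic period rule.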

Rule priority

All segmentation rule sets for a matching language pattern are active and are applied in the given order of priority, so rules for a specific language should be placed higher than the default ones. For example, rules for Canadian French (FR-CA) should be set higher than rules for French (FR.*), and higher than Default (.*) ones. Thus, when translating from Canadian French the rules for Canadian French - if any - will be applied first, followed by the rules for French and lastly, by the Default rules.

Creating a new rule

Major changes to the segmentation rules should be generally avoided, especially after completion of the first draft, but minor changes, such as the addition of a recognized abbreviation, can be advantageous.

In order to edit or expand an existing set of rules, simply click on it in the top table. The rules for that set will appear in the bottom half of the window.

In order to create an empty set of rules for a new language pattern click Add in the upper half of the dialog. An empty line will appear at the bottom of the upper table (you may have to scroll down to see it). Change the name of the rule set and the language pattern to the language concerned and its code. The syntax of the language pattern conforms to regular expression syntax. If your set of rules handles a language-country pair, we advise you to move it to the top using the Move Up button.

Add the Before and After patterns. To check their syntax and their applicability, it is advisable to use tools which allow you to see their effect directly. See Regular expressions. A good starting point will always be the existing rules.

A few simple examples

Intention | Before | After | Note
Set the segment start after a period ('.') followed by a space, tab ... | \. | \s | "\." stands for the period character. "\s" means any white space character (space, tab, new page etc.)
Do not segment after Mr. | Mr\. | \s | This is an exception rule, so the rule check box must not be ticked
Set a segment after "。" (Japanese period) | 。 | (empty) | Note that After is empty
Do not segment after M. Mr. Mrs. and Ms. | Mr??s??\. | \s | Exception rule - see the use of ? in regular expressions
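The patterns in the examples above can be checked quickly outside OmegaT. The snippet below uses Python's re module purely for illustration. OmegaT itself uses Java regular expressions, but the constructs involved here (\. , \s and the reluctant ??) behave the same way in both. The helper name is hypothetical.

```python
import re

# Before pattern from the last example: M, optional r, optional s, then a period.
TITLE = re.compile(r"Mr??s??\.")


def is_title(text):
    """True if the whole text is matched by the Before pattern."""
    return TITLE.fullmatch(text) is not None


for candidate in ["M.", "Mr.", "Mrs.", "Ms.", "Dr."]:
    print(candidate, is_title(candidate))

# After pattern: \s matches a space or a tab, but not a letter.
print(bool(re.fullmatch(r"\s", " ")), bool(re.fullmatch(r"\s", "\t")),
      bool(re.fullmatch(r"\s", "a")))
```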

Auto-Completion

Click on Glossary... to configure the Auto-completer Glossary View.

Click on Auto-text... to configure Auto-text options and to add or remove entries.

Click on Character Table... to set the Character table auto-completer options.

The auto-completer is launched within the target segment via the Ctrl + Space shortcut.

If the Show Relevant Suggestions Automatically option is checked, the auto-completer is launched automatically when you type the first letter of a translated glossary entry, or "<" in the case of tags.

Spellchecker

OmegaT has a built-in spell checker based on the spelling checker used in Apache OpenOffice, LibreOffice, Firefox and Thunderbird. It is consequently able to use the huge range of free spelling dictionaries available for these applications.

LanguageTool plug-in

Service type

Select the location of the language checker.

Using a different language checker on your local machine than the one supplied with OmegaT gives you the option of personalising the verification rules.

Rules

Check or uncheck the rules depending on whether they are relevant to the type of text you are translating.

External Search

Enable project-specific commands

By default, OmegaT does not execute the commands specified in the project-specific settings (the finder.xml file in the omegat folder), because they may have a critical impact on the machine's security.

Only activate this option if you know what you are doing, and only for projects from trusted sources.

Context Menu Priority:

Enables you to change the order of the commands in the context menu (the right-click menu). Values around 100 display commands at the top, and values around 900 display them at the bottom.

You will need to restart OmegaT for this change to take effect.

Editor

Insert the source text

You can have the source text inserted automatically into the editing field. This is useful for texts containing many trade marks or other proper nouns that must be left unchanged.

Leave the segment empty

OmegaT leaves the editing field blank. This option allows you to enter the translation without first removing the source text, thus saving you two keystrokes (Ctrl + A and Del). Empty translations are now allowed. They are displayed as <EMPTY> in the Editor. To create one, right-click in a segment and select Set empty translation. The entry Remove translation in the same pop-up menu lets you delete the existing translation of the current segment. You can achieve the same by clearing the target segment and pressing Enter.

Insert the best fuzzy match

OmegaT inserts the translation of the string most similar to the current source, if it is above the similarity threshold that you have selected in this dialog. The prefix (empty by default) can be used to tag translations made via fuzzy matches. If you add a prefix (for instance [fuzzy]), you can trace those translations later to check that they are correct.

The check boxes in the lower half of the dialog window serve the following purpose:

Attempt to convert numbers when inserting a fuzzy match

If this option is checked, when a fuzzy match is inserted, either manually or automatically, OmegaT attempts to convert the numbers in the fuzzy matches according to the source contents. There are a number of restrictions:

  • The source segment and the fuzzy matches must contain the same list of numbers

  • The numbers must be exactly the same between the source and the target matches.

  • Only integers and simple floats (using the period as a decimal character, e.g. 5.4, but not 5,4 or 54E-01) are considered.

Allow translation to be equal to source

Documents for translation may contain trade marks, names or other proper nouns that will be the same in translated documents. There are two strategies for segments that contain only such invariable text.

You can decide not to translate such segments at all. OmegaT will then report these segments as not translated. This is the default. The alternative is to enter a translation that is identical to the source text. OmegaT is able to recognize that you have done this. To make this possible, select this option.

Export the segment to text files

The text export function exports data from within the current OmegaT project to plain text files. The data are exported when the segment is opened. The files appear in the script subfolder in the OmegaT user files folder, and include:

  • The content of the segment source text (source.txt).

  • The content of the segment target text (target.txt).

  • The text highlighted by the user, when Ctrl + Shift + C is pressed or Edit → Export Selection is selected (selection.txt).

The content of the files is overwritten either when a new segment is opened (source.txt and target.txt) or when a new selection is exported (selection.txt). The files are unformatted plain text files. The whole process can be steered and controlled via Tcl/Tk-based scripting. See Using the OmegaT text export function for specifics, examples and suggestions.

Go To Next Untranslated Segment stops where there is at least one alternative translation

To avoid mistranslations in segments with several possible target contents, check this check box: Go To Next Untranslated Segment will then also stop on the next such segment, irrespective of whether it has already been translated or not.

Allow tag editing

Uncheck this option to prevent any damage to the tags (i.e., partial deletion) during editing. Removing an entire tag remains possible in that case, by using Ctrl+Backspace/Delete or by selecting it completely (Ctrl+Shift+Left/Right) and then deleting it (Delete or Ctrl+X).

Validate tags when leaving a segment

Check this option to be warned about differences between source and target segments tags each time you leave a segment.

Save auto-populated status

Check this option to record in the project_save.tmx file the information that a segment has been auto-populated, so it can be displayed with a specific color in the Editor (if the "Mark Auto-Populated Segments" option, in the View menu, is checked).

Initially load this many segments

By default the editor initially displays 2,000 segments, and progressively loads more as you scroll up or down. If you have a powerful machine, and/or if you don't like how the scrollbar behaves during progressive loading, you can increase this number.

Tag Processing

When translating software-related files, you can configure the Tag Validator options to also check programming variables (%...) or placeholders ({0}), if the file filter doesn't already do so out of the box. The PO filter already handles %.. and the Java™ Resource Bundle filter already handles {#} tags, so you only need this for other file types.

You can also define various options relating to tag validation and define custom tags.

For example, if you enter \d+ into the Regular expression for custom tags field, all numbers will be considered as tags, enabling you to check that numbers have not been changed by mistake during translation.

Similarly, enter <.*?> to make sure that HTML tags (for example) entered into the source text are preserved without modification in the translation.

Note: these two instructions can be combined by writing (<.*?>)|(\d+) .
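The effect of the combined expression can be previewed with a short Python sketch. It is an illustration only. The function names are hypothetical, Python's re module stands in for OmegaT's Java regular expressions, and the check simply compares which custom tags occur in source and target, whereas OmegaT's Tag Validator may apply further checks.

```python
import re
from collections import Counter

# The combined expression from the note above: HTML-like tags or numbers.
CUSTOM_TAGS = re.compile(r"(<.*?>)|(\d+)")


def extract_tags(text):
    """Return every custom tag found in the text, in order of appearance."""
    return [m.group(0) for m in CUSTOM_TAGS.finditer(text)]


def tags_preserved(source, target):
    """True if source and target contain the same tags the same number of times."""
    return Counter(extract_tags(source)) == Counter(extract_tags(target))


source = "Press <b>Save</b> 3 times"
print(extract_tags(source))
print(tags_preserved(source, "Appuyez 3 fois sur <b>Enregistrer</b>"))
print(tags_preserved(source, "Appuyez 2 fois sur <b>Enregistrer</b>"))
```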

Team

Enter your name here and it will be attached to all segments translated by you.

Repository Credentials

List of projects for which login details are stored in OmegaT. Remove a project from this list if you want OmegaT to ask you for a login and a password every time you access the project.

TM Matches

Sort fuzzy matches by:

By default, the closest matches displayed in the Fuzzy Matches pane are determined using stemming.

To obtain more literal matches closer to 100%, select the Full text, including tags and numbers option.

Displaying tags in non-OmegaT TMXs

Decide how tags in foreign TMX files (i.e. not generated by OmegaT) are to be treated.

Match display template

Change how fuzzy matches are displayed, through the use of pre-configured variables:

Table 1. Match pane setup
Variable | Description
${id} | Number of the match from 1 to 5
${sourceText} | Source text of the match
${targetText} | Target text of the match
${diff} | String showing the differences between the source and the match. Hint: use this if the text you are translating has been updated.
${diffReversed} | Same as ${diff}, but with the differences (what is to be inserted and deleted) inverted.
${score} | Percentage calculated with Stemming, no tags and no numbers option.
${noStemScore} | Percentage calculated with No tags and no numbers option.
${adjustedScore} | Percentage calculated with Full text, including tags and numbers option.
${fuzzyFlag} | Indicate that this match is fuzzy (currently only for translations from PO files with the #fuzzy mark)
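To get a feel for how such a template is filled in, the substitution can be imitated with Python's string.Template, which happens to use the same ${name} notation. This is only a sketch: the template layout, the function name and the sample values are made up for illustration and are not OmegaT's defaults.

```python
from string import Template

# Hypothetical layout using variables from the table above.
MATCH_TEMPLATE = Template(
    "${id}. ${sourceText}\n"
    "${targetText}\n"
    "<${score}/${noStemScore}/${adjustedScore}%>"
)


def render_match(match):
    """Fill the template with the values of one fuzzy match."""
    return MATCH_TEMPLATE.substitute(match)


# Sample values, invented for the example.
example = {
    "id": 1,
    "sourceText": "Open the file",
    "targetText": "Ouvrir le fichier",
    "score": 85,
    "noStemScore": 80,
    "adjustedScore": 78,
}
print(render_match(example))
```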

View

Contains options for displaying texts and modification information in different ways.

Include the first non-unique segment when marking non-unique segments

Check this option to display all non-unique segments (repetitions) in grey. When the option is unchecked, all non-unique segments are shown in grey except the first occurrence.

Saving and Output

Allows the user to select the interval, in minutes and seconds, between consecutive automatic saves of the project.

Change the default interval (3 minutes) depending on the characteristics of the project:

  • short intervals (minimum: 10 seconds) for synchronised projects on an internal server.

  • long intervals for team projects hosted on external servers.

External Post-processing Command

Specify commands that are executed after the Create Translated Documents command.

An example of the use of this feature would be to send translated documents automatically to the client's FTP server.

Also allow per-project external commands

By default, OmegaT does not execute the commands specified in the project-specific settings (the omegat.project file), because they may have a critical impact on the machine's security.

Only activate this option if you know what you are doing, and only for projects from trusted sources.

Proxy Login

If OmegaT needs to use an authenticated proxy server to access the Internet, enter the details provided by the proxy administrator here.

Secure store

Here you can redefine the master password used to protect login details and access keys for machine translation services. Take care to make a note of all these details before creating a new password, because they will all be deleted and will need to be re-entered.

Plugins

Gives access to the list of plugins available. Plugins are installed in the plugins folder under the OmegaT installation folder or the platform-specific OmegaT user preferences folder.

Updates

Enables automatic notification of OmegaT updates.