The difference is slight: a "window" can stay open next to the editor and interact with it, whereas a "dialog" requires you to select parameters and close it before you resume translating.
This dialog is accessible by selecting →
You can either enter the source and target languages by hand or use the drop-down menus. Bear in mind that changing the languages may render the currently used translation memories useless, since their language pair may no longer match the new languages.
Tokenizers corresponding to the selected languages are displayed.
The segmentation settings only address the way the source files are handled by OmegaT. Sentence-level segmentation is the predominant way of segmenting the sources, so this check box should normally be left checked.
In rare cases the alternative, i.e. segmenting by paragraphs, may be preferred. Changing this flag does not modify the segmentation of already existing translation memories. If you decide mid-translation to switch from sentence to paragraph translation, the internal translation memory of the project will not be changed (OmegaT may upgrade old translation memories that did not use sentence segmentation, but not vice versa), but OmegaT will attempt to create paragraph fuzzy matches by gluing together existing sentence translations.
Changing segmentation settings may cause some already translated segments to be split or merged. This will effectively return them to the "untranslated" status, as they will no longer match segments recorded in the project memory, even though their original translation is still there.
If the source documents contain non-unique segments, the Auto-propagation check box offers two possibilities for automatic translation. If it is checked, the first translated segment is taken as the default translation, and its target text is automatically used for later hits during the translation process. Mistranslated segments can of course be corrected later manually using
. If the Auto-propagation check box is not checked, segments with alternative translations are left untranslated until the user has decided which translation is to be used.
When enabled, all the formatting tags are removed from source segments. This is especially useful when dealing with texts where inline formatting is not really useful (e.g., OCRed PDF, badly converted .odt or .docx files). In a normal case it should always be possible to open the target documents, as only inline tags are removed. Non-visible formatting (i.e., formatting that does not appear as tags in the OmegaT editor) is retained in target documents.
This area allows entering an external post-processing command (for instance, a script to rename files) that will be applied each time Create Translated Documents is used. This external command cannot include "pipes", etc., which is why calling a script is recommended.
The segmentation rules are generally valid across all projects. The user, however, may need to create a set of rules specific to the project in question. Use this button to open a dialog, activate the check box, then proceed to adjust the segmentation rules as desired. The new set of rules will be stored together with the project and will not interfere with the general set of segmentation rules. To delete project-specific segmentation rules, uncheck the check box. See Segmentation Setup preferences for more information on segmentation rules.
Hint: the set of segmentation rules for a given project is stored as project/omegat/segmentation.conf.
In a similar fashion, the user can create project-specific file filters, which will be stored together with the project and will be valid for the current project only. To create a project-specific set of file filters, click on the button, then activate the check box in the window that opens. A copy of the changed filters configuration will be stored with the project. To delete project-specific file filters, uncheck the check box. Note that in the menu , the global user filters are changed, not the project filters. See File Filters preferences for more on the subject.
Hint: the set of file filters for a given project is stored as project/omegat/filters.xml.
When working on a team project, this window allows you to define the mapping between remote folders and local folders (see examples here).
Defines the project-specific external search resources.
Here you can select different subfolders, for instance the subfolder with source files, subfolder for target files etc. If you enter names of folders that do not exist yet, OmegaT creates them for you. In case you decide to modify project folders, keep in mind that this will not move existing files from old folders to the new location.
Click on
to define the files or folders that will be ignored by OmegaT. An ignored file or folder:
is not displayed in the Editor pane,
is not taken into account in statistics,
is not copied to the target folder during the translated files creation process.
In the Exclusion patterns dialog, it is possible to Add or Remove a pattern, or edit one by selecting a line and pressing F2. Patterns may contain wildcards, using the Apache Ant syntax.
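To illustrate how Ant-style wildcards behave, here is a minimal Python sketch. It is not OmegaT's implementation, and it covers only a simplified subset of the Ant rules: `**/` stands for any number of folders, `*` for any characters within one path element, and `?` for a single character. The function names and the sample patterns are hypothetical.

```python
import re

def ant_pattern_to_regex(pattern):
    """Translate a simplified Ant-style pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")   # zero or more whole folders
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")         # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")      # anything within one path element
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")       # exactly one character, not '/'
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))

def is_excluded(path, patterns):
    """Return True if the '/'-separated path matches any exclusion pattern."""
    return any(ant_pattern_to_regex(p).fullmatch(path) for p in patterns)

# Hypothetical exclusion patterns, for illustration only
patterns = ["**/.svn/**", "**/*.bak"]
print(is_excluded("docs/.svn/entries", patterns))
print(is_excluded("docs/readme.txt", patterns))
```

With these two sample patterns, everything below any `.svn` folder and every `.bak` file would be ignored, wherever it sits in the source tree.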
This window is displayed automatically when OmegaT loads a project, and at any time by pressing
→ .
Note: it is possible to prevent the window from being displayed when a project opens by setting project_files_show_on_load to false in the omegat.prefs file (accessible by → menu).
Use Ctrl + L to open and Esc to close it. The Project Files Window displays the following information:
the total number of translatable files in the project. These are the files present in the source folder in a format that OmegaT is able to recognize. This number is displayed in brackets, next to the "Project file" title
the list of all translatable files in the project. Clicking on any file will open it for translation.
Typing any text will open a Filter field where parts of filenames can be entered. You can select a file with the Up and Down keys, and open it for translation by pressing Enter.
Note: filenames (in the first column) can be sorted alphabetically by clicking in the header. It is also possible to change the position of a filename by clicking on it and pressing
buttons.
Right-clicking on a filename opens a popup that allows you to open the source file and (if it exists) the target file.
File entries include their names, file filter types, their encoding and the number of segments each file contains
the total number of segments, the number of unique segments in the whole project, and the number of unique segments already translated are shown at the bottom
The set of Unique segments is computed by taking all the segments and removing all duplicate segments. (The definition of “unique” is case-sensitive: "Run" and "run" are treated as being different)
The difference between "Number of segments" and "Number of unique segments" provides an approximate idea of the number of repetitions in the text. Note however that the numbers do not indicate how relevant the repetitions are: they could mean relatively long sentences repeated a number of times (in which case you are fortunate) or they could describe a table of keywords (not so fortunate). The project_stats.txt file located in the omegat folder of your project contains more detailed segment information, broken down by file.
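The relationship between the two counts can be sketched in a few lines of Python. This is only an illustration of the definition given above (duplicates removed, comparison case-sensitive), not the code OmegaT uses for its statistics; the function name and the sample segments are hypothetical.

```python
def segment_counts(segments):
    """Return (total, unique, repetitions) for a list of source segments."""
    total = len(segments)
    unique = len(set(segments))   # case-sensitive: "Run" and "run" differ
    return total, unique, total - unique

print(segment_counts(["Run", "run", "Run", "Open file"]))
```

Here "Run" occurs twice and counts as one unique segment, while "run" is a separate unique segment because of the different capitalization.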
Modifying the segmentation rules may have the effect of modifying the number of segments/unique segments. This, however, should generally be avoided once you have started translating the project. See the chapter Segmentation rules for more information.
Adding files to the project: You can add source files to the project by clicking on the button. This copies the selected files to the source folder and reloads the project to import the new files. You can also add source files from Internet pages written in MediaWiki by clicking on the button and providing the corresponding URL.
Open the Search window with Ctrl + F and enter the word or phrase you wish to search for in the Search for box.
Alternatively, you can select a word or phrase in the Editor pane, Fuzzy matches pane or Glossary pane and hit Ctrl + F . The word or phrase is entered in the Search for box automatically. You can have several Search windows open at the same time, but close them when they are no longer needed so that they do not clutter your desktop.
Click the dropdown arrow of the Search for box to access the last 10 searches.
The Search window has its own menus:
File > Search for selection ( Ctrl + F ): refocus on the search field and select all its contents.
File > Close ( Ctrl + W ): close the search window (in the same way as Esc )
Edit > Insert source ( Ctrl + Shift + I ): insert current segment source.
Edit > Replace with source ( Ctrl + Shift + R ): replace with current segment source.
Edit > Create Glossary Entry ( Ctrl + Shift + G ): add a new glossary item.
In both exact and keyword searches, the wild card search characters '*' and '?' can be used. They have the meaning familiar to Word users:
'*' matches zero or more characters, from the current position in a given word to its end. The search term 'run*', for example, would match the words 'run', 'runs' and 'running'.
'?' matches any single character. For instance, 'run?' would match the word 'runs' and 'runn' in the word 'running'.
The matches will be displayed in bold blue. Note that '*' and '?' have special meaning in regular expressions, so wild card search, as described here, applies to exact and keyword search only (see below).
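The wild card behaviour described above can be approximated by translating the search term into a regular expression. The following Python sketch assumes that "character" means any non-whitespace character and that the search is case-insensitive unless stated otherwise; both are assumptions for illustration, and the function names are hypothetical.

```python
import re

def wildcard_to_regex(term):
    """Convert a search term with '*' and '?' into a compiled regex."""
    out = []
    for ch in term:
        if ch == "*":
            out.append(r"\S*")   # zero or more characters, up to the end of the word
        elif ch == "?":
            out.append(r"\S")    # exactly one character
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out), re.IGNORECASE)

def find_matches(term, text):
    """Return every substring of text matched by the wild card term."""
    return [m.group(0) for m in wildcard_to_regex(term).finditer(text)]

print(find_matches("run*", "run runs running"))
print(find_matches("run?", "run runs running"))
```

The second call does not return the bare word "run", because '?' requires one additional character after the three letters.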
Select the method using the radio buttons. The following search methods are available:
Search for segments containing the exact string you specified. An exact search looks for a phrase, i.e. if several words are entered, they are found only if they occur in exactly that sequence. Searching for open file will thus find all occurrences of the string open file, but not file opened or open input file.
Search for segments containing all keywords you specified, in any order. Select keyword search to search for any number of individual full words, in any order. OmegaT displays a list of all segments containing all of the words specified. Keyword searches are similar to a search "with all of the words" in an Internet search engine such as Google (AND logic). Using keyword search with open file will thus find all occurrences of the string open file, as well as file opened, open input file, file may not be safe to open, etc.
The search string will be treated as a regular expression. The search string [a-zA-Z]+[öäüqwß] in the example above, for instance, looks for words in the target segment that contain questionable characters from a German keyboard. Regular expressions are a powerful way to look for instances of a string.
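The difference between the three search methods can be sketched as follows. This is an illustration in Python, not OmegaT's own code: keywords are matched here as simple substrings (which is consistent with "open" finding "file opened" in the example above), Python's regular expression dialect differs in some details from the Java one, and all function names are hypothetical.

```python
import re

def exact_search(query, segment, case_sensitive=False):
    """True if the whole query occurs in the segment as one phrase."""
    if not case_sensitive:
        query, segment = query.lower(), segment.lower()
    return query in segment

def keyword_search(query, segment, case_sensitive=False):
    """True if every keyword occurs somewhere in the segment (AND logic)."""
    if not case_sensitive:
        query, segment = query.lower(), segment.lower()
    return all(word in segment for word in query.split())

def regex_search(pattern, segment, case_sensitive=False):
    """True if the regular expression matches anywhere in the segment."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, segment, flags) is not None

for text in ["open file", "file opened", "open input file"]:
    print(text, exact_search("open file", text), keyword_search("open file", text))
print(regex_search("[a-zA-Z]+[öäüqwß]", "Straße"))
```

The loop shows that only the first sample is found by the exact search, while the keyword search finds all three.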
In addition to one of the methods above, you can select the following:
case sensitive: the search will be performed for the exact string specified; i.e. capitalization is observed.
Space matches nbsp: when this option is checked, a space character entered in the search field can match either a normal space character or a non-breaking space (\u00A0) character.
in source: search in the source segments
in translations: search in the target segments
in notes: search in notes to segments
in comments: search in comments to segments
Translated or untranslated: search in both translated and untranslated segments.
Translated: search only in translated segments.
Untranslated: search only in untranslated segments.
Display: all matching segments: if checked, all the segments are displayed one by one, even if they are present several times in the same document or in different documents.
Display: file names: if checked, the name of the file where each segment is found is displayed above each result.
Search in Project: check Memory to include the project memory (project_save.tmx file) in the search. Check TMs to include the translation memories located in the tm folder in the search. Check Glossaries to include the glossaries located in the glossary folder in the search.
Search in Files: search in a single file or a folder containing a set of files. When searching through files (as opposed to translation memories), OmegaT restricts the search to files in source file formats. Consequently, although OmegaT is quite able to handle tmx files, it does not include them in the Search in Files search.
When the Full/Half width char insensitive option is checked, searches for fullwidth forms (CJK characters) will match halfwidth forms and vice versa.
If you click on the button, additional criteria (author or changer of the translation, date translated, excluding orphan segments, etc.) can be selected.
Pressing the search button after entering a string in the search field displays all the segments in the project that include the entered string. As OmegaT handles identical segments as one single entity, only the first unique segment is shown. The segments are displayed in order of appearance in the project. Translated segments are displayed with the original text at the top and the translated text at the bottom; untranslated segments are displayed as the source only.
Double-clicking on a segment opens it in the Editor for modifications (one single click does it when Auto-sync with Editor option is checked). You can then switch back to the Search window for the next segment found, for instance to check and, if necessary, correct the terminology.
In the Search window, you can use standard shortcuts ( Ctrl + N , Ctrl + P ) to move from one segment to another.
You may have several Search windows open at the same time. You can quickly see their contents by looking at their title: it will contain the search term used.
For easier navigation in the search result set, you can apply the search to the editor. Press the button at the bottom to limit the entries shown in the editor window to those that match the current search. You can then use normal navigation to go to, e.g., the next (untranslated) segment that matches the search criteria.
NB:
A search may be limited to 1000 items, so if you search on a common phrase, the editor then shows only those 1000 matching entries, and not all entries that match the search criteria.
A file might have no matching entries, so it will show empty.
If a search removes duplicates, those duplicates will not be in the Editor.
To remove a filter, press the button, or reload the project.
Open the Search and replace window with Ctrl + K and enter the word or expression you wish to replace in the Search for box.
Then click on the button to display all the corresponding occurrences.
Enter the new word or phrase (regular expressions are not supported) in the Replace with box, then click one of the following options:
Replace All: replaces all occurrences (after displaying a confirmation window that shows the number of occurrences).
Replace: replaces occurrences one by one, by means of buttons in the header of the Editor pane. Click Replace Next or Skip, then end the replacement session with Finish.
Close: close the window without any change.
Search options are similar to the ones in the Search window, with one exception: check Untranslated to run Search and replace also on segments that have not been translated yet.
To make it possible (although Search and replace operates only on memory), OmegaT will copy the source segment to the target segment before the replace operation occurs. If no replacement is done to a given segment, the target segment will be “emptied”, i.e., it will remain untranslated.
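The handling of untranslated segments described above can be sketched as a small Python function. It is a simplified model under stated assumptions: a missing translation is represented by None, the replacement is a plain string substitution, and the function and parameter names are hypothetical rather than taken from OmegaT.

```python
def replace_in_segment(source, translation, search, replacement,
                       include_untranslated=False):
    """Return the new translation, or None if the segment stays untranslated."""
    if translation is None:
        if not include_untranslated:
            return None                  # untranslated segments are skipped
        working = source                 # copy the source into the target first
        replaced = working.replace(search, replacement)
        # If nothing was replaced, the target is emptied again
        return replaced if replaced != working else None
    return translation.replace(search, replacement)

print(replace_in_segment("Open the file", None, "file", "document", True))
print(replace_in_segment("Close it", None, "file", "document", True))
```

The second call returns None: the source was copied, nothing was replaced, so the segment remains untranslated.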
Alignment involves creating a bilingual translation memory from monolingual documents that have already been translated.
To access this window, select
→ .
If the alignment looks as if it could be improved, try changing the parameters. In most cases, the lower the Average score is, the better the alignment will be.
In Heapwise comparison mode, the texts are evaluated globally. In Parsewise comparison mode, they are evaluated segment by segment. The option only appears when such a selection is possible.
Use the ID comparison mode to align Key=Value texts. This works even if the keys are not in the same order in the two files, and/or if the two files do not contain the same amount of information. This option only appears if both the selected files are recognised as Key=Value files.
The Viterbi and Forward-Backward algorithms are two different calculation methods. Choose the one that provides the best results.
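The idea behind the ID comparison mode described above can be sketched in Python: the two texts are paired through their keys, so neither the order of the lines nor missing entries prevent the alignment. This is an illustration only, assuming simple `key=value` lines with `#` comments; the function names and sample data are hypothetical, and OmegaT's aligner works differently internally.

```python
def parse_key_value(text):
    """Parse simple key=value lines into a dict, skipping blanks and comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        pairs[key.strip()] = value.strip()
    return pairs

def align_by_id(source_text, target_text):
    """Pair source and target values that share the same key."""
    source = parse_key_value(source_text)
    target = parse_key_value(target_text)
    return [(source[k], target[k]) for k in source if k in target]

source = "greeting=Hello\nfarewell=Goodbye\nextra=Only in source"
target = "farewell=Au revoir\ngreeting=Bonjour"
print(align_by_id(source, target))
```

The key `extra` has no counterpart in the target text, so it simply produces no translation unit.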
Click
to access the next step.
After the automatic process, the alignment of two files generally requires manual corrections.
Translation units are located in cells in the last two columns.
To align two segments on the same line:
Select the first segment.
Press the space bar (shortcut for → ).
Click in the other column on the translation corresponding to the first segment.
After several of these operations, select → to update the alignment of the other segments.
To modify the position of one or more segments individually, select the segment(s) and press U (Move Up) or D (Move Down).
Only rows with the Keep box ticked in the first column will be included when the translation memory is created.
When the two columns are sufficiently aligned, click
to create the resulting translation memory.
This window is accessible by selecting →
The Scripting window allows you to load an existing script into the text area and run it against the currently open project. To customize the scripting feature, do the following:
Load a script into the editor by clicking its name in the list in the left-hand panel.
Right-click a button from <1> to <12> in the bottom panel and select Add Script.
When you left-click the number, the selected script will run. You can also start the selected macros from the main menu by using their entries in the menu or by pressing Ctrl + Alt + F# (# 1 to 12).
By default, scripts are stored in the scripts folder located in the OmegaT installation folder (the folder that contains the OmegaT.jar file).
If you add new scripts there, they will appear in the list of available scripts in the Scripting window.
Some additional scripts can be found here: OmegaT Scripts
The following scripting languages have been implemented:
Groovy (http://groovy.codehaus.org): a dynamic language for the Java Virtual machine. It builds upon the strengths of Java but has additional power features inspired by languages like Python, Ruby and Smalltalk.
JavaScript (sometimes abbreviated JS, not to be confused with Java): a prototype-based scripting language that is dynamic, weakly typed and has first-class functions. It is a multi-paradigm language, supporting object-oriented, imperative, and functional programming styles. Being the language behind popular software such as Firefox, it is a familiar and preferred programming tool in the open-source domain.
All the languages have access to the OmegaT object model, with the project as the top object. For example, the following Groovy code snippet scans through all the segments in all the files in the current project and, if a translation exists, prints out the source and the target of the segment:
files = project.projectFiles;
// Loop over every file in the current project
for (i in 0 ..< files.size()) {
    // Loop over every segment (entry) in the file
    for (j in 0 ..< files[i].entries.size()) {
        currSegment = files[i].entries[j];
        // Only print segments for which a translation exists
        if (project.getTranslationInfo(currSegment)) {
            source = currSegment.getSrcText();
            target = project.getTranslationInfo(currSegment).translation;
            console.println(source + " >>>> " + target);
        }
    }
}
This dialog is accessible by selecting →
It allows parameters to be set for all translation projects.
Sets the segment validation key to Tab instead of the default Enter. This option is useful for some Chinese, Japanese or Korean character input systems.
The program will seek confirmation before closing down.
For confidentiality reasons, you may want to not send all the segments to the Machine Translation engine. If you uncheck this option, machine translations will be fetched only when you press Ctrl + M ( Cmd + M on OS X) in the current segment. You must then press Ctrl + M again to insert the suggestion.
Check this box to send only untranslated segments to the machine translation services.
Select a supplier from the list and, if necessary, click
to enter the identification details provided by the supplier.
The procedures for configuring access to the Microsoft Translator and Google Translate services are described here.
Uncheck this option if the context description shown for each glossary entry is unnecessary or too long.
When this option is checked, the glossary will display pairs or groups of words (expressions) even if the words within them appear separately in the source text.
Uncheck this option if the glossary displays too many false positives.
Select this option if you want the glossary to display words that have the same root.
If both this option and the Insert the source text option are selected, all words with a corresponding glossary entry will be translated automatically when the source text is inserted.
If this option is checked, the glossary will only display one entry, even if the same word exists in several forms (e.g. with and without capital letters) in the glossary.
You can select a glossary pane contents layout. Layout variations can be added as plugins.
If a glossary item has alternate definitions, they will be displayed on the same line.
Click this button to access the TaaS project site and create a user account.
You can then create an access key on the page https://term.tilde.com/account/keys/create?system=omegaT.
If this option is selected, OmegaT will not remember the access key between sessions.
This button enables you to browse and download the collections that exist for the project's source and target languages. Private collections are displayed in bold. The collections are downloaded as TBX glossaries and stored in the current glossary folder.
If necessary, you can select a specific domain to limit the volume of data sent and received.
Clear this option to deactivate automatic searching – if dictionaries are too long, for example.
Select this option if you want dictionaries to display words that have the same root.
You can select a theme for OmegaT's user interface. Themes can also be added as plugins.
Restores the components of the main OmegaT window to their default state. Use this feature when you have undocked, moved or hidden one or more components and you are unable to restore the desired arrangement. It can also be used when panes do not appear as expected following an OmegaT upgrade.
Shows the dialog to modify the text display font. Users of old computers who find window resizing very slow can try changing the font. See font settings in Miscellanea.
This page allows you to choose different colours for each part of the user interface.
Pre-defined themes can be set using scripts. A script bundled with OmegaT called Switch Colour Themes provides a default "Dark" theme.
This dialog lists the file filters available. The filters used by the current project are displayed in bold. If you prefer not to use OmegaT to translate files of a certain type, you can turn off the corresponding filter by deactivating the check box beside its name. OmegaT will then omit the corresponding files when loading projects, and will copy them unmodified to the target folder when creating target documents. When you wish to use the filter again, just tick the check box. Click Defaults to reset the file filters to the default settings. To edit the files and encodings a filter is used for, select the filter in the list and click Edit.
The dialog allows you to enable or disable the following options:
Remove leading and trailing tags: uncheck this option to display all the tags, including tags at the beginning and end of the segment. Warning: in Microsoft Open XML formats (docx, xlsx, etc.), if all tags are displayed, DO NOT place any text before the first tag – it is a technical tag that must always begin the segment.
Remove leading and trailing whitespace in non-segmented projects: by default, OmegaT removes leading and trailing whitespace. In non-segmented projects, it is possible to keep it by unchecking this option.
Preserve spaces for all tags: check this option if the source documents contain significant spaces used to control the layout that must not be ignored.
Ignore file context when identifying segments with alternate translations: by default, OmegaT uses the source file name as part of the identification of an alternative translation. If the option is checked, the source file name will not be used, and alternative translations will take effect in any file as long as the other context (previous/next segments or some sort of segment identifier, depending on the file format) matches.
Several filters (text files, XHTML files, HTML and XHTML files, OpenDocument files and Microsoft Open XML files) have one or more specific options. To modify the options, select the filter in the list and click
. The available options are:
Text files
Segment source text into paragraphs on: if sentence segmentation rules are active, the text will be segmented further according to the option selected here.
PO files
Allow blank translations in the target file: if selected, when a segment in a PO file (which may be a whole paragraph) is not translated, the translation will be empty in the target file. Technically speaking, the msgstr segment in the PO target file, if created, will be left empty. As this is the standard behaviour for PO files, it is selected by default. If the option is off, the source text will be copied to the target segment.
Skip PO header: the PO header will be skipped and left unchanged if this option is checked.
Auto replace 'nplurals=INTEGER; plural=EXPRESSION;' in header: this option allows OmegaT to override the specification in the PO file header and use the default for the selected target language.
XHTML Files
Translate the following attributes: the selected attributes will appear as segments in the Editor window.
Start a new paragraph on: the <br> HTML tag will constitute a paragraph break for segmentation purposes.
Skip text matching regular expression: any text matching the regular expression is skipped. It is shown in red in the tag validator. Text in source segments that matches is shown in italics.
Do not translate the content attribute of meta-tags ...: the meta-tags in the box will not be translated.
Do not translate the content of tags with the following attribute key-value pairs (separate with commas): if a tag matches the list of key-value pairs, its content will be ignored.
It is sometimes useful to be able to make certain tags untranslatable based on the values of their attributes, for example <div class="hide"> or <span translate="no">. You can define key-value pairs for tags to be left untranslated. For the example above, the field would contain: class=hide, translate=no.
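How such a comma-separated list of key-value pairs might be applied can be sketched in Python. This is an assumption-laden illustration, not the filter's actual code: a tag is modelled as a plain dict of its attributes, a tag is skipped as soon as one of its attributes matches one pair, and the function names are hypothetical.

```python
def parse_rules(field):
    """Parse a field such as 'class=hide, translate=no' into (key, value) pairs."""
    rules = set()
    for item in field.split(","):
        key, sep, value = item.strip().partition("=")
        if sep:                                   # ignore items without '='
            rules.add((key.strip(), value.strip()))
    return rules

def is_untranslatable(tag_attributes, rules):
    """True if any attribute of the tag matches one of the key-value pairs."""
    return any((k, v) in rules for k, v in tag_attributes.items())

rules = parse_rules("class=hide, translate=no")
print(is_untranslatable({"class": "hide"}, rules))
print(is_untranslatable({"class": "show"}, rules))
```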
Microsoft Office Open XML files
You can select which elements are to be translated. They will appear as separate segments in the translation.
Word: non-visible instruction text, comments, footnotes, endnotes, footers
Excel: comments, sheet names
PowerPoint: slide comments, slide masters, slide layouts
Global: charts, diagrams, drawings, WordArt
Other Options:
Aggregate tags: if checked, tags with no translatable text between them will be aggregated into a single tag.
Preserve spaces for all tags: if checked, "white space" (i.e. spaces and newlines) will be preserved, even if this option is not defined in the document.
HTML and XHTML files
Add or rewrite encoding declaration in HTML and XHTML files: the target files often need to have a different character set encoding from the one in the source file (whether it is explicitly defined or implied). Using this option, the translator can specify whether the target files should have the encoding declaration included. For instance, if the file filter specifies UTF8 as the encoding scheme for the target files, selecting Always will ensure that this information is included in the translated files.
Translate the following attributes: the selected attributes will appear as segments in the Editor window.
Start a new paragraph on: the <br> HTML tag will constitute a paragraph break for segmentation purposes.
Skip text matching regular expression: any text matching the regular expression is skipped. It is shown in red in the tag validator. Text in source segments that matches is shown in italics.
Do not translate the content attribute of meta-tags ...: The meta-tags in the box will not be translated.
Do not translate the content of tags with the following attribute key-value pairs (separate with commas): if a tag matches the list of key-value pairs, its content will be ignored.
It is sometimes useful to be able to make certain tags untranslatable based on the values of their attributes, for example <div class="hide"> or <span translate="no">. You can define key-value pairs for tags to be left untranslated. For the example above, the field would contain: class=hide, translate=no.
Compress whitespace in translated document: multiple continuous whitespace characters will be converted into one single whitespace in the translated document.
Remove HTML comments in translated document: commented parts (between <!-- and -->) will not be copied into the translated document.
Open Document Format (ODF) files
You can select which of the following items are to be translated:
index entries, bookmarks, bookmark references, notes, comments, presentation notes, links (URL), sheet names
This dialog enables you to specify the source filename patterns for files to be processed by the filter, customize the filenames of translated files and select which encodings should be used for loading the source file and saving the translation. To modify a file filter pattern, either modify the fields directly or click
. To add a new file filter pattern, click . The same dialog is used to add a pattern or to edit a particular pattern. The dialog includes a special target filename pattern editor, which you can use to customize the names of output files.
When OmegaT encounters a file in its source folder, it attempts to select the filter based upon the file's extension. More precisely, OmegaT attempts to match each filter's source filename patterns against the filename. For example, the pattern *.xhtml matches any file with the .xhtml extension. If the appropriate filter is found, the file is assigned to it for processing. For example, by default, the XHTML filter is used to process files with the .xhtml extension. You can change or add filename patterns for files to be handled by each file filter. Source filename patterns use wild card characters similar to those used in Searches. The '*' character matches zero or more characters. The '?' character matches exactly one character. All other characters represent themselves. For example, if you wish the text filter to handle readme files (readme, read.me, and readme.txt) you should use the pattern read*.
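The filter selection described above can be imitated with Python's standard `fnmatch` module, whose `*` and `?` behave as described here (it additionally supports `[...]` character sets, which this manual does not mention). The filter names and patterns in the sketch are sample data, and the first matching filter wins, which is an assumption.

```python
from fnmatch import fnmatchcase

def pick_filter(filename, filters):
    """Return the name of the first filter with a pattern matching filename."""
    for name, patterns in filters:
        if any(fnmatchcase(filename, p) for p in patterns):
            return name
    return None   # no filter claims the file

# Hypothetical filter configuration, for illustration only
filters = [
    ("XHTML", ["*.xhtml"]),
    ("Text", ["*.txt", "read*"]),
]
for name in ["index.xhtml", "readme", "read.me", "readme.txt", "photo.png"]:
    print(name, pick_filter(name, filters))
```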
Only a limited number of file formats specify a mandatory encoding. File formats that do not specify their encoding will use the encoding you set up for the extension that matches their name. For example, by default .txt files will be loaded using the default encoding of your operating system. You can change the source encoding for each different source filename pattern. Target files can also be written in any encoding. By default, the translated file encoding is the same as the source file encoding. The source and target encoding fields use drop-down menus containing all the supported encodings. <auto> leaves the choice of encoding to OmegaT. This is how it works:
OmegaT identifies the source file encoding by using its encoding declaration, if present (HTML files, XML based files).
OmegaT is instructed to use a mandatory encoding for certain file formats (Java properties etc).
OmegaT uses the default encoding of the operating system for text files.
Sometimes you may wish to rename the files you translate automatically, for example adding a language code after the file name. The target filename pattern uses a special syntax, so if you want to edit this field, you must click Edit... and use the Edit Pattern Dialog. If you want to revert to the filter's default configuration, click Defaults. You can also modify the name directly in the target filename pattern field of the file filters dialog. The Edit Pattern Dialog offers among others the following options:
${filename} (default): full filename of the source file with extension. In this case the name of the translated file is the same as that of the source file.
${nameOnly}: the name of the source file without the extension.
${extension}: the original file extension.
${targetLocale}: target locale code (of a form "xx_YY").
${targetLanguage}: the target language and country code together (of a form "XX-YY").
${targetLanguageCode}: the target language only ("XX").
${targetCountryCode}: the target country only ("YY").
${timestamp-????}: system date and time at generation time, in various patterns. See the Oracle documentation for examples of the "SimpleDateFormat" patterns.
${system-os-name}: operating system of the computer used.
${system-user-name}: system user name.
${system-host-name}: system host name.
${file-source-encoding}: source file encoding.
${file-target-encoding}: target file encoding.
${targetLocaleLCID}: Microsoft target locale.
Additional variants are available for the variables ${nameOnly} and ${extension}. If the file name is ambiguous because it contains more than one period, you can use variables of the form ${nameOnly-extension number} and ${extension-extension number}. If, for example, the original file is named Document.xx.docx, the variables give the following results:
Variable | Result |
---|---|
${nameOnly-0} | Document |
${nameOnly-1} | Document.xx |
${nameOnly-2} | Document.xx.docx |
${extension-0} | docx |
${extension-1} | xx.docx |
${extension-2} | Document.xx.docx |
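The numbered variants follow a simple rule that can be reproduced in a few lines of Python. This is a sketch inferred from the Document.xx.docx example only; the function names `name_only` and `extension` are hypothetical, and the behaviour for numbers larger than the number of periods (returning the full filename) is an assumption.

```python
def name_only(filename, n):
    """${nameOnly-n}: the base name plus the first n extension parts."""
    parts = filename.split(".")
    return ".".join(parts[: n + 1])


def extension(filename, n):
    """${extension-n}: the last extension plus the n parts before it."""
    parts = filename.split(".")
    first = max(len(parts) - 1 - n, 0)  # clamp so large n yields the full name
    return ".".join(parts[first:])


for n in range(3):
    print(n, name_only("Document.xx.docx", n), extension("Document.xx.docx", n))
```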
Translation memory tools work with textual units called segments. OmegaT has two ways to segment a text: by paragraph or by sentence segmentation (also referred to as “rule-based segmentation”). In order to select the type of segmentation, select → from the main menu and tick or untick the check box provided. Paragraph segmentation is advantageous in certain cases, such as highly creative or stylistic translations in which the translator may wish to change the order of entire sentences; for the majority of projects, however, sentence segmentation is the preferred choice, since it delivers better matches from previous translations. If sentence segmentation has been selected, you can set up the rules by selecting → from the main menu.
Dependable segmentation rules are already available for many languages, so it is likely that you will not need to write your own segmentation rules. On the other hand, this functionality can be very useful in special cases, where you can increase your productivity by tuning the segmentation rules to the text to be translated.
Warning: the text will segment differently after filter options have been changed, so you may have to start translating from scratch. At the same time, the previously valid segments in the project translation memory will turn into orphan segments. If you change segmentation options when a project is open, you must reload the project in order for the changes to take effect.
OmegaT uses the following sequence of steps:
OmegaT first parses the text for structure-level segmentation. During this process it is only the structure of the source file that is used to produce segments.
For example, text files may be segmented on line breaks, empty lines, or not be segmented at all. Files containing formatting (ODF documents, HTML documents, etc.) are segmented on the block-level (paragraph) tags. Translatable object attributes in XHTML or HTML files can be extracted as separate segments.
After segmenting the source file into structural units, OmegaT will segment these blocks further into sentences.
The process of segmenting can be pictured as follows: the cursor moves along the text, one character at a time. At each cursor position rules, consisting of a Before and After pattern, are applied in their given order to see if any of the Before patterns are valid for the text on the left and the corresponding After pattern for the text on the right of the cursor. If the rule matches, either the cursor moves on without inserting a segment break (for an exception rule) or a new segment break is created at the current cursor position (for the break rule).
The two types of rules behave as follows:
Break rule: separates the source text into segments. For example, "Did it make sense? I was not sure." should be split into two segments. For this to happen, there should be a break rule for "?" when followed by spaces and a capitalized word. To define a rule as a break rule, tick the Break/Exception check box.
Exception rule: specifies what parts of the text should NOT be separated. In spite of the period, "Mrs. Dalloway" should not be split into two segments, so an exception rule should be established for Mrs (and for Mr, for Dr, for prof etc.), followed by a period. To define a rule as an exception rule, leave the Break/Exception check box unticked.
The predefined break rules should be sufficient for most European languages and Japanese. In view of the flexibility, you may consider defining more exception rules for your source language in order to provide more meaningful and coherent segments.
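The cursor-based process described above can be sketched in Python as follows. This is a simplified illustration, not OmegaT's implementation: the `Rule` type, the `segment` function and the two sample rules are assumptions, Python's `re` module stands in for the regular expression engine OmegaT uses, and surrounding whitespace is simply stripped from each segment.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    is_break: bool  # True for a break rule, False for an exception rule
    before: str     # pattern that must match the text left of the cursor
    after: str      # pattern that must match the text right of the cursor


def segment(text, rules):
    """Split text by moving a cursor one character at a time."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        left, right = text[:pos], text[pos:]
        for rule in rules:  # rules are tried in their given order
            if re.search(f"(?:{rule.before})\\Z", left) and re.match(rule.after, right):
                if rule.is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # the first matching rule decides; an exception just moves on
    segments.append(text[start:].strip())
    return [s for s in segments if s]


RULES = [
    Rule(False, r"\b(?:Mrs?|Dr|[Pp]rof)\.", r"\s"),  # exception: titles
    Rule(True, r"[.?!]", r"\s"),                      # break: end of sentence
]

print(segment("Mrs. Dalloway arrived. Did it make sense? I was not sure.", RULES))
```

Because the exception rule is listed before the break rule, the period after "Mrs" is consumed by the exception and never reaches the break rule, which is why the order of rules matters.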
All segmentation rule sets for a matching language pattern are active and are applied in the given order of priority, so rules for a specific language should be placed higher than the default ones. For example, rules for Canadian French (FR-CA) should be set higher than rules for French (FR.*), and higher than the Default (.*) ones. Thus, when translating from Canadian French, the rules for Canadian French (if any) will be applied first, followed by the rules for French and lastly by the Default rules.
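A minimal Python sketch of this selection, assuming that a language pattern must match the whole source language code and that matching is case-insensitive (both are assumptions, as are the names `RULE_SETS` and `active_rule_sets`):

```python
import re

# Rule sets in their order of priority, as (name, language pattern) pairs.
RULE_SETS = [
    ("Canadian French", "FR-CA"),
    ("French", "FR.*"),
    ("Default", ".*"),
]


def active_rule_sets(source_language, rule_sets=RULE_SETS):
    """Return the names of all matching rule sets, highest priority first."""
    return [
        name
        for name, pattern in rule_sets
        if re.fullmatch(pattern, source_language, re.IGNORECASE)
    ]


print(active_rule_sets("FR-CA"))
print(active_rule_sets("FR-FR"))
print(active_rule_sets("DE"))
```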
Major changes to the segmentation rules should be generally avoided, especially after completion of the first draft, but minor changes, such as the addition of a recognized abbreviation, can be advantageous.
In order to edit or expand an existing set of rules, simply click on it in the top table. The rules for that set will appear in the bottom half of the window.
In order to create an empty set of rules for a new language pattern click Add in the upper half of the dialog. An empty line will appear at the bottom of the upper table (you may have to scroll down to see it). Change the name of the rule set and the language pattern to the language concerned and its code. The syntax of the language pattern conforms to regular expression syntax. If your set of rules handles a language-country pair, we advise you to move it to the top using the Move Up button.
Add the Before and After patterns. To check their syntax and their applicability, it is advisable to use tools which allow you to see their effect directly. See Regular expressions. A good starting point will always be the existing rules.
Intention | Before | After | Note |
---|---|---|---|
Set the segment start after a period ('.') followed by a space, tab ... | \. | \s | "\." stands for the period character. "\s" means any white space character (space, tab, new page etc.) |
Do not segment after Mr. | Mr\. | \s | This is an exception rule, so the rule check box must not be ticked |
Set a segment after "。" (Japanese period) | 。 | | Note that after is empty |
Do not segment after M. Mr. Mrs. and Ms. | Mr??s??\. | \s | Exception rule - see the use of ? in regular expressions |
Click on to configure the Auto-completer Glossary View.
Click on to configure Auto-text options and to add or remove entries.
Click on to set the Character table auto-completer options.
The auto-completer is launched within the target segment with the Ctrl + Space shortcut.
If the Show Relevant Suggestions Automatically option is checked, the auto-completer is launched automatically when you type the first letter of a translated glossary entry, or "<" in the case of tags.
OmegaT has a built-in spell checker based on the spelling checker used in Apache OpenOffice, LibreOffice, Firefox and Thunderbird. It is consequently able to use the huge range of free spelling dictionaries available for these applications.
Select the location of the language checker.
Using a different language checker on your local machine than the one supplied with OmegaT gives you the option of personalising the verification rules.
Check or uncheck the rules depending on whether they are relevant to the type of text you are translating.
By default, OmegaT does not execute the commands specified in the project-specific settings (the finder.xml file in the omegat folder), because they may have a critical impact on the machine's security.
Only activate this option if you know what you are doing, and only for projects from trusted sources.
Enables you to change the order of the commands in the context menu (the right-click menu). Values around 100 display commands at the top, and values around 900 display them at the bottom.
You will need to restart OmegaT for this change to take effect.
You can have the source text inserted automatically into the editing field. This is useful for texts containing many trade marks or other proper nouns which must be left unchanged.
OmegaT leaves the editing field blank. This option allows you to enter the translation without the need to remove the source text, thus saving you two keystrokes (Ctrl + A and Del). Empty translations are now allowed. They are displayed as <EMPTY> in the Editor. To create one, right-click in a segment, and select Set empty translation. The Remove translation entry in the same pop-up menu also lets you delete the existing translation of the current segment. You can achieve the same by clearing the target segment and pressing .
OmegaT inserts the translation of the string most similar to the current source, if it is above the similarity threshold that you have selected in this dialog. The prefix (empty by default) can be used to tag translations made from fuzzy matches. If you add a prefix (for instance [fuzzy]), you can trace those translations later to check that they are correct.
The check boxes in the lower half of the dialog window serve the following purpose:
If this option is checked, when a fuzzy match is inserted, either manually or automatically, OmegaT attempts to convert the numbers in the fuzzy matches according to the source contents. There are a number of restrictions:
The source segment and the fuzzy matches must contain the same list of numbers
The numbers must be exactly the same between the source and the target matches.
Only integers and simple floats (using the period as a decimal character, e.g. 5.4, but not 5,4 or 54E-01) are considered.
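One possible reading of these restrictions is sketched below in Python. It is an illustration only and not OmegaT's algorithm: the function name `convert_numbers`, the number regex, the position-based pairing of numbers, and the decision to leave the match untouched when a number occurs twice are all assumptions.

```python
import re

# Integers and simple floats with a period as decimal separator (simplified).
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def convert_numbers(source, match_source, match_target):
    """Replace the numbers of a fuzzy match with those of the current source."""
    new_numbers = NUMBER.findall(source)
    old_numbers = NUMBER.findall(match_source)
    target_numbers = NUMBER.findall(match_target)

    # The match's source and target must carry exactly the same numbers.
    if sorted(old_numbers) != sorted(target_numbers):
        return match_target
    # Numbers are paired by position, so the counts must agree (assumption).
    if len(new_numbers) != len(old_numbers):
        return match_target
    # A repeated number would make the pairing ambiguous (assumption).
    if len(set(old_numbers)) != len(old_numbers):
        return match_target

    mapping = dict(zip(old_numbers, new_numbers))
    return NUMBER.sub(lambda m: mapping[m.group(0)], match_target)


print(convert_numbers("Page 7 of 12", "Page 3 of 10", "Seite 3 von 10"))
```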
Documents for translation may contain trade marks, names or other proper nouns that will be the same in translated documents. There are two strategies for segments that contain only such invariable text.
You can decide not to translate such segments at all. OmegaT will then report these segments as not translated. This is the default. The alternative is to enter a translation that is identical to the source text. OmegaT is able to recognize that you have done this. To make this possible, select this option.
The text export function exports data from within the current OmegaT project to plain text files. The data are exported when the segment is opened. The files appear in the script subfolder in the OmegaT user files folder, and include:
The content of the segment source text (source.txt).
The content of the segment target text (target.txt).
The text highlighted by the user, when Ctrl + Shift + C is pressed or → is selected (selection.txt).
The content of the files is overwritten either when a new segment is opened (source.txt and target.txt) or when a new selection is exported (selection.txt). The files are unformatted plain text files. The whole process can be steered and controlled via Tcl/Tk-based scripting. See Using the OmegaT text export function for specifics, examples and suggestions.
To avoid mistranslations of segments with several possible target contents, check this check box: Go To Next Untranslated Segment will then stop on the next such segment, irrespective of whether it has already been translated or not.
Uncheck this option to prevent any damage to the tags (i.e., partial deletion) during editing. Removing an entire tag remains possible in that case, by using Ctrl+Backspace/Delete or by selecting it completely (Ctrl+Shift+Left/Right) then deleting it (Delete or Ctrl+X).
Check this option to be warned about differences between source and target segment tags each time you leave a segment.
Check this option to record in the project_save.tmx file the information that a segment has been auto-populated, so it can be displayed with a specific color in the Editor (if the "Mark Auto-Populated Segments" option, in the View menu, is checked).
By default, the editor initially displays 2,000 segments, and progressively loads more as you scroll up or down. If you have a powerful machine, and/or if you don't like how the scrollbar behaves during progressive loading, you can increase this number.
When translating software-related files, you can configure the Tag Validator options to also check programming variables (%...) or placeholders ({0}), if the file filter does not already do so out of the box. The PO filter already handles %.. and the Java™ Resource Bundle filter already handles {#} tags, so you only need this for other file types.
You can also define various options relating to tag validation and define custom tags.
For example, if you enter \d+ into the Regular expression for custom tags field, all numbers will be considered as tags, enabling you to check that numbers have not been changed by mistake during translation.
Similarly, enter <.*?> to make sure that HTML tags (for example) entered into the source text are preserved without modification in the translation.
Note: these two instructions can be combined by writing (<.*?>)|(\d+).
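To see what the combined expression captures, it can be tried outside OmegaT. The Python sketch below uses the `re` module, which treats this particular expression the same way as the regular expressions described here; the helper names `tags` and `tags_preserved` and the order-insensitive comparison are assumptions for illustration, not a description of the Tag Validator itself.

```python
import re

# The combined custom tag expression from the note above.
CUSTOM_TAGS = re.compile(r"(<.*?>)|(\d+)")


def tags(text):
    """Return every custom tag (HTML-like tag or number) found in text."""
    return [m.group(0) for m in CUSTOM_TAGS.finditer(text)]


def tags_preserved(source, target):
    """True if the target holds the same tags as the source, in any order."""
    return sorted(tags(source)) == sorted(tags(target))


source = "Press <b>Start</b> and wait 30 seconds."
good = "Appuyez sur <b>Démarrer</b> et attendez 30 secondes."
bad = "Appuyez sur <b>Démarrer</b> et attendez 3 secondes."

print(tags(source))
print(tags_preserved(source, good), tags_preserved(source, bad))
```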
Enter your name here and it will be attached to all segments translated by you.
List of projects for which login details are stored in OmegaT. Remove a project from this list if you want OmegaT to ask you for a login and a password every time you access the project.
By default, the closest matches displayed in the Fuzzy Matches pane are determined using stemming.
To obtain more literal matches closer to 100%, select the Full text, including tags and numbers option.
Decide how tags in foreign TMX files (i.e. not generated by OmegaT) are to be treated.
Change how fuzzy matches are displayed, through the use of pre-configured variables:
Variable | Description |
---|---|
${id} | Number of the match from 1 to 5 |
${sourceText} | Source text of the match |
${targetText} | Target text of the match |
${diff} | String showing the differences between the source and the match. Hint: use this if the text you are translating has been updated. |
${diffReversed} | Same as ${diff}, but with the differences (what is to be inserted and deleted) inverted. |
${score} | Percentage calculated with Stemming, no tags and no numbers option. |
${noStemScore} | Percentage calculated with No tags and no numbers option. |
${adjustedScore} | Percentage calculated with Full text, including tags and numbers option. |
${fuzzyFlag} | Indicate that this match is fuzzy (currently only for translations from PO files with the #fuzzy mark) |
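The effect of such a template can be previewed with Python's `string.Template`, which happens to use the same ${name} placeholder syntax. The template string and the sample values below are made up for illustration and are not necessarily OmegaT's default template.

```python
from string import Template

# Illustrative template built from the variables listed above (an assumption).
template = Template(
    "${id}. ${sourceText}\n${targetText}\n<${score}/${noStemScore}/${adjustedScore}%>"
)

# Sample values for one fuzzy match (invented for this sketch).
match = {
    "id": 1,
    "sourceText": "Open the file.",
    "targetText": "Ouvrez le fichier.",
    "score": 100,
    "noStemScore": 95,
    "adjustedScore": 92,
}

rendered = template.substitute(match)
print(rendered)
```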
Contains options for displaying texts and modification information in different ways.
Check this option to display all non-unique segments (repetitions) in grey. When the option is unchecked, all non-unique segments are shown in grey except the first occurrence.
Allows the user to select the interval, in minutes and seconds, between consecutive automatic saves of the project.
Change the default interval (3 minutes) depending on the characteristics of the project:
short intervals (minimum: 10 seconds) for synchronised projects on an internal server.
long intervals for team projects hosted on external servers.
Specify commands that are executed after the command. An example of the use of this feature would be to send translated documents automatically to the client's FTP server.
By default, OmegaT does not execute the commands specified in the project-specific settings (the omegat.project file), because they may have a critical impact on the machine's security.
Only activate this option if you know what you are doing, and only for projects from trusted sources.
If OmegaT needs to use an authenticated proxy server to access the Internet, enter the details provided by the proxy administrator here.
Here you can redefine the master password used to protect login details and access keys for machine translation services. Take care to make a note of all these details before creating a new password, because they will all be deleted and will need to be re-entered.
Gives access to the list of plugins available. Plugins are installed in the plugins folder under the OmegaT installation folder or the platform-specific OmegaT user preferences folder.
Enables automatic notification of OmegaT updates.