File filters are either local and specific to a given project, or global and available to all the projects that share a configuration folder.
Filters in bold are used in the current project.
Disable a filter by unchecking its box if you prefer not to translate the files that are associated with it. Their contents will not be displayed for translation.
You can sort the filters by name or by whether they are enabled. Click on the relevant header to sort them in ascending or descending order.
To modify the file extensions, target file name, and encodings associated with a filter, select it in the list and click the Edit... button. Some filters provide an Options... button to further customize their settings. Click the Defaults button to reset the file filters to their default settings.
Modified global file filter preferences are saved in filters.xml, in the configuration folder. See Configuration Folder for details. Deleting that file also resets the filter preferences.
Modified local file filters are saved in the filters.xml file, located in the project folder. See the Project Folder chapter for details. Deleting that file also resets the filter preferences and reverts the project to global file filters.
Leading and trailing tags are generally required by OmegaT to properly recreate the translated segment. Hiding them from the translatable contents ensures that you will not erase or modify them by mistake.
If you keep the leading and trailing tags, make sure you also include them in the translated text.
By default, OmegaT removes any leading and trailing whitespace from the translatable contents. In non-segmented projects, disable this option to make leading and trailing whitespace modifiable in the translation.
If the source documents contain whitespace used to control the layout, that whitespace must be retained in the translated document.
The source file name is one of the elements that characterize an alternative translation. If this option is checked, only the previous/next segments or a segment identifier will be used to characterize an alternative translation.
Segments with the same characteristics located in other files will be translated the same way.
Double-click the editable fields to make simple modifications, or click the Edit... button to access the modification dialog. To add a filter pattern, click Add... to open a similar dialog. Both dialogs allow you to customize the filename patterns for the source and target files associated with the filter, and to select their respective encodings.
Use the Filename Variables drop-down menu to customize the target file name.
To associate a filter with a file, OmegaT checks the file extension and attempts to match it to a source filename pattern in a filter.
For example, the pattern *.xhtml registered in the XHTML filter matches any file with the xhtml extension. If such a file is found in the source folder, it will be handled by the XHTML filter.
You can change or add filename patterns to associate different files to a filter.
Associating a file extension with a filter is not sufficient to have the filter properly handle the file. The file structure must also be compatible with the filter: even if you associate .odt with the XHTML filter, the filter will not be able to understand the contents of a LibreOffice Writer file.
Source filename patterns use wildcard characters: the * character matches zero or more characters, while the ? character matches exactly one character.
For example, use the pattern read* if you want to have the text filter handle readme files (readme, read.me, or readme.txt).
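To make the wildcard behaviour concrete, here is a minimal Java sketch that turns a source filename pattern into a regular expression and tests whole file names against it. It assumes case-sensitive matching of the entire file name; the FilenamePattern class and its methods are hypothetical and do not describe how OmegaT implements filename patterns internally.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: check a file name against a source filename pattern.
class FilenamePattern {

    // Convert the two wildcards to regex syntax and quote every other character.
    static Pattern toRegex(String filenamePattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : filenamePattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");   // zero or more characters
            } else if (c == '?') {
                regex.append('.');    // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Assumption: the pattern must cover the whole file name, not just part of it.
    static boolean matches(String filenamePattern, String fileName) {
        return toRegex(filenamePattern).matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String[] files = {"readme", "read.me", "readme.txt", "notes.txt", "index.xhtml"};
        for (String file : files) {
            System.out.println(file + ": read* = " + matches("read*", file)
                    + ", *.xhtml = " + matches("*.xhtml", file));
        }
    }
}
```

With the read* pattern, the sketch accepts readme, read.me, and readme.txt but rejects notes.txt.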
Most file formats allow various possible encodings. By default, the encoding of the translated file is the same as that of the source file.
The source and target encoding fields use drop-down menus listing all supported encodings. Selecting the <auto> option leaves the choice of encoding to OmegaT, based on the following criteria:
OmegaT uses the encoding declaration in the source file, if present, to identify the encoding (HTML or XML based files).
OmegaT is instructed to use a mandatory encoding for certain file formats (Java properties, for example).
OmegaT uses the default encoding of the operating system for text files.
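The first and last criteria can be approximated in a short Java sketch: look for an encoding declaration at the start of the file, and otherwise fall back to the operating system encoding. The regular expression and the detect method are assumptions for illustration only; the sketch ignores the format-specific rule (Java properties, for example) and is not OmegaT's actual detection code.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of encoding detection from a declaration in the file.
class AutoEncoding {

    // Group 1: encoding="..." in an XML declaration; group 2: charset=... in an HTML meta tag.
    private static final Pattern DECLARATION = Pattern.compile(
            "<\\?xml[^>]*encoding=[\"']([\\w.-]+)[\"']|<meta[^>]*charset=[\"']?([\\w.-]+)",
            Pattern.CASE_INSENSITIVE);

    static String detect(String fileStart, String fallback) {
        Matcher m = DECLARATION.matcher(fileStart);
        if (m.find()) {
            return m.group(1) != null ? m.group(1) : m.group(2);
        }
        return fallback; // no declaration found
    }

    public static void main(String[] args) {
        // The operating system encoding; the native.encoding property exists since Java 17.
        String osEncoding = System.getProperty("native.encoding", Charset.defaultCharset().name());
        System.out.println(detect("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>", osEncoding));
        System.out.println(detect("<meta charset=\"utf-8\">", osEncoding));
        System.out.println(detect("Plain text without a declaration", osEncoding));
    }
}
```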
When you create the target files, any file with the same name already in the target folder is overwritten.
OmegaT can automatically create new file names for the files you create, by adding a language code or a time stamp, for example.
The target filename pattern uses a special syntax. The easiest way to modify it is to use the Edit Pattern dialog. The dialog offers various options:
The default pattern. It represents the complete filename of the source file, including the extension. Using this pattern assigns the translated file the exact same name as the source file.
name of the source file, without the extension
original file extension
target language+region code (xx_YY)
target language+region (xx-YY)
target language code (xx)
target region code (YY)
system time when the file was created
See the Oracle documentation for examples.
name of the operating system
user’s login name
host name on the system
encoding of the source file
encoding of the target file
Microsoft target locale
Additional variants are available for ${nameOnly} and ${extension}.
If the use of multiple periods makes identifying the file name and extension ambiguous, you can use variables of the form ${nameOnly-number} or ${extension-number} to specify which portions are part of the name or extension, as shown in the example below.
For a source file named Document.xx.docx, using the variable variants below will produce the following results:
${nameOnly-0}: Document
${nameOnly-1}: Document.xx
${nameOnly-2}: Document.xx.docx
${extension-0}: docx
${extension-1}: xx.docx
${extension-2}: Document.xx.docx
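The results listed above can be reproduced by a small Java model that splits the file name at each period: ${nameOnly-n} keeps the first n+1 parts and ${extension-n} keeps the last n+1 parts. This is only a sketch that agrees with the example; the helper names are hypothetical, and OmegaT's own handling of edge cases (such as files without an extension) may differ.

```java
import java.util.Arrays;

// Hypothetical model of the ${nameOnly-n} and ${extension-n} variants.
class FilenameVariables {

    // ${nameOnly-n}: the first n+1 dot-separated parts of the file name.
    static String nameOnly(String fileName, int n) {
        String[] parts = fileName.split("\\.");
        int count = Math.min(n + 1, parts.length);
        return String.join(".", Arrays.copyOfRange(parts, 0, count));
    }

    // ${extension-n}: the last n+1 dot-separated parts of the file name.
    static String extension(String fileName, int n) {
        String[] parts = fileName.split("\\.");
        int count = Math.min(n + 1, parts.length);
        return String.join(".", Arrays.copyOfRange(parts, parts.length - count, parts.length));
    }

    public static void main(String[] args) {
        String source = "Document.xx.docx";
        for (int n = 0; n <= 2; n++) {
            System.out.println("nameOnly-" + n + ": " + nameOnly(source, n)
                    + "   extension-" + n + ": " + extension(source, n));
        }
    }
}
```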
Several filters offer options. Select the filter in the list and click Options... to modify them. The available options are:
Text files do not have generic paragraph markers. Choose how OmegaT creates paragraphs in your text files.
Specifies the maximum number of characters before breaking a long line. A value of 0 sets no limit.
Specifies the maximum number of characters before cutting a line and ignoring the rest. A value of 0 sets no limit.
The Microsoft Office Open XML (legacy filter) is the original OmegaT filter. You should only use it to avoid compatibility issues with previous projects containing files you handled with that filter.
You can choose additional document elements to translate. They will appear as separate segments in the editor.
Non-visible instruction text, comments, footnotes, endnotes, footers, duplicate fallback text, and document properties.
Comments and sheet names.
Slide comments, slide masters, and slide layouts.
External links, charts, diagrams, drawings, and WordArt.
Tags that do not enclose translatable text will be aggregated into a single tag.
Whitespace (i.e., spaces and newlines) will be preserved, even if this option is not defined in the document.
Enable this option if soft-returns are intended to be paragraph starters.
The selected attributes will appear as translatable segments in the Editor pane.
The <br> HTML tag will constitute a paragraph break for segmentation purposes.
Any paragraph matching the regular expression is ignored while loading and is not displayed for translation.
This option is convenient when dealing with HTML parts that only contain non-translatable text.
Define the <meta> tag attribute values for which the associated "content" attribute will not be translated.
Do not add quotation marks and separate the values with a comma.
To ignore this content:
<meta name="robots" content="index, follow">
use: name=robots
Define the attribute values that make a tag non-translatable.
Do not add quotation marks and separate the values with a comma.
To ignore this content:
<span translate="no">This content is not translatable</span>
use: translate=no
All the tags that are marked with translate="no" will be ignored.
Only the options not available under the XHTML files filter (see above) are described here.
The encoding of an HTML document is generally declared within a <meta> element situated in the <head> element.
Source and target files sometimes require a different encoding.
Here, you can decide whether to add or modify the declaration of the target file
always, based on the file filter settings,
only if the file already has a <head> tag,
only if the file already has a declaration,
or never and only save the target file in the encoding specified in the file filter settings.
Whitespace outside the tags is considered non-significant in HTML/XHTML.
This option converts each run of consecutive whitespace characters into a single space in the translated document.
Comments in an HTML file are generally addressed to developers. Use this option to remove them. If unchecked, the comments are displayed as tags.
Text in HTML comments (between <!-- and -->) is not copied into the translated document.
Having untranslated contents in the translated files sometimes creates compatibility issues.
The filter checks printf variables ('%s', etc.) by default. See the Check printf function variables preference for details.
OmegaT always reproduces the source contents when no translation is provided for a segment. Use this option to leave an untranslated segment blank.
Blank source segments sometimes act as placeholders for parts that do not exist in the source language but are necessary in the target language. Use this option to provide a translation based on the associated comments.
The PO header will not be displayed for translation.
Override the plural specification in the header and use the target language default.
PO files that use msgid as the source container and expect the translation to be put in msgstr.
PO files that use msgid as an ID code, use msgstr as the source container, and expect the translation to overwrite msgstr.
Having untranslated contents in the translated files sometimes creates compatibility issues.
The filter checks Java MessageFormat patterns (e.g. {0}) by default. See the Check printf function variables preference for details.
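In practice, a MessageFormat placeholder such as {0} must survive translation unchanged, because the application substitutes a value for it at run time. The sketch below uses the standard java.text.MessageFormat class; the message strings are hypothetical examples.

```java
import java.text.MessageFormat;

// Hypothetical property values: a source string and translations of it.
class MessageFormatDemo {
    static final String SOURCE = "Found {0} files in {1}.";
    static final String TRANSLATION = "{1} contient {0} fichiers.";
    static final String WITH_APOSTROPHE = "{0} n''existe pas.";

    public static void main(String[] args) {
        // The application replaces each numbered placeholder with a run-time value.
        System.out.println(MessageFormat.format(SOURCE, "3", "source"));
        // A translation may reorder the placeholders, but must keep every one of them.
        System.out.println(MessageFormat.format(TRANSLATION, "3", "source"));
        // A doubled apostrophe produces a single literal apostrophe.
        System.out.println(MessageFormat.format(WITH_APOSTROPHE, "source"));
    }
}
```

The last call shows a related MessageFormat detail: a single apostrophe starts a quoted section, so a literal apostrophe in the pattern is written as two apostrophes ('').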
Java 8 requires ISO-8859-1 encoding and uses Unicode literals for characters outside that character set. Java 9 and later expect UTF-8 encoding by default. This option forces Java 8 compatibility.
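The conversion behind this option can be sketched as follows: every character outside ISO-8859-1 is written as a Unicode literal (a backslash, the letter u, and four hexadecimal digits, as in \u2603). This is a minimal illustration assuming UTF-16 input, not OmegaT's implementation; the class and method names are hypothetical.

```java
// Hypothetical sketch: write a property value with Java 8-style Unicode literals.
class UnicodeLiterals {

    static String toJava8Literals(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i); // one UTF-16 code unit at a time
            if (c > 0xFF) {
                // Outside ISO-8859-1: backslash, 'u', then four uppercase hex digits.
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c); // ISO-8859-1 characters are written unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Built from code points so that this source file stays plain ASCII.
        String value = "caf" + (char) 0xE9 + " " + (char) 0x2603;
        System.out.println(toJava8Literals(value)); // the snowman becomes a literal
    }
}
```

Because the loop works on UTF-16 code units, a character outside the Basic Multilingual Plane is written as two consecutive literals, which is how Java properties files represent it.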
Having untranslated contents in the translated files sometimes creates compatibility issues.
Some applications require some Unicode literals to be kept. This option allows for that.
Index entries, bookmarks, bookmark references, notes, comments, presentation notes, links (URL), and sheet names.
This filter is the original OmegaT XLIFF filter. You should only use it to avoid compatibility issues with previous projects containing files you handled with that filter.
Enable this option if you need to work with XLIFF files created with OmegaT 2.6.
You can select from three options: previous and next paragraphs, the <trans-unit> ID, or the <trans-unit> resname attribute when available. When the resname attribute is unavailable, the ID is used as a fallback.
These options specify the way OmegaT creates tags from the XLIFF contents.
If checked, OmegaT changes the XLIFF target state to “needs-review-translation” instead of “translated”.
Translation memory tools work with textual units called segments. When a translation is entered, the segment containing the source text is stored with its translation in the project memory, and subsequently used to match other source segments in the project.
To specify the type of segmentation, use the Sentence-level segmenting project property.
Segments are by default paragraphs defined by the file format itself.
Not using sentence segmentation on a document is equivalent to using paragraph segmentation. In that case, each paragraph (as defined in the original document format) is displayed as a single segment, and the translator is free to reorganize the sentences within the segment in the translation.
Paragraph segmentation works well with more literary or creative texts, as well as, more generally, with documents for which translation memory matches are not so important.
Sentence segmentation relies on a number of rules (called segmentation rules) that define what constitutes a sentence in the source language. This setting works well with documents where repetitions or similar sentences are common, such as technical or legal documents.
OmegaT first parses the text for paragraph-level segmentation. This process relies only on the structure of the source file to produce segments.
For example, text files may be segmented on line breaks, empty lines, or not at all. Files containing formatting (ODF, HTML, or other documents) are divided at block-level (paragraph) tags. Translatable object attributes in XHTML or HTML files can be extracted as separate "paragraphs".
After dividing the source file into structural units, OmegaT further divides those units into segments.
You can visualize segmentation as the process of moving the cursor along the text, one character at a time, and looking for the position where a break will occur, or where a break will not be allowed.
Each time the cursor moves to the next character, OmegaT checks whether:
the text before the location corresponds to a Before rule,
and the text after the location corresponds to the associated After rule.
If the location matches both rules, it is considered either as a break, or as a non-break, depending on what the rule defined.
The same mechanisms and dialogs are used to define global and local segmentation rules.
By default, segmentation settings are global and shared by all projects.
Use the project property to limit the scope of the segmentation rules to the current project.
You can achieve a similar result by starting OmegaT from the command line. See the Command line launch how-to for details.
If you use local rules, you can still access the global rules, but modifying them will have no effect on your project.
OmegaT provides predefined segmentation rules, and the translator can use regular expressions to modify them. See the Regular expressions appendix for details.
As a reminder, rules work the following way: when a rule matches, OmegaT puts a marker at the match location so that rules that come after ignore that location. That is the reason why exception rules must come before segmentation rules.
If you change the segmentation while translating, you will have to reload the project for the new segmentation to take effect. This will split or merge some previously translated segments, which will therefore no longer be considered translated. Nonetheless, their original translation will still be in the project memory.
Category | Intention | Before | After | Explanation |
---|---|---|---|---|
Exception rule, box unchecked, higher in the list | Do not segment after Ms. | Ms\. | \s | “Ms”, followed by a period, followed by a whitespace character. |
Exception rule, box unchecked, higher in the list | Excel cells with line breaks that do not represent segments | \n | . | Line break, followed by any character. |
Break rule, box checked, lower in the list | Start a new segment after a period followed by a space, tab, or other whitespace. | \. | \s | A period followed by a whitespace character. |
Break rule, box checked, lower in the list | Start a new segment after “。” (Japanese period). | 。 |  | Note that the Pattern After field can be empty. |
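The cursor-based process and the rule order described in this section can be modelled in a few lines of Java, using rules from the table above. This is a simplified sketch, not OmegaT's segmenter: it assumes that the first rule matching at a position decides the outcome, anchors each Before expression at the cursor, and trims the whitespace around segments. The SegmentationSketch class and its Rule type are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of rule-based segmentation with Before and After expressions.
class SegmentationSketch {

    static final class Rule {
        final boolean isBreak; // true: break rule (box checked); false: exception rule
        final Pattern before;  // must match the text that ends at the cursor
        final Pattern after;   // must match the text that starts at the cursor

        Rule(boolean isBreak, String before, String after) {
            this.isBreak = isBreak;
            // \z anchors the Before expression to the end of the region, i.e. the cursor.
            this.before = Pattern.compile("(?:" + before + ")\\z");
            this.after = Pattern.compile(after);
        }
    }

    static List<String> segment(String text, List<Rule> rules) {
        List<String> segments = new ArrayList<>();
        int start = 0;
        for (int cursor = 1; cursor < text.length(); cursor++) { // one character at a time
            for (Rule rule : rules) {                             // rules are tried in list order
                boolean beforeOk = rule.before.matcher(text).region(0, cursor).find();
                boolean afterOk = rule.after.matcher(text).region(cursor, text.length()).lookingAt();
                if (beforeOk && afterOk) {
                    if (rule.isBreak) {
                        addTrimmed(segments, text.substring(start, cursor));
                        start = cursor;
                    }
                    break; // the first matching rule wins; later rules ignore this position
                }
            }
        }
        addTrimmed(segments, text.substring(start));
        return segments;
    }

    // Assumption: whitespace around segment boundaries is not part of the segments.
    private static void addTrimmed(List<String> segments, String segment) {
        String trimmed = segment.trim();
        if (!trimmed.isEmpty()) {
            segments.add(trimmed);
        }
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule(false, "Ms\\.", "\\s"), // exception first: no break after "Ms."
                new Rule(true, "\\.", "\\s"),     // break after a period and whitespace
                new Rule(true, "\u3002", ""));    // break after a Japanese period; After is empty
        System.out.println(segment("Ms. Smith arrived. She sat down.", rules));
    }
}
```

If the exception rule is moved below the break rule, the sketch also breaks after “Ms.”, which illustrates why exception rules must come first.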
This appendix is intended for users interested in exploring a powerful way to boost their productivity. Although they can seem daunting and complex, even the simplest regular expressions (often abbreviated regex or regexp) are extremely useful, not only in OmegaT, but in many other applications you might use on a day-to-day basis, with some variations.
Only the fundamentals most useful to translators are covered. The References section at the end of this appendix provides a few starting points to explore advanced or complex uses beyond the scope of this manual. If you need help for a specific case, you can also ask questions in the various support channels.
Regular expressions use a combination of letters, digits, and symbols (collectively known as characters) to define an expression that represents a specific text pattern.
Here are a few examples.
Any single digit from 0 to 9.
Represents one or more “word characters”, namely the letters of the alphabet, digits, and underscore symbols.
Represents zero or one horizontal whitespace character (this includes regular and non-breaking spaces as well as tabs, but not line break characters, which belong to the “vertical whitespace” category: \v.)
Many OmegaT functions rely on regular expressions or make them available as an option:
Searches include a Regular expressions option that allows you to make extremely powerful searches across your files.
The same option in the Text Replace dialog allows you to apply regular expressions to both the search and replacement text.
Custom tags are tags defined with regular expressions that are handled exactly like native OmegaT tags. See the Custom tags preference for details.
Use the | (OR) character to separate individual tag definitions.
The Flagged text preference allows you to define strings that OmegaT will mark in red by default, and treat as extraneous tags for validation purposes.
Use the | (OR) character to separate individual fragment definitions.
Visual cues can help you verify that your alignment is correct. The Highlight setting allows you to define strings that OmegaT will highlight in the aligned documents.
Use the | (OR) character to separate individual expressions.
Segmentation rules and language patterns are defined with regular expressions. You can modify them freely to improve the segmentation of a document or add additional general rules. See the Segmentation appendix for details.
Segmentation or exception rules define the position in a segment where a split will, or will not, be made. Two regular expressions are required to define that position: a “before” expression to define the text pattern ahead of where the rule should apply, and an “after” expression to define the text pattern following that position.
A language pattern that matches the source language of the project will apply to that project.
Regular expressions are used to find text, including characters that are not visible on the screen or when printed out, such as spaces, tabs, or line breaks. Any given expression either matches, or does not match, a word, phrase, or other sequence of text.
Each and every character in the expression is relevant when determining a match.
A number of characters or combinations of characters have a special meaning in a regular expression.
Regular expressions only match text. They cannot match decorations such as bold, italics, or other stylistic effects.
There are four rules to keep in mind.
The majority of characters in a regular expression simply look for themselves in the text sequence.
For example, the seven letters spelling out the word “example” simply tell the search function to match exactly those letters, in that order. Simply put, the search just looks for the word “example”.
Letters preceded by a backslash (\) take on a special meaning
Unlike a letter on its own, which simply represents itself as noted above, a letter preceded by a \ has a special function in a regular expression.
For example, r is just a normal character, but preceding it with \ to make it \r turns it into a special combination that matches a carriage return character. Similarly, \R matches any line break character.
Only the letters i, j, l, m, o, and y, in both lower- and uppercase, have no special meaning when preceded by a backslash. This manual only describes a small subset of letters that take on a special meaning.
Consult the sites in the References section below for information on combinations not covered here.
Some characters have a special meaning of their own. That special meaning has to be cancelled by another character to match the character itself.
The full list of characters is presented below. One example is the period (.): on its own, it has the special meaning of matching any single character.
To find a normal period, that meaning has to be cancelled using the \, to make the expression \., which just matches a period.
The \ character is a very special character
As stated above, the \ character has the default special meaning of either cancelling or activating the special meaning of other characters. It has no effect if placed before a character with no special meaning (either by default or by addition).
The \ can cancel its own special meaning by doubling up to form \\, which simply matches the backslash character itself.
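OmegaT is a Java application, and when a regular expression is written inside Java source code, each backslash is doubled once more for the Java string literal. The sketch below compares an unescaped period with an escaped one, matches a literal backslash, and uses Pattern.quote, a standard Java method that escapes a whole string. The EscapingDemo class and its helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: escaping special characters, as seen from Java source code.
class EscapingDemo {

    // Return every match of the regular expression in the text, separated by spaces.
    static String findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuilder found = new StringBuilder();
        while (m.find()) {
            if (found.length() > 0) found.append(' ');
            found.append(m.group());
        }
        return found.toString();
    }

    public static void main(String[] args) {
        String text = "0.5 0,1 0x002E C:\\temp";
        // The regex 0.[0-9] (unescaped period) also matches "0,1" and "0x0".
        System.out.println(findAll("0.[0-9]", text));
        // The regex 0\.[0-9] is written "0\\.[0-9]" in Java: only "0.5" matches.
        System.out.println(findAll("0\\.[0-9]", text));
        // The regex \\ (one literal backslash) is written "\\\\" in Java.
        System.out.println(findAll("\\\\", text));
        // Pattern.quote escapes a whole string, so the path is matched literally.
        System.out.println(findAll(Pattern.quote("C:\\temp"), text));
    }
}
```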
The twelve special characters are the backslash \, the caret ^, the dollar sign $, the period (or dot) ., the vertical bar (or pipe symbol) |, the question mark ?, the asterisk (or star) *, the plus sign +, the opening parenthesis (, the closing parenthesis ), the opening square bracket [, and the opening curly brace {.
Each character is briefly described below with examples of regular expressions that rely on the character as well as of text that they do, or do not, match.
\
This character either cancels or activates the special meaning of the following character.
0\.[0-9]
Matches: A number between 0.0 and 0.9, or just the final 0.5 in numbers such as 10.5 or 560.5. The backslash cancels the special meaning of the period.
Does not match: Sequences such as 0,1, 0-3, or the first three characters of 0x002E, which would be matched if the expression was just 0.[0-9].
^
When it is the first character in the expression, the caret character matches the beginning of a line.
When it is the first character in a character class enclosed in brackets, it matches all the characters that are not part of that class.
$
When it is the last character in an expression, the dollar sign matches the end of a line.
^\w+:$
Matches: A line that consists of a single word and ends with a colon: Questions:
Does not match: A line that consists of a single word, but does not end in a colon: Questions?
.
Matches any single character.
c.t
Matches: Any combination of three characters starting with “c” and ending with “t”: “cat”, “cut”, “cot”, or even nonsensical combinations such as “czt” or “cqt”.
Does not match: A “c” and a “t” separated by a line break, because by default the period does not match line break characters.
|
This character functions as an “OR” and matches either of the expressions that precede or follow it.
^An|^The
Matches: “An” or “The” at the beginning of a line.
Does not match: “An” or “The” anywhere else in a line, or the lowercase “an” and “the”.
?
This character specifies that either zero or one instance of the preceding character or expression should be matched.
an?␣ (where “␣” represents a single space)
Matches: Either “a ” or “an ”, such as an article followed by the next word. It will also find the final “an ” of “Can ” in a sentence such as “Can I help you?”, or the final “a ” of “pasta ” in “We had pasta for lunch.”
Does not match: An “a” or “an” that is not followed by a space, for example at the end of a line or before a punctuation mark.
*
This character specifies that zero or more instances of the preceding character or expression should be matched.
run\w*
Matches: The word “run”, as well as “runs”, “runner”, “runway”, “runt” in “grunt” or “brunt”, and any other word or sequence of characters containing “run” followed by zero or more “word characters” (which include digits and the underscore, so the part before the “@” in an email address such as run_123@example.email.org is also a match).
Does not match: The complete phrase in “run-on” or “run'n'gun”, because the hyphen and apostrophe are not included in \w.
+
This character specifies that one or more instances of the preceding character or expression should be matched.
\d+\.\d
Matches: Numbers such as “1.5”, “23.2” or “5235.8” with a single decimal place and any number of digits before the decimal point.
Does not match: The entire value of numbers such as “5,235.8” or “21,571.9”. Only the portion after the thousands separator will be matched.
(
This character starts a group, which is a set of characters treated as a single unit. Groups are numbered, and their contents are stored in memory. They can be reused later in the search expression using a backslash followed by the group number (\1, \2, and so on).
The content of a group can also be used in the replacement text. Use a dollar sign followed by the number of the group defined in the search ($1, $2, and so on).
Parentheses are always used in opening and closing pairs. Trying to use only the opening or closing parenthesis on its own will cause an error.
(\b\w+\b)\h\1\b
Matches: Doubled-up words separated by a space, such as the consecutive “an” in the following sentence: “I bought an an apple.”
Does not match: The “that, that” in the following sentence: “But that, that is just unbelievable”, because the first “that” is followed by both a comma and a space rather than only a space.
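Both uses of a group can be tried in Java, whose regular expressions accept the same \1 back-reference in the search and $1 in the replacement text. A minimal sketch; the DoubleWords class and its methods are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo: capture a word, reuse it in the search and in the replacement.
class DoubleWords {
    // (\b\w+\b) is group 1; \1 requires the same text again after one horizontal space.
    static final Pattern DOUBLED = Pattern.compile("(\\b\\w+\\b)\\h\\1\\b");

    static boolean hasDoubledWord(String text) {
        return DOUBLED.matcher(text).find();
    }

    // Replace each doubled word with a single copy of group 1.
    static String fix(String text) {
        return DOUBLED.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(fix("I bought an an apple."));
        System.out.println(hasDoubledWord("But that, that is just unbelievable"));
    }
}
```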
)
This character closes a group. It is special because it can never be used on its own. It must be preceded by the \ if you need to match the closing parenthesis character itself.
^\d+\)
Matches: The sequence number (including the parenthesis) at the beginning of each line in a numbered list.
Does not match: Sequence numbers that are not at the beginning of a line.
[
This character must be paired with the closing square bracket to enclose a set of individual characters that each represent a valid potential match.
Only the opening bracket is special and needs to be preceded by a backslash to search for the bracket character itself. If you only want to match the closing bracket as itself, you do not need to precede it with a backslash. (You can still add it, but it will have no effect on the expression or the result.)
li[cs]en[cs]e
Matches: The correct “licence” and “license” spellings, as well as the potential “lisence” and “lisense” misspellings.
Does not match: More egregious misspellings such as “licensse” or “lissense”.
{
This character must be paired with the closing curly brace to enclose an exact number, minimum, maximum, or range specifying how many instances of the preceding character or group should be matched.
Only the opening brace is special and needs to be preceded by a backslash to search for the brace character itself. If you only want to match the closing brace as itself, you do not need to precede it with a backslash. (You can still add it, but it will have no effect on the expression or the result.)
\d{4}/\d{1,3}
Matches: Codes such as “1234/5”, “1472/69”, or “9513/842” consisting of four digits, a forward slash, and one to three more digits.
Does not match: Codes such as “123/45”, “1472/6985”, or “95133/15746”. Caution: Although the last two codes above are not matched completely, the expression will return the “1472/698” portion of “1472/6985”, as well as the “5133/157” portion of “95133/15746”.
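The example expressions in this section can be checked with a small Java harness that lists every match found in a sample text. This is a testing sketch only; the sample texts are illustrative, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical harness: list every match of an expression in a sample text.
class SpecialCharacterExamples {

    static List<String> findAll(String regex, String text) {
        // Assumption: MULTILINE, so that ^ and $ match at the start and end of every line.
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(text);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("^\\w+:$", "Questions:\nQuestions?"));
        System.out.println(findAll("c.t", "cat cut cot\nc\nt"));
        System.out.println(findAll("^An|^The", "The cat\nAnother day\nan owl"));
        System.out.println(findAll("an? ", "a cat, an owl. Can I help?"));
        System.out.println(findAll("run\\w*", "runner run-on grunt"));
        System.out.println(findAll("\\d+\\.\\d", "5235.8 and 5,235.8"));
        System.out.println(findAll("li[cs]en[cs]e", "licence license lisense licensse"));
        System.out.println(findAll("\\d{4}/\\d{1,3}", "1472/69 123/45 1472/6985"));
    }
}
```

The “An” found at the start of “Another” is a reminder that ^An does not require a whole word.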
This section presents various types of regular expressions, ranging from the simple to the complex.
Remember that most alphabetic characters preceded by a \ turn into an expression that represents not the character itself, but its associated special meaning.
The simplest regular expressions consist of a single character, or a combination of a \ and a character constituting a unit with a single meaning.
Expression | Match |
---|---|
x | The character “x” itself. Most characters match themselves. |
\t | The tab character, not the letter “t”. |
\n | The newline (line feed) character, not the letter “n”. |
\r | The carriage-return character, not the letter “r”. |
Ordinary OmegaT searches are case insensitive by default: they match both uppercase and lowercase characters, unless you enable case-sensitive searching in the search options. Doing so makes the entire search expression case sensitive.
In contrast, regular expressions are case sensitive by default. This means that a regular expression search for “OmegaT”, for example, will not match “omegat”. However, regular expressions also provide special modifiers to specify case sensitivity within the expression:
(?i)
Makes the part of the expression to the right of the modifier case insensitive.
(?-i)
Makes the part of the expression to the right of the modifier case sensitive.
You can take advantage of this to apply a fine degree of control to case sensitivity in searches. Suppose, for example, that you want to find instances of “OmegaT” and “omegat”, but not “OMEGAT”. You can do so with the following expression: (?i)o(?-i)mega(?i)t, which represents a case insensitive “o” followed by a case sensitive “mega”, followed by a case insensitive “t”.
Regular expressions allow you to create sets of characters, known as classes. Searches will match any of the characters in the set.
Classes are defined by enclosing the desired characters in square brackets, and can be specified either by listing each individual character to include, or by specifying a range of characters. For example, you could create the [£€$] class to find any of those three currency symbols in the text, or [1-3] to find the number 1, 2 or 3.
Inside a class, only the backslash (\), caret (^), closing bracket (]) and hyphen (-) are special. The rest of the twelve characters are normal, and do not have to be preceded by a backslash if you want to search for those characters themselves.
You can search for any of the four class special characters as normal characters by preceding them with a backslash. You can also search for the caret, closing bracket, and hyphen as themselves by placing them at a position that does not trigger their special meaning: anywhere except right after the opening bracket for the caret, immediately after either opening bracket or the caret following it for the closing bracket, and either just after the opening bracket or just before the closing bracket for the hyphen.
Many frequently used sets have a shorthand form consisting of a backslash followed by a letter of the alphabet. For example, \d is a shorthand for [0-9], which matches any digit between 0 and 9. In many cases, the corresponding uppercase letter is used to negate the class: \D matches any character that is not a digit.
The table below provides various additional examples. These classes never represent only the actual letter used to form the shorthand.
Expression | Match |
---|---|
[abc] | The letter “a”, “b”, or “c”. A simple class consists of any number of characters enclosed by square brackets. |
[C-X] | A character in the range of letters from “C” through “X”. A range is defined by the first character in a series, followed by a hyphen, followed by the last character in the series. Any number of ranges can be defined in a single class. |
[^\n\r\t] | Any character except a newline, a carriage return, or a tab. The caret placed immediately after the opening square bracket excludes the rest of the characters in the class. |
\w | A word character, generally defined as [a-zA-Z0-9_]. |
\s | A whitespace character, including the space and tab characters, as well as line breaks. |
\h and \v | Horizontal and vertical whitespace, respectively (generally preferred to \s, which matches both). |
Regular expressions are not limited to alphanumeric characters. They cover the entire Unicode character set. Use Unicode blocks, scripts and categories to specify character classes outside the alphanumeric character range. A few examples are presented in the table below.
See also Unicode Regular Expressions for a thorough review of Unicode regular expressions.
Expression | Match |
---|---|
\p{InGreek} | A character in the Greek block (Unicode block) |
\p{IsHan} | A logogram (Han / kanji / hanja character) found in CJK languages (Unicode script) |
\p{Lu} | An uppercase letter (Unicode category) |
\p{Sc} | A currency symbol, which is also a Unicode category. |
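A minimal Java sketch that tests single characters against the Unicode classes in the table. The characters are built from their code points so the source file stays ASCII-only; the class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo: Unicode blocks, scripts, and categories.
class UnicodeClasses {

    // True if the single character with this code point matches the expression.
    static boolean isMatch(String regex, int codePoint) {
        return Pattern.matches(regex, new String(Character.toChars(codePoint)));
    }

    public static void main(String[] args) {
        System.out.println(isMatch("\\p{InGreek}", 0x03A9)); // GREEK CAPITAL LETTER OMEGA
        System.out.println(isMatch("\\p{IsHan}", 0x6F22));   // a Han (CJK) ideograph
        System.out.println(isMatch("\\p{Lu}", 0x03A9));      // omega is also an uppercase letter
        System.out.println(isMatch("\\p{Sc}", 0x20AC));      // EURO SIGN, a currency symbol
        System.out.println(isMatch("\\p{Sc}", 0x0041));      // LATIN CAPITAL LETTER A is not
    }
}
```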
Some expressions specify a position rather than a character. They indicate where in the text to look for the match, but do not include any characters in that match. The table below lists a few of the more common examples. Consult the sites in the References section for more information.
Expression | Match |
---|---|
^ | The beginning of a line |
$ | The end of a line |
\b | A word boundary |
\B | Not a word boundary |
(?=u) | A character followed by a “u”. For example, |
(?!u) | A character that is not followed by the letter “u”. For example, |
(?<=q) | A character preceded by the letter “q”. For example, |
(?<!q) | A character that is not preceded by the letter “q”. For example, |
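Because these expressions match positions, the text a search highlights contains only the characters around them. The sketch below uses java.util.regex and a hypothetical class, PositionDemo, to pair each lookaround with a letter so the effect is visible on a sample string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: lookarounds and \b match positions, so each match
// only contains the characters written around them.
class PositionDemo {

    // Collects every match of regex in text, in order.
    static List<String> allMatches(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "queen Iraq qat";
        System.out.println(allMatches("q(?=u)", text));    // [q]: only the q of "queen"
        System.out.println(allMatches("q(?!u)", text));    // [q, q]: the q of "Iraq" and of "qat"
        System.out.println(allMatches("(?<=q)\\w", text)); // [u, a]: word characters right after a q
        System.out.println(allMatches("\\bq\\w*", text));  // [queen, qat]: words starting with q
    }
}
```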
This section presents a few examples demonstrating how the various expressions described above can be combined to perform powerful searches in OmegaT.
Expression | Purpose |
---|---|
(\b\w+\b)\h\1\b | Find double words. |
,\h[\h(\w+\.\w+)\w,'ʼ"“”-]+[\.,] | Find clauses that start with a comma followed by a whitespace character, contain one or more words (including words in quotation marks, contractions, and filenames with a file extension), and end either with a comma or period. |
\.\h+$ | Find extra whitespace after the period at the end of a line. |
\h+a\h+[aeiou] | Find words starting with a vowel that come after the article “a” rather than “an”. |
\h+an\h+[^aeiou] | The flip side of the preceding example. Find words starting with a consonant that come after “an” rather than “a”. |
\d{4}([/\.-]\d{1,2}){2} | Find numerical dates in year, month, and day order with the month and day separated by a slash, period, or hyphen, such as: Note: This expression finds number and separator patterns matching possible dates, but does not validate them. It will also find patterns such as “5136/36/71”. |
\.[A-Z] | Find a period followed by an uppercase letter. Useful for finding possible missing spaces between the period and the start of a new sentence. |
\bis\b | Find “is” as a whole word in a sentence, without matching “this”, “isn’t”, or even “Is”. |
[\w\.-]+@[\w\.-]+ | Find an email address. This simple expression may not cover every possible valid email address format. |
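Three of the expressions above can be tried with java.util.regex as follows. The class SearchExamples and the sample sentences are hypothetical; the patterns are copied from the table, with backslashes doubled for Java string literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of three expressions from the table above.
class SearchExamples {

    // Double words: a whole word, horizontal whitespace, then the same word.
    static final Pattern DOUBLE_WORD = Pattern.compile("(\\b\\w+\\b)\\h\\1\\b");
    // "is" as a whole, case-sensitive word.
    static final Pattern WHOLE_IS = Pattern.compile("\\bis\\b");
    // Year, month, and day separated by "/", "." or "-"; values are not validated.
    static final Pattern NUMERIC_DATE = Pattern.compile("\\d{4}([/\\.-]\\d{1,2}){2}");

    // Returns the first match in text, or null if there is none.
    static String first(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(first(DOUBLE_WORD, "Save the the file."));       // the the
        System.out.println(first(WHOLE_IS, "This isn't it. Is it?"));       // null
        System.out.println(first(NUMERIC_DATE, "Released on 2023-07-14.")); // 2023-07-14
        System.out.println(first(NUMERIC_DATE, "Code 5136/36/71"));         // 5136/36/71
    }
}
```

The last line reproduces the caveat from the table: the date pattern matches the shape of a date, not a valid date.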
Although OmegaT does not offer fancy colouring for your regular expressions, you can get a lot of practice by using the Text Search window since OmegaT does colour the matching results.
A few additional resources are presented below.
The Java technical reference is useful as a canonical reference.
The official reference for regular expressions used in Java.
If you want to learn more about using regular expressions, the following two sites have proven very useful.
An online regular expression matcher that lets you enter the text you want to search and the regular expressions you want to test.
One of the most thorough regular expression tutorials and references on the web.
OmegaT does not support either site in any way. If you find other interesting references—in any language—the OmegaT team would love to hear about them.
Glossaries are terminology files stored in the glossary folder.
All terms in a segment with a match in any of the glossaries will be displayed in the Glossaries pane.
Source terms can be multi-word expressions.
There are 2 kinds of glossary files:
Use C + S + G to enter new terms in this glossary. It is called the writable glossary for this reason.
Use to directly access it. You can then open it in a text editor and modify it.
You do not need to prepare the file in advance.
It will be created the first time you add an entry to the glossary.
If you choose to use an existing file as the default glossary, all new entries will be recorded in tab-separated format and saved in UTF-8 by default.
If you want to specify a different encoding, you can do so by adding a “magic” comment that takes the following form:
# -*- coding: <charset> -*-
where <charset> is typically one of the sets listed in the IANA Charset Registry.
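How OmegaT detects this comment is not described here. As an illustration only, the sketch below assumes the comment sits on the file's first line and extracts the charset name with a regular expression; MagicComment and declaredCharset are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: reads the charset named in a magic comment such as
// "# -*- coding: ISO-8859-1 -*-". Assumes the comment is on the first line.
class MagicComment {

    private static final Pattern CODING =
            Pattern.compile("^#\\s*-\\*-\\s*coding:\\s*(\\S+)\\s*-\\*-");

    // Returns the declared charset, or UTF-8 when the line has no magic comment.
    // Charset.forName throws an exception for names Java does not support.
    static Charset declaredCharset(String firstLine) {
        Matcher m = CODING.matcher(firstLine);
        return m.find() ? Charset.forName(m.group(1)) : StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(declaredCharset("# -*- coding: ISO-8859-1 -*-").name()); // ISO-8859-1
        System.out.println(declaredCharset("source\ttarget\tcomment").name());      // UTF-8
    }
}
```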
Reference glossaries are terminology files in a format recognized by OmegaT. Unlike the writable project glossary, you cannot modify them from the OmegaT interface, but you can edit them in a text editor.
Modifications made to any glossary are immediately recognized by OmegaT and displayed in the Glossaries pane.
By default, each project contains a glossary folder to store the writable glossary and any reference glossaries you want to add to the project. See the Glossary files folder project property for details.
All glossaries must be located in the glossary folder. Glossaries located in nested folders are also recognized.
Within that reference glossaries folder, you can create multiple terminology subfolders organized by topic, client, or any other category that suits your workflow.
Use the Glossary files folder project property to set the location of the reference glossaries folder. This folder can be set outside the project, enabling you to use it, or one of its subfolders, in other projects.
The writable project glossary is located in the glossary folder by default and called glossary.txt.
You can change its name and location in the Writable Glossary File dialog, but you must give it a .txt or .utf8 extension, and store it within the glossary folder or in one of its subfolders.
OmegaT glossary files are simple plain text files containing three-column lists, with the source term in the first column, an optional target term in the second column, and an optional comment in the third column.
Glossaries can be tab-separated values (TSV) files, comma-separated values (CSV) files, or TermBase eXchange (TBX 2) files.
A writable glossary created for the project by OmegaT will be a TSV file saved in UTF-8. User-created files that contain only Latin characters may be recognized and treated as ISO-8859-1 if they do not contain non-ASCII characters or other characters interpreted as UTF-8.
The encoding used to read reference glossaries depends on their file extension:
Format | Extension | Encoding |
---|---|---|
TSV | .txt | UTF-8 |
TSV | .utf8 | UTF-8 |
TSV | .tab | OS default encoding |
TSV | .tsv | OS default encoding |
CSV | .csv | UTF-8 |
TBX | .tbx | UTF-8 |
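The three-column layout and the encoding rules above can be combined in a small reader. The sketch below is hypothetical and is not OmegaT's own code: GlossaryFormat, encodingFor, and parseTsvLine are illustrative names. It follows the table for extensions, maps "OS default encoding" to the native.encoding system property (Java 17 and later), and splits a single TSV line into source, target, and comment.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch of the rules above; not OmegaT's own reader.
class GlossaryFormat {

    // Picks the encoding used to read a reference glossary from its extension.
    static Charset encodingFor(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".tab") || name.endsWith(".tsv")) {
            // "OS default encoding": native.encoding exists on Java 17 and later;
            // older runtimes fall back to the JVM default charset.
            String nativeName = System.getProperty("native.encoding");
            return nativeName != null ? Charset.forName(nativeName) : Charset.defaultCharset();
        }
        // Everything else, including .txt, .utf8, .csv and .tbx, is read as UTF-8 here.
        return StandardCharsets.UTF_8;
    }

    // Splits one TSV line into {source, target, comment}; missing optional
    // columns are returned as empty strings.
    static String[] parseTsvLine(String line) {
        String[] entry = Arrays.copyOf(line.split("\t", 3), 3);
        for (int i = 0; i < entry.length; i++) {
            if (entry[i] == null) {
                entry[i] = "";
            }
        }
        return entry;
    }

    public static void main(String[] args) {
        System.out.println(encodingFor("glossary.txt").name()); // UTF-8
        String[] e = parseTsvLine("file\tfichier\tUI term");
        System.out.println(e[0] + " -> " + e[1] + " (" + e[2] + ")"); // file -> fichier (UI term)
    }
}
```

Reading a whole file would open it with the charset from encodingFor and pass each line to parseTsvLine; CSV and TBX parsing are outside the scope of this sketch.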
Bidi control characters are available from . They can be used to:
Insert an invisible character with a strong directionality to force a specific position for a character with weak or neutral directionality.
Create a section of text that flows in the direction opposite that of the segment.
These control characters change directionality but are invisible. Use to show a visual indication of their position.
To change the position of a character with weak or neutral directionality (like punctuation symbols), insert an LRM or RLM character after the character, depending on the directionality of the segment:
Insert an LRM after a weak-directionality character that must run left-to-right in a right-to-left segment (e.g. an English excerpt inside Arabic text).
Insert an RLM after a weak-directionality character that must run right-to-left in a left-to-right segment (e.g. an Arabic excerpt inside English text).
Embeddings can be used to create a longer section of text (containing several words and spaces) that must flow in the direction opposite that of the segment. You can create two kinds of embeddings depending on the directionality of the segment:
To create a left-to-right embedding in a right-to-left segment, insert a left-to-right embedding (LRE) character, type or insert the left-to-right text, and then insert the pop directional formatting (PDF) character.
To create a right-to-left embedding in a left-to-right segment, insert a right-to-left embedding (RLE) character, type or insert the right-to-left text, and then insert the PDF character.
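The controls mentioned above are ordinary Unicode code points: LRM is U+200E, RLM is U+200F, LRE is U+202A, RLE is U+202B, and PDF is U+202C. The hypothetical Java sketch below builds strings containing them and replaces each control with a visible tag, which can help when checking exported text outside OmegaT.

```java
// Hypothetical sketch: builds strings with the invisible bidi controls
// described above and makes them visible for inspection.
class BidiControls {

    static final char LRM = '\u200E'; // left-to-right mark
    static final char RLM = '\u200F'; // right-to-left mark
    static final char LRE = '\u202A'; // left-to-right embedding
    static final char RLE = '\u202B'; // right-to-left embedding
    static final char PDF = '\u202C'; // pop directional formatting

    // Left-to-right run inside a right-to-left segment: LRE, text, PDF.
    static String ltrEmbedding(String text) {
        return LRE + text + PDF;
    }

    // Right-to-left run inside a left-to-right segment: RLE, text, PDF.
    static String rtlEmbedding(String text) {
        return RLE + text + PDF;
    }

    // Replaces each control character with a visible tag such as [LRM].
    static String reveal(String s) {
        return s.replace(String.valueOf(LRM), "[LRM]")
                .replace(String.valueOf(RLM), "[RLM]")
                .replace(String.valueOf(LRE), "[LRE]")
                .replace(String.valueOf(RLE), "[RLE]")
                .replace(String.valueOf(PDF), "[PDF]");
    }

    public static void main(String[] args) {
        // LRM after "!" so the neutral punctuation stays with the English text
        // when it appears in a right-to-left segment.
        System.out.println(reveal("OmegaT!" + LRM));          // OmegaT![LRM]
        System.out.println(reveal(ltrEmbedding("OmegaT 6"))); // [LRE]OmegaT 6[PDF]
    }
}
```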
See the Local post-processing commands project property for project-specific commands.
See the Global post-processing commands preference for global commands.
The command is passed to Java runtime exec as a string with the template values expanded. All the arguments should be quoted, e.g. "${fileName}".
The following template variables are always available. The other items on the template list are environment variables for your system.
Variable name | Value |
---|---|
${projectName} | The name of the project directory |
${projectRoot} | Full path to the project folder |
${sourceRoot} | Full path to the source folder |
${targetRoot} | Full path to the target folder |
${glossaryRoot} | Full path to the glossary folder |
${tmRoot} | Full path to the TM root folder |
${tmAutoRoot} | Full path to the TM auto folder |
${dictRoot} | Full path to the dictionary folder |
${tmOtherLangRoot} | TM Root + tmx2source (See the Bridge two languages how-to for details.) |
${sourceLang} | Source language |
${targetLang} | Target language |
${filePath} | Full path to source file |
${fileShortPath} | Source file name relative to given root |
${fileName} | Full name of source file |
${fileNameOnly} | Name of source file without extension |
${fileExtension} | Extension of source file without a dot |
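The expansion step can be pictured as a simple substitution. The sketch below is a hypothetical illustration, not OmegaT's code: each ${name} found in a map of values is replaced, and unknown names are left as they are. Matcher.quoteReplacement keeps backslashes in Windows paths and dollar signs in values from being read as replacement syntax. The class name and sample paths are made up.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of template expansion; not OmegaT's implementation.
class TemplateExpander {

    // Matches ${name}, capturing the name.
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)\\}");

    static String expand(String template, Map<String, String> values) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Keep the original ${...} text when the name is unknown.
            String replacement = value != null ? value : m.group();
            // quoteReplacement stops "$" and "\" in values from being treated as syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of(
                "targetRoot", "/home/user/project/target",
                "fileName", "manual.odt");
        System.out.println(expand("xdg-open \"${targetRoot}\"", values));
        // xdg-open "/home/user/project/target"
    }
}
```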
In addition to a regular command, you can call a script. Never run post-processing scripts from untrusted sources. For security reasons, local post-processing commands are disabled by default.
Template variables can be used with both regular commands and custom scripts. You may need to use an absolute path for your script. The PATH OmegaT uses may not be the same as the current user’s PATH.
STDOUT and STDERR are written to the omegat.log file. The exit code and STDERR or the last STDOUT will appear on the status bar.
You should use a shebang, e.g. #! /bin/bash or #! /usr/bin/env python3, and the script must be executable. Chaining commands with &&, ||, or pipes (|) will not work here.
Linux: xdg-open ${targetRoot}
macOS: open ${targetRoot}
Windows: Invoke-Item ${targetRoot}
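One way to see why chaining fails: the Javadoc for Runtime.exec(String) states that the command string is split into tokens with new StringTokenizer(command), and the first token is started as the program, with no shell involved. Assuming the expanded command reaches that method as described above (OmegaT may also pre-process quotes, which this sketch does not model), the hypothetical class ExecTokens reproduces the split. To chain commands, put them in a script with a shebang and call the script instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Sketch: reproduces the split that Runtime.exec(String) performs. No shell
// is involved, so "&&", "||" and "|" become ordinary arguments.
class ExecTokens {

    // Splits a command the way new StringTokenizer(command) does: on spaces,
    // tabs, line breaks, and form feeds, with no quote handling.
    static List<String> tokens(String command) {
        List<String> parts = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(command);
        while (st.hasMoreTokens()) {
            parts.add(st.nextToken());
        }
        return parts;
    }

    public static void main(String[] args) {
        // "ls" would be started with the arguments /tmp, &&, echo and done.
        System.out.println(tokens("ls /tmp && echo done")); // [ls, /tmp, &&, echo, done]
    }
}
```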
The OmegaT interface generally does not rely on buttons to give access to its functions. Instead, they are called from the menus or, for the majority of functions, from their assigned default shortcut.
Learning the most frequent shortcuts will not take long once you start working with OmegaT. The shortcuts are indicated next to each menu item, allowing you to learn new shortcuts gradually as you use the software.
You can customize the majority of the shortcuts in OmegaT. See the Customization section for details.
OmegaT runs on any platform that runs a Java Runtime Environment (Windows, macOS, and Linux being the most mainstream). The modifier keys that form the shortcuts vary slightly between platforms. To make reading easier, we have adopted the following convention for modifier keys:
Linux/Windows | Key identifier | macOS |
---|---|---|
Shift | S | shift or ⇧ |
Ctrl or Control | C | command or ⌘ |
Alt | A | alt / option or ⌥ |
Ctrl | control or ⌃ |
The key identifiers above let us avoid listing multiple notations for every shortcut. For example, the same shortcut is written:
On Windows and Linux: Ctrl + Shift + N
On macOS: Shift + Command + N
In this manual: C + S + N
OmegaT assigns shortcuts to most of the functions available in the , , and menus and to a number of functions in the Editor pane. You can also add or modify the shortcuts for most of the functions.
To do so, you have to put the appropriate shortcut definition file in your OmegaT configuration folder. See the Configuration Folder appendix for details.
There are two shortcut definition files.
The shortcut definition file for the menus and a few other items.
The shortcut definition file for the editor.
OmegaT must be restarted after a shortcut definition file has been modified for the new shortcuts to take effect.
You can copy the default OmegaT shortcut files from the OmegaT development site on SourceForge to your configuration folder and modify them to suit your needs:
The macOS files must be renamed MainMenuShortcuts.properties and EditorShortcuts.properties for OmegaT to recognize them.
The next section describes the syntax used in the shortcut definition files, and provides an example modification.
The basic syntax of the shortcut definition files is simply:
function code=shortcut
Use the tables in the Lists of functions and codes section below to find the values for function code.
The shortcut represents the key combination pressed by the user. It takes the following form:
0 or more modifier
followed by 0 or 1 event
followed by 1 key
where modifier can be shift, ctrl, meta, alt, or altGraph.
meta refers to the key with the Windows logo on most keyboards for Windows or Linux systems, and to the command key on macOS.
altGraph refers to the Alt key to the right of the spacebar on keyboards with two Alt keys.
event can be typed, pressed, or released, and key can be any key available on your keyboard. You can refer to the table presenting the different editor shortcuts to find the values for keys such as Home, Page Up, or the arrow keys.
Empty lines and comments can be added to organize the list and make it easier to read. A comment line starts with a #, and everything after that is ignored by the application.
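The form described above matches the string syntax accepted by javax.swing.KeyStroke.getKeyStroke(String) in the Java standard library, which returns null for a malformed string. Assuming OmegaT reads the values with that parser (an assumption; this manual does not say so), feeding a value to it is a quick way to catch a typo before restarting OmegaT. ShortcutCheck is a hypothetical name. Key names are the uppercase KeyEvent names, such as W or PAGE_UP.

```java
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import javax.swing.KeyStroke;

// Hypothetical checker: parses a shortcut value with Swing's KeyStroke
// string parser, which returns null for malformed input.
class ShortcutCheck {

    static KeyStroke parse(String shortcut) {
        return KeyStroke.getKeyStroke(shortcut);
    }

    public static void main(String[] args) {
        KeyStroke close = parse("ctrl shift W");
        System.out.println(close.getKeyCode() == KeyEvent.VK_W);                     // true
        System.out.println((close.getModifiers() & InputEvent.CTRL_DOWN_MASK) != 0); // true
        System.out.println(parse("ctrl PAGE_UP") != null);                           // true
        System.out.println(parse("ctrl Page Up"));                                   // null: not a valid key name
    }
}
```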
The easiest way to modify the shortcuts is to download copies of the default files to your configuration folder, as noted above, and make the changes you want there.
The default shortcut for closing a project is defined on Windows and Linux as:
projectCloseMenuItem=ctrl shift W
and on macOS as:
projectCloseMenuItem=meta shift W
However, you may want to remove the S key from the shortcut to make it only Ctrl + W (or Command + W on macOS) to match the shortcut you use in other applications.
To do so, modify the MainMenuShortcuts.properties file as follows for Windows or Linux:
projectCloseMenuItem=ctrl W
or as follows for macOS:
projectCloseMenuItem=meta W
If your language pair calls for the frequent use of alternative translations, you may want to assign a shortcut to that function since it does not have one by default.
The steps below demonstrate how to assign the Alt + X shortcut to the menu item.
Open the MainMenuShortcuts.properties file you have copied to your configuration folder in a text editor.
As shown in the Edit menu table below, the function code for the function is editMultipleAlternate.
Searching for that code in the file will bring you to the following line:
# editMultipleAlternate=
The line is currently a comment. Delete the # at the beginning of the line so OmegaT will recognize the shortcut, and add alt X after the = sign at the end of the line:
editMultipleAlternate=alt X
Save and close the file. The next time you start OmegaT, your new shortcut should be active and displayed next to the name of the function in the menu.
Save the file after you have finished making your changes. If OmegaT is open, you will have to restart it for your changes to take effect.
Your modified or added shortcuts should now be displayed next to the menu items you have changed. They will now be available in OmegaT as long as there are no conflicts with other functions or with system-wide shortcuts.
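The shortcut files use the .properties extension and the key=value layout of Java properties files, where lines starting with # are comments. Assuming they follow standard properties syntax, java.util.Properties can confirm that an edited line is read and a commented one is not. The class ShortcutFile is hypothetical; the sketch loads text from a string to stay self-contained, whereas a real check would open the file from the configuration folder.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical check: loads shortcut definitions with java.util.Properties,
// which skips blank lines and lines starting with "#".
class ShortcutFile {

    static Properties load(String contents) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(contents));
        } catch (IOException e) {
            // StringReader reads from memory, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        String before = "# editMultipleAlternate=\nprojectCloseMenuItem=ctrl shift W\n";
        String after = "editMultipleAlternate=alt X\nprojectCloseMenuItem=ctrl W\n";

        System.out.println(load(before).getProperty("editMultipleAlternate")); // null: still a comment
        System.out.println(load(after).getProperty("editMultipleAlternate"));  // alt X
        System.out.println(load(after).getProperty("projectCloseMenuItem"));   // ctrl W
    }
}
```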
The next section presents tables with the function codes and corresponding default shortcut for each menu or editor function in OmegaT.
The shortcuts that can be modified in the EditorShortcuts.properties file, along with their default values, are presented in the table below.
Function | Function code | Windows/Linux | macOS |
---|---|---|---|
Open Context Menu | editorContextMenu | CONTEXT_MENU | shift ESCAPE |
Go to Next Segment | editorNextSegment | TAB | |
Go to Previous Segment | editorPrevSegment | shift TAB | |
Go to Next Segment (not TAB) | editorNextSegmentNotTab | ENTER | |
Go to Previous Segment (not TAB) | editorPrevSegmentNotTab | ctrl ENTER | meta ENTER |
Insert Linebreak | editorInsertLineBreak | shift ENTER | |
Select All | editorSelectAll | ctrl A | meta A |
Delete Previous Token | editorDeletePrevToken | ctrl BACK_SPACE | alt BACK_SPACE |
Delete Next Token | editorDeleteNextToken | ctrl DELETE | alt DELETE |
Go to First Segment | editorFirstSegment | ctrl PAGE_UP | meta PAGE_UP |
Go to Last Segment | editorLastSegment | ctrl PAGE_DOWN | meta PAGE_DOWN |
Skip Next Token | editorSkipNextToken | ctrl RIGHT | alt RIGHT |
Skip Previous Token | editorSkipPrevToken | ctrl LEFT | alt LEFT |
Skip Next Token with Selection | editorSkipNextTokenWithSelection | ctrl shift RIGHT | alt shift RIGHT |
Skip Previous Token with Selection | editorSkipPrevTokenWithSelection | ctrl shift LEFT | alt shift LEFT |
Toggle Cursor Lock | editorToggleCursorLock | F2 | |
Toggle Overtype | editorToggleOvertype | INSERT | F3 |
Function | Function code | Windows/Linux | macOS |
---|---|---|---|
Open Autocompleter | autocompleterTrigger | ctrl SPACE | ESCAPE |
Open Autocompleter Next View | autocompleterNextView | ctrl SPACE | ctrl DOWN |
Open Autocompleter Previous View | autocompleterPrevView | ctrl shift SPACE | ctrl UP |
Confirm and Close Autocompleter | autocompleterConfirmAndClose | ENTER | |
Confirm Autocompleter Without Closing | autocompleterConfirmWithoutClose | INSERT | |
Close Autocompleter | autocompleterClose | ESCAPE | |
Go Up The List | autocompleterListUp | UP | |
Go Down The List | autocompleterListDown | DOWN | |
Go Up One Page | autocompleterListPageUp | PAGE_UP | |
Go Down One Page | autocompleterListPageDown | PAGE_DOWN | |
Go Up in Table | autocompleterTableUp | UP | |
Go Down in Table | autocompleterTableDown | DOWN | |
Go Left in Table | autocompleterTableLeft | LEFT | |
Go Right in Table | autocompleterTableRight | RIGHT | |
Go Up One Page in Table | autocompleterTablePageUp | PAGE_UP | |
Go Down One Page in Table | autocompleterTablePageDown | PAGE_DOWN | |
Go to First Table | autocompleterTableFirst | ctrl HOME | meta HOME |
Go to Last Table | autocompleterTableLast | ctrl END | meta END |
Go to First Row in Table | autocompleterTableFirstInRow | HOME | |
Go to Last Row in Table | autocompleterTableLastInRow | END | |
The configuration folder stores the majority of the OmegaT options and preferences for the user.
The location of the default configuration folder varies by system (the ~ character represents your home folder):
Linux: ~/.omegat
macOS: ~/Library/Preferences/OmegaT
Windows: ~\AppData\Roaming\OmegaT
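A program that needs to locate the default folder, such as a backup script, could resolve it from the operating system name and the home folder. The sketch below is a hypothetical Java helper that only mirrors the list above. It does not handle a folder specified on the command line, and OmegaT's own lookup may differ.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper: resolves the default configuration folder listed
// above from the OS name and the user's home folder.
class ConfigFolder {

    static Path defaultFor(String osName, String home) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            return Paths.get(home, "AppData", "Roaming", "OmegaT");
        }
        if (os.startsWith("mac")) {
            return Paths.get(home, "Library", "Preferences", "OmegaT");
        }
        // Linux and other Unix-like systems.
        return Paths.get(home, ".omegat");
    }

    public static void main(String[] args) {
        System.out.println(defaultFor(System.getProperty("os.name"), System.getProperty("user.home")));
    }
}
```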
You can specify a configuration folder other than the default when you start OmegaT from the command line. See the Command line launch how-to for details.
Modified preferences are stored in the configuration folder used by the project. If you do not use the default configuration folder, all modifications made in the preferences will be stored in the specified configuration folder and will not appear when you resume work with the default configuration folder.
This file includes a number of important user preferences.
Some preferences do not have an equivalent in the user interface. They must be modified manually.
To prevent the Source Files list window from automatically opening when a project is loaded, find <project_files_show_on_load> and replace true with false:
<project_files_show_on_load>false</project_files_show_on_load>
This is currently the only preference that requires manual modification.
This file describes the overall OmegaT layout.
This folder contains a number of log files. The most current is OmegaT.log.
These files record various internal state and program event messages generated while OmegaT is running. If OmegaT behaves erratically, include the current log file, or the relevant part of it, in your report.
If the applicable functions are used, this folder can contain up to three text files:
This file stores the currently selected text when C + S + C is used. The text in the file is replaced each time this function is called.
This file contains the original text of the current segment when the Export the segment to a text file preference is enabled. The text in the file is replaced each time a new segment is entered.
This file contains the translated text from the current segment when the Export the segment to a text file preference is enabled. The text in the file is replaced each time a new segment is entered.
These three files provide a simple way to access some OmegaT content and process it with local programs such as shell scripts.
This parameter file contains customized editor shortcuts. See the Customization appendix for details.
This parameter file contains customized user interface shortcuts. See the Customization appendix for details.
This parameter file contains customized file filters. See the Global File Filters preferences for details.
This parameter file contains customized external search parameters. See the Global External Searches preferences for details.
This parameter file contains customized autotext parameters. See the Auto-Completion preferences for details.
This file contains the login information for your team project repositories.
The file contents are not encrypted.
See the Set up a team project how-to for details.
This parameter file contains customized segmentation parameters. See the Global Segmentation Rules preferences for details.
This folder provides the standard location for manually installed OmegaT extension plugins. See the Plugins preference for details.
It is also possible to install plugins in the application plugins/ folder.
This folder contains your spelling dictionaries. See the Spellchecker preferences for details.
The application folder contains the OmegaT.jar application and a number of other important files.
The application folder location depends on your platform and on the way you have installed OmegaT. The recommended or default locations are the following:
C:\Program Files\OmegaT\
/opt/omegat/
/Applications/OmegaT.app/Contents/Java/
/opt/omegat/
OmegaT's distribution license. The GPL version 3.
OmegaT's executable. Used when launching OmegaT from the command line. See the Command line launch chapter for details.
The list of improvements and bug fixes. Check it if you need information on OmegaT's evolution.
The documentation distribution license. The GPL version 3.
The index to the multilingual user manual.
A link to the support page.
Simple installation and running instructions.
The documentation folder.
The list of individual contributors.
Each translated user manual comes in a different language folder.
The index to the multilingual user manual.
The list of libraries used by OmegaT.
The folder where images used by OmegaT are stored.
The folder where the libraries used by OmegaT are stored.
The folder where you can install external plugins. The preferred location for manual plugin installs is the plugins/ folder located in the configuration folder. See the Plugins preferences for details.
The folder where distributed scripts are located. See the Scripting window for details.