OmegaT 4.3.1 - User's Guide

Appendix B. Glossaries

Glossaries are terminology files. They can be created and updated manually or imported from other projects.

A given project can have any number of reference glossaries, but only one glossary, the project default glossary, will be directly writable from the OmegaT user interface.

Regardless of the number of glossaries in a project, any term in a segment that has a match in a glossary will be displayed in the Glossary pane.

Default glossary

The default writable glossary is located in the glossary project folder and is called glossary.txt.

Its name and location can be changed in the project properties dialog, but its extension must be .txt or .utf8 and it must remain inside the glossary project folder.

The file does not need to exist when it is set; it will be created when the first glossary entry is added. If the file already exists, no attempt is made to verify its format or character set: new entries are always added in tab-separated format, and the file is saved in UTF-8 encoding.

Usage

To use an existing glossary, simply place it in the glossary folder after creating the project. OmegaT automatically detects glossary files in this folder when a project is opened.

To add a new term to the writable glossary, use Edit > Create Glossary Entry ( Ctrl+Shift+G ). Newly added terms are recognized immediately. To add new terms to reference glossaries, edit them in an external text editor. Newly added terms are recognized as soon as the changes have been saved.

The source term can be a multi-word term.

The glossary function uses stemming to find matches. Deactivate Use stemming for glossary entries in the OmegaT global preferences to only find exact matches for a term.

The source term is displayed before the " = " sign and the target terms after it. Each comment is numbered and displayed on its own line. Terms from the project writable glossary are displayed in bold. Terms from the reference glossaries are displayed in normal type.

To display the Autocompletion contextual menu for the glossary matches, press the OS-dependent key ( Escape on macOS, Ctrl+Space on other platforms).

To underline matching terms in the source part of the segment, use View > Mark Glossary Matches. Right-click an underlined term and select a target term to insert it at the cursor location in the target part of the segment.

File format

OmegaT glossary files are simple plain text three-column lists with the source term in the first column, an optional target term in the second column and an optional comment in the third column.

Plain text glossaries can be "tab separated values" (TSV) files or "comma separated values" (CSV) files. A third possible format is the "TBX" (TermBase eXchange) ISO standard.

The project default writable glossary is always a TSV file saved in UTF-8 encoding.
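As an illustration of the layout described above, the following Python sketch writes a tab-separated glossary in UTF-8. It is not part of OmegaT: the function names format_glossary_entry and write_glossary and the sample terms are hypothetical, and always writing all three columns (leaving unused ones empty) is an assumption of this sketch.

```python
def format_glossary_entry(source, target="", comment=""):
    """Return one glossary line: source, target and comment separated by tabs."""
    for field in (source, target, comment):
        # A tab or newline inside a field would break the three-column layout.
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return "\t".join((source, target, comment))


def write_glossary(path, entries):
    """Write (source[, target[, comment]]) tuples as a UTF-8 TSV file."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for entry in entries:
            f.write(format_glossary_entry(*entry) + "\n")


if __name__ == "__main__":
    # Hypothetical sample entries; the second one has no comment.
    write_glossary("glossary.txt", [
        ("hard disk", "disque dur", "IT term"),
        ("mouse", "souris"),
    ])
```

A file produced this way uses the .txt extension, so according to Table B.1 it would be read as UTF-8.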

Glossaries are read in an encoding that depends on the file extension:

Table B.1. Format, extensions and expected encoding
Format   Extension   Encoding
TSV      .txt        UTF-8
TSV      .utf8       UTF-8
TSV      .tab        OS default encoding
TSV      .tsv        OS default encoding
CSV      .csv        UTF-8
TBX      .tbx        UTF-8
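The table can also be expressed as a small lookup, which may help when checking files with a script. This is only a sketch under stated assumptions: the names EXPECTED and expected_format are hypothetical, matching the extension case-insensitively is an assumption, and Python's locale.getpreferredencoding is used here as an approximation of the "OS default encoding" (OmegaT itself may determine it differently).

```python
import locale
from pathlib import Path

# Extension -> (format, encoding), following Table B.1.
# None stands for "OS default encoding".
EXPECTED = {
    ".txt": ("TSV", "utf-8"),
    ".utf8": ("TSV", "utf-8"),
    ".tab": ("TSV", None),
    ".tsv": ("TSV", None),
    ".csv": ("CSV", "utf-8"),
    ".tbx": ("TBX", "utf-8"),
}


def expected_format(path):
    """Return (format, encoding) for a glossary file name, or None if the
    extension is not listed in the table."""
    suffix = Path(path).suffix.lower()  # assumption: case-insensitive match
    if suffix not in EXPECTED:
        return None
    fmt, encoding = EXPECTED[suffix]
    if encoding is None:
        # Approximation of the OS default encoding.
        encoding = locale.getpreferredencoding(False)
    return fmt, encoding
```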

Glossaries must be located in the glossary project folder. Glossaries located in nested folders are also recognized.
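To see which files in a project would be candidates for detection, the glossary folder can be walked recursively. The sketch below is an illustration only: find_glossaries and GLOSSARY_EXTENSIONS are hypothetical names, the extension list is taken from Table B.1, and the actual detection logic in OmegaT may differ.

```python
from pathlib import Path

# Extensions listed in Table B.1.
GLOSSARY_EXTENSIONS = {".txt", ".utf8", ".tab", ".tsv", ".csv", ".tbx"}


def find_glossaries(glossary_dir):
    """Return all files under glossary_dir, including nested folders,
    whose extension is one of the glossary extensions."""
    root = Path(glossary_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in GLOSSARY_EXTENSIONS
    )
```

For example, find_glossaries("myproject/glossary") would list glossary.txt together with any glossary files in subfolders; the path myproject/glossary is a placeholder for a real project folder.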

Common glossary problems

Problem: No glossary terms are displayed - possible causes:

  • No glossary file found in the glossary folder.

  • The glossary file is empty.

  • The items are not separated with a TAB character.

  • The glossary file does not have a recognized extension (.txt, .utf8, .tab, .tsv, .csv or .tbx).

  • There is no EXACT match between the glossary entry and the source text in your document - for instance plurals.

  • The glossary file does not have the correct encoding.

  • There are no terms in the current segment which match any terms in the glossary.

  • One or more of the above problems may have been fixed, but the project has not been reloaded.

Problem: In the Glossary pane, some characters are not displayed properly, but the same characters are displayed properly in the Editing pane - possible cause:

  • The extension and the file encoding do not match.
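One way to check for such a mismatch outside OmegaT is to test whether the file's bytes are valid UTF-8, which is the encoding expected for .txt, .utf8, .csv and .tbx files. The following is a minimal sketch; the function names are hypothetical. Note that a successful decode does not prove the file was saved as UTF-8 (plain ASCII text also passes), whereas a failure does show that it was not.

```python
def decodes_as_utf8(data):
    """Return True if the given bytes are valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_utf8(path):
    """Read a file in binary mode and check whether it is valid UTF-8."""
    with open(path, "rb") as f:
        return decodes_as_utf8(f.read())
```

If file_is_utf8 returns False for a glossary with a .txt or .utf8 extension, re-saving the file as UTF-8 in a text editor, or renaming it to .tab if it uses the OS default encoding, should make the extension and encoding agree.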