The GNU gettext toolset helps programmers and translators produce, update and use translation files, mainly PO files, which are textual, editable files. This chapter stresses the format of PO files and contains a PO mode starter. The description of PO mode is spread throughout this manual instead of being concentrated in one place; here we present only the basics.
2.1 Completing GNU gettext Installation
Once you have received, unpacked, configured and compiled the GNU gettext distribution, the `make install´ command puts in place the programs xgettext, msgfmt, gettext, and msgmerge, as well as their available message catalogs. To top off a comfortable installation, you might also want to make the PO mode available to your Emacs users.
During the installation of the PO mode, you might want to modify your file `.emacs´, once and for all, so it contains a few lines looking like:
    (setq auto-mode-alist
          (cons '("\\.po\\'\\|\\.po\\." . po-mode) auto-mode-alist))
    (autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t)
Later, whenever you edit some `.po´ file, or any file having the string `.po.´ within its name, Emacs loads `po-mode.elc´ (or `po-mode.el´) as needed, and automatically activates PO mode commands for the associated buffer. The string PO appears in the mode line for any buffer for which PO mode is active. Many PO files may be active at once in a single Emacs session.
If you are using Emacs version 20 or newer, and have already installed the appropriate international fonts on your system, you may also tell Emacs how to determine automatically the coding system of every PO file. This will often (but not always) cause the necessary fonts to be loaded and used for displaying the translations on your Emacs screen. For this to happen, add the lines:
    (modify-coding-system-alist 'file "\\.po\\'\\|\\.po\\."
                                'po-find-file-coding-system)
    (autoload 'po-find-file-coding-system "po-mode")
to your `.emacs´ file. If, with this, you still see boxes instead of international characters, try a different font set (via Shift Mouse button 1).
A PO file is made up of many entries, each entry holding the relation between an original untranslated string and its corresponding translation. All entries in a given PO file usually pertain to a single project, and all translations are expressed in a single target language. One PO file entry has the following schematic structure:
    white-space
    #  translator-comments
    #. automatic-comments
    #: reference...
    #, flag...
    msgid untranslated-string
    msgstr translated-string
The general structure of a PO file should be well understood by the translator. When using PO mode, very little has to be known about the format details, as PO mode takes care of them for her.
A simple entry can look like this:
    #: lib/error.c:116
    msgid "Unknown system error"
    msgstr "Error desconegut del sistema"
Entries begin with some optional white space. Usually, when generated through GNU gettext tools, there is exactly one blank line between entries. Then comments follow, on lines all starting with the character #. There are two kinds of comments: those which have some white space immediately following the #, which are created and maintained exclusively by the translator, and those which have some non-white character just after the #, which are created and maintained automatically by GNU gettext tools. All comments, of either kind, are optional.
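The distinction between the two kinds of comments depends only on the character that follows the #. The following is a minimal Python sketch of that rule, not part of GNU gettext; the function name classify_comment and the returned labels are hypothetical, and treating a bare # line as a translator comment is an assumption.

```python
def classify_comment(line: str) -> str:
    """Return the kind of a PO comment line, or 'not-a-comment'."""
    if not line.startswith("#"):
        return "not-a-comment"
    # Character right after '#'; empty if the line is just '#'.
    marker = line[1:2]
    # Assumption: a bare '#' counts as an (empty) translator comment.
    if marker == "" or marker.isspace():
        return "translator"
    # Markers taken from the schematic entry structure shown earlier.
    kinds = {".": "automatic", ":": "reference", ",": "flag"}
    return kinds.get(marker, "other-automatic")


if __name__ == "__main__":
    for sample in ("# checked by the translator",
                   "#. note extracted from the sources",
                   "#: lib/error.c:116",
                   "#, fuzzy, c-format"):
        print(classify_comment(sample), "<-", sample)
```

Any comment whose # is followed by a non-white character other than the three listed markers is reported as other-automatic, since such comments are still managed by tools rather than by the translator.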
After white space and comments, entries show two strings, namely first the untranslated string as it appears in the original program sources, and then, the translation of this string. The original string is introduced by the keyword msgid, and the translation, by msgstr. The two strings, untranslated and translated, are quoted in various ways in the PO file, using " delimiters and \ escapes, but the translator does not really have to pay attention to the precise quoting format, as PO mode fully takes care of quoting for her.
The msgid strings, as well as automatic comments, are produced and managed by other GNU gettext tools, and PO mode does not provide means for the translator to alter these. The most she can do is delete them, and only by deleting the whole entry. On the other hand, the msgstr string, as well as translator comments, are really meant for the translator, and PO mode gives her the full control she needs.
The comment lines beginning with #, are special because they are not completely ignored by the programs as comments generally are. The comma separated list of flags is used by the msgfmt program to give the user some better diagnostic messages. Currently there are two forms of flags defined:
fuzzy
    This flag can be generated by the msgmerge program or it can be inserted by the translator herself. It shows that the msgstr string might not be a correct translation (anymore). Only the translator can judge if the translation requires further modification, or is acceptable as is. Once satisfied with the translation, she then removes this fuzzy attribute. The msgmerge program inserts this flag when it has combined the msgid and msgstr entries after fuzzy search only. See section 6.3 Fuzzy Entries.
c-format, no-c-format
    Only the xgettext program adds these flags. In an automated PO file processing system as proposed here, user changes to them would be thrown away again as soon as the xgettext program generates a new template file. The c-format flag tells that the untranslated string and the translation are supposed to be C format strings. The no-c-format flag tells that they are not C format strings, even though the untranslated string happens to look like a C format string (with `%´ directives). When the c-format flag is given for a string, msgfmt performs some more tests to check the validity of the translation. See section 8.1 Invoking the msgfmt Program, section 3.5 Special Comments preceding Keywords and section 13.3.1 C Format Strings.
python-format, no-python-format
lisp-format, no-lisp-format
elisp-format, no-elisp-format
librep-format, no-librep-format
smalltalk-format, no-smalltalk-format
java-format, no-java-format
awk-format, no-awk-format
object-pascal-format, no-object-pascal-format
ycp-format, no-ycp-format
tcl-format, no-tcl-format
php-format, no-php-format
    Likewise for the format strings of the other languages named in these flags.
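Since a #, comment is simply a comma separated list of flag names, reading it is straightforward. The sketch below, in Python, illustrates one way to do so under that assumption; parse_flags, needs_review and is_c_format are hypothetical helper names, not functions of any GNU gettext tool.

```python
from typing import List


def parse_flags(line: str) -> List[str]:
    """Split a '#,' comment line into its individual flag names."""
    if not line.startswith("#,"):
        raise ValueError("not a flag comment: %r" % line)
    # Drop the '#,' marker, split on commas, ignore surrounding blanks.
    return [flag.strip() for flag in line[2:].split(",") if flag.strip()]


def needs_review(flags: List[str]) -> bool:
    """A fuzzy entry might not be a correct translation (anymore)."""
    return "fuzzy" in flags


def is_c_format(flags: List[str]) -> bool:
    """True when the strings are declared to be C format strings."""
    return "c-format" in flags


if __name__ == "__main__":
    flags = parse_flags("#, fuzzy, c-format")
    print(flags, needs_review(flags), is_c_format(flags))
```

Note that no-c-format is a distinct flag name, so the membership test for c-format does not match it.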
A different kind of entry is used for translations which involve plural forms.
    white-space
    #  translator-comments
    #. automatic-comments
    #: reference...
    #, flag...
    msgid untranslated-string-singular
    msgid_plural untranslated-string-plural
    msgstr[0] translated-string-case-0
    ...
    msgstr[N] translated-string-case-n
Such an entry can look like this:
    #: src/msgcmp.c:338 src/po-lex.c:699
    #, c-format
    msgid "found %d fatal error"
    msgid_plural "found %d fatal errors"
    msgstr[0] "s'ha trobat %d error fatal"
    msgstr[1] "s'han trobat %d errors fatals"
It happens that some lines, usually whitespace or comments, follow the very last entry of a PO file. Such lines are not part of any entry, and PO mode is unable to take action on those lines. By using the PO mode function M-x po-normalize, the translator may get rid of those spurious lines. See section 2.5 Normalizing Strings in Entries.
The remainder of this section may be safely skipped by those using PO mode, yet it may be interesting for everybody to have a better idea of the precise format of a PO file. On the other hand, those not having Emacs handy should carefully continue reading on.
Each of untranslated-string and translated-string respects the C syntax for a character string, including the surrounding quotes and embedded backslashed escape sequences. When the time comes to write multi-line strings, one should not use escaped newlines. Instead, a closing quote should follow the last character on the line to be continued, and an opening quote should resume the string at the beginning of the following PO file line. For example:
    msgid ""
    "Here is an example of how one might continue a very long string\n"
    "for the common case the string represents multi-line output.\n"
In this example, the empty string is used on the first line, to allow better alignment of the H from the word `Here´ over the f from the word `for´. In this example, the msgid keyword is followed by three strings, which are meant to be concatenated. Concatenating the empty string does not change the resulting overall string, but it is a way for us to comply with the necessity of msgid to be followed by a string on the same line, while keeping the multi-line presentation left-justified, as we find this to be a cleaner disposition. The empty string could have been omitted, but only if the string starting with `Here´ was promoted on the first line, right after msgid. It was not really necessary either to switch between the two last quoted strings immediately after the newline `\n´; the switch could have occurred after any other character. We just did it this way because it is neater.
One should carefully distinguish between end of lines marked as `\n´ inside quotes, which are part of the represented string, and end of lines in the PO file itself, outside string quotes, which have no effect on the represented string.
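To make the concatenation rule concrete, here is a small Python sketch that rebuilds the represented string from the PO file lines of one msgid or msgstr. It is only an illustration: join_po_strings is a hypothetical name, and decoding with ast.literal_eval assumes that the escapes used are among those which Python string literals share with C (such as \n, \t, \" and \\).

```python
import ast


def join_po_strings(lines):
    """Concatenate the quoted strings that follow a msgid/msgstr keyword.

    Line ends of the PO file itself are ignored; only escapes such as
    \\n inside the quotes contribute newlines to the result.
    """
    parts = []
    for raw in lines:
        text = raw.strip()
        # Drop a leading keyword such as msgid or msgstr, if present.
        if not text.startswith('"'):
            _, _, text = text.partition(" ")
            text = text.strip()
        # Assumption: the quoted text is also a valid Python literal.
        parts.append(ast.literal_eval(text))
    return "".join(parts)


if __name__ == "__main__":
    entry = [
        'msgid ""',
        '"Here is an example of how one might continue a very long string\\n"',
        '"for the common case the string represents multi-line output.\\n"',
    ]
    print(repr(join_po_strings(entry)))
```

The three PO file lines yield a single string containing exactly two newlines, both coming from the `\n´ escapes rather than from the line ends of the file.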
Outside strings, white lines and comments may be used freely. Comments start at the beginning of a line with `#´ and extend until the end of the PO file line. Comments written by translators should have the initial `#´ immediately followed by some white space. If the `#´ is not immediately followed by white space, this comment is most likely generated and managed by specialized GNU tools, and might disappear or be replaced unexpectedly when the PO file is given to msgmerge.
After setting up Emacs with something similar to the lines in section 2.1 Completing GNU gettext Installation, PO mode is activated for a window when Emacs finds a PO file in that window. This makes the window read-only and establishes a po-mode-map. PO mode is a genuine Emacs mode, and is not derived from text mode in any way. Functions found on po-mode-hook, if any, will be executed.
When PO mode is active in a window, the letters `PO´ appear in the mode line for that window. The mode line also displays how many entries of each kind are held in the PO file. For example, the string `132t+3f+10u+2o´ would tell the translator that the PO file contains 132 translated entries (see section 6.2 Translated Entries), 3 fuzzy entries (see section 6.3 Fuzzy Entries), 10 untranslated entries (see section 6.4 Untranslated Entries) and 2 obsolete entries (see section 6.5 Obsolete Entries). Items with a zero coefficient are not shown. So, in this example, if the fuzzy entries were unfuzzied, the untranslated entries were translated and the obsolete entries were deleted, the mode line would merely display `145t´ for the counters.
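The counter string can be read mechanically. The Python sketch below decodes a string such as `132t+3f+10u+2o´ under the assumption that it always consists of number and letter items joined by +; parse_counters and the KINDS table are hypothetical names used for illustration only.

```python
import re

# Letter suffixes as described above for the mode line counters.
KINDS = {"t": "translated", "f": "fuzzy", "u": "untranslated", "o": "obsolete"}


def parse_counters(text):
    """Decode a mode line counter string into a dictionary of counts."""
    counts = {name: 0 for name in KINDS.values()}
    for item in text.split("+"):
        match = re.fullmatch(r"(\d+)([tfuo])", item)
        if match is None:
            raise ValueError("unexpected counter item: %r" % item)
        counts[KINDS[match.group(2)]] = int(match.group(1))
    return counts


if __name__ == "__main__":
    before = parse_counters("132t+3f+10u+2o")
    print(before)
    # Once fuzzy and untranslated entries are translated and obsolete
    # ones deleted, only translated entries remain.
    print(before["translated"] + before["fuzzy"] + before["untranslated"])
```

Items that are absent from the string, because their count is zero, come back as 0.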
The main PO commands are those which do not fit into the other categories of subsequent sections. These allow for quitting PO mode or for managing windows in special ways.
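_
    Undo the last modification to the PO file (po-undo).
Q
    Quit processing and save the PO file (po-quit).
q
    Quit processing, possibly after confirmation (po-confirm-and-quit).
0
    Temporarily leave the PO file window (po-other-window).
?
h
    Show help about PO mode (po-help).
=
    Give some PO file statistics (po-statistics).
V
    Batch validate the format of the whole PO file (po-validate).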
The command _ (po-undo) interfaces to the Emacs undo facility. See section `Undoing Changes' in The Emacs Editor. Each time _ is typed, modifications which the translator did to the PO file are undone a little more. For the purpose of undoing, each PO mode command is atomic. This is especially true for the RET command: the whole edit made by a single use of this command is undone at once, even if the edit itself implied several actions. However, while in the editing window, one can undo the editing work in much finer steps.
The commands Q (po-quit) and q (po-confirm-and-quit) are used when the translator is done with the PO file. The former is a bit less verbose than the latter. If the file has been modified, it is saved to disk first. In both cases, and prior to all this, the commands check if any untranslated messages remain in the PO file and, if so, the translator is asked if she really wants to leave off working with this PO file. This is the preferred way of getting rid of an Emacs PO file buffer. Merely killing it through the usual command C-x k (kill-buffer) is not the tidiest way to proceed.
The command 0 (po-other-window) is another, softer way to leave PO mode temporarily. It just moves the cursor to some other Emacs window, and pops one if necessary. For example, if the translator just got PO mode to show some source context in some other window, she might discover some apparent bug in the program source that needs correction. This command allows the translator to switch roles, act as a programmer, and have the cursor placed right in the window containing the program she wants to modify. By later getting the cursor back in the PO file window, or by asking Emacs to edit this file once again, PO mode is then recovered.
The command h (po-help) displays a summary of all available PO mode commands. The translator should then type any character to resume normal PO mode operations. The command ? has the same effect as h.
The command = (po-statistics) computes the total number of entries in the PO file, the ordinal of the current entry (counted from 1), the number of untranslated entries, the number of obsolete entries, and displays all these numbers.
The command V (po-validate) launches msgfmt in checking and verbose mode over the current PO file. This command first offers to save the current PO file on disk. The msgfmt tool, from GNU gettext, has the purpose of creating a MO file out of a PO file, and PO mode uses the features of this program for checking the overall format of a PO file, as well as all individual entries.
The program msgfmt runs asynchronously with Emacs, so the translator regains control immediately while her PO file is being studied. Error output is collected in the Emacs `*compilation*´ buffer, displayed in another window. The regular Emacs command C-x ` (next-error), as well as other usual compile commands, allow the translator to reposition quickly to the offending parts of the PO file. Once the cursor is on the line in error, the translator may decide on any PO mode action which would help correct the error.
The cursor in a PO file window is almost always part of an entry. The only exceptions are the special case when the cursor is after the last entry in the file, or when the PO file is empty. The entry where the cursor is found to be is said to be the current entry. Many PO mode commands operate on the current entry, so moving the cursor does more than allow the translator to browse the PO file; it also selects the entry on which commands operate.
Some PO mode commands alter the position of the cursor in a specialized way. A few of those special-purpose positioning commands are described here; the others are described in following sections (for a complete list try C-h m):
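.
    Redisplay the current entry (po-current-entry).
n
    Select the entry after the current one (po-next-entry).
p
    Select the entry before the current one (po-previous-entry).
<
    Select the first entry in the PO file (po-first-entry).
>
    Select the last entry in the PO file (po-last-entry).
m
    Record the location of the current entry for later use (po-push-location).
r
    Return to a previously saved entry location (po-pop-location).
x
    Exchange the current entry location with the previously saved one (po-exchange-location).
Any Emacs command able to reposition the cursor may be used to select the current entry in PO mode, including commands which move by characters, lines, paragraphs, screens or pages, and search commands. However, there is a kind of standard way to display the current entry in PO mode, which usual Emacs commands moving the cursor do not especially try to enforce. The command . (po-current-entry) has the sole purpose of redisplaying the current entry properly, after the current entry has been changed by means external to PO mode, or the Emacs screen otherwise altered.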
It is yet to be decided if PO mode helps the translator, or otherwise irritates her, by forcing a rigid window disposition while she is doing her work. We originally had quite precise ideas about how windows should behave, but on the other hand, anyone used to Emacs is often happy to keep full control. Maybe a fixed window disposition might be offered as a PO mode option that the translator might activate or deactivate at will, so it could be offered on an experimental basis. If nobody feels a real need for using it, or a compulsion for writing it, we should drop this whole idea. The incentive for doing it should come from translators rather than programmers, as opinions from an experienced translator are surely worth more to me than opinions from programmers thinking about how others should do translation.
The commands n (po-next-entry) and p (po-previous-entry) move the cursor to the entry following, or preceding, the current one. If n is given while the cursor is on the last entry of the PO file, or if p is given while the cursor is on the first entry, no move is done.
The commands < (po-first-entry) and > (po-last-entry) move the cursor to the first entry, or last entry, of the PO file. When the cursor is located past the last entry in a PO file, most PO mode commands will return an error saying `After last entry´. Moreover, the commands < and > have the special property of being able to work even when the cursor is not within any PO file entry, and one may use them for nicely correcting this situation. But even these commands will fail on a truly empty PO file. There are development plans for the PO mode for it to interactively fill an empty PO file from sources. See section 3.4 Marking Translatable Strings.
The translator may decide, before working at the translation of a particular entry, that she needs to browse the remainder of the PO file, maybe for finding the terminology or phraseology used in related entries. She can of course use the standard Emacs idioms for saving the current cursor location in some register, and use that register for getting back, or else, use the location ring.
PO mode offers another approach, by which cursor locations may be saved onto a special stack. The command m (po-push-location) merely adds the location of current entry to the stack, pushing the already saved locations under the new one. The command r (po-pop-location) consumes the top stack element and repositions the cursor to the entry associated with that top element. This position is then lost, for the next r will move the cursor to the previously saved location, and so on until no locations remain on the stack.
If the translator wants the position to be kept on the location stack, maybe for taking a look at the entry associated with the top element, then go elsewhere with the intent of getting back later, she ought to use m immediately after r.
The command x (po-exchange-location) simultaneously repositions the cursor to the entry associated with the top element of the stack of saved locations, and replaces that top element with the location of the current entry before the move. Consequently, repeating the x command toggles alternatively between two entries. For achieving this, the translator will position the cursor on the first entry, use m, then position to the second entry, and merely use x for making the switch.
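The behaviour of the three location commands can be modelled with an ordinary stack. The following Python sketch is a model for illustration, not the PO mode implementation; the class LocationStack, its method names, and the use of integers as entry locations are all assumptions.

```python
class LocationStack:
    """Toy model of the saved-location stack used by m, r and x."""

    def __init__(self):
        self._saved = []

    def push(self, current):
        """Like m: save the location of the current entry."""
        self._saved.append(current)

    def pop(self):
        """Like r: consume the top location and return it as the new position."""
        return self._saved.pop()

    def exchange(self, current):
        """Like x: go to the top location and save the current one instead."""
        target = self._saved.pop()
        self._saved.append(current)
        return target


if __name__ == "__main__":
    stack = LocationStack()
    stack.push(5)                      # m while on entry 5
    position = 12                      # the translator moves to entry 12
    position = stack.exchange(position)
    print(position)                    # now on entry 5, entry 12 is saved
    position = stack.exchange(position)
    print(position)                    # back on entry 12, entry 5 is saved
```

In this model, calling pop or exchange on an empty stack raises IndexError. To keep a location on the stack after a pop, one pushes it again at once, which mirrors using m immediately after r.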
There are many different ways for encoding a particular string into a PO file entry, because there are so many different ways to split and quote multi-line strings, and even, to represent special characters by backslashed escape sequences. Some features of PO mode rely on the ability for PO mode to scan an already existing PO file for a particular string encoded into the msgid field of some entry. Even if PO mode has internally all the built-in machinery for implementing this recognition easily, doing it fast is technically difficult. To facilitate a solution to this efficiency problem, we decided on a canonical representation for strings.
A conventional representation of strings in a PO file is currently under discussion, and PO mode experiments with a canonical representation. Having both xgettext and PO mode converging towards a uniform way of representing equivalent strings would be useful, as the internal normalization needed by PO mode could be automatically satisfied when using xgettext from GNU gettext. An explicit PO mode normalization should then be only necessary for PO files imported from elsewhere, or for when the convention itself evolves.
So, for achieving normalization of at least the strings of a given PO file needing a canonical representation, the following PO mode command is available:
The special command M-x po-normalize, which has no associated keys, revises all entries, ensuring that strings of both original and translated entries use uniform internal quoting in the PO file. It also removes any crumb after the last entry. This command may be useful for PO files freshly imported from elsewhere, or if we ever improve on the canonical quoting format we use. This canonical format is not only meant for getting cleaner PO files, but also for greatly speeding up msgid string lookup for some other PO mode commands.
M-x po-normalize presently makes three passes over the entries. The first implements heuristics for converting PO files for GNU gettext 0.6 and earlier, in which msgid and msgstr fields were using K&R style C string syntax for multi-line strings. These heuristics may fail for comments not related to obsolete entries and ending with a backslash; they also depend on subsequent passes for finalizing the proper commenting of continued lines for obsolete entries. This first pass might disappear once all older PO files have been adjusted. The second and third passes normalize all msgid and msgstr strings respectively. They also clean out those trailing backslashes used by XView's msgfmt for continued lines.
Having such an explicit normalizing command allows for importing PO files from other sources, but also eases the evolution of the current convention, evolution driven mostly by aesthetic concerns, as of now. It is easy to make suggested adjustments at a later time, as the normalizing command and eventually, other GNU gettext tools should greatly automate conformance. A description of the canonical string format is given below, for the particular benefit of those not having Emacs handy, and who would nevertheless want to handcraft their PO files in nice ways.
Right now, in PO mode, strings are single line or multi-line. A string goes multi-line if and only if it has embedded newlines, that is, if it matches `[^\n]\n+[^\n]´. So, we would have:
msgstr "\n\nHello, world!\n\n\n"
but, replacing the space by a newline, this becomes:
    msgstr ""
    "\n"
    "\n"
    "Hello,\n"
    "world!\n"
    "\n"
    "\n"
We are deliberately using a caricatural example, here, to make the point clearer. Usually, multi-lines are not that bad looking. It is probable that we will implement the following suggestion. We might lump together all initial newlines into the empty string, and also all newlines introducing empty lines (that is, for n > 1, the n-1'th last newlines would go together on a separate string), so making the previous example appear:
    msgstr "\n\n"
    "Hello,\n"
    "world!\n"
    "\n\n"
There are a few yet undecided little points about string normalization, to be documented in this manual, once these questions settle.
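The current single line versus multi-line rule can be sketched in a few lines of Python. This is an illustration of the rule as stated above (a string goes multi-line if and only if it matches `[^\n]\n+[^\n]´), not the code used by PO mode; format_po_string and escape are hypothetical names, and the escaping covers only backslash, double quote, newline and tab.

```python
import re


def escape(text):
    """Escape a string for use between double quotes in a PO file."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\t", "\\t"))


def format_po_string(keyword, value):
    """Return the PO file lines representing value after keyword."""
    # Single line unless a newline run sits between two other characters.
    if re.search(r"[^\n]\n+[^\n]", value) is None:
        return ['%s "%s"' % (keyword, escape(value))]
    # Multi-line: empty string first, then one piece per embedded newline.
    pieces = re.findall(r"[^\n]*\n|[^\n]+", value)
    return ['%s ""' % keyword] + ['"%s"' % escape(piece) for piece in pieces]


if __name__ == "__main__":
    for sample in ("\n\nHello, world!\n\n\n", "\n\nHello,\nworld!\n\n\n"):
        print("\n".join(format_po_string("msgstr", sample)))
        print()
```

The sketch implements only the present behaviour, one piece per newline. The suggested lumping of initial newlines and of newlines introducing empty lines is deliberately left out, since those points are still undecided.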