While the presentation of gettext focuses mostly on C and implicitly applies to C++ as well, its scope is far broader than that: many programming languages, scripting languages and other textual data like GUI resources or package descriptions can make use of the gettext approach.
All programming and scripting languages that have the notion of strings are eligible to support gettext. Supporting gettext means the following:

1. You should add to the language a syntax for translatable strings. In principle, a function call of gettext would do, but a shorthand syntax helps keep internationalized programs legible. For example, in C we use the syntax _("string"), in bash we use the syntax $"string", and in GNU awk we use the shorthand _"string".
2. You should arrange that evaluation of such a translatable string at runtime calls the gettext function, or performs equivalent processing.
3. Similarly, you should make the functions ngettext, dcgettext, dcngettext available from within the language. These functions are less often used, but are nevertheless necessary for particular purposes: ngettext for correct plural handling, and dcgettext and dcngettext for obeying locale environment variables other than LC_MESSAGES, such as LC_TIME or LC_MONETARY. For these latter functions, you need to make the LC_* constants, available in the C header <locale.h>, referenceable from within the language, usually either as enumeration values or as strings.
4. You should allow the programmer to designate a message domain, either by making the textdomain function available from within the language, or by introducing a magic variable called TEXTDOMAIN. Similarly, you should allow the programmer to designate where to search for message catalogs, by providing access to the bindtextdomain function.
5. You should either perform a setlocale (LC_ALL, "") call during the startup of your language runtime, or allow the programmer to do so. Remember that gettext will act as a no-op if the LC_MESSAGES and LC_CTYPE locale facets are not both set.
6. A programmer should have a way to extract translatable strings from a program into a PO file. The GNU xgettext program is being extended to support very different programming languages. Please contact the GNU gettext maintainers to help them do this. If the string extractor is best integrated into your language's parser, GNU xgettext can function as a front end to your string extractor.
7. If the language has more than one implementation, and not all of the implementations use gettext, but the programs should be portable across implementations, you should provide a no-i18n emulation that makes the other implementations accept programs written for yours, without actually translating the strings.
8. Finally, contact the GNU gettext maintainers, so they can add support for your language to `po-mode.el´.

On the implementation side, three approaches are possible, with different effects on portability and copyright:
- You may integrate GNU gettext's `intl/´ directory in your package, as described in section 12 The Maintainer's View. This allows you to have internationalization on all kinds of platforms. Note that when you then distribute your package, it legally falls under the GNU General Public License, and the GNU project will be glad about your contribution to the Free Software pool.
- You may link against the GNU gettext functions if they are found in the C library. For example, an autoconf test for gettext() and ngettext() will detect this situation. For the moment, this test will succeed on GNU systems and not on other platforms. No severe copyright restrictions apply.
- You may emulate or reimplement the GNU gettext functionality. This has the advantage of full portability and no copyright restrictions, but also the drawback that you have to reimplement the GNU gettext features (such as the LANGUAGE environment variable, the locale aliases database, the automatic charset conversion, and plural handling).

For the programmer, the general procedure is the same as for the C language. The Emacs PO mode supports other languages, and the GNU xgettext string extractor recognizes other languages based on the file extension or a command-line option. In some languages, setlocale is not needed because it is already performed by the underlying language runtime.
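As an illustration of how these pieces look from a programmer's side in a language other than C, here is a minimal sketch using Python's standard gettext module. The domain name and the catalog directory are assumptions made up for this example; since no catalog is installed there, every lookup falls back to the original string.

```python
import gettext
import tempfile

# Hypothetical message domain and catalog directory, used only in this sketch.
DOMAIN = "example-gettext-sketch"
LOCALEDIR = tempfile.mkdtemp()  # empty directory, so no catalog will be found

# Counterparts of the C functions bindtextdomain and textdomain.
gettext.bindtextdomain(DOMAIN, LOCALEDIR)
gettext.textdomain(DOMAIN)

# Shorthand for translatable strings, mirroring the C convention _("string").
_ = gettext.gettext


def describe(count):
    """Pick the singular or plural form, then fill in the number."""
    template = gettext.ngettext("%d file removed", "%d files removed", count)
    return template % count


greeting = _("Hello, world")
print(greeting)
print(describe(1))
print(describe(3))
```

Once a compiled catalog for the domain exists under the catalog directory, the same calls would return translated strings without any change to the program.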
The translator works exactly as in the C language case. The only difference is that when translating format strings, she has to be aware of the language's particular syntax for positional arguments in format strings.
C format strings are described in POSIX (IEEE P1003.1 2001), section XSH 3 fprintf(), http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html. See also the fprintf(3) manual page, http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php, http://informatik.fh-wuerzburg.de/student/i510/man/printf.html.
Python format strings are described in Python Library reference / 2. Built-in Types, Exceptions and Functions / 2.2. Built-in Types / 2.2.6. Sequence Types / 2.2.6.2. String Formatting Operations. http://www.python.org/doc/2.2.1/lib/typesseq-strings.html.
Lisp format strings are described in the Common Lisp HyperSpec, chapter 22.3 Formatted Output, http://www.lisp.org/HyperSpec/Body/sec_22-3.html.
Emacs Lisp format strings are documented in the Emacs Lisp reference, section Formatting Strings, http://www.gnu.org/manual/elisp-manual-21-2.8/html_chapter/elisp_4.html#SEC75. Note that as of version 21, XEmacs supports numbered argument specifications in format strings while FSF Emacs doesn't.
librep format strings are documented in the librep manual, section Formatted Output, http://librep.sourceforge.net/librep-manual.html#Formatted%20Output, http://www.gwinnup.org/research/docs/librep.html#SEC122.
Smalltalk format strings are described in the GNU Smalltalk documentation, class CharArray, methods `bindWith:´ and `bindWithArguments:´, http://www.gnu.org/software/smalltalk/gst-manual/gst_68.html#SEC238. In summary, a directive starts with `%´ and is followed by `%´ or a nonzero digit (`1´ to `9´).
Java format strings are described in the JDK documentation for class java.text.MessageFormat, http://java.sun.com/j2se/1.4/docs/api/java/text/MessageFormat.html. See also the ICU documentation, http://oss.software.ibm.com/icu/apiref/classMessageFormat.html.
awk format strings are described in the gawk documentation, section Printf, http://www.gnu.org/manual/gawk/html_node/Printf.html#Printf.
Object Pascal format strings: where these are documented is still an open question.
YCP sformat strings are described in the libycp documentation file:/usr/share/doc/packages/libycp/YCP-builtins.html. In summary, a directive starts with `%´ and is followed by `%´ or a nonzero digit (`1´ to `9´).
Tcl format strings are described in the `format.n´ manual page, http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm.
PHP format strings are described in the documentation of the PHP function sprintf, in `phpdoc/manual/function.sprintf.html´ or http://www.php.net/manual/en/function.sprintf.php.
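To illustrate why the syntax for positional or named arguments matters to the translator, here is a small sketch in Python, whose format strings allow named arguments. The German msgstr is a hypothetical translation written for this example, not taken from any real catalog.

```python
# Original message and a hypothetical translation that reorders the arguments.
msgid = "%(count)d errors in file %(name)s"
msgstr = "Datei %(name)s: %(count)d Fehler"

# The program supplies the values by name, so the order inside the format
# string is free to differ between the original and the translation.
values = {"count": 3, "name": "main.c"}

original = msgid % values
translated = msgstr % values

print(original)
print(translated)
```

With unnamed directives such as "%d errors in file %s", the translator could not swap the two arguments without breaking the program.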
For the maintainer, the general procedure differs from the C language case in two ways.

- For those languages that don't use GNU gettext, the `intl/´ directory is not needed and can be omitted. This means that the maintainer calls the gettextize program without the `--intl´ option, and that he invokes the AM_GNU_GETTEXT autoconf macro via `AM_GNU_GETTEXT([external])´.
- If only a single programming language is used, the XGETTEXT_OPTIONS variable in `po/Makevars´ (see section 12.4.3 `Makefile´ pieces in `po/´) should be adjusted to match the xgettext options for that particular programming language. If the package uses more than one programming language with gettext support, it becomes necessary to change the POT file construction rule in `po/Makefile.in.in´. It is recommended to make one xgettext invocation per programming language, each with the options appropriate for that language, and to combine the resulting files using msgcat.

For C, C++ and Objective C:

File extension: for C: c, h; for C++: C, c++, cc, cxx, cpp, hpp; for Objective C: m
String syntax: "abc"
gettext shorthand: _("abc")
gettext/ngettext functions: gettext, dgettext, dcgettext, ngettext, dngettext, dcngettext
textdomain: textdomain function
bindtextdomain: bindtextdomain function
setlocale: setlocale (LC_ALL, "")
Prerequisite: #include <libintl.h>, #include <locale.h>, #define _(string) gettext (string)
Extractor: xgettext -k_
Formatting with positions: fprintf "%2$d %1$d" (POSIX/XSI but not C 99); in C++, autosprintf "%2$d %1$d" (see section `Introduction' in GNU autosprintf)

For sh (shell scripts):

File extension: sh
String syntax: "abc", 'abc', abc
gettext shorthand: "`gettext "abc"`"
gettext/ngettext functions: gettext, ngettext programs
textdomain: environment variable TEXTDOMAIN
bindtextdomain: environment variable TEXTDOMAINDIR
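For bash (Bourne-Again shell scripts):

File extension: sh
String syntax: "abc", 'abc', abc
gettext shorthand: $"abc"
gettext/ngettext functions: gettext, ngettext programs
textdomain: environment variable TEXTDOMAIN
bindtextdomain: environment variable TEXTDOMAINDIR
Extractor: bash --dump-po-strings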
For Python:

File extension: py
String syntax: 'abc', u'abc', r'abc', ur'abc', "abc", u"abc", r"abc", ur"abc", '''abc''', u'''abc''', r'''abc''', ur'''abc''', """abc""", u"""abc""", r"""abc""", ur"""abc"""
gettext shorthand: _('abc') etc.
gettext/ngettext functions: gettext.gettext, gettext.dgettext, gettext.ngettext, gettext.dngettext, also ugettext, ungettext
textdomain: gettext.textdomain function, or gettext.install(domain) function
bindtextdomain: gettext.bindtextdomain function, or gettext.install(domain,localedir) function
Prerequisite: import gettext
Extractor: xgettext
Formatting with positions: '...%(ident)d...' % { 'ident': value }
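The gettext.install(domain,localedir) variant named above can be sketched as follows. This assumes a current Python 3 interpreter; the domain name and the empty catalog directory are made up for the example, so the lookup simply returns the original string.

```python
import gettext
import tempfile

# Hypothetical domain and catalog directory, used only in this sketch.
DOMAIN = "example-install-sketch"
LOCALEDIR = tempfile.mkdtemp()  # empty directory, so no catalog will be found

# Selects the domain and the catalog location in one call, and makes the
# shorthand _() available as a builtin for the whole program.
gettext.install(DOMAIN, LOCALEDIR)

message = _("abc")
print(message)
```

Because the shorthand is installed globally, individual modules of an application need not define _ themselves.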
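For GNU clisp (Common Lisp):

File extension: lisp
String syntax: "abc"
gettext shorthand: (_ "abc"), (ENGLISH "abc")
gettext/ngettext functions: i18n:gettext, i18n:ngettext
textdomain: i18n:textdomain
bindtextdomain: i18n:textdomaindir
Extractor: xgettext -k_ -kENGLISH
Formatting with positions: format "~1@*~D ~0@*~D"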
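For GNU clisp C sources:

File extension: d
String syntax: "abc"
gettext shorthand: ENGLISH ? "abc" : "", GETTEXT("abc"), GETTEXTL("abc")
gettext/ngettext functions: clgettext, clgettextl
Prerequisite: #include "lispbibl.c"
Extractor: clisp-xgettext
Formatting with positions: fprintf "%2$d %1$d" (POSIX/XSI but not C 99)

For Emacs Lisp:

File extension: el
String syntax: "abc"
gettext shorthand: (_"abc")
gettext/ngettext functions: gettext, dgettext (xemacs only)
textdomain: domain special form (xemacs only)
bindtextdomain: bind-text-domain function (xemacs only)
Extractor: xgettext
Formatting with positions: format "%2$d %1$d"
Portability: without I18N3 defined at build time, no translation.

For librep:

File extension: jl
String syntax: "abc"
gettext shorthand: (_"abc")
gettext/ngettext functions: gettext
textdomain: textdomain function
bindtextdomain: bindtextdomain function
Prerequisite: (require 'rep.i18n.gettext)
Extractor: xgettext
Formatting with positions: format "%2$d %1$d"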
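For GNU Smalltalk:

File extension: st
String syntax: 'abc'
gettext shorthand: NLS ? 'abc'
gettext/ngettext functions: LcMessagesDomain>>#at:, LcMessagesDomain>>#at:plural:with:
textdomain: LcMessages>>#domain:localeDirectory: (returns a LcMessagesDomain object). Example: I18N Locale default messages domain: 'gettext' localeDirectory: '/usr/local/share/locale'
bindtextdomain: LcMessages>>#domain:localeDirectory:, see above.
setlocale: automatic when using I18N Locale default.
Prerequisite: PackageLoader fileInPackage: 'I18N'!
Extractor: xgettext
Formatting with positions: '%1 %2' bindWith: 'Hello' with: 'world'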
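For Java:

File extension: java
gettext/ngettext functions: GettextResource.gettext, GettextResource.ngettext
textdomain: none; use ResourceBundle.getBundle instead
Extractor: xgettext -k_
Formatting with positions: MessageFormat.format "{1,number} {0,number}"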
Before marking strings as internationalizable, uses of the string concatenation operator need to be converted to MessageFormat applications. For example, "file "+filename+" not found" becomes MessageFormat.format("file {0} not found", new Object[] { filename }). Only after this is done can the strings be marked and extracted.
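The same conversion is needed in any language that builds messages by concatenation. As a sketch in Python rather than Java (the file name and the argument name are invented for the example), the concatenated message is first turned into a single format string, and only that string is marked:

```python
import gettext

# Shorthand for translatable strings; with no catalog installed it
# returns its argument unchanged.
_ = gettext.gettext

filename = "notes.txt"  # hypothetical value for the example

# Before: the message is split into fragments that cannot be translated
# as a whole, and the position of the file name is fixed.
before = "file " + filename + " not found"

# After: one complete, markable format string with a named argument.
after = _("file %(name)s not found") % {"name": filename}

print(before)
print(after)
```

The translator then sees the whole sentence in the PO file and can move the argument wherever the target language requires.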
GNU gettext uses the native Java internationalization mechanism, namely ResourceBundles. There are two formats of ResourceBundles: .properties files and .class files. The .properties format is a text file which the translators can directly edit, like PO files, but which doesn't support plural forms. The .class format, by contrast, is compiled from .java source code and can support plural forms (provided it is accessed through an appropriate API, see below).
To convert a PO file to a .properties file, the msgcat program can be used with the option --properties-output. To convert a .properties file back to a PO file, the msgcat program can be used with the option --properties-input. All the tools that manipulate PO files can work with .properties files as well, if given the --properties-input and/or --properties-output option.
To convert a PO file to a ResourceBundle class, the msgfmt program can be used with the option --java or --java2. To convert a ResourceBundle back to a PO file, the msgunfmt program can be used with the option --java.
Two different programmatic APIs can be used to access ResourceBundles. Note that both APIs work with all kinds of ResourceBundles, whether GNU gettext generated classes, or other .class or .properties files.
- The java.util.ResourceBundle API. In particular, its getString function returns a string translation. Note that a missing translation yields a MissingResourceException. This has the advantage of being the standard API, and it does not require any additional libraries, only the msgcat generated .properties files or the msgfmt generated .class files. But it cannot do plural handling, even if the resource was generated by msgfmt from a PO file with plural handling.
- The gnu.gettext.GettextResource API. Reference documentation in Javadoc 1.1 style format is in the javadoc1 directory and in Javadoc 2 style format in the javadoc2 directory. Its gettext function returns a string translation. Note that when a translation is missing, the msgid argument is returned unchanged. This has the advantage of having the ngettext function for plural handling. To use this API, one needs the libintl.jar file which is part of the GNU gettext package and distributed under the LGPL.

For GNU awk:

File extension: awk
String syntax: "abc"
gettext shorthand: _"abc"
gettext/ngettext functions: dcgettext, missing dcngettext in gawk-3.1.0
textdomain: TEXTDOMAIN variable
bindtextdomain: bindtextdomain function
setlocale: automatic, but missing setlocale (LC_MESSAGES, "") in gawk-3.1.0
Extractor: xgettext
Formatting with positions: printf "%2$d %1$d" (GNU awk only)
Portability: on non-GNU awks, you must define dcgettext, dcngettext and bindtextdomain yourself.

For Pascal (Free Pascal Compiler):

File extension: pp, pas
String syntax: 'abc'
gettext/ngettext functions: none; use the ResourceString data type instead
textdomain: none; use the TranslateResourceStrings function instead
bindtextdomain: none; use the TranslateResourceStrings function instead
Prerequisite: {$mode delphi} or {$mode objfpc}, and uses gettext;
Extractor: ppc386 followed by xgettext or rstconv
Formatting with positions: uses sysutils; format "%1:d %0:d"

The Pascal compiler has special support for the ResourceString data type. It generates a .rst file. This is then converted to a .pot file by use of xgettext or rstconv. At runtime, a .mo file corresponding to translations of this .pot file can be loaded using the TranslateResourceStrings function in the gettext unit.
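For the wxWindows library:

File extension: cpp
String syntax: "abc"
gettext shorthand: _("abc")
gettext/ngettext functions: wxLocale::GetString, wxGetTranslation
textdomain: wxLocale::AddCatalog
bindtextdomain: wxLocale::AddCatalogLookupPathPrefix
setlocale: wxLocale::Init, wxSetLocale
Prerequisite: #include <wx/intl.h>
Use or emulate GNU gettext: emulate, see include/wx/intl.h and src/common/intl.cpp
Extractor: xgettext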
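For YCP (YaST2 scripting language):

File extension: ycp
String syntax: "abc"
gettext shorthand: _("abc")
gettext/ngettext functions: _() with 1 or 3 arguments
textdomain: textdomain statement
Extractor: xgettext
Formatting with positions: sformat "%2 %1"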
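For Tcl:

File extension: tcl
String syntax: "abc"
gettext shorthand: [_ "abc"]
gettext/ngettext functions: ::msgcat::mc
bindtextdomain: none; use ::msgcat::mcload instead
Prerequisite: package require msgcat, then proc _ {s} {return [::msgcat::mc $s]}
Extractor: xgettext -k_
Formatting with positions: format "%2\$d %1\$d"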
Before marking strings as internationalizable, substitutions of variables into the string need to be converted to format applications. For example, "file $filename not found" becomes [format "file %s not found" $filename]. Only after this is done can the strings be marked and extracted. After marking, this example becomes [format [_ "file %s not found"] $filename] or [msgcat::mc "file %s not found" $filename]. Note that the msgcat::mc function implicitly calls format when more than one argument is given.
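For Perl:

File extension: pl, PL
String syntax: "abc"
gettext/ngettext functions: gettext, dgettext, dcgettext
textdomain: textdomain function
bindtextdomain: bindtextdomain function
setlocale: setlocale (LC_ALL, "");
Prerequisite: use POSIX; use Locale::gettext;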
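For PHP:

File extension: php, php3, php4
String syntax: "abc", 'abc'
gettext shorthand: _("abc")
gettext/ngettext functions: gettext, dgettext, dcgettext
textdomain: textdomain function
bindtextdomain: bindtextdomain function
setlocale: setlocale (LC_ALL, "")
Extractor: xgettext
Formatting with positions: printf "%2\$d %1\$d"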
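For Pike:

File extension: pike
String syntax: "abc"
gettext/ngettext functions: gettext, dgettext, dcgettext
textdomain: textdomain function
bindtextdomain: bindtextdomain function
setlocale: setlocale function
Prerequisite: import Locale.Gettext;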
Here is a list of other data formats which can be internationalized using GNU gettext.
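POT and PO files. File extension: pot, po. Extractor: xgettext.
Resource string tables. File extension: rst. Extractor: xgettext, rstconv.
Glade files. File extension: glade. Extractor: xgettext, libglade-xgettext.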