Date:      Sat, 23 Feb 2002 12:54:16 +0100
From:      Szilveszter Adam <sziszi@bsd.hu>
To:        freebsd-doc@freebsd.org
Subject:   Entities in translations
Message-ID:  <20020223115416.GA1152@fonix.adamsfamily.xx>

Hello everybody,

I am about to ask a translation-related question here; I hope this is
OK. If not, please point me to the appropriate mailing list. Thank you.

So. I am in the process of doing a Hungarian translation of one of the
FreeBSD Documentation Project's articles. (I use the Project's
infrastructure to build it, etc.) Now I have a problem with marking up
the non-ASCII characters with entities.

I read the FAQ for translators in the Primer and it clearly states this
should be done, i.e. the non-ASCII chars should not be entered as-is but
rather converted into entities. Fine, I have even found some vim
scripts under doc/share/examples to do this automatically. But
Hungarian uses Latin-2 rather than Latin-1 for character encoding.

Now the problem. After I mark up the page using the Latin-2 entities
and include the ISOlat2 entities at the top of the page, I let
openjade generate an HTML version from the DocBook and set the
Content-Type to "text/html; charset=ISO-8859-2". But the browsers that I
tested do not display the page correctly, except for lynx, which is
somehow intelligent enough to know what e.g. &odblac; is. All the others
seem to expand only the entities that also exist in Latin-1 and leave
the rest unexpanded, showing the raw entity name to the user. This is,
needless to say, suboptimal :-( At the same time, the footer text from
the localized freebsd.dsl is displayed correctly, but I did not use
entities there.
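
To illustrate one possible workaround (a sketch only, not something the
Doc Project's tooling is known to do): HTML 4 defines named entities
only for Latin-1, symbols and a few special characters, so a name such
as &odblac; means nothing to most browsers, whereas numeric character
references are resolved against Unicode regardless of the declared
charset. The Python sketch below post-processes generated HTML and
rewrites the four Hungarian double-acute entities into numeric
references. The table ISOLAT2_HUNGARIAN and the function
to_numeric_refs are hypothetical names, and the table covers only these
four letters, not the whole ISOlat2 set.

```python
import re

# Hungarian letters that exist in Latin-2 but not in Latin-1,
# keyed by their ISOlat2 entity name (illustrative subset only).
ISOLAT2_HUNGARIAN = {
    "odblac": "\u0151",  # o with double acute
    "Odblac": "\u0150",  # O with double acute
    "udblac": "\u0171",  # u with double acute
    "Udblac": "\u0170",  # U with double acute
}

def to_numeric_refs(html_text):
    """Replace known ISOlat2 entity names with numeric character references.

    Entities not listed in ISOLAT2_HUNGARIAN (e.g. &eacute;) are left alone.
    """
    def repl(match):
        ch = ISOLAT2_HUNGARIAN.get(match.group(1))
        if ch is None:
            return match.group(0)
        return "&#%d;" % ord(ch)

    return re.sub(r"&([A-Za-z0-9]+);", repl, html_text)

if __name__ == "__main__":
    print(to_numeric_refs("f&odblac;z&odblac; &eacute;s t&udblac;z"))
```

The Latin-1 entities are deliberately left untouched, since the browsers
tested already expand those.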

Next, I looked at how other translation teams overcame this problem.

What I found was disquieting at best. It seems that the Japanese and
Russian teams went a completely different way and do not use entities
for marking up non-ASCII characters at all (which is understandable,
since those are the vast majority of their text) but rather just input
the text as-is.

All the teams using a Latin character set so far seem to use Latin-1,
which, for some reason, works out of the box. The sole exception is the
Serbian translation (also Latin-2), but there, again, I saw that the
non-ASCII chars were just entered as-is.
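
To illustrate why text entered as-is depends entirely on the declared
charset: the same byte stands for different letters in Latin-1 and
Latin-2. A small Python sketch using only the standard codecs (the
sample word is just an example, not taken from any translation):

```python
# A Hungarian word containing o with double acute (U+0151).
word = "f\u0151z\u0151"

# Stored as raw 8-bit Latin-2 text, as a translator would type it.
raw = word.encode("iso-8859-2")

# Read back with the correct charset, and with Latin-1 by mistake.
as_latin2 = raw.decode("iso-8859-2")
as_latin1 = raw.decode("iso-8859-1")

if __name__ == "__main__":
    print(raw, as_latin2, as_latin1)
```

Byte 0xF5 is o with double acute in Latin-2 but o with tilde in Latin-1,
so raw input only displays correctly when the page really is served as
ISO-8859-2.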

So now, please advise. Which method should I follow? Should I stick to
entities for non-ASCII characters? But then how do I make them display
in the HTML rendering in their expanded form instead of as the entity
itself? Or should I just follow the lead of the non-Latin-1 teams and
start inputting these characters as-is? What is, for example, the Greek
Doc Project doing about this?

Also, what are the advantages and disadvantages of using entities for
non-ASCII character markup? Off the top of my head, the only advantage
that comes to mind is that by using entities, even someone who only has
access to a non-localized version of FreeBSD can contribute, whereas
otherwise you definitely need proper fonts etc. But as I see from the
other teams, this has not become a major problem. So?

I appreciate all help from fellow translators at the Doc Project.

-- 
Regards:

Szilveszter ADAM
Szombathely Hungary

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-doc" in the body of the message
