From: Szilveszter Adam
To: freebsd-doc@freebsd.org
Date: Sat, 23 Feb 2002 12:54:16 +0100
Subject: Entities in translations
Message-ID: <20020223115416.GA1152@fonix.adamsfamily.xx>

Hello everybody,

I am about to ask a translation-related question here; I hope this is OK. If not, please point me to the appropriate mailing list. Thank you.

I am in the process of translating one of the FreeBSD Documentation Project's articles into Hungarian, using the Project's infrastructure to build it. I have a problem with marking up the non-ASCII characters with entities. I read the FAQ for translators in the Primer, and it clearly states that this should be done, i.e. non-ASCII characters should not be entered as-is but converted into entities. Fine; I even found some vim scripts under doc/share/examples that do this automatically. But Hungarian uses Latin-2 rather than Latin-1 for character encoding.

Now the problem.
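For reference, the entity markup step itself can be sketched outside vim. The following is a minimal Python sketch, not the script from doc/share/examples: the table HU_ENTITIES and the function names to_entities and unmapped are hypothetical, and the table covers only the Hungarian accented letters. The entity names are the usual ISOlat1 ones, plus odblac and udblac (and their capitals) from ISOlat2 for the double-acute letters.

```python
# Hypothetical table: Hungarian non-ASCII letters -> SGML entity names.
# Most names come from ISOlat1; the double-acute ones come from ISOlat2.
HU_ENTITIES = {
    "á": "aacute", "Á": "Aacute",
    "é": "eacute", "É": "Eacute",
    "í": "iacute", "Í": "Iacute",
    "ó": "oacute", "Ó": "Oacute",
    "ö": "ouml",   "Ö": "Ouml",
    "ő": "odblac", "Ő": "Odblac",   # ISOlat2 only
    "ú": "uacute", "Ú": "Uacute",
    "ü": "uuml",   "Ü": "Uuml",
    "ű": "udblac", "Ű": "Udblac",   # ISOlat2 only
}


def to_entities(text: str) -> str:
    """Replace every mapped character with its entity reference."""
    return "".join(
        f"&{HU_ENTITIES[ch]};" if ch in HU_ENTITIES else ch
        for ch in text
    )


def unmapped(text: str) -> list:
    """Return the non-ASCII characters that the table does not cover."""
    return sorted({ch for ch in text if ord(ch) > 127 and ch not in HU_ENTITIES})


if __name__ == "__main__":
    sample = "Győr, tűz, fordítás"
    print(to_entities(sample))
    print(unmapped(sample))
```

A source file stored as Latin-2 would have to be read with encoding="iso-8859-2" before being passed to to_entities; unmapped is only a safety check for characters the table misses.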
After I mark up the page using the Latin-2 entities and include the ISOlat2 entity set at the top of the page, I let openjade generate an HTML version from the DocBook source and set the Content-Type to "text/html; charset=ISO-8859-2". But the browsers that I tested do not display the page correctly, except for lynx, which is somehow intelligent enough to know what e.g. the entity for ő is. All the others seem to expand only the entities that also exist in Latin-1 and show the rest to the user unexpanded. This is, needless to say, suboptimal :-( At the same time, the footer text from the localized freebsd.dsl is displayed correctly, but I did not use entities there.

Next, I looked at how other translation teams overcame this problem. What I found was disquieting at best. It seems that the Japanese and Russian teams went a completely different way and do not use entities for non-ASCII characters at all (which is understandable, since those make up the vast majority of their text) but simply input the text as-is. All the teams using a Latin character set so far seem to use Latin-1, which for some reason works out of the box. The sole exception is the Serbian translation (also Latin-2), but there, again, I saw that the non-ASCII characters were just entered as-is.

So now, please advise. Which method should I follow? Should I stick to entities for non-ASCII characters? But then how do I make them display in their expanded form in the HTML rendering, instead of as the entity itself? Or should I just follow the lead of the non-Latin-1 teams and start inputting these characters as-is? What is, for example, the Greek Doc Project doing about this?

Also, what are the advantages and disadvantages of using entities for non-ASCII character markup? Off the top of my head, the only advantage that comes to mind is that with entities, even someone who only has access to a non-localized version of FreeBSD can contribute, whereas otherwise you definitely need proper fonts etc.
But as I see from the other teams, this has not become a major problem. So?

I appreciate all help from fellow translators at the Doc Project.

--
Regards:

Szilveszter ADAM
Szombathely Hungary