From: Kövesdán Gábor
Date: Mon, 06 Feb 2006 11:48:43 +0100
To: "Simon L. Nielsen"
Cc: doc@FreeBSD.org, www@FreeBSD.org
Subject: Re: How to handle localized characters ans special symbols?
Message-ID: <43E7298B.20206@t-hosting.hu>
In-Reply-To: <20060205153021.GC857@zaphod.nitro.dk>

Simon L. Nielsen wrote:

>On 2006.02.04 20:33:54 +0100, Kövesdán Gábor wrote:
>
>>I'm translating the FreeBSD webpage to Hungarian. I haven't done too
>>much so far, because I don't have too much spare time, but I'll finish
>>this translation. Today, I made a test build. You can see this here:
>>http://tux.t-hosting.hu/data
>>The most part of it is still in English but there are some translated
>>pages. The build succeeded quite good, I've found my mistakes easily and
>>managed to build the site, but I have troubles with one of the localized
>>characters. This is The o letter with two commas on it. Its standard
>>html code is ő, but the sgml parser substitutes it with a Q char. I
>>don't see why does it happen and don't know how to fix it. There are two
>>more problematic characters, and they are ® and ™. They are
>>also substituted in a wrong way. See:
>>http://tux.t-hosting.hu/data/about.html
>>You can notice the Z character with a ?? sign after the word Pentium and
>>a " after Athlon.
>>How could I correctly display these characters? Please tell me what to
>>do so that we have a nice Hungarian webpage. :)
>>
>>(I use Firefox and it selects the ISO-8859-2 Central European encoding
>>automatically.)
>>
>
>I think the problem is that your web server forces a character set
>which prevents the character set in the HTML from taking effect:
>
>[simon@zaphod:~] fetch -o /dev/null -vv http://tux.t-hosting.hu/data/about.html | & grep Content-Type:
><<< Content-Type: text/html; charset=ISO-8859-2
>
>I'm not exactly sure how some of the other translations are handling
>using non ISO-8859-1, but since e.g. ja and ru translations use
>something which definitely isn't Latin characters I'm sure it can be
>done.
>See how those translations changes the character set as needed.
>

I've found that it's not just about the charset used by the browser. The
SGML parser substitutes ő with Q. If ő remained in the HTML files, the
browser would display it correctly. I tried putting this in my Makefile to
override the default in web.bsd.mk, hoping that the SGML parser would stop
making this unwanted substitution:

SGMLNORMOPTS= -d ${SGMLNORMFLAGS} -c ${CATALOG} -D ${.CURDIR} -biso-8859-2

It didn't help.

I've run into another problem recently, too. According to
http://www.w3.org/2003/entities/iso8879doc/isolat1.html
the entities &aacute;, &eacute;, etc. are accepted standards in XML, but if
I put these entities into an .xsl file, e.g. index.xsl, the web build fails.

That said, I've realized that if I simply write the character ő directly
into the SGML sources, it comes through intact, but I don't know how
standard and portable this solution is. I would like to make my work as
standard and portable as it can be.

As for the Russian website, they just type their characters according to
their charset, and I see strange characters in the sources. It definitely
works, but isn't there a more elegant solution? Like &aacute; instead of á,
&eacute; instead of é, etc.

Thanks,

Gabor
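
To illustrate the entity-based approach asked about above, here is a minimal Python sketch that rewrites non-ASCII characters as character references. It is not part of the FreeBSD web build; the helper name to_entities is hypothetical. It uses the HTML 4 entity names from Python's standard html.entities module, and falls back to a numeric reference for characters that have no name in that set (ő is one of them). Whether the DTDs and catalogs used by the web build accept these names is an assumption that would need checking.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace each non-ASCII character with a character reference.

    to_entities is a hypothetical helper for illustration only.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        elif cp in codepoint2name:
            # Named entity from the HTML 4 set, e.g. &aacute; for á.
            out.append(f"&{codepoint2name[cp]};")
        else:
            # No HTML 4 name available: use a decimal numeric reference.
            out.append(f"&#{cp};")
    return "".join(out)


if __name__ == "__main__":
    print(to_entities("Kövesdán Gábor"))
    print(to_entities("ő ® ™"))
```

Run over a translated source file, this would leave the file pure ASCII, so the result no longer depends on which charset the editor or the web server assumes.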