Date: Mon, 06 Feb 2006 11:48:43 +0100
From: Kövesdán Gábor <gabor.kovesdan@t-hosting.hu>
To: "Simon L. Nielsen" <simon@FreeBSD.org>
Cc: doc@FreeBSD.org, www@FreeBSD.org
Subject: Re: How to handle localized characters ans special symbols?
Message-ID: <43E7298B.20206@t-hosting.hu>
In-Reply-To: <20060205153021.GC857@zaphod.nitro.dk>
References: <43E501A2.9080109@t-hosting.hu> <20060205153021.GC857@zaphod.nitro.dk>

Simon L. Nielsen wrote:

>On 2006.02.04 20:33:54 +0100, Kövesdán Gábor wrote:
>
>>I'm translating the FreeBSD webpage to Hungarian. I haven't done too
>>much so far, because I don't have too much spare time, but I'll finish
>>this translation. Today, I made a test build. You can see this here:
>>http://tux.t-hosting.hu/data
>>The most part of it is still in English but there are some translated
>>pages. The build succeeded quite good, I've found my mistakes easily and
>>managed to build the site, but I have troubles with one of the localized
>>characters. This is The o letter with two commas on it. Its standard
>>html code is ő, but the sgml parser substitutes it with a Q char. I
>>don't see why does it happen and don't know how to fix it. There are two
>>more problematic characters, and they are ® and ™. They are
>>also substituted in a wrong way. See:
>>http://tux.t-hosting.hu/data/about.html
>>You can notice the Z character with a ?? sign after the word Pentium and
>>a " after Athlon.
>>How could I correctly display these characters? Please tell me what to
>>do so that we have a nice Hungarian webpage. :)
>>
>>(I use Firefox and it selects the ISO-8859-2 Central European encoding
>>automatically.)
>
>I think the problem is that your web server forces a character set
>which prevents the character set in the HTML from taking effect:
>
>[simon@zaphod:~] fetch -o /dev/null -vv http://tux.t-hosting.hu/data/about.html | & grep Content-Type:
><<< Content-Type: text/html; charset=ISO-8859-2
>
>I'm not exactly sure how some of the other translations are handling
>using non ISO-8859-1, but since e.g. ja and ru translations use
>something which definitely isn't Latin characters I'm sure it can be
>done. See how those translations changes the character set as needed.

I've found out that it's not just about the charset used by the browser.
The SGML parser substitutes ő with Q. If ő remained in the HTML files,
the browser would display it correctly.
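
One pattern in the reported symptoms may be worth checking. This is only a sketch under an assumption, not a confirmed diagnosis of the SGML toolchain: if each character's Unicode code point were cut down to its low 8 bits, and the resulting byte were then shown as ISO-8859-2 (the charset the server announces), the three substitutions described in this thread would result. The helper name `truncate_to_byte` is made up for this illustration.

```python
# Assumption for illustration only: some step keeps just the low 8 bits of
# each code point, and the browser then reads that byte as ISO-8859-2.
def truncate_to_byte(ch: str) -> str:
    low = ord(ch) & 0xFF                      # drop everything above 8 bits
    return bytes([low]).decode("iso8859_2")   # view the byte as ISO-8859-2


if __name__ == "__main__":
    # U+0151 (o with double acute), U+00AE (registered), U+2122 (trade mark)
    for ch in ("\u0151", "\u00ae", "\u2122"):
        print(f"U+{ord(ch):04X} -> {truncate_to_byte(ch)!r}")
```

Under that assumption the three characters come out as Q, Ž and ", which matches the Q in the Hungarian text, the accented Z after Pentium and the quotation mark after Athlon.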

I tried putting this in my Makefile to override the default in
web.bsd.mk, hoping that the SGML parser would no longer make this
unwanted substitution:

SGMLNORMOPTS= -d ${SGMLNORMFLAGS} -c ${CATALOG} -D ${.CURDIR} -biso-8859-2

But it did not help.

I have run into a new problem recently, too. According to
http://www.w3.org/2003/entities/iso8879doc/isolat1.html
the entities &aacute;, &eacute;, etc. are accepted standards in the XML
language, but if I put these entities into an .xsl file, e.g. index.xsl,
the web build fails.

Anyway, I've realized that if I simply write the character ő into the
SGML sources it stays intact, but I don't know how standard and portable
this solution is. I would like to make my work as standard and portable
as it can be.

As for the Russian website, they just type their characters according to
their charset, and I see strange characters in the sources. It definitely
works, but isn't there a more elegant solution? Like &aacute; instead of
á, &eacute; instead of é, etc.

Thanks,
Gabor
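
On the .xsl failure: XML itself predefines only five named entities (&lt;, &gt;, &amp;, &apos; and &quot;). Names such as &aacute; have to be declared in a DTD before an XML parser accepts them, which may be why the build rejects them in index.xsl. Numeric character references need no declaration. The sketch below shows one way to turn typed characters into numeric references. The helper name `to_numeric_refs` is hypothetical, and whether the output suits the FreeBSD web build, in particular the SGML normalization step discussed above, is an assumption this example does not verify.

```python
# Minimal sketch: replace every non-ASCII character with a decimal numeric
# character reference, leaving plain ASCII text untouched.
def to_numeric_refs(text: str) -> str:
    # "xmlcharrefreplace" is a standard Python error handler for encoding.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_numeric_refs("Kövesdán Gábor"))
    print(to_numeric_refs("ő"))
```

For example, ő (U+0151) becomes &#337; and á (U+00E1) becomes &#225;.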