Date:      Fri, 20 May 2011 13:12:03 +0200
From:      Frank Bonnet <f.bonnet@esiee.fr>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        freebsd-apache@freebsd.org
Subject:   Re: Where to define HTTP_ACCEPT_LANGUAGE=fr-fr ???
Message-ID:  <4DD64C83.1070903@esiee.fr>
In-Reply-To: <20110520103725.GA19494@icarus.home.lan>
References:  <4DD624E4.5000408@esiee.fr> <20110520092755.GA18041@icarus.home.lan> <4DD63698.3030907@esiee.fr> <20110520103725.GA19494@icarus.home.lan>

On 05/20/2011 12:37 PM, Jeremy Chadwick wrote:

stuff deleted
OK Jeremy, thank you for your complete and solid technical
answer. I'm going to check all your recommendations, then
let you know if it has worked.

Thanks again.

Frank


>                                             here is the problem
> This looks like a character set issue of the browser vs. the filename on
> the server.  Specifically: the browser is requesting to download a
> filename that's in utf-8 (Unicode), while what's on the actual server is
> a filename encoded in iso-8859-1.
>
> I'm also making the assumption the letter which shows up in your Email
> above is actually the "é" character (latin small letter e with an
> acute (raising) accent above it).  I hope the below examples therefore
> render correctly for you.
>
> Let me explain the two differences:
>
> utf-8
> =======
> - Filename (visually):  11_EE_APP_FE_CV_CISSE_Kalissé.docx
> - Filename (literally): 11_EE_APP_FE_CV_CISSE_Kaliss<0xc3><0xa9>.docx
> - Filename (as URL):    11_EE_APP_FE_CV_CISSE_Kaliss%C3%A9.docx
>
> iso-8859-1
> ============
> - Filename (visually):  11_EE_APP_FE_CV_CISSE_Kalissé.docx
> - Filename (literally): 11_EE_APP_FE_CV_CISSE_Kaliss<0xe9>.docx
> - Filename (as URL):    11_EE_APP_FE_CV_CISSE_Kaliss%E9.docx
>
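
The two URL forms above can be reproduced with a small Python sketch. This is only an illustration, not something from Jeremy's mail: it assumes Python 3 and the standard urllib.parse module, and the variable names are made up for the example.

```python
from urllib.parse import quote

# The filename from the examples above; \u00e9 is "e with acute accent".
name = "11_EE_APP_FE_CV_CISSE_Kaliss\u00e9.docx"

# Percent-encode the same name under each character set.
utf8_url = quote(name, encoding="utf-8")          # two bytes: 0xC3 0xA9
latin1_url = quote(name, encoding="iso-8859-1")   # one byte:  0xE9

print(utf8_url)    # 11_EE_APP_FE_CV_CISSE_Kaliss%C3%A9.docx
print(latin1_url)  # 11_EE_APP_FE_CV_CISSE_Kaliss%E9.docx
```

The same visible character therefore produces two different URLs, and only the one matching the bytes stored on disk will find the file.
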
> URLs, per official RFC 1738, with regards to iso-8859-1, do not permit
> characters above 0x7f to make it into the URL.  So, technically
> speaking, the URL of:
>
> http://somesite/11_EE_APP_FE_CV_CISSE_Kalissé.docx
>
> Should fail or not work.  Some browsers may try and "be smart" and turn
> the accented small e character into %E9, which would then become:
>
> http://somesite/11_EE_APP_FE_CV_CISSE_Kaliss%E9.docx
>
> Which would work just fine.
>
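
To see which of the two encodings a given request actually used, something like the following could be run against the request path from the access log. It is a rough heuristic sketch under my own assumptions (Python 3, a hypothetical helper name decode_request_path), not a feature of Apache: it tries strict utf-8 first and falls back to iso-8859-1, so an iso-8859-1 byte sequence that happens to be valid utf-8 would be misreported.

```python
from urllib.parse import unquote_to_bytes

def decode_request_path(path):
    """Return (decoded_name, guessed_charset) for a percent-encoded path.

    Hypothetical helper: strict utf-8 is tried first; anything that is
    not valid utf-8 is treated as iso-8859-1, which accepts every byte.
    """
    raw = unquote_to_bytes(path)  # undo the %XX escapes, keep raw bytes
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1"), "iso-8859-1"

print(decode_request_path("11_EE_APP_FE_CV_CISSE_Kaliss%C3%A9.docx"))
print(decode_request_path("11_EE_APP_FE_CV_CISSE_Kaliss%E9.docx"))
```

Both calls decode to the same visible name, but the second element of the result shows which byte form the browser sent.
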
> I'm not sure that HTTP_ACCEPT_LANGUAGE would fix this problem.
>
> If you have a CGI, PHP script, web software, etc. which is generating
> filenames and things like that, and is using utf-8 as its character set
> (meaning either via an HTTP header or via an HTML <meta http-equiv> tag),
> then that's going to mess things up.  You need to be using the
> iso-8859-1 character set instead.  A good browser will be able to show
> you what character set the page shows up as.
>
> What's the alternative?  Simple: you start using utf-8 in your
> filenames.  I should note, however, that FreeBSD (including 8.2-STABLE)
> does not have very good Unicode support.  It's hit-or-miss, and using
> things like LANG/LC_CTYPE results in some serious problems with utilities
> that rely on locale(7).  So, I would be very careful going this route on
> FreeBSD.
>
> The short version is this: if you're going to use utf-8, you need to use
> it absolutely 100% of the time.  You cannot reliably mix-match character
> sets like that.
>
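
If the filenames do get moved to utf-8, the conversion of one name could look roughly like this. Again a sketch under assumptions: Python 3, names handled as raw bytes, every non-utf-8 name assumed to be iso-8859-1, and latin1_name_to_utf8 is a made-up helper name. It only computes the new name and renames nothing.

```python
def latin1_name_to_utf8(raw_name: bytes) -> bytes:
    """Return raw_name re-encoded as utf-8.

    Names that are already valid utf-8 (plain ASCII included) are
    returned unchanged; anything else is assumed to be iso-8859-1.
    """
    try:
        raw_name.decode("utf-8")
        return raw_name
    except UnicodeDecodeError:
        return raw_name.decode("iso-8859-1").encode("utf-8")

old = b"11_EE_APP_FE_CV_CISSE_Kaliss\xe9.docx"
new = latin1_name_to_utf8(old)
print(old, "->", new)
```

In Python, os.listdir() called with a bytes path returns names as bytes, so those could be fed to this helper and any name that changes could be passed to os.rename(); I would try that on a copy of the directory first.
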
> Hope this helps.
>



