Date: Sat, 21 Apr 2012 22:07:03 +0200
From: Polytropon <freebsd@edvax.de>
To: Lars Eighner <lars@larseighner.com>
Message-Id: <20120421220703.86683bc9.freebsd@edvax.de>
In-Reply-To: <alpine.BSF.2.00.1204210909450.5338@abbf.6qbyyneqvnyhc.pbz>
References: <20120421055823.GA6788@tinyCurrent> <4F9253D7.7010609@locolomo.org>
	<4F9278A2.1020301@locolomo.org>
	<alpine.BSF.2.00.1204210909450.5338@abbf.6qbyyneqvnyhc.pbz>
Organization: EDVAX
X-Mailer: Sylpheed 3.1.1 (GTK+ 2.24.5; i386-portbld-freebsd8.2)
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-questions@freebsd.org
Subject: Re: converting UTF-8 to HTML

On Sat, 21 Apr 2012 09:10:03 -0500 (CDT), Lars Eighner wrote:
> On Sat, 21 Apr 2012, Erik Nørgaard wrote:
>
> > When characters show up wrong in the user's browser it's usually because the
> > browser is set to use a non-UTF-8 charset by default such as windows-1252,
> > the web server sends the charset=ascii in the http header and there is no or
> > incorrect meta tag to resolve the problem. Non UTF-8 charsets are a leftover
> > from the last millennium that we sometimes still choke on .. sorry the rant ;)
>
> UTF-8 is a waste of storage for most people [...]

Disks and RAM are huge and cheap. Plenty of space that is
going to be used. Nobody cares.


> [...] and is incompatible with
> text-mode tools: it's simply another bid to make it impossible to run
> without a GUI.

Again, nobody cares - until, of course, it's too late and you
need to do some recovery or analytic tasks in a limited
environment or via a connection with limited means.

Regarding the fun of encodings, endianness, representation,
use ("fi" the two letters vs. "ﬁ" the ligature, or "ß" as
the 1-byte ISO-8859-1 sequence vs. "ß" as the two-byte UTF-8
sequence), see the following document:

Matt Mayer: Love Hotels and Unicode
http://www.reigndesign.com/blog/love-hotels-and-unicode/
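
To make the two cases above concrete, here is a minimal Python 3
sketch. It is purely illustrative and not taken from the linked
article; the helper names byte_lengths and fold_compat are made up
for this example. It shows that the sharp s needs one byte in
ISO-8859-1 but two in UTF-8, and that the ligature U+FB01 is not
equal to the two plain letters until it is folded with NFKC
normalization.

```python
import unicodedata


def byte_lengths(text):
    """Return (ISO-8859-1 length, UTF-8 length) of text, in bytes."""
    return len(text.encode("iso-8859-1")), len(text.encode("utf-8"))


def fold_compat(text):
    """Apply NFKC normalization, which replaces compatibility
    characters such as ligatures with their plain equivalents."""
    return unicodedata.normalize("NFKC", text)


sharp_s = "\u00df"   # LATIN SMALL LETTER SHARP S
ligature = "\ufb01"  # LATIN SMALL LIGATURE FI

print(byte_lengths(sharp_s))          # (1, 2)
print(ligature == "fi")               # False
print(fold_compat(ligature) == "fi")  # True
```

Whether folding ligatures is desirable depends on the tool: it
helps for searching and comparing, but it changes the text, so it
is usually not something to apply blindly to stored documents.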

And finally it offers an interesting attack vector, given
the fact that several Unicode characters "look" the same,
but in fact are different. So "two files with the 'same'
name" is a possible means that malware implementers can
utilize to mislead the users.

Short example from MICROS~1 land here:
http://blogs.technet.com/b/mmpc/archive/2011/08/10/can-we-believe-our-eyes.aspx
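
The look-alike problem can be sketched in a few lines of Python 3.
This is only an illustration under assumptions: the file names are
invented, script_prefixes is a hypothetical helper, and taking the
first word of the Unicode character name as the "script" is a crude
heuristic, not a real confusables check. It is also not necessarily
the technique described in the blog post linked above.

```python
import unicodedata


def script_prefixes(name):
    """Collect the first word of each letter's Unicode character
    name (e.g. LATIN, CYRILLIC) as a rough hint at the script."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in name
        if ch.isalpha()
    }


genuine = "report.txt"
# Same appearance in many fonts, but the "o" is
# U+043E CYRILLIC SMALL LETTER O.
spoofed = "rep\u043ert.txt"

print(genuine == spoofed)                # False
print(sorted(script_prefixes(genuine)))  # ['LATIN']
print(sorted(script_prefixes(spoofed)))  # ['CYRILLIC', 'LATIN']
```

A name that mixes scripts is not automatically malicious, but it is
a cheap signal worth flagging when file names come from untrusted
sources.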

But this all doesn't negate the usefulness of Unicode / UTF-8
in general. Especially when you have collaborative settings
with multi-language document processing requirements, it
is a helpful thing, as working with "normal" (ASCII) letters,
Cyrillic ones, Chinese and Japanese symbols, Arabic writing
is no big deal as long as all the tools properly support
it the _same_ way.


-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...