Date:      Sun, 22 Apr 2012 13:06:42 +0200
From:      Polytropon <freebsd@edvax.de>
To:        Matthew Seaman <m.seaman@infracaninophile.co.uk>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: converting UTF-8 to HTML
Message-ID:  <20120422130642.cb5b09c2.freebsd@edvax.de>
In-Reply-To: <4F93E159.7020807@infracaninophile.co.uk>
References:  <20120421055823.GA6788@tinyCurrent> <4F9253D7.7010609@locolomo.org> <4F9278A2.1020301@locolomo.org> <alpine.BSF.2.00.1204210909450.5338@abbf.6qbyyneqvnyhc.pbz> <4F93CC95.5050209@locolomo.org> <4F93E159.7020807@infracaninophile.co.uk>

On Sun, 22 Apr 2012 11:45:45 +0100, Matthew Seaman wrote:
> On 22/04/2012 10:17, Erik Nørgaard wrote:
> > UTF-8 is variable width, ascii characters are stored as single bytes (not
> > sure about iso-8859-1) while other characters are stored as two byte chars.
>
> ascii uses the low 128 values that you can assign to an unsigned char,
> ie. those where the high-order bit is zero.
>
> iso-8859-1 and the various other iso-8859-X character sets fill in the
> remaining 128 characters with various other glyphs useful in latin
> alphabets, so it's still one char per glyph.  Other alphabets (greek,
> cyrillic, etc) have similar one byte-per glyph encodings. But you have
> to know what the encoding is to display the content correctly, and it is
> difficult to mix chunks of text in different encodings in the same document.
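
To make the byte counts above concrete, here is a minimal Python
sketch. The helper names byte_lengths and decode_as are made up for
illustration. It counts how many bytes one character needs in ASCII,
ISO-8859-1 and UTF-8, and shows that the same byte means different
things in different ISO-8859-X sets. Note that UTF-8 is not limited
to two bytes: it uses between one and four bytes per character.

```python
# Sketch only: byte_lengths and decode_as are hypothetical helper names.
ENCODINGS = ("ascii", "iso-8859-1", "utf-8")


def byte_lengths(ch):
    """Map each encoding to the byte count of ch, or None if ch
    cannot be represented in that encoding."""
    lengths = {}
    for enc in ENCODINGS:
        try:
            lengths[enc] = len(ch.encode(enc))
        except UnicodeEncodeError:
            lengths[enc] = None
    return lengths


def decode_as(raw, encoding):
    """Interpret the same raw bytes under a given encoding."""
    return raw.decode(encoding)


if __name__ == "__main__":
    # ASCII letter, Latin-1 letter, euro sign, CJK character, emoji
    for ch in ("A", "ø", "€", "中", "😀"):
        print(repr(ch), byte_lengths(ch))
    # One byte, two readings: Latin-1 versus the Greek set
    print(decode_as(b"\xf8", "iso-8859-1"), decode_as(b"\xf8", "iso-8859-7"))
```

The last line is the "you have to know what the encoding is" problem
in miniature: nothing in the byte itself says which set it belongs to.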

How about the "extended ASCII character set" that has a mixture
of "non-US glyphs" and semi-graphic symbols?

	http://asciiset.com/extended.gif

This default layout isn't tied to a specific encoding, if I
remember correctly, or is it? Accessing the set as seen in the
picture allows using "special characters" from many languages,
such as German umlauts and eszett, Greek gamma and phi,
Danish o-slash, Swedish a-circle and even the yen symbol.
It also has the nice semi-graphic symbols to draw boxes and
backgrounds, as well as card deck symbols or the "lazy L".
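
For the original question of getting such text into HTML, here is a
rough Python sketch. It assumes that the layout in the picture is IBM
code page 437 (that is my guess, not something the picture states) and
uses Python's cp437 codec for it. The function name to_html is made
up. The idea is to decode the bytes with the encoding you believe they
are in, escape the HTML metacharacters, and replace everything outside
ASCII with numeric character references.

```python
import html


def to_html(raw, encoding):
    """Decode raw bytes and return ASCII-only HTML text.

    The caller has to supply the encoding; it cannot be derived
    reliably from the bytes themselves.
    """
    text = raw.decode(encoding)
    # Escape &, < and > first so they survive as literal text.
    escaped = html.escape(text, quote=False)
    # Replace every non-ASCII character with a &#NNN; reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    # u-umlaut, eszett and yen at their assumed CP437 positions
    print(to_html(bytes([0x81, 0xE1, 0x9D]), "cp437"))
    # The same umlaut arriving as UTF-8 gives the same reference
    print(to_html("ü".encode("utf-8"), "utf-8"))
```

The output contains only ASCII, so it displays the same no matter
which character set the web server announces.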

Of course, there are no Arabic or Chinese letters in there,
so it can be seen as a "Roman-derived language" centrism
(targeting Europe and America in the first place). All of
these glyphs are natively supported by graphics cards when
running in text mode, if my assumption is correct. So this
"extended set of capabilities" still is the minimal common
functionality that one can rely on.

(FreeBSD remaps some of the characters in text mode to display
the semi-graphic mouse pointer, so the full set cannot be
used all the time.)



-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


