From: Polytropon
To: grarpamp
Cc: freebsd-questions@freebsd.org
Date: Mon, 5 Nov 2012 02:27:45 +0100
Subject: Re: Character set conversion, locales, UTF-8, etc

On Sun, 4 Nov 2012 13:36:58 -0500, grarpamp wrote:
> As an aside, why does FreeBSD seem to default to the above locale
> instead of say, en_US.UTF-8 ?

FreeBSD's file system does not default to any locale, as far as
I know. The system is "agnostic" to what the characters in a file
name mean or what symbol they should represent. What you see on
your screen is up to the console font, the terminal emulator and
its font rendering.
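To illustrate the "agnostic" part: a file name is just a sequence of bytes, and whatever displays it decides how to read them. A minimal Python sketch, assuming a name that was written as UTF-8 (the function name interpretations is made up for this example):

```python
def interpretations(raw_name: bytes) -> dict:
    """Decode the same file-name bytes under several assumed encodings."""
    return {
        # Undecodable bytes become U+FFFD instead of raising an error.
        "utf-8": raw_name.decode("utf-8", errors="replace"),
        # ISO-8859-1 maps every byte to a character, so this never fails.
        "iso-8859-1": raw_name.decode("iso-8859-1"),
        "ascii": raw_name.decode("ascii", errors="replace"),
    }


# The German word "für" as a UTF-8 terminal would store it:
# 'f', then the two bytes 0xC3 0xBC for the umlaut, then 'r'.
raw = b"f\xc3\xbcr"
for encoding, text in interpretations(raw).items():
    print(f"{encoding:>10}: {text!r}")
```

The bytes on disk never change; only the reading does. Read as ISO-8859-1, the two umlaut bytes turn into two separate characters, which is the kind of "funny characters" effect a mismatched locale produces.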
In text mode, this is limited and typically restricted to the fonts
included with the system, which have to match the LC_ settings
(e.g. de_DE.ISO8859-1 plus iso-8x8/14/16 if you want German
characters like umlauts and eszett). There is no real UTF-8 support
on the console. For example, files with Chinese characters will
show up as ??????????????.

In X, with a "different than expected" locale, "funny characters"
will typically appear, like A~.1/4..X=B0 upside-down question mark. :-)

That being said, it's up to the application programs (even if it's
just the terminal emulator displaying the output of ls) to deal
with multibyte sequences. They are _valid_ in file names. The many
problems they cause should make programmers pay attention to whether
and how to use them. :-)

There isn't much you can do at the file system level except renaming
the files: write a program that reads the file names according to
the preferred interpretation and writes new, more portable names
for them (e.g. by translating "problematic" characters into less
problematic ones).

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
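
The renaming program described above could be sketched roughly as follows in Python. This is only one possible approach under stated assumptions: the names portable_name and rename_in_directory are invented for this example, the "preferred interpretation" is passed in as assumed_encoding, and "less problematic" is taken to mean ASCII letters, digits, dot, underscore and hyphen.

```python
import os
import unicodedata


def portable_name(raw: bytes, assumed_encoding: str = "utf-8") -> str:
    """Translate file-name bytes into a conservative ASCII-only name."""
    text = raw.decode(assumed_encoding, errors="replace")
    # NFKD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent, keep the base letter
        if ch.isascii() and (ch.isalnum() or ch in "._-"):
            out.append(ch)
        else:
            out.append("_")  # anything else counts as "problematic" here
    return "".join(out)


def rename_in_directory(directory: bytes,
                        assumed_encoding: str = "utf-8",
                        dry_run: bool = True) -> list:
    """Return (old, new) proposals; rename only when dry_run is False."""
    plan = []
    # A bytes path makes os.listdir return the raw, undecoded names.
    for raw in os.listdir(directory):
        new = portable_name(raw, assumed_encoding).encode("ascii")
        if new == raw:
            continue
        plan.append((raw, new))
        target = os.path.join(directory, new)
        # Existing targets are skipped rather than overwritten.
        if not dry_run and not os.path.exists(target):
            os.rename(os.path.join(directory, raw), target)
    return plan
```

Running it with dry_run=True first shows what would change without touching anything. Note that this simple mapping turns characters without a decomposition (eszett, Chinese characters) into underscores; a real tool would probably want a translation table for those.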