Date: Sun, 4 Nov 2012 13:36:58 -0500
From: grarpamp
To: freebsd-questions@freebsd.org
Subject: Character set conversion, locales, UTF-8, etc

Hi. I think I'm looking for a character conversion tool. I have a few thousand files in a hier.
I believe an app, possibly a Java one, created them while running in en_US.US-ASCII mode, or perhaps some other unidentified locale. Whatever it was, I think it took the binary filename data, interpreted it, and wrote that interpretation to disk (instead of the original binary). So now any other app that looks at the disk, under any locale, gets the names wrong.

So I think I need something that takes stdin from /bin/ls -w (the names in the broken form I have on disk), lets me fiddle with feeding it different locales until I see the right binary representation again, and then emits that binary to stdout, so I can rename the files back to their original binary names on disk and any future app can read them under its own locale.

Does that make sense? I'm very new to character sets and things.

As an aside, why does FreeBSD seem to default to the above locale instead of, say, en_US.UTF-8?
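
To make the conversion idea concrete, here is a rough Python sketch of the kind of round trip I have in mind. It rests on assumptions I have not verified: that the broken names on disk are valid UTF-8, and that the app misread the original bytes under one of a handful of single-byte charsets. The candidate list and the function names are made up for illustration. If the app replaced unknown bytes with "?" instead, the original data is gone and nothing like this can recover it.

```python
# Hypothetical list of charsets the creating app might have assumed.
CANDIDATES = ["latin-1", "cp1252", "koi8-r", "cp437"]


def candidate_originals(disk_name: bytes, on_disk_encoding: str = "utf-8"):
    """Yield (charset, recovered_bytes) for each charset that can undo the misreading.

    disk_name is the filename exactly as stored on disk (raw bytes).
    """
    try:
        text = disk_name.decode(on_disk_encoding)
    except UnicodeDecodeError:
        return  # the name is not valid in the assumed on-disk encoding
    for charset in CANDIDATES:
        try:
            # Re-encoding with the charset the app wrongly assumed
            # gives back the bytes the app originally read.
            yield charset, text.encode(charset)
        except UnicodeEncodeError:
            continue  # this charset cannot represent the text; skip it


def looks_like_utf8(raw: bytes) -> bool:
    """Rough plausibility check: do the recovered bytes decode as UTF-8?"""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # Simulate the damage: UTF-8 bytes misread as Latin-1, then written back as UTF-8.
    broken = "café".encode("utf-8").decode("latin-1").encode("utf-8")
    for charset, raw in candidate_originals(broken):
        print(charset, raw, looks_like_utf8(raw))
```

Once a candidate looks right, the rename itself could be done with os.rename(), which accepts bytes paths, so the recovered binary name can be written back untouched by any locale.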