Date:      Fri, 28 Aug 1998 10:42:29 -0700 (PDT)
From:      Julian Elischer <julian@whistle.com>
To:        jallison@engr.sci.com
Cc:        archie@whistle.com, freebsd-hackers@FreeBSD.ORG
Subject:   Re: Warning: Change to netatalk's file name handling (fwd)
Message-ID:  <Pine.BSF.3.95.980828104149.3854I-100000@current1.whistle.com>


---------- Forwarded message ----------
Date: Fri, 28 Aug 1998 00:36:04 +0200 (CEST)
From: Stefan Bethke <stb@hanse.de>
To: Terry Lambert <tlambert@primenet.com>
Cc: archie@whistle.com, freebsd-hackers@FreeBSD.ORG
Subject: Re: Warning: Change to netatalk's file name handling

On Thu, 27 Aug 1998, Terry Lambert wrote:

> > > Netatalk seems like the wrong place to modify behavior to solve this
> > > problem, which is a display problem, not an encoding problem.
> > 
> > Where is the encoding defined for character values in the ranges between
> > \0x01 to \0x1f, and \0x7f to \0xff in terms of UFS, POSIX, whatever?
> 
> ISO 8859?

Is this a standardized encoding for POSIX file names, or just a
convention? If it is only a convention, what will users of non-Latin scripts
think about it? How do we discriminate between different 8859 encodings?
(Yeah, I see your point about "locales".)

> > If you were right, it would be OK for afpd to store all chars literally.
> > While this does work, it is definitely awkward to work with in the shell,
> > and possibly with other applications such as Samba as well. It's not
> > merely a display issue; it's an interoperability issue. I feel that too
> > many things expect file names to be confined to printable ASCII, and unless
> > this changes, I opt to fix what in my eyes is an obvious bug in afpd (that
> > is, escaping \0x80 to \0xff, but leaving \0x01 to \0x1f and \0x7f
> > untouched).
> 
> Per interoperability: This presumes, incorrectly, that Macs support
> the same idiotic idea of code pages as SAMBA must.

Macs, in this sense, use a single "code page." I believe there is an escape
mechanism to change the encoding to non-latin scripts, but I will have to
look that up in Inside Mac. For AFP 2.1 (which netatalk claims to support to
the extent the Macs use it), there is a single encoding defined, without any
escape mechanism.

> > It won't change anything for the worse; the only problem is that existing
> > files with file names containing control characters (custom icons on folders
> > probably being the single source of such names) will stop working and will
> > need manual assistance from an operator.
> 
> It will break a number of things.  It already breaks the file name
> length limitation in SAMBA.  Duplicating this break into Appletalk is,
> IMO, a bad idea.

I don't know much about SMB/CIFS/Samba. What is the filename length limit
(as opposed, possibly, to the pathname limit)?

AFP has a filename length limit of 31 bytes/chars. All Unix-based AFP
servers I know of choose to drop files with longer names. Also, at least
two commercial products use the same mechanism for escaping non-ASCII chars.

> If you are going to push this hard, you should consider International
> representation of file names by client locale, and how it is already
> handled.

Would you mind pointing me to any information shedding light on
standardisation efforts for file name representation? In terms of "locale",
would this mean that "Mac" or "AFP" would be its own locale in terms of
file name character encoding?

After all, I see three possible ways:

- improve interoperability by confining file names to printable ASCII (or
  ISO-8859-1, or...) and not escaping other glyphs, thus breaking AFP
  conformance;

- escaping all glyphs (or rather their encoding) in a way that preserves the
  full AFP filename encoding space (for filenames, this is 0x01 to 0xff,
  with ":" being illegal as it is the path delimiter), but using printable
  ASCII where possible (this is, I believe, what netatalk tries to do, but
  doesn't, due to a stupid bug).

- translate the AFP filename encoding space into some larger glyph encoding
  space, such as Unicode, or, more specifically, UTF-8.
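
As a rough illustration of the second option, here is a minimal Python
sketch. It is not netatalk's actual code: the function names and the escape
format (":" followed by two hex digits) are assumptions made for the
example. ":" is used as the escape introducer only because it cannot occur
in an AFP file name; "/" is escaped as well because it is the Unix path
delimiter.

```python
def escape_afp_name(raw: bytes) -> str:
    """Map an AFP file name (bytes 0x01-0xff) to printable ASCII.

    Hypothetical escape format: ':' followed by two lowercase hex digits.
    """
    out = []
    for b in raw:
        # Printable ASCII passes through, except '/' (Unix path delimiter)
        # and ':' (reserved here as the escape introducer).
        if 0x20 <= b <= 0x7e and b not in (0x2f, 0x3a):
            out.append(chr(b))
        else:
            # Control characters, 0x7f and 0x80-0xff are all escaped alike.
            out.append(":%02x" % b)
    return "".join(out)


def unescape_afp_name(name: str) -> bytes:
    """Inverse of escape_afp_name for names produced by it."""
    out = bytearray()
    i = 0
    while i < len(name):
        if name[i] == ":":
            out.append(int(name[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(name[i]))
            i += 1
    return bytes(out)
```

With this scheme a folder's custom icon file, whose name ends in a carriage
return, would be stored as "Icon:0d", and every byte from 0x01 to 0xff
survives a round trip.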

The last one probably is the way to go, but this would require (at least to
me) some testimonial that Unicode in general and UTF-8 in particular is the
way to go for file names in FreeBSD. This of course would probably start
other interop problems with NFS and the like, and it would require Samba to
deal with CP bogosities in its own right instead of putting it in the face
of every other app.
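
To make the UTF-8 option concrete, here is a minimal Python sketch. It
assumes that the single AFP encoding is Mac OS Roman (Python's "mac_roman"
codec); the function names are made up for the example. It leaves out the
handling of "/" and of control characters, which would still be needed on a
Unix file system.

```python
def afp_to_utf8(raw: bytes) -> bytes:
    """Translate an AFP file name to UTF-8, assuming Mac OS Roman input."""
    return raw.decode("mac_roman").encode("utf-8")


def utf8_to_afp(name: bytes) -> bytes:
    """Translate back; raises UnicodeEncodeError for characters that have
    no Mac OS Roman equivalent (e.g. names created by non-Mac clients)."""
    return name.decode("utf-8").encode("mac_roman")
```

The reverse direction is where the interop problems show up: a name created
over NFS or Samba may contain characters that cannot be represented for the
Mac client at all, so the server needs a policy for those.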

> Novell servers are another case where the server assumes all clients
> exist in a given locale; this would be a mistake to buy into...

Yep.

Cheers,
Stefan

--
Stefan Bethke
Muehlendamm 12            Phone: +49-40-256848, +49-177-3504009
D-22087 Hamburg           <stefan.bethke@hanse.de>
Hamburg, Germany          <stb@freebsd.org>


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message




