Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 30 Jul 2010 16:24:59 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        jilles@FreeBSD.org
Cc:        freebsd-bugs@FreeBSD.org, andyf@speednet.com.au
Subject:   Re: bin/54784: find(1) -ls wastes space
Message-ID:  <20100730144702.M2522@delplex.bde.org>
In-Reply-To: <201007292327.o6TNRHAi045973@freefall.freebsd.org>
References:  <201007292327.o6TNRHAi045973@freefall.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 29 Jul 2010 jilles@FreeBSD.org wrote:

> Synopsis: find(1) -ls wastes space
>
> State-Changed-From-To: open->closed
> State-Changed-By: jilles
> State-Changed-When: Thu Jul 29 23:24:55 UTC 2010
> State-Changed-Why:
> I don't think there is a good way to fix this.
> Iterating over all users with getpwent() may be very slow or may not work
> at all in some environments.
> I don't see another way to find the column widths. Buffering all output
> would be rather annoying as well.

It certainly gives unreadable output.  And didn't someone want to bloat
UT_NAMESIZE to 256 or so, so that only users with 400-column terminals
could read the output?

Related bugs:
- find(1) claims that the format is identical to that produced by ls -dgils.
   Actually, the formats differ greatly in whitespace -- the latter produces
   readable output.  ls -Rdigls has the same problem as find -ls.  ls has
   variable column widths for many of the fields including ids.  I think it
   doesn't calculate the maximum in advance, but keeps using the previous
   maximum.
- UT_LINESIZE and <utmp.h> no longer exist, but:
   - the not-unused header <sys/param.h> says that MAXLOGNAME should be
     UT_LINESIZE+1 refers to <utmp.h> for this.
   - the not-unused header <stdio.h> says that L_cuserid is UT_LINESIZE + 1
     and refers to <utmp.h> for this.
   - there are some style bugs in these dead comments.
- ut_user[] in struct utmpx has size 32 bytes (max name length 31?), but
   MAXLOGNAME is still 17.
- find -ls still uses MAXLOGNAME, so its format hasn't been properly
   bloated.
- there are related formatting problems for printing the inode number.
   find -ls uses the fixed format %6lu and a bogus cast to u_long to
   ensure breakage if ino_t is larger than u_long.  ls uses the variable
   format %*lu.  %6lu was more than enough when inodes were 16-bit
   and/or file systems were small, but hasn't been enough for many
   years.  It messes up the alignment of the columns if there is a
   mixture of large and small inode numbers, and you almost might as
   well print ids using %s and accept the misalignment from this too.
- find -ls has many other printf format errors and format-printf errors:

% void
% printlong(char *name, char *accpath, struct stat *sb)
% {
% 	char modep[15];
% 
% 	(void)printf("%6lu %8"PRId64" ", (u_long) sb->st_ino, sb->st_blocks);

- format-printf errors:
   - PRI* shouldn't exist, but is used
   - space after cast
- printf format errors:
   - PRI* cannot be used here (without bogusly casting st_blocks to match it).
     st_blocks has type blkcnt_t, which might differ from int64_t.  blkcnt_t
     is actually still int64_t, and it would not be unreasonable to depend
     on this implementation to avoid using the PRI* mistake, but not to use it.

% 	(void)strmode(sb->st_mode, modep);
% 	(void)printf("%s %3u %-*s %-*s ", modep, sb->st_nlink, MAXLOGNAME - 1,
% 	    user_from_uid(sb->st_uid, 0), MAXLOGNAME - 1,
% 	    group_from_gid(sb->st_gid, 0));

Minor problems with fixed-width format for st_nlink (st_nlink can be as much
as 32767, and when it is >= 1000 it is misformatted).

Honest chumminess with the implementation for st_nlink. nlink_t happens to be
uint16_t so it promotes to u_int.

This still uses MAXLOGNAME.  Since MAXLOGNAME is not even the maximum,
misaligned comumns result if ut_user[] is actually used.  Here instead
of using %s format to get misalignment often or the current format to
get misalignment sometimes, we should probably use a debloated non-maximum
like 8 to get misalignment sometimes.

% 
% 	if (S_ISCHR(sb->st_mode) || S_ISBLK(sb->st_mode))
% 		(void)printf("%3d, %3d ", major(sb->st_rdev),
% 		    minor(sb->st_rdev));

The fixed width of 3 for device numbers will mostly work now, but it
used to cause lots of misalignment in ls output due to sparse encodings
in 32-bit minor numbers.

% 	else
% 		(void)printf("%8"PRId64" ", sb->st_size);

This use of PRI* is not even not wrong, as above.

% 	printtime(sb->st_mtime);

I fear finding too many format errors to look at the sub-functions.

% 	(void)printf("%s", name);
% 	if (S_ISLNK(sb->st_mode))
% 		printlink(accpath);
% 	(void)putchar('\n');
% }

ls doesn't use the PRI* mistake, but it uses the humanize_number() mistake
and associated bogus casts and printf format errors.  E.g., in printsize(),
for the scientificized case it blindly casts the size (which is an off_t)
to int64_t to match humanize_number()'s broken API, and in the decimal
case it uses %*jd format with 2 errors (first it bogusly casts the field
width (which has the bogus type size_t) to the bogus type u_int (the '*'
takes an int), and then it doesn't cast the size to intmax_t to match %jd.

ls uses a sub-function for printing device numbers to reduce the
problems with large and negative device numbers.  The find -ls output
is much futher from having the same format as ls -digls in this cases.
Perhaps similarly in cases involving extensions like likeACLs.  Certainly
similarly when ls -digls is forced to be colorized by an environment
variable :-).

I didn't know that find -ls existed, and would use xargs :-).

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20100730144702.M2522>