Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 25 Jun 2015 22:07:44 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Bryan Drewery <bdrewery@freebsd.org>
Cc:        Bruce Evans <brde@optusnet.com.au>, src-committers@freebsd.org,  svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r284163 - head/bin/cp
Message-ID:  <20150625203730.F937@besplex.bde.org>
In-Reply-To: <558B745C.8040305@FreeBSD.org>
References:  <201506081924.t58JOJQw095752@svn.freebsd.org> <20150609152946.Y935@besplex.bde.org> <558B745C.8040305@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 24 Jun 2015, Bryan Drewery wrote:

> On 6/9/2015 1:28 AM, Bruce Evans wrote:
>> On Mon, 8 Jun 2015, Bryan Drewery wrote:
>>
>>> Log:
>>>  Cleanup some style(9) issues.
>>>
>>>  - Whitespace.
>>>  - Comments.
>>>  - Wrap long lines.
>>
>> cp's style had a remarlable amount of bitrot.
>>
>> This change unimproves it in some places.
>
> I have been traveling and packing. I'm replying now but won't have time
> to address the issues until next week. I was trying to avoid doing any
> of this but touched code which was horrendously misstyled and chained
> into reindenting the whole file and doing it wrong :). At this point I
> don't want to tweak this much more.

Thanks for replying.

>>> @@ -245,10 +245,10 @@ main(int argc, char *argv[])
>>>             type = FILE_TO_FILE;
>>>
>>>         if (have_trailing_slash && type == FILE_TO_FILE) {
>>> -            if (r == -1)
>>> +            if (r == -1) {
>>
>> This adds excessive braces.
>>
>>>                 errx(1, "directory %s does not exist",
>>> -                     to.p_path);
>>> -            else
>>> +                    to.p_path);
>>> +            } else
>
> It is a multi-line statement due to the hard 80-width wrap. I feel it is
> fine in this case.

OK.  The mail mangled all the tabs so this is hard to see now.

>>> @@ -379,7 +379,8 @@ copy(char *argv[], enum op type, int fts
>>>                 mode = curr->fts_statp->st_mode;
>>>                 if ((mode & (S_ISUID | S_ISGID | S_ISTXT)) ||
>>>                     ((mode | S_IRWXU) & mask) != (mode & mask))
>>> -                    if (chmod(to.p_path, mode & mask) != 0){
>>> +                    if (chmod(to.p_path, mode & mask) !=
>>> +                        0) {
>>>                         warn("chmod: %s", to.p_path);
>>>                         rval = 1;
>>>                     }
>>
>> This changes from a minor misformatting to avoid a long line to even uglier
>> formatting with a split line.
>
> I agree 100%. I did it because of our hard 80-width cut-off. What would
> the proper style be? My inclination would be to wrap at the first comma
> but then it is even more odd. I find our 80-width cut-off to be strange
> when editors/tmux/window manager/etc can resize and wrap long lines already.

I don't mind the hack of omitting the space before the final parentheses.
This gives a line length of exactly 80, which is a little too long.  The
main problem with this is that automatic formatting programs will want to
"fix" it.

I like to wrap at the last possible comma, except for printf()s I usually
wrap after the format string.  Here the last comma is also the first comma.
It is a little too early.

Editors/tmux/window manager/etc cannot wrap long lines already.  Even
indent(1) is clueless about wrapping long lines.  It mostly doesn't
do it.  Otherwise it would want to change more than it does, and make
more messes by getting the wrapping slightly wronger than the original.
How could a mere editor/window manager know:
- C syntax, so as to pick wrapping points like commas
- the project and programmer's preference for wrapping early or late
   in a context-dependent way
- special cases like the above?
Formatting away from the project or current source style to the programmer's
preferred style is relatively easy, but is very difficult to format back
to the original style so as to not change everything for a 1-line real
change.

> Actually I don't see a width restriction in style(9) at all but surely
> we have this rule documented somewhere. My guess is that it is inherited
> by KNF.

cp.c (like most utilities in bin) used to be in KNF.  I think CSRG release
engineers (mainly Bostic?) enforced them a common style (perhaps the
single release engineer's style and not quite KNF).  From the above hack
alone in cp.c, we can infer that the line length limit is 80.  The
default line length limit in indent(1) is 79, but indent(1) doesn't
actually understand line lengths.  Fuller references for the line length
limit are:

     expand 4.4BD-Lite2/usr/src/admin/style/style | grep ... (79+ dots)

This finds 1 line of length 80, and that line is broken (it has garbage
trailing tabs), and a couple of lines of length 79, and none longer.
This files gives KNF rules by example.  Whitespace in it was broken by
converting it to a man page.

     similar greps in kern and pure BSD utilities

>> It is necessary to make such changes if you
>> use indent(1) to generate and check the changes -- otherwise, indent keeps
>
> Do you have an indent configuration I can use?

I use this .indent.pro.

%%%
-TFILE
-Tfd_mask
-Tfd_set
-Tu_char
-Tu_int
-Tu_long
-Tu_short
-ta
-bad -bap -nbbb -nbc -br -nbs -c41 -cd41 -cdb -ce -ci4 -cli0 -d0 -di8 -ndj
-ei -nfc1 -nfcb -i8 -ip8 -l79 -lc77 -ldi0 -nlp -npcs -psl -sc -nsob -nv
%%%

The T directives in this are very incomplete.  The -ta directive is
supposed to handle all typedefs that spelled with a _t suffix.  It
replaces a much longer but even more incomplete list of T directives,
and mostly works (it obviously makes messes if you have some non-typedefs
spelled with a _t suffix).

>>> Modified: head/bin/cp/utils.c
>>> ==============================================================================
>>>
>>> --- head/bin/cp/utils.c    Mon Jun  8 19:13:04 2015    (r284162)
>>> +++ head/bin/cp/utils.c    Mon Jun  8 19:24:18 2015    (r284163)
>>> ...
>>> -/* Small (default) buffer size in bytes. It's inefficient for this to be
>>> - * smaller than MAXPHYS */
>>> +/*
>>> + * Small (default) buffer size in bytes. It's inefficient for this to be
>>> + * smaller than MAXPHYS.
>>> + */
>>
>> Still has unusual sentence break of 1 space.  cp uses normal sentence
>> breaks
>
> I did a minimal effort on comments and didn't clean up grammar or
> breaks. I have not adopted 2 space breaks into my style(9) conformation yet.

The above .indent.pro intentionally turns off most comment formatting, using
directives that I added many years ago.  Otherwise, almost every large
comment gets rewrapped.  I miss these directives the most in gnu indent.
gnu indent is also missing a couple of other critical directives, probably
including the relatively new -ta.  Otherwise, it is a bit smarter than
FreeBSD indent.  It actually understands the line length limit, but without
directives to control the details it tends to make messes enforcing it.

>>> @@ -345,7 +352,7 @@ setfile(struct stat *fs, int fd)
>>>     fdval = fd != -1;
>>>     islink = !fdval && S_ISLNK(fs->st_mode);
>>>     fs->st_mode &= S_ISUID | S_ISGID | S_ISVTX |
>>> -               S_IRWXU | S_IRWXG | S_IRWXO;
>>> +        S_IRWXU | S_IRWXG | S_IRWXO;
>>
>> Here the formatting was reasonable, but it was in gnu style and was hard to
>> maintain since it is not supported by indent(1).  It is still hard to
>> maintain,
>> since it has fancy splitting earlier than necessary to put the S_IS* and
>> S_IR* parts of the expressions on separate lines.  indent(1) cannot
>> reproduce
>> this splitting.  Also, with the normal indentation of the condinuation
>> line,
>> the fancy splitting is not so readable.
>
> I'm do not see how this was proper before or how it is worse now. The
> indentation is tabs and then 4 spaces. I don't see exceptions to this in
> style(9) or in other code.

I said is was improper but more readable before.  It is improper because
of the strict indentation rules.  These often prevent lining things up.
Gnu has different strict indentation rules which often give lining up.
In the above, they give something like:

 	fs->st_mode &= S_ISUID | S_ISGID | S_ISVTX
 		     | S_IRWXU | S_IRWXG | S_IRWXO;

Here the first difference from the rules is that the `|' operator goes
on a new line.  The next difference is that the continuation indent is
to line up this operator under '&='.  Perhaps it is supposed to be
left justified, but I right justified it to line up the terms.  More
usually, the assignment operator is just '=', and there is no choice for
the justifcation of any 1-letter operator under it.

All these examples split the line earlier than necessary, so as to get
3 terms from the RHS on each line.  Except in the strict KNF version,
this also gives lining up of 3 S_IS's with 3 S_IR's.  I was thinking
that there was a bit more lining up than that, but there isn't except
possibly when the macros are expanded to octal.  More lining up would
occur for similar expressions with S_I{R,W,X}{USR,GRP,OTH} -- there are
9 terms, and you might want to arrange them in a 3x3 matrix.

>>> @@ -543,8 +550,10 @@ usage(void)
>>> {
>>>
>>>     (void)fprintf(stderr, "%s\n%s\n",
>>> -"usage: cp [-R [-H | -L | -P]] [-f | -i | -n] [-alpsvx] source_file
>>> target_file",
>>> -"       cp [-R [-H | -L | -P]] [-f | -i | -n] [-alpsvx] source_file
>>> ... "
>>> -"target_directory");
>>> +        "usage: cp [-R [-H | -L | -P]] [-f | -i | -n] [-alpsvx] "
>>> +        "source_file target_file",
>>> +        "       cp [-R [-H | -L | -P]] [-f | -i | -n] [-alpsvx] "
>>> +        "source_file ... "
>>> +        "target_directory");
>>>     exit(EX_USAGE);
>>> }
>>
>> This breaks the careful outdentation and obfuscates the strings.  The
>
> Again, this broke the 80-width limit. I preferred the old way but was
> going on down the 80-width line on my screen fixing violations.
>
> I suggest we update our styles to not require this awful wrapping. It
> makes `grep -r` very difficult when strings are split up. Perhaps I am
> mistaken on the rule but we have a lot of code that needlessly wraps early.

I like outdenting long strings in printfs to column 0.  After all, they
will start in column 0 in the output.  Unfortunately, just the quotes
around them make them start in column 1 and sometimes be longer than the
source line length limit even if they are shorter than the output line
length limit.

This only works well for literal strong.  Complicated formatting directives
tend to be longer in the source than the output, and simple formatting
directives tend to be longer in the output than the source (%jd" may expand
to 20 decimal digits even with only 64-bit intmax_t).  But strings in
usage messages are mostly literal.

Note that the string splitting also bogotifies the very standard style in
usage messages, of starting with

 	(void)fprintf(stderr, "%s\n%s\n",

where there is 1 "%s\n" per line.  With literal strings and even without
C90 string concatenation, there is no need for a separate format string.
This style mainly reduces the source line lengths by 2 characters for \n,
and makes the line structure clearer.  Then any use of string concatenation
makes it unclearer again.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150625203730.F937>