Date:      Sun, 18 Oct 2020 15:44:20 -0600
From:      Bob Proulx <bob@proulx.com>
To:        freebsd-questions@freebsd.org
Subject:   Re: sh scripting question
Message-ID:  <20201018144327254822114@bob.proulx.com>
In-Reply-To: <444kmtudmy.fsf@be-well.ilk.org>
References:  <d50ba2c9-617f-6842-ef89-f5933be8f8b3@hotmail.com> <DB8PR06MB64427D88E17F02711EE657A3F6030@DB8PR06MB6442.eurprd06.prod.outlook.com> <20201016113408.16d58d68@archlinux> <DB8PR06MB644292D3C0309B5DADADF69BF6030@DB8PR06MB6442.eurprd06.prod.outlook.com> <24457.35680.223661.203846@jerusalem.litteratus.org> <444kmtudmy.fsf@be-well.ilk.org>

Lowell Gilbert wrote:
> Robert Huff writes:
> > 	OP here.
> > 	The filenames at hand were generated by "find" and - on casual
> > inspection - have no oddities other than the embedded spaces.
> 
> Since find is in use, I think the canonical solution is
> to use "find -print0"..."xargs -0"

When GNU find introduced -print0, together with xargs -0, that
combination became the Best Practice, since it handled arbitrary file
names as efficiently as was possible at the time.

However, find has since gained an even better way!  It is also part of
the POSIX standard and therefore portable across POSIX systems.  Here
is an example.  I will use "ls -ld" as the command just to make it
concrete and perhaps easier to read.

    find . -exec ls -ld {} +

Here there is no need for xargs, since find does the batching itself
internally.  There is no need for -print0 either, since no file names
have to be passed to another process.
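As a rough side-by-side sketch of the two idioms, assuming a find and xargs that support -print0 and -0 (GNU and FreeBSD both do).  The scratch directory and file names here are made up for illustration.

```shell
# Scratch directory with one awkward file name (illustrative only).
tmpdir=$(mktemp -d)
touch "$tmpdir/plain.txt" "$tmpdir/with space.txt"

# Older idiom: NUL-separated names piped to a second process.
count_xargs=$(find "$tmpdir" -type f -print0 | xargs -0 ls -ld | wc -l | tr -d ' ')

# find alone: pathnames go straight into the argument list of ls.
count_exec=$(find "$tmpdir" -type f -exec ls -ld {} + | wc -l | tr -d ' ')

echo "xargs listed $count_xargs files, -exec listed $count_exec files"
rm -rf "$tmpdir"
```

Both forms should list each file exactly once.  The second simply does it without the pipe.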

     -exec utility [argument ...] {} +
             Same as -exec, except that “{}” is replaced with as many
             pathnames as possible for each invocation of utility.  This
             behaviour is similar to that of xargs(1).  The primary always
             returns true; if at least one invocation of utility returns a
             non-zero exit status, find will return a non-zero exit status.
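The exit status behaviour described there can be checked with a small sketch.  It uses the standard "true" and "false" utilities as stand-ins for a real command, and the scratch directory and file name are only for illustration.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/a file.txt"

# 'true' succeeds for every batch, so find itself exits 0.
status_ok=0
find "$tmpdir" -type f -exec true {} + || status_ok=$?

# 'false' fails for its batch, so find reports a non-zero status.
status_fail=0
find "$tmpdir" -type f -exec false {} + || status_fail=$?

echo "with true: $status_ok, with false: $status_fail"
rm -rf "$tmpdir"
```

This matters in scripts that test the result of find, since a failure inside any batch is not silently swallowed.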

For the advocates of -execdir I will mention it here so that they know
I am aware of it too. :-)

And since we are talking about whitespace and arbitrary shell
meta-characters here is an example to show that there is no problem
with them when using find in this way.

    rwp@outrage:~$ mkdir /tmp/junk
    rwp@outrage:~$ cd /tmp/junk
    rwp@outrage:/tmp/junk$ touch " \" \" \" \";.txt \""
    rwp@outrage:/tmp/junk$ find . -type f -exec ls -ld {} +
    -rw-r--r--  1 rwp  wheel  0 Oct 18 14:57 ./ " " " ";.txt "

And now it is time to show off how good 'find' can be, even if this
is a more obscure example.  It is another useful shell programming
idiom and good to know about, since find can invoke any arbitrary
command with its arguments properly quoted as needed.

    rwp@outrage:/tmp/junk$ find . -type f -exec sh -c 'ls -ld "$@"' sh {} +
    -rw-r--r--  1 rwp  wheel  0 Oct 18 14:57 ./ " " " ";.txt "

Any external utility that can be called from the command line can be
called this way with all of the proper quoting in place.

In the above we invoke the standard shell with "sh -c" and pass it a
list of arguments with "{} +".  In the script itself we use "$@" to
ensure that the shell (the sh -c part) passes each argument to the
command as a single, correctly quoted word.  In this case the command
was my example use of "ls -ld".  And that leaves us with the mystery
of the second "sh" on the line, which is due to the use of "sh -c".
It is the command name, which becomes $0, and it is required when
passing arguments to "sh -c".
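The same idiom extends to small inline scripts that handle the files one at a time.  Here is a minimal sketch, with a made-up scratch directory and file names, that loops over "$@" inside the sh -c script and prints each base name in brackets.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/one two.txt" "$tmpdir/three.txt"

# The inner script loops over "$@", so each pathname stays one word.
# The second "sh" fills $0; the pathnames start at $1.
listing=$(find "$tmpdir" -type f -exec sh -c '
    for f in "$@"; do
        printf "[%s]\n" "${f##*/}"
    done
' sh {} + | sort)

printf '%s\n' "$listing"
rm -rf "$tmpdir"
```

The file with the embedded space comes through as a single bracketed name rather than being split in two.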

> But it's not always convenient to use that mechanism, so
> there are other methods that have been used quite a few
> times, depending on the tools the user is comfortable
> with. In my case, it might involve an inline sed
> invocation to add the quoting for the spaces.

Being comfortable with the tools on the workbench is important.  Those
are the tools that will get used.  If they get the job done then
whatever the user is most comfortable with is okay.

But if one wishes to improve one's skills, then learning something
new every so often is even better! :-)

Bob

P.S. Here is a longer description of the command name argument when
using "sh -c".  The "man sh" docs on FreeBSD only say this.

     sh [-/+abCEefhIimnPpTuVvx] [-/+o longname] -c string [name [arg ...]]

It's the "name" field in that synopsis.  However, the man page's
description of -c is less than great, since it does not go into the
detail needed.

     The -c option causes the commands to be read from the string operand
     instead of from the standard input.  Keep in mind that this option only
     accepts a single string as its argument, hence multi-word strings must be
     quoted.

That's sad.  It is missing a description of "name".  The bash manual
does a better job of describing this.

       -c        If the -c option is present, then commands are read from the
                 first non-option argument command_string.  If there are
                 arguments after the command_string, the first argument is
                 assigned to $0 and any remaining arguments are assigned to
                 the positional parameters.  The assignment to $0 sets the
                 name of the shell, which is used in warning and error
                 messages.

And this is backed up by the online standards.

    https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html

    sh -c [-abCefhimnuvx] [-o option]... [+abCefhimnuvx] [+o option]...
       command_string [command_name [argument...]]

    -c Read commands from the command_string operand. Set the value of
    special parameter 0 (see Special Parameters) from the value of the
    command_name operand and the positional parameters ($1, $2, and so
    on) in sequence from the remaining argument operands.

When using sh -c the first argument after the command string is the
command name, which becomes $0, and the subsequent arguments are the
actual arguments ($1, $2, and so on).  So with sh -c one always needs
to specify the command name.  If not, the first actual argument ends
up in $0 and is lost from "$@".  And so that is yet another shell
idiom.

    rwp@outrage:/tmp/junk$ sh -c 'echo "$@"' one two three
    two three

    rwp@outrage:/tmp/junk$ sh -c 'echo "$@"' sh one two three
    one two three

    rwp@outrage:/tmp/junk$ sh -c 'echo arg0="$0" "$@"' one two three
    arg0=one two three

And so when using sh -c we always give it the process name as the
first argument.  Hence the use of "sh" before the "{} +" when using it
with the find utility.
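To see what goes wrong with find when the placeholder is left out, here is a small sketch that counts the positional parameters both ways.  The scratch directory and the three file names are made up for illustration.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/a" "$tmpdir/b" "$tmpdir/c"

# No placeholder: the first pathname becomes $0 and is missing from "$@".
without=$(find "$tmpdir" -type f -exec sh -c 'echo "$#"' {} +)

# With the placeholder "sh": all three pathnames arrive as $1, $2, $3.
with=$(find "$tmpdir" -type f -exec sh -c 'echo "$#"' sh {} +)

echo "without placeholder: $without args, with placeholder: $with args"
rm -rf "$tmpdir"
```

One file silently goes unprocessed in the first form, which is exactly the kind of bug that is hard to notice later.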


