Date: Sun, 18 Oct 2020 15:44:20 -0600
From: Bob Proulx <bob@proulx.com>
To: freebsd-questions@freebsd.org
Subject: Re: sh scripting question
Message-ID: <20201018144327254822114@bob.proulx.com>
In-Reply-To: <444kmtudmy.fsf@be-well.ilk.org>
References: <d50ba2c9-617f-6842-ef89-f5933be8f8b3@hotmail.com>
 <DB8PR06MB64427D88E17F02711EE657A3F6030@DB8PR06MB6442.eurprd06.prod.outlook.com>
 <20201016113408.16d58d68@archlinux>
 <DB8PR06MB644292D3C0309B5DADADF69BF6030@DB8PR06MB6442.eurprd06.prod.outlook.com>
 <24457.35680.223661.203846@jerusalem.litteratus.org>
 <444kmtudmy.fsf@be-well.ilk.org>

Lowell Gilbert wrote:
> Robert Huff writes:
> > OP here.
> > The filenames at hand were generated by "find" and - on casual
> > inspection - have no oddities other than the embedded spaces.
>
> Since find is in use, I think the canonical solution is
> to use "find -print0"..."xargs -0"

After GNU find introduced -print0 and xargs -0, that combination was
for a time the Best Practice, since it handled arbitrary file names as
efficiently as anything available then. Since then, however, find has
gained an even better way! It is also part of the POSIX standard now,
and is therefore a portable way to script it too.

Here is an example. I will use "ls -ld" as the command just to make it
concrete and perhaps easier to read.

    find . -exec ls -ld {} +

Here there is no need for xargs, as find can do it all itself
internally. There is no need for -print0 either, since no file names
have to be handed to another process through a pipe. The find(1) man
page describes it this way.

    -exec utility [argument ...] {} +
            Same as -exec, except that “{}” is replaced with as many
            pathnames as possible for each invocation of utility.
            This behaviour is similar to that of xargs(1). The primary
            always returns true; if at least one invocation of utility
            returns a non-zero exit status, find will return a
            non-zero exit status.

For the advocates of -execdir I will mention it here so that they know
I am aware of it too. :-)

And since we are talking about whitespace and arbitrary shell
meta-characters, here is an example to show that there is no problem
with them when using find in this way.

    rwp@outrage:~$ mkdir /tmp/junk
    rwp@outrage:~$ cd /tmp/junk
    rwp@outrage:/tmp/junk$ touch " \" \" \" \";.txt \""
    rwp@outrage:/tmp/junk$ find . -type f -exec ls -ld {} +
    -rw-r--r-- 1 rwp wheel 0 Oct 18 14:57 ./ " " " ";.txt "

And now it is time to show off how good 'find' can be, even if the
example is an even more obscure one. It is another useful shell
programming idiom and good to know about, since find can invoke any
arbitrary command with its arguments quoted as needed.

    rwp@outrage:/tmp/junk$ find . -type f -exec sh -c 'ls -ld "$@"' sh {} +
    -rw-r--r-- 1 rwp wheel 0 Oct 18 14:57 ./ " " " ";.txt "

Any external utility that can be called from the command line can be
called this way with all of the proper quoting in place. In the above
we invoke the standard shell with "sh -c" and pass it a list of
arguments with "{} +". In the script itself we use "$@" so that the
shell (the sh -c part) passes each argument on to the command as its
own correctly quoted word. In this case the command was my example use
of "ls -ld".

That leaves us with the mystery of the second "sh" on the line, which
is due to the use of "sh -c". It is the command name, and it is
required when passing arguments to "sh -c".

> But it's not always convenient to use that mechanism, so
> there are other methods that have been used quite a few
> times, depending on the tools the user is comfortable
> with. In my case, it might involve an inline sed
> invocation to add the quoting for the spaces.

Being comfortable with the tools on the workbench is important. Those
are the tools that will get used. If they get the job done then
whatever the user is most comfortable with is okay. But if one wishes
to improve their skills then learning something new every so often is
even better! :-)

Bob

P.S. Here is a longer description of the command name argument when
using "sh -c". The "man sh" docs on FreeBSD only say this.

    sh [-/+abCEefhIimnPpTuVvx] [-/+o longname] -c string [name [arg ...]]

It's the "name" field in that synopsis. However, what the rest of the
sh man page says about -c is less than great, since it does not go
into the detail needed.

    The -c option causes the commands to be read from the string
    operand instead of from the standard input. Keep in mind that this
    option only accepts a single string as its argument, hence
    multi-word strings must be quoted.

That's sad. The description of "name" is missing there.

The bash manual does a better job of describing this.

    -c    If the -c option is present, then commands are read from the
          first non-option argument command_string. If there are
          arguments after the command_string, the first argument is
          assigned to $0 and any remaining arguments are assigned to
          the positional parameters. The assignment to $0 sets the
          name of the shell, which is used in warning and error
          messages.

And this is backed up by the online standard.

    https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html

    sh -c [-abCefhimnuvx] [-o option]... [+abCefhimnuvx] [+o option]...
        command_string [command_name [argument...]]

    -c    Read commands from the command_string operand. Set the value
          of special parameter 0 (see Special Parameters) from the
          value of the command_name operand and the positional
          parameters ($1, $2, and so on) in sequence from the
          remaining argument operands.

When using sh -c the first argument after the command string is the
command name, which the shell assigns to $0, and the subsequent
arguments are the actual arguments. So whenever arguments are passed
to sh -c one always needs to specify the command name first. If not,
the first actual argument is taken as $0 and is lost from "$@". And so
that is yet another shell idiom.

    rwp@outrage:/tmp/junk$ sh -c 'echo "$@"' one two three
    two three
    rwp@outrage:/tmp/junk$ sh -c 'echo "$@"' sh one two three
    one two three
    rwp@outrage:/tmp/junk$ sh -c 'echo arg0="$0" "$@"' one two three
    arg0=one two three

And so when passing arguments to sh -c we always give it the command
name as the first argument. Hence the use of "sh" before the "{} +"
when using it with the find utility.
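
To tie the two idioms together, here is a small sketch (not part of
the original session) that counts how many pathnames reach "$@" with
and without the command name. It assumes mktemp(1) is available, which
is common but not specified by POSIX, and the scratch directory
template and file names are made up for illustration.

```shell
#!/bin/sh
# Illustrative sketch only: the directory template and the file names
# are hypothetical. mktemp(1) is assumed to exist; it is not in POSIX.

dir=$(mktemp -d "${TMPDIR:-/tmp}/findexec.XXXXXX") || exit 1

# Three files, two of them with awkward names.
touch "$dir/plain.txt" "$dir/with space.txt" "$dir/semi;colon \"quoted\".txt"

# No command name: the first pathname is taken as $0, so "$@" is one short.
without_name=$(find "$dir" -type f -exec sh -c 'echo "$#"' {} +)

# With "sh" as the command name, every pathname arrives in "$@".
with_name=$(find "$dir" -type f -exec sh -c 'echo "$#"' sh {} +)

echo "without command name: $without_name pathnames in \"\$@\""
echo "with command name:    $with_name pathnames in \"\$@\""

# Each pathname stays one argument, spaces and quotes included.
find "$dir" -type f -exec sh -c 'for f in "$@"; do printf "<%s>\n" "$f"; done' sh {} +

rm -rf "$dir"
```

With three files in the directory the first count comes out as 2 and
the second as 3, which is the same lost-first-argument effect as in
the "one two three" session above.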