From owner-freebsd-arch@FreeBSD.ORG Sun May 15 09:24:37 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CD11D16A4CE; Sun, 15 May 2005 09:24:37 +0000 (GMT) Received: from smtp2.server.rpi.edu (smtp2.server.rpi.edu [128.113.2.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4C16643DBF; Sun, 15 May 2005 09:24:37 +0000 (GMT) (envelope-from gad@FreeBSD.org) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by smtp2.server.rpi.edu (8.13.0/8.13.0) with ESMTP id j4F9OZ6Y014161; Sun, 15 May 2005 05:24:36 -0400 Mime-Version: 1.0 Message-Id: In-Reply-To: <4284D038.50805@FreeBSD.org> References: <200410020349.i923nG8v021675@northstar.hetzel.org> <20041002052856.GE17792@nexus.dglawrence.com> <20041002233542.GL714@nexus.dglawrence.com> <421DAD8F.6000704@portaone.com> <4284D038.50805@FreeBSD.org> Date: Sun, 15 May 2005 05:24:35 -0400 To: Maxim Sobolev From: Garance A Drosehn Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-CanItPRO-Stream: default X-RPI-SA-Score: undef - spam-scanning disabled X-Scanned-By: CanIt (www . canit . ca) on 128.113.2.2 cc: freebsd-arch@FreeBSD.org Subject: Re: Bug in #! processing - One More Time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 15 May 2005 09:24:38 -0000 At 7:05 PM +0300 5/13/05, Maxim Sobolev wrote: >Attached please find patch which rips any special processing >of command line arguments. It should put FreeBSD into the very >same ship with the rest of unices and linuces out there. > >...This corresponds to the case (3) from the Garance's original >message. > >I've run this change through buildworld. Fwiw, this is the patch that I have been working with for the same purpose. I might trim down the comments some more. This includes a kludge to recognize '#!#<' as a way to trigger the historical parsing behavior. I have another version which also recognizes '#!#+' as a way to trigger a whole bunch of debug messages, but that's probably of no interest to anyone but me. This patch is also at: http://people.freebsd.org/imgact_shell.diff and the file you'd end up with by applying the patch is at: http://people.freebsd.org/imgact_shell.c This has installed and tested in i386 and powerpc, and sometime on Sunday I'll have it installed on a sparc64 machine. (it's building now, but my ultra10 machine is much much slower than my i386 and my Mac-mini...). Index: imgact_shell.c =================================================================== RCS file: /home/ncvs/src/sys/kern/imgact_shell.c,v retrieving revision 1.32 diff -u -r1.32 imgact_shell.c --- imgact_shell.c 25 Feb 2005 10:17:53 -0000 1.32 +++ imgact_shell.c 15 May 2005 09:07:13 -0000 @@ -36,6 +36,15 @@ #include #include +#define KEEP_OLDCODE 1 +#if BYTE_ORDER == LITTLE_ENDIAN /* temp for OLD_CODE kludge */ +#define DBG_MAGIC 0x2B23 /* #+ in "little-endian" */ +#define OLD_MAGIC 0x3C23 /* #< */ +#else +#define DBG_MAGIC 0x232B /* #+ in big-endian */ +#define OLD_MAGIC 0x233C /* #< */ +#endif + #if BYTE_ORDER == LITTLE_ENDIAN #define SHELLMAGIC 0x2123 /* #! */ #else @@ -43,15 +52,66 @@ #endif /* - * Shell interpreter image activator. An interpreter name beginning - * at imgp->args->begin_argv is the minimal successful exit requirement. + * At the time of this writing, MAXSHELLCMDLEN == PAGE_SIZE. This is + * significant because the caller has only mapped in one page of the + * file we're reading. This code should be changed to know how to + * read in the second page, but I'm not doing that just yet... + */ +#if MAXSHELLCMDLEN > PAGE_SIZE +#error "MAXSHELLCMDLEN is larger than a single page!" +#endif + +/** + * Shell interpreter image activator. An interpreter name beginning at + * imgp->args->begin_argv is the minimal successful exit requirement. + * + * If the given file is a shell-script, then the first line will start + * with the two characters `#!' (aka SHELLMAGIC), followed by the name + * of the shell-interpreter to run, followed by zero or more tokens. + * + * If there are *any* tokens, then we start up the interpreter such that + * it will see: + * arg[0] -> The name of interpreter as specified after `#!' in the + * first line of the script. The interpreter name must + * not be longer than MAXSHELLCMDLEN bytes. + * arg[1] -> *If* there are any additional tokens on the first line, + * then we add a new arg[1], which is a copy of the rest of + * that line. The copy starts at the first token after the + * interpreter name. We leave it to the interpreter to + * parse the tokens in that value. + * arg[x] -> the full pathname of the script. This will either be + * arg[2] or arg[1], depending on whether or not tokens + * were found after the interpreter name. + * arg[x+1] -> whatever arguments that were specified on the original + * command line (if the user had specified any). + * + * This processing is described in the execve(2) man page. + */ + +/* + * HISTORICAL NOTE: From 1993 to mid-2005, FreeBSD parsed out the tokens as + * found on the first line of the script, and setup each token as a separate + * value in arg[]. This extra processing did not match the behavior of other + * OS's, and caused a few subtle problems. For one, it meant the kernel was + * deciding how those values should be parsed (wrt characters for quoting or + * comments, etc), while the interpreter might have other rules for parsing. + * It also meant the interpreter had no way of knowing which arguments came + * from the first line of the shell script, and which arguments were specified + * by the user on the command line. + * + * Luckily, only few things in the base system would notice that non-standard + * processing (mainly /bin/sh and /usr/bin/env). And for programs which are + * not in the base system, the "newer" behavior matches how NetBSD, OpenBSD, + * Linux, Solaris, AIX, IRIX, and some other Unixes have always setup the + * arg-list for the interpreter. So if a program can handle this behavior on + * any of those other OS's, it should be able to handle it for FreeBSD too. */ int exec_shell_imgact(imgp) struct image_params *imgp; { const char *image_header = imgp->image_header; - const char *ihp; + const char *ihp, *interpb, *interpe, *maxp, *optb, *opte; int error, offset; size_t length, clength; struct vattr vattr; @@ -79,14 +139,34 @@ if (error) return (error); - clength = (vattr.va_size > MAXSHELLCMDLEN) ? - MAXSHELLCMDLEN : vattr.va_size; + /* + * Copy shell name and arguments from image_header into a string + * buffer. Remember that the caller has mapped only the + * first page of the file into memory. + */ + clength = (vattr.va_size > PAGE_SIZE) ? PAGE_SIZE : vattr.va_size; + + maxp = &image_header[clength]; + ihp = &image_header[2]; +#if KEEP_OLDCODE + /* + * XXX - Temporarily provide a quick-and-dirty way to get the + * older, non-standard option-parsing behavior, just in case + * someone finds themselves in an emergency where they need it. + * This will not be documented. It is only for initial testing. + */ + if (*(const short *)ihp == OLD_MAGIC) + ihp += 2; + else + goto new_code; + interpb = ihp; + /* * Figure out the number of bytes that need to be reserved in the * argument string to copy the contents of the interpreter's command * line into the argument string. */ - ihp = &image_header[2]; + ihp = interpb; offset = 0; while (ihp < &image_header[clength]) { /* Skip any whitespace */ @@ -152,7 +232,7 @@ * Loop through the interpreter name yet again, copying as * we go. */ - ihp = &image_header[2]; + ihp = interpb; offset = 0; while (ihp < &image_header[clength]) { /* Skip whitespace */ @@ -174,8 +254,96 @@ imgp->args->begin_argv[offset++] = '\0'; imgp->args->argc++; } + goto common_end; +new_code: +#endif + /* + * Find the beginning and end of the interpreter_name. If the + * line does not include any interpreter, or if the name which + * was found is too long, we bail out. + */ + while (ihp < maxp && ((*ihp == ' ') || (*ihp == '\t'))) + ihp++; + interpb = ihp; + while (ihp < maxp && ((*ihp != ' ') && (*ihp != '\t') && (*ihp != '\n') + && (*ihp != '\0'))) + ihp++; + interpe = ihp; + if (interpb == interpe) + return (ENOEXEC); + if ((interpe - interpb) >= MAXSHELLCMDLEN) + return (ENAMETOOLONG); + + /* + * Find the beginning of the options (if any), and the end-of-line. + * Then trim the trailing blanks off the value. Note that some + * other operating systems do *not* trim the trailing whitespace... + */ + while (ihp < maxp && ((*ihp == ' ') || (*ihp == '\t'))) + ihp++; + optb = ihp; + while (ihp < maxp && ((*ihp != '\n') && (*ihp != '\0'))) + ihp++; + opte = ihp; + while (--ihp > interpe && ((*ihp == ' ') || (*ihp == '\t'))) + opte = ihp; /* + * We need to "pop" (remove) the present value of arg[0], and "push" + * either two or three new values in the arg[] list. To do this, + * we first shift all the other values in the `begin_argv' area to + * provide the exact amount of room for the values added. Set up + * `offset' as the number of bytes to be added to the `begin_argv' + * area, and 'length' as the number of bytes being removed. + */ + offset = interpe - interpb + 1; /* interpreter */ + if (opte != optb) /* options (if any) */ + offset += opte - optb + 1; + offset += strlen(imgp->args->fname) + 1; /* fname of script */ + length = (imgp->args->argc == 0) ? 0 : + strlen(imgp->args->begin_argv) + 1; /* bytes to delete */ + + if (offset - length > imgp->args->stringspace) + return (E2BIG); + + bcopy(imgp->args->begin_argv + length, imgp->args->begin_argv + offset, + imgp->args->endp - (imgp->args->begin_argv + length)); + + offset -= length; /* calculate actual adjustment */ + imgp->args->begin_envv += offset; + imgp->args->endp += offset; + imgp->args->stringspace -= offset; + + /* + * If there was no arg[0] when we started, then the interpreter_name + * is adding an argument (instead of replacing the arg[0] we started + * with). And we're always adding an argument when we include the + * full pathname of the original script. + */ + if (imgp->args->argc == 0) + imgp->args->argc = 1; + imgp->args->argc++; + + /* + * The original arg[] list has been shifted appropriately. Copy in + * the interpreter name and options-string. + */ + length = interpe - interpb; + bcopy(interpb, imgp->args->buf, length); + *(imgp->args->buf + length) = '\0'; + offset = length + 1; + if (opte != optb) { + length = opte - optb; + bcopy(optb, imgp->args->buf + offset, length); + *(imgp->args->buf + offset + length) = '\0'; + offset += length + 1; + imgp->args->argc++; + } + +#if KEEP_OLDCODE +common_end: +#endif + /* * Finally, add the filename onto the end for the interpreter to * use and copy the interpreter's name to imgp->interpreter_name * for exec to use. -- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@FreeBSD.org Rensselaer Polytechnic Institute; Troy, NY; USA