From owner-svn-src-head@freebsd.org Sun Jun 3 14:13:12 2018 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7D8FBFD3C68; Sun, 3 Jun 2018 14:13:12 +0000 (UTC) (envelope-from pstef@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2B9BE87935; Sun, 3 Jun 2018 14:13:12 +0000 (UTC) (envelope-from pstef@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 0CA171FB2F; Sun, 3 Jun 2018 14:13:12 +0000 (UTC) (envelope-from pstef@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w53EDBp2016255; Sun, 3 Jun 2018 14:13:11 GMT (envelope-from pstef@FreeBSD.org) Received: (from pstef@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w53EDBIH016253; Sun, 3 Jun 2018 14:13:11 GMT (envelope-from pstef@FreeBSD.org) Message-Id: <201806031413.w53EDBIH016253@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: pstef set sender to pstef@FreeBSD.org using -f From: Piotr Pawel Stefaniak Date: Sun, 3 Jun 2018 14:13:11 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r334560 - head/usr.bin/indent X-SVN-Group: head X-SVN-Commit-Author: pstef X-SVN-Commit-Paths: head/usr.bin/indent X-SVN-Commit-Revision: 334560 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Jun 2018 14:13:12 -0000 Author: pstef Date: Sun Jun 3 14:13:11 2018 New Revision: 334560 URL: https://svnweb.freebsd.org/changeset/base/334560 Log: indent(1): improve predictability of lexi() lexi() reads the input stream and categorizes the next token. indent will sometimes buffer up a sequence of tokens in order rearrange them. That is needed for properly cuddling else or placing braces correctly according to the chosen style (KNF vs Allman) when comments are around. The loop that buffers tokens up uses lexi() to decide if it's time to stop buffering. Then the temporary buffer is used to feed lexi() the same tokens again, this time for normal processing. The problem is that lexi() apart from recognizing the token, can change a lot of information about the current state, for example ps.last_nl, ps.keyword, buf_ptr. It also abandons leading whitespace, which is needed mainly for comment-related considerations. So the call to lexi() while tokens are buffered up and categorized can change the state before they're read again for normal processing which may easily result in changing interpretation of the current state and lead to incorrect output. To work around the problems: 1) copy the whitespace into the save_com buffer so that it will be read again when processed 2) trick lexi() into modifying a temporary copy of the parser state instead of the original. Modified: head/usr.bin/indent/indent.c head/usr.bin/indent/indent.h head/usr.bin/indent/lexi.c Modified: head/usr.bin/indent/indent.c ============================================================================== --- head/usr.bin/indent/indent.c Sun Jun 3 14:03:20 2018 (r334559) +++ head/usr.bin/indent/indent.c Sun Jun 3 14:13:11 2018 (r334560) @@ -102,6 +102,7 @@ main(int argc, char **argv) int last_else = 0; /* true iff last keyword was an else */ const char *profile_name = NULL; const char *envval = NULL; + struct parser_state transient_state; /* a copy for lookup */ /*-----------------------------------------------*\ | INITIALIZATION | @@ -324,7 +325,7 @@ main(int argc, char **argv) int is_procname; int comment_buffered = false; - type_code = lexi(); /* lexi reads one token. The actual + type_code = lexi(&ps); /* lexi reads one token. The actual * characters read are stored in "token". lexi * returns a code indicating the type of token */ is_procname = ps.procname[0]; @@ -460,9 +461,48 @@ main(int argc, char **argv) break; } } /* end of switch */ - if (type_code != 0) /* we must make this check, just in case there - * was an unexpected EOF */ - type_code = lexi(); /* read another token */ + /* + * We must make this check, just in case there was an unexpected + * EOF. + */ + if (type_code != 0) { + /* + * The only intended purpose of calling lexi() below is to + * categorize the next token in order to decide whether to + * continue buffering forthcoming tokens. Once the buffering + * is over, lexi() will be called again elsewhere on all of + * the tokens - this time for normal processing. + * + * Calling it for this purpose is a bug, because lexi() also + * changes the parser state and discards leading whitespace, + * which is needed mostly for comment-related considerations. + * + * Work around the former problem by giving lexi() a copy of + * the current parser state and discard it if the call turned + * out to be just a look ahead. + * + * Work around the latter problem by copying all whitespace + * characters into the buffer so that the later lexi() call + * will read them. + */ + if (sc_end != NULL) { + while (*buf_ptr == ' ' || *buf_ptr == '\t') { + *sc_end++ = *buf_ptr++; + if (sc_end >= &save_com[sc_size]) { + abort(); + } + } + if (buf_ptr >= buf_end) { + fill_buffer(); + } + } + transient_state = ps; + type_code = lexi(&transient_state); /* read another token */ + if (type_code != newline && type_code != form_feed && + type_code != comment && !transient_state.search_brace) { + ps = transient_state; + } + } } /* end of while (search_brace) */ last_else = 0; check_type: Modified: head/usr.bin/indent/indent.h ============================================================================== --- head/usr.bin/indent/indent.h Sun Jun 3 14:03:20 2018 (r334559) +++ head/usr.bin/indent/indent.h Sun Jun 3 14:13:11 2018 (r334560) @@ -36,7 +36,7 @@ int compute_code_target(void); int compute_label_target(void); int count_spaces(int, char *); int count_spaces_until(int, char *, char *); -int lexi(void); +int lexi(struct parser_state *); void diag2(int, const char *); void diag3(int, const char *, int); void diag4(int, const char *, int, int); Modified: head/usr.bin/indent/lexi.c ============================================================================== --- head/usr.bin/indent/lexi.c Sun Jun 3 14:03:20 2018 (r334559) +++ head/usr.bin/indent/lexi.c Sun Jun 3 14:13:11 2018 (r334560) @@ -141,7 +141,7 @@ strcmp_type(const void *e1, const void *e2) } int -lexi(void) +lexi(struct parser_state *state) { int unary_delim; /* this is set to 1 if the current token * forces a following operator to be unary */ @@ -152,12 +152,13 @@ lexi(void) e_token = s_token; /* point to start of place to save token */ unary_delim = false; - ps.col_1 = ps.last_nl; /* tell world that this token started in - * column 1 iff the last thing scanned was nl */ - ps.last_nl = false; + state->col_1 = state->last_nl; /* tell world that this token started + * in column 1 iff the last thing + * scanned was a newline */ + state->last_nl = false; while (*buf_ptr == ' ' || *buf_ptr == '\t') { /* get rid of blanks */ - ps.col_1 = false; /* leading blanks imply token is not in column + state->col_1 = false; /* leading blanks imply token is not in column * 1 */ if (++buf_ptr >= buf_end) fill_buffer(); @@ -281,18 +282,19 @@ lexi(void) if (++buf_ptr >= buf_end) fill_buffer(); } - ps.keyword = 0; - if (l_struct && !ps.p_l_follow) { + state->keyword = 0; + if (l_struct && !state->p_l_follow) { /* if last token was 'struct' and we're not * in parentheses, then this token * should be treated as a declaration */ l_struct = false; last_code = ident; - ps.last_u_d = true; + state->last_u_d = true; return (decl); } - ps.last_u_d = l_struct; /* Operator after identifier is binary - * unless last token was 'struct' */ + state->last_u_d = l_struct; /* Operator after identifier is + * binary unless last token was + * 'struct' */ l_struct = false; last_code = ident; /* Remember that this is the code we will * return */ @@ -310,13 +312,13 @@ lexi(void) strcmp(u, "_t") == 0) || (typename_top >= 0 && bsearch(s_token, typenames, typename_top + 1, sizeof(typenames[0]), strcmp_type))) { - ps.keyword = 4; /* a type name */ - ps.last_u_d = true; + state->keyword = 4; /* a type name */ + state->last_u_d = true; goto found_typename; } } else { /* we have a keyword */ - ps.keyword = p->rwcode; - ps.last_u_d = true; + state->keyword = p->rwcode; + state->last_u_d = true; switch (p->rwcode) { case 7: /* it is a switch */ return (swstmt); @@ -333,9 +335,9 @@ lexi(void) case 4: /* one of the declaration keywords */ found_typename: - if (ps.p_l_follow) { + if (state->p_l_follow) { /* inside parens: cast, param list, offsetof or sizeof */ - ps.cast_mask |= (1 << ps.p_l_follow) & ~ps.not_cast_mask; + state->cast_mask |= (1 << state->p_l_follow) & ~state->not_cast_mask; break; } last_code = decl; @@ -358,15 +360,15 @@ lexi(void) return (ident); } /* end of switch */ } /* end of if (found_it) */ - if (*buf_ptr == '(' && ps.tos <= 1 && ps.ind_level == 0 && - ps.in_parameter_declaration == 0 && ps.block_init == 0) { + if (*buf_ptr == '(' && state->tos <= 1 && state->ind_level == 0 && + state->in_parameter_declaration == 0 && state->block_init == 0) { char *tp = buf_ptr; while (tp < buf_end) if (*tp++ == ')' && (*tp == ';' || *tp == ',')) goto not_proc; - strncpy(ps.procname, token, sizeof ps.procname - 1); - if (ps.in_decl) - ps.in_parameter_declaration = 1; + strncpy(state->procname, token, sizeof state->procname - 1); + if (state->in_decl) + state->in_parameter_declaration = 1; return (last_code = funcname); not_proc:; } @@ -376,19 +378,19 @@ lexi(void) * typedefd */ if (((*buf_ptr == '*' && buf_ptr[1] != '=') || isalpha(*buf_ptr) || *buf_ptr == '_') - && !ps.p_l_follow - && !ps.block_init - && (ps.last_token == rparen || ps.last_token == semicolon || - ps.last_token == decl || - ps.last_token == lbrace || ps.last_token == rbrace)) { - ps.keyword = 4; /* a type name */ - ps.last_u_d = true; + && !state->p_l_follow + && !state->block_init + && (state->last_token == rparen || state->last_token == semicolon || + state->last_token == decl || + state->last_token == lbrace || state->last_token == rbrace)) { + state->keyword = 4; /* a type name */ + state->last_u_d = true; last_code = decl; return decl; } if (last_code == decl) /* if this is a declared variable, then * following sign is unary */ - ps.last_u_d = true; /* will make "int a -1" work */ + state->last_u_d = true; /* will make "int a -1" work */ last_code = ident; return (ident); /* the ident is not in the list */ } /* end of procesing for alpanum character */ @@ -403,8 +405,8 @@ lexi(void) switch (*token) { case '\n': - unary_delim = ps.last_u_d; - ps.last_nl = true; /* remember that we just had a newline */ + unary_delim = state->last_u_d; + state->last_nl = true; /* remember that we just had a newline */ code = (had_eof ? 0 : newline); /* @@ -473,7 +475,7 @@ stop_lit: break; case '#': - unary_delim = ps.last_u_d; + unary_delim = state->last_u_d; code = preesc; break; @@ -496,21 +498,21 @@ stop_lit: unary_delim = true; /* - * if (ps.in_or_st) ps.block_init = 1; + * if (state->in_or_st) state->block_init = 1; */ - /* ? code = ps.block_init ? lparen : lbrace; */ + /* ? code = state->block_init ? lparen : lbrace; */ code = lbrace; break; case ('}'): unary_delim = true; - /* ? code = ps.block_init ? rparen : rbrace; */ + /* ? code = state->block_init ? rparen : rbrace; */ code = rbrace; break; case 014: /* a form feed */ - unary_delim = ps.last_u_d; - ps.last_nl = true; /* remember this so we can set 'ps.col_1' + unary_delim = state->last_u_d; + state->last_nl = true; /* remember this so we can set 'state->col_1' * right */ code = form_feed; break; @@ -527,7 +529,7 @@ stop_lit: case '-': case '+': /* check for -, +, --, ++ */ - code = (ps.last_u_d ? unary_op : binary_op); + code = (state->last_u_d ? unary_op : binary_op); unary_delim = true; if (*buf_ptr == token[0]) { @@ -535,7 +537,7 @@ stop_lit: *e_token++ = *buf_ptr++; /* buffer overflow will be checked at end of loop */ if (last_code == ident || last_code == rparen) { - code = (ps.last_u_d ? unary_op : postop); + code = (state->last_u_d ? unary_op : postop); /* check for following ++ or -- */ unary_delim = false; } @@ -548,14 +550,14 @@ stop_lit: *e_token++ = *buf_ptr++; unary_delim = false; code = unary_op; - ps.want_blank = false; + state->want_blank = false; } break; /* buffer overflow will be checked at end of * switch */ case '=': - if (ps.in_or_st) - ps.block_init = 1; + if (state->in_or_st) + state->block_init = 1; #ifdef undef if (chartype[*buf_ptr] == opchar) { /* we have two char assignment */ e_token[-1] = *buf_ptr++; @@ -586,7 +588,7 @@ stop_lit: } if (*buf_ptr == '=') *e_token++ = *buf_ptr++; - code = (ps.last_u_d ? unary_op : binary_op); + code = (state->last_u_d ? unary_op : binary_op); unary_delim = true; break; @@ -599,7 +601,7 @@ stop_lit: fill_buffer(); code = comment; - unary_delim = ps.last_u_d; + unary_delim = state->last_u_d; break; } while (*(e_token - 1) == *buf_ptr || *buf_ptr == '=') { @@ -610,7 +612,7 @@ stop_lit: if (++buf_ptr >= buf_end) fill_buffer(); } - code = (ps.last_u_d ? unary_op : binary_op); + code = (state->last_u_d ? unary_op : binary_op); unary_delim = true; @@ -621,7 +623,7 @@ stop_lit: } if (buf_ptr >= buf_end) /* check for input buffer empty */ fill_buffer(); - ps.last_u_d = unary_delim; + state->last_u_d = unary_delim; *e_token = '\0'; /* null terminate the token */ return (code); }