From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 202290] /usr/bin/vi conversion error on valid character
Date: Thu, 13 Aug 2015 19:57:48 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: bin
X-Bugzilla-Version: 10.2-STABLE
X-Bugzilla-Severity: Affects Many People
X-Bugzilla-Who: lampa@fit.vutbr.cz
X-Bugzilla-Status: New
X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org
List-Id: Bug reports

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=202290

--- Comment #1 from lampa@fit.vutbr.cz ---
Looking at /usr/src/contrib/nvi/common/exf.c:

    file_encinit(SCR *sp)
    ...
            if (looks_utf8(buf, blen) > 1)
                    o_set(sp, O_FILEENCODING, OS_STRDUP, "utf-8", 0);
            else if (!O_ISSET(sp, O_FILEENCODING) ||
                !strncasecmp(O_STR(sp, O_FILEENCODING), "utf-8", 5))
                    o_set(sp, O_FILEENCODING, OS_STRDUP, codeset(), 0);
            conv_enc(sp, O_FILEENCODING, 0);
    }

1. There is no way to disable auto-detection of the encoding. If
   looks_utf8() returns 2, you are stuck: whatever you set up in your
   .exrc is ignored!

2. But why does looks_utf8() accept 0xe1 0x20 as valid UTF-8? IT IS NOT
   VALID! 0xe1 is the lead byte of a three-byte sequence, so the bytes
   that follow must have the form 10xxxxxx, and 0x20 (00100000) does not.

Looking at /usr/src/contrib/nvi/common/encoding.c:

    looks_utf8(const char *ibuf, size_t nbytes)
    ...
            for (n = 0; n < following; n++) {
                    i++;
                    if (i >= nbytes)
                            goto done;
                    if (buf[i] & 0x40)      /* 10xxxxxx */
                            return -1;
            }

That is completely wrong: it only rejects bytes that have bit 6 set and
never tests whether bit 7 is set in the succeeding bytes! It should be:

            for (n = 0; n < following; n++) {
                    i++;
                    if (i >= nbytes)
                            goto done;
                    if ((buf[i] & 0xc0) != 0x80)    /* 10xxxxxx */
                            return -1;
            }

This change was tested and works.

Please fix at least the broken auto-detection before 10.2-RELEASE! An
option to disable auto-detection, or to honor the user's setting in
.exrc, is also required.
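
The continuation-byte test at issue in this comment can be tried in isolation. What follows is a minimal sketch, not the nvi code: the names old_accepts, is_continuation, following_bytes and sequence_ok are hypothetical, only the byte structure is checked (overlong encodings and surrogates are not rejected), and a truncated sequence is treated as invalid, whereas the loop in encoding.c simply stops at the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Test as quoted from encoding.c: a byte is rejected only when bit 6 is
 * set, so any byte with bit 6 clear is accepted, including plain ASCII. */
static int old_accepts(unsigned char c)
{
    return (c & 0x40) == 0;
}

/* A UTF-8 continuation byte has the form 10xxxxxx:
 * bit 7 set and bit 6 clear. */
static int is_continuation(unsigned char c)
{
    return (c & 0xc0) == 0x80;
}

/* Hypothetical helper: number of continuation bytes announced by a lead
 * byte, or -1 if the byte cannot start a sequence. */
static int following_bytes(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 0;    /* 0xxxxxxx */
    if ((lead & 0xe0) == 0xc0) return 1;    /* 110xxxxx */
    if ((lead & 0xf0) == 0xe0) return 2;    /* 1110xxxx */
    if ((lead & 0xf8) == 0xf0) return 3;    /* 11110xxx */
    return -1;                              /* stray 10xxxxxx or 11111xxx */
}

/* Returns 1 if buf starts with one complete sequence whose trailing
 * bytes all pass `accepts`, 0 otherwise. Assumption of this sketch:
 * an empty or truncated buffer counts as not OK. */
static int sequence_ok(const unsigned char *buf, size_t nbytes,
    int (*accepts)(unsigned char))
{
    int following, n;

    assert(buf != NULL && accepts != NULL);
    if (nbytes == 0)
        return 0;
    following = following_bytes(buf[0]);
    if (following < 0 || (size_t)following >= nbytes)
        return 0;
    for (n = 1; n <= following; n++)
        if (!accepts(buf[n]))
            return 0;
    return 1;
}
```

Called on the bytes 0xe1 0x20 0x20, sequence_ok() returns 1 with old_accepts and 0 with is_continuation, which is the misdetection described in this report. A genuine three-byte sequence such as 0xe1 0x88 0xb4 passes both.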