From owner-freebsd-bugs@FreeBSD.ORG Mon Sep 20 19:31:35 2010
Date: Mon, 20 Sep 2010 20:31:33 +0100
To: freebsd-bugs@FreeBSD.org, jh@FreeBSD.org
In-Reply-To: <201009201332.o8KDWmlo074276@freefall.freebsd.org>
From: Pete French
Subject: Re: bin/150727: diff on UTF-8 text files thinks they are binary - regression from 7.X

> I couldn't reproduce this with simple UTF-8 files:

I just looked through my example files in detail, and it turns out the
problem is not with UTF-8 after all, but with NUL characters which are
also in the file. This is what trips up 'diff' - and though it is a
change from 7.X I am not sure that it is really a bug.

Sorry for the noise - the code I used to verify that the file was a
valid UTF-8 file accepts the zero bytes quite happily and says that
it is a text file.
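
For illustration only, here is a minimal Python sketch of why a UTF-8 validity check can pass a file that 'diff' still treats as binary. It is not the verification code mentioned in the message. The function names and the sample data are hypothetical, and treating any zero byte as the sign of a binary file is an assumption based on the behaviour described above, not a reading of the diff source.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def contains_nul(data: bytes) -> bool:
    """Return True if data holds at least one zero byte (hypothetical helper)."""
    return b"\x00" in data


# Made-up sample: an accented character plus an embedded zero byte.
sample = "caf\u00e9\x00 au lait\n".encode("utf-8")

# U+0000 is a legal code point encoded as a single zero byte, so the
# validity check passes even though the zero byte is present.
print(is_valid_utf8(sample), contains_nul(sample))
```

A checker that only asks "is this valid UTF-8?" would therefore report such a file as text, so a separate test for zero bytes is needed to predict whether a tool that keys on them will call the file binary.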