Date:      Mon, 5 May 2014 14:50:53 +0000 (UTC)
From:      "Pedro F. Giffuni" <pfg@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-10@freebsd.org
Subject:   svn commit: r265361 - stable/10/lib/libc/locale
Message-ID:  <201405051450.s45EorXF037442@svn.freebsd.org>

Author: pfg
Date: Mon May  5 14:50:53 2014
New Revision: 265361
URL: http://svnweb.freebsd.org/changeset/base/265361

Log:
  MFC	r265095, r265167;
  
  citrus: Avoid invalid code points.
  
  The UTF-8 decoder should not accept byte sequences that decode to
  Unicode code points U+D800 to U+DFFF (UTF-16 surrogates).[1]
  
  Contrary to the original OpenBSD patch, we do pass U+FFFE and U+FFFF.
  Both values are valid "non-characters" [2] and must be mapped through
  UTFs.
  
  [1] http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
  [2] http://www.unicode.org/faq/private_use.html
  
  Reported by:	Stefan Sperling [1]
  Thanks to:	jilles [2]
  Obtained from:	OpenBSD

Modified:
  stable/10/lib/libc/locale/utf8.c

Modified: stable/10/lib/libc/locale/utf8.c
==============================================================================
--- stable/10/lib/libc/locale/utf8.c	Mon May  5 14:50:44 2014	(r265360)
+++ stable/10/lib/libc/locale/utf8.c	Mon May  5 14:50:53 2014	(r265361)
@@ -203,6 +203,13 @@ _UTF8_mbrtowc(wchar_t * __restrict pwc, 
 		errno = EILSEQ;
 		return ((size_t)-1);
 	}
+	if (wch >= 0xd800 && wch <= 0xdfff) {
+		/*
+		 * Malformed input; invalid code points.
+		 */
+		errno = EILSEQ;
+		return ((size_t)-1);
+	}
 	if (pwc != NULL)
 		*pwc = wch;
 	us->want = 0;
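
The effect of the new range check can be illustrated outside libc with a
minimal sketch. The helpers is_surrogate() and decode3() below are
hypothetical names made up for this example and are not part of utf8.c;
decode3() handles only three-byte sequences, which is the only length that
can encode U+D800 to U+DFFF, and it skips the state handling that
_UTF8_mbrtowc() performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero for UTF-16 surrogate code points. */
static int
is_surrogate(uint32_t cp)
{
	return (cp >= 0xd800 && cp <= 0xdfff);
}

/*
 * Hypothetical helper: decode one three-byte UTF-8 sequence.
 * Returns 0 and stores the code point in *out on success,
 * or -1 if the input is malformed or decodes to a surrogate.
 */
static int
decode3(const unsigned char *s, uint32_t *out)
{
	uint32_t cp;

	assert(s != NULL && out != NULL);

	/* Lead byte must be 1110xxxx, both continuation bytes 10xxxxxx. */
	if ((s[0] & 0xf0) != 0xe0 ||
	    (s[1] & 0xc0) != 0x80 ||
	    (s[2] & 0xc0) != 0x80)
		return (-1);

	cp = ((uint32_t)(s[0] & 0x0f) << 12) |
	    ((uint32_t)(s[1] & 0x3f) << 6) |
	    (uint32_t)(s[2] & 0x3f);

	/* Reject overlong encodings of values below U+0800. */
	if (cp < 0x800)
		return (-1);

	/* Same range test as the one added in this commit. */
	if (is_surrogate(cp))
		return (-1);

	*out = cp;
	return (0);
}
```

With this sketch, the bytes ED A0 80 (U+D800) and ED BF BF (U+DFFF) are
rejected, while ED 9F BF (U+D7FF), EE 80 80 (U+E000), and the non-characters
EF BF BE (U+FFFE) and EF BF BF (U+FFFF) are accepted, matching the behaviour
described in the log.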


