Date:      Sat, 14 Jul 2007 12:03:06 GMT
From:      Christoph Mallon <christoph.mallon@FreeBSD.org>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   misc/114578: wide character printing using swprintf(dst, n, "%ls", txt) fails depending on LC_CTYPE
Message-ID:  <200707141203.l6EC36fP016824@www.freebsd.org>
Resent-Message-ID: <200707141210.l6ECA2cS000370@freefall.freebsd.org>

>Number:         114578
>Category:       misc
>Synopsis:       wide character printing using swprintf(dst, n, "%ls", txt) fails depending on LC_CTYPE
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jul 14 12:10:02 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Christoph Mallon
>Release:        RELENG_6
>Organization:
>Environment:
FreeBSD tron.homeunix.org 6.2-STABLE FreeBSD 6.2-STABLE #0: Thu Jan 25 22:43:11 CET 2007     root@tron.homeunix.org:/usr/obj/usr/src/sys/KERNEL  i386
>Description:
Copying a string with swprintf() and the format specifier "%ls" (or "%S") fails if the string to be copied contains characters that the currently set LC_CTYPE aspect of the locale does not support.
The test program below should simply copy the wide character string "Mir" (in Cyrillic letters) into an array of wide characters using swprintf(). When the LC_CTYPE aspect of the locale is set to "C" (other encodings such as ISO8859-15 fail, too), the call fails and returns -1. When it is set to UTF-8 (or probably any other encoding that supports full Unicode), the call succeeds and returns 3 as expected.
I doubt that this behaviour is correct, because no encoding conversion should be involved here. I could not find anything about conversions for this case in the ANSI C99 standard (§7.24.2.1 clause 8, bullet "s"), either. Conversions are mentioned only for the format "%s", which is logical.
Other implementations (glibc and the Windows libc) copy the string correctly when LC_CTYPE is set to "C".
I have also just discovered that the call already fails if the format string itself contains characters from a range that the current LC_CTYPE does not support.
>How-To-Repeat:
Here is a simple test program. It should (imo) print "3" twice, once per call, for the three characters copied each time. Instead, it prints "-1" and then "3".

#include <locale.h>
#include <stdio.h>
#include <wchar.h>

static const wchar_t txt[] = { 0x41C, 0x438, 0x440, 0 }; // "Mir" in Cyrillic (U+041C, U+0438, U+0440)

int main(void)
{
  wchar_t str[4];
  int ret;

  setlocale(LC_CTYPE, "C");
  ret = swprintf(str, sizeof(str) / sizeof(*str), L"%ls", txt);
  printf("%d\n", ret);

  setlocale(LC_CTYPE, "UTF-8");
  ret = swprintf(str, sizeof(str) / sizeof(*str), L"%ls", txt);
  printf("%d\n", ret);

  return 0;
}
>Fix:
I didn't dive into the inner workings of *printf(), sorry.

>Release-Note:
>Audit-Trail:
>Unformatted:


