Date: Sun, 9 Apr 2017 14:06:46 +0300
From: Lev Serebryakov <lev@FreeBSD.org>
To: freebsd-i18n@freebsd.org
Subject: citrus/BSD iconv doesn't respect ICONV_SET_DISCARD_ILSEQ flag
Message-ID: <137414834.20170409140646@serebryakov.spb.ru>
Hello Freebsd-i18n,

I understand that iconvctl(3) is a GNU extension, but since the citrus
iconv used by FreeBSD libc formally supports this API and the
ICONV_SET_DISCARD_ILSEQ flag, they should work, IMHO. But they don't.

If I try to convert a simple UTF-8 string containing an illegal sequence
to ASCII (all legal characters in this string are ASCII), it stops at the
illegal sequence and returns an error. GNU iconv from ports works
correctly.

I didn't try UTF-16 and UTF-32/UCS-4, but from looking at the code, I'm
afraid they have the same problem.

Here is a simple program that reproduces the problem:

===============
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iconv.h>

int main(int argc, char *argv[]) {
	/* 'X', one byte that is illegal in UTF-8 (0x80), then 'Y'. */
	const char *src = "X\x80Y";
	char dst[64] = {0};
	char *s = (char *)src;
	char *d = &dst[0];
	size_t ss = strlen(src) + 1;	/* convert the terminating NUL too */
	size_t ds = sizeof(dst);
	int flag;

	iconv_t ic = iconv_open("ascii", "utf-8");

	/* Ask iconv to drop illegal input sequences silently. */
	flag = 1;
	iconvctl(ic, ICONV_SET_DISCARD_ILSEQ, &flag);

	/* iconv() returns size_t; cast so that (size_t)-1 prints as -1. */
	printf("Result: %ld\n", (long)iconv(ic, &s, &ss, &d, &ds));
	printf("Converted: from %zu to %zu bytes\n",
	    strlen(src) + 1 - ss, sizeof(dst) - ds);
	printf("Out: \"%s\"\n", &dst[0]);

	iconv_close(ic);
	return 0;
}
===============

% cc ic.c
% ./a.out
Result: -1
Converted: from 1 to 1 bytes
Out: "X"
% cc -L/usr/local/lib -I/usr/local/include ic.c -liconv
% ./a.out
Result: 0
Converted: from 4 to 3 bytes
Out: "XY"
%
% uname -a
FreeBSD blob.home.serebryakov.spb.ru 11.0-STABLE FreeBSD 11.0-STABLE #13 r315153M: Sun Mar 12 20:11:36 MSK 2017 root@blob.home.serebryakov.spb.ru:/usr/obj/usr/src/sys/BLOB amd64
%

-- 
Best regards,
 Lev                            mailto:lev@FreeBSD.org
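
Until the flag is honoured, the discarding can be approximated in the
caller. The following is only a sketch under stated assumptions, not a
fix for libc and not necessarily identical to what GNU iconv does
internally: convert_skip_ilseq() is a hypothetical helper name, it uses
only the standard iconv() call and errno, and it assumes that dropping
exactly one input byte per EILSEQ is acceptable (reasonable for a
stateless source encoding such as UTF-8, questionable for stateful ones).

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of any iconv API.
 * Converts srclen bytes from src into dst using descriptor ic.
 * Whenever iconv() fails with EILSEQ, one input byte is dropped and the
 * conversion is resumed. Returns the number of bytes written to dst, or
 * (size_t)-1 on any other error (E2BIG, EINVAL, ...).
 */
size_t
convert_skip_ilseq(iconv_t ic, const char *src, size_t srclen,
    char *dst, size_t dstlen)
{
	char *s = (char *)src;	/* iconv() does not modify the input data */
	char *d = dst;
	size_t ss = srclen;
	size_t ds = dstlen;

	while (ss > 0) {
		if (iconv(ic, &s, &ss, &d, &ds) != (size_t)-1)
			break;			/* all remaining input consumed */
		if (errno != EILSEQ)
			return ((size_t)-1);	/* not an illegal sequence */
		s++;				/* drop the offending byte */
		ss--;
	}
	return (dstlen - ds);
}
```

With the descriptor from the test program above, calling
convert_skip_ilseq(ic, src, strlen(src) + 1, dst, sizeof(dst)) in place
of the iconv() call is intended to leave "XY" in dst regardless of
whether ICONV_SET_DISCARD_ILSEQ takes effect.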