From owner-freebsd-i18n@FreeBSD.ORG Sun Aug 3 12:50:59 2003
Date: Sun, 3 Aug 2003 23:50:56 +0400
From: Andrey Chernov
To: current@freebsd.org
cc: i18n@freebsd.org
Subject: Re: Revised version (was Re: Serious 'tr' bug, patch for review included)
Message-ID: <20030803195056.GA24697@nagual.pp.ru>
References: <20030801004408.GA22054@nagual.pp.ru> <20030801023703.GA23702@nagual.pp.ru>
In-Reply-To: <20030801023703.GA23702@nagual.pp.ru>

On Fri, Aug 01, 2003 at 06:37:03 +0400, Andrey Chernov wrote:
> On Fri, Aug 01, 2003 at 04:44:08 +0400, Andrey Chernov wrote:
> > This patch addresses two problems.
>
> Revised patch version with accurate skipping. Surprisingly, the code is
> reduced.
>

If you were planning to try this patch, don't; use the variant recently
committed into -current instead.
From owner-freebsd-i18n@FreeBSD.ORG Wed Aug 6 22:55:42 2003
Date: Thu, 7 Aug 2003 15:55:38 +1000
From: Tim Robbins
To: freebsd-i18n@freebsd.org
Subject: gb18030(5) manual page for review
Message-ID: <20030807055538.GA1428@dilbert.robbins.dropbear.id.au>

I noticed that support for the GB18030 encoding was recently committed. I had
already implemented it in a Perforce branch, along with the rest of my planned
overhaul of the character encoding functions in libc for FreeBSD 6. The only
thing that my implementation has that Robin Hu's doesn't is a manual page :-)

I've attached my manual page, which I plan to commit in the next week or so.
I'd appreciate comments from Chinese speakers or anyone who's generally clueful
when it comes to character encodings.
BTW, just to save duplication of effort in the future: I've already implemented
the ISO-2022-CN and ISO-2022-JP encodings and all the related state-dependent
encoding support, and will probably be committing it when 6.0-current is
created.

Thanks,

Tim

.\" [copyright header trimmed for mail]
.\"
.\" $FreeBSD$
.Dd March 30, 2003
.Dt GB18030 5
.Os
.Sh NAME
.Nm gb18030
.Nd "GB 18030 encoding method for Chinese text"
.Sh SYNOPSIS
.Nm ENCODING
.Qq GB18030
.Sh DESCRIPTION
The
.Nm GB18030
encoding implements GB 18030-2000, a PRC National Standard for the encoding of
Chinese characters.
It is a superset of the older GB 2312-80 and GBK encodings.
.Pp
Multibyte characters in the GB18030 encoding can be one byte, two bytes, or
four bytes long.
There is a total of over 1.5 million code positions.
.Pp
The
.Tn ASCII
character set is represented by a single byte in the range 0x00 to 0x7F.
.Pp
Chinese characters are represented as either two bytes or four bytes.
Characters which are represented by two bytes begin with a byte in the range
0x81-0xFE and end with a byte either in the range 0x40-0x7E or 0x80-0xFE.
.Pp
Characters which are represented by four bytes begin with a byte in the range
0x81-0xFE, have a second byte in the range 0x30-0x39, a third byte in the range
0x81-0xFE and a fourth byte in the range 0x30-0x39.
.Sh SEE ALSO
.Xr euc 4 ,
.Xr utf8 5
.Rs
.%T "PRC National Standard GB 18030-2000"
.%D "March 2000"
.Re
.Sh STANDARDS
The
.Nm
encoding is believed to be compatible with GB 18030-2000.
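
[Editorial illustration, not part of the original message.] The byte ranges
given in the DESCRIPTION section above can be turned into a small length
classifier. The following is only a sketch in Python under the assumption that
the ranges in the draft manual page are exactly right; the function names
gb18030_char_len and split_gb18030 are hypothetical and are not taken from
libc or from either implementation discussed in this thread.

```python
from typing import List


def gb18030_char_len(data: bytes, pos: int = 0) -> int:
    """Return 1, 2 or 4: the byte length of the character starting at data[pos].

    Uses only the byte ranges quoted in the draft gb18030(5) page.
    Raises ValueError for a truncated or out-of-range sequence.
    """
    if pos >= len(data):
        raise ValueError("no byte at this position")
    b1 = data[pos]
    if b1 <= 0x7F:
        return 1  # single-byte ASCII range 0x00-0x7F
    if not 0x81 <= b1 <= 0xFE:
        raise ValueError("invalid lead byte 0x%02x" % b1)
    if pos + 1 >= len(data):
        raise ValueError("truncated sequence")
    b2 = data[pos + 1]
    if 0x40 <= b2 <= 0x7E or 0x80 <= b2 <= 0xFE:
        return 2  # two-byte form
    if 0x30 <= b2 <= 0x39:
        # A digit-range second byte announces the four-byte form.
        if pos + 3 >= len(data):
            raise ValueError("truncated sequence")
        b3, b4 = data[pos + 2], data[pos + 3]
        if 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39:
            return 4
    raise ValueError("invalid trailing bytes")


def split_gb18030(data: bytes) -> List[bytes]:
    """Split a byte string into per-character byte sequences."""
    out = []
    pos = 0
    while pos < len(data):
        n = gb18030_char_len(data, pos)
        out.append(data[pos:pos + n])
        pos += n
    return out
```

Because the second-byte ranges for the two-byte form (0x40-0x7E, 0x80-0xFE) do
not overlap the digit range 0x30-0x39, the second byte alone is enough to tell
the two-byte and four-byte forms apart, which is why the sketch needs no
lookahead beyond validating the remaining bytes.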
From owner-freebsd-i18n@FreeBSD.ORG Wed Aug 6 22:58:15 2003
Date: Thu, 7 Aug 2003 09:58:08 +0400
From: Andrey Chernov
To: Tim Robbins
cc: freebsd-i18n@freebsd.org
Subject: Re: gb18030(5) manual page for review
Message-ID: <20030807055808.GA83475@nagual.pp.ru>
References: <20030807055538.GA1428@dilbert.robbins.dropbear.id.au>
In-Reply-To: <20030807055538.GA1428@dilbert.robbins.dropbear.id.au>

On Thu, Aug 07, 2003 at 15:55:38 +1000, Tim Robbins wrote:
> I noticed that support for the GB18030 encoding was recently committed. I had
> already implemented it in a Perforce branch, along with the rest of my planned
> overhaul of the character encoding functions in libc for FreeBSD 6. The only
> thing that my implementation has that Robin Hu's doesn't is a manual page :-)

We have GBK encoding too. Could you please write manpage like this for it
too?
From owner-freebsd-i18n@FreeBSD.ORG Thu Aug 7 00:35:24 2003
Date: Thu, 7 Aug 2003 16:35:21 +0900
From: Hye-Shik Chang
Organization: Yonsei University
To: Andrey Chernov
cc: Tim Robbins
cc: freebsd-i18n@freebsd.org
Subject: Re: gb18030(5) manual page for review
Message-ID: <20030807073521.GA42422@i18n.org>
References: <20030807055538.GA1428@dilbert.robbins.dropbear.id.au> <20030807055808.GA83475@nagual.pp.ru>
In-Reply-To: <20030807055808.GA83475@nagual.pp.ru>

On Thu, Aug 07, 2003 at 09:58:08AM +0400, Andrey Chernov wrote:
> On Thu, Aug 07, 2003 at 15:55:38 +1000, Tim Robbins wrote:
> > I noticed that support for the GB18030 encoding was recently committed. I had
> > already implemented it in a Perforce branch, along with the rest of my planned
> > overhaul of the character encoding functions in libc for FreeBSD 6.
> > The only thing that my implementation has that Robin Hu's doesn't is a
> > manual page :-)
>
> We have GBK encoding too. Could you please write manpage like this for it
> too?

How about to rename GBK encoding to DBCS or anything more general?
The GBK encoding implementation can be used for cp949 (MS Korean),
cp950 (MS Traditional Chinese), too.

Regards,
Hye-Shik =)

From owner-freebsd-i18n@FreeBSD.ORG Thu Aug 7 01:02:58 2003
Date: Thu, 7 Aug 2003 12:02:48 +0400
From: Andrey Chernov
To: Hye-Shik Chang
cc: Tim Robbins
cc: freebsd-i18n@freebsd.org
Subject: Re: gb18030(5) manual page for review
Message-ID: <20030807080247.GA676@nagual.pp.ru>
References: <20030807055538.GA1428@dilbert.robbins.dropbear.id.au> <20030807055808.GA83475@nagual.pp.ru> <20030807073521.GA42422@i18n.org>
In-Reply-To: <20030807073521.GA42422@i18n.org>

On Thu, Aug 07, 2003 at 16:35:21 +0900, Hye-Shik
Chang wrote:
> > We have GBK encoding too. Could you please write manpage like this for it
> > too?
>
> How about to rename GBK encoding to DBCS or anything more general?
> The GBK encoding implementation can be used for cp949 (MS Korean),
> cp950 (MS Traditional Chinese), too.

I don't think it worth renaming. It is internal libc name, locale name can
be any. At this moment we have only one and incomplete locale which use GBK.

From owner-freebsd-i18n@FreeBSD.ORG Thu Aug 7 05:00:08 2003
Date: Thu, 7 Aug 2003 19:58:13 +0800
From: "Kang Liu"
Subject: Re: gb18030(5) manual page for review
Message-ID: <000301c35cdb$2de4abe0$e04e70ca@lkatschool>

Hi, Tim Robbins

Thanks
for writing manual page for gb18030 :)
I think there are 2 points need to be improved.
1.why not use "GB2312-1980" instead of "GB2312-80" (more official and Y2K
compatible? ^_^);
2."GB18030 is a superset of the older GB2312-80 and GBK encodings" that is
right, but for more exactly GB18030 is not only include GB2312 and GBK but
also contains all CJK Extension(Chinese, Japanese,Korean unification
character set).

Regards,
Kang Liu

From owner-freebsd-i18n@FreeBSD.ORG Thu Aug 7 18:35:52 2003
Date: Fri, 8 Aug 2003 11:35:05 +1000
From: Tim Robbins
To: Andrey Chernov
cc: freebsd-i18n@freebsd.org
Subject: Re: gb18030(5) manual page for review
Message-ID: <20030808013505.GA37663@dilbert.robbins.dropbear.id.au>
References: <20030807055538.GA1428@dilbert.robbins.dropbear.id.au> <20030807055808.GA83475@nagual.pp.ru>
In-Reply-To: <20030807055808.GA83475@nagual.pp.ru>

On Thu, Aug 07, 2003 at 09:58:08AM +0400,
Andrey Chernov wrote:
> On Thu, Aug 07, 2003 at 15:55:38 +1000, Tim Robbins wrote:
> > I noticed that support for the GB18030 encoding was recently committed. I had
> > already implemented it in a Perforce branch, along with the rest of my planned
> > overhaul of the character encoding functions in libc for FreeBSD 6. The only
> > thing that my implementation has that Robin Hu's doesn't is a manual page :-)
>
> We have GBK encoding too. Could you please write manpage like this for it
> too?

Sure. I also have mskanji(5) and big5(5) in my branch, so with a gbk(5)
manpage, I think we should have all of the encodings covered.


Tim

From owner-freebsd-i18n@FreeBSD.ORG Thu Aug 7 18:58:03 2003
Date: Fri, 8 Aug 2003 11:57:51 +1000
From: Tim Robbins
To: Kang Liu
cc: freebsd-i18n@freebsd.org
Subject: Re: gb18030(5) manual page for review
Message-ID: <20030808015751.GB37663@dilbert.robbins.dropbear.id.au>
References: <000301c35cdb$2de4abe0$e04e70ca@lkatschool>
In-Reply-To: <000301c35cdb$2de4abe0$e04e70ca@lkatschool>

On Thu, Aug 07, 2003 at 07:58:13PM +0800, Kang Liu wrote:
> Thanks for writing manual page for gb18030 :)
> I think there are 2 points need to be improved.
> 1.why not use "GB2312-1980" instead of "GB2312-80" (more official and Y2K
> compatible? ^_^);
> 2."GB18030 is a superset of the older GB2312-80 and GBK encodings" that is
> right, but for more exactly GB18030 is not only include GB2312 and GBK but
> also contains all CJK Extension(Chinese, Japanese,Korean unification
> character set).

Thanks for the suggestions. I'll do something about both of them before I
commit the manual page.


Tim