Date:      Sun, 10 Sep 2006 05:48:52 +0800 (CST)
From:      pasear 帕錫爾 <wchunhao@csie.nctu.edu.tw>
To:        FreeBSD-gnats-submit@FreeBSD.org
Subject:   ports/103082: [patch][/usr/ports/mail/elm+ME] hdrdecode and add chinese Big5 
Message-ID:  <200609092148.k89LmqR4072354@ccbsd12.csie.nctu.edu.tw>
Resent-Message-ID: <200609092150.k89LoRFf043666@freefall.freebsd.org>


>Number:         103082
>Category:       ports
>Synopsis:       [patch][/usr/ports/mail/elm+ME] hdrdecode and add chinese Big5
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-ports-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Sep 09 21:50:21 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     pasear 帕錫爾
>Release:        FreeBSD 6.1-STABLE amd64
>Organization:
NCTU CSIE
>Environment:
System: FreeBSD ccbsd12 6.1-STABLE FreeBSD 6.1-STABLE #0: Mon Jun 5 22:14:28 CST 2006 root@ccbsd12:/usr/obj/usr/src/sys/AMD64_BSD6 amd64


	
>Description:
    Two patches.

    patch_precompiled_sets.c adds simple support for Chinese Big5.
	Under LC_ALL=en_US.ISO8859-1, Chinese Big5 text displays correctly
	if elm treats it as ISO8859-1, that is, passes it through untouched.
	This patch does not work if the user sets LC_ALL=zh_TW.Big5,
	but elm does not display well under zh_TW.Big5 anyway.

	patch_hdrdecode.c:
	Many MUAs wrap the encoded words in From:, To:, and Subject: in double quotes,
	but elm just passes them through undecoded.

	For example,
	From: pasear  "=?big5?B?qay//Lq4?="  <wchunhao@csie.nctu.edu.tw>
	To: pasear =?Big5?B?qay//Lq4?= <wchunhao@csie.nctu.edu.tw>
	Subject: abc""=?Big5?B?UmU6IKuixW9+?="def

	The To: line is decoded correctly,
	but the From: line is not, even though quoting encoded text there is very common.
	The Subject: line should also be decoded with its prefix and suffix strings intact,
	though this usage is uncommon.

	Most users simply assume elm cannot handle Chinese Big5 when they see the undecoded
	text.
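
	The splitting step described above can be sketched outside of elm as
	follows. This is only an illustration, not the code in the patch: the
	struct encoded_word type and the functions split_encoded_word() and
	b64_decode() are hypothetical names invented for this example, and the
	sketch searches for "=?" rather than the first '=' as the patch does.
	It drops one double quote directly before and after the encoded word
	and keeps the rest of the prefix and suffix intact.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: every pointer refers into the input token,
 * nothing is copied or modified. */
struct encoded_word {
    const char *prefix;   size_t prefix_len;   /* text before the encoded word */
    const char *charset;  size_t charset_len;  /* e.g. "Big5" */
    char        encoding;                      /* 'B' or 'Q', upper-cased */
    const char *payload;  size_t payload_len;  /* still-encoded text */
    const char *suffix;                        /* NUL-terminated remainder */
};

/* Split prefix =?charset?enc?payload?= suffix.
 * Returns 1 on success, 0 if no well-formed encoded word is present. */
int split_encoded_word(const char *tok, struct encoded_word *w)
{
    assert(tok != NULL && w != NULL);

    const char *start = strstr(tok, "=?");
    if (start == NULL) return 0;

    const char *charset = start + 2;
    const char *q = strchr(charset, '?');
    if (q == NULL || q == charset) return 0;
    if (q[1] == '\0' || q[2] != '?') return 0;

    int enc = toupper((unsigned char)q[1]);
    if (enc != 'B' && enc != 'Q') return 0;

    const char *payload = q + 3;
    const char *close = strstr(payload, "?=");
    if (close == NULL) return 0;

    w->prefix = tok;
    w->prefix_len = (size_t)(start - tok);
    /* Drop one double quote that directly precedes the encoded word. */
    if (w->prefix_len > 0 && start[-1] == '"') w->prefix_len--;

    w->charset = charset;
    w->charset_len = (size_t)(q - charset);
    w->encoding = (char)enc;
    w->payload = payload;
    w->payload_len = (size_t)(close - payload);

    /* Drop one double quote that directly follows the encoded word. */
    w->suffix = close + 2;
    if (*w->suffix == '"') w->suffix++;
    return 1;
}

/* Map one base64 character to its 6-bit value, or -1 if it is not one. */
int b64_value(int c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode len base64 characters into out (capacity cap).
 * Returns the number of bytes written, or -1 on bad input or overflow. */
long b64_decode(const char *in, size_t len, unsigned char *out, size_t cap)
{
    unsigned long acc = 0;
    int bits = 0;
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        if (in[i] == '=') break;                 /* padding ends the data */
        int v = b64_value((unsigned char)in[i]);
        if (v < 0) return -1;
        acc = (acc << 6) | (unsigned long)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (n >= cap) return -1;
            out[n++] = (unsigned char)((acc >> bits) & 0xFF);
        }
    }
    return (long)n;
}
```

	The decoded bytes are still in the charset named by the encoded word
	(Big5 in the examples above); converting them for display is a
	separate step that this sketch does not attempt.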

	
>How-To-Repeat:

    Copy the following mail to /var/mail/$USER, and run elm.


    From wchunhao@ccbsd12.csie.nctu.edu.tw Sat Sep  9 23:55:57 2006
    Received: from ccbsd12.csie.nctu.edu.tw (wchunhao@ccbsd12.csie.nctu.edu.tw [140.113.209.72])
	    by mailgate.csie.nctu.edu.tw (8.13.4/8.13.4) with ESMTP id k89FtuXe013926
		    for <wchunhao@csie.nctu.edu.tw>; Sat, 9 Sep 2006 23:55:56 +0800 (CST)
			    (envelope-from wchunhao@ccbsd12.csie.nctu.edu.tw)
	Received: (from wchunhao@localhost)
	    by ccbsd12.csie.nctu.edu.tw (8.13.6/8.13.6/Submit) id k89Ftvtd090499
		    for wchunhao@csie.nctu.edu.tw; Sat, 9 Sep 2006 23:55:57 +0800 (CST)
			    (envelope-from wchunhao)
	Date: Sat, 9 Sep 2006 23:55:57 +0800
	From: pasear  "=?big5?B?qay//Lq4?="  <wchunhao@csie.nctu.edu.tw>
	To: pasear ""=?Big5?B?qay//Lq4?="" <wchunhao@csie.nctu.edu.tw>
	Subject: abc""=?Big5?B?UmU6IKuixW9+?="def
	Message-ID: <20060909155557.GA90491@csie.nctu.edu.tw>
	MIME-Version: 1.0
	Content-Type: text/plain; charset=Big5
	Content-Disposition: inline
	Content-Transfer-Encoding: 8bit
	User-Agent: Mutt/1.5.12-2006-07-14
	Status: RO

	大家好,我是正體中文

	
>Fix:

	

--- patch_hdrdecode.c begins here ---
--- work/elm2.4.ME+.122/lib/hdrdecode.c	Sat Jul  9 18:03:15 2005
+++ work.bak/elm2.4.ME+.122/lib/hdrdecode.c	Sun Sep 10 04:57:10 2006
@@ -173,9 +173,16 @@
     char *encoded = NULL;
     struct string *ret = NULL;
     charset_t set;
+	char *front, *end;
+	struct string *fstr, *estr;
 
-    if ('=' != *p++)
+	/* Pasear: front and end keep the text around the encoded word, e.g. abc""=?...?="def */
+	front = p;
+	while (*p && '=' != *p) ++p;
+	if (front != p && '=' == *p && '"' == *(p-1)) *(p-1) = '\0';
+    if ('=' != *p)
 	goto fail;
+	*p = '\0'; ++p;
     if ('?' != *p++)
 	goto fail;
     sn = p;
@@ -209,8 +216,8 @@
     p++;
     if ('=' != *p++)
 	goto fail;
-    if (*p)
-	goto fail;
+	if ('"' == *p) ++p;
+	end = p;
 
     set = MIME_name_to_charset(sn,CHARSET_create);
 
@@ -225,6 +232,18 @@
 	break;
     }
 
+	/* Pasear */
+	if (ret){
+		estr = ret;
+		fstr = new_string2(system_charset,us_str(front));
+		fstr = ret = cat_strings(fstr, ret, 0);
+		free_string(&estr);
+		estr = new_string2(system_charset,us_str(end));
+		ret = cat_strings(ret, estr, 0);
+		free_string(&estr);
+		free_string(&fstr);
+	}
+
  fail:
     if (!ret) {
 	DPRINT(Debug,20,(&Debug, 
@@ -341,20 +360,31 @@
     struct string * ret = new_string(defcharset);
     char **tokenized = rfc822_tokenize(buffer);
     unsigned char * last_char = NULL;
-    int i;
+    int i, encoded;
+	char* p;
 
     for (i = 0; tokenized[i]; i++) {
 
 	struct string * ok = NULL;
 	int nostore = 0;
 
+	/* Pasear: detect whether this is an encoded string */
+	encoded = 0;
+	if ('"' == tokenized[i][0]){
+		p = tokenized[i];
+		while (*p && *p != '=') ++p;
+		if (*p && *p == '=' && *(p+1) && *(p+1) == '?' )
+			encoded = 1;
+	}
+
+
 	if ('(' == tokenized[i][0]) {
 	    /* we need add last space */
 	    if (last_char) 
 		add_ascii_to_string(ret,last_char);
 	    ok = hdr_comment(tokenized[i],defcharset,demime);
 	    nostore = 1;
-	} else if ('"' == tokenized[i][0]) {
+	} else if (!encoded && '"' == tokenized[i][0]) {
 	    /* we need add last space */
 	    if (last_char) 
 		add_ascii_to_string(ret,last_char);
--- patch_hdrdecode.c ends here ---

--- patch_precompiled_sets.c begins here ---
--- work/elm2.4.ME+.122/lib/precompiled_sets.c	Sat Jul  9 18:03:15 2005
+++ work.bak/elm2.4.ME+.122/lib/precompiled_sets.c	Sun Sep 10 03:29:48 2006
@@ -400,7 +400,8 @@
     { &cs_euc,      &map_EUC_ascii,  SET_valid,  "GB2312",  NULL, 
       &set_EUCCN,         2025,  "GB2312-1980" }, /* ASCII + GB 2312-80 */
 
-    { &cs_unknown,  NULL,  SET_valid,  "Big5",  NULL, NULL,           2026,  NULL },
+    { &cs_ascii, &map_latin1, SET_valid,  "Big5", 
+      ASCII, &(sets_iso_8859_X[1]),                                      2026,   "Big5" },
     { &cs_ascii,    NULL,  SET_valid, "windows-1250", ASCII, NULL,     2250,  NULL },
     { &cs_ascii,    NULL,  SET_valid, "windows-1253", ASCII, NULL,     2253,  NULL },
     { &cs_ascii,    NULL,  SET_valid, "windows-1254", ASCII ,NULL,     2254,  NULL },
--- patch_precompiled_sets.c ends here ---


>Release-Note:
>Audit-Trail:
>Unformatted:


