From owner-freebsd-questions@FreeBSD.ORG  Mon Jan  5 18:00:53 2004
Content-Type: text/plain;
  charset="gb2312"
From: Malcolm Kay <malcolm.kay@internode.on.net>
Organization: At home
To: zhangweiwu@realss.com, "Zhang Weiwu" <weiwuzhang@hotmail.com>,
	questions@freebsd.org
Date: Tue, 6 Jan 2004 12:30:42 +1030
User-Agent: KMail/1.4.3
References: <Law11-F31WerOc0Ne0P00016107@hotmail.com>
In-Reply-To: <Law11-F31WerOc0Ne0P00016107@hotmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Message-Id: <200401061230.42038.malcolm.kay@internode.on.net>
Subject: Re: help me with this sed expression

On Mon, 5 Jan 2004 22:19, Zhang Weiwu wrote:
> Hello. I've worked an hour to figure out a series of sed commands to process
> some text (without any luck; you know I'm kind of a newbie). I really
> appreciate your help.
>
> The original text file is in this form -- for each line:
> one Chinese word, then one or two English words separated by a space.
>
> I wish to change to:
> 1) target file: one English word, then a space, then the Chinese word
> corresponding to that English word.
> 2) if in the original file one Chinese word has more than one English word
> following it on the same line, repeat the Chinese word to satisfy 1).
>
> Define: Chinese word = one or more continuous bytes of data where each byte
> is greater than 128 in value. (This is true in the GB2312 Chinese charset, which
> this email is written in.)
> Define: English word = one or more continuous bytes of [a-z].
>
> Say, for the original file:
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> =D2=BBa av
> =BF=C9=B8=E8=BF=C9=C6=FCaaav
> =CE=DE=BF=C9=B7=EE=B8=E6aacm
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> The target file should be:
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> a =D2=BB
> av =D2=BB
> aaav =BF=C9=B8=E8=BF=C9=C6=FC
> aacm =CE=DE=BF=C9=B7=EE=B8=E6
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>
> I tried to do things like s/\(.*\)\([a-z]*\)/\2 \1/ but the first \(.*\) is
> too greedy and included the remaining [a-z].

Well, the greedy part is easily fixed by letting the first group match only
characters outside [a-z]:
  s/\([^a-z]*\)\([a-z]*\)/\2 \1/

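But this will not work for those lines with 2 English words. The following should.
It keeps a copy of each line in the hold space (h) and restores it (g), so that the
second substitution sees the original line rather than the output of the first:
% sed -n -e h -e 's/\([^a-z]*\)\([a-z]*\).*/\2 \1/p' -e g -e 's/\([^a-z]*\)[a-z]* \([a-z]*\)/\2 \1/p' original > target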

Malcolm Kay