Date:      Thu, 10 Jun 2004 12:17:52 +0200
From:      Palle Girgensohn <girgen@pingpong.net>
To:        Palle Girgensohn <girgen@pingpong.net>, Greg Lewis <glewis@eyesbeyond.com>
Cc:        freebsd-java@freebsd.org
Subject:   Re: problems with java.util.zip and diacritical characters in file names
Message-ID:  <84B75B389C49D6FF3ED95F29@rambutan.pingpong.net>
In-Reply-To: <D17F6CD704077FEABC1AE296@palle.girgensohn.se>
References:  <5C024439534B293EAFE34A55@rambutan.pingpong.net> <20040609175626.GB83936@misty.eyesbeyond.com> <D17F6CD704077FEABC1AE296@palle.girgensohn.se>

I've tried this on Linux, and it seems to behave the same way. One problem is
Java converting the entry names to Unicode (jazzlib does NOT do this; it seems
to keep the name in a byte array instead of a String). Another problem is that
WinZip uses the character set cp850 (! I thought this had been dead for
ages...), so there really seems to be no hope unless I hack up jazzlib and
convert the file names somehow?
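
For readers of the archive: java.util.zip had no such option when this was written, but Java 7 and later added constructor overloads that take a Charset for entry names. The following is a minimal sketch of that approach, not a fix for the JDK discussed in this thread. The class and method names are made up for illustration, and ISO-8859-1 is only an assumption; an archive written by WinZip as described above would presumably need Charset.forName("IBM850"), which is available only if the JDK ships that charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; requires Java 7 or later for the Charset overloads.
class ZipNameCharsetDemo {

    // Write a one-entry archive whose entry name is encoded with nameCharset.
    static byte[] writeZip(String entryName, byte[] data, Charset nameCharset)
            throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf, nameCharset)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(data);
            zos.closeEntry();
        }
        return buf.toByteArray();
    }

    // Read the first entry, decoding its name with nameCharset.
    // Returns "name:content", or null if the archive has no entries.
    static String readFirstEntry(byte[] zip, Charset nameCharset) throws IOException {
        try (ZipInputStream zis =
                 new ZipInputStream(new ByteArrayInputStream(zip), nameCharset)) {
            ZipEntry entry = zis.getNextEntry();
            if (entry == null) {
                return null;
            }
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = zis.read(chunk)) != -1) {
                content.write(chunk, 0, n);
            }
            // The demo payload is ASCII, so decoding it as US-ASCII is safe here.
            return entry.getName() + ":"
                + new String(content.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        // "z/" followed by the six Swedish letters from the test case, as escapes
        // so the source file encoding does not matter.
        String name = "z/\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6.txt";
        Charset cs = StandardCharsets.ISO_8859_1; // assumption, see text above
        byte[] zip = writeZip(name, "hello".getBytes(StandardCharsets.US_ASCII), cs);
        // Compare instead of printing the name, to avoid console encoding issues.
        System.out.println((name + ":hello").equals(readFirstEntry(zip, cs)));
    }
}
```

For archives on disk, the same releases also provide ZipFile(File, Charset), so a lookup by entry name uses the same charset that was used to decode the names.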

/Palle

--On Thursday, June 10, 2004 02:25:28 +0200 Palle Girgensohn
<girgen@pingpong.net> wrote:

> Hi,
>
> Well, the problem is about character sets. A zip file seems to have no
> attribute telling which charset it uses for representing file names. Not
> very surprising.
>
> Java seems to handle this by reading filenames correctly and converting
> them to Java Strings (in Unicode). But when fetching data, it uses the
> Unicode byte sequence to find and fetch the entry, and comes up
> empty-handed: getInputStream returns null. I know of no way to tell
> java.util.zip that it should use some other character set?
>
> Hexdumping the resulting zip file, it is obvious that it has used Unicode
> in the zip file when saving the file name entries. I'm not sure how
> WinZip would react, but I assume it will show them as latin1, i.e. ä ->
> Ã¤. While this is really bad for me, since there is no standard I'm not
> quite sure this is wrong?
>
> BTW, there is a plugin pure java implementation on sourceforge,
> <http://jazzlib.sourceforge.net/>. It seems to result in same filenames
> on input and output.
>
> In  (getName): z/
> Out (getName): z/
> In  (getName): z/åäöÅÄÖ.txt
> Out (getName): z/åäöÅÄÖ.txt
> in is null
>
> With java.util.zip, in is null and the file is renamed to the same thing
> but in Unicode, and is zero bytes in the zip file.
>
> With jazzlib, this seems to work: in is not null and the åäöÅÄÖ.txt file
> is not empty.
>
>
> I'm running this in a shell with
> $ echo $LC_ALL
> sv_SE.ISO8859-1
>
> Regards,
> Palle
>
>
> --On Wednesday, June 09, 2004 11:56:26 -0600 Greg Lewis
> <glewis@eyesbeyond.com> wrote:
>
>> On Wed, Jun 09, 2004 at 05:37:27PM +0200, Palle Girgensohn wrote:
>>> java.util.zip cannot inflate a zip archive that contains eight-bit
>>> characters in file names; it simply crashes. I haven't been able to try
>>> it on other platforms yet, but I'd like to hear from others who might
>>> have seen this problem. The odd thing is that there is no exception or
>>> anything; it just stops when the first such character comes up, and
>>> returns null.
>>>
>>> Anyone else seen this? Is it just FreeBSD?
>>
>> If you send a small test programme and zip I can quickly try it on
>> Linux to compare.
>>
>> --
>> Greg Lewis                          Email   : glewis@eyesbeyond.com
>> Eyes Beyond                         Web     : http://www.eyesbeyond.com
>> Information Technology              FreeBSD : glewis@FreeBSD.org
>
>
>
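
The "ä -> Ã¤" effect guessed at in the quoted message can be reproduced outside of any zip code. This is a small sketch, assuming the names were written as UTF-8 (the message only says "unicode") and then read back as ISO-8859-1; the class and helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing how a UTF-8 name looks when read as latin1.
class NameMojibakeDemo {

    // Encode the text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    static String utf8ReadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8),
                          StandardCharsets.ISO_8859_1);
    }

    // Render bytes as space-separated upper-case hex, like a hexdump would.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String aUmlaut = "\u00e4"; // the letter a with diaeresis, as an escape

        // One byte in latin1, two bytes in UTF-8.
        System.out.println(hex(aUmlaut.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(hex(aUmlaut.getBytes(StandardCharsets.UTF_8)));

        // The two UTF-8 bytes, read as latin1, become two separate characters:
        // U+00C3 followed by U+00A4.
        String garbled = utf8ReadAsLatin1(aUmlaut);
        System.out.println(garbled.length());
    }
}
```

The same mismatch explains why a lookup fails when the bytes used to search for an entry differ from the bytes stored in the archive: the name compares equal as text but not as a byte sequence.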





