Date:      Thu, 10 Jun 2004 02:25:28 +0200
From:      Palle Girgensohn <girgen@pingpong.net>
To:        Greg Lewis <glewis@eyesbeyond.com>
Cc:        freebsd-java@freebsd.org
Subject:   Re: problems with java.util.zip and diacritical characters in file names
Message-ID:  <D17F6CD704077FEABC1AE296@palle.girgensohn.se>
In-Reply-To: <20040609175626.GB83936@misty.eyesbeyond.com>
References:  <5C024439534B293EAFE34A55@rambutan.pingpong.net> <20040609175626.GB83936@misty.eyesbeyond.com>

--==========29584E48F3C762197735==========
Content-Type: text/plain; charset=iso-8859-15; format=flowed
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Hi,

Well, the problem is about character sets. A zip file seems to have no
attribute telling which charset it uses for representing file names. Not
very surprising.
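
Later revisions of the ZIP specification did add one such hint: bit 11 of the general purpose flags in each entry header marks the name as UTF-8. The sketch below only peeks at that bit in the first local file header. The class and helper names are made up for illustration, and the ZipOutputStream constructor that takes a Charset assumes Java 7 or newer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipNameFlagSketch {

  // Bit 11 of the general purpose flags: entry name and comment are UTF-8.
  static final int UTF8_FLAG = 0x0800;

  // Hypothetical helper: inspect the first local file header of a zip image.
  static boolean firstEntryDeclaresUtf8(byte[] zip) {
    // A local file header starts with the signature bytes 'P' 'K' 3 4.
    if (zip.length < 8 || zip[0] != 'P' || zip[1] != 'K' || zip[2] != 3 || zip[3] != 4) {
      throw new IllegalArgumentException("no local file header at offset 0");
    }
    // The flags are a little-endian 16-bit value at offset 6.
    int flags = (zip[6] & 0xFF) | ((zip[7] & 0xFF) << 8);
    return (flags & UTF8_FLAG) != 0;
  }

  // Hypothetical helper: build a one-entry zip in memory, encoding names with cs.
  static byte[] oneEntryZip(String name, Charset cs) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zipOut = new ZipOutputStream(bytes, cs)) {
      zipOut.putNextEntry(new ZipEntry(name));
      zipOut.closeEntry();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    // Unicode escapes keep the source independent of the compiler's encoding.
    String name = "\u00e5\u00e4\u00f6.txt";
    System.out.println("UTF-8 names, flag set: "
        + firstEntryDeclaresUtf8(oneEntryZip(name, StandardCharsets.UTF_8)));
    System.out.println("latin1 names, flag set: "
        + firstEntryDeclaresUtf8(oneEntryZip(name, StandardCharsets.ISO_8859_1)));
  }
}
```

When the bit is clear, the archive still says nothing about which 8-bit charset the names use, so the reader has to guess or be told.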

Java seems to handle this by reading the file names correctly and converting
them to Java Strings (in Unicode). But when fetching data, it uses the
Unicode byte sequence to find and fetch the entry and comes up empty-handed:
getInputStream returns null. I know of no way to tell java.util.zip that it
should use some other character set.
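
For readers on newer JDKs: Java 7 and later added constructors on ZipFile, ZipInputStream and ZipOutputStream that take a java.nio.charset.Charset for entry names. A minimal sketch, assuming such a JDK (Java 8 here, for UncheckedIOException); the class and helper names are hypothetical, and the zip is built in memory only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipCharsetSketch {

  // Hypothetical helper: one-entry zip in memory, entry name encoded with cs.
  static byte[] writeZip(String name, byte[] data, Charset cs) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zipOut = new ZipOutputStream(bytes, cs)) {
      zipOut.putNextEntry(new ZipEntry(name));
      zipOut.write(data);
      zipOut.closeEntry();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Hypothetical helper: name of the first entry, decoded with cs.
  static String firstEntryName(byte[] zip, Charset cs) {
    try (ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(zip), cs)) {
      ZipEntry entry = zipIn.getNextEntry();
      return entry == null ? null : entry.getName();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Same name as in the test archive, written with Unicode escapes.
    String name = "z/\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6.txt";
    Charset latin1 = StandardCharsets.ISO_8859_1;

    byte[] zip = writeZip(name, new byte[] {1, 2, 3}, latin1);
    String readBack = firstEntryName(zip, latin1);

    System.out.println("name survives round trip: " + name.equals(readBack));
  }
}
```

The same Charset argument is accepted by new ZipFile(file, charset), which is the closer match to the attached test program.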

Hexdumping the resulting zip file, it is obvious that Unicode was used
when saving the file name entries. I'm not sure how WinZip would react,
but I assume it will show them as latin1, i.e. ä -> Ã¤. While this is
really bad for me, there is no standard, so I'm not quite sure it is
actually wrong.
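
The ä -> Ã¤ effect can be reproduced without any zip file: the two bytes C3 A4 are the UTF-8 encoding of ä, and each of them is a separate character in latin1. A small sketch (class and method names are hypothetical; StandardCharsets assumes Java 7 or newer):

```java
import java.nio.charset.StandardCharsets;

public class MojibakeSketch {

  // Encode as UTF-8, then (wrongly) decode the same bytes as ISO-8859-1.
  static String utf8ReadAsLatin1(String s) {
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    return new String(utf8, StandardCharsets.ISO_8859_1);
  }

  // Hex dump of the UTF-8 encoding, e.g. "C3 A4".
  static String utf8Hex(String s) {
    StringBuilder hex = new StringBuilder();
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
      if (hex.length() > 0) hex.append(' ');
      hex.append(String.format("%02X", b & 0xFF));
    }
    return hex.toString();
  }

  public static void main(String[] args) {
    String original = "\u00e4"; // a with diaeresis, written as an escape
    System.out.println("UTF-8 bytes: " + utf8Hex(original));
    System.out.println("one char became " + utf8ReadAsLatin1(original).length() + " chars");
  }
}
```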

BTW, there is a drop-in pure Java implementation on SourceForge,
<http://jazzlib.sourceforge.net/>. It seems to give the same file names on
input and output.

In  (getName): z/
Out (getName): z/
In  (getName): z/åäöÅÄÖ.txt
Out (getName): z/åäöÅÄÖ.txt
in is null

With java.util.zip, in is null, and the file is renamed to the same thing
but in Unicode, and is zero bytes in the zip file.

With jazzlib, this seems to work: in is not null and the åäöÅÄÖ.txt file
is not empty.


I'm running this in a shell with
$ echo $LC_ALL
sv_SE.ISO8859-1

Regards,
Palle


--On Wednesday, June 09, 2004 11.56.26 -0600 Greg Lewis
<glewis@eyesbeyond.com> wrote:

> On Wed, Jun 09, 2004 at 05:37:27PM +0200, Palle Girgensohn wrote:
>> java.util.zip cannot inflate a zip archive that contains eight-bit
>> characters in file names; it simply crashes. I haven't been able to try
>> it on other platforms yet, but I'd like to hear from others who might
>> have seen this problem. Odd thing is there is no exception or anything;
>> it just stops when the first character comes up, and returns null.
>>
>> Anyone else seen this? Is it just FreeBSD?
>
> If you send a small test programme and zip I can quickly try it on
> Linux to compare.
>
> --
> Greg Lewis                          Email   : glewis@eyesbeyond.com
> Eyes Beyond                         Web     : http://www.eyesbeyond.com
> Information Technology              FreeBSD : glewis@FreeBSD.org




--==========29584E48F3C762197735==========
Content-Type: text/plain; charset=iso-8859-1; name="ZipTest.java"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="ZipTest.java"; size=1411

import java.io.*;
import java.util.*;
import java.util.zip.*;
// Swap the line above for this one to test with jazzlib instead:
//import net.sf.jazzlib.*;


/**
   Test a zip file. Run as "java ZipTest infile.zip filetocreate.zip"
*/

public class ZipTest {

  public static void main(String[] args) {
    try {
      ZipFile zipIn = new ZipFile(args[0]);
      ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(args[1]));

      Enumeration inFiles = zipIn.entries();

      // Copy every entry of the input archive to the output archive,
      // printing the name as read and as written.
      while (inFiles.hasMoreElements()) {
        ZipEntry inEntry = (ZipEntry) inFiles.nextElement();
        System.out.print("In  (getName): ");
        System.out.println(inEntry.getName());

        ZipEntry outEntry = new ZipEntry(inEntry.getName());
        System.out.print("Out (getName): ");
        System.out.println(outEntry.getName());
        zipOut.putNextEntry(outEntry);

        // Directories carry no data.
        if (inEntry.isDirectory()) { continue; }

        // getInputStream returns null when the entry cannot be found.
        copy(zipIn.getInputStream(inEntry), zipOut);
        zipOut.closeEntry();
      }
      zipOut.close();
      zipIn.close();

    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  private static void copy(InputStream in, OutputStream out)
    throws IOException {
    if (in == null) { System.out.println("in is null"); return; }
    synchronized (in) {
      synchronized (out) {
        byte[] buffer = new byte[2048];
        while (true) {
          int bytesRead = in.read(buffer);
          if (bytesRead == -1) break;
          out.write(buffer, 0, bytesRead);
        }
      }
    }
  }
}

--==========29584E48F3C762197735==========--


