From owner-freebsd-java@FreeBSD.ORG Thu Jun 10 00:25:42 2004
Return-Path:
Delivered-To: freebsd-java@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 7297216A4CE
	for ; Thu, 10 Jun 2004 00:25:42 +0000 (GMT)
Received: from palle.girgensohn.se (1-2-8-5a.asp.sth.bostream.se [82.182.157.66])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2651643D45
	for ; Thu, 10 Jun 2004 00:25:41 +0000 (GMT)
	(envelope-from girgen@pingpong.net)
Received: from localhost.girgensohn.se (localhost.girgensohn.se [127.0.0.1])
	by palle.girgensohn.se (8.12.11/8.12.11) with ESMTP id i5A0PSdG020438;
	Thu, 10 Jun 2004 02:25:29 +0200 (CEST)
	(envelope-from girgen@pingpong.net)
Date: Thu, 10 Jun 2004 02:25:28 +0200
From: Palle Girgensohn
To: Greg Lewis
Message-ID:
In-Reply-To: <20040609175626.GB83936@misty.eyesbeyond.com>
References: <5C024439534B293EAFE34A55@rambutan.pingpong.net>
	<20040609175626.GB83936@misty.eyesbeyond.com>
X-Mailer: Mulberry/3.1.3 (Linux/x86)
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="==========29584E48F3C762197735=========="
X-Content-Filtered-By: Mailman/MimeDel 2.1.1
cc: freebsd-java@freebsd.org
Subject: Re: problems with java.util.zip and diacritical characters in file names
X-BeenThere: freebsd-java@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Porting Java to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 10 Jun 2004 00:25:42 -0000

--==========29584E48F3C762197735==========
Content-Type: text/plain; charset=iso-8859-15; format=flowed
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Hi,

Well, the problem is about character sets. A zip file seems to have no
attribute telling which charset it uses for representing file names. Not
very surprising.

Java seems to handle this by reading filenames correctly and converting
them to java Strings (in unicode).
But when fetching data, it uses the unicode byte sequence to find and
fetch the entry, and comes up empty-handed: getInputStream returns null.
I know of no way to tell java.util.zip that it should use some other
character set.

Hexdumping the resulting zip file, it is obvious that it has used unicode
in the zip file when saving the file name entries. I'm not sure how winzip
would react, but I assume it will show them as latin1, i.e. ä -> Ã¤ (the
UTF-8 bytes C3 A4 read as latin1). While this is really bad for me, since
there is no standard I'm not quite sure this is wrong?

BTW, there is a plug-in pure java implementation on sourceforge, jazzlib.
It seems to result in the same filenames on input and output.

Output of the attached test program:

In (getName): z/
Out (getName): z/
In (getName): z/åäöÅÄÖ.txt
Out (getName): z/åäöÅÄÖ.txt
in is null

With java.util.zip, in is null, the file is renamed to the same thing but
in unicode, and it is zero bytes in the zip file.

With jazzlib, this seems to work: in is not null and the åäöÅÄÖ.txt file
is not empty.

I'm running this in a shell with

$ echo $LC_ALL
sv_SE.ISO8859-1

Regards,
Palle

--On Wednesday, June 09, 2004 11.56.26 -0600 Greg Lewis wrote:

> On Wed, Jun 09, 2004 at 05:37:27PM +0200, Palle Girgensohn wrote:
>> java.util.zip cannot inflate a zip archive that contains eight bit
>> characters in file names, it simply crashes. I haven't been able to try
>> it on other platforms yet, but I'd like to hear from others who might
>> have seen this problem. Odd thing is there is no exception or anything
>> it just stops when the first character comes up, and returns null.
>>
>> Anyone else seen this? Is it just FreeBSD?
>
> If you send a small test programme and zip I can quickly try it on
> Linux to compare.
>
> --
> Greg Lewis                          Email   : glewis@eyesbeyond.com
> Eyes Beyond                         Web     : http://www.eyesbeyond.com
> Information Technology              FreeBSD : glewis@FreeBSD.org

--==========29584E48F3C762197735==========
Content-Type: text/plain; charset=iso-8859-1; name="ZipTest.java"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="ZipTest.java"; size=1411

import java.io.*;
import java.util.*;
import java.util.zip.*;
//import net.sf.jazzlib.*;

/**
 * Test a zip file.
 * Run as "java ZipTest infile.zip filetocreate.zip".
 */
public class ZipTest {
    public static void main(String[] args) {
        try {
            ZipFile zipIn = new ZipFile(args[0]);
            ZipOutputStream zipOut =
                new ZipOutputStream(new FileOutputStream(args[1]));

            // Copy every entry of the input archive to the output archive.
            Enumeration inFiles = zipIn.entries();
            while (inFiles.hasMoreElements()) {
                ZipEntry inEntry = (ZipEntry) inFiles.nextElement();
                System.out.print("In (getName): ");
                System.out.println(inEntry.getName());

                ZipEntry outEntry = new ZipEntry(inEntry.getName());
                System.out.print("Out (getName): ");
                System.out.println(outEntry.getName());

                zipOut.putNextEntry(outEntry);
                if (inEntry.isDirectory()) {
                    continue; // directories carry no data
                }
                copy(zipIn.getInputStream(inEntry), zipOut);
                zipOut.closeEntry();
            }
            zipOut.close();
            zipIn.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void copy(InputStream in, OutputStream out)
            throws IOException {
        // getInputStream returns null when the entry cannot be found by name.
        if (in == null) {
            System.out.println("in is null");
            return;
        }
        synchronized (in) {
            synchronized (out) {
                byte[] buffer = new byte[2048];
                while (true) {
                    int bytesRead = in.read(buffer);
                    if (bytesRead == -1) break;
                    out.write(buffer, 0, bytesRead);
                }
            }
        }
    }
}

--==========29584E48F3C762197735==========--
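
The charset question raised in the message can be illustrated with a
minimal sketch. It is not part of the original attachment. It assumes a
JDK of version 7 or later, which postdates this thread and whose
java.util.zip classes accept a Charset for entry names. The class name
ZipNameCharsetSketch and its two helper methods are hypothetical names
chosen for illustration. The first helper shows how the UTF-8 bytes of a
name look when read as latin1; the second writes one entry with its name
stored in a chosen charset and fetches it back by name.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch; class and method names are illustrative only. */
public class ZipNameCharsetSketch {

    /** Encodes a name as UTF-8, then decodes those bytes as ISO-8859-1. */
    public static String utf8SeenAsLatin1(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /**
     * Writes a single entry whose name is stored in nameCharset, reopens
     * the archive with the same charset, and returns the entry's data,
     * or null if the entry cannot be found by name.
     */
    public static byte[] roundTrip(String entryName, byte[] data,
                                   Charset nameCharset) throws IOException {
        File tmp = File.createTempFile("zipname", ".zip");
        try {
            // Assumption: JDK 7+ constructor taking a Charset for names.
            try (ZipOutputStream out = new ZipOutputStream(
                    new FileOutputStream(tmp), nameCharset)) {
                out.putNextEntry(new ZipEntry(entryName));
                out.write(data);
                out.closeEntry();
            }
            try (ZipFile zip = new ZipFile(tmp, nameCharset)) {
                ZipEntry entry = zip.getEntry(entryName);
                if (entry == null) {
                    return null;
                }
                try (InputStream in = zip.getInputStream(entry)) {
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[2048];
                    int n;
                    while ((n = in.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    return buf.toByteArray();
                }
            }
        } finally {
            tmp.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        // U+00E4 is "a with diaeresis"; escapes avoid source-encoding issues.
        System.out.println(utf8SeenAsLatin1("\u00e4"));

        String name = "z/\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6.txt";
        byte[] back = roundTrip(name,
                "hello".getBytes(StandardCharsets.US_ASCII),
                StandardCharsets.ISO_8859_1);
        System.out.println(back == null
                ? "in is null"
                : back.length + " bytes read");
    }
}
```

The entry name is written with unicode escapes so that the sketch does
not depend on the encoding of the source file or on LC_ALL. Whether other
zip tools display such names correctly still depends on the charset they
assume.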