From: Matthias Andree
Organization: FreeBSD
Date: Sun, 20 Jun 2010 15:33:56 +0200
To: Lasse Collin
Cc: ports@FreeBSD.org, Christian Weisgerber, portmgr@FreeBSD.org
Subject: Re: FreeBSD ports USE_XZ critical issue on low-RAM computers
Message-ID: <4C1E18C4.5020303@FreeBSD.org>
In-Reply-To: <201006191641.26301.lasse.collin@tukaani.org>
References: <4C1BA4D4.9000205@FreeBSD.org>
 <201006190855.05061.lasse.collin@tukaani.org>
 <4C1C9F9F.8090808@FreeBSD.org>
 <201006191641.26301.lasse.collin@tukaani.org>
List-Id: Porting software to FreeBSD

On 19.06.2010 15:41, Lasse Collin wrote:

> Perhaps FreeBSD provides a good working way to limit the amount of
> memory that a process actually can use. I don't see such a way e.g. in
> Linux, so having some method in the application to limit memory usage is
> definitely nice. It's even more useful in the compression library,
> because a virtual-memory-hog application on a busy server doesn't
> necessarily want to use tons of RAM for decompressing data from
> untrusted sources.

Even there the default should be "max", and the library SHOULD NOT
second-guess what trust level of data the application might want to
process with libxz's help. Expose the limiter interface in the API if
you want, but for the library in particular, any default other than
"unlimited memory" is a nuisance.

And there's still an application, and unlike the xz library, the
application should know what kind of data from what sources it is
processing. If - for instance - a virus inspector wants to impose memory
limits and quarantine an attachment that looks like a zip bomb, it can
do so itself.

Since the advent of KDE, GNOME, XFCE and the like with all their
graphical tools, people hardly use command line tools unless they know
exactly what they're doing - and it's not as though xz's behaviour were
prone to causing permanent damage somewhere, so it's OK if less skilled
users find out the hard way that they need to read manpages.

I was surprised because xz has somewhat left the traditional UNIX way,
which was to try exactly as little and as hard as you were told, until
you bump into a brick wall (permission denied on some file, or memory
allocation failed, or similar). Don't try to be nice unless you're asked
to. Don't try to ask questions unless you're asked to be "interactive".

All your default limits make using the utility unnecessarily hard and
make it harder to explain, because the default limits are surprising and
cause self-made failures.
And I think many people would just lay xz or the library aside when they
figure out that they need to do this and that and foo and bar and
torture a black cat, swinging it over their heads, and dance strange
figures in the sewers on a full-moon night, just so that xz or the
library finally condescends to decompress a file. I am exaggerating
here, but please, don't make me jump through hoops with my application
or script.

I'd say a typical application wants to call xzopen() and decompress a
file, and if it wants to impose limits it will use setrlimit() or
perhaps xz_set_memory_limit in addition beforehand. I do fear that the
default limits will actually hamper, not foster, adoption of the xz
software.

>> For compression, it's less critical because service is degraded, not
>> denied, but I'd still think -M max would be the better default. I can
>> always put "export XZ_OPT=-3" in /etc/profile.d/local.sh or wherever
>> it belongs on the OS of the day.
>
> If a script has "xz -9", it overrides XZ_OPT=-3.

I know. This isn't a surprise for me. The memory limiting however is.
And the memory limiting overrides xz -9 to something lesser, which may
not be what I want either.

>> I still think utilities and applications should /not/ impose
>> arbitrarily lower limits by default though.
>
> There's no multithreading in xz yet, but when there is, do you want xz
> to use as many threads as there are CPU cores _by default_? If so, do
> you mind if compressing with "xz -9" used around 3.5 GiB of memory on a
> four-core system no matter how much RAM it has?

Multithreading in xz is worth discussion if the tasks can be
parallelized, which is apparently not the case. You would be duplicating
effort, because we already have tools to run several xz processes on
distinct files at the same time, for instance BSD portable make or GNU
make with a "-j" option.
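
The application-side limiting argued for above (unlimited by default,
with a cap only when the caller asks for one) can be sketched as
follows. This is an illustration under stated assumptions, not the xz
API: xzopen() and xz_set_memory_limit in the message are taken as
illustrative names, Python's standard lzma module stands in for the C
library, and decompress_xz() is a hypothetical helper.

```python
import lzma
from typing import Optional


def decompress_xz(data: bytes, memlimit: Optional[int] = None) -> bytes:
    """Decompress .xz data (hypothetical helper, not part of xz).

    memlimit=None means no limit: the decoder fails only if the
    allocation itself fails. A caller that distrusts its input passes
    an explicit byte count instead.
    """
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_XZ, memlimit=memlimit)
    return decoder.decompress(data)


if __name__ == "__main__":
    payload = b"ports tree " * 1000
    # Preset 6 records an 8 MiB dictionary in the stream.
    packed = lzma.compress(payload, preset=6)

    # Default: no limit, the file simply decompresses.
    assert decompress_xz(packed) == payload

    # A cautious caller (say, a virus scanner) opts in to a 1 MiB cap.
    try:
        decompress_xz(packed, memlimit=1 << 20)
    except lzma.LZMAError as exc:
        print("refused by caller-chosen limit:", exc)
```

The limit is a decision of the calling code, which knows where its
data comes from; nothing is lowered or refused unless the caller
requested it.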
> I think it is quite obvious that you want the number of threads to be
> limited so that xz won't accidentally exceed the total amount of
> physical RAM, because then it is much slower than using fewer threads.

This tells me xz cannot fully parallelize its effort on the CPUs, and
should be single-threaded so as not to waste effort on parallelization
overhead.

> Being faster is the whole point of threading anyway. Naturally doing
> unusual things is sometimes wanted so a limit can be overridden. This is
> all about the default behavior only.

Yes, and I consider the default behaviour to be "getting in my way" and
disturbing. No other compression tool I know would ever spend that much
thought on its working environment. All others will only fail if it's
"physically" impossible to complete the job, and otherwise just grind
away.

> In most cases, lowering the compression settings automatically is
> friendly towards the user. People easily write "xz -9" to scripts
> without thinking if they actually want that, because they are used to -9
> with gzip and bzip2.

-9 is quite slow in bzip2; for gzip, computers have become fast enough
that -9 hardly hurts today, but I recall times when I thought twice
before using gzip -9 rather than gzip -3. People know there is a price
tag attached, else the whole option system would be useless and --best
would be the only option.

> I would find it dumb to annoy users of slightly
> older hardware with _default behavior_ that puts their system to swap
> whenever such a script is run. They can still get the
> swap-till-the-morning behavior if they really want it by disabling the
> limit when compressing by using XZ_OPT.

This is really xz developing a life of its own. Look: If I specify -9 or
--best, but no memory option, that means "compress as hard as you can".
Instead, xz assumes an implicit default memory limit, so the -9 gets
degraded to -5, -2, -1, ... in a somewhat surprising manner, because
depending on which computer I run it on, -9 might mean -9 on one
computer, -6 on another, or -1 on a third.

That is what I'd call a nasty surprise - xz overrides my command line
option. I would propose that with -9 and without the -M option, xz
simply tries to allocate the memory, and if that fails, it can still
suggest using the -M option or a lower -[0-8] option so I know how to
proceed.

> In short, some people find a default limit annoying and some other
> people would find lack of default limit annoying. (And most people
> probably don't care.) So the question is, which group will complain
> more; obviously I cannot make everyone happy. At this point it starts to
> look like that your group is winning. ;-) I will have to discuss with
> people in the other group before making decisions.

The real issue is that the xz software does things that go beyond the
options given. This has been a negative surprise to me. I would also
like to recall my earlier argument that xz is a low-level tool, and it
is mixing high-level features in. This is astonishing.

I think that some of the defaults you've set were trying to address
usability concerns, and there are other ways to achieve this usability.
Often an alternative proposal together with a diagnostic suffices.
Helping towards self-aid, in a way.

I hope - egoistically - that xz will lean towards easier use in
infrastructure (think build systems or applications using libraries),
rather than assisting newbies. The more consistent the world of
Unix-compatible and Unix-environment tools is, the easier people will
learn. It keeps the documentation simpler because there are not so many
ifs and buts, and in the end I think it will pay off to change the
default.

Best regards
Matthias