Date:      Sun, 20 Jun 2010 15:33:56 +0200
From:      Matthias Andree <mandree@FreeBSD.org>
To:        Lasse Collin <lasse.collin@tukaani.org>
Cc:        ports@FreeBSD.org, Christian Weisgerber <naddy@FreeBSD.org>, portmgr@FreeBSD.org
Subject:   Re: FreeBSD ports USE_XZ critical issue on low-RAM computers
Message-ID:  <4C1E18C4.5020303@FreeBSD.org>
In-Reply-To: <201006191641.26301.lasse.collin@tukaani.org>
References:  <4C1BA4D4.9000205@FreeBSD.org> <201006190855.05061.lasse.collin@tukaani.org> <4C1C9F9F.8090808@FreeBSD.org> <201006191641.26301.lasse.collin@tukaani.org>

On 19.06.2010 15:41, Lasse Collin wrote:

> Perhaps FreeBSD provides a good working way to limit the amount of 
> memory that a process actually can use. I don't see such a way e.g. in 
> Linux, so having some method in the application to limit memory usage is 
> definitely nice. It's even more useful in the compression library, 
> because a virtual-memory-hog application on a busy server doesn't 
> necessarily want to use tons of RAM for decompressing data from 
> untrusted sources.

Even there the default should be "max", and the library SHOULD NOT
second-guess what trust level of data the application might want to
process with libxz's help.

Expose the limiter interface in the API if you want, but for the
library in particular, any default other than "unlimited memory" is a
nuisance.  There is still an application on top, and unlike the xz
library, the application should know what kind of data from what sources
it is processing. If - for instance - a virus inspector wants to impose
memory limits and quarantine an attachment that looks like a zip bomb,
it can do so itself.

Since the advent of KDE, GNOME, XFCE, and the like with all their
graphical tools, people hardly use command line tools unless they know
exactly what they're doing - and it's not as though xz's behaviour were
prone to causing permanent damage somewhere, so it's OK if less skilled
users find out the hard way that they need to read manpages.

I was surprised because xz has somewhat left the traditional UNIX way,
which was to try exactly as little and as hard as you were told, until
you bump into a brick wall (permission denied on some file, or memory
allocation failed, or similar).
Don't try to be nice unless you're asked to.
Don't try to ask questions unless you're asked to be "interactive".

All your default limits make using the utility unnecessarily hard and
make it harder to explain, because the default limits are surprising and
cause self-made failures.

And I think many people would just lay xz or the library aside when
they figure out that they need to do this and that and foo and bar and
torture a black cat, swinging it over their heads, and dance strange
figures in the sewers on a full-moon night, just so that xz or the
library finally condescends to decompress a file. I am exaggerating
here, but please, don't make me jump through hoops with my application
or script.

I'd say a typical application wants to call xzopen() and decompress a
file, and if it wants to impose limits it will use setrlimit() or
perhaps xz_set_memory_limit in addition beforehand.
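
A minimal sketch of that opt-in pattern, using Python's standard lzma
module instead of the C-level names mentioned above. The helper name
decompress_untrusted and the chosen limits are assumptions made up for
this illustration. The plain call runs without any limit; only the
caller that knows its input is untrusted passes memlimit:

```python
import lzma

# Compress a small payload with preset 6 (8 MiB dictionary).
payload = b"ports tree " * 1000
blob = lzma.compress(payload, preset=6)

# Default: no memory limit, the call simply does what it was told.
assert lzma.decompress(blob) == payload


def decompress_untrusted(data, memlimit_bytes):
    """Hypothetical helper: opt-in limit for callers handling untrusted input.

    Returns the decompressed bytes, or None if the stream cannot be
    decoded within memlimit_bytes (or is otherwise invalid).
    """
    try:
        return lzma.decompress(data, memlimit=memlimit_bytes)
    except lzma.LZMAError:
        # The application decides what happens next, e.g. quarantine.
        return None


generous = decompress_untrusted(blob, 64 * 1024 * 1024)
too_tight = decompress_untrusted(blob, 1024 * 1024)
```

Here the policy lives in the application: the library call itself never
refuses work unless it has been told to.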

I do fear that this will actually hamper, not foster, adoption of the xz
software.

>> For compression, it's less critical because service is degraded, not
>> denied, but I'd still think -M max would be the better default. I can
>> always put "export XZ_OPT=-3" in /etc/profile.d/local.sh or wherever
>> it belongs on the OS of the day.
> 
> If a script has "xz -9", it overrides XZ_OPT=-3.

I know. This isn't a surprise for me. The memory limiting however is.
And the memory limiting overrides xz -9 to something lower, which may
not be what I want either.

>> I still think utilities and applications should /not/ impose
>> arbitrarily lower limits by default though.
> 
> There's no multithreading in xz yet, but when there is, do you want xz 
> to use as many threads as there are CPU cores _by default_? If so, do 
> you mind if compressing with "xz -9" used around 3.5 GiB of memory on a 
> four-core system no matter how much RAM it has?

Multithreading in xz is worth discussing if the tasks can be
parallelized, which is apparently not the case.  You would be
duplicating effort, because we have tools to run several xz processes
on distinct files at the same time, for instance BSD portable make or
GNU make with a "-j" option.

> I think it is quite obvious that you want the number of threads to be 
> limited so that xz won't accidentally exceed the total amount of 
> physical RAM, because then it is much slower than using fewer threads. 

This tells me xz cannot fully parallelize its effort on the CPUs, and
should be single-threaded so as not to waste the parallelization overhead.

> Being faster is the whole point of threading anyway. Naturally doing 
> unusual things is sometimes wanted so a limit can be overriden. This is 
> all about the default behavior only.

Yes, and I consider the default behaviour to be "getting in my way" and
disturbing.  No other compression tool I know would ever spend that much
thought on its working environment. All others will only fail if it's
"physically" impossible to complete the job, and otherwise just grind away.

> In most cases, lowering the compression settings automatically is 
> friendly towards the user. People easily write "xz -9" to scripts 
> without thinking if they actually want that, because they are used to -9 
> with gzip and bzip2.

-9 is quite slow in bzip2. With gzip, computers have become fast enough
that -9 hardly hurts today, but I recall times when I thought twice
before using gzip -9 rather than gzip -3.  People know there is a price
tag attached, else the whole option system would be useless and --best
would be the only option.

> I would find it dumb to annoy users of slightly
> older hardware with _default behavior_ that puts their system to swap 
> whenever such a script is ran. They can still get the swap-till-the-
> morning behavior if they really want it by disabling the limit when 
> compressing by using XZ_OPT.

This is really xz developing a life of its own.

Look:

If I specify -9 or --best, but no memory option, that means "compress as
hard as you can".

Instead, xz assumes an implicit default memory limit, so the -9 gets
degraded to -5, -2, -1, ... in a somewhat surprising manner, because
depending on which computer I run it on, -9 might mean -9 on one
computer, -6 on another, or -1 on a third.

That is what I'd call a nasty surprise - xz overrides my command line
option.  I would propose that with -9 and without an -M option, xz
tries to allocate the memory, and if that fails, it can still suggest
using the -M option or a lower -[0-8] option so I know how to proceed.
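
The two behaviours contrasted above can be sketched as a toy model. The
memory figures are approximate per-preset compressor requirements as
listed in recent xz man pages and are used here only for illustration;
the function names are hypothetical, and the real tool may adjust its
settings differently than by stepping down whole presets:

```python
# Approximate compressor memory per preset, in MiB (illustrative figures).
COMP_MEM_MIB = {0: 3, 1: 9, 2: 17, 3: 32, 4: 48,
                5: 94, 6: 94, 7: 186, 8: 370, 9: 674}


def preset_with_implicit_limit(requested, limit_mib):
    """Criticised behaviour: silently step down until the limit is met."""
    preset = requested
    while preset > 0 and COMP_MEM_MIB[preset] > limit_mib:
        preset -= 1
    return preset


def preset_or_diagnostic(requested, available_mib):
    """Proposed behaviour: honour the request, or fail with a hint."""
    needed = COMP_MEM_MIB[requested]
    if needed <= available_mib:
        return requested, None
    fitting = [p for p in range(requested)
               if COMP_MEM_MIB[p] <= available_mib]
    hint = f"try -{max(fitting)} or lower" if fitting else "no preset fits"
    message = (f"-{requested} needs about {needed} MiB, "
               f"only {available_mib} MiB available; "
               f"{hint}, or set the limit with -M")
    return None, message


# The same "-9" on three machines with different implicit limits:
silent = [preset_with_implicit_limit(9, mib) for mib in (1000, 100, 10)]
result, diagnostic = preset_or_diagnostic(9, 100)
```

In the first function the meaning of -9 depends on the machine; in the
second, -9 either means -9 or the user is told why it cannot and what
to try instead.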

> In short, some people find a default limit annoying and some other 
> people would find lack of default limit annoying. (And most people 
> probably don't care.) So the question is, which group will complain 
> more; obviously I cannot make everyone happy. At this point it starts to 
> look like that your group is winning. ;-) I will have to discuss with 
> people in the other group before making decisions.

The real thing is that the xz software does things that go beyond the
options. This has been a negative surprise to me.

I would also like to recall my earlier argument that xz is a low-level
tool, and it's mixing high-level features in. This is astonishing.

I think that some of the defaults you've set were trying to address
usability concerns, and there are other ways to achieve this usability.
Often it's an alternative proposal together with a diagnostic that
suffices. Helping towards self-aid, in a way.

I hope - egoistically - that xz will lean towards easier use in
infrastructure (think build systems or applications using libraries),
rather than assisting newbies. The more consistent the world around
Unix-compatible and -environment tools is, the easier people will learn.
It keeps the documentation simpler because there are not so many ifs and
buts, and in the end I think it will pay off to change the default.

Best regards
Matthias


