Date:      Mon, 4 Feb 2008 04:58:31 -0800
From:      Jeremy Chadwick <koitsu@FreeBSD.org>
To:        Dag-Erling Smørgrav <des@des.no>
Cc:        hackers@freebsd.org, Ed Schouten <ed@fxq.nl>
Subject:   Re: sort(1) memory usage
Message-ID:  <20080204125831.GA4052@eos.sc1.parodius.com>
In-Reply-To: <86lk62kqeh.fsf@ds4.des.no>
References:  <8663x6mc2o.fsf@ds4.des.no> <20080203131322.GK1179@hoeg.nl> <20080203151550.GA67020@owl.midgard.homeip.net> <86prvekqs2.fsf@ds4.des.no> <86lk62kqeh.fsf@ds4.des.no>

On Sun, Feb 03, 2008 at 04:31:34PM +0100, Dag-Erling Smørgrav wrote:
> Dag-Erling Smørgrav <des@des.no> writes:
> > Erik Trulsson <ertr1013@student.uu.se> writes:
> > > Yep, it seems that GNU sort allocates a quite large buffer by default when
> > > the size of the input is unknown (such as when it reads input from stdin.)
> > > A quick check in the source code indicates that it tries to size this buffer
> > > according to how much memory the system has (and according to any limits set
> > > on how much memory the process is allowed to use.)
> > Uh, OK.  This scaling doesn't seem to work correctly.  It seems to
> > allocate 27 MB on 32-bit machines and 54 MB on 64-bit machines,
> > regardless of memory size.

I've looked at the code, as has a peer of mine.

As you said: the code shows that when no files are specified (e.g. when
input is read from a pipe), sort makes some assumptions about the
initial size of the buffer it reads data into.  The buffer allocated in
that case is fairly large, rather than being sized from the first line
read from stdin; it looks like this is done to save CPU time in the long
run (otherwise you'd have to reallocate more later and take a hit;
initbuf() is responsible for that).
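
To make the trade-off concrete, here is a minimal sketch of a buffer
that starts at a caller-chosen size and doubles whenever it runs out of
room.  It is not GNU sort's code: struct growbuf and the growbuf_*()
functions are hypothetical names used only for illustration.  The point
is that a small initial size costs extra realloc() calls (and possibly
copies), while a large one costs memory up front.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer for illustration; not GNU sort's own struct. */
struct growbuf {
    char *data;
    size_t used;        /* bytes currently stored */
    size_t cap;         /* bytes currently allocated */
    unsigned reallocs;  /* number of realloc() calls made so far */
};

/* Allocate the buffer with a caller-chosen initial capacity (at least 1). */
static int growbuf_init(struct growbuf *b, size_t initial_cap)
{
    if (initial_cap == 0)
        initial_cap = 1;
    b->data = malloc(initial_cap);
    if (b->data == NULL)
        return -1;
    b->used = 0;
    b->cap = initial_cap;
    b->reallocs = 0;
    return 0;
}

/* Append n bytes, doubling the capacity until they fit. */
static int growbuf_append(struct growbuf *b, const char *src, size_t n)
{
    assert(b->used <= b->cap);
    if (n > b->cap - b->used) {
        size_t new_cap = b->cap;
        while (n > new_cap - b->used) {
            if (new_cap > SIZE_MAX / 2)
                return -1;      /* doubling would overflow size_t */
            new_cap *= 2;
        }
        char *p = realloc(b->data, new_cap);
        if (p == NULL)
            return -1;          /* the old buffer is still valid */
        b->data = p;
        b->cap = new_cap;
        b->reallocs++;
    }
    memcpy(b->data + b->used, src, n);
    b->used += n;
    return 0;
}

/* Release the buffer and reset its bookkeeping. */
static void growbuf_free(struct growbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->used = b->cap = 0;
}
```

Feeding the same input through one buffer created with growbuf_init(&b,
16) and another created with growbuf_init(&b, 4096), then comparing
their reallocs counters, shows the CPU-vs-memory choice in miniature.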

I think being able to select which implementation you want (less memory
with more potential CPU overhead, vs. more memory with less potential
CPU overhead) would be best.

Regarding your observation about how much is allocated on 32-bit vs.
64-bit (27 MB vs. 54 MB), two things come to mind:

1) There was a recent discussion about this on freebsd-amd64,
specifically with regard to the increased VSZ of processes on amd64 vs.
i386 with shared libraries:
http://lists.freebsd.org/pipermail/freebsd-amd64/2007-September/010254.html

2) long and pointers on amd64 are 64-bit in size (int stays 32-bit),
while on i386 they're 32-bit (i.e. half the size).  27*2=54, so possibly
that's where the excess comes from?
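
The type widths behind point 2 can be checked directly rather than
assumed.  FreeBSD/amd64 is an LP64 platform, where int is 32 bits and
long and pointers are 64 bits; on i386 all three are 32 bits.  A minimal
sketch that prints what the local compiler uses (BITS() and
print_type_widths() are names made up for this example):

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Width of a type in bits on the platform this is compiled for. */
#define BITS(type) (sizeof(type) * CHAR_BIT)

/* Holds on every conforming platform, whatever the data model. */
static_assert(sizeof(long) >= sizeof(int), "long is never narrower than int");

/* Print the widths of the types relevant to the 32-bit vs. 64-bit question. */
static void print_type_widths(void)
{
    printf("int:    %zu bits\n", BITS(int));
    printf("long:   %zu bits\n", BITS(long));
    printf("void *: %zu bits\n", BITS(void *));
    printf("size_t: %zu bits\n", BITS(size_t));
}
```

Any size that is computed from a long or a size_t, or stored in arrays
of pointers, can therefore double between i386 and amd64 even when the
algorithm is unchanged.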

> Looking at the code, it seems to go to extreme lengths to get it
> absolutely wrong.  For instance, if hw.physmem / 8 > hw.usermem, it will
> pick the former, which means it's pretty much guaranteed to either fail
> or hose your system (or both).

Can you expand on this?  Looking at the code, it doesn't appear that's
possible.  The code in question is default_sort_size(), which is used
when no -S or --buffer-size argument is specified.
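
For reference, the rule as it is described in the quoted text (take the
larger of hw.usermem and hw.physmem / 8, subject to whatever limit
applies to the process) can be written down as a small sketch.  This is
a paraphrase of the thread, not the actual default_sort_size() source,
which may differ; the function name and all three parameters are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the disputed selection rule, NOT GNU sort's real code.
 *   avail_bytes: memory considered available (hw.usermem in the thread)
 *   total_bytes: total physical memory (hw.physmem in the thread)
 *   limit_bytes: assumed per-process cap, e.g. from a resource limit
 */
static uint64_t sketch_sort_size(uint64_t avail_bytes, uint64_t total_bytes,
                                 uint64_t limit_bytes)
{
    /* A zero limit would leave no room to sort at all. */
    assert(limit_bytes > 0);

    /* Take the larger of "available" and one eighth of "total". */
    uint64_t eighth = total_bytes / 8;
    uint64_t mem = (avail_bytes > eighth) ? avail_bytes : eighth;

    /* Never exceed the per-process limit. */
    if (mem > limit_bytes)
        mem = limit_bytes;
    return mem;
}
```

Under this reading, total / 8 only wins when less than an eighth of
physical memory counts as available, which is the case the quoted
paragraph is worried about.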

> Count this as a vote for ditching GNU sort in favor of a BSD-licensed
> implementation (from {Net,Open}BSD for instance).

In this specific case, I think you're bashing GNU just because you feel
like it.  Come on man... =/

-- 
| Jeremy Chadwick                                    jdc at parodius.com |
| Parodius Networking                           http://www.parodius.com/ |
| UNIX Systems Administrator                      Mountain View, CA, USA |
| Making life hard for others since 1977.                  PGP: 4BD6C0CB |



