Date:      Wed, 13 Aug 2008 18:56:13 +0200
From:      cpghost <cpghost@cordula.ws>
To:        Laszlo Nagy <gandalf@shopzeus.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: Max. number of opened files, efficiency
Message-ID:  <20080813165613.GB18638@epia-2.farid-hajji.net>
In-Reply-To: <48A2EBD7.9000903@shopzeus.com>
References:  <48A2EBD7.9000903@shopzeus.com>

On Wed, Aug 13, 2008 at 04:12:39PM +0200, Laszlo Nagy wrote:
> How many files can I open under FreeBSD, at the same time?

% sysctl -a | grep maxfiles
kern.maxfiles: 7880
kern.maxfilesperproc: 7092

But remember that a few hundred file descriptors are already in use, so
usually you won't have more than 6800 or so left for your
application... unless you crank up those values (in /etc/sysctl.conf,
IIRC).
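
For example, something like this in /etc/sysctl.conf (the values are
purely illustrative, pick your own):

  kern.maxfiles=25000
  kern.maxfilesperproc=22000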

Your shell may also limit the number of open files (cf. openfiles
below):

% limits
Resource limits (current):
  cputime          infinity secs
  filesize         infinity kB
  datasize           524288 kB
  stacksize           65536 kB
  coredumpsize     infinity kB
  memoryuse        infinity kB
  memorylocked     infinity kB
  maxprocesses         3546
  openfiles            7092
  sbsize           infinity bytes
  vmemoryuse       infinity kB
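
If you want to check (and, up to the hard limit, raise) that limit from
within your Python program, something along these lines should do (a
rough sketch, untested):

  import resource

  # Current per-process limit on open file descriptors.
  soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
  print "RLIMIT_NOFILE: soft=%d hard=%d" % (soft, hard)

  # Raise the soft limit as far as the hard limit allows; no special
  # privileges are needed for that.
  resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))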

> Problem: I'm making a pivot table, and when I drill down into the
> facts, I would like to create a new temporary file for each possible
> dimension value. In most cases there will be fewer than 1000 dimension
> values. I tried to open 1000 temporary files and I could do so within
> one second.
> 
> But how efficient is that? What happens when I open 1000 temporary
> files and write data into them randomly, 10 million times (avg. 10,000
> write operations per file)? Will this be handled efficiently by the OS?
> Is efficiency affected by the underlying filesystem?

Wouldn't it be more efficient to use a DBM file (anydbm, bsddb),
indexed by dimension, for this?
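
Just a sketch of what I mean (file name and record format made up, and
rewriting a whole value on each append is naive, but it shows the idea
of keying on the dimension value):

  import anydbm

  db = anydbm.open('/tmp/facts.dbm', 'c')   # 'c': create if missing

  def add_fact(dimension_value, row):
      # DBM maps strings to strings, so append the serialized row.
      key = str(dimension_value)
      if db.has_key(key):
          db[key] = db[key] + row + '\n'
      else:
          db[key] = row + '\n'

  add_fact('2008-08', '42;19.99')
  add_fact('2008-08', '43;7.50')
  print db['2008-08'].splitlines()

  db.close()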

You may also want to consider numpy and some modules in scipy for this
kind of computation: IIRC they have functions to efficiently store and
read back binary data to/from files, and numpy's ndarray has a nice
slicing syntax too.
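
E.g. (again just a sketch), dumping a flat array to disk and reading it
back with ndarray.tofile() / numpy.fromfile():

  import numpy as np

  a = np.arange(10, dtype=np.float64)
  a.tofile('/tmp/facts.bin')        # raw binary dump, no header
  b = np.fromfile('/tmp/facts.bin', dtype=np.float64)
  print b[2:5]                      # slicing: [ 2.  3.  4.]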

> I also tried to create 10 000 temporary files, but performance dropped
> noticeably.
> 
> Example in Python:
> 
> import tempfile
> import time
> N = 10000
> start = time.time()
> files = [ tempfile.TemporaryFile() for i in range(N)]
> stop = time.time()
> print "created %s files/second" % ( int(N/(stop-start)) )
> 
> On my computer this program prints "3814 files/second" for N=1000, and  
> "1561 files/second" for N=10000.
> 
> Thanks,
> 
>    Laszlo

Regards,
-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/


