From owner-freebsd-questions Tue Jul 23 09:04:49 1996
Return-Path: owner-questions
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id JAA22677 for questions-outgoing; Tue, 23 Jul 1996 09:04:49 -0700 (PDT)
Received: from merlin.nando.net (root@merlin.nando.net [152.52.2.2]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id JAA22672 for ; Tue, 23 Jul 1996 09:04:47 -0700 (PDT)
Received: from Gary120.nando.net (grail1511.nando.net [152.52.29.51]) by merlin.nando.net (8.7/8.6.9) with SMTP id MAA28863 for ; Tue, 23 Jul 1996 12:04:34 -0400 (EDT)
Message-ID: <31F5227B.652@swanlake.com>
Date: Tue, 23 Jul 1996 12:05:31 -0700
From: "G. Jin"
Organization: swanlake
X-Mailer: Mozilla 2.02 (Win95; I)
MIME-Version: 1.0
To: questions@freebsd.org
Subject: one large file or many small files
X-URL: http://www.freebsd.org/support.html
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-questions@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

To all:

I have a web application that, if it becomes successful, could support up to 100K users. I will use a SQL database to maintain the users. However, each user will be able to add, delete, and modify their own data area dynamically, and the amount of data per user varies greatly, from 10K to 20M, averaging about 200K per user.

My original design was one big file as a central store for this dynamic data. The design is basically done, with my own elaborate storage management and garbage collection. I believe this approach is much faster and more storage-efficient than a pure SQL database implementation. My concern now is that one big file may cause many lock-out problems when accessed by many users simultaneously. One humongous file will also certainly cause backup problems.

I am thinking about giving each user a separate file, which would result in 100K files of 10K to 20M each instead of one big 20G file. Can anybody tell me which way is better? I have the following considerations so far:

1. Can Linux/FreeBSD support 100K files?

2. Will 100K files cause a lot of disk fragmentation?

3. If the user id is "12345":

   a. In the case of one big file, I have a SQL database table that, given the user ID, returns the address of the data in the big file, and with one seek I can access that user's data.

   b. In the case of many individual files, I will name the user's data file "f12345.dat". With 100K files in place, will accessing a certain file "f12345.dat" cause too slow a directory search to find its address?

Any feedback would be greatly appreciated!

Ganlin Jin
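[The trade-off in point 3 can be sketched in C as follows. This is only an illustration, not the poster's actual code: `big_read` and `user_path` are hypothetical names, and the fan-out factor of 100 is an arbitrary example.]

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Option (a): the SQL table maps a user id to an (offset, length)
 * pair inside one big store file; a single pread() then fetches the
 * data in one seek. (Concurrent-access locking is not shown.) */
ssize_t big_read(int fd, off_t offset, void *buf, size_t len)
{
    return pread(fd, buf, len, offset);
}

/* Option (b): one file per user. A two-level fan-out (uid % 100,
 * a hypothetical choice) keeps each directory near 1K entries
 * instead of 100K, so the linear directory scan that filesystems
 * of this era perform stays cheap. */
void user_path(unsigned long uid, char *buf, size_t n)
{
    snprintf(buf, n, "data/%02lu/f%lu.dat", uid % 100, uid);
}
```

With this layout, user 12345's file would land at "data/45/f12345.dat", and no single directory lookup has to search all 100K names.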