From owner-freebsd-questions Tue Jul 23 09:04:49 1996
Return-Path: owner-questions
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id JAA22677 for questions-outgoing; Tue, 23 Jul 1996 09:04:49 -0700 (PDT)
Received: from merlin.nando.net (root@merlin.nando.net [152.52.2.2]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id JAA22672 for ; Tue, 23 Jul 1996 09:04:47 -0700 (PDT)
Received: from Gary120.nando.net (grail1511.nando.net [152.52.29.51]) by merlin.nando.net (8.7/8.6.9) with SMTP id MAA28863 for ; Tue, 23 Jul 1996 12:04:34 -0400 (EDT)
Message-ID: <31F5227B.652@swanlake.com>
Date: Tue, 23 Jul 1996 12:05:31 -0700
From: "G. Jin"
Organization: swanlake
X-Mailer: Mozilla 2.02 (Win95; I)
MIME-Version: 1.0
To: questions@freebsd.org
Subject: one large file or many small files
X-URL: http://www.freebsd.org/support.html
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-questions@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

To all:

I have a web application that, if it becomes successful, could support up to 100K users. I will use a SQL database to maintain the users. However, each user will be able to add, delete, and modify their own data area dynamically, and the amount of data per user varies greatly, from 10K to 20M, averaging about 200K per user.

My original design was one big file as a central store for this dynamic data. The design is basically done, with my own elaborate storage management and garbage collection. I believe this approach is much faster and more storage-efficient than a pure SQL database implementation. My concern now is that one big file may cause many lock-out problems when accessed by many users simultaneously. One humongous file will also certainly cause backup problems.

I am thinking about giving each user a separate file, which would result in 100K files of 10K to 20M each instead of one big 20G file. Can anybody tell me which way is better? I have the following considerations so far:

1. Can Linux/FreeBSD support 100K files?

2. Will 100K files cause a lot of disk fragmentation?

3. If the user id is "12345":

   a. In the case of one big file, I have a SQL database table that, given the user ID, returns the address of the data in the big file, and with one seek I can access that user's data.

   b. In the case of many individual files, I will name the user's data file "f12345.dat". With 100K files in place, will accessing a certain file "f12345.dat" cause too slow a directory search to find its address?

Any feedback would be greatly appreciated!

Ganlin Jin
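[The trade-off in point 3 can be sketched in C as follows. This is only an illustration, not the poster's actual code: `big_read` and `user_path` are hypothetical names, and the fan-out factor of 100 is an arbitrary example.]

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Option (a): the SQL table maps a user id to an (offset, length)
 * pair inside one big store file; a single pread() then fetches the
 * data in one seek. (Concurrent-access locking is not shown.) */
ssize_t big_read(int fd, off_t offset, void *buf, size_t len)
{
    return pread(fd, buf, len, offset);
}

/* Option (b): one file per user. A two-level fan-out (uid % 100,
 * a hypothetical choice) keeps each directory near 1K entries
 * instead of 100K, so the linear directory scan that filesystems
 * of this era perform stays cheap. */
void user_path(unsigned long uid, char *buf, size_t n)
{
    snprintf(buf, n, "data/%02lu/f%lu.dat", uid % 100, uid);
}
```

With this layout, user 12345's file would land at "data/45/f12345.dat", and no single directory lookup has to search all 100K names.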