Date: Sat, 18 Jan 2003 09:48:55 +0800
From: "Cheen Liao" <cheen@synology.com>
To: <freebsd-fs@freebsd.org>
Subject: Transaction File System - a replacement of JFS
Message-ID: <001401c2be93$c36c7490$681adf3d@homexp>
References: <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com> <3E27DA7F.D5DBEFB@mindspring.com> <20030117222410.GA5449@HAL9000.homeunix.com>

There have been recent discussions about JFS on FreeBSD. I think my company's development plan may meet the demand. My company is planning to build a Transactional File System (TFS) on FreeBSD, with journaling (logging) capability and database capability. The basic idea is to build a file system on a database engine. When it is done, it should supersede JFS by virtue of its database functionality.

TFS also has some advantages over traditional UNIX file systems. If we put all "inodes" in a btree table, path lookup becomes faster, because the paired inode and directory block IOs are replaced with a btree search for the inode. If designed properly, a btree search takes a little more than 1 IO on average, depending on how many internal pages are cached. Also, a small file's contents can be stored in a regular variable-length field and be part of the "inode". This greatly improves performance and space usage for small files.

The TFS project is a long-term project and is now in the early planning stage. Here is the rough plan, with no schedule :)

. develop a prototype on FreeBSD 4.x.
. use PostgreSQL as the internal database engine.
. define the database schema for
  a) storing directory and file "inodes".
  b) storing large objects (i.e. storing the block numbers for large files)
. write all VFS functions using the PostgreSQL library in user mode.
. write a file system which will "call back (or pop up)" to the user mode functions described above.

At the end of this stage, we will have a running prototype of TFS, and obviously it will have serious performance problems. Also, some of the database functions are not good enough for the file system functions and need to be strengthened. With the database engine inside, we can easily add extended attributes to each directory or file object and search on them.

So in the next stage, we will

. move the core database engine into the kernel. This has to be the FreeBSD 5.0 kernel, because some of the database functions can take a long time to run. Pre-5.0 kernels are non-preemptive, so the system could hang in the kernel because of the long-running functions. Obviously we will need a lot of help from the FreeBSD community to make the move smooth. Merging the database buffers with the system cache will be an especially big challenge.
. strengthen the database functions:
  a) add new free space management that is suitable for database extension, so it can run on a raw block device.
  b) improve the btree - store records in the btree (clustered btree), add a btree deletion function.
  c) improve large object storage - including clustering policy and recovery policy.
  d) make logging robust. It will handle the "torn write".
  e) expose the database functions through new system calls or other creative methods.

At this stage, we should have all of basic TFS working, and we will need a lot of fine tuning and tools, such as

. performance tuning - a task that never ends.
. fsck on TFS - in case logs are lost, it will restore TFS to a consistent state.
. add snapshot capability - it will be a piece of cake with logging support.
. add replication by shipping the log to another system and replaying the log there.

If you are still reading by now, then you probably know what we are trying to achieve. Suggestions and discussions on TFS are extremely welcome. Are there any suggestions on how to merge our efforts with the BSD community's, if any, and speed up the development?

Thanks,
Cheen

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message
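
To make the schema step of the plan a little more concrete, here is a minimal sketch of an "inode" table keyed for path lookup, with small file contents stored inline and block numbers for large files kept in a second table. It is only an illustration under stated assumptions: SQLite (from the Python standard library) stands in for the PostgreSQL engine named in the plan, and every table, column, and function name, as well as the INLINE_LIMIT cutoff, is hypothetical and not part of the TFS design.

```python
import sqlite3

# Assumption: SQLite stands in for the PostgreSQL engine named in the plan.
# All table, column, and function names below are hypothetical.
INLINE_LIMIT = 256  # hypothetical cutoff for keeping contents in the "inode" row
ROOT_INO = 1

SCHEMA = """
CREATE TABLE inode (
    ino     INTEGER PRIMARY KEY,
    parent  INTEGER NOT NULL,
    name    TEXT    NOT NULL,
    is_dir  INTEGER NOT NULL,
    size    INTEGER NOT NULL DEFAULT 0,
    inline  BLOB,                 -- small-file contents, part of the "inode"
    UNIQUE (parent, name)         -- btree index searched during path lookup
);
CREATE TABLE extent (             -- block numbers for large files
    ino     INTEGER NOT NULL,
    seq     INTEGER NOT NULL,
    blockno INTEGER NOT NULL,
    PRIMARY KEY (ino, seq)
);
"""


def open_fs():
    """Create an empty in-memory file system containing only the root directory."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute(
        "INSERT INTO inode (ino, parent, name, is_dir) VALUES (?, 0, '', 1)",
        (ROOT_INO,),
    )
    db.commit()
    return db


def create(db, parent, name, is_dir=False, data=None, size=0, blocks=()):
    """Add a directory, a small file (data), or a large file (size + blocks)."""
    if data is not None:
        if len(data) > INLINE_LIMIT:
            raise ValueError("too large to inline; pass size and blocks instead")
        size = len(data)
    with db:  # inode row and block list commit or roll back together
        cur = db.execute(
            "INSERT INTO inode (parent, name, is_dir, size, inline) "
            "VALUES (?, ?, ?, ?, ?)",
            (parent, name, int(is_dir), size, data),
        )
        ino = cur.lastrowid
        db.executemany(
            "INSERT INTO extent (ino, seq, blockno) VALUES (?, ?, ?)",
            [(ino, seq, blk) for seq, blk in enumerate(blocks)],
        )
    return ino


def lookup(db, path):
    """Resolve a path with one index search per component; None if missing."""
    ino = ROOT_INO
    for name in (part for part in path.split("/") if part):
        row = db.execute(
            "SELECT ino FROM inode WHERE parent = ? AND name = ?", (ino, name)
        ).fetchone()
        if row is None:
            return None
        ino = row[0]
    return ino


def read_inline(db, ino):
    """Return the inline contents of a small file, or None if there are none."""
    row = db.execute("SELECT inline FROM inode WHERE ino = ?", (ino,)).fetchone()
    return row[0] if row else None


def block_numbers(db, ino):
    """Return the block numbers of a large file in order."""
    rows = db.execute(
        "SELECT blockno FROM extent WHERE ino = ? ORDER BY seq", (ino,)
    )
    return [r[0] for r in rows]
```

A directory would be created with create(db, ROOT_INO, "etc", is_dir=True), and a small file under it with create(db, etc, "motd", data=b"hello\n"); lookup(db, "/etc/motd") then costs one index search per path component. The sketch does not allocate blocks or read file data from disk; the block numbers for a large file are simply supplied by the caller.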
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?001401c2be93$c36c7490$681adf3d>