Date:      Sat, 18 Jan 2003 09:48:55 +0800
From:      "Cheen Liao" <cheen@synology.com>
To:        <freebsd-fs@freebsd.org>
Subject:   Transaction File System - a replacement of JFS
Message-ID:  <001401c2be93$c36c7490$681adf3d@homexp>
References:  <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com> <3E27DA7F.D5DBEFB@mindspring.com> <20030117222410.GA5449@HAL9000.homeunix.com>

Recently there have been discussions about JFS on FreeBSD. I think my
company's development plan may meet that demand.

My company is planning to build a Transactional File System (TFS) on
FreeBSD with both journaling (logging) and database capabilities. The basic
idea is to build a file system on top of a database engine. When it is
done, it should supersede JFS and add database functionality.

TFS also has some advantages over traditional UNIX file systems. If we
put all "inodes" in a btree table, path lookup becomes faster, because each
pair of inode and directory block I/Os is replaced by a single btree search
for the inode. If designed properly, a btree search takes a little more
than one I/O on average, depending on how many internal pages are cached.
Also, a small file's contents can be stored in a regular variable-length
field as part of the "inode". This greatly improves performance and space
efficiency for small files.
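To make the btree idea concrete, here is a minimal C sketch of an "inode" row keyed by (parent inode, name), with small-file contents stored inline in the row. The names (`tfs_inode_row`, `tfs_lookup`, `tfs_namei`), the 64-byte inline cutoff and root inode number 2 are hypothetical. `bsearch` over a sorted in-memory array stands in for the on-disk btree. The point is only that each path component costs one keyed search instead of an inode read plus a directory block read.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TFS_NAME_MAX   255
#define TFS_INLINE_MAX 64   /* assumed cutoff for inline small-file data */
#define TFS_ROOT_INO   2    /* assumed root inode number, as in UFS */

/* Hypothetical TFS "inode" row. The key is (parent_ino, name), so one keyed
 * search resolves a path component without a separate directory block read. */
struct tfs_inode_row {
    uint64_t parent_ino;                  /* key part 1: containing directory */
    char     name[TFS_NAME_MAX + 1];      /* key part 2: entry name */
    uint64_t ino;                         /* this object's inode number */
    uint32_t size;                        /* file size in bytes */
    char     inline_data[TFS_INLINE_MAX]; /* contents when size <= TFS_INLINE_MAX */
};

/* Order rows by (parent_ino, name), the order a btree index would keep. */
static int tfs_row_cmp(const void *a, const void *b)
{
    const struct tfs_inode_row *x = a, *y = b;
    if (x->parent_ino != y->parent_ino)
        return x->parent_ino < y->parent_ino ? -1 : 1;
    return strcmp(x->name, y->name);
}

void tfs_sort(struct tfs_inode_row *rows, size_t n)
{
    qsort(rows, n, sizeof rows[0], tfs_row_cmp);
}

/* One keyed search per path component; bsearch stands in for the btree. */
const struct tfs_inode_row *
tfs_lookup(const struct tfs_inode_row *rows, size_t n,
           uint64_t parent, const char *name)
{
    struct tfs_inode_row key;
    key.parent_ino = parent;
    strncpy(key.name, name, TFS_NAME_MAX);
    key.name[TFS_NAME_MAX] = '\0';
    return bsearch(&key, rows, n, sizeof rows[0], tfs_row_cmp);
}

/* Resolve an absolute path such as "/etc/motd", one search per component. */
const struct tfs_inode_row *
tfs_namei(const struct tfs_inode_row *rows, size_t n, const char *path)
{
    char buf[1024];
    const struct tfs_inode_row *r = NULL;
    uint64_t cur = TFS_ROOT_INO;

    assert(path != NULL && path[0] == '/');
    strncpy(buf, path, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *tok = strtok(buf, "/"); tok != NULL; tok = strtok(NULL, "/")) {
        r = tfs_lookup(rows, n, cur, tok);
        if (r == NULL)
            return NULL;             /* component not found */
        cur = r->ino;                /* descend into the found entry */
    }
    return r;
}
```

After `tfs_sort()` on the rows, `tfs_namei(rows, n, "/etc/motd")` returns the row for "motd". A small file's bytes sit in `inline_data`, so no separate data block read is needed.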

The TFS project is a long-term project and is now in the early planning
stage. Here is the rough plan, with no schedule :)

. develop a prototype on FreeBSD 4.x.
. use PostgreSQL as the internal database engine.
. define the database schema for
   a) storing directory and file "inodes".
   b) storing large objects (i.e. storing the block numbers for large files).
. write all VFS functions in user mode using the PostgreSQL library.
. write a file system that will "call back (or pop up)" to the user-mode
functions described above.
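Here is a rough sketch of the user-mode side of the "callback" design. It assumes the kernel file system forwards each VFS operation as a fixed-size request and waits for a reply. The request layout, the operation codes and `tfs_dispatch()` are hypothetical. The transport between kernel and user mode is not shown. A small in-memory table stands in for the PostgreSQL queries the real handlers would issue.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request/reply format between the kernel stub and the
 * user-mode server. */
enum tfs_op { TFS_OP_LOOKUP, TFS_OP_GETATTR, TFS_OP_COUNT };

struct tfs_req {
    enum tfs_op op;
    uint64_t    ino;        /* directory for LOOKUP, object for GETATTR */
    char        name[256];  /* component name for LOOKUP */
};

struct tfs_reply {
    int      error;         /* 0 or an errno value */
    uint64_t ino;
    uint64_t size;
};

/* Stand-in for the database: the real handlers would run SQL queries. */
struct tfs_mem_row { uint64_t parent, ino, size; const char *name; };
static const struct tfs_mem_row tfs_mem_table[] = {
    { 2, 10, 0, "etc" },
    { 10, 11, 6, "motd" },
};
#define TFS_NROWS (sizeof tfs_mem_table / sizeof tfs_mem_table[0])

static int op_lookup(const struct tfs_req *q, struct tfs_reply *r)
{
    for (size_t i = 0; i < TFS_NROWS; i++)
        if (tfs_mem_table[i].parent == q->ino &&
            strcmp(tfs_mem_table[i].name, q->name) == 0) {
            r->ino = tfs_mem_table[i].ino;
            r->size = tfs_mem_table[i].size;
            return 0;
        }
    return ENOENT;
}

static int op_getattr(const struct tfs_req *q, struct tfs_reply *r)
{
    for (size_t i = 0; i < TFS_NROWS; i++)
        if (tfs_mem_table[i].ino == q->ino) {
            r->ino = tfs_mem_table[i].ino;
            r->size = tfs_mem_table[i].size;
            return 0;
        }
    return ENOENT;
}

/* One handler per operation code. */
typedef int (*tfs_handler)(const struct tfs_req *, struct tfs_reply *);
static const tfs_handler tfs_handlers[TFS_OP_COUNT] = {
    [TFS_OP_LOOKUP]  = op_lookup,
    [TFS_OP_GETATTR] = op_getattr,
};

/* Called by the user-mode server loop for each request from the kernel. */
void tfs_dispatch(const struct tfs_req *q, struct tfs_reply *r)
{
    assert(q != NULL && r != NULL);
    memset(r, 0, sizeof *r);
    if ((unsigned)q->op >= TFS_OP_COUNT || tfs_handlers[q->op] == NULL) {
        r->error = EOPNOTSUPP;      /* unknown operation */
        return;
    }
    r->error = tfs_handlers[q->op](q, r);
}
```

A server loop would read a request from the kernel, call `tfs_dispatch()`, and write the reply back. Adding an operation means adding one handler to the table.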

At the end of this stage, we will have a running prototype of TFS, which
will obviously have serious performance problems. Also, some of the
database functions are not good enough for file system use and need to be
strengthened. With the database engine inside, we can easily add extended
attributes to each directory or file object and search on them. So in the
next stage, we will

. move the core database engine into the kernel. This has to be the
FreeBSD 5.0 kernel, because some of the database functions can take a long
time to run. The pre-5.0 kernel is non-preemptive, so the system could hang
in the kernel while these long-running functions execute. Obviously we will
need a lot of help from the FreeBSD community to make the move smooth. In
particular, merging the database buffers with the system cache will be a
big challenge.

. strengthen the database functions:
   a) add new free space management suitable for database extension, so
it can run on a raw block device.
   b) improve the btree - store records in the btree (clustered btree) and
add a btree deletion function.
   c) improve large object storage - including clustering and recovery
policies.
   d) make logging robust, so that it handles "torn writes".
   e) expose the database functions through new system calls or other
creative methods.
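The plan does not say how torn writes will be handled. One common technique is to checksum every log record. Recovery can then detect a partially written record and stop replay just before it. The sketch below assumes that approach. The record layout and function names are hypothetical. The checksum is standard CRC-32 (IEEE polynomial).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk log record: header followed by len payload bytes. */
struct log_hdr {
    uint64_t lsn;   /* log sequence number */
    uint32_t len;   /* payload length in bytes */
    uint32_t crc;   /* CRC-32 over header (crc field zeroed) + payload */
};

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
static uint32_t crc32_update(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

uint32_t crc32_ieee(const void *buf, size_t len)
{
    return ~crc32_update(0xFFFFFFFFu, buf, len);
}

/* Checksum of one record: header with crc = 0, then the payload. */
static uint32_t rec_crc(const struct log_hdr *h, const void *payload)
{
    struct log_hdr tmp = *h;
    tmp.crc = 0;
    uint32_t c = crc32_update(0xFFFFFFFFu, &tmp, sizeof tmp);
    return ~crc32_update(c, payload, h->len);
}

/* Append one record at offset off; returns the new end offset, or 0 if full. */
size_t log_append(uint8_t *wal, size_t off, size_t cap,
                  uint64_t lsn, const void *data, uint32_t len)
{
    struct log_hdr h = { lsn, len, 0 };
    assert(off <= cap);
    if (cap - off < sizeof h + len)
        return 0;
    h.crc = rec_crc(&h, data);
    memcpy(wal + off, &h, sizeof h);
    memcpy(wal + off + sizeof h, data, len);
    return off + sizeof h + len;
}

/* Recovery scan over the first `used` bytes. A record whose length overruns
 * the log or whose CRC does not match is treated as a torn write: the scan
 * stops there, and *end is the offset where logging resumes. */
size_t log_scan(const uint8_t *wal, size_t used, size_t *end)
{
    size_t off = 0, n = 0;
    struct log_hdr h;

    while (used - off >= sizeof h) {
        memcpy(&h, wal + off, sizeof h);
        if (h.len > used - off - sizeof h)
            break;                      /* length field torn or garbage */
        if (rec_crc(&h, wal + off + sizeof h) != h.crc)
            break;                      /* header or payload torn */
        off += sizeof h + h.len;
        n++;
    }
    *end = off;
    return n;
}
```

On recovery, `log_scan()` returns the number of intact records and the offset at which logging can resume. Anything after that offset is discarded as torn.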

At this stage, we should have all the basic TFS functions working, and we
will need a lot of fine-tuning and tools, such as

. performance tuning - a task that never ends.
. fsck for TFS - in case the logs are lost, it will restore TFS to a
consistent state.
. snapshot capability - a piece of cake with logging support.
. replication - ship the log to another system and replay it there.
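Replication by log shipping could look roughly like this on the receiving side. This sketch relies on assumptions the plan does not state. Records are physical "set block N to value" updates. LSNs are dense, so each record is the previous LSN plus one. The replica remembers the highest LSN it has applied, so a re-shipped batch is skipped rather than applied twice. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shipped log record: "set block blkno to value". */
struct ship_rec { uint64_t lsn; uint32_t blkno; uint32_t value; };

#define REPLICA_BLOCKS 16

struct replica {
    uint64_t applied_lsn;              /* highest LSN already applied */
    uint32_t block[REPLICA_BLOCKS];    /* stand-in for the replica's disk */
};

/* Apply a batch of shipped records in LSN order. Records at or below
 * applied_lsn were already replayed and are skipped, so replay is
 * idempotent. A gap in LSNs (or a bad block number) stops replay and
 * returns -1; otherwise returns the number of records applied. */
int replica_apply(struct replica *r, const struct ship_rec *recs, size_t n)
{
    int applied = 0;

    assert(r != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct ship_rec *x = &recs[i];
        if (x->lsn <= r->applied_lsn)
            continue;                   /* duplicate: already applied */
        if (x->lsn != r->applied_lsn + 1 || x->blkno >= REPLICA_BLOCKS)
            return -1;                  /* gap or corrupt record */
        r->block[x->blkno] = x->value;
        r->applied_lsn = x->lsn;
        applied++;
    }
    return applied;
}
```

Stopping at a gap keeps the replica at a consistent prefix of the primary's log. `applied_lsn` records how far replay has progressed.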

By now, if you are still reading, you probably know what we are trying to
achieve. Suggestions and discussions on TFS are extremely welcome. Do you
have any suggestions on how to merge our efforts with the BSD community's,
if any, and speed up development?

Thanks,
Cheen

