Date: Wed, 29 Sep 2010 09:17:04 -0400 From: John Baldwin <jhb@freebsd.org> To: freebsd-fs@freebsd.org Subject: Re: ext2fs now extremely slow Message-ID: <201009290917.05269.jhb@freebsd.org> In-Reply-To: <20100929041650.GA1553@aditya> References: <20100929031825.L683@besplex.bde.org> <20100929084801.M948@besplex.bde.org> <20100929041650.GA1553@aditya>
next in thread | previous in thread | raw e-mail | index | archive | help
--Boundary-00=_QxzoMs/Iug8+N80 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit On Wednesday, September 29, 2010 12:16:55 am Aditya Sarawgi wrote: > On Wed, Sep 29, 2010 at 09:14:57AM +1000, Bruce Evans wrote: > > On Wed, 29 Sep 2010, Bruce Evans wrote: > > > > > On Wed, 29 Sep 2010, Bruce Evans wrote: > > > > > >> For benchmarks on ext2fs: > > >> > > >> Under FreeBSD-~5.2 rerun today: > > >> untar: 59.17 real > > >> tar: 19.52 real > > >> > > >> Under -current run today: > > >> untar: 101.16 real > > >> tar: 172.03 real > > >> > > >> So, -current is 8.8 times slower for tar, but only 1.7 times slower for > > >> untar. > > >> ... > > >> So it seems that only 1 block in every 8 is used, and there is a seek > > >> after every block. This asks for an 8-fold reduction in throughput, > > >> and it seems to have got that and a bit more for reading although not > > >> for writing. Even (or especially) with perfect hardware, it must give > > >> an 8-fold reduction. And it is likely to give more, since it defeats > > >> vfs clustering by making all runs of contiguous blocks have length 1. > > >> > > >> Simple sequential allocation should be used unless the allocation policy > > >> and implementation are very good. > > > > > > This work a bit better after zapping the 8-fold way: > > Things > > > ... > > > This gives an improvement of: > > > > > > untar: 101.16 real -> 63.46 > > > tar: 172.03 real -> 50.70 > > > > > > Now -current is only 1.1 times slower for untar and 2.6 times slower for > > > tar. > > > > > > There must be a problem with bpref for things to have been so bad. There > > > is some point to leaving a gap of 7 blocks for expansion, but the gap was > > > left even between blocks in a single file. > > > ... > > > I haven't tried the bde_blkpref hack in the above. It should kill bpref > > > completely so that there is no jump between lbn0 and lbn1, and break > > > cylinder group based allocation even better. Setting bde_blkpref to 1 > > > restores the bug that was present in ext2fs in FreeBSD between 1995 and > > > 2010. This bug gave seqential allocation starting at the beginning of > > > the disk in almost all cases, so map searches were slow and early groups > > > filled up before later groups were used at all. > > > > Tried this (patch repeated below), and it gave essentially the same > > speed as old versions. > > > > The main problem seems to be that the `goal' variables aren't initialized. > > After restoring bits verbatim from an old version, things seem to work as > > expected: > > > > % Index: ext2_alloc.c > > % =================================================================== > > % RCS file: /home/ncvs/src/sys/fs/ext2fs/ext2_alloc.c,v > > % retrieving revision 1.2 > > % diff -u -2 -r1.2 ext2_alloc.c > > % --- ext2_alloc.c 1 Sep 2010 05:34:17 -0000 1.2 > > % +++ ext2_alloc.c 28 Sep 2010 21:08:42 -0000 > > % @@ -1,2 +1,5 @@ > > % +int bde_blkpref = 0; > > % +int bde_alloc8 = 0; > > % + > > % /*- > > % * modified for Lites 1.1 > > % @@ -117,4 +120,8 @@ > > % ext2_alloccg); > > % if (bno > 0) { > > % + /* set next_alloc fields as done in block_getblk */ > > % + ip->i_next_alloc_block = lbn; > > % + ip->i_next_alloc_goal = bno; > > % + > > % ip->i_blocks += btodb(fs->e2fs_bsize); > > % ip->i_flag |= IN_CHANGE | IN_UPDATE; > > > > The only things that changed recently in this block were the 4 deleted > > lines and 4 lines with tabs corrupted to spaces. Perhaps an editing > > error. > > > > % @@ -542,6 +549,12 @@ > > % then set the goal to what we thought it should be > > % */ > > % +if (bde_blkpref == 0) { > > % if(ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0) > > % return ip->i_next_alloc_goal; > > % +} else if (bde_blkpref == 1) { > > % + if(ip->i_next_alloc_block == lbn) > > % + return ip->i_next_alloc_goal; > > % +} else > > % + return 0; > > % > > % /* now check whether we were provided with an array that basically > > > > Not needed now. > > > > % @@ -662,4 +675,5 @@ > > % * block. > > % */ > > % +if (bde_alloc8 == 0) { > > % if (bpref) > > % start = dtogd(fs, bpref) / NBBY; > > % @@ -679,4 +693,5 @@ > > % } > > % } > > % +} > > % > > % bno = ext2_mapsearch(fs, bbp, bpref); > > > > The code to skip to the next 8-block boundary should be removed permanently. > > After fixing the initialization, it doesn't generate holes inside files but > > it still generates holes between files. The holes are quite large with > > 4K-blocks. > > > > Benchmark results with just the initialization of `goal' variables restored: > > > > %%% > > ext2fs-1024-1024: > > tarcp /f srcs: 78.79 real 0.31 user 4.94 sys > > tar cf /dev/zero srcs: 24.62 real 0.19 user 1.82 sys > > ext2fs-1024-1024-as: > > tarcp /f srcs: 52.07 real 0.26 user 4.95 sys > > tar cf /dev/zero srcs: 24.80 real 0.10 user 1.93 sys > > ext2fs-4096-4096: > > tarcp /f srcs: 74.14 real 0.34 user 3.96 sys > > tar cf /dev/zero srcs: 33.82 real 0.10 user 1.19 sys > > ext2fs-4096-4096-as: > > tarcp /f srcs: 53.54 real 0.36 user 3.87 sys > > tar cf /dev/zero srcs: 33.91 real 0.14 user 1.15 sys > > %%% > > > > The much larger holes between the files are apparently responsible for the > > decreased speed with 4K-blocks. 1K-blocks are really too small, so 4K-blocks > > should be faster. > > > > Benchmark results with the fix and bde_alloc8 = 1. > > > > ext2fs-1024-1024: > > tarcp /f srcs: 71.60 real 0.15 user 2.04 sys > > tar cf /dev/zero srcs: 22.34 real 0.05 user 0.79 sys > > ext2fs-1024-1024-as: > > tarcp /f srcs: 46.03 real 0.14 user 2.02 sys > > tar cf /dev/zero srcs: 21.97 real 0.05 user 0.80 sys > > ext2fs-4096-4096: > > tarcp /f srcs: 59.66 real 0.13 user 1.63 sys > > tar cf /dev/zero srcs: 19.88 real 0.07 user 0.46 sys > > ext2fs-4096-4096-as: > > tarcp /f srcs: 37.30 real 0.12 user 1.60 sys > > tar cf /dev/zero srcs: 19.93 real 0.05 user 0.49 sys > > > > Bruce > > Hi, > > I see what you are saying. The gap of 8 block between the files > is due to the old preallocation which used to allocate additional > 8 blocks in advance for a particular inode when allocating a block > for it. The gap between blocks of the same file shouldn't be there > too. Both of these cases should be removed. I will look into this > during this week. The slowness is also due to lack of preallocation > in the new code. One of the GSoC students worked on a patch to add preallocation back to ext2fs this summer. Would you be interested in reviewing and/or testing that patch? (I've attached it). Here is his original e-mail: <quote> Hi all, There is a patch in attachment which implements a preallocation algorithm in ext2fs. I implement this algorithm during FreeBSD SoC 2010. This patch implements the in-memory ext2/3 block preallocation algorithm from reservation window. It uses a RB-tree to index block allocation request and reserve a number of blocks for each file which has requested to allocate a block. When a file request to allocate a block, it will find a block to allocate to this file. When it find the block to allocate, it will try to allocate a block, which is in the same cylinder group with inode and is not in other reservation window in RB-tree. Meanwhile there are some contiguous free blocks after this block. It uses a data structure to store this block's position and the length of contiguous free blocks. Then it inserts this data structure into RB-tree. When this file request to allocate a block again, It will find corresponding data structure in RB-tree. If it can find, the next free block will be allocated to this file directly. Otherwise, it will search a new block again. I have run some benchmarks to test this algorithm. Please review it in wiki page (' http://wiki.freebsd.org/SOC2010ZhengLiu'). The performance is better when the number of threads is smaller than 4. When the number of threads is greater than 4, the performance can be increased a little. Please test it. Thanks and best regards, lz </quote> -- John Baldwin --Boundary-00=_QxzoMs/Iug8+N80 Content-Type: text/x-patch; charset="UTF-8"; name="ext2fs_prealloc.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="ext2fs_prealloc.patch" diff -urN /usr/src/sys/fs/ext2fs/ext2_alloc.c new/ext2_alloc.c --- /usr/src/sys/fs/ext2fs/ext2_alloc.c 2010-01-14 22:30:54.000000000 +0800 +++ new/ext2_alloc.c 2010-08-19 02:47:29.000000000 +0800 @@ -50,6 +50,9 @@ #include <fs/ext2fs/ext2fs.h> #include <fs/ext2fs/fs.h> #include <fs/ext2fs/ext2_extern.h> +#include <fs/ext2fs/ext2_rsv_win.h> + +#define phy_blk(cg, fs) (((cg) * (fs->e2fs->e2fs_fpg)) + fs->e2fs->e2fs_first_dblock) static daddr_t ext2_alloccg(struct inode *, int, daddr_t, int); static u_long ext2_dirpref(struct inode *); @@ -59,37 +62,524 @@ int)); static daddr_t ext2_nodealloccg(struct inode *, int, daddr_t, int); static daddr_t ext2_mapsearch(struct m_ext2fs *, char *, daddr_t); + +/* For reservation window */ +static u_long ext2_alloc_blk(struct inode *, int, struct buf *, int32_t, struct ext2_rsv_win *); +static int ext2_alloc_new_rsv(struct inode *, int, struct buf *, int32_t); +static int ext2_bpref_in_rsv(struct ext2_rsv_win *, int32_t); +static int ext2_find_rsv(struct ext2_rsv_win *, struct ext2_rsv_win *, + struct m_ext2fs *, int32_t, int); +static void ext2_remove_rsv_win(struct m_ext2fs *, struct ext2_rsv_win *); +static u_long ext2_rsvalloc(struct m_ext2fs *, struct inode *, + int, struct buf *, int32_t, int); +static daddr_t ext2_search_next_block(struct m_ext2fs *, char *, int, int); +static struct ext2_rsv_win *ext2_search_rsv(struct ext2_rsv_win_tree *, int32_t); + +RB_GENERATE(ext2_rsv_win_tree, ext2_rsv_win, rsv_link, ext2_rsv_win_cmp); + /* * Allocate a block in the file system. * - * A preference may be optionally specified. If a preference is given - * the following hierarchy is used to allocate a block: - * 1) allocate the requested block. - * 2) allocate a rotationally optimal block in the same cylinder. - * 3) allocate a block in the same cylinder group. - * 4) quadradically rehash into other cylinder groups, until an - * available block is located. - * If no block preference is given the following hierarchy is used - * to allocate a block: - * 1) allocate a block in the cylinder group that contains the - * inode for the file. - * 2) quadradically rehash into other cylinder groups, until an - * available block is located. - * - * A preference may be optionally specified. If a preference is given - * the following hierarchy is used to allocate a block: - * 1) allocate the requested block. - * 2) allocate a rotationally optimal block in the same cylinder. - * 3) allocate a block in the same cylinder group. - * 4) quadradically rehash into other cylinder groups, until an - * available block is located. - * If no block preference is given the following hierarchy is used - * to allocate a block: - * 1) allocate a block in the cylinder group that contains the - * inode for the file. - * 2) quadradically rehash into other cylinder groups, until an - * available block is located. + * By given preference: + * Check whether inode has a reservation window and preference + * is within it and try to allocate a free block from + * this reservation window. + * If not, traverse RB tree to find a place, which is not in + * any window and insert it to RB tree to try to allocate a + * free block again. + * If it fails, try to allocate a free block in other cylinder + * groups without preference. + */ + +/* + * Allocate a free block. + * + * First check whether reservation window is used. + * If reservation window is used, try to allocate a free + * block from the reservation window. If it fails, traverse + * the bitmap to find a free block. + * If reservation window is not used, try to allocate + * a free block by bpref. If it fails, traverse the bitmap + * to find a free block. */ +static u_long +ext2_alloc_blk(struct inode *ip, int cg, struct buf *bp, + int32_t bpref, struct ext2_rsv_win *rp) +{ + struct m_ext2fs *fs; + struct ext2mount *ump; + int bno, start, end; + char *bbp; + + fs = ip->i_e2fs; + ump = ip->i_ump; + bbp = (char *)bp->b_data; + + if (fs->e2fs_gd[cg].ext2bgd_nbfree == 0) + return (0); + + if (bpref < 0) + bpref = 0; + + /* Check whether it use reservation window */ + if (rp != NULL) { + /* + * If window's start is not in this cylinder group, + * try to allocate from the beginning, otherwise + * try to allocate from the beginning of the + * window. + */ + if (dtog(fs, rp->rsv_start) < cg) + start = 0; + else + start = rp->rsv_start; + + /* + * If window's end crosses the end of this group, + * set end variable to the end of this group. + * Otherwise, set it to the window's end. + */ + if (dtog(fs, rp->rsv_end) > cg) + end = phy_blk(cg + 1, fs) - 1; + else + end = rp->rsv_end; + + /* If preference block is within the window, try to allocate it. */ + if (start <= bpref && bpref <= end) { + bpref = dtogd(fs, bpref); + if (isclr(bbp, bpref)) { + rp->rsv_alloc_hit++; + bno = bpref; + goto gotit; + } + } else + if (dtog(fs, rp->rsv_start) == cg) + bpref = dtogd(fs, rp->rsv_start); + else + bpref = 0; + } else { + if (dtog(fs, bpref) != cg) + bpref = 0; + if (bpref != 0) { + bpref = dtogd(fs, bpref); + if (isclr(bbp, bpref)) { + bno = bpref; + goto gotit; + } + } + } + + bno = ext2_mapsearch(fs, bbp, bpref); + if (bno < 0) + return (0); + +gotit: + setbit(bbp, (daddr_t)bno); + EXT2_LOCK(ump); + fs->e2fs->e2fs_fbcount--; + fs->e2fs_gd[cg].ext2bgd_nbfree--; + fs->e2fs_fmod = 1; + EXT2_UNLOCK(ump); + bdwrite(bp); + bno = phy_blk(cg, fs) + bno; + return (bno); +} + +/* + * Initialize reservation window per inode. + */ +void +ext2_init_rsv(struct inode *ip) +{ + struct ext2_rsv_win *rp; + + rp = malloc(sizeof(struct ext2_rsv_win), + M_EXT2NODE, M_WAITOK | M_ZERO); + + /* + * If malloc failed, we just do not use the + * reservation window mechanism. + */ + if (rp == NULL) + return; + + rp->rsv_start = EXT2_RSV_NOT_ALLOCATED; + rp->rsv_end = EXT2_RSV_NOT_ALLOCATED; + + rp->rsv_goal_size = EXT2_RSV_DEFAULT_RESERVE_BLKS; + rp->rsv_alloc_hit = 0; + + ip->i_rsv = rp; +} + +/* + * Discard reservation window. + * + * It is called during the following situations: + * 1. free an inode + * 2. sync inode + * 3. truncate a file + */ +void +ext2_discard_rsv(struct inode *ip) +{ + struct ext2_rsv_win *rp; + + if (ip->i_rsv == NULL) + return; + + rp = ip->i_rsv; + + /* If reservation window is empty, nothing to do */ + if (rp->rsv_end == EXT2_RSV_NOT_ALLOCATED) + return; + + EXT2_TREE_LOCK(ip->i_e2fs); + ext2_remove_rsv_win(ip->i_e2fs, rp); + EXT2_TREE_UNLOCK(ip->i_e2fs); + rp->rsv_goal_size = EXT2_RSV_DEFAULT_RESERVE_BLKS; +} + +/* + * Remove a ext2_rsv_win structure from RB tree. + */ +static void +ext2_remove_rsv_win(struct m_ext2fs *fs, struct ext2_rsv_win *rp) +{ + RB_REMOVE(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rp); + rp->rsv_start = EXT2_RSV_NOT_ALLOCATED; + rp->rsv_end = EXT2_RSV_NOT_ALLOCATED; + rp->rsv_alloc_hit = 0; +} + +/* + * Check bpref is in the reservation window. + */ +static int +ext2_bpref_in_rsv(struct ext2_rsv_win *rp, int32_t bpref) +{ + if (bpref >= 0 && (bpref < rp->rsv_start || bpref > rp->rsv_end)) + return (0); + + return (1); +} + +/* + * Search a tree node from RB tree. It includes the bpref or + * the previous one if bpref is not in any window. + */ +static struct ext2_rsv_win * +ext2_search_rsv(struct ext2_rsv_win_tree *root, int32_t start) +{ + struct ext2_rsv_win *prev, *next; + + if (RB_EMPTY(root)) + return (NULL); + + next = RB_ROOT(root); + do { + prev = next; + if (start < next->rsv_start) + next = RB_LEFT(next, rsv_link); + else if (start > next->rsv_end) + next = RB_RIGHT(next, rsv_link); + else + return (next); + } while (next != NULL); + + if (prev->rsv_start > start) { + next = RB_PREV(ext2_rsv_win_tree, root, prev); + if (next != NULL) + prev = next; + } + + return (prev); +} + +/* + * Find a reservation window by given range from start to + * the end of this cylinder group. + */ +static int +ext2_find_rsv(struct ext2_rsv_win *search, struct ext2_rsv_win *rp, + struct m_ext2fs *fs, int32_t start, int cg) +{ + struct ext2_rsv_win *rsv, *prev; + int32_t cur; + int size = rp->rsv_goal_size; + + if (search == NULL) { + rp->rsv_start = start & ~7; + rp->rsv_end = start + size - 1; + rp->rsv_alloc_hit = 0; + + RB_INSERT(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rp); + + return (0); + } + + /* + * Make the start of reservation window byte-aligned + * in order to can find a free block with bit operations + * in the ext2_search_next_block() function. + */ + cur = start & ~7; + rsv = search; + prev = NULL; + + while (1) { + if (cur <= rsv->rsv_end) + cur = rsv->rsv_end + 1; + + if (dtog(fs, cur) != cg) + return (-1); + + prev = rsv; + rsv = RB_NEXT(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rsv); + + if (rsv == NULL) + break; + + if (cur + size <= rsv->rsv_start) + break; + } + + if (prev != rp && rp->rsv_end != EXT2_RSV_NOT_ALLOCATED) + ext2_remove_rsv_win(fs, rp); + + rp->rsv_start = cur; + rp->rsv_end = cur + size - 1; + rp->rsv_alloc_hit = 0; + + if (prev != rp) + RB_INSERT(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rp); + + return (0); +} + +/* + * Find a free block by given range from bpref to + * the end of this cylinder group. + */ +static daddr_t +ext2_search_next_block(struct m_ext2fs *fs, char *bbp, int bpref, int cg) +{ + daddr_t bno; + int start, loc, len, map, i; + + start = bpref / NBBY; + len = howmany(fs->e2fs->e2fs_fpg, NBBY) - start; + loc = skpc(0xff, len, &bbp[start]); + if (loc == 0) + return (-1); + + i = start + len - loc; + map = bbp[i]; + bno = i * NBBY; + for (i = 1; i < (1 << NBBY); i <<= 1, bno++) { + if ((map & i) == 0) + return (bno); + } + + return (-1); +} + +/* + * Allocate a new reservation window. + */ +static int +ext2_alloc_new_rsv(struct inode *ip, int cg, struct buf *bp, int32_t bpref) +{ + struct m_ext2fs *fs; + struct ext2_rsv_win *rp, *search; + char *bbp; + int start, size, ret; + + fs = ip->i_e2fs; + rp = ip->i_rsv; + bbp = bp->b_data; + size = rp->rsv_goal_size; + + if (bpref <= 0) + start = phy_blk(cg, fs); + else + start = bpref; + + /* Dynamically increase the size of window */ + if (rp->rsv_end != EXT2_RSV_NOT_ALLOCATED) { + if (rp->rsv_alloc_hit > + ((rp->rsv_end - rp->rsv_start + 1) / 2)) { + size = size * 2; + if (size > EXT2_RSV_MAX_RESERVE_BLKS) + size = EXT2_RSV_MAX_RESERVE_BLKS; + rp->rsv_goal_size = size; + } + } + + EXT2_TREE_LOCK(fs); + + search = ext2_search_rsv(fs->e2fs_rsv_tree, start); + +repeat: + ret = ext2_find_rsv(search, rp, fs, start, cg); + if (ret < 0) { + if (rp->rsv_end != EXT2_RSV_NOT_ALLOCATED) + ext2_remove_rsv_win(fs, rp); + EXT2_TREE_UNLOCK(fs); + return (-1); + } + EXT2_TREE_UNLOCK(fs); + + start = dtogd(fs, rp->rsv_start); + start = ext2_search_next_block(fs, bbp, start, cg); + if (start < 0) { + EXT2_TREE_LOCK(fs); + if (rp->rsv_end != EXT2_RSV_NOT_ALLOCATED) + ext2_remove_rsv_win(fs, rp); + EXT2_TREE_UNLOCK(fs); + return (-1); + } + + start = phy_blk(cg, fs) + start; + if (start >= rp->rsv_start && start <= rp->rsv_end) + return (0); + + search = rp; + EXT2_TREE_LOCK(fs); + goto repeat; +} + +/* + * Allocate a free block from reservation window. + */ +static u_long +ext2_rsvalloc(struct m_ext2fs *fs, struct inode *ip, int cg, + struct buf *bp, int32_t bpref, int size) +{ + struct ext2_rsv_win *rp; + int ret; + + rp = ip->i_rsv; + if (rp == NULL) + return (ext2_alloc_blk(ip, cg, bp, bpref, NULL)); + + if (rp->rsv_end == EXT2_RSV_NOT_ALLOCATED || + !ext2_bpref_in_rsv(rp, bpref)) { + ret = ext2_alloc_new_rsv(ip, cg, bp, bpref); + if (ret < 0) + return (0); + } + + return (ext2_alloc_blk(ip, cg, bp, bpref, rp)); +} + +/* + * Allocate a block using reservation window in ext2 file system. + * + * NOTE: This function will replace the ext2_alloc() function. + */ +int +ext2_alloc_rsv(struct inode *ip, int32_t lbn, int32_t bpref, + int size, struct ucred *cred, int32_t *bnp) +{ + struct m_ext2fs *fs; + struct ext2mount *ump; + struct buf *bp; + int32_t bno = 0; + int i, cg, error; + + *bnp = 0; + fs = ip->i_e2fs; + ump = ip->i_ump; + mtx_assert(EXT2_MTX(ump), MA_OWNED); + + if (size == fs->e2fs_bsize && fs->e2fs->e2fs_fbcount == 0) + goto nospace; + if (cred->cr_uid != 0 && + fs->e2fs->e2fs_fbcount < fs->e2fs->e2fs_rbcount) + goto nospace; + + if (bpref >= fs->e2fs->e2fs_bcount) + bpref = 0; + if (bpref == 0) + cg = ino_to_cg(fs, ip->i_number); + else + cg = dtog(fs, bpref); + + /* If cg has some free blocks, then try to allocate a free block from this cg */ + if (fs->e2fs_gd[cg].ext2bgd_nbfree > 0) { + /* Read block bitmap from buffer */ + EXT2_UNLOCK(ump); + error = bread(ip->i_devvp, + fsbtodb(fs, fs->e2fs_gd[cg].ext2bgd_b_bitmap), + (int)fs->e2fs_bsize, NOCRED, &bp); + if (error) { + brelse(bp); + goto ioerror; + } + + EXT2_RSV_LOCK(ip); + /* Try to allocate from reservation window */ + bno = ext2_rsvalloc(fs, ip, cg, bp, bpref, size); + EXT2_RSV_UNLOCK(ip); + if (bno > 0) + goto allocated; + + brelse(bp); + EXT2_LOCK(ump); + } + + /* Just need to try to allocate a free block from rest groups. */ + cg = (cg + 1) % fs->e2fs_gcount; + for (i = 1; i < fs->e2fs_gcount; i++) { + if (fs->e2fs_gd[cg].ext2bgd_nbfree > 0) { + /* Read block bitmap from buffer */ + EXT2_UNLOCK(ump); + error = bread(ip->i_devvp, + fsbtodb(fs, fs->e2fs_gd[cg].ext2bgd_b_bitmap), + (int)fs->e2fs_bsize, NOCRED, &bp); + if (error) { + brelse(bp); + goto ioerror; + } + + EXT2_RSV_LOCK(ip); + bno = ext2_rsvalloc(fs, ip, cg, bp, -1, size); + EXT2_RSV_UNLOCK(ip); + if (bno > 0) + goto allocated; + + brelse(bp); + EXT2_LOCK(ump); + } + + cg++; + if (cg == fs->e2fs_gcount) + cg = 0; + } + +allocated: + if (bno > 0) { + ip->i_next_alloc_block = lbn; + ip->i_next_alloc_goal = bno; + + ip->i_blocks += btodb(fs->e2fs_bsize); + ip->i_flag |= IN_CHANGE | IN_UPDATE; + *bnp = bno; + return (0); + } + +nospace: + EXT2_UNLOCK(ump); + ext2_fserr(fs, cred->cr_uid, "file system full"); + uprintf("\n%s: write failed, file system is full\n", fs->e2fs_fsmnt); + return (ENOSPC); + +ioerror: + ext2_fserr(fs, cred->cr_uid, "file system IO error"); + uprintf("\n%s: write failed, file system IO error\n", fs->e2fs_fsmnt); + return (EIO); +} int ext2_alloc(ip, lbn, bpref, size, cred, bnp) @@ -923,9 +1413,11 @@ start = 0; loc = skpc(0xff, len, &bbp[start]); if (loc == 0) { - printf("start = %d, len = %d, fs = %s\n", - start, len, fs->e2fs_fsmnt); - panic("ext2fs_alloccg: map corrupted"); + /* XXX: just for reservation window */ + return -1; + /*printf("start = %d, len = %d, fs = %s\n",*/ + /*start, len, fs->e2fs_fsmnt);*/ + /*panic("ext2fs_alloccg: map corrupted");*/ /* NOTREACHED */ } } diff -urN /usr/src/sys/fs/ext2fs/ext2_balloc.c new/ext2_balloc.c --- /usr/src/sys/fs/ext2fs/ext2_balloc.c 2010-01-14 22:30:54.000000000 +0800 +++ new/ext2_balloc.c 2010-08-19 02:47:29.000000000 +0800 @@ -49,6 +49,7 @@ #include <fs/ext2fs/fs.h> #include <fs/ext2fs/ext2_extern.h> #include <fs/ext2fs/ext2_mount.h> +#include <fs/ext2fs/ext2_rsv_win.h> /* * Balloc defines the structure of file system storage * by allocating the physical blocks on a device given @@ -78,6 +79,9 @@ fs = ip->i_e2fs; ump = ip->i_ump; + if (ip->i_rsv == NULL) + ext2_init_rsv(ip); + /* * check if this is a sequential block allocation. * If so, increment next_alloc fields to allow ext2_blkpref @@ -136,9 +140,9 @@ else nsize = fs->e2fs_bsize; EXT2_LOCK(ump); - error = ext2_alloc(ip, lbn, - ext2_blkpref(ip, lbn, (int)lbn, &ip->i_db[0], 0), - nsize, cred, &newb); + error = ext2_alloc_rsv(ip, lbn, + ext2_blkpref(ip, lbn, (int)lbn, &ip->i_db[0], 0), + nsize, cred, &newb); if (error) return (error); bp = getblk(vp, lbn, nsize, 0, 0, 0); @@ -170,9 +174,9 @@ EXT2_LOCK(ump); pref = ext2_blkpref(ip, lbn, indirs[0].in_off + EXT2_NDIR_BLOCKS, &ip->i_db[0], 0); - if ((error = ext2_alloc(ip, lbn, pref, - (int)fs->e2fs_bsize, cred, &newb))) - return (error); + if ((error = ext2_alloc_rsv(ip, lbn, pref, + (int)fs->e2fs_bsize, cred, &newb))) + return (error); nb = newb; bp = getblk(vp, indirs[1].in_lbn, fs->e2fs_bsize, 0, 0, 0); bp->b_blkno = fsbtodb(fs, newb); @@ -211,7 +215,7 @@ if (pref == 0) pref = ext2_blkpref(ip, lbn, indirs[i].in_off, bap, bp->b_lblkno); - error = ext2_alloc(ip, lbn, pref, (int)fs->e2fs_bsize, cred, &newb); + error = ext2_alloc_rsv(ip, lbn, pref, (int)fs->e2fs_bsize, cred, &newb); if (error) { brelse(bp); return (error); @@ -250,8 +254,8 @@ EXT2_LOCK(ump); pref = ext2_blkpref(ip, lbn, indirs[i].in_off, &bap[0], bp->b_lblkno); - if ((error = ext2_alloc(ip, - lbn, pref, (int)fs->e2fs_bsize, cred, &newb)) != 0) { + if ((error = ext2_alloc_rsv(ip, lbn, pref, + (int)fs->e2fs_bsize, cred, &newb)) != 0) { brelse(bp); return (error); } diff -urN /usr/src/sys/fs/ext2fs/ext2_inode.c new/ext2_inode.c --- /usr/src/sys/fs/ext2fs/ext2_inode.c 2010-01-14 22:30:54.000000000 +0800 +++ new/ext2_inode.c 2010-08-19 02:47:29.000000000 +0800 @@ -52,6 +52,7 @@ #include <fs/ext2fs/ext2fs.h> #include <fs/ext2fs/fs.h> #include <fs/ext2fs/ext2_extern.h> +#include <fs/ext2fs/ext2_rsv_win.h> static int ext2_indirtrunc(struct inode *, int32_t, int32_t, int32_t, int, long *); @@ -153,6 +154,11 @@ } fs = oip->i_e2fs; osize = oip->i_size; + + EXT2_RSV_LOCK(oip); + ext2_discard_rsv(oip); + EXT2_RSV_UNLOCK(oip); + /* * Lengthen the size of the file. We must ensure that the * last byte of the file is allocated. Since the smallest @@ -484,6 +490,10 @@ if (prtactive && vrefcnt(vp) != 0) vprint("ext2_inactive: pushing active", vp); + EXT2_RSV_LOCK(ip); + ext2_discard_rsv(ip); + EXT2_RSV_UNLOCK(ip); + /* * Ignore inodes related to stale file handles. */ @@ -525,11 +535,21 @@ if (prtactive && vrefcnt(vp) != 0) vprint("ufs_reclaim: pushing active", vp); ip = VTOI(vp); + if (ip->i_flag & IN_LAZYMOD) { ip->i_flag |= IN_MODIFIED; ext2_update(vp, 0); } vfs_hash_remove(vp); + + EXT2_RSV_LOCK(ip); + if (ip->i_rsv != NULL) { + free(ip->i_rsv, M_EXT2NODE); + ip->i_rsv = NULL; + } + EXT2_RSV_UNLOCK(ip); + mtx_destroy(&ip->i_rsv_lock); + free(vp->v_data, M_EXT2NODE); vp->v_data = 0; vnode_destroy_vobject(vp); diff -urN /usr/src/sys/fs/ext2fs/ext2_rsv_win.h new/ext2_rsv_win.h --- /usr/src/sys/fs/ext2fs/ext2_rsv_win.h 1970-01-01 08:00:00.000000000 +0800 +++ new/ext2_rsv_win.h 2010-08-19 02:47:29.000000000 +0800 @@ -0,0 +1,78 @@ +/*- + * Copyright (c) 2010, 2010 Zheng Liu <lz@freebsd.org> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: src/sys/fs/ext2fs/ext2_rsv_win.h,v 0.1 2010/05/08 12:41:51 lz Exp $ + */ +#ifndef _FS_EXT2FS_EXT2_RSV_WIN_H_ +#define _FS_EXT2FS_EXT2_RSV_WIN_H_ + +#include <sys/tree.h> + +#define EXT2_RSV_DEFAULT_RESERVE_BLKS 8 +#define EXT2_RSV_MAX_RESERVE_BLKS 1024 +#define EXT2_RSV_NOT_ALLOCATED 0 + +#define EXT2_RSV_LOCK(ip) mtx_lock(&ip->i_rsv_lock) +#define EXT2_RSV_UNLOCK(ip) mtx_unlock(&ip->i_rsv_lock) + +#define EXT2_TREE_LOCK(fs) mtx_lock(&fs->e2fs_rsv_lock); +#define EXT2_TREE_UNLOCK(fs) mtx_unlock(&fs->e2fs_rsv_lock); + +/* + * Reservation window entry + */ +struct ext2_rsv_win { + RB_ENTRY(ext2_rsv_win) rsv_link; /* RB tree links */ + + int32_t rsv_goal_size; /* Default reservation window size */ + int32_t rsv_alloc_hit; /* Number of allocated windows */ + + int32_t rsv_start; /* First bytes of window */ + int32_t rsv_end; /* End bytes of window */ +}; + +RB_HEAD(ext2_rsv_win_tree, ext2_rsv_win); + +static __inline int +ext2_rsv_win_cmp(const struct ext2_rsv_win *a, + const struct ext2_rsv_win *b) +{ + if (a->rsv_start < b->rsv_start) + return (-1); + if (a->rsv_start == b->rsv_start) + return (0); + + return (1); +} +RB_PROTOTYPE(ext2_rsv_win_tree, ext2_rsv_win, rsv_link, ext2_rsv_win_cmp); + +/* predefine */ +struct inode; +/* ext2_alloc.c */ +void ext2_init_rsv(struct inode *ip); +void ext2_discard_rsv(struct inode *ip); +int ext2_alloc_rsv(struct inode *, int32_t, int32_t, int, struct ucred *, int32_t *); + +#endif /* !_FS_EXT2FS_EXT2_RSV_WIN_H_ */ diff -urN /usr/src/sys/fs/ext2fs/ext2_vfsops.c new/ext2_vfsops.c --- /usr/src/sys/fs/ext2fs/ext2_vfsops.c 2010-01-14 22:30:54.000000000 +0800 +++ new/ext2_vfsops.c 2010-08-19 02:47:29.000000000 +0800 @@ -1,4 +1,4 @@ -/*- +/* * modified for EXT2FS support in Lites 1.1 * * Aug 1995, Godmar Back (gback@cs.utah.edu) @@ -61,6 +61,7 @@ #include <fs/ext2fs/fs.h> #include <fs/ext2fs/ext2_extern.h> #include <fs/ext2fs/ext2fs.h> +#include <fs/ext2fs/ext2_rsv_win.h> static int ext2_flushfiles(struct mount *mp, int flags, struct thread *td); static int ext2_mountfs(struct vnode *, struct mount *); @@ -95,9 +96,9 @@ static int compute_sb_data(struct vnode * devvp, struct ext2fs * es, struct m_ext2fs * fs); -static const char *ext2_opts[] = { "from", "export", "acls", "noexec", - "noatime", "union", "suiddir", "multilabel", "nosymfollow", - "noclusterr", "noclusterw", "force", NULL }; +static const char *ext2_opts[] = { "acls", "async", "export", "force", + "from", "multilabel", "noatime", "noclusterr", "noclusterw", + "noexec", "nosymfollow", "suiddir", "union", NULL }; /* * VFS Operations. @@ -581,6 +582,14 @@ if ((error = compute_sb_data(devvp, ump->um_e2fs->e2fs, ump->um_e2fs))) goto out; + /* Initial reservation window index and lock */ + bzero(&ump->um_e2fs->e2fs_rsv_lock, sizeof(struct mtx)); + mtx_init(&ump->um_e2fs->e2fs_rsv_lock, + "rsv tree lock", NULL, MTX_DEF); + ump->um_e2fs->e2fs_rsv_tree = malloc(sizeof(struct ext2_rsv_win_tree), + M_EXT2MNT, M_WAITOK | M_ZERO); + RB_INIT(ump->um_e2fs->e2fs_rsv_tree); + brelse(bp); bp = NULL; fs = ump->um_e2fs; @@ -680,6 +689,8 @@ g_topology_unlock(); PICKUP_GIANT(); vrele(ump->um_devvp); + free(fs->e2fs_rsv_tree, M_EXT2MNT); + mtx_destroy(&fs->e2fs_rsv_lock); free(fs->e2fs_gd, M_EXT2MNT); free(fs->e2fs_contigdirs, M_EXT2MNT); free(fs->e2fs, M_EXT2MNT); @@ -919,6 +930,10 @@ ip->i_prealloc_count = 0; ip->i_prealloc_block = 0; + bzero(&ip->i_rsv_lock, sizeof(struct mtx)); + mtx_init(&ip->i_rsv_lock, "inode rsv lock", NULL, MTX_DEF); + ip->i_rsv = NULL; + /* * Now we want to make sure that block pointers for unused * blocks are zeroed out - ext2_balloc depends on this diff -urN /usr/src/sys/fs/ext2fs/ext2fs.h new/ext2fs.h --- /usr/src/sys/fs/ext2fs/ext2fs.h 2010-01-14 22:30:54.000000000 +0800 +++ new/ext2fs.h 2010-08-19 02:47:29.000000000 +0800 @@ -38,6 +38,7 @@ #define _FS_EXT2FS_EXT2_FS_H #include <sys/types.h> +#include <sys/lock.h> /* * Special inode numbers @@ -174,6 +175,9 @@ char e2fs_wasvalid; /* valid at mount time */ off_t e2fs_maxfilesize; struct ext2_gd *e2fs_gd; /* Group Descriptors */ + + struct mtx e2fs_rsv_lock; /* Protect reservation window RB tree */ + struct ext2_rsv_win_tree *e2fs_rsv_tree; /* Reservation window index */ }; /* diff -urN /usr/src/sys/fs/ext2fs/inode.h new/inode.h --- /usr/src/sys/fs/ext2fs/inode.h 2010-01-14 22:30:54.000000000 +0800 +++ new/inode.h 2010-08-19 02:47:29.000000000 +0800 @@ -100,6 +100,10 @@ int32_t i_gen; /* Generation number. */ u_int32_t i_uid; /* File owner. */ u_int32_t i_gid; /* File group. */ + + /* Fields for reservation window */ + struct mtx i_rsv_lock; /* Protects i_rsv */ + struct ext2_rsv_win *i_rsv; /* Reservation window */ }; /* --Boundary-00=_QxzoMs/Iug8+N80--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201009290917.05269.jhb>