From owner-freebsd-fs@FreeBSD.ORG  Mon May 30 15:07:58 2005
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
X-Original-To: freebsd-fs@freebsd.org
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id AFA8816A41C;
	Mon, 30 May 2005 15:07:58 +0000 (GMT)
	(envelope-from dom@goodforbusiness.co.uk)
Received: from mail.helenmarks.co.uk (mail.helenmarks.co.uk [82.68.196.22])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D193243D4C;
	Mon, 30 May 2005 15:07:57 +0000 (GMT)
	(envelope-from dom@goodforbusiness.co.uk)
Received: from localhost (localhost [127.0.0.1])
	by mail.helenmarks.co.uk (Postfix) with ESMTP id 87BE0222403;
	Mon, 30 May 2005 16:07:52 +0100 (BST)
Received: from mail.helenmarks.co.uk ([127.0.0.1])
	by localhost (mail.helenmarks.co.uk [127.0.0.1]) (amavisd-new,
	port 10024)
	with ESMTP id 56399-07; Mon, 30 May 2005 16:07:47 +0100 (BST)
Received: from egg.helenmarks.co.uk (egg.helenmarks.co.uk [192.168.15.3])
	by mail.helenmarks.co.uk (Postfix) with ESMTP id 7EC90222402;
	Mon, 30 May 2005 16:07:47 +0100 (BST)
From: Dominic Marks <dom@goodforbusiness.co.uk>
Organization: GoodforBusiness.co.uk
To: Bruce Evans <bde@zeta.org.au>
Date: Mon, 30 May 2005 16:09:11 +0100
User-Agent: KMail/1.8
References: <200505271328.58072.dom@goodforbusiness.co.uk>
	<20050530155609.Q1473@epsplex.bde.org>
	<20050530193711.I843@epsplex.bde.org>
In-Reply-To: <20050530193711.I843@epsplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200505301609.11857.dom@goodforbusiness.co.uk>
X-Virus-Scanned: By ClamAV 0.80
Cc: freebsd-fs@freebsd.org, freebsd-gnats-submit@freebsd.org, banhalmi@field.hu
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 30 May 2005 15:07:58 -0000

On Monday 30 May 2005 11:11, Bruce Evans wrote:
> On Mon, 30 May 2005, Bruce Evans wrote:
> > On Sun, 29 May 2005, Dominic Marks wrote:
> >> I have been experimenting in msdosfs_read and I have managed to come up
> >> with
> >> something that works, but I'm sure it is flawed. On large file reads it
> >> will
> >> ...
> >> %%
> >> Index: msdosfs_vnops.c
> >> ===================================================================
> >> RCS file: /usr/cvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
> >> retrieving revision 1.149.2.1
> >> diff -u -r1.149.2.1 msdosfs_vnops.c
> >> --- msdosfs_vnops.c	31 Jan 2005 23:25:56 -0000	1.149.2.1
> >> +++ msdosfs_vnops.c	29 May 2005 14:10:18 -0000
> >> @@ -565,14 +567,21 @@
> >> 			error = bread(pmp->pm_devvp, lbn, blsize, NOCRED,
> >> &bp);
> >> 		} else {
> >> 			blsize = pmp->pm_bpcluster;
> >> -			rablock = lbn + 1;
> >> -			if (seqcount > 1 &&
> >> -			    de_cn2off(pmp, rablock) < dep->de_FileSize) {
> >> -				rasize = pmp->pm_bpcluster;
> >> -				error = breadn(vp, lbn, blsize,
> >> -				    &rablock, &rasize, 1, NOCRED, &bp);
> >> +			/* XXX what is the best value for crsize? */
> >> + 			crsize = blsize * nblks > MAXBSIZE ? MAXBSIZE :
> >> blsize * nblks;
> >> +			if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0) {
> >> +				error = cluster_read(vp, dep->de_FileSize,
> >> lbn,
> >> +					crsize, NOCRED, uio->uio_resid,
> >> seqcount, &bp);
> >
> > crsize should be just the block size (cluster size in msdosfs and
> > blsize variable here) according to this code in all other file systems.
> > ...
>
> The main problem is that VOP_BMAP() is not fully implemented for msdosfs.
> msdosfs_bmap() only has a stub which pretends that clustering ins never
> possible:
>
> % /*
> %  * vp  - address of vnode file the file
> %  * bn  - which cluster we are interested in mapping to a filesystem block
> number. %  * vpp - returns the vnode for the block special file holding the
> filesystem %  *	 containing the file of interest
> %  * bnp - address of where to return the filesystem relative block number
> %  */
>
> This comment rotted in 1994 when 4.4BSD packed the args into a struct
> and added the a_runp and a_runb args to support clustering.
>
> % static int
> % msdosfs_bmap(ap)
> % 	struct vop_bmap_args /* {
> % 		struct vnode *a_vp;
> % 		daddr_t a_bn;
> % 		struct vnode **a_vpp;
> % 		daddr_t *a_bnp;
> % 		int *a_runp;
> % 		int *a_runb;
> % 	} */ *ap;
> % {
> % 	struct denode *dep = VTODE(ap->a_vp);
> % 	daddr_t blkno;
> % 	int error;
> %
> % 	if (ap->a_vpp != NULL)
> % 		*ap->a_vpp = dep->de_devvp;
> % 	if (ap->a_bnp == NULL)
> % 		return (0);
> % 	if (ap->a_runp) {
> % 		/*
> % 		 * Sequential clusters should be counted here.
>    		                       ^^^^^^^^^
> % 		 */
> % 		*ap->a_runp = 0;
> % 	}
> % 	if (ap->a_runb) {
> % 		*ap->a_runb = 0;
> % 	}
> % 	error = pcbmap(dep, ap->a_bn, &blkno, 0, 0);
> % 	*ap->a_bnp = blkno;
> % 	return (error);
> % }

If I understand what is supposed to be done here (I looked at cd9660 but
I don't know if the rules are different from msdos), a_runp should be set
to the extent of contiguous blocks from the current position within the
same region? I put some debugging into msdosfs_bmap and here it is copied:

(fsz is dep->de_FileSize)

msdosfs_bmap: fsz  81047  blkno  6374316  lblkno 5
msdosfs_bmap: fsz  81047  blkno  6374324  lblkno 6
msdosfs_bmap: fsz  81047  blkno  6374332  lblkno 7
msdosfs_bmap: fsz  81047  blkno  6374340  lblkno 8
msdosfs_bmap: fsz  81047  blkno  6374348  lblkno 9
msdosfs_bmap: fsz  81047  blkno  6374356  lblkno 10
msdosfs_bmap: fsz  81047  blkno  6374364  lblkno 11
msdosfs_bmap: fsz  81047  blkno  6374372  lblkno 12 # A1
msdosfs_bmap: fsz  81047  blkno 13146156  lblkno 13 # A2
msdosfs_bmap: fsz  81047  blkno 13146156  lblkno 14
msdosfs_bmap: fsz  81047  blkno 13146156  lblkno 15
msdosfs_bmap: fsz  81047  blkno 13146156  lblkno 16
msdosfs_bmap: fsz  81047  blkno 13146156  lblkno 17
msdosfs_bmap: fsz  81047  blkno 13146156  lblkno 18
msdosfs_bmap: fsz  81047  blkno 13146156  lblkno 19

I should compute the position of the boundary illustrated in A1 I should set 
that to the read ahead value, until setting a new value at A2, perhaps this 
should only be done for particularly large files? I will look at the other 
_bmap routines to see what they do.

> Here is a cleaned up version of the patch to add (not actually working)
> clustering to msdosfs_read().
>
> %%%
> Index: msdosfs_vnops.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
> retrieving revision 1.147
> diff -u -2 -r1.147 msdosfs_vnops.c
> --- msdosfs_vnops.c	4 Feb 2004 21:52:53 -0000	1.147
> +++ msdosfs_vnops.c	30 May 2005 08:57:02 -0000
> @@ -541,5 +555,7 @@
>   		if (uio->uio_offset >= dep->de_FileSize)
>   			break;
> +		blsize = pmp->pm_bpcluster;
>   		lbn = de_cluster(pmp, uio->uio_offset);
> +		rablock = lbn + 1;
>   		/*
>   		 * If we are operating on a directory file then be sure to
> @@ -556,15 +573,15 @@
>   				break;
>   			error = bread(pmp->pm_devvp, lbn, blsize, NOCRED, &bp);
> +		} else if (de_cn2off(pmp, rablock) >= dep->de_FileSize) {
> +			error = bread(vp, lbn, blsize, NOCRED, &bp);
> +		} else if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0) {
> +			error = cluster_read(vp, dep->de_FileSize, lbn, blsize,
> +			    NOCRED, uio->uio_resid, seqcount, &bp);
> +		} else if (seqcount > 1) {
> +			rasize = blsize;
> +			error = breadn(vp, lbn,
> +			    blsize, &rablock, &rasize, 1, NOCRED, &bp);
>   		} else {
> -			blsize = pmp->pm_bpcluster;
> -			rablock = lbn + 1;
> -			if (seqcount > 1 &&
> -			    de_cn2off(pmp, rablock) < dep->de_FileSize) {
> -				rasize = pmp->pm_bpcluster;
> -				error = breadn(vp, lbn, blsize,
> -				    &rablock, &rasize, 1, NOCRED, &bp);
> -			} else {
> -				error = bread(vp, lbn, blsize, NOCRED, &bp);
> -			}
> +			error = bread(vp, lbn, blsize, NOCRED, &bp);
>   		}
>   		if (error) {
> %%%
>
> I rearranged the code to be almost lexically identical with that in
> ffs_read().

Thanks, I will use this as a basis for any other things I try.

> I only tested this on a relatively fast ATA drive.  It made little
> difference.  Most writes were clustered to give a block size of 64K
> and write speed of over 40+MB/s until the disk is nearly full, but
> reads weren't clustered with or without the patch so the block size
> remained at the fs block size (4K); the drive handles this block size
> mediocrely and gave a read speed of 20+MB/sec.  (The drive is a WDC
> 1200JB-00CRA1.  This drive has the interesting behaviour of giving
> almost the same mediocre read speed for all block sizes between 2.5K
> and 19.5K.  A block size 20K gives maximal speed which is about twice
> as fast as the speed for a block size of 19.5K.)

I am still confused as to how reading blsize * 16 actually improved
the transfer rate after a long period of making it worse. Perhaps it
is related to the buffer resource problem you describe below.

> Both reading and writing a 1GB file to/from msdosfs caused noticable
> buffer resource problems.  Accesses to other file systems on the same
> disk sometimes blocked for many seconds.  I have debugging code in
> getblk().  It reported that a process waited 17 seconds in or near
> getblk().  The process only stopped waiting because I suspended the
> process accessing msdosfs.  This may be a local bug.

I'll look for buffer resource statistics in the system tools and
measure those. There are no obvious signs, to me, that the systems are
in any specific difficulties while running the transfers.

> Bruce

Thanks a lot for the answers and code,
-- 
Dominic
GoodforBusiness.co.uk
I.T. Services for SMEs in the UK.