Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 07 Feb 2008 01:33:13 +0200
From:      Andriy Gapon <avg@icyb.net.ua>
To:        pav@FreeBSD.org, scottl@FreeBSD.org
Cc:        freebsd-fs@FreeBSD.org, freebsd-hackers@FreeBSD.org
Subject:   Re: fs/udf: vm pages "overlap" while reading  large dir [patch]
Message-ID:  <47AA43B9.1040608@icyb.net.ua>
In-Reply-To: <47A9E191.8000607@icyb.net.ua>
References:  <200612221824.kBMIOhfM049471@freefall.freebsd.org>	 <47A2EDB0.8000801@icyb.net.ua> <47A2F404.7010208@icyb.net.ua>	 <47A735A4.3060506@icyb.net.ua> <47A75B47.2040604@elischer.org>	 <1202155663.62432.0.camel@ikaros.oook.cz> <47A8754C.5010607@icyb.net.ua> <1202235368.68281.12.camel@ikaros.oook.cz> <47A9E191.8000607@icyb.net.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format.
--------------090706050706020808090509
Content-Type: text/plain; charset=ISO-8859-2
Content-Transfer-Encoding: 7bit

on 06/02/2008 18:34 Andriy Gapon said the following:
> 
> Actually the patch is not entirely correct. max_size returned from
> udf_bmap_internal should be used to calculate number of continuous
> sectors for read-ahead (as opposed to file size in the patch).
> 

Attached is an updated patch.

The most prominent changes from the previous version:
1. udf_read can handle files with data embedded into fentry
(sysutils/udfclient can produce such things for small files).
2. above mentioned files can now be mmap-ed, so that you can not only
cat them, but also cp them; btw, I believe that cp(1) should have logic
to try to fallback to read if mmap() fails - remember kern/92040 ? :-)
3. udf_bmap uses max_size[*] returned by udf_bmap_internal to calculate
a number of contiguous blocks for read-ahead; [*] - this is a number of
contiguous bytes from a given offset within a file.

Things that stay the same:
1. I still use bread() via directory vnode; I think it's better even if
reading via device vnode would work correctly.
2. I still try to read as much data as possible in directory bread(), I
think this is a sort of an ad-hoc read-ahead without all the heuristics
and logic that cluster reading does; I think that this should be ok
because we do not have random reading of directory, it's always
sequential - either to read-in the whole directory or linearly search
through it until wanted entry is found.

Detailed description of the patch.
udf_vnops.c:
Hunk1 - this is "just in case" patch, reject files with unsupported
strategy right away instead of deferring failure to udf_bmap_internal.
Hunk2 - fix typo in existing macro, add new macro to check if a file has
data embedded into fentry ("unusual file").
Hunk3 - for an unusual file just uiomove data from fentry.
Hunk4 - cosmetic changes plus a fix for udf_bmap_internal errors not
actually being honored; also added an additional printf, because
udf_strategy should never really be called for unusual files.
Hunk5 - return EOPNOTSUPP for unusual files, this is correct because we
can not correctly bmap them and also this enables correct handling of
this files in vm code (vnode_pager); also code for read-ahead
calculation borrowed from cd9660.
Hunk6- explain function of udf_bmap_internal call in this place.
Hunk7 - some cosmetics; prevent size passed to bread from being greater
than MAXBSIZE; do bread via directory vnode rather than device vnode
(udf_readlblks was a macro-wrapper around bread).

udf_vfsops.c:
Hunk1 - this is borrowed from cd9660, apparently this data is needed for
correct cluster reading.


Couple of words about testing.
udf in 7.0-RC1 plus this patch correctly handled everything I could
throw at it - huge directories, unusual files, reading, mmaping.
Using udfclientfs I wrote a directory on DVD-RAM UDF disk that contained
 2G of files from ports distfiles. The files varied in size from about
100 of bytes to hundreds of megabytes. I watched systat -vmstat while
copying this directory. With unpatched code some files would not be
copied (small "unusual files"), KB/t value stayed at 2K (meaning reads
always were sector sized), MB/s was about 1. With the patch all files
were copied successfully and correctly (md5 verified), KB/t varied from
2K to 64K (apparently depending on size of currently copied file), MB/s
varied from 1 to 4 which is not bad for these DVD-RAM disk and drive.
If somebody asks I can produce a UDF image that would contain various
test cases: large directory, unusual files, etc. Or you can give a try
to udfclient/udfclientfs from sysutils/udfclient port.

I hope this patch will be useful.

-- 
Andriy Gapon

--------------090706050706020808090509
Content-Type: text/x-patch;
 name="udf_latest2.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="udf_latest2.diff"

--- udf_vnops.c.orig	2008-01-29 23:50:49.000000000 +0200
+++ udf_vnops.c	2008-02-07 00:12:34.000000000 +0200
@@ -168,6 +168,16 @@ udf_open(struct vop_open_args *ap) {
 	struct udf_node *np = VTON(ap->a_vp);
 	off_t fsize;
 
+	/*
+	 * We currently do not support any other strategy type, so
+	 * udf_bmap_internal, udf_bmap, udf_strategy would fail.
+	 * I.e. we can not access content of this file anyway.
+	 */
+	if (le16toh(np->fentry->icbtag.strat_type) != 4) {
+		printf("Unsupported strategy type %d\n", le16toh(np->fentry->icbtag.strat_type));
+		return ENODEV;
+	}
+
 	fsize = le64toh(np->fentry->inf_len);
 	vnode_create_vobject(ap->a_vp, fsize, ap->a_td);
 	return 0;
@@ -340,7 +350,9 @@ udf_pathconf(struct vop_pathconf_args *a
 
 #define lblkno(udfmp, loc)	((loc) >> (udfmp)->bshift)
 #define blkoff(udfmp, loc)	((loc) & (udfmp)->bmask)
-#define lblktosize(imp, blk)	((blk) << (udfmp)->bshift)
+#define lblktosize(udfmp, blk)	((blk) << (udfmp)->bshift)
+
+#define HAS_EMBEDDED_DATA(node) ((le16toh((node)->fentry->icbtag.flags) & 0x7) == 3)
 
 static int
 udf_read(struct vop_read_args *ap)
@@ -359,6 +371,26 @@ udf_read(struct vop_read_args *ap)
 		return (0);
 	if (uio->uio_offset < 0)
 		return (EINVAL);
+
+	if (HAS_EMBEDDED_DATA(node)) {
+		struct file_entry *fentry;
+		uint8_t *data;
+
+		fentry = node->fentry;
+		data = &fentry->data[le32toh(fentry->l_ea)];
+		fsize = le32toh(fentry->l_ad);
+
+		n = uio->uio_resid;
+		diff = fsize - uio->uio_offset;
+		if (diff <= 0)
+			return (0);
+		if (diff < n)
+			n = diff;
+
+		error = uiomove(data + uio->uio_offset, (int)n, uio);
+		return (error);
+	}
+
 	fsize = le64toh(node->fentry->inf_len);
 	udfmp = node->udfmp;
 	do {
@@ -799,10 +831,10 @@ udf_strategy(struct vop_strategy_args *a
 	struct buf *bp;
 	struct vnode *vp;
 	struct udf_node *node;
+	struct bufobj *bo;
 	int maxsize;
 	daddr_t sector;
-	struct bufobj *bo;
-	int multiplier;
+	int error;
 
 	bp = a->a_bp;
 	vp = a->a_vp;
@@ -813,24 +845,23 @@ udf_strategy(struct vop_strategy_args *a
 		 * Files that are embedded in the fentry don't translate well
 		 * to a block number.  Reject.
 		 */
-		if (udf_bmap_internal(node, bp->b_lblkno * node->udfmp->bsize,
-		    &sector, &maxsize)) {
+		error = udf_bmap_internal(node, lblktosize(node->udfmp, bp->b_lblkno), &sector, &maxsize);
+		if (error) {
+			if (error == UDF_INVALID_BMAP)
+				printf("udf_strategy called for file with data embedded in fentry\n");
 			clrbuf(bp);
 			bp->b_blkno = -1;
+			bufdone(bp);
+			return (0);
 		}
-
-		/* bmap gives sector numbers, bio works with device blocks */
-		multiplier = node->udfmp->bsize / DEV_BSIZE;
-		bp->b_blkno = sector * multiplier;
-
-	}
-	if ((long)bp->b_blkno == -1) {
-		bufdone(bp);
-		return (0);
+		/* udf_bmap_internal gives sector numbers, bio works with device blocks */
+		bp->b_blkno = sector << (node->udfmp->bshift - DEV_BSHIFT);
 	}
+
 	bo = node->udfmp->im_bo;
 	bp->b_iooffset = dbtob(bp->b_blkno);
 	BO_STRATEGY(bo, bp);
+
 	return (0);
 }
 
@@ -851,17 +882,43 @@ udf_bmap(struct vop_bmap_args *a)
 	if (a->a_runb)
 		*a->a_runb = 0;
 
-	error = udf_bmap_internal(node, a->a_bn * node->udfmp->bsize, &lsector,
-	    &max_size);
+	/*
+	 * UDF_INVALID_BMAP means data embedded into fentry, this is an internal
+	 * error that should not be propagated to calling code.
+	 * Most obvious mapping for this error is EOPNOTSUPP as we can not truly
+	 * translate block numbers in this case.
+	 * Incidentally, this return code will make vnode pager to use VOP_READ
+	 * to get data for mmap-ed pages and udf_read knows how to do the right
+	 * thing for this kind of files.
+	 */
+	error = udf_bmap_internal(node, a->a_bn << node->udfmp->bshift, &lsector, &max_size);
+	if (error == UDF_INVALID_BMAP)
+		return EOPNOTSUPP;
 	if (error)
 		return (error);
 
 	/* Translate logical to physical sector number */
 	*a->a_bnp = lsector << (node->udfmp->bshift - DEV_BSHIFT);
 
-	/* Punt on read-ahead for now */
-	if (a->a_runp)
-		*a->a_runp = 0;
+	/*
+	 * Determine maximum number of readahead blocks following the
+	 * requested block.
+	 */
+	if (a->a_runp) {
+		int nblk;
+
+		nblk = (max_size >> node->udfmp->bshift) - 1;
+		if (nblk <= 0)
+			*a->a_runp = 0;
+		else if (nblk >= (MAXBSIZE >> node->udfmp->bshift))
+			*a->a_runp = (MAXBSIZE >> node->udfmp->bshift) - 1;
+		else
+			*a->a_runp = nblk;
+	}
+
+	if (a->a_runb) {
+		*a->a_runb = 0;
+	}
 
 	return (0);
 }
@@ -1060,7 +1117,10 @@ udf_readatoffset(struct udf_node *node, 
 
 	udfmp = node->udfmp;
 
-	*bp = NULL;
+	/*
+	 * This call is made not only to detect UDF_INVALID_BMAP case,
+	 * max_size is used as an ad-hoc read-ahead hint for "normal" case.
+	 */
 	error = udf_bmap_internal(node, offset, &sector, &max_size);
 	if (error == UDF_INVALID_BMAP) {
 		/*
@@ -1078,9 +1138,10 @@ udf_readatoffset(struct udf_node *node, 
 	/* Adjust the size so that it is within range */
 	if (*size == 0 || *size > max_size)
 		*size = max_size;
-	*size = min(*size, MAXBSIZE);
+	*size = min(*size, MAXBSIZE - blkoff(udfmp, offset));
 
-	if ((error = udf_readlblks(udfmp, sector, *size + (offset & udfmp->bmask), bp))) {
+	*bp = NULL;
+	if ((error = bread(node->i_vnode, lblkno(udfmp, offset), (*size + blkoff(udfmp, offset) + udfmp->bmask) & ~udfmp->bmask, NOCRED, bp))) {
 		printf("warning: udf_readlblks returned error %d\n", error);
 		/* note: *bp may be non-NULL */
 		return (error);
--- udf_vfsops.c.orig	2007-03-13 03:50:24.000000000 +0200
+++ udf_vfsops.c	2008-02-05 01:29:10.000000000 +0200
@@ -330,6 +330,11 @@ udf_mountfs(struct vnode *devvp, struct 
 
 	bo = &devvp->v_bufobj;
 
+	if (devvp->v_rdev->si_iosize_max != 0)
+		mp->mnt_iosize_max = devvp->v_rdev->si_iosize_max;
+	if (mp->mnt_iosize_max > MAXPHYS)
+		mp->mnt_iosize_max = MAXPHYS;
+
 	/* XXX: should be M_WAITOK */
 	MALLOC(udfmp, struct udf_mnt *, sizeof(struct udf_mnt), M_UDFMOUNT,
 	    M_NOWAIT | M_ZERO);

--------------090706050706020808090509--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?47AA43B9.1040608>