Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 28 Nov 2003 19:30:37 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Soren Schmidt <sos@spider.deepcore.dk>
Cc:        hackers@freebsd.org
Subject:   Re: Problems with use of M_NOWWAIT in ATA (w/ demonstration patch)
Message-ID:  <200311290330.hAT3UbNL004349@apollo.backplane.com>

next in thread | raw e-mail | index | archive | help
    All right, here's an example patch.  This patch is against the old
    ATA code in DragonFly but most of the issues are the same with the
    ATA code in -STABLE and -CURRENT.  It does not fix all the problems
    (which would be a waste since we are about to import ATAng and do not
    want to apply the same changes twice).  What this patch does is
    demonstrate a solution to some of the issues.

    Basically this patch introduces guarentees by pipelining requests via
    the MPIPE code (see patch sys/mpipe.h and kern_mpipe.c).  

    Take for example the ad_start() code in both FreeBSD-STABLE and
    FreeBSD-CURRENT (that is, in ATAng).

    ad_start() calls ata_alloc_request().  ata_alloc_request() calls
    malloc(... M_NOWAIT), which can fail.  In ATAng if this malloc fails
    the I/O request will fail with an error for absolutely no good reason.

    Remember, high level code *DEPENDS* on being able to issue I/Os, 
    especially in heavily memory loaded situations when the system is 
    undergoing paging.  It is *NOT* acceptable for low level ATA code to
    fail due to not being able to allocate memory for a request structure.

    --------------------------------- example #1 ----------------------------

    In ATAold if ad_start() fails to allocate memory it leaves the request
    queued, which is the correct operation but only avoids a deadlock if there
    is already at least one outstanding request (so ad_start() gets called 
    again when that request completes).

    In ATAng if ad_start() fails to allocate memory it fails the I/O request,
    which is much worse then the deadlock that ATAold had because higher
    layers are not robust enough to retry the request in all situations 
    or make sufficient guarentees that the failed request will not result in,
    for example, filesystem corruption.

    In the MPIPE patch below I use the ATAold ad_start() model in that I
    leave the request queued, but I use MPIPE to allocate the request which
    *GUARENTEES* that there will be at least 4 I/O's in progress on this
    particular channel, which means that ad_start() will be called again
    when one of those I/O's completes which guarentees that (A) no deadlock
    will occur and (B) that the request will be processed later on as the
    I/O queue clears.

    --------------------------------- example #2 ----------------------------

    DMA code.  There are two issues.  The first issue is only in ATAold,
    the second issue is only in ATAng.

    In ATAold if a DMA buffer could not be allocated the ATA driver silently
    fell back into PIO mode, which is bad.  This is removed in the patch...
    it is not necessary to fall back to PIO mode if a memory request fails.

    In ATAng the bus dma code is used and M_NOWAIT is not used.  However,
    this means that M_WAITOK is used which can deadlock the system when 
    severe memory shortages exist because, again, the VM paging code assumse
    that the I/O's it issues at the device level will complete at some
    point.  In otherwords, you cannot safely hang waiting for a DMA 
    allocation if there isn't already at least one I/O request in progress.

    --------------------------------- example #3 ----------------------------

    Not addressed by this patch.

    While purusing the code I noticed a huge number of cases where
    malloc(... M_NOWAIT) is called and the result is either assumed to be
    non-NULL or the error, if any, is completely ignored.  I checked 
    the code in FREEBSD-CURRENT to make sure that these cases were still
    an issue.  Just search all the instances of M_NOWAIT and check 
    (A) whether the return value is checked for NULL and (B) whether the
    caller of the parent procedure checks the error code.

    For example, ata_set_name() fails silently if the allocation fails.
    Various initialization and ioctl code, while it may not crash, and
    even if it does return an error code, also does not perform its function
    if an allocation fails and there are instances where calls are made
    to these functions which assume success.

    For example, if ata_dmainit() fails ch->dma is silently left NULL.

    The RAID code seems especially vulnerable to allocation failures,
    there are instances where buffers are allocated (for example, line
    714 of ata-raid.c in CURRENT) and the result is assumed assuming
    a non-NULL result.  These malloc's are being called M_NOWAIT, which
    means that NULL can be returned.

    In anycase, David Rhodus has been testing the ATAng from -STABLE and we
    will be importing that into DragonFly in the next week or so, but we
    will have to make the same MPIPE changes to that code that I am making
    to the original code.  In fact, the changes will wind up being quite
    a bit more extensive then the hack I put together in the patch set below
    in order to cover the ATAPI code as well as the unconditional struct buf
    allocation code.

    Once we get the new ATAng code into DragonFly and make the appropriate
    changes to deal with the malloc() issues, I will post the patch for
    FreeBSD to take or ignore as it sees fit.

						-Matt

Index: conf/files
===================================================================
RCS file: /cvs/src/sys/conf/files,v
retrieving revision 1.30
diff -u -r1.30 files
--- conf/files	21 Nov 2003 05:29:05 -0000	1.30
+++ conf/files	28 Nov 2003 22:49:30 -0000
@@ -627,6 +627,7 @@
 kern/kern_random.c	standard
 kern/kern_resource.c	standard
 kern/kern_slaballoc.c	standard
+kern/kern_mpipe.c	standard
 kern/kern_shutdown.c	standard
 kern/kern_sig.c		standard
 kern/kern_upcall.c	standard
Index: dev/disk/ata/ata-all.c
===================================================================
RCS file: /cvs/src/sys/dev/disk/ata/ata-all.c,v
retrieving revision 1.8
diff -u -r1.8 ata-all.c
--- dev/disk/ata/ata-all.c	9 Nov 2003 02:22:34 -0000	1.8
+++ dev/disk/ata/ata-all.c	29 Nov 2003 00:19:46 -0000
@@ -61,6 +61,11 @@
 #include "ata-raid.h"
 #include "atapi-all.h"
 
+union ata_request {
+	struct ad_request	ad;
+	struct atapi_request	atapi;
+};
+
 /* device structures */
 static	d_ioctl_t	ataioctl;
 static struct cdevsw ata_cdevsw = {  
@@ -97,6 +102,12 @@
 /* sysctl vars */
 SYSCTL_NODE(_hw, OID_AUTO, ata, CTLFLAG_RD, 0, "ATA driver parameters");
 
+int ata_mpipe_size = 4;
+TUNABLE_INT("hw.ata.mpipe_size", &ata_mpipe_size);
+SYSCTL_INT(_hw_ata, OID_AUTO, mpipe_size, CTLFLAG_RW, &ata_mpipe_size, 0,
+           "ATA global I/O pipeline max size");
+
+
 /* global vars */
 devclass_t ata_devclass;
 
@@ -155,6 +166,10 @@
     ch->device[SLAVE].mode = ATA_PIO;
     TAILQ_INIT(&ch->ata_queue);
     TAILQ_INIT(&ch->atapi_queue);
+
+    mpipe_init(&ch->req_mpipe, M_ATA, sizeof(union ata_request), 4, ata_mpipe_size);
+    mpipe_init(&ch->dma_mpipe, M_DEVBUF, PAGE_SIZE, 4, ata_mpipe_size);
+
     return 0;
     
 failure:
@@ -286,6 +301,9 @@
     ch->r_altio = NULL;
     ch->r_bmio = NULL;
     ch->r_irq = NULL;
+    mpipe_done(&ch->req_mpipe);
+    mpipe_done(&ch->dma_mpipe);
+
     ATA_UNLOCK_CH(ch);
     return 0;
 }
@@ -435,7 +453,7 @@
 				 ATA_ATAPI_MASTER : ATA_ATAPI_SLAVE)))
 		return ENODEV;
 
-	    if (!(buf = malloc(iocmd->u.atapi.count, M_ATA, M_NOWAIT)))
+	    if (!(buf = malloc(iocmd->u.atapi.count, M_ATA, M_WAITOK)))
 		return ENOMEM;
 
 	    if (iocmd->u.atapi.flags & ATAPI_CMD_WRITE) {
@@ -473,7 +491,7 @@
     struct ata_params *ata_parm;
     int retry = 0;
 
-    if (!(ata_parm = malloc(sizeof(struct ata_params), M_ATA, M_NOWAIT))) {
+    if (!(ata_parm = malloc(sizeof(struct ata_params), M_ATA, M_WAITOK))) {
 	ata_prtdev(atadev, "malloc for identify data failed\n");
 	return -1;
     }
@@ -1400,7 +1418,7 @@
 void
 ata_set_name(struct ata_device *atadev, char *name, int lun)
 {
-    atadev->name = malloc(strlen(name) + 4, M_ATA, M_NOWAIT);
+    atadev->name = malloc(strlen(name) + 4, M_ATA, M_WAITOK);
     if (atadev->name)
 	sprintf(atadev->name, "%s%d", name, lun);
 }
@@ -1559,7 +1577,7 @@
     /* register boot attach to be run when interrupts are enabled */
     if (!(ata_delayed_attach = (struct intr_config_hook *)
 			       malloc(sizeof(struct intr_config_hook),
-				      M_TEMP, M_NOWAIT | M_ZERO))) {
+				      M_TEMP, M_WAITOK | M_ZERO))) {
 	printf("ata: malloc of delayed attach hook failed\n");
 	return;
     }
Index: dev/disk/ata/ata-all.h
===================================================================
RCS file: /cvs/src/sys/dev/disk/ata/ata-all.h,v
retrieving revision 1.3
diff -u -r1.3 ata-all.h
--- dev/disk/ata/ata-all.h	19 Jul 2003 21:14:18 -0000	1.3
+++ dev/disk/ata/ata-all.h	29 Nov 2003 00:13:36 -0000
@@ -29,6 +29,10 @@
  * $DragonFly: src/sys/dev/disk/ata/ata-all.h,v 1.3 2003/07/19 21:14:18 dillon Exp $
  */
 
+#ifndef _SYS_MPIPE_H_
+#include <sys/mpipe.h>
+#endif
+
 /* ATA register defines */
 #define ATA_DATA			0x00	/* data register */
 #define ATA_ERROR			0x01	/* (R) error register */
@@ -225,6 +229,8 @@
     TAILQ_HEAD(, ad_request)	ata_queue;	/* head of ATA queue */
     TAILQ_HEAD(, atapi_request) atapi_queue;	/* head of ATAPI queue */
     void			*running;	/* currently running request */
+    struct malloc_pipe		req_mpipe;	/* request allocations */
+    struct malloc_pipe		dma_mpipe;	/* dma allocations */
 };
 
 /* disk bay/enclosure related */
@@ -236,6 +242,7 @@
 
 /* externs */
 extern devclass_t ata_devclass;
+extern int	ata_mpipe_size;
  
 /* public prototypes */
 int ata_probe(device_t);
@@ -263,7 +270,8 @@
 int ata_umode(struct ata_params *);
 int ata_find_dev(device_t, u_int32_t, u_int32_t);
 
-void *ata_dmaalloc(struct ata_channel *, int);
+void *ata_dmaalloc(struct ata_channel *, int, int);
+void ata_dmafree(struct ata_channel *, void *buf);
 void ata_dmainit(struct ata_channel *, int, int, int, int);
 int ata_dmasetup(struct ata_channel *, int, struct ata_dmaentry *, caddr_t, int);
 void ata_dmastart(struct ata_channel *, int, struct ata_dmaentry *, int);
Index: dev/disk/ata/ata-disk.c
===================================================================
RCS file: /cvs/src/sys/dev/disk/ata/ata-disk.c,v
retrieving revision 1.7
diff -u -r1.7 ata-disk.c
--- dev/disk/ata/ata-disk.c	7 Aug 2003 21:16:51 -0000	1.7
+++ dev/disk/ata/ata-disk.c	29 Nov 2003 00:17:28 -0000
@@ -115,10 +115,13 @@
     struct ad_softc *adp;
     dev_t dev;
 
-    if (!(adp = malloc(sizeof(struct ad_softc), M_AD, M_NOWAIT | M_ZERO))) {
+    if (!(adp = malloc(sizeof(struct ad_softc), M_AD, M_WAITOK | M_ZERO))) {
 	ata_prtdev(atadev, "failed to allocate driver storage\n");
 	return;
     }
+
+    KKASSERT(atadev->channel->req_mpipe.max_count != 0);
+
     adp->device = atadev;
 #ifdef ATA_STATIC_ID
     adp->lun = (device_get_unit(atadev->channel->dev)<<1)+ATA_DEV(atadev->unit);
@@ -397,8 +400,14 @@
 	    return;
     }
 
-    if (!(request = malloc(sizeof(struct ad_request), M_AD, M_NOWAIT|M_ZERO))) {
-	ata_prtdev(atadev, "out of memory in start\n");
+    /*
+     * Allocate a request.  The allocation can only fail if the pipeline
+     * is full, in which case the request will be picked up later when
+     * ad_start() is called after another request completes.
+     */
+    request = mpipe_alloc(&atadev->channel->req_mpipe, M_NOWAIT|M_ZERO);
+    if (request == NULL) {
+	ata_prtdev(atadev, "pipeline full allocating request in ad_start\n");
 	return;
     }
 
@@ -412,8 +421,13 @@
     if (bp->b_flags & B_READ) 
 	request->flags |= ADR_F_READ;
     if (adp->device->mode >= ATA_DMA) {
-	if (!(request->dmatab = ata_dmaalloc(atadev->channel, atadev->unit)))
-	    adp->device->mode = ATA_PIO;
+	request->dmatab = ata_dmaalloc(atadev->channel, atadev->unit, M_NOWAIT);
+	if (request->dmatab == NULL) {
+	    mpipe_free(&atadev->channel->req_mpipe, request);
+	    ata_prtdev(atadev, "pipeline full allocated dmabuf in ad_start\n");
+	    /* do not revert to PIO, wait for ad_start after I/O completion */
+	    return;
+	}
     }
 
     /* insert in tag array */
@@ -804,9 +818,9 @@
     int s = splbio();
 
     if (request->dmatab)
-	free(request->dmatab, M_DEVBUF);
+	ata_dmafree(request->softc->device->channel, request->dmatab);
     request->softc->tags[request->tag] = NULL;
-    free(request, M_AD);
+    mpipe_free(&request->softc->device->channel->req_mpipe, request);
     splx(s);
 }
 
Index: dev/disk/ata/ata-dma.c
===================================================================
RCS file: /cvs/src/sys/dev/disk/ata/ata-dma.c,v
retrieving revision 1.5
diff -u -r1.5 ata-dma.c
--- dev/disk/ata/ata-dma.c	26 Nov 2003 14:24:46 -0000	1.5
+++ dev/disk/ata/ata-dma.c	29 Nov 2003 00:13:01 -0000
@@ -34,6 +34,7 @@
 #include <sys/ata.h>
 #include <sys/buf.h>
 #include <sys/malloc.h> 
+#include <sys/mpipe.h> 
 #include <sys/bus.h>
 #include <sys/disk.h>
 #include <sys/devicestat.h>
@@ -60,19 +61,21 @@
 	 (device == ATA_SLAVE && ch->devices & ATA_ATAPI_SLAVE))
 
 void *
-ata_dmaalloc(struct ata_channel *ch, int device)
+ata_dmaalloc(struct ata_channel *ch, int device, int flags)
 {
     void *dmatab;
 
-    if ((dmatab = malloc(PAGE_SIZE, M_DEVBUF, M_NOWAIT))) {
-	if (((uintptr_t)dmatab >> PAGE_SHIFT) ^
-	    (((uintptr_t)dmatab + PAGE_SIZE - 1) >> PAGE_SHIFT)) {
-	    ata_printf(ch, device, "dmatab crosses page boundary, no DMA\n");
-	    free(dmatab, M_DEVBUF);
-	    dmatab = NULL;
-	}
-    }
-    return dmatab;
+    KKASSERT(ch->dma_mpipe.max_count != 0);
+    dmatab = mpipe_alloc(&ch->dma_mpipe, flags);
+    KKASSERT(((uintptr_t)dmatab & PAGE_MASK) == 0);
+    return (dmatab);
+}
+
+void
+ata_dmafree(struct ata_channel *ch, void *dmatab)
+{
+    if (dmatab)
+	mpipe_free(&ch->dma_mpipe, dmatab);
 }
 
 void
Index: dev/disk/ata/ata-isa.c
===================================================================
RCS file: /cvs/src/sys/dev/disk/ata/ata-isa.c,v
retrieving revision 1.3
diff -u -r1.3 ata-isa.c
--- dev/disk/ata/ata-isa.c	7 Aug 2003 21:16:51 -0000	1.3
+++ dev/disk/ata/ata-isa.c	29 Nov 2003 00:13:48 -0000
@@ -109,9 +109,14 @@
 #include "use_pci.h"
 #if NPCI == 0
 void *
-ata_dmaalloc(struct ata_channel *ch, int device)
+ata_dmaalloc(struct ata_channel *ch, int device, int flags)
 {
     return 0;
+}
+
+void
+ata_dmafree(struct ata_channel *ch, void *dmatab)
+{
 }
 
 void
Index: dev/disk/ata/ata-raid.c
===================================================================
RCS file: /cvs/src/sys/dev/disk/ata/ata-raid.c,v
retrieving revision 1.8
diff -u -r1.8 ata-raid.c
--- dev/disk/ata/ata-raid.c	7 Aug 2003 21:16:51 -0000	1.8
+++ dev/disk/ata/ata-raid.c	28 Nov 2003 23:36:19 -0000
@@ -116,7 +116,7 @@
 
     if (!ar_table)
 	ar_table = malloc(sizeof(struct ar_soft *) * MAX_ARRAYS,
-			  M_AR, M_NOWAIT | M_ZERO);
+			  M_AR, M_WAITOK | M_ZERO);
     if (!ar_table) {
 	ata_prtdev(adp->device, "no memory for ATA raid array\n");
 	return 0;
@@ -250,7 +250,7 @@
 
     if (!ar_table)
 	ar_table = malloc(sizeof(struct ar_soft *) * MAX_ARRAYS,
-			  M_AR, M_NOWAIT | M_ZERO);
+			  M_AR, M_WAITOK | M_ZERO);
     if (!ar_table) {
 	printf("ar: no memory for ATA raid array\n");
 	return 0;
@@ -263,7 +263,7 @@
 	return ENOSPC;
 
     if (!(rdp = (struct ar_softc*)malloc(sizeof(struct ar_softc), M_AR,
-					 M_NOWAIT | M_ZERO))) {
+					 M_WAITOK | M_ZERO))) {
 	printf("ar%d: failed to allocate raid config storage\n", array);
 	return ENOMEM;
     }
@@ -550,7 +550,7 @@
 	    return;
 	}
 
-	buf1 = malloc(sizeof(struct ar_buf), M_AR, M_NOWAIT | M_ZERO);
+	buf1 = malloc(sizeof(struct ar_buf), M_AR, M_WAITOK | M_ZERO);
 	BUF_LOCKINIT(&buf1->bp);
 	BUF_LOCK(&buf1->bp, LK_EXCLUSIVE);
 	buf1->bp.b_pblkno = lba;
@@ -631,7 +631,7 @@
 			((rdp->flags & AR_F_REBUILDING) &&
 			 (rdp->disks[buf1->drive].flags & AR_DF_SPARE) &&
 			 buf1->bp.b_pblkno < rdp->lock_start)) {
-			buf2 = malloc(sizeof(struct ar_buf), M_AR, M_NOWAIT);
+			buf2 = malloc(sizeof(struct ar_buf), M_AR, M_WAITOK);
 			bcopy(buf1, buf2, sizeof(struct ar_buf));
 			BUF_LOCKINIT(&buf2->bp);
 			BUF_LOCK(&buf2->bp, LK_EXCLUSIVE);
@@ -827,7 +827,7 @@
     rdp->lock_end = rdp->lock_start + 256;
     rdp->flags |= AR_F_REBUILDING;
     splx(s);
-    buffer = malloc(256 * DEV_BSIZE, M_AR, M_NOWAIT | M_ZERO);
+    buffer = malloc(256 * DEV_BSIZE, M_AR, M_WAITOK | M_ZERO);
 
     /* now go copy entire disk(s) */
     while (rdp->lock_end < (rdp->total_sectors / rdp->width)) {
@@ -896,7 +896,7 @@
     int array, disk_number = 0, retval = 0;
 
     if (!(info = (struct highpoint_raid_conf *)
-	  malloc(sizeof(struct highpoint_raid_conf), M_AR, M_NOWAIT | M_ZERO)))
+	  malloc(sizeof(struct highpoint_raid_conf), M_AR, M_WAITOK | M_ZERO)))
 	return retval;
 
     if (ar_rw(adp, HPT_LBA, sizeof(struct highpoint_raid_conf),
@@ -925,7 +925,7 @@
 	if (!raidp[array]) {
 	    raidp[array] = 
 		(struct ar_softc*)malloc(sizeof(struct ar_softc), M_AR,
-					 M_NOWAIT | M_ZERO);
+					 M_WAITOK | M_ZERO);
 	    if (!raidp[array]) {
 		printf("ar%d: failed to allocate raid config storage\n", array);
 		goto highpoint_out;
@@ -1045,7 +1045,7 @@
     for (disk = 0; disk < rdp->total_disks; disk++) {
 	if (!(config = (struct highpoint_raid_conf *)
 	      malloc(sizeof(struct highpoint_raid_conf),
-		     M_AR, M_NOWAIT | M_ZERO))) {
+		     M_AR, M_WAITOK | M_ZERO))) {
 	    printf("ar%d: Highpoint write conf failed\n", rdp->lun);
 	    return -1;
 	}
@@ -1125,7 +1125,7 @@
     int array, count, disk, disksum = 0, retval = 0; 
 
     if (!(info = (struct promise_raid_conf *)
-	  malloc(sizeof(struct promise_raid_conf), M_AR, M_NOWAIT | M_ZERO)))
+	  malloc(sizeof(struct promise_raid_conf), M_AR, M_WAITOK | M_ZERO)))
 	return retval;
 
     if (ar_rw(adp, PR_LBA(adp), sizeof(struct promise_raid_conf),
@@ -1171,7 +1171,7 @@
 	if (!raidp[array]) {
 	    raidp[array] = 
 		(struct ar_softc*)malloc(sizeof(struct ar_softc), M_AR,
-					 M_NOWAIT | M_ZERO);
+					 M_WAITOK | M_ZERO);
 	    if (!raidp[array]) {
 		printf("ar%d: failed to allocate raid config storage\n", array);
 		goto promise_out;
@@ -1286,7 +1286,7 @@
 
     for (disk = 0; disk < rdp->total_disks; disk++) {
 	if (!(config = (struct promise_raid_conf *)
-	      malloc(sizeof(struct promise_raid_conf), M_AR, M_NOWAIT))) {
+	      malloc(sizeof(struct promise_raid_conf), M_AR, M_WAITOK))) {
 	    printf("ar%d: %s write conf failed\n",
 		   rdp->lun, local ? "FreeBSD" : "Promise");
 	    return -1;
@@ -1411,7 +1411,7 @@
     struct buf *bp;
     int retry = 0, error = 0;
 
-    if (!(bp = (struct buf *)malloc(sizeof(struct buf), M_AR, M_NOWAIT|M_ZERO)))
+    if (!(bp = (struct buf *)malloc(sizeof(struct buf), M_AR, M_WAITOK|M_ZERO)))
 	return ENOMEM;
     BUF_LOCKINIT(bp);
     BUF_LOCK(bp, LK_EXCLUSIVE);
Index: dev/disk/ata/atapi-all.c
===================================================================
RCS file: /cvs/src/sys/dev/disk/ata/atapi-all.c,v
retrieving revision 1.4
diff -u -r1.4 atapi-all.c
--- dev/disk/ata/atapi-all.c	7 Aug 2003 21:16:51 -0000	1.4
+++ dev/disk/ata/atapi-all.c	29 Nov 2003 00:15:17 -0000
@@ -94,7 +94,7 @@
     ATA_UNLOCK_CH(atadev->channel);
 
     if (!(atadev->result = malloc(sizeof(struct atapi_reqsense), M_ATAPI,
-				  M_NOWAIT | M_ZERO)))
+				  M_WAITOK | M_ZERO)))
 	ata_prtdev(atadev, "no memory for sense data\n");
 
     switch (atadev->param->type) {
@@ -162,7 +162,7 @@
 	    biodone(bp);
 	}
 	if (request->dmatab)
-	    free(request->dmatab, M_DEVBUF);
+	    ata_dmafree(atadev->channel, request->dmatab);
 	free(request, M_ATAPI);
     }
     free(atadev->result, M_ATAPI);
@@ -178,10 +178,12 @@
 {
     struct atapi_request *request;
     int error, s;
- 
-    if (!(request = malloc(sizeof(struct atapi_request), M_ATAPI,
-			   M_NOWAIT | M_ZERO)))
-	return ENOMEM;
+
+    request = malloc(sizeof(struct atapi_request), M_ATAPI, M_NOWAIT|M_ZERO);
+    if (request == NULL) {
+	printf("WARNNIG: atapi_queue_cmd: malloc() would block\n");
+	request = malloc(sizeof(struct atapi_request), M_ATAPI, M_WAITOK|M_ZERO);
+    }
 
     request->device = atadev;
     request->data = data;
@@ -196,8 +198,12 @@
 	request->driver = driver;
     }
     if (atadev->mode >= ATA_DMA) {
-	if (!(request->dmatab = ata_dmaalloc(atadev->channel, atadev->unit)))
-	    atadev->mode = ATA_PIO;
+	request->dmatab = ata_dmaalloc(atadev->channel, atadev->unit, M_NOWAIT);
+	if (request->dmatab == NULL) {
+	    printf("WARNING: atapi_queue_cmd: ata_dmaalloc() would block\n");
+	    request->dmatab = ata_dmaalloc(atadev->channel,
+					atadev->unit, M_WAITOK);
+	}
     }
 
 #ifdef ATAPI_DEBUG
@@ -226,7 +232,7 @@
     if (error)
 	 bcopy(&request->sense, atadev->result, sizeof(struct atapi_reqsense));
     if (request->dmatab)
-	free(request->dmatab, M_DEVBUF);
+	ata_dmafree(atadev->channel, request->dmatab);
     free(request, M_ATAPI);
     return error;
 }
@@ -606,7 +612,7 @@
     if (request->callback) {
 	if (!((request->callback)(request))) {
 	    if (request->dmatab)
-		free(request->dmatab, M_DEVBUF);
+		ata_dmafree(request->device->channel, request->dmatab);
 	    free(request, M_ATAPI);
 	}
     }
Index: dev/disk/ata/atapi-cam.c
===================================================================
RCS file: /cvs/src/sys/dev/disk/ata/atapi-cam.c,v
retrieving revision 1.3
diff -u -r1.3 atapi-cam.c
--- dev/disk/ata/atapi-cam.c	7 Aug 2003 21:16:51 -0000	1.3
+++ dev/disk/ata/atapi-cam.c	28 Nov 2003 23:37:24 -0000
@@ -119,7 +119,7 @@
     }
 
     if ((scp = malloc(sizeof(struct atapi_xpt_softc),
-		      M_ATACAM, M_NOWAIT | M_ZERO)) == NULL)
+		      M_ATACAM, M_WAITOK | M_ZERO)) == NULL)
 	goto error;
 
     scp->ata_ch = ata_ch;
@@ -481,7 +481,7 @@
 	if ((ccb_h->flags & CAM_DIR_MASK) == CAM_DIR_IN && (len & 1)) {
 	    /* ATA always transfers an even number of bytes */
 	    if (!(buf = hcb->dxfer_alloc = malloc(++len, M_ATACAM,
-						  M_NOWAIT | M_ZERO)))
+						  M_WAITOK | M_ZERO)))
 		goto action_oom;
 	}
 	s = splbio();
@@ -652,7 +652,7 @@
 allocate_hcb(struct atapi_xpt_softc *softc, int unit, int bus, union ccb *ccb)
 {
     struct atapi_hcb *hcb = (struct atapi_hcb *)
-    malloc(sizeof(struct atapi_hcb), M_ATACAM, M_NOWAIT | M_ZERO);
+    malloc(sizeof(struct atapi_hcb), M_ATACAM, M_WAITOK | M_ZERO);
 
     if (hcb != NULL) {
 	hcb->softc = softc;
Index: dev/disk/ata/atapi-cd.c
===================================================================
RCS file: /cvs/src/sys/dev/disk/ata/atapi-cd.c,v
retrieving revision 1.8
diff -u -r1.8 atapi-cd.c
--- dev/disk/ata/atapi-cd.c	7 Aug 2003 21:16:51 -0000	1.8
+++ dev/disk/ata/atapi-cd.c	28 Nov 2003 23:38:17 -0000
@@ -132,7 +132,7 @@
 			   sizeof(struct changer)>>8, sizeof(struct changer),
 			   0, 0, 0, 0, 0, 0 };
 
-	chp = malloc(sizeof(struct changer), M_ACD, M_NOWAIT | M_ZERO);
+	chp = malloc(sizeof(struct changer), M_ACD, M_WAITOK | M_ZERO);
 	if (chp == NULL) {
 	    ata_prtdev(atadev, "out of memory\n");
 	    free(cdp, M_ACD);
@@ -148,7 +148,7 @@
 
 	    chp->table_length = htons(chp->table_length);
 	    if (!(cdparr = malloc(sizeof(struct acd_softc) * chp->slots,
-				  M_ACD, M_NOWAIT))) {
+				  M_ACD, M_WAITOK))) {
 		ata_prtdev(atadev, "out of memory\n");
 		free(chp, M_ACD);
 		free(cdp, M_ACD);
@@ -172,7 +172,7 @@
 				  DEVSTAT_TYPE_CDROM | DEVSTAT_TYPE_IF_IDE,
 				  DEVSTAT_PRIORITY_CD);
 	    }
-	    name = malloc(strlen(atadev->name) + 2, M_ACD, M_NOWAIT);
+	    name = malloc(strlen(atadev->name) + 2, M_ACD, M_WAITOK);
 	    strcpy(name, atadev->name);
 	    strcat(name, "-");
 	    ata_free_name(atadev);
@@ -248,7 +248,7 @@
 {
     struct acd_softc *cdp;
 
-    if (!(cdp = malloc(sizeof(struct acd_softc), M_ACD, M_NOWAIT | M_ZERO)))
+    if (!(cdp = malloc(sizeof(struct acd_softc), M_ACD, M_WAITOK | M_ZERO)))
 	return NULL;
     TAILQ_INIT(&cdp->dev_list);
     bufq_init(&cdp->queue);
@@ -258,7 +258,7 @@
     cdp->slot = -1;
     cdp->changer_info = NULL;
     if (!(cdp->stats = malloc(sizeof(struct devstat), M_ACD,
-			      M_NOWAIT | M_ZERO))) {
+			      M_WAITOK | M_ZERO))) {
 	free(cdp, M_ACD);
 	return NULL;
     }
@@ -673,7 +673,7 @@
 	    if (te->address_format == CD_MSF_FORMAT) {
 		struct cd_toc_entry *entry;
 
-		toc = malloc(sizeof(struct toc), M_ACD, M_NOWAIT | M_ZERO);
+		toc = malloc(sizeof(struct toc), M_ACD, M_WAITOK | M_ZERO);
 		bcopy(&cdp->toc, toc, sizeof(struct toc));
 		entry = toc->tab + (toc->hdr.ending_track + 1 -
 			toc->hdr.starting_track) + 1;
@@ -718,7 +718,7 @@
 	    if (te->address_format == CD_MSF_FORMAT) {
 		struct cd_toc_entry *entry;
 
-		toc = malloc(sizeof(struct toc), M_ACD, M_NOWAIT | M_ZERO);
+		toc = malloc(sizeof(struct toc), M_ACD, M_WAITOK | M_ZERO);
 		bcopy(&cdp->toc, toc, sizeof(struct toc));
 
 		entry = toc->tab + (track - toc->hdr.starting_track);
@@ -858,7 +858,7 @@
 #ifndef CD_BUFFER_BLOCKS
 #define CD_BUFFER_BLOCKS 13
 #endif
-	    if (!(buffer = malloc(CD_BUFFER_BLOCKS * 2352, M_ACD, M_NOWAIT))){
+	    if (!(buffer = malloc(CD_BUFFER_BLOCKS * 2352, M_ACD, M_WAITOK))){
 		error = ENOMEM;
 		break;
 	    }
@@ -1340,7 +1340,7 @@
 	char name[16];
 
 	sprintf(name, "acd%dt%d", cdp->lun, track);
-	entry = malloc(sizeof(struct acd_devlist), M_ACD, M_NOWAIT | M_ZERO);
+	entry = malloc(sizeof(struct acd_devlist), M_ACD, M_WAITOK | M_ZERO);
 	entry->dev = make_dev(&acd_cdevsw, (cdp->lun << 3) | (track << 16),
 			      0, 0, 0644, name, NULL);
 	entry->dev->si_drv1 = cdp->dev->si_drv1;
@@ -1649,7 +1649,7 @@
     if ((error = acd_mode_select(cdp, (caddr_t)&param, param.page_length + 10)))
 	return error;
 
-    buffer = malloc(cuesheet->len, M_ACD, M_NOWAIT);
+    buffer = malloc(cuesheet->len, M_ACD, M_WAITOK);
     if (!buffer)
 	return ENOMEM;
     if ((error = copyin(cuesheet->entries, buffer, cuesheet->len)))
@@ -1711,7 +1711,7 @@
     ccb[9] = length & 0xff;
     ccb[10] = (ai->agid << 6) | ai->format;
 
-    d = malloc(length, M_ACD, M_NOWAIT | M_ZERO);
+    d = malloc(length, M_ACD, M_WAITOK | M_ZERO);
     d->length = htons(length - 2);
 
     error = atapi_queue_cmd(cdp->device, ccb, (caddr_t)d, length,
@@ -1775,19 +1775,19 @@
     switch (ai->format) {
     case DVD_SEND_CHALLENGE:
 	length = 16;
-	d = malloc(length, M_ACD, M_NOWAIT | M_ZERO);
+	d = malloc(length, M_ACD, M_WAITOK | M_ZERO);
 	bcopy(ai->keychal, &d->data[0], 10);
 	break;
 
     case DVD_SEND_KEY2:
 	length = 12;
-	d = malloc(length, M_ACD, M_NOWAIT | M_ZERO);
+	d = malloc(length, M_ACD, M_WAITOK | M_ZERO);
 	bcopy(&ai->keychal[0], &d->data[0], 5);
 	break;
     
     case DVD_SEND_RPC:
 	length = 8;
-	d = malloc(length, M_ACD, M_NOWAIT | M_ZERO);
+	d = malloc(length, M_ACD, M_WAITOK | M_ZERO);
 	d->data[0] = ai->region;
 	break;
 
@@ -1850,7 +1850,7 @@
 	return EINVAL;
     }
 
-    d = malloc(length, M_ACD, M_NOWAIT | M_ZERO);
+    d = malloc(length, M_ACD, M_WAITOK | M_ZERO);
     d->length = htons(length - 2);
 	
     bzero(ccb, sizeof(ccb));
Index: dev/disk/ata/atapi-fd.c
===================================================================
RCS file: /cvs/src/sys/dev/disk/ata/atapi-fd.c,v
retrieving revision 1.8
diff -u -r1.8 atapi-fd.c
--- dev/disk/ata/atapi-fd.c	15 Nov 2003 21:05:41 -0000	1.8
+++ dev/disk/ata/atapi-fd.c	28 Nov 2003 23:38:22 -0000
@@ -89,7 +89,7 @@
     struct afd_softc *fdp;
     dev_t dev;
 
-    fdp = malloc(sizeof(struct afd_softc), M_AFD, M_NOWAIT | M_ZERO);
+    fdp = malloc(sizeof(struct afd_softc), M_AFD, M_WAITOK | M_ZERO);
     if (!fdp) {
 	ata_prtdev(atadev, "out of memory\n");
 	return 0;
Index: dev/disk/ata/atapi-tape.c
===================================================================
RCS file: /cvs/src/sys/dev/disk/ata/atapi-tape.c,v
retrieving revision 1.7
diff -u -r1.7 atapi-tape.c
--- dev/disk/ata/atapi-tape.c	7 Aug 2003 21:16:51 -0000	1.7
+++ dev/disk/ata/atapi-tape.c	28 Nov 2003 23:38:25 -0000
@@ -99,7 +99,7 @@
     struct ast_readposition position;
     dev_t dev;
 
-    stp = malloc(sizeof(struct ast_softc), M_AST, M_NOWAIT | M_ZERO);
+    stp = malloc(sizeof(struct ast_softc), M_AST, M_WAITOK | M_ZERO);
     if (!stp) {
 	ata_prtdev(atadev, "out of memory\n");
 	return 0;
Index: kern/kern_mpipe.c
===================================================================
RCS file: kern/kern_mpipe.c
diff -N kern/kern_mpipe.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ kern/kern_mpipe.c	29 Nov 2003 00:11:31 -0000
@@ -0,0 +1,167 @@
+/*
+ * Copyright (c) 2003 Matthew Dillon <dillon@backplane.com>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $DragonFly$
+ */
+
+#include <sys/param.h>
+#include <sys/systm.h>
+#include <sys/kernel.h>
+#include <sys/slaballoc.h>
+#include <sys/mpipe.h>
+#include <sys/mbuf.h>
+#include <sys/vmmeter.h>
+#include <sys/lock.h>
+#include <sys/thread.h>
+#include <sys/globaldata.h>
+
+#include <sys/thread2.h>
+
+#define arysize(ary)	(sizeof(ary)/sizeof((ary)[0]))
+
+typedef struct mpipe_buf {
+	TAILQ_ENTRY(mpipe_buf)	entry;
+} *mpipe_buf_t;
+
+/*
+ * Initialize a malloc pipeline for the specified malloc type and allocation
+ * size, and immediately allocate nnow buffers and set the nominal maximum
+ * to nmax.
+ */
+void
+mpipe_init(malloc_pipe_t mpipe, malloc_type_t type, int bytes,
+	int nnow, int nmax)
+{
+    if (bytes < sizeof(struct mpipe_buf))
+	bytes = sizeof(struct mpipe_buf);
+    bzero(mpipe, sizeof(struct malloc_pipe));
+    TAILQ_INIT(&mpipe->queue);
+    mpipe->type = type;
+    mpipe->bytes = bytes;
+    mpipe->max_count = nmax;
+    if (nnow > 0) {
+	void *buf;
+
+	buf = malloc(bytes, mpipe->type, M_WAITOK);
+	KKASSERT(buf != NULL);
+	++mpipe->total_count;
+	mpipe_free(mpipe, buf);
+	while (--nnow > 0) {
+	    buf = malloc(bytes, mpipe->type, M_NOWAIT);
+	    if (buf == NULL)
+		break;
+	    ++mpipe->total_count;
+	    mpipe_free(mpipe, buf);
+	}
+    }
+    if (mpipe->max_count < mpipe->total_count)
+	mpipe->max_count = mpipe->total_count;
+}
+
+void
+mpipe_done(malloc_pipe_t mpipe)
+{
+    struct mpipe_buf *buf;
+
+    KKASSERT(mpipe->free_count == mpipe->total_count);
+    while (mpipe->free_count) {
+	buf = TAILQ_FIRST(&mpipe->queue);
+	KKASSERT(buf != NULL);
+	TAILQ_REMOVE(&mpipe->queue, buf, entry);
+	--mpipe->free_count;
+	--mpipe->total_count;
+	free(buf, mpipe->type);
+    }
+    KKASSERT(TAILQ_EMPTY(&mpipe->queue));
+}
+
+/*
+ * Allocate an entry.  flags can be M_NOWAIT which tells us not to block.
+ * Unlike a normal malloc, if we block in mpipe_alloc() no deadlock will occur
+ * because it will unblock the moment an existing in-use buffer is freed.
+ */
+void *
+mpipe_alloc(malloc_pipe_t mpipe, int flags)
+{
+    mpipe_buf_t buf;
+
+    crit_enter();
+    while (mpipe->free_count == 0) {
+	if (mpipe->total_count < mpipe->max_count) {
+	    ++mpipe->total_count;
+	    if ((buf = malloc(mpipe->bytes, mpipe->type, flags)) != NULL) {
+		crit_exit();
+		return(buf);
+	    }
+	    --mpipe->total_count;
+	} else if (flags & M_NOWAIT) {
+	    crit_exit();
+	    return(NULL);
+	} else {
+	    mpipe->pending = 1;
+	    tsleep(mpipe, 0, "mpipe", 0);
+	}
+    }
+    buf = TAILQ_FIRST(&mpipe->queue);
+    KKASSERT(buf != NULL);
+    TAILQ_REMOVE(&mpipe->queue, buf, entry);
+    --mpipe->free_count;
+    crit_exit();
+    if (flags & M_ZERO)
+	bzero(buf, mpipe->bytes);
+    return(buf);
+}
+
+/*
+ * Free an entry, unblock any waiters.
+ */
+void
+mpipe_free(malloc_pipe_t mpipe, void *vbuf)
+{
+    struct mpipe_buf *buf;
+
+    if ((buf = vbuf) != NULL) {
+	crit_enter();
+	if (mpipe->total_count > mpipe->max_count) {
+	    --mpipe->total_count;
+	    crit_exit();
+	    free(buf, mpipe->type);
+	} else {
+	    TAILQ_INSERT_TAIL(&mpipe->queue, buf, entry);
+	    ++mpipe->free_count;
+	    crit_exit();
+	    if (mpipe->free_count >= (mpipe->total_count >> 2) + 1) {
+		if (mpipe->trigger) {
+		    mpipe->trigger(mpipe->trigger_data);
+		}
+		if (mpipe->pending) {
+		    mpipe->pending = 0;
+		    wakeup(mpipe);
+		}
+	    }
+	}
+    }
+}
+
Index: sys/malloc.h
===================================================================
RCS file: /cvs/src/sys/sys/malloc.h,v
retrieving revision 1.15
diff -u -r1.15 malloc.h
--- sys/malloc.h	21 Nov 2003 22:46:13 -0000	1.15
+++ sys/malloc.h	28 Nov 2003 22:07:05 -0000
@@ -93,6 +93,8 @@
 	long	ks_reserved[4];	/* future use (module compatibility) */
 };
 
+typedef struct malloc_type	*malloc_type_t;
+
 #if defined(_KERNEL) || defined(_KERNEL_STRUCTURES)
 
 #define	MALLOC_DEFINE(type, shortdesc, longdesc)	\
Index: sys/mpipe.h
===================================================================
RCS file: sys/mpipe.h
diff -N sys/mpipe.h
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ sys/mpipe.h	28 Nov 2003 22:52:18 -0000
@@ -0,0 +1,68 @@
+/*
+ * Copyright (c) 2003 Matthew Dillon <dillon@backplane.com>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $DragonFly$
+ */
+
+#ifndef _SYS_MPIPE_H_
+#define _SYS_MPIPE_H_
+
+#ifndef _SYS_MALLOC_H_
+#include <sys/malloc.h>
+#endif
+
+/*
+ * Pipeline memory allocations.  This implements a pipeline for allocations
+ * of a particular size.  It is used in order to allow memory allocations
+ * to block while at the same time guarenteeing that no deadlocks will occur.
+ */
+struct mpipe_buf;
+
+struct malloc_pipe {
+    TAILQ_HEAD(, mpipe_buf) queue;
+    malloc_type_t type;		/* malloc bucket */
+    int		bytes;		/* allocation size */
+    int		pending;	/* there is a request pending */
+    int		free_count;	/* entries in free list */
+    int		total_count;	/* total free + outstanding */
+    int		max_count;	/* maximum cache size */
+    void	(*trigger)(void *data);	/* trigger function on free */
+    void	*trigger_data;
+};
+
+typedef struct malloc_pipe *malloc_pipe_t;
+
+#ifdef _KERNEL
+
+void mpipe_init(malloc_pipe_t mpipe, malloc_type_t type,
+		int bytes, int nnow, int nmax);
+void mpipe_done(malloc_pipe_t mpipe);
+void *mpipe_alloc(malloc_pipe_t mpipe, int flags);
+void mpipe_free(malloc_pipe_t mpipe, void *vbuf);
+
+#endif
+
+#endif
+



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200311290330.hAT3UbNL004349>