Date:      Wed, 29 Sep 2021 03:24:23 GMT
From:      Warner Losh <imp@FreeBSD.org>
To:        src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
Subject:   git: 7d5eebe0f4a0 - main - nvme: Add sanity check for phase on startup.
Message-ID:  <202109290324.18T3ONUd075559@gitrepo.freebsd.org>

The branch main has been updated by imp:

URL: https://cgit.FreeBSD.org/src/commit/?id=7d5eebe0f4a0f2aa5c8c7dfdd1a9ce1513849da8

commit 7d5eebe0f4a0f2aa5c8c7dfdd1a9ce1513849da8
Author:     Warner Losh <imp@FreeBSD.org>
AuthorDate: 2021-09-29 03:11:17 +0000
Commit:     Warner Losh <imp@FreeBSD.org>
CommitDate: 2021-09-29 03:18:00 +0000

    nvme: Add sanity check for phase on startup.
    
    The proper phase for the qpair on the first interrupt right after
    reset is 1, so make sure that we're not still in phase 0. Processing
    interrupts while in phase 0 at startup is illegal and indicates that
    we've failed to properly protect against a race between initializing
    our state and processing interrupts. Modify the stat-resetting code so
    it resets the number of interrupts to 1 instead of 0 so we don't
    trigger a false-positive panic.
    
    Sponsored by:           Netflix
    Reviewed by:            cperciva, mav (prior version)
    Differential Revision:  https://reviews.freebsd.org/D32211
---
 sys/dev/nvme/nvme_qpair.c  | 21 ++++++++++++++++++---
 sys/dev/nvme/nvme_sysctl.c |  7 ++++++-
 2 files changed, 24 insertions(+), 4 deletions(-)
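
For context on the check added below: each NVMe completion queue entry
carries a phase tag bit that the controller toggles on every pass through
the ring, and after a reset the host expects phase 1 on the first pass.
A minimal sketch of that consumer loop, using simplified, hypothetical
field names rather than the real structures in sys/dev/nvme/nvme_qpair.c:

	#include <stdbool.h>
	#include <stdint.h>

	struct cpl {
		uint16_t status;	/* bit 0 is the phase tag */
		/* command id, sq head, etc. elided */
	};

	struct cq {
		struct cpl *ring;	/* completion ring */
		uint32_t num_entries;
		uint32_t head;
		int phase;		/* expected tag; 1 right after reset */
	};

	static bool
	cq_process(struct cq *q)
	{
		bool done = false;

		for (;;) {
			struct cpl *c = &q->ring[q->head];

			/*
			 * An entry is ours only while its phase tag matches
			 * the expected phase.  If q->phase were still 0 right
			 * after reset, every stale entry would look valid and
			 * we would process garbage -- the race the new
			 * KASSERT catches.
			 */
			if ((c->status & 1) != q->phase)
				break;
			/* ... complete the tracker for this entry ... */
			done = true;
			if (++q->head == q->num_entries) {
				q->head = 0;
				q->phase = !q->phase;	/* toggle on wrap */
			}
		}
		return (done);
	}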

diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c
index 6ee5fa9d4c30..827054efd48e 100644
--- a/sys/dev/nvme/nvme_qpair.c
+++ b/sys/dev/nvme/nvme_qpair.c
@@ -536,16 +536,31 @@ nvme_qpair_process_completions(struct nvme_qpair *qpair)
 	int done = 0;
 	bool in_panic = dumping || SCHEDULER_STOPPED();
 
-	qpair->num_intr_handler_calls++;
-
 	/*
 	 * qpair is not enabled, likely because a controller reset is in
 	 * progress.  Ignore the interrupt - any I/O that was associated with
-	 * this interrupt will get retried when the reset is complete.
+	 * this interrupt will get retried when the reset is complete. Any
+	 * pending completions for when we're in startup will be completed
+	 * as soon as initialization is complete and we start sending commands
+	 * to the device.
 	 */
 	if (qpair->recovery_state != RECOVERY_NONE)
 		return (false);
 
+	/*
+	 * Sanity check initialization. After we reset the hardware, the phase
+	 * is defined to be 1. So if we get here with zero prior calls and the
+	 * phase is 0, it means that we've lost a race between the
+	 * initialization and the ISR running. With the phase wrong, we'll
+	 * process a bunch of completions that aren't really completions,
+	 * leading to a KASSERT below.
+	 */
+	KASSERT(!(qpair->num_intr_handler_calls == 0 && qpair->phase == 0),
+	    ("%s: Phase wrong for first interrupt call.",
+		device_get_nameunit(qpair->ctrlr->dev)));
+
+	qpair->num_intr_handler_calls++;
+
 	bus_dmamap_sync(qpair->dma_tag, qpair->queuemem_map,
 	    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
 	/*
diff --git a/sys/dev/nvme/nvme_sysctl.c b/sys/dev/nvme/nvme_sysctl.c
index 9ec1f14511f5..58db38dc373d 100644
--- a/sys/dev/nvme/nvme_sysctl.c
+++ b/sys/dev/nvme/nvme_sysctl.c
@@ -155,8 +155,13 @@ static void
 nvme_qpair_reset_stats(struct nvme_qpair *qpair)
 {
 
+	/*
+	 * Reset the values. Due to sanity checks in
+	 * nvme_qpair_process_completions, we reset the number of interrupt
+	 * calls to 1.
+	 */
 	qpair->num_cmds = 0;
-	qpair->num_intr_handler_calls = 0;
+	qpair->num_intr_handler_calls = 1;
 	qpair->num_retries = 0;
 	qpair->num_failures = 0;
 }
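
Why the reset value is 1 rather than 0: once the completion ring has
wrapped, phase 0 is a perfectly legal state, so zeroing
num_intr_handler_calls from the sysctl path would recreate the "first call
with phase 0" condition and panic a healthy qpair. A small standalone
illustration, using assert() in place of KASSERT and a stripped-down qpair
with hypothetical names, not the driver's own:

	#include <assert.h>

	struct qp {
		long num_intr_handler_calls;
		int phase;
	};

	static void
	intr(struct qp *q)
	{
		/* The startup sanity check added in the first hunk. */
		assert(!(q->num_intr_handler_calls == 0 && q->phase == 0));
		q->num_intr_handler_calls++;
		/* ... process completions, toggling q->phase on wrap ... */
	}

	int
	main(void)
	{
		struct qp q = { .num_intr_handler_calls = 1, .phase = 1 };

		intr(&q);	/* normal operation */
		q.phase = 0;	/* ring wrapped; phase 0 is now legal */

		/* Resetting the counter to 0 here would trip the assert. */
		q.num_intr_handler_calls = 1;
		intr(&q);	/* still fine */
		return (0);
	}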


