From: Eric Badger
To: freebsd-hackers@freebsd.org
Subject: Crashes with 'reboot -d'
Date: Mon, 31 Oct 2016 17:07:25 -0500

I've run into crashes when using 'reboot -d' (or a slightly tweaked
version of it in our FreeBSD spin at work). The problem is that the dump
code is written to run in a panic/crash scenario, when all other CPUs
are stopped. In the case of 'reboot -d', the other CPUs are not stopped.

The code in xpt_polled_action() does the work that would normally be
done by the interrupt handler, polling start_ccb->ccb_h.status to see
when the operation has completed. If the real interrupt handler is still
running, however, polling start_ccb->ccb_h.status is not sufficient; the
ccb may be placed in the cam kproc's doneq after
start_ccb->ccb_h.status has been updated. The dumper will reuse the
ccb's memory, but when the cam kproc processes that item in its doneq,
it will twiddle bits and corrupt the now-reused ccb memory.

I fixed this by stopping the other CPUs when doing a dump during reboot
(patch below). This seems fine, but perhaps heavy-handed. I also
experimented with letting the normal interrupt handler and cam kproc do
the work when we're not in a SCHEDULER_STOPPED() scenario. This seemed
to reduce dump performance and make it less consistent, but otherwise
worked OK.

I'd appreciate any comments on things I may have failed to consider.
If no objections are raised, I will proceed with the patch here.

Thanks,
Eric

diff --git a/sys/kern/kern_shutdown.c b/sys/kern/kern_shutdown.c
index 79c4c30..bdc0182 100644
--- a/sys/kern/kern_shutdown.c
+++ b/sys/kern/kern_shutdown.c
@@ -319,8 +319,9 @@ void
 kern_reboot(int howto)
 {
 	static int once = 0;
+#ifdef SMP
+	cpuset_t other_cpus;
 
-#if defined(SMP)
 	/*
 	 * Bind us to CPU 0 so that all shutdown code runs there.  Some
 	 * systems don't shutdown properly (i.e., ACPI power off) if we
@@ -362,8 +363,28 @@ kern_reboot(int howto)
 	 */
 	EVENTHANDLER_INVOKE(shutdown_post_sync, howto);
 
-	if ((howto & (RB_HALT|RB_DUMP)) == RB_DUMP && !cold && !dumping)
+	if ((howto & (RB_HALT|RB_DUMP)) == RB_DUMP && !cold && !dumping) {
+#ifdef SMP
+		/*
+		 * Dump code assumes that all other CPUs have stopped, and thus
+		 * handles disk interrupts manually. This assumption must be enforced,
+		 * as otherwise the real interrupt handler may race with the dumper.
+		 */
+		if (!SCHEDULER_STOPPED()) {
+			spinlock_enter();
+
+			other_cpus = all_cpus;
+			CPU_CLR(PCPU_GET(cpuid), &other_cpus);
+			stop_cpus_hard(other_cpus);
+
+			curthread->td_stopsched = 1;
+
+			/* Module shutdown is no longer safe. */
+			howto |= RB_NOSYNC;
+		}
+#endif
 		doadump(TRUE);
+	}
 
 	/* Now that we're going to really halt the system... */
 	EVENTHANDLER_INVOKE(shutdown_final, howto);
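
To make the race described above concrete, here is a minimal
single-threaded userland sketch. It is not kernel code and not CAM's
actual data structures: struct ccb's fields, the flag values, and the
functions intr_complete(), dumper_poll(), doneq_drain() and
simulate_dump_io() are all hypothetical names that only model the
ordering "status is published, ccb is queued, dumper reuses the ccb,
queue is drained later".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names and values below are hypothetical, for illustration only. */
#define STATUS_INPROG	0u
#define STATUS_COMPLETE	1u
#define FLAG_NEW_WRITE	0x001u	/* set by the dumper on a reused ccb */
#define FLAG_PROCESSED	0x100u	/* set by the done-queue consumer */

struct ccb {
	uint32_t status;
	uint32_t flags;
	struct ccb *doneq_next;
};

static struct ccb *doneq;	/* stand-in for the cam kproc's doneq */

/* Real completion path: publish the status first, then queue the ccb. */
static void
intr_complete(struct ccb *c)
{
	assert(c != NULL);
	c->status = STATUS_COMPLETE;
	c->doneq_next = doneq;
	doneq = c;
}

/* The dumper looks only at the status field. */
static bool
dumper_poll(const struct ccb *c)
{
	return (c->status != STATUS_INPROG);
}

/* Done-queue consumer: touches every queued ccb ("twiddles bits"). */
static void
doneq_drain(void)
{
	while (doneq != NULL) {
		struct ccb *c = doneq;

		doneq = c->doneq_next;
		c->doneq_next = NULL;
		c->flags |= FLAG_PROCESSED;
	}
}

/*
 * Returns true if the ccb was modified after the dumper reused it.
 * other_cpus_running == true models 'reboot -d' with live CPUs;
 * false models the panic case, where the dumper completes the I/O itself.
 */
bool
simulate_dump_io(bool other_cpus_running)
{
	struct ccb c = { .status = STATUS_INPROG };

	doneq = NULL;
	if (other_cpus_running)
		intr_complete(&c);		/* real handler wins the race */
	else
		c.status = STATUS_COMPLETE;	/* polled completion only */

	if (!dumper_poll(&c))
		return (false);

	/* The dumper believes the ccb is idle and reuses its memory. */
	memset(&c, 0, sizeof(c));
	c.status = STATUS_INPROG;
	c.flags = FLAG_NEW_WRITE;

	if (other_cpus_running)
		doneq_drain();			/* consumer runs later */

	return (c.flags != FLAG_NEW_WRITE || c.status != STATUS_INPROG);
}
```

In this model, simulate_dump_io(true) reports that the reused ccb was
modified behind the dumper's back, while simulate_dump_io(false), the
case the patch enforces by stopping the other CPUs, reports no
modification.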