Date:      Fri, 8 Sep 2017 11:12:06 -0400
From:      Scott Long <scottl@samsco.org>
To:        john hood <cgull@glup.org>
Cc:        freebsd-scsi@freebsd.org, jhb@freebsd.org, jkim@freebsd.org
Subject:   Re: GEOM probes fail on aac with EARLY_AP_STARTUP
Message-ID:  <AB09EF90-2392-4D1D-8F22-B6E1EBDD0E45@samsco.org>
In-Reply-To: <f2dceefe-8c2c-1bad-95ab-9dd138c8fcbe@glup.org>
References:  <f2dceefe-8c2c-1bad-95ab-9dd138c8fcbe@glup.org>

Hi John,

Great bug report and analysis.  I think you’re right, behavior in the system
changed with EARLY_AP_STARTUP and the intrhook is being released too
soon now, before the driver is ready for concurrent access.  I’ll shepherd it
into SVN.  There’s a similar pattern in most of the non-CAM drivers, so I’ll
review them as well.

Scott

> On Sep 7, 2017, at 7:19 PM, john hood <cgull@glup.org> wrote:
>
> I've got a devel machine here which was failing to boot on our vendored
> FreeBSD 11.1, because GEOM was unable to find the partitions on the boot
> drive and so the root mount failed.  This started happening on many but
> not all boots after I upgraded the machine from 9.3.
>
> The machine is an Intel S25520UR motherboard with 2x Xeon E5620 CPUs
> (Hyperthreading enabled, so hw.ncpu=16) and an Adaptec 5805, and 2 RAID
> volumes configured on 6 SATA drives.
>
> When booting, it sees the aac0 controller and aacd0
> volume but GEOM does not find any of the partitions on that volume, and the
> initial mount of root on /dev/aacd0p2 fails.  aacd0 is available and
> readable, but the expected aacd0p{1,2,3} devices do not exist.
> (However, aacd1 and its partitions/devices are configured normally.)
>
> I think it's a race condition between the aac driver and GEOM probing,
> probably newly triggered/exposed by EARLY_AP_STARTUP.  I've reproduced
> the problem on upstream FreeBSD 11.1 and -current.  Disabling
> EARLY_AP_STARTUP, or setting kern.smp.disabled=1, causes the kernel to
> start correctly. 'boot -v' also causes the kernel to start correctly.
>
> The kernel calls aac_attach() which uses
> config_intrhook_establish() to run aac_startup() later.  When that
> runs, it adds devices via
> aac_add_container()/device_add_child()/bus_generic_attach().
>
> However, at the beginning of aac_attach(), an AAC_STATE_SUSPEND flag
> is set.  It is cleared at the end of aac_startup().  It appears that
> GEOM probes call aac_disk_open(), which checks the flag and returns
> error if it is set.  On my system the race is that the GEOM probes
> happen before that flag is cleared, possibly because GEOM is tasting
> aacd0 while the aac driver is still attaching aacd1.  So the GEOM probes
> fail and the geom nodes never get created.  If I boot with the -v flag,
> the kernel boots successfully, I think because the message printing
> takes long enough to delay GEOM probing past aac_startup() completion.
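
The ordering problem described above can be illustrated with a small
user-space model in C.  This is only a sketch and not the aac driver code:
model_softc, MODEL_STATE_SUSPEND, and the model_*() functions are
hypothetical stand-ins for the softc, AAC_STATE_SUSPEND, aac_attach(),
aac_startup(), and aac_disk_open(), and the ENXIO return value is an
assumption chosen for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for AAC_STATE_SUSPEND; not the driver's definition. */
#define MODEL_STATE_SUSPEND 0x01

/* Hypothetical stand-in for the controller softc. */
struct model_softc {
    int state;
};

/* Models the start of attach: the controller is marked suspended. */
static void model_attach(struct model_softc *sc)
{
    sc->state = MODEL_STATE_SUSPEND;
}

/* Models the end of the deferred startup hook: the flag is cleared. */
static void model_startup_finish(struct model_softc *sc)
{
    sc->state &= ~MODEL_STATE_SUSPEND;
}

/*
 * Models the disk open path used by a GEOM taste: refuse the open while
 * the controller is still flagged as suspended.  ENXIO is an assumed
 * error value for this sketch.
 */
static int model_disk_open(const struct model_softc *sc)
{
    if (sc->state & MODEL_STATE_SUSPEND)
        return ENXIO;
    return 0;
}

/*
 * Runs one simulated boot.  If taste_early is nonzero, the open happens
 * before startup finishes (the failing order); otherwise it happens
 * afterwards (the working order).  Returns the result of the open.
 */
int model_boot(int taste_early)
{
    struct model_softc sc;
    int error;

    model_attach(&sc);
    assert(sc.state & MODEL_STATE_SUSPEND);

    if (taste_early) {
        error = model_disk_open(&sc);
        model_startup_finish(&sc);
    } else {
        model_startup_finish(&sc);
        error = model_disk_open(&sc);
    }
    return error;
}
```

In this model, model_boot(1) returns ENXIO, which corresponds to the taste
that fails and leaves no partition nodes, while model_boot(0) returns 0.
Anything that delays the taste past the end of startup, such as verbose
boot output, moves a boot from the first case to the second.
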
>
> I've attached a patch which resolves the problem on FreeBSD-current
> (and 11.1).  Would anybody care to adopt it and shepherd it into SVN?
>
> regards,
>
>  --John Hood
>
> <aac.diff>



