Date:      Thu, 7 Sep 2017 19:19:40 -0400
From:      john hood <cgull@glup.org>
To:        freebsd-scsi@freebsd.org, jhb@freebsd.org, jkim@freebsd.org
Subject:   GEOM probes fail on aac with EARLY_AP_STARTUP
Message-ID:  <f2dceefe-8c2c-1bad-95ab-9dd138c8fcbe@glup.org>

This is a multi-part message in MIME format.
--------------CFD810116ED4341A2805A58E
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

I've got a devel machine here which was failing to boot on our vendored
FreeBSD 11.1, because GEOM was unable to find the partitions on the boot
drive and so the root mount failed.  This started happening on many but
not all boots after I upgraded the machine from 9.3.

The machine is an Intel S5520UR motherboard with 2x Xeon E5620 CPUs
(Hyperthreading enabled, so hw.ncpu=16) and an Adaptec 5805, and 2 RAID
volumes configured on 6 SATA drives.

When booting, it sees the aac0 controller and aacd0
volume but GEOM does not find any of the partitions on that volume, and the
initial mount of root on /dev/aacd0p2 fails.  aacd0 is available and
readable, but the expected aacd0p{1,2,3} devices do not exist.
(However, aacd1 and its partitions/devices are configured normally.)

I think it's a race condition between the aac driver and GEOM probing,
probably newly triggered/exposed by EARLY_AP_STARTUP.  I've reproduced
the problem on upstream FreeBSD 11.1 and -current.  Disabling
EARLY_AP_STARTUP, or setting kern.smp.disabled=1, causes the kernel to
start correctly. 'boot -v' also causes the kernel to start correctly.

The kernel calls aac_attach(), which uses
config_intrhook_establish() to run aac_startup() later.  When that
runs, it adds devices via
aac_add_container()/device_add_child()/bus_generic_attach().

However, at the beginning of aac_attach(), an AAC_STATE_SUSPEND flag
is set.  It is cleared at the end of aac_startup().  It appears that
GEOM probes call aac_disk_open(), which checks the flag and returns
an error if it is set.  On my system the race is that the GEOM probes
happen before that flag is cleared, possibly because GEOM is tasting
aacd0 while the aac driver is still attaching aacd1.  So the GEOM probes
fail and the GEOM nodes never get created.  If I boot with the -v flag,
the kernel boots successfully, I think because the message printing
takes long enough to delay GEOM probing past aac_startup() completion.
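
To make the ordering concrete, here is a minimal userland sketch of the
race as I understand it.  It is not driver code: every model_* name and
MODEL_STATE_SUSPEND are hypothetical stand-ins, the ENXIO return value is
an assumption about what aac_disk_open() returns, and the "taste" is run
synchronously inside the attach step to force the worst-case interleaving
that a second CPU could produce.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for AAC_STATE_SUSPEND. */
#define MODEL_STATE_SUSPEND 0x01

struct model_softc {
	int  state;            /* controller state flags */
	bool partitions_found; /* did the taste create partition nodes? */
};

/* Stand-in for aac_disk_open(): refuse opens while suspended.
 * ENXIO is an assumed error value, chosen for illustration. */
static int
model_disk_open(const struct model_softc *sc)
{
	return ((sc->state & MODEL_STATE_SUSPEND) ? ENXIO : 0);
}

/* Stand-in for a GEOM taste: partitions appear only if open succeeds. */
static void
model_geom_taste(struct model_softc *sc)
{
	if (model_disk_open(sc) == 0)
		sc->partitions_found = true;
}

/* Stand-in for bus_generic_attach(): in the worst case the new disk
 * is tasted before this call returns. */
static void
model_attach_children(struct model_softc *sc)
{
	assert(sc != NULL);
	model_geom_taste(sc);
}

/* Ordering described above: attach children, then clear the flag. */
bool
model_startup_flag_cleared_last(void)
{
	struct model_softc sc = { MODEL_STATE_SUSPEND, false };

	model_attach_children(&sc);
	sc.state &= ~MODEL_STATE_SUSPEND;
	return (sc.partitions_found);
}

/* Alternative ordering: clear the flag, then attach children. */
bool
model_startup_flag_cleared_first(void)
{
	struct model_softc sc = { MODEL_STATE_SUSPEND, false };

	sc.state &= ~MODEL_STATE_SUSPEND;
	model_attach_children(&sc);
	return (sc.partitions_found);
}
```

In this model the first ordering never finds partitions when the taste
wins the race, while the second always does, which is why clearing the
flag before attaching the children looks like the natural fix.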

I've attached a patch which resolves the problem on FreeBSD-current (and
11.1), would anybody care to adopt it and shepherd it into SVN?

regards,

  --John Hood


--------------CFD810116ED4341A2805A58E
Content-Type: text/plain; charset=UTF-8; x-mac-type="0"; x-mac-creator="0";
 name="aac.diff"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="aac.diff"

T25seSBpbiBzeXMvYW1kNjQvY29tcGlsZTogQUFDUFJPQkUKT25seSBpbiBzeXMvYW1kNjQv
Y29uZjogQUFDUFJPQkUKT25seSBpbiBzeXMvYW1kNjQvY29uZjogQUFDUFJPQkV+CmRpZmYg
LXUgLXIgc3lzLm9yaWcvZGV2L2FhYy9hYWMuYyBzeXMvZGV2L2FhYy9hYWMuYwotLS0gc3lz
Lm9yaWcvZGV2L2FhYy9hYWMuYwkyMDE3LTA5LTA1IDA5OjA2OjI2LjAwMDAwMDAwMCAtMDQw
MAorKysgc3lzL2Rldi9hYWMvYWFjLmMJMjAxNy0wOS0wNyAxNDoyNzozMi40NjE1MjgwMDAg
LTA0MDAKQEAgLTQxOCw5ICs0MTgsNiBAQAogCXNjID0gKHN0cnVjdCBhYWNfc29mdGMgKilh
cmc7CiAJZndwcmludGYoc2MsIEhCQV9GTEFHU19EQkdfRlVOQ1RJT05fRU5UUllfQiwgIiIp
OwogCi0JLyogZGlzY29ubmVjdCBvdXJzZWx2ZXMgZnJvbSB0aGUgaW50cmhvb2sgY2hhaW4g
Ki8KLQljb25maWdfaW50cmhvb2tfZGlzZXN0YWJsaXNoKCZzYy0+YWFjX2ljaCk7Ci0KIAlt
dHhfbG9jaygmc2MtPmFhY19pb19sb2NrKTsKIAlhYWNfYWxsb2Nfc3luY19maWIoc2MsICZm
aWIpOwogCkBAIC00MzcsMTIgKzQzNCwxNSBAQAogCWFhY19yZWxlYXNlX3N5bmNfZmliKHNj
KTsKIAltdHhfdW5sb2NrKCZzYy0+YWFjX2lvX2xvY2spOwogCisJLyogbWFyayB0aGUgY29u
dHJvbGxlciB1cCAqLworCXNjLT5hYWNfc3RhdGUgJj0gfkFBQ19TVEFURV9TVVNQRU5EOwor
CiAJLyogcG9rZSB0aGUgYnVzIHRvIGFjdHVhbGx5IGF0dGFjaCB0aGUgY2hpbGQgZGV2aWNl
cyAqLwogCWlmIChidXNfZ2VuZXJpY19hdHRhY2goc2MtPmFhY19kZXYpKQogCQlkZXZpY2Vf
cHJpbnRmKHNjLT5hYWNfZGV2LCAiYnVzX2dlbmVyaWNfYXR0YWNoIGZhaWxlZFxuIik7CiAK
LQkvKiBtYXJrIHRoZSBjb250cm9sbGVyIHVwICovCi0Jc2MtPmFhY19zdGF0ZSAmPSB+QUFD
X1NUQVRFX1NVU1BFTkQ7CisJLyogZGlzY29ubmVjdCBvdXJzZWx2ZXMgZnJvbSB0aGUgaW50
cmhvb2sgY2hhaW4gKi8KKwljb25maWdfaW50cmhvb2tfZGlzZXN0YWJsaXNoKCZzYy0+YWFj
X2ljaCk7CiAKIAkvKiBlbmFibGUgaW50ZXJydXB0cyBub3cgKi8KIAlBQUNfVU5NQVNLX0lO
VEVSUlVQVFMoc2MpOwpPbmx5IGluIHN5cy9kZXYvYWFjOiBhYWMuYy5vcmlnCk9ubHkgaW4g
c3lzL2Rldi9hYWM6IGFhYy5jfgpPbmx5IGluIHN5cy9kZXYvYWFjOiBhYWNfZGlzay5jLm9y
aWcKT25seSBpbiBzeXMvZGV2L2FhYzogYWFjX2Rpc2suY34KT25seSBpbiBzeXMvZ2VvbTog
Z2VvbV9kaXNrLmMub3JpZwpPbmx5IGluIHN5cy9nZW9tOiBnZW9tX2Rpc2suY34K
--------------CFD810116ED4341A2805A58E--