From owner-freebsd-scsi@freebsd.org Thu Sep 7 23:19:50 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 73BFCE1693D for ; Thu, 7 Sep 2017 23:19:50 +0000 (UTC) (envelope-from cgull@glup.org) Received: from glup.org (e6.glup.org [IPv6:2001:470:8bf0:1::3]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9C3186EAFE; Thu, 7 Sep 2017 23:19:49 +0000 (UTC) (envelope-from cgull@glup.org) Received: from minipixel.i.glup.org (unknown [IPv6:2001:470:8bf0:1:1867:405d:adad:1ac9]) by glup.org (Postfix) with ESMTPSA id 5EBED854DE; Thu, 7 Sep 2017 19:19:40 -0400 (EDT) Authentication-Results: glup.org; dmarc=none header.from=glup.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=glup.org; s=201009; t=1504826380; bh=Svo+719ay7wMIlBsENTgU18LsT7ibDZjh2G47bo4of0=; h=To:From:Subject:Message-ID:Date:MIME-Version:Content-Type; b=CTT+vjoeIaSQIC9Svh72izyirQBdxxlE5m9Jc5jZ6DyDostFhe0hMT/YNAjqmbu0Q 8kjoHaff0fz0aS1T0iJQWvSviAO47szzFIRq6d8pzeM+XkVR83UUDG3JOr5YPTHvU3 neQfWtCQqWUS20CH9SY/I+w0zhdYaqfrng9wsuaVHaa3DZwdY8TOD73QkH+It/H65n 7CrtcItvXIwxciDzxztS+AfPxafwMm7ai/M6G12bHD2bWY1M5TDY9l8TteB5lDxBCo q4HyCwlHIM2gJjZuWiPaa4cguQWfcCl5cDOmqvRq7kJFzBVqGr85PgDhO+bBknj4Hm DNFMVL78j7JzA== To: freebsd-scsi@freebsd.org, jhb@freebsd.org, jkim@freebsd.org From: john hood Subject: GEOM probes fail on aac with EARLY_AP_STARTUP Message-ID: Date: Thu, 7 Sep 2017 19:19:40 -0400 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.3.0 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------------CFD810116ED4341A2805A58E" Content-Language: en-US X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Thu, 07 Sep 2017 23:19:50 -0000

This is a multi-part message in MIME format.
--------------CFD810116ED4341A2805A58E
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

I've got a devel machine here which was failing to boot on our vendored
FreeBSD 11.1, because GEOM was unable to find the partitions on the boot
drive and so the root mount failed.  This started happening on many but
not all boots after I upgraded the machine from 9.3.

The machine is an Intel S25520UR motherboard with 2x Xeon E5620 CPUs
(Hyperthreading enabled, so hw.ncpu=16) and an Adaptec 5805, and 2 RAID
volumes configured on 6 SATA drives.

When booting, it sees the aac0 controller and aacd0
volume but GEOM does not find any of the partitions on that volume, and the
initial mount of root on /dev/aacd0p2 fails. aacd0 is available and
readable, but the expected aacd0p{1,2,3} devices do not exist.
(However, aacd1 and its partitions/devices are configured normally.)

I think it's a race condition between the aac driver and GEOM probing,
probably newly triggered/exposed by EARLY_AP_STARTUP.  I've reproduced
the problem on upstream FreeBSD 11.1 and -current.  Disabling
EARLY_AP_STARTUP, or setting kern.smp.disabled=1, causes the kernel to
start correctly. 'boot -v' also causes the kernel to start correctly.

The kernel calls aac_attach() which uses
config_intrhook_establish() to run aac_startup() later. When that
runs, it adds devices via
aac_add_container()/device_add_child()/bus_generic_attach().

However, at the beginning of aac_attach(), an AAC_STATE_SUSPEND flag
is set. It is cleared at the end of aac_startup(). It appears that
GEOM probes call aac_disk_open(), which checks the flag and returns
an error if it is set. On my system the race is that the GEOM probes
happen before that flag is cleared, possibly because GEOM is tasting
aacd0 while the aac driver is still attaching aacd1.
So the GEOM probes fail and the geom nodes never get created. If I boot with the -v flag, the kernel boots successfully, I think because the message printing takes long enough to delay GEOM probing past aac_start() completion. I've attached a patch which resolves the problem on FreeBSD-current (and = 11.1), would anybody care to adopt it and shepherd it into SVN? regards, --John Hood --------------CFD810116ED4341A2805A58E Content-Type: text/plain; charset=UTF-8; x-mac-type="0"; x-mac-creator="0"; name="aac.diff" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="aac.diff" T25seSBpbiBzeXMvYW1kNjQvY29tcGlsZTogQUFDUFJPQkUKT25seSBpbiBzeXMvYW1kNjQv Y29uZjogQUFDUFJPQkUKT25seSBpbiBzeXMvYW1kNjQvY29uZjogQUFDUFJPQkV+CmRpZmYg LXUgLXIgc3lzLm9yaWcvZGV2L2FhYy9hYWMuYyBzeXMvZGV2L2FhYy9hYWMuYwotLS0gc3lz Lm9yaWcvZGV2L2FhYy9hYWMuYwkyMDE3LTA5LTA1IDA5OjA2OjI2LjAwMDAwMDAwMCAtMDQw MAorKysgc3lzL2Rldi9hYWMvYWFjLmMJMjAxNy0wOS0wNyAxNDoyNzozMi40NjE1MjgwMDAg LTA0MDAKQEAgLTQxOCw5ICs0MTgsNiBAQAogCXNjID0gKHN0cnVjdCBhYWNfc29mdGMgKilh cmc7CiAJZndwcmludGYoc2MsIEhCQV9GTEFHU19EQkdfRlVOQ1RJT05fRU5UUllfQiwgIiIp OwogCi0JLyogZGlzY29ubmVjdCBvdXJzZWx2ZXMgZnJvbSB0aGUgaW50cmhvb2sgY2hhaW4g Ki8KLQljb25maWdfaW50cmhvb2tfZGlzZXN0YWJsaXNoKCZzYy0+YWFjX2ljaCk7Ci0KIAlt dHhfbG9jaygmc2MtPmFhY19pb19sb2NrKTsKIAlhYWNfYWxsb2Nfc3luY19maWIoc2MsICZm aWIpOwogCkBAIC00MzcsMTIgKzQzNCwxNSBAQAogCWFhY19yZWxlYXNlX3N5bmNfZmliKHNj KTsKIAltdHhfdW5sb2NrKCZzYy0+YWFjX2lvX2xvY2spOwogCisJLyogbWFyayB0aGUgY29u dHJvbGxlciB1cCAqLworCXNjLT5hYWNfc3RhdGUgJj0gfkFBQ19TVEFURV9TVVNQRU5EOwor CiAJLyogcG9rZSB0aGUgYnVzIHRvIGFjdHVhbGx5IGF0dGFjaCB0aGUgY2hpbGQgZGV2aWNl cyAqLwogCWlmIChidXNfZ2VuZXJpY19hdHRhY2goc2MtPmFhY19kZXYpKQogCQlkZXZpY2Vf cHJpbnRmKHNjLT5hYWNfZGV2LCAiYnVzX2dlbmVyaWNfYXR0YWNoIGZhaWxlZFxuIik7CiAK LQkvKiBtYXJrIHRoZSBjb250cm9sbGVyIHVwICovCi0Jc2MtPmFhY19zdGF0ZSAmPSB+QUFD X1NUQVRFX1NVU1BFTkQ7CisJLyogZGlzY29ubmVjdCBvdXJzZWx2ZXMgZnJvbSB0aGUgaW50 cmhvb2sgY2hhaW4gKi8KKwljb25maWdfaW50cmhvb2tfZGlzZXN0YWJsaXNoKCZzYy0+YWFj 
X2ljaCk7CiAKIAkvKiBlbmFibGUgaW50ZXJydXB0cyBub3cgKi8KIAlBQUNfVU5NQVNLX0lO VEVSUlVQVFMoc2MpOwpPbmx5IGluIHN5cy9kZXYvYWFjOiBhYWMuYy5vcmlnCk9ubHkgaW4g c3lzL2Rldi9hYWM6IGFhYy5jfgpPbmx5IGluIHN5cy9kZXYvYWFjOiBhYWNfZGlzay5jLm9y aWcKT25seSBpbiBzeXMvZGV2L2FhYzogYWFjX2Rpc2suY34KT25seSBpbiBzeXMvZ2VvbTog Z2VvbV9kaXNrLmMub3JpZwpPbmx5IGluIHN5cy9nZW9tOiBnZW9tX2Rpc2suY34K --------------CFD810116ED4341A2805A58E-- From owner-freebsd-scsi@freebsd.org Fri Sep 8 13:15:16 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 37C3FE17441 for ; Fri, 8 Sep 2017 13:15:16 +0000 (UTC) (envelope-from sthaug@nethelp.no) Received: from bizet.nethelp.no (bizet.nethelp.no [IPv6:2001:8c0:9e04:500::1]) by mx1.freebsd.org (Postfix) with ESMTP id EE6687ED1C; Fri, 8 Sep 2017 13:15:15 +0000 (UTC) (envelope-from sthaug@nethelp.no) Received: from localhost (bizet.nethelp.no [IPv6:2001:8c0:9e04:500::1]) by bizet.nethelp.no (Postfix) with ESMTP id 57214E6074; Fri, 8 Sep 2017 15:15:04 +0200 (CEST) Date: Fri, 08 Sep 2017 15:15:04 +0200 (CEST) Message-Id: <20170908.151504.74703639.sthaug@nethelp.no> To: cgull@glup.org Cc: freebsd-scsi@freebsd.org, jhb@freebsd.org, jkim@freebsd.org Subject: Re: GEOM probes fail on aac with EARLY_AP_STARTUP From: sthaug@nethelp.no In-Reply-To: References: X-Mailer: Mew version 3.3 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Sep 2017 13:15:16 -0000 > I've got a devel machine here which was failing to boot on our vendor= ed > FreeBSD 11.1, because GEOM was unable to find the partitions on the b= oot > drive and so the root mount 
failed.  This started happening on many but
> not all boots after I upgraded the machine from 9.3.
>
> The machine is an Intel S25520UR motherboard with 2x Xeon E5620 CPUs
> (Hyperthreading enabled, so hw.ncpu=16) and an Adaptec 5805, and 2 RAID
> volumes configured on 6 SATA drives.
>
> When booting, it sees the aac0 controller and aacd0
> volume but GEOM does not find any of the partitions on that volume, and the
> initial mount of root on /dev/aacd0p2 fails. aacd0 is available and
> readable, but the expected aacd0p{1,2,3} devices do not exist.
> (However, aacd1 and its partitions/devices are configured normally.)
>
> I think it's a race condition between the aac driver and GEOM probing,
> probably newly triggered/exposed by EARLY_AP_STARTUP.  I've reproduced
> the problem on upstream FreeBSD 11.1 and -current.  Disabling
> EARLY_AP_STARTUP, or setting kern.smp.disabled=1, causes the kernel to
> start correctly. 'boot -v' also causes the kernel to start correctly.

Is there any reason to believe this is limited to aac? I'm asking
because your description is quite similar to boot problems I'm seeing
with 11.1-STABLE on a server with mps (Avago) SCSI/SATA controller
and SATA disks. I'm getting the dreaded "mounting from ... failed
with error 19". 11.1-RELEASE seems to work okay but 11.1-STABLE does
not.

Steinar Haug, Nethelp consulting, sthaug@nethelp.no From owner-freebsd-scsi@freebsd.org Fri Sep 8 15:06:09 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 99864E1CB92 for ; Fri, 8 Sep 2017 15:06:09 +0000 (UTC) (envelope-from cgull@glup.org) Received: from glup.org (e6.glup.org [IPv6:2001:470:8bf0:1::3]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5E5E682C8B; Fri, 8 Sep 2017 15:06:09 +0000 (UTC) (envelope-from cgull@glup.org) Received: from minipixel.i.glup.org (unknown [IPv6:2001:470:8bf0:1:1867:405d:adad:1ac9]) by glup.org (Postfix) with ESMTPSA id 2B0F8854DE; Fri, 8 Sep 2017 11:06:06 -0400 (EDT) Authentication-Results: glup.org; dmarc=none header.from=glup.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=glup.org; s=201009; t=1504883166; bh=jE1HK2DB0ekXseyBt13ZJIrnLagXwkYbyHlJQJ3wFjQ=; h=Subject:To:Cc:References:From:Message-ID:Date:MIME-Version: In-Reply-To:Content-Type:Content-Transfer-Encoding; b=PaX/hILDEsm/sCSgt4mKvYeme4OVnqgBXJLtsVUR16YJFjHp9O8gzxzeA78/3fExW APS0dreDxorM8VG82oSE/0mRcNJztxOuqs6130g/hD/1KZ3B/0+c08jo2FE6uxm8NE bsRf+pI6hRCAU/UWz4PozjOsoEnP881CGxqvzB5w7xoleWh/+1up/I/WeB6WF5Hl6f s0xuc2x0albntzdpGJF2sSpgtpYQIGbxbChmr/MiOPdKkATShcLrKz0+iNerDZCQqN QgPmXg1DjkT8g72MERip1d3jtJb3t111J9aGdBFFzzrsAXjf1hruqMftiHOOis65+D 2Njat9kEDziiw== Subject: Re: GEOM probes fail on aac with EARLY_AP_STARTUP To: sthaug@nethelp.no Cc: freebsd-scsi@freebsd.org, jhb@freebsd.org, jkim@freebsd.org References: <20170908.151504.74703639.sthaug@nethelp.no> From: john hood Message-ID: <0c73b27e-d4ac-4003-81df-b2d9a0a63a81@glup.org> Date: Fri, 8 Sep 2017 11:06:05 -0400 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.3.0 MIME-Version: 1.0 In-Reply-To: 
<20170908.151504.74703639.sthaug@nethelp.no> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Content-Language: en-US X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Sep 2017 15:06:09 -0000

On 9/8/17 9:15 AM, sthaug@nethelp.no wrote:
> Is there any reason to believe this is limited to aac? I'm asking
> because your description is quite similar to boot problems I'm seeing
> with 11.1-STABLE on a server with mps (Avago) SCSI/SATA controller
> and SATA disks. I'm getting the dreaded "mounting from ... failed
> with error 19". 11.1-RELEASE seems to work okay but 11.1-STABLE does
> not.

Your issue isn't obviously directly related to mine. mps is a CAM driver
and aac is not (at least not for normal block volumes), which makes disk
probe/attach quite different.  I also don't see an obvious driver flag
like the one in aac.

Do you see GEOM error messages like "Opened disk aacd0 -> 6"?  When you
get a failed boot, if you type '?' at the mountroot prompt, do you see
the base device (i.e. "da0"), but not partitions/slices ("da0s1",
"da0p1")?  If both of those are true, then your problem might be similar
to mine.

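For anyone following along, the ordering problem discussed in this thread
can be modeled in a few lines of userspace C. This is only a sketch under
stated assumptions: `model_softc`, `MODEL_STATE_SUSPEND` and the `model_*`
functions are hypothetical stand-ins, not the real aac(4) code, and the
worst-case interleaving (the disk is opened the moment it is attached) is
forced deterministically instead of being raced. The model assumes the open
routine refuses with ENXIO while the suspend flag is set, which would be
consistent with the "-> 6" in the GEOM message above.

```c
#include <errno.h>

/* Hypothetical stand-in for the driver's "not ready yet" state bit. */
#define MODEL_STATE_SUSPEND 0x01

struct model_softc {
	int state;        /* holds MODEL_STATE_SUSPEND until startup is done */
	int taste_result; /* what the modeled GEOM taste saw at attach time */
};

/* Models the disk open path: refuse opens while the flag is set. */
static int
model_disk_open(const struct model_softc *sc)
{
	return ((sc->state & MODEL_STATE_SUSPEND) ? ENXIO : 0);
}

/*
 * Models attaching the child disks. In the worst case a taster opens the
 * new disk immediately, so the open is performed right here.
 */
static void
model_attach_children(struct model_softc *sc)
{
	sc->taste_result = model_disk_open(sc);
}

/* Old ordering: children are attached before the flag is cleared. */
int
model_startup_old(struct model_softc *sc)
{
	sc->state |= MODEL_STATE_SUSPEND;   /* set early, as in attach */
	model_attach_children(sc);          /* taste happens here and fails */
	sc->state &= ~MODEL_STATE_SUSPEND;  /* too late for that taste */
	return (sc->taste_result);
}

/* Patched ordering: the flag is cleared before children are attached. */
int
model_startup_patched(struct model_softc *sc)
{
	sc->state |= MODEL_STATE_SUSPEND;   /* set early, as in attach */
	sc->state &= ~MODEL_STATE_SUSPEND;  /* mark the controller up first */
	model_attach_children(sc);          /* taste now succeeds */
	return (sc->taste_result);
}
```

With the old ordering the modeled taste returns ENXIO and nothing retries
it; with the flag cleared before the children are attached, which is the
reordering made in the attached aac.diff, the same taste returns 0.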
regards, =C2=A0 --jh From owner-freebsd-scsi@freebsd.org Fri Sep 8 15:12:09 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D34D8E1D184 for ; Fri, 8 Sep 2017 15:12:09 +0000 (UTC) (envelope-from scottl@samsco.org) Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A343E83155; Fri, 8 Sep 2017 15:12:09 +0000 (UTC) (envelope-from scottl@samsco.org) Received: from compute6.internal (compute6.nyi.internal [10.202.2.46]) by mailout.nyi.internal (Postfix) with ESMTP id 3DF2120B0A; Fri, 8 Sep 2017 11:12:08 -0400 (EDT) Received: from frontend1 ([10.202.2.160]) by compute6.internal (MEProxy); Fri, 08 Sep 2017 11:12:08 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=samsco.org; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-sender :x-me-sender:x-sasl-enc:x-sasl-enc; s=fm1; bh=r0ZjKGcdF3IT9tU5Oh hDIsFU8Dd9+Ee1UfclOjypKco=; b=ttpnviIvOHlryDQfRxo4RtQb/wnZuslsIC tQ+sGnjuWd+QOk9E350IilKCIU8pgHTN+gL/1VSgKSYvCzK/7T7KqQHL3XrpJ4df HuaLBpJ+T4WYtD1NEngUSZvZ76kW160yKhZDIFHa8L+cF1dMu6vQLhqCes876lJJ OykTCnJsLeG4sfqjBSEpC0vBpwhjdr056Sp0QnkLl8/Zo7tfJSvOasK8xcFqA1xc olo6EEv52H7mBfkF4X+ifVuZlrW80UtG/a+VB3yENY1sQYfOPoOM7pCgpP44xskh CE2gbd5wv8455kFalEH1+ShYiV3gsHJSZFEldKPo8Y03O58ENlAA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:from:in-reply-to:message-id:mime-version:references :subject:to:x-me-sender:x-me-sender:x-sasl-enc:x-sasl-enc; s= fm1; bh=r0ZjKGcdF3IT9tU5OhhDIsFU8Dd9+Ee1UfclOjypKco=; b=rka/cHE2 CdUTa8Sc6hfNuyZJbbTzR5hVCKKNLKUonLWSWQ0/J9oxsK9UYBmqjJHUkXgWONnk 
CoQMNLWWFXljyHgoJF2oCvKhM7XaqpR0Ux/htxKP7LSDWAXTNwIFIFAABFLQE2Aw WCt/VmLJMIhAdAuj8+Pug0IvjsAnucJDMjw0/42Mgkppp7xa1O6Y2y05Wuis/VV1 bNG2vl4AW4ylXOKXkYb+Kf27LWEOImvNxC1IesRO8lLE5l2XL3PFVclZDzybKl2t CMSPyfZFFiary2J3pKMqD4FQzuMDKihrcLAtLsbuYUvDdAursOqqj5dH3OhlREGu FBRCWRWu9hBuKQ== X-ME-Sender: X-Sasl-enc: bffrFF5O84L8kYEHZH37yvgLIlsBRSNBKGkaji1PeO/B 1504883527 Received: from [10.199.7.12] (unknown [50.235.236.73]) by mail.messagingengine.com (Postfix) with ESMTPA id AF3F57F96A; Fri, 8 Sep 2017 11:12:07 -0400 (EDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: GEOM probes fail on aac with EARLY_AP_STARTUP From: Scott Long In-Reply-To: Date: Fri, 8 Sep 2017 11:12:06 -0400 Cc: freebsd-scsi@freebsd.org, jhb@freebsd.org, jkim@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: References: To: john hood X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Sep 2017 15:12:09 -0000

Hi John,

Great bug report and analysis.  I think you’re right: behavior in the
system changed with EARLY_AP_STARTUP, and the intrhook is now being released
too soon, before the driver is ready for concurrent access.  I’ll shepherd
it into SVN.  There’s a similar pattern in most of the non-CAM drivers, so
I’ll review them as well.

Scott

> On Sep 7, 2017, at 7:19 PM, john hood wrote:
>
> I've got a devel machine here which was failing to boot on our vendored
> FreeBSD 11.1, because GEOM was unable to find the partitions on the boot
> drive and so the root mount failed.  This started happening on many but
> not all boots after I upgraded the machine from 9.3.
>
> The machine is an Intel S25520UR motherboard with 2x Xeon E5620 CPUs
> (Hyperthreading enabled, so hw.ncpu=16) and an Adaptec 5805, and 2 RAID
> volumes configured on 6 SATA drives.
>
> When booting, it sees the aac0 controller and aacd0
> volume but GEOM does not find any of the partitions on that volume, and the
> initial mount of root on /dev/aacd0p2 fails. aacd0 is available and
> readable, but the expected aacd0p{1,2,3} devices do not exist.
> (However, aacd1 and its partitions/devices are configured normally.)
>
> I think it's a race condition between the aac driver and GEOM probing,
> probably newly triggered/exposed by EARLY_AP_STARTUP.  I've reproduced
> the problem on upstream FreeBSD 11.1 and -current.  Disabling
> EARLY_AP_STARTUP, or setting kern.smp.disabled=1, causes the kernel to
> start correctly. 'boot -v' also causes the kernel to start correctly.
>
> The kernel calls aac_attach() which uses
> config_intrhook_establish() to run aac_startup() later. When that
> runs, it adds devices via
> aac_add_container()/device_add_child()/bus_generic_attach().
>
> However, at the beginning of aac_attach(), an AAC_STATE_SUSPEND flag
> is set. It is cleared at the end of aac_startup(). It appears that
> GEOM probes call aac_disk_open(), which checks the flag and returns
> an error if it is set. On my system the race is that the GEOM probes
> happen before that flag is cleared, possibly because GEOM is tasting
> aacd0 while the aac driver is still attaching aacd1. So the GEOM probes
> fail and the geom nodes never get created. If I boot with the -v flag,
> the kernel boots successfully, I think because the message printing
> takes long enough to delay GEOM probing past aac_startup() completion.
>
> I've attached a patch which resolves the problem on FreeBSD-current
> (and 11.1). Would anybody care to adopt it and shepherd it into SVN?
>=20 > regards, >=20 > --John Hood >=20 > _______________________________________________ > freebsd-scsi@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-scsi > To unsubscribe, send any mail to = "freebsd-scsi-unsubscribe@freebsd.org" From owner-freebsd-scsi@freebsd.org Fri Sep 8 15:32:07 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E34CEE1E1E3 for ; Fri, 8 Sep 2017 15:32:07 +0000 (UTC) (envelope-from scottl@samsco.org) Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B2BDF83B3D; Fri, 8 Sep 2017 15:32:07 +0000 (UTC) (envelope-from scottl@samsco.org) Received: from compute6.internal (compute6.nyi.internal [10.202.2.46]) by mailout.nyi.internal (Postfix) with ESMTP id 356CF20FE2; Fri, 8 Sep 2017 11:32:06 -0400 (EDT) Received: from frontend1 ([10.202.2.160]) by compute6.internal (MEProxy); Fri, 08 Sep 2017 11:32:06 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=samsco.org; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-sender :x-me-sender:x-sasl-enc:x-sasl-enc; s=fm1; bh=xJ4z7KeYFlfM7bynHh s7QSrQG3KOASgTTXNZutgrNTM=; b=j3UYVMc39lCNrpsGfqr9hnFlJbUzvwK2NQ Gjkkn9E8HGaDbfAwmXmUb+GneGfekoEgA0jlQs/hPFk1pCrTRMrq/q5G+yLkXVfq gjHCpPm/XK/SjXgkSGUynyqS2+r26pODT9JoeqUD0lSlFygM+8tOI9a8Leytyh6h iCqIN1v+7m8AsFwBYt0JouBiAQf1aPcgCbLyLMdKWV9EUy8Rk79kk0QNT+uthNAI 2N4k6nWWBxPC7mB1qqLwySOAp2Vol1EB57a0R+h5herlHntiWcv9ANIe9GFte/iF fLvAff4eua9ykfOfUsDtMcu4lstfhTLMs+rrZsEBc1zhtXjLkw+w== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type 
:date:from:in-reply-to:message-id:mime-version:references :subject:to:x-me-sender:x-me-sender:x-sasl-enc:x-sasl-enc; s= fm1; bh=xJ4z7KeYFlfM7bynHhs7QSrQG3KOASgTTXNZutgrNTM=; b=HTE1gxJi ceWjzn59V2UI1JGh4kfUCKXxtduHUojMXRaES7lQNa05X9QeA2dUcRvvZieuIhij TaSD3GLMu2zgIEYxT8wHHRA9wMJxk6+x3NaoxSqVs4dQ59dpk3nr4bhBWeEq5j9a rbGqCgDeJfkh4U+fI+gNbOHaxAuWmLKDrWinLFjp4qcyubtQOU3PLpwfyF+muy5t ef7rTRCFtIg7HFeO6XcrPSXe+gme6T3Gp3hsXtO+gVO1kzWGayDcrvkKCOOSyqMs pY6pV+nBrcUQQ7UWYvAtcmMB800PDv7kNNW9419xJekpRT1p+ZVa7ruQbWVBX+mi XG/blQGJn5BcsQ== X-ME-Sender: X-Sasl-enc: nFvhqH83AuYxO+vZc5tCqjQ8T57zlhOYo4JFq+A12365 1504884725 Received: from [10.199.7.12] (unknown [50.235.236.73]) by mail.messagingengine.com (Postfix) with ESMTPA id CBEA77E2FD; Fri, 8 Sep 2017 11:32:05 -0400 (EDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: GEOM probes fail on aac with EARLY_AP_STARTUP From: Scott Long In-Reply-To: <0c73b27e-d4ac-4003-81df-b2d9a0a63a81@glup.org> Date: Fri, 8 Sep 2017 11:32:05 -0400 Cc: FreeBSD-SCSI , John Baldwin Content-Transfer-Encoding: quoted-printable Message-Id: <0D188193-4562-483A-B123-6117CC65AF01@samsco.org> References: <20170908.151504.74703639.sthaug@nethelp.no> <0c73b27e-d4ac-4003-81df-b2d9a0a63a81@glup.org> To: john hood , sthaug@nethelp.no X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Sep 2017 15:32:08 -0000 > On Sep 8, 2017, at 11:06 AM, john hood wrote: >=20 > On 9/8/17 9:15 AM, sthaug@nethelp.no wrote: >> Is there any reason to believe this is limited to aac? I'm asking >> because your description is quite similar to boot problema I'm seeing >> with 11.1-STABLE on a server with mps (Avago) SCSI/SATA controller >> and SATA disks. I'm getting the dreaded "mounting from ... failed >> with error 19". 
11.1-RELEASE seems to work okay but 11.1-STABLE does
>> not.
>
> Your issue isn't obviously directly related to mine. mps is a CAM driver
> and aac is not (at least not for normal block volumes), which makes disk
> probe/attach quite different. I also don't see an obvious driver flag
> like the one in aac.
>

It’s a different pattern in CAM, and CAM itself looks to be safe.  MPS and
MPR have their own intrhooks, and on review just now I see a potential for a
similar problem to what you saw and that would match Steinar’s symptoms.
Please try the following patch:

--- mps.c	(revision 323314)
+++ mps.c	(working copy)
@@ -1639,6 +1639,11 @@
 	mps_mapping_initialize(sc);
 	mpssas_startup(sc);
 	mps_unlock(sc);
+
+	mps_dprint(sc, MPS_XINFO, "disestablish config intrhook\n");
+	config_intrhook_disestablish(&sc->mps_ich);
+	sc->mps_ich.ich_arg = NULL;
+
 	mps_dprint(sc, MPS_INIT, "%s exit\n", __func__);
 }
 
Index: mps_sas.c
===================================================================
--- mps_sas.c	(revision 323314)
+++ mps_sas.c	(working copy)
@@ -3695,11 +3695,6 @@
 		mps_dprint(sc, MPS_FAULT, "Portenable failed\n");
 
 	mps_free_command(sc, cm);
-	if (sc->mps_ich.ich_arg != NULL) {
-		mps_dprint(sc, MPS_XINFO, "disestablish config intrhook\n");
-		config_intrhook_disestablish(&sc->mps_ich);
-		sc->mps_ich.ich_arg = NULL;
-	}
 
 	/*
 	 * Get WarpDrive info after discovery is complete but before the scan