From: Konstantin Belousov <kostikbel@gmail.com>
Date: Thu, 22 Mar 2012 20:34:58 +0200
To: Jeremiah Lott
Cc: alc@freebsd.org, freebsd-amd64@freebsd.org
Subject: Re: page fault after wiring page
Message-ID: <20120322183458.GF2358@deviant.kiev.zoral.com.ua>
List-Id: Porting FreeBSD to the AMD64 platform

On Thu, Mar 22, 2012 at 02:01:59PM -0400, Jeremiah Lott wrote:
> We've been seeing some panics and deadlocks that appear to be related
> to getting a page fault when accessing a page after it has been wired
> (on amd64). All the ones we have seen are related to sysctl handlers
> that call sysctl_wire_old_buffer, then lock a mutex, then call
> SYSCTL_OUT. When it does the copyout, it gets a page fault even though
> the page has been wired, sometimes causing it to sleep while holding a
> mutex or recurse on non-recursive mutexes. Here are the two panics
> that are easiest to follow:
>
> Sleeping thread (tid 100458, pid 2737) owns a non-sleepable lock
> sched_switch() at 0xffffffff80603bf5 = sched_switch+0x146
> mi_switch() at 0xffffffff805e8e15 = mi_switch+0x183
> sleepq_switch() at 0xffffffff8061e6e7 = sleepq_switch+0xb1
> sleepq_wait() at 0xffffffff8061f0ea = sleepq_wait+0x3d
> _sx_slock_hard() at 0xffffffff805e7ca7 = _sx_slock_hard+0x41d
> _sx_slock() at 0xffffffff805e7e32 = _sx_slock+0x3d
> vm_map_lookup() at 0xffffffff807909e4 = vm_map_lookup+0x54
> vm_fault() at 0xffffffff80786c20 = vm_fault+0x11c
> trap_pfault() at 0xffffffff80844dd0 = trap_pfault+0xe1
> trap() at 0xffffffff80845286 = trap+0x337
> calltrap() at 0xffffffff80827f28 = calltrap+0x8
> --- trap 0xc, rip = 0xffffffff8084296b, rsp = 0xffffff811391e7e0, rbp = 0xffffff811391e810 ---
> copyout() at 0xffffffff8084296b = copyout+0x3b
> sysctl_rtsock() at 0xffffffff806a5ef7 = sysctl_rtsock+0x499
> sysctl_root() at 0xffffffff805eab9e = sysctl_root+0xea
> userland_sysctl() at 0xffffffff805eae6e = userland_sysctl+0x14f
> sysctl() at 0xffffffff805eb258 = sysctl+0x9a
> amd64_syscall() at 0xffffffff80844065 = amd64_syscall+0x145
> Xfast_syscall() at 0xffffffff8082821c = Xfast_syscall+0xfc
>
> login: panic: _mtx_lock_sleep: recursed on non-recursive mutex process lock @ ../../../amd64/amd64/trap.c:731
> cpuid = 0
> KDB: stack backtrace:
> gdb_trace_self_wrapper() at 0xffffffff8057e7ea = gdb_trace_self_wrapper+0x2a
> kdb_backtrace() at 0xffffffff8062ffdc = kdb_backtrace+0x37
> panic() at 0xffffffff805f89ca = panic+0x2ad
> _mtx_lock_flags() at 0xffffffff805e9376 = _mtx_lock_flags
> _mtx_lock_flags() at 0xffffffff805e9417 = _mtx_lock_flags+0xa1
> trap_pfault() at 0xffffffff80880450 = trap_pfault+0xa1
> trap() at 0xffffffff80880ac7 = trap+0x4b8
> calltrap() at 0xffffffff80861af8 = calltrap+0x8
> --- trap 0xc, rip = 0xffffffff8087de8b, rsp = 0xffffff807b7e9410, rbp = 0xffffff807b7e9440 ---
> copyout() at 0xffffffff8087de8b = copyout+0x3b
> sysctl_out_proc() at 0xffffffff805ed305 = sysctl_out_proc+0x16c
> sysctl_root() at 0xffffffff80606141 = sysctl_root+0x13a
> userland_sysctl() at 0xffffffff8060640a = userland_sysctl+0x14f
> sysctl() at 0xffffffff806067f8 = sysctl+0x9a
> amd64_syscall() at 0xffffffff8087f635 = amd64_syscall+0x145
> Xfast_syscall() at 0xffffffff80861dec = Xfast_syscall+0xfc
> --- syscall (202, FreeBSD ELF64, sysctl), rip = 0x801c12b0c, rsp = 0x7fffffffb768, rbp = 0x7fffffffb7b0 ---
> --- curthread 0xffffff000465b000, tid 100142
>
> After doing some instrumentation, I think I've figured out what is
> causing this. It seems that when I am wiring the page, in some
> situations the page table entry is being changed from read-only to
> read-write as well as being wired. I haven't figured out the exact
> scenario that causes this, but I can definitely see it in my added
> trace. Here is an example page table entry transition I am seeing in
> pmap_enter that is called as a result of the wire:
>
> pmap_enter: origpte: 80000000ad201425 newpte: 80000000ad201607
>
> This means that we are setting PG_W (wired) and PG_RW (read/write) in
> this pmap_enter operation. Every time I saw a page fault after wiring,
> it was immediately preceded by a transition like this (in the cases
> that did not page fault, the page table entry already had PG_RW set).
> This made me suspect that a read-only version of the page table entry
> was cached in the TLB. I noticed we invalidate in some situations in
> pmap_enter, but this transition is not one of them. I was able to
> eliminate the panics by making this change:
>
> diff --git a/src/sys/amd64/amd64/pmap.c b/src/sys/amd64/amd64/pmap.c
> --- a/src/sys/amd64/amd64/pmap.c
> +++ b/src/sys/amd64/amd64/pmap.c
> @@ -3251,6 +3251,11 @@ validate:
>                         if (opa != VM_PAGE_TO_PHYS(m) || ((origpte &
>                             PG_NX) == 0 && (newpte & PG_NX)))
>                                 invlva = TRUE;
> +                       if ((newpte & PG_W) &&
> +                           ((origpte & PG_RW) == 0) &&
> +                           (newpte & PG_RW)) {
> +                               invlva = TRUE;
> +                       }
>                 }
>                 if ((origpte & (PG_M | PG_RW)) == (PG_M | PG_RW)) {
>                         if ((origpte & PG_MANAGED) != 0)
>
> I wanted to see if anyone has seen issues in this area, and if this
> fix seems appropriate. I'm running 8.2, but I didn't see any obvious
> changes to pmap stuff in head which would change this behavior.
> Thanks for any feedback,
>
> Jeremiah Lott
> Avere Systems

This should be the issue fixed in r233291.