From: Jeremiah Lott
Date: Thu, 22 Mar 2012 14:01:59 -0400
To: freebsd-amd64@freebsd.org
Cc: alc@freebsd.org, kib@freebsd.org
Subject: page fault after wiring page

We've been seeing some panics and deadlocks that appear to be related to getting a page fault when accessing a page after it has been wired (on amd64). All the ones we have seen are related to sysctl handlers that call sysctl_wire_old_buffer, then lock a mutex, then call SYSCTL_OUT.
When it does the copyout, it gets a page fault even though the page has been wired, sometimes causing it to sleep while holding a mutex or recurse on non-recursable mutexes. Here are the two panics that are easiest to follow:

Sleeping thread (tid 100458, pid 2737) owns a non-sleepable lock
sched_switch() at 0xffffffff80603bf5 = sched_switch+0x146
mi_switch() at 0xffffffff805e8e15 = mi_switch+0x183
sleepq_switch() at 0xffffffff8061e6e7 = sleepq_switch+0xb1
sleepq_wait() at 0xffffffff8061f0ea = sleepq_wait+0x3d
_sx_slock_hard() at 0xffffffff805e7ca7 = _sx_slock_hard+0x41d
_sx_slock() at 0xffffffff805e7e32 = _sx_slock+0x3d
vm_map_lookup() at 0xffffffff807909e4 = vm_map_lookup+0x54
vm_fault() at 0xffffffff80786c20 = vm_fault+0x11c
trap_pfault() at 0xffffffff80844dd0 = trap_pfault+0xe1
trap() at 0xffffffff80845286 = trap+0x337
calltrap() at 0xffffffff80827f28 = calltrap+0x8
--- trap 0xc, rip = 0xffffffff8084296b, rsp = 0xffffff811391e7e0, rbp = 0xffffff811391e810 ---
copyout() at 0xffffffff8084296b = copyout+0x3b
sysctl_rtsock() at 0xffffffff806a5ef7 = sysctl_rtsock+0x499
sysctl_root() at 0xffffffff805eab9e = sysctl_root+0xea
userland_sysctl() at 0xffffffff805eae6e = userland_sysctl+0x14f
sysctl() at 0xffffffff805eb258 = sysctl+0x9a
amd64_syscall() at 0xffffffff80844065 = amd64_syscall+0x145
Xfast_syscall() at 0xffffffff8082821c = Xfast_syscall+0xfc

login: panic: _mtx_lock_sleep: recursed on non-recursive mutex process lock @ ../../../amd64/amd64/trap.c:731
cpuid = 0
KDB: stack backtrace:
gdb_trace_self_wrapper() at 0xffffffff8057e7ea = gdb_trace_self_wrapper+0x2a
kdb_backtrace() at 0xffffffff8062ffdc = kdb_backtrace+0x37
panic() at 0xffffffff805f89ca = panic+0x2ad
_mtx_lock_flags() at 0xffffffff805e9376 = _mtx_lock_flags
_mtx_lock_flags() at 0xffffffff805e9417 = _mtx_lock_flags+0xa1
trap_pfault() at 0xffffffff80880450 = trap_pfault+0xa1
trap() at 0xffffffff80880ac7 = trap+0x4b8
calltrap() at 0xffffffff80861af8 = calltrap+0x8
--- trap 0xc, rip = 0xffffffff8087de8b, rsp = 0xffffff807b7e9410, rbp = 0xffffff807b7e9440 ---
copyout() at 0xffffffff8087de8b = copyout+0x3b
sysctl_out_proc() at 0xffffffff805ed305 = sysctl_out_proc+0x16c
sysctl_root() at 0xffffffff80606141 = sysctl_root+0x13a
userland_sysctl() at 0xffffffff8060640a = userland_sysctl+0x14f
sysctl() at 0xffffffff806067f8 = sysctl+0x9a
amd64_syscall() at 0xffffffff8087f635 = amd64_syscall+0x145
Xfast_syscall() at 0xffffffff80861dec = Xfast_syscall+0xfc
--- syscall (202, FreeBSD ELF64, sysctl), rip = 0x801c12b0c, rsp = 0x7fffffffb768, rbp = 0x7fffffffb7b0 ---
--- curthread 0xffffff000465b000, tid 100142

After doing some instrumentation, I think I've figured out what is causing this. It seems that when I am wiring the page, in some situations the page table entry is being changed from read-only to read-write as well as being wired. I haven't figured out the exact scenario that causes this, but I can definitely see it in my added trace. Here is an example page table entry transition I am seeing in the pmap_enter call that results from the wire:

pmap_enter: origpte: 80000000ad201425 newpte: 80000000ad201607

This means that we are setting PG_W (wired) and PG_RW (read/write) in this pmap_enter operation. Every time I saw a page fault after wiring, it was immediately preceded by a transition like this (in the cases that did not page fault, the page table entry already had PG_RW set). This made me suspect that a read-only version of the page table entry was cached in the TLB. I noticed we invalidate in some situations in pmap_enter, but this transition is not one of them.
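To make the transition above easier to read, the two entries can be decoded bit by bit. The following is a minimal user-space sketch, not kernel code. It assumes the flag values match FreeBSD's amd64 pmap.h (with PG_W and PG_MANAGED being the software-available bits), and the helpers pte_describe and pte_gained are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* amd64 PTE bits relevant here; values assumed to match FreeBSD's pmap.h. */
#define PG_V        0x001ULL        /* valid */
#define PG_RW       0x002ULL        /* read/write */
#define PG_U        0x004ULL        /* user accessible */
#define PG_A        0x020ULL        /* accessed */
#define PG_M        0x040ULL        /* modified (dirty) */
#define PG_W        0x200ULL        /* wired (software bit) */
#define PG_MANAGED  0x400ULL        /* managed (software bit) */
#define PG_NX       (1ULL << 63)    /* no execute */

struct pte_flag {
	uint64_t bit;
	const char *name;
};

static const struct pte_flag pte_flags[] = {
	{ PG_V, "V" }, { PG_RW, "RW" }, { PG_U, "U" }, { PG_A, "A" },
	{ PG_M, "M" }, { PG_W, "W" }, { PG_MANAGED, "MANAGED" },
	{ PG_NX, "NX" },
};

/*
 * Hypothetical helper: write the space-separated names of the flags
 * set in pte into buf (truncating silently if buf is too small).
 */
static void
pte_describe(uint64_t pte, char *buf, size_t len)
{
	size_t i, used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(pte_flags) / sizeof(pte_flags[0]); i++) {
		int n;

		if ((pte & pte_flags[i].bit) == 0)
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
		    used ? " " : "", pte_flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;
		used += (size_t)n;
	}
}

/* Hypothetical helper: bits set in newpte that were clear in origpte. */
static uint64_t
pte_gained(uint64_t origpte, uint64_t newpte)
{
	return (newpte & ~origpte);
}
```

With the values from the trace, pte_gained(0x80000000ad201425, 0x80000000ad201607) evaluates to PG_W | PG_RW. Describing each entry also shows that PG_A is set in origpte but clear in newpte.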
I was able to eliminate the panics by making this change:

diff --git a/src/sys/amd64/amd64/pmap.c b/src/sys/amd64/amd64/pmap.c
--- a/src/sys/amd64/amd64/pmap.c
+++ b/src/sys/amd64/amd64/pmap.c
@@ -3251,6 +3251,11 @@ validate:
 			if (opa != VM_PAGE_TO_PHYS(m) || ((origpte &
 			    PG_NX) == 0 && (newpte & PG_NX)))
 				invlva = TRUE;
+			if ((newpte & PG_W) &&
+			    ((origpte & PG_RW) == 0) &&
+			    (newpte & PG_RW)) {
+				invlva = TRUE;
+			}
 		}
 		if ((origpte & (PG_M | PG_RW)) == (PG_M | PG_RW)) {
 			if ((origpte & PG_MANAGED) != 0)

I wanted to see if anyone has seen issues in this area, and if this fix seems appropriate. I'm running 8.2, but I didn't see any obvious changes to pmap stuff in head which would change this behavior. Thanks for any feedback,

Jeremiah Lott
Avere Systems