From owner-freebsd-hackers Mon Dec 30 7:48:12 2002
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 825CE37B401 for ; Mon, 30 Dec 2002 07:48:08 -0800 (PST)
Received: from shaft.noc.clara.net (shaft.noc.clara.net [195.8.70.216]) by mx1.FreeBSD.org (Postfix) with ESMTP id C8B5743EA9 for ; Mon, 30 Dec 2002 07:48:07 -0800 (PST) (envelope-from mivens@clara.net)
Received: by shaft.noc.clara.net (Postfix, from userid 100) id 6AA9F46D6; Mon, 30 Dec 2002 15:47:56 +0000 (GMT)
Date: Mon, 30 Dec 2002 15:47:56 +0000
From: Mark Ivens
To: freebsd-hackers@freebsd.org
Subject: deadlock: procs stuck in vmpfw
Message-ID: <20021230154756.GD26089@uk.clara.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4i
X-NCC-RegID: uk.claranet
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

Hi,

On one of our 4.3-STABLE boxen running Apache, we're seeing Apache processes getting stuck in disk wait, wait channel vmpfw. The only way out is to reboot. The box is an NFS client hosting a jail; the jail, including the Apache DocumentRoot, is mounted from a Network Appliance.

I've got stack backtraces of several of the stuck processes from the live kernel, although I didn't force a core dump.
A couple are copied below:

(kgdb) proc 21846
(kgdb) bt
#0  mi_switch () at ../../kern/kern_synch.c:858
#1  0xc0153e6d in tsleep (ident=0xc12d5ee0, priority=4, wmesg=0xc022d8cc "vmpfw", timo=0) at ../../kern/kern_synch.c:467
#2  0xc01d1152 in vm_fault (map=0xea69bd80, vaddr=674340864, fault_type=1 '\001', fault_flags=0) at ../../vm/vm_page.h:569
#3  0xc01fda4e in trap_pfault (frame=0xea7f8db0, usermode=0, eva=674340862) at ../../i386/i386/trap.c:824
#4  0xc01fd69f in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = -550698992,
      tf_edi = -1037885128, tf_esi = 674340862, tf_ebp = -360739300,
      tf_isp = -360739364, tf_ebx = 2048, tf_edx = 674342598, tf_ecx = 434,
      tf_eax = -360747008, tf_trapno = 12, tf_err = 0, tf_eip = -1071658167,
      tf_cs = 8, tf_eflags = 66054, tf_esp = -360739060, tf_ss = -360739116})
    at ../../i386/i386/trap.c:443
#5  0xc01fcb49 in generic_copyin ()
#6  0xc016e518 in sosend (so=0xdf2df000, addr=0x0, uio=0xea7f8f0c, top=0x0, control=0x0, flags=0, p=0xea76ad80) at ../../kern/uipc_socket.c:585
#7  0xc01626b8 in soo_write (fp=0xc419a900, uio=0xea7f8f0c, cred=0xc405db80, flags=0, p=0xea76ad80) at ../../kern/sys_socket.c:81
#8  0xc015f420 in writev (p=0xea76ad80, uap=0xea7f8f80) at ../../sys/file.h:162
#9  0xc01fe08d in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = -1078001617,
      tf_edi = 0, tf_esi = 2, tf_ebp = -1077938516, tf_isp = -360738860,
      tf_ebx = 674183052, tf_edx = 0, tf_ecx = 32768, tf_eax = 121,
      tf_trapno = 12, tf_err = 2, tf_eip = 672109608, tf_cs = 31,
      tf_eflags = 663, tf_esp = -1077938560, tf_ss = 47})
    at ../../i386/i386/trap.c:1150
#10 0xc01f2e85 in Xint0x80_syscall ()

(kgdb) proc 26316
(kgdb) back
#0  mi_switch () at ../../kern/kern_synch.c:858
#1  0xc0153e6d in tsleep (ident=0xc08822e0, priority=4, wmesg=0xc022d8cc "vmpfw", timo=0) at ../../kern/kern_synch.c:467
#2  0xc01d1152 in vm_fault (map=0xea69df40, vaddr=674336768, fault_type=1 '\001', fault_flags=0) at ../../vm/vm_page.h:569
#3  0xc01fda4e in trap_pfault (frame=0xe53bddb0, usermode=0, eva=674340862) at ../../i386/i386/trap.c:824
#4  0xc01fd69f in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = -550567920,
      tf_edi = -1041225416, tf_esi = 674340862, tf_ebp = -449061348,
      tf_isp = -449061412, tf_ebx = 2048, tf_edx = 674342598, tf_ecx = 434,
      tf_eax = -449069056, tf_trapno = 12, tf_err = 0, tf_eip = -1071658167,
      tf_cs = 8, tf_eflags = 66054, tf_esp = -449061108, tf_ss = -449061164})
    at ../../i386/i386/trap.c:443
#5  0xc01fcb49 in generic_copyin ()
#6  0xc016e518 in sosend (so=0xdf2f4c00, addr=0x0, uio=0xe53bdf0c, top=0x0, control=0x0, flags=0, p=0xe53a82a0) at ../../kern/uipc_socket.c:585
#7  0xc01626b8 in soo_write (fp=0xc4028440, uio=0xe53bdf0c, cred=0xc45f1e00, flags=0, p=0xe53a82a0) at ../../kern/sys_socket.c:81
#8  0xc015f420 in writev (p=0xe53a82a0, uap=0xe53bdf80) at ../../sys/file.h:162
#9  0xc01fe08d in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = -1078001617,
      tf_edi = 0, tf_esi = 2, tf_ebp = -1077938516, tf_isp = -449060908,
      tf_ebx = 674183052, tf_edx = 0, tf_ecx = 32768, tf_eax = 121,
      tf_trapno = 7, tf_err = 2, tf_eip = 672109608, tf_cs = 31,
      tf_eflags = 663, tf_esp = -1077938560, tf_ss = 47})
    at ../../i386/i386/trap.c:1150
#10 0xc01f2e85 in Xint0x80_syscall ()

USER       PID %CPU %MEM   VSZ  RSS  TT STAT STARTED     TIME COMMAND
claranet 26316  0.0  0.3  4148 2636  ?? DL    1:43PM  0:10.68 /usr/local/sbin/
claranet 21846  0.0  0.2  4020 2524  ?? DL    1:07PM  0:23.26 /usr/local/sbin/
claranet 18675  0.0  0.3  4148 2624  ?? DL   12:41PM  0:19.35 /usr/local/sbin/

  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
 5000 26316 98582   1 -18  0  4148 2636 vmpfw  DL    ??    0:10.68 /usr/local/sbin/httpd -DMOD_FP -DMOD_SSL
 5000 21846 98582   1 -18  0  4020 2524 vmpfw  DL    ??    0:23.26 /usr/local/sbin/httpd -DMOD_FP -DMOD_SSL
 5000 18675 98582   3 -18  0  4148 2624 vmpfw  DL    ??    0:19.35 /usr/local/sbin/httpd -DMOD_FP -DMOD_SSL

etc etc.

The background is that all the stuck processes belong to one particular customer of ours, and lsof shows that each of them has the same JPEG open.
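For anyone reading the trap frames in the backtraces: kgdb prints the tf_* fields as signed decimal ints, which hides the addresses. The small C sketch below reinterprets them as unsigned 32-bit values and decodes tf_err using the standard x86 page-fault error-code bits (FreeBSD calls them PGEX_P, PGEX_W and PGEX_U; 12 is T_PAGEFLT on i386). The helper names reg_u32, describe_pgex and print_fault are made up for illustration. Applied to frame #4, tf_eip = -1071658167 is 0xc01fcb49, the address shown for generic_copyin() in frame #5; tf_esi = 674340862 is 0x28319ffe, the same value as eva in frame #3; and tf_err = 0 is a kernel-mode read of a not-present page. Together with tf_eax = 121 (SYS_writev) in the syscall frame, this reads as the kernel faulting while copying the writev() source buffer in from user space, then sleeping in vm_fault() with wmesg "vmpfw".

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* i386 trap number for a page fault (T_PAGEFLT in FreeBSD's <machine/trap.h>). */
#define TRAP_PAGEFLT 12

/* x86 page-fault error code bits as pushed by the CPU into tf_err
 * (FreeBSD names them PGEX_P, PGEX_W and PGEX_U). */
#define ERR_P 0x01u /* set: protection violation; clear: page not present */
#define ERR_W 0x02u /* set: write access;         clear: read access       */
#define ERR_U 0x04u /* set: user-mode access;     clear: kernel mode       */

/* kgdb shows 32-bit registers as signed ints; recover the unsigned value
 * the register actually held (the conversion is defined modulo 2^32). */
static uint32_t reg_u32(int32_t printed)
{
    return (uint32_t)printed;
}

/* Write a short human-readable description of a page-fault error code. */
static void describe_pgex(uint32_t err, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s, %s access, %s mode",
             (err & ERR_P) ? "protection violation" : "page not present",
             (err & ERR_W) ? "write" : "read",
             (err & ERR_U) ? "user" : "kernel");
}

/* Print the trap-frame fields that matter when diagnosing a page fault. */
static void print_fault(int32_t trapno, int32_t err, int32_t eip, int32_t esi)
{
    char why[64];

    describe_pgex(reg_u32(err), why, sizeof why);
    printf("trap %" PRId32 "%s\n", trapno,
           trapno == TRAP_PAGEFLT ? " (page fault)" : "");
    printf("  eip 0x%08" PRIx32 "\n", reg_u32(eip)); /* faulting instruction */
    printf("  esi 0x%08" PRIx32 "\n", reg_u32(esi)); /* copyin source pointer */
    printf("  err %s\n", why);
}
```

With the frame #4 values, print_fault(12, 0, -1071658167, 674340862) prints "trap 12 (page fault)", then "eip 0xc01fcb49", "esi 0x28319ffe" and "err page not present, read access, kernel mode" on indented lines.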
The customer re-uploads a new version of the file (it's from a webcam) every couple of minutes, so I'm guessing that this is perhaps causing some form of deadlock.

Any suggestions as to what this is, or any extra information I can provide to help resolve it, would be gratefully received. I'm afraid I haven't managed to find any obviously relevant PRs.

Mark

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message