From owner-freebsd-hackers  Mon Dec 30  7:48:12 2002
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 825CE37B401
	for <freebsd-hackers@freebsd.org>; Mon, 30 Dec 2002 07:48:08 -0800 (PST)
Received: from shaft.noc.clara.net (shaft.noc.clara.net [195.8.70.216])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C8B5743EA9
	for <freebsd-hackers@freebsd.org>; Mon, 30 Dec 2002 07:48:07 -0800 (PST)
	(envelope-from mivens@clara.net)
Received: by shaft.noc.clara.net (Postfix, from userid 100)
	id 6AA9F46D6; Mon, 30 Dec 2002 15:47:56 +0000 (GMT)
Date: Mon, 30 Dec 2002 15:47:56 +0000
From: Mark Ivens <mark@uk.clara.net>
To: freebsd-hackers@freebsd.org
Subject: deadlock: procs stuck in vmpfw
Message-ID: <20021230154756.GD26089@uk.clara.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4i
X-NCC-RegID: uk.claranet
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG

Hi,

On one of our 4.3-STABLE boxen running Apache, we're seeing Apache processes
getting stuck in disk wait (state D) on wait channel vmpfw. The only way out
is to reboot.

The box is an NFS client with a jail; its NFS mounts, including the Apache
DocumentRoot, are served by a Network Appliance.

I've got stack backtraces of several of the stuck processes from the
live kernel, although I didn't force a coredump. A couple are copied below:

(kgdb) proc 21846
(kgdb) bt
#0  mi_switch () at ../../kern/kern_synch.c:858
#1  0xc0153e6d in tsleep (ident=0xc12d5ee0, priority=4, wmesg=0xc022d8cc "vmpfw", timo=0) at ../../kern/kern_synch.c:467
#2  0xc01d1152 in vm_fault (map=0xea69bd80, vaddr=674340864, fault_type=1 '\001', fault_flags=0) at ../../vm/vm_page.h:569
#3  0xc01fda4e in trap_pfault (frame=0xea7f8db0, usermode=0, eva=674340862) at ../../i386/i386/trap.c:824
#4  0xc01fd69f in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = -550698992, tf_edi = -1037885128, tf_esi = 674340862, tf_ebp = -360739300, tf_isp = -360739364, tf_ebx = 2048, 
      tf_edx = 674342598, tf_ecx = 434, tf_eax = -360747008, tf_trapno = 12, tf_err = 0, tf_eip = -1071658167, tf_cs = 8, tf_eflags = 66054, tf_esp = -360739060, 
      tf_ss = -360739116}) at ../../i386/i386/trap.c:443
#5  0xc01fcb49 in generic_copyin ()
#6  0xc016e518 in sosend (so=0xdf2df000, addr=0x0, uio=0xea7f8f0c, top=0x0, control=0x0, flags=0, p=0xea76ad80) at ../../kern/uipc_socket.c:585
#7  0xc01626b8 in soo_write (fp=0xc419a900, uio=0xea7f8f0c, cred=0xc405db80, flags=0, p=0xea76ad80) at ../../kern/sys_socket.c:81
#8  0xc015f420 in writev (p=0xea76ad80, uap=0xea7f8f80) at ../../sys/file.h:162
#9  0xc01fe08d in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = -1078001617, tf_edi = 0, tf_esi = 2, tf_ebp = -1077938516, tf_isp = -360738860, tf_ebx = 674183052, 
      tf_edx = 0, tf_ecx = 32768, tf_eax = 121, tf_trapno = 12, tf_err = 2, tf_eip = 672109608, tf_cs = 31, tf_eflags = 663, tf_esp = -1077938560, tf_ss = 47})
    at ../../i386/i386/trap.c:1150
#10 0xc01f2e85 in Xint0x80_syscall ()

(kgdb) proc 26316
(kgdb) back
#0  mi_switch () at ../../kern/kern_synch.c:858
#1  0xc0153e6d in tsleep (ident=0xc08822e0, priority=4, wmesg=0xc022d8cc "vmpfw", timo=0) at ../../kern/kern_synch.c:467
#2  0xc01d1152 in vm_fault (map=0xea69df40, vaddr=674336768, fault_type=1 '\001', fault_flags=0) at ../../vm/vm_page.h:569
#3  0xc01fda4e in trap_pfault (frame=0xe53bddb0, usermode=0, eva=674340862) at ../../i386/i386/trap.c:824
#4  0xc01fd69f in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = -550567920, tf_edi = -1041225416, tf_esi = 674340862, tf_ebp = -449061348, tf_isp = -449061412, tf_ebx = 2048, 
      tf_edx = 674342598, tf_ecx = 434, tf_eax = -449069056, tf_trapno = 12, tf_err = 0, tf_eip = -1071658167, tf_cs = 8, tf_eflags = 66054, tf_esp = -449061108, 
      tf_ss = -449061164}) at ../../i386/i386/trap.c:443
#5  0xc01fcb49 in generic_copyin ()
#6  0xc016e518 in sosend (so=0xdf2f4c00, addr=0x0, uio=0xe53bdf0c, top=0x0, control=0x0, flags=0, p=0xe53a82a0) at ../../kern/uipc_socket.c:585
#7  0xc01626b8 in soo_write (fp=0xc4028440, uio=0xe53bdf0c, cred=0xc45f1e00, flags=0, p=0xe53a82a0) at ../../kern/sys_socket.c:81
#8  0xc015f420 in writev (p=0xe53a82a0, uap=0xe53bdf80) at ../../sys/file.h:162
#9  0xc01fe08d in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = -1078001617, tf_edi = 0, tf_esi = 2, tf_ebp = -1077938516, tf_isp = -449060908, tf_ebx = 674183052, 
      tf_edx = 0, tf_ecx = 32768, tf_eax = 121, tf_trapno = 7, tf_err = 2, tf_eip = 672109608, tf_cs = 31, tf_eflags = 663, tf_esp = -1077938560, tf_ss = 47})
    at ../../i386/i386/trap.c:1150
#10 0xc01f2e85 in Xint0x80_syscall ()
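
The addresses in the traces above are printed in decimal, which hides how
the faulting address (eva) relates to the page handed to vm_fault. The
small C helper below is only a reading aid, not part of the original
report: the names page_base, describe_fault and SKETCH_PAGE_SIZE are made
up for illustration, and the 4096-byte page size is the i386 value.

```c
#include <stdint.h>
#include <stdio.h>

/* i386 page size. Hypothetical name, not a kernel macro. */
#define SKETCH_PAGE_SIZE 4096u

/* Round a 32-bit virtual address down to the start of its page. */
uint32_t page_base(uint32_t va)
{
    return va & ~(SKETCH_PAGE_SIZE - 1u);
}

/* Print an address from the trace in hex with its page base and offset. */
void describe_fault(uint32_t eva)
{
    printf("eva=0x%08x page=0x%08x offset=0x%03x\n",
           (unsigned)eva, (unsigned)page_base(eva),
           (unsigned)(eva - page_base(eva)));
}
```

Calling describe_fault(674340862u) prints
"eva=0x28319ffe page=0x28319000 offset=0xffe". The page base 0x28319000 is
674336768 in decimal, the vaddr shown in frame #2 for proc 26316; proc
21846 shows the following page, 674340864 (0x2831a000). Both processes
faulted at the same user address, two bytes short of a page boundary.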

USER       PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND            UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
claranet 26316  0.0  0.3  4148 2636  ??  DL    1:43PM   0:10.68 /usr/local/sbin/  5000 26316 98582   1 -18  0  4148 2636 vmpfw  DL    ??    0:10.68  /usr/local/sbin/httpd -DMOD_FP -DMOD_SSL
claranet 21846  0.0  0.2  4020 2524  ??  DL    1:07PM   0:23.26 /usr/local/sbin/  5000 21846 98582   1 -18  0  4020 2524 vmpfw  DL    ??    0:23.26  /usr/local/sbin/httpd -DMOD_FP -DMOD_SSL
claranet 18675  0.0  0.3  4148 2624  ??  DL   12:41PM   0:19.35 /usr/local/sbin/  5000 18675 98582   3 -18  0  4148 2624 vmpfw  DL    ??    0:19.35  /usr/local/sbin/httpd -DMOD_FP -DMOD_SSL
etc etc.

Background: all the stuck processes belong to one particular customer
of ours, and lsof shows that each of them has one particular JPEG
open. The customer uploads a new version of the file (it's from a
webcam) every couple of minutes, so I'm guessing that this is causing
some form of deadlock.
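
To make the suspected access pattern concrete, here is a minimal C sketch
of what each stuck httpd appears to be doing. It assumes that Apache
serves the JPEG by mmap()ing it and passing the mapping to writev(). That
assumption is consistent with the writev -> sosend -> copyin -> vm_fault
frames above but is not confirmed here. The function name
send_mapped_file and its parameters are hypothetical and not taken from
the Apache source.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `header` followed by the contents of `path`
 * to the descriptor `sock`, using a read-only shared mapping of the file
 * as the second writev() buffer.  The kernel copies out of the mapping
 * while inside the write path, so a non-resident page is faulted in from
 * there (the copyin -> trap -> vm_fault sequence in the traces).
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t send_mapped_file(int sock, const char *header, const char *path)
{
    struct stat st;
    struct iovec iov[2];
    ssize_t sent;
    void *body;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    body = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping keeps the file referenced */
    if (body == MAP_FAILED)
        return -1;

    iov[0].iov_base = (void *)header;
    iov[0].iov_len = strlen(header);
    iov[1].iov_base = body;     /* file pages are read during writev() */
    iov[1].iov_len = (size_t)st.st_size;

    sent = writev(sock, iov, 2);
    munmap(body, (size_t)st.st_size);
    return sent;
}
```

In the scenario described above, the file behind the mapping lives on NFS
and is replaced every couple of minutes while several processes are
partway through a call like this one. Whether that is what actually
triggers the hang is the open question.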

Any suggestions on what this is, or on extra info I can provide to help
resolve it, would be gratefully received. I'm afraid I haven't
managed to find any obviously relevant PRs.

Mark
