From: "Tom Jensen"
To: "'Robert Watson'"
Cc: freebsd-current@freebsd.org
Date: Fri, 22 Oct 2004 21:42:17 +0200
Subject: RE: Machine hangs(Beta7), only reset button works
Ok, I ran "show lockedvnods" as requested; the output is attached. Is there anything else you want me to try?

- Tom

-----Original Message-----
From: owner-freebsd-current@freebsd.org [mailto:owner-freebsd-current@freebsd.org] On Behalf Of Robert Watson
Sent: 22 October 2004 10:55
To: Tom Jensen
Cc: freebsd-current@freebsd.org
Subject: RE: Machine hangs(Beta7), only reset button works

On Fri, 22 Oct 2004, Tom Jensen wrote:

> Ok, I managed to break into the debugger with a break over the serial
> console; attached is some info from KDB (please note that I have no
> knowledge about using the debugger).
>
> I also noticed that I don't have "makeoptions DEBUG=-g" in my kernel
> config, so I have no debug kernel. I'll rebuild my kernel ASAP so I'm
> able to provide more info sometime tomorrow.
>
> Please let me know if more info is needed.

Great, getting into the debugger is good news. Looking at the thread wait
states, it looks like you might have a vnode deadlock going on. It would be
useful if you could use the "show lockedvnods" command to show which vnode
locks are held. If you recompile your kernel with DEBUG_LOCKS (and compile
all modules in, as this produces a kernel that is ABI-incompatible with most
modules), you will get extra debugging information when you use that
command: it will say where each lock was acquired, not just which locks are
held.

It looks like you're using SMBFS. Could you say a little about how it's
used, and whether things like home directories are mounted that way? Is it
an element of the system you could remove during testing to see if the
problem goes away? Also, are you using NFS or other distributed file
systems?
Robert N M Watson          FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org   Principal Research Scientist, McAfee Research

>
> - Tom
>
> -----Original Message-----
> From: owner-freebsd-current@freebsd.org
> [mailto:owner-freebsd-current@freebsd.org] On Behalf Of Robert Watson
> Sent: 21 October 2004 13:46
> To: Tom Jensen
> Cc: freebsd-current@freebsd.org
> Subject: Re: Machine hangs(Beta7), only reset button works
>
>
> On Thu, 21 Oct 2004, Tom Jensen wrote:
>
> > I've been seeing a pretty strange problem lately with my server.
> >
> > The box freezes completely, typically when it's done running the first
> > part of my backup script, and then there is no way to log in on the
> > console or over SSH. The freeze even happens while I'm sitting at a
> > terminal working.
> >
> > There is no indication in the log files etc. of what's causing the
> > problem, and it's not breaking into the debugger either :-(
>
> This should probably be in debugging lore somewhere, but I've observed
> that it's often possible to break into the debugger using a break over
> the serial console when it's not possible to break in using syscons.
> This is because syscons requires the Giant lock, so if the freeze happens
> because a thread is spinning while holding Giant, you can't get in.
> This needs to be fixed but hasn't been yet, so in the meantime a useful
> piece of advice is often to use a serial console to generate the break.
>
> If you still can't get into the debugger, you might try some of the
> various watchdog drivers: some hardware comes with built-in watchdog
> parts, such as ichwd(4), or you could try options MP_WATCHDOG on an
> SMP box if you're willing to dedicate a CPU to running as a watchdog
> for the other CPU(s).
>
> Robert N M Watson          FreeBSD Core Team, TrustedBSD Projects
> robert@fledge.watson.org   Principal Research Scientist, McAfee Research
>
> >
> > The backup script is really simple: it creates a .tgz file of a given
> > directory, mounts a Windows share (mount_smbfs) and copies the file.
> > The script is run by cron six times (all starting at the same time) in
> > six different directories, and the box freezes after the tar processes
> > finish.
> >
> > Attached are the dmesg.boot and the latest top output. I don't know if
> > they're of any use, but it seems rather strange that a lot of processes
> > are in STATE "ufs" (not sure what this means, but I don't see this when
> > the box is running normally).
> >
> > The kernel is mostly GENERIC with the following modifications:
> >
> > options IPFIREWALL
> > options IPFIREWALL_VERBOSE
> > options IPFIREWALL_VERBOSE_LIMIT=400
> > options IPDIVERT
> > options IPSEC
> > options IPSEC_ESP
> > options IPSEC_DEBUG
> > device ath
> > device ath_hal
> > options KDB
> > options DDB
> >
> > bash-2.05b# uname -a
> > FreeBSD bart.motd.dk 5.3-BETA7 FreeBSD 5.3-BETA7 #6: Tue Oct 19 00:36:59 CEST 2004 root@bart.motd.dk:/usr/obj/usr/src/sys/GW i386
> >
> > If any more info is needed, please let me know.
> >
> > Best regards
> >
> > - Tom
> >
>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

[Attachment: debug_info_2.txt]

KDB: enter: Line break on console
[thread 100003]
Stopped at kdb_enter+0x2b: nop
db> trace
kdb_enter(c08489ef) at kdb_enter+0x2b
siointr1(c1487800,c092eb20,0,c08487b3,6ad) at siointr1+0xce
siointr(c1487800) at siointr+0x21
intr_execute_handlers(c08cff00,c9bc1ca4,0,c14a5000,c14a5018) at intr_execute_handlers+0xa9
atpic_handle_intr(4) at atpic_handle_intr+0x92
Xatpic_intr4() at Xatpic_intr4+0x20
--- interrupt, eip = 0xc0a4c5bd, esp = 0xc9bc1ce8, ebp = 0xc9bc1ce8 ---
acpi_cpu_c1(c08f08a0,1,1,1,c06009f8) at acpi_cpu_c1+0x5
acpi_cpu_idle(c9bc1d1c,c0600a55,c13c3c5c,c9bc1d34,c0600838) at acpi_cpu_idle+0xd9
cpu_idle(c13c3c5c,c9bc1d34,c0600838,0,c9bc1d48) at cpu_idle+0x28
idle_proc(0,c9bc1d48,0,c06009f8,0) at idle_proc+0x5d
fork_exit(c06009f8,0,c9bc1d48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc9bc1d7c, ebp = 0 ---
db> ps
pid proc uarea uid ppid pgrp flag stat wmesg wchan cmd
1227 c1c00e20 cedd7000 0 697 697 4000000 [SLPQ ufs 0xc15bfc64][SLP] couriertcpd
..
1222 c1d31e20 cee5c000 70 612 612 0000000 [SLPQ ufs 0xc15bfc64][SLP] postgres
1221 c1d7c1c4 ceea8000 0 1174 1160 0004000 [SLPQ ufs 0xc16088e0][SLP] mount_smbfs
1220 c1d7c54c ceeaa000 0 1175 1159 0004000 [SLPQ ufs 0xc15bfc64][SLP] mount_smbfs
1219 c1c008d4 cedd4000 0 1180 1163 0004000 [SLPQ ufs 0xc1942a0c][SLP] mount_smbfs
1218 c1d2d710 cee4f000 0 1176 1158 0004000 [SLPQ ufs 0xc1630b38][SLP] mount_smbfs
..
db> show lockedvnods
Locked vnodes

0xc15bfbb8: tag ufs, type VDIR, usecount 182, writecount 0, refcount 1, flags (VV_ROOT|VV_OBJBUF), lock type ufs: EXCL (count 1) by thread 0xc1d2f960 (pid 1221) with 14 pending
        ino 2, on dev ad0s1a (4, 15)

0xc1608834: tag ufs, type VDIR, usecount 2, writecount 0, refcount 1, lock type ufs: EXCL (count 1) by thread 0xc1d2eaf0 (pid 1218) with 1 pending
        ino 24771, on dev ad0s1a (4, 15)

0xc1630a8c: tag ufs, type VDIR, usecount 2, writecount 0, refcount 1, lock type ufs: EXCL (count 1) by thread 0xc1c02320 (pid 1219) with 1 pending
        ino 323, on dev ad0s1a (4, 15)

0xc1942960: tag ufs, type VREG, usecount 2, writecount 0, refcount 0, flags (VV_OBJBUF), lock type ufs: EXCL (count 1) by thread 0xc1d2eaf0 (pid 1218) with 1 pending
        ino 1170, on dev ad0s1a (4, 15)
db> trace 1221
sched_switch(c1d2f960,0,1) at sched_switch+0x16f
mi_switch(1,0) at mi_switch+0x264
sleepq_switch(c16088e0,cee437dc,c0618cb1,c16088e0,0) at sleepq_switch+0xe0
sleepq_wait(c16088e0,0,0,0,c1608834) at sleepq_wait+0x30
msleep(c16088e0,c08f18ec,50,c082ffdf,0) at msleep+0x2f1
acquire(cee43834,1000040,600,c1d2f960,0) at acquire+0x9e
debuglockmgr(c16088e0,1010002,c1608834,c1d2f960,c083145e) at debuglockmgr+0x3de
ufs_lock(cee4386c,cee43888,c066ccf3,cee4386c,c08d28e0) at ufs_lock+0x4d
ufs_vnoperate(cee4386c) at ufs_vnoperate+0x13
debug_vn_lock(c1608834,10002,c1d2f960,c0831d13,7ec) at debug_vn_lock+0xdb
vget(c1608834,2,c1d2f960,be,c1d2f960) at vget+0xd3
vfs_cache_lookup(cee4394c,cee43968,c065cb4b,cee4394c,c1d2f960) at vfs_cache_lookup+0x1dd
ufs_vnoperate(cee4394c) at ufs_vnoperate+0x13
lookup(cee43c18,c1d2f960,c08ff500,1,0) at lookup+0x2e3
namei(cee43c18,0,c1d2f960,0,0) at namei+0x204
vn_open_cred(cee43c18,cee43af4,0,c1bcc180,ffffffff) at vn_open_cred+0x238
vn_open(cee43c18,cee43af4,0,ffffffff) at vn_open+0x1e
linker_hints_lookup(c0888600,c,c1508400,5,0) at linker_hints_lookup+0x109
linker_search_module(c1508400,5,0,400,0) at linker_search_module+0x43
linker_load_module(0,c1508400,0,0,cee43cdc) at linker_load_module+0x80
kldload(c1d2f960,cee43d14,1,5,292) at kldload+0xcb
syscall(2f,2f,2f,bfbfed98,bfbfed7c) at syscall+0x213
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (304, FreeBSD ELF32, kldload), eip = 0x280cc257, esp = 0xbfbfe28c, ebp = 0xbfbfed34 ---
db> trace 1218
sched_switch(c1d2eaf0,0,1) at sched_switch+0x16f
mi_switch(1,0) at mi_switch+0x264
sleepq_switch(c1630b38,cee2868c,c0618cb1,c1630b38,0) at sleepq_switch+0xe0
sleepq_wait(c1630b38,0,0,0,c1630a8c) at sleepq_wait+0x30
msleep(c1630b38,c08f0fec,50,c082ffdf,0) at msleep+0x2f1
acquire(cee286e4,1000040,600,c1d2eaf0,0) at acquire+0x9e
debuglockmgr(c1630b38,1010002,c1630a8c,c1d2eaf0,c083145e) at debuglockmgr+0x3de
ufs_lock(cee2871c,cee28738,c066ccf3,cee2871c,c08d28e0) at ufs_lock+0x4d
ufs_vnoperate(cee2871c) at ufs_vnoperate+0x13
debug_vn_lock(c1630a8c,10002,c1d2eaf0,c0831d13,7ec) at debug_vn_lock+0xdb
vget(c1630a8c,2,c1d2eaf0,101,c1d2eaf0) at vget+0xd3
vfs_cache_lookup(cee287fc,cee28818,c065cb4b,cee287fc,c1d2eaf0) at vfs_cache_lookup+0x1dd
ufs_vnoperate(cee287fc) at ufs_vnoperate+0x13
lookup(cee28ac8,c1d2eaf0,c08ff500,1,0) at lookup+0x2e3
namei(cee28ac8,0,c1d2eaf0,0,0) at namei+0x204
vn_open_cred(cee28ac8,cee289a4,0,c1bd6680,ffffffff) at vn_open_cred+0x238
vn_open(cee28ac8,cee289a4,0,ffffffff) at vn_open+0x1e
linker_hints_lookup(c0888600,c,c1df1f88,8,c1df3160) at linker_hints_lookup+0x109
linker_search_module(c1df1f88,8,c1df3160,cee28b7c,c1df2ad0) at linker_search_module+0x43
linker_load_module(0,c1df1f88,c1a9dc00,c1df3160,0) at linker_load_module+0x80
linker_load_dependencies(c1a9dc00,c1a9dc00,4,c1df3000,cee28c04) at linker_load_dependencies+0x14a
link_elf_load_file(c088c9a0,c1950740,cee28c90) at link_elf_load_file+0x410
linker_load_file(c1950740,cee28cb0,400,0,c14a3000) at linker_load_file+0x91
linker_load_module(0,c14a3000,0,0,cee28cdc) at linker_load_module+0xb7
kldload(c1d2eaf0,cee28d14,1,4,296) at kldload+0xcb
syscall(2f,2f,2f,bfbfed80,bfbfed64) at syscall+0x213
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (304, FreeBSD ELF32, kldload), eip = 0x280cc257, esp = 0xbfbfe27c, ebp = 0xbfbfed1c ---
db> trace 1219
sched_switch(c1c02320,0,1) at sched_switch+0x16f
mi_switch(1,0) at mi_switch+0x264
sleepq_switch(c1942a0c,cedbd734,c0618cb1,c1942a0c,0) at sleepq_switch+0xe0
sleepq_wait(c1942a0c,0,0,0,c1942960) at sleepq_wait+0x30
msleep(c1942a0c,c08f0e3c,50,c082ffdf,0) at msleep+0x2f1
acquire(cedbd78c,1000040,600,c1c02320,0) at acquire+0x9e
debuglockmgr(c1942a0c,1010002,c1942960,c1c02320,c083145e) at debuglockmgr+0x3de
ufs_lock(cedbd7c4,cedbd7e0,c066ccf3,cedbd7c4,c08d28e0) at ufs_lock+0x4d
ufs_vnoperate(cedbd7c4) at ufs_vnoperate+0x13
debug_vn_lock(c1942960,10002,c1c02320,c0831d13,7ec) at debug_vn_lock+0xdb
vget(c1942960,2,c1c02320,117a,c1c02320) at vget+0xd3
vfs_cache_lookup(cedbd8a4,cedbd8c0,c065cb4b,cedbd8a4,c1c02320) at vfs_cache_lookup+0x1dd
ufs_vnoperate(cedbd8a4) at ufs_vnoperate+0x13
lookup(cedbda50,c1c02320,c08ff500,1,0) at lookup+0x2e3
namei(cedbda50,0,c1c02320,0,0) at namei+0x204
vn_open_cred(cedbda50,cedbda2c,0,c15d3780,ffffffff) at vn_open_cred+0x238
vn_open(cedbda50,cedbda2c,0,ffffffff) at vn_open+0x1e
linker_lookup_file(c0888600,c,c1506c00,5,0) at linker_lookup_file+0x104
linker_hints_lookup(c0888600,c,c1506c00,5,0) at linker_hints_lookup+0x50e
linker_search_module(c1506c00,5,0,400,0) at linker_search_module+0x43
linker_load_module(0,c1506c00,0,0,cedbdcdc) at linker_load_module+0x80
kldload(c1c02320,cedbdd14,1,4,292) at kldload+0xcb
syscall(2f,2f,2f,bfbfed8c,bfbfed70) at syscall+0x213
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (304, FreeBSD ELF32, kldload), eip = 0x280cc257, esp = 0xbfbfe28c, ebp = 0xbfbfed28 ---