Date:      Fri, 22 Oct 2004 21:42:17 +0200
From:      "Tom Jensen" <tom@motd.dk>
To:        "'Robert Watson'" <rwatson@freebsd.org>
Cc:        freebsd-current@freebsd.org
Subject:   RE: Machine hangs(Beta7), only reset button works
Message-ID:  <20041022193950.3032562EF@bart.motd.dk>
In-Reply-To: <Pine.NEB.3.96L.1041022045223.34569C-100000@fledge.watson.org>

This is a multi-part message in MIME format.

------=_NextPart_000_0180_01C4B880.00BBF860
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit

OK, I ran "show lockedvnods" as requested (the output is in the attached debug_info_2.txt). Is there anything else you'd like me to try?

- Tom

-----Original Message-----
From: owner-freebsd-current@freebsd.org
[mailto:owner-freebsd-current@freebsd.org] On Behalf Of Robert Watson
Sent: 22 October 2004 10:55
To: Tom Jensen
Cc: freebsd-current@freebsd.org
Subject: RE: Machine hangs(Beta7), only reset button works


On Fri, 22 Oct 2004, Tom Jensen wrote:

> OK, I managed to break into the debugger with a break over the serial
> console. Attached is some info from KDB (please note that I have no
> experience using the debugger).
> 
> I also noticed that I don't have "makeoptions DEBUG=-g" in my kernel
> config, so I have no debug kernel. I'll rebuild my kernel ASAP so I can
> provide more info sometime tomorrow.
> 
> Please let me know if more info is needed.

Great, into the debugger is good news.  Looking at the thread wait states,
it looks like you might have a vnode deadlock going on.  It would be useful
if you could use the "show lockedvnods" command to show what vnode locks are
held.  If you recompile your kernel with DEBUG_LOCKS (and compile all
modules in, as this produces a kernel that is ABI-incompatible with most
modules), you will get extra debugging information when you use that command
(it will say what locks were acquired where, not just what locks are held).
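The suspected vnode deadlock can be checked mechanically against the data in the attached debug_info_2.txt. The Python below is a minimal sketch, not a FreeBSD tool: it hard-codes the lock holders from "show lockedvnods" and the wchan values from "ps", then follows the wait-for edges until it revisits a pid. The constant 0xac offset between a vnode address and the wchan its waiters sleep on is an observation from this particular output (for example, sleepq_wait(c16088e0,...,c1608834) in "trace 1221"), not a documented structure layout; the names waits_for and find_cycle are illustrative.

```python
# Wait-for graph built by hand from the DDB output in debug_info_2.txt.
# Assumption: each wchan is the vnode address + 0xac (the vnode's lock),
# as observed in this dump only.

# Locked vnode address -> pid holding its exclusive lock ("show lockedvnods").
HOLDER = {
    0xc15bfbb8: 1221,  # VDIR ino 2 (root of ad0s1a), 14 pending
    0xc1608834: 1218,  # VDIR ino 24771
    0xc1630a8c: 1219,  # VDIR ino 323
    0xc1942960: 1218,  # VREG ino 1170
}

LOCK_OFFSET = 0xAC

# pid -> wchan it sleeps on ("ps"; rows elided as ".." are not included).
WCHAN = {
    1227: 0xc15bfc64,  # couriertcpd
    1222: 0xc15bfc64,  # postgres
    1221: 0xc16088e0,  # mount_smbfs
    1220: 0xc15bfc64,  # mount_smbfs
    1219: 0xc1942a0c,  # mount_smbfs
    1218: 0xc1630b38,  # mount_smbfs
}


def waits_for(pid):
    """Return the pid holding the vnode lock that `pid` sleeps on, or None."""
    wchan = WCHAN.get(pid)
    if wchan is None:
        return None
    return HOLDER.get(wchan - LOCK_OFFSET)


def find_cycle(start):
    """Follow wait-for edges from `start`; return the cycle reached, or []."""
    path = []
    pid = start
    while pid is not None and pid not in path:
        path.append(pid)
        pid = waits_for(pid)
    if pid is None:
        return []  # chain ended at a pid that is not waiting: no deadlock
    return path[path.index(pid):]


if __name__ == "__main__":
    for pid in sorted(WCHAN):
        print(pid, "waits for", waits_for(pid), "cycle:", find_cycle(pid))
```

With this data, find_cycle() reaches the same two-pid cycle (1218 and 1219) from every sleeping pid listed. Pid 1218 holds ino 24771 and ino 1170 and wants ino 323, while pid 1219 holds ino 323 and wants ino 1170. Pid 1221, which holds the root vnode, waits behind 1218, so every process that needs the root directory blocks as well. All three traces show these lookups happening inside kldload() issued by mount_smbfs.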

It looks like you're using SMBFS; could you say a little about how it's used,
and whether things like home directories, etc., are mounted that way? Is it an
element of the system you could remove during testing to see if the problem
goes away? Also, are you using NFS or other distributed file systems?

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research



> 
> - Tom
> 
> -----Original Message-----
> From: owner-freebsd-current@freebsd.org 
> [mailto:owner-freebsd-current@freebsd.org] On Behalf Of Robert Watson
> Sent: 21 October 2004 13:46
> To: Tom Jensen
> Cc: freebsd-current@freebsd.org
> Subject: Re: Machine hangs(Beta7), only reset button works
> 
> 
> On Thu, 21 Oct 2004, Tom Jensen wrote:
> 
> > I've been seeing a pretty strange problem lately with my server. 
> > 
> > The box freezes completely, typically when it has finished running the
> > first part of my backup script. After that it is impossible to log in
> > on the console or via SSH; the freeze even happens while I'm sitting
> > at a terminal working.
> > 
> > There is no indication in the log files of what's causing the
> > problem, and it doesn't break into the debugger either :-(
> 
> This should probably be in debugging lore somewhere, but I've observed 
> that it's often possible to break into the debugger using a break over 
> serial console when it's not possible to break in using syscons.  This 
> is because syscons requires the Giant lock, so if the freeze happens 
> because a thread is spinning while holding Giant, you can't get in.  
> This needs to be fixed but hasn't been yet, so in the meantime the best
> advice is often to use a serial console to generate the break.
> 
> If you still can't get into the debugger, you might try some of the 
> various watchdog drivers -- some hardware comes with built in watchdog 
> parts, such as ichwd(4), or you could try options MP_WATCHDOG on an 
> SMP box if you're willing to dedicate a CPU to running as a watchdog for
> the other CPU(s).
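The core of any watchdog is a deadline that the healthy system keeps pushing forward, plus an independent observer (a hardware timer for ichwd(4), a dedicated CPU with MP_WATCHDOG) that acts when the deadline passes. The Python below is a minimal, generic sketch of that pattern only; it is not how ichwd(4) or MP_WATCHDOG are implemented, and the names Watchdog, pet, check and on_expire are hypothetical. Time is passed in explicitly so the logic can be exercised without sleeping.

```python
class Watchdog:
    """Deadline watchdog: fires once if not petted within `timeout` units.

    `on_expire` is a placeholder for whatever a real watchdog does on
    timeout (for example, entering the debugger or resetting the machine).
    """

    def __init__(self, timeout, on_expire):
        self.timeout = timeout
        self.on_expire = on_expire
        self.last_pet = None  # not armed until the first pet
        self.fired = False

    def pet(self, now):
        """Called periodically by the monitored system while it is healthy."""
        self.last_pet = now

    def check(self, now):
        """Called by the independent observer; returns True once fired."""
        if self.fired or self.last_pet is None:
            return self.fired
        if now - self.last_pet > self.timeout:
            self.fired = True
            self.on_expire()
        return self.fired
```

For example, with timeout=10, petting at time 0 and checking at time 5 returns False, while checking at time 11 with no further pet calls on_expire once and returns True. The key design point, as with a watchdog CPU, is that check() runs on a path that does not depend on the part of the system that may be hung.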
> 
> Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
> robert@fledge.watson.org      Principal Research Scientist, McAfee Research
> 
> 
> > 
> > The backup script is really simple: it creates a .tgz file of a given
> > directory, mounts a Windows share (mount_smbfs) and copies the file
> > there. Cron runs the script six times, all starting at the same time,
> > on six different directories, and the box freezes after the tar
> > processes finish.
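The original backup script was not posted, so the following is only a hypothetical Python reconstruction of the job as described, useful for reproducing the "six jobs at once" load pattern with and without the SMBFS step. The function names, paths and mount command are placeholders; the mount step is optional so the sketch runs without an SMB server, and threads stand in for the six separate processes that cron starts.

```python
import shutil
import subprocess
import tarfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def backup_dir(src, staging, dest, mount_cmd=None):
    """Archive `src` to <staging>/<name>.tgz, optionally run a mount
    command, then copy the archive into `dest`. Returns the copied path."""
    src = Path(src)
    archive = Path(staging) / (src.name + ".tgz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(str(src), arcname=src.name)
    if mount_cmd is not None:
        # Hypothetical, e.g. ["mount_smbfs", "//user@server/share", "/mnt/backup"]
        subprocess.run(mount_cmd, check=True)
    return Path(shutil.copy2(archive, dest))


def run_all(dirs, staging, dest, mount_cmd=None):
    """Start one job per directory at the same moment, as the cron entries do."""
    with ThreadPoolExecutor(max_workers=max(1, len(dirs))) as pool:
        futures = [pool.submit(backup_dir, d, staging, dest, mount_cmd)
                   for d in dirs]
        return [f.result() for f in futures]
```

Passing mount_cmd=None and a local directory as dest removes SMBFS from the picture entirely, which matches the kind of elimination test suggested later in the thread.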
> > 
> > Attached are the dmesg.boot and the latest top output. I don't know if
> > they're any use, but it seems rather strange that a lot of processes are
> > in STATE "ufs" (not sure what this means, but I don't see this when the
> > box is running normally).
> > 
> > The kernel is mostly GENERIC, with the following modifications:
> > 
> > options         IPFIREWALL
> > options         IPFIREWALL_VERBOSE
> > options         IPFIREWALL_VERBOSE_LIMIT=400
> > options         IPDIVERT
> > options         IPSEC
> > options         IPSEC_ESP
> > options         IPSEC_DEBUG
> > device ath
> > device ath_hal
> > options         KDB
> > options         DDB          
> > 
> > bash-2.05b# uname -a
> > FreeBSD bart.motd.dk 5.3-BETA7 FreeBSD 5.3-BETA7 #6: Tue Oct 19 00:36:59 CEST 2004     root@bart.motd.dk:/usr/obj/usr/src/sys/GW  i386
> > 
> > If any more info is needed, please let me know.
> > 
> > Best regards
> > 
> > - Tom
> > 
> > 
> 
> _______________________________________________
> freebsd-current@freebsd.org mailing list 
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
> 


------=_NextPart_000_0180_01C4B880.00BBF860
Content-Type: text/plain;
	name="debug_info_2.txt"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="debug_info_2.txt"

KDB: enter: Line break on console
[thread 100003]
Stopped at      kdb_enter+0x2b: nop
db> trace
kdb_enter(c08489ef) at kdb_enter+0x2b
siointr1(c1487800,c092eb20,0,c08487b3,6ad) at siointr1+0xce
siointr(c1487800) at siointr+0x21
intr_execute_handlers(c08cff00,c9bc1ca4,0,c14a5000,c14a5018) at intr_execute_handlers+0xa9
atpic_handle_intr(4) at atpic_handle_intr+0x92
Xatpic_intr4() at Xatpic_intr4+0x20
--- interrupt, eip = 0xc0a4c5bd, esp = 0xc9bc1ce8, ebp = 0xc9bc1ce8 ---
acpi_cpu_c1(c08f08a0,1,1,1,c06009f8) at acpi_cpu_c1+0x5
acpi_cpu_idle(c9bc1d1c,c0600a55,c13c3c5c,c9bc1d34,c0600838) at acpi_cpu_idle+0xd9
cpu_idle(c13c3c5c,c9bc1d34,c0600838,0,c9bc1d48) at cpu_idle+0x28
idle_proc(0,c9bc1d48,0,c06009f8,0) at idle_proc+0x5d
fork_exit(c06009f8,0,c9bc1d48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc9bc1d7c, ebp = 0 ---
db> ps
  pid   proc     uarea   uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
 1227 c1c00e20 cedd7000    0   697   697 4000000 [SLPQ ufs 0xc15bfc64][SLP] couriertcpd
 ..
 1222 c1d31e20 cee5c000   70   612   612 0000000 [SLPQ ufs 0xc15bfc64][SLP] postgres
 1221 c1d7c1c4 ceea8000    0  1174  1160 0004000 [SLPQ ufs 0xc16088e0][SLP] mount_smbfs
 1220 c1d7c54c ceeaa000    0  1175  1159 0004000 [SLPQ ufs 0xc15bfc64][SLP] mount_smbfs
 1219 c1c008d4 cedd4000    0  1180  1163 0004000 [SLPQ ufs 0xc1942a0c][SLP] mount_smbfs
 1218 c1d2d710 cee4f000    0  1176  1158 0004000 [SLPQ ufs 0xc1630b38][SLP] mount_smbfs
 ..
db> show lockedvnods
Locked vnodes
0xc15bfbb8: tag ufs, type VDIR, usecount 182, writecount 0, refcount 1, flags (VV_ROOT|VV_OBJBUF), lock type ufs: EXCL (count 1) by thread 0xc1d2f960 (pid 1221) with 14 pending
        ino 2, on dev ad0s1a (4, 15)
0xc1608834: tag ufs, type VDIR, usecount 2, writecount 0, refcount 1, lock type ufs: EXCL (count 1) by thread 0xc1d2eaf0 (pid 1218) with 1 pending
        ino 24771, on dev ad0s1a (4, 15)
0xc1630a8c: tag ufs, type VDIR, usecount 2, writecount 0, refcount 1, lock type ufs: EXCL (count 1) by thread 0xc1c02320 (pid 1219) with 1 pending
        ino 323, on dev ad0s1a (4, 15)
0xc1942960: tag ufs, type VREG, usecount 2, writecount 0, refcount 0, flags (VV_OBJBUF), lock type ufs: EXCL (count 1) by thread 0xc1d2eaf0 (pid 1218) with 1 pending
        ino 1170, on dev ad0s1a (4, 15)
db> trace 1221
sched_switch(c1d2f960,0,1) at sched_switch+0x16f
mi_switch(1,0) at mi_switch+0x264
sleepq_switch(c16088e0,cee437dc,c0618cb1,c16088e0,0) at sleepq_switch+0xe0
sleepq_wait(c16088e0,0,0,0,c1608834) at sleepq_wait+0x30
msleep(c16088e0,c08f18ec,50,c082ffdf,0) at msleep+0x2f1
acquire(cee43834,1000040,600,c1d2f960,0) at acquire+0x9e
debuglockmgr(c16088e0,1010002,c1608834,c1d2f960,c083145e) at debuglockmgr+0x3de
ufs_lock(cee4386c,cee43888,c066ccf3,cee4386c,c08d28e0) at ufs_lock+0x4d
ufs_vnoperate(cee4386c) at ufs_vnoperate+0x13
debug_vn_lock(c1608834,10002,c1d2f960,c0831d13,7ec) at debug_vn_lock+0xdb
vget(c1608834,2,c1d2f960,be,c1d2f960) at vget+0xd3
vfs_cache_lookup(cee4394c,cee43968,c065cb4b,cee4394c,c1d2f960) at vfs_cache_lookup+0x1dd
ufs_vnoperate(cee4394c) at ufs_vnoperate+0x13
lookup(cee43c18,c1d2f960,c08ff500,1,0) at lookup+0x2e3
namei(cee43c18,0,c1d2f960,0,0) at namei+0x204
vn_open_cred(cee43c18,cee43af4,0,c1bcc180,ffffffff) at vn_open_cred+0x238
vn_open(cee43c18,cee43af4,0,ffffffff) at vn_open+0x1e
linker_hints_lookup(c0888600,c,c1508400,5,0) at linker_hints_lookup+0x109
linker_search_module(c1508400,5,0,400,0) at linker_search_module+0x43
linker_load_module(0,c1508400,0,0,cee43cdc) at linker_load_module+0x80
kldload(c1d2f960,cee43d14,1,5,292) at kldload+0xcb
syscall(2f,2f,2f,bfbfed98,bfbfed7c) at syscall+0x213
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (304, FreeBSD ELF32, kldload), eip = 0x280cc257, esp = 0xbfbfe28c, ebp = 0xbfbfed34 ---
db> trace 1218
sched_switch(c1d2eaf0,0,1) at sched_switch+0x16f
mi_switch(1,0) at mi_switch+0x264
sleepq_switch(c1630b38,cee2868c,c0618cb1,c1630b38,0) at sleepq_switch+0xe0
sleepq_wait(c1630b38,0,0,0,c1630a8c) at sleepq_wait+0x30
msleep(c1630b38,c08f0fec,50,c082ffdf,0) at msleep+0x2f1
acquire(cee286e4,1000040,600,c1d2eaf0,0) at acquire+0x9e
debuglockmgr(c1630b38,1010002,c1630a8c,c1d2eaf0,c083145e) at debuglockmgr+0x3de
ufs_lock(cee2871c,cee28738,c066ccf3,cee2871c,c08d28e0) at ufs_lock+0x4d
ufs_vnoperate(cee2871c) at ufs_vnoperate+0x13
debug_vn_lock(c1630a8c,10002,c1d2eaf0,c0831d13,7ec) at debug_vn_lock+0xdb
vget(c1630a8c,2,c1d2eaf0,101,c1d2eaf0) at vget+0xd3
vfs_cache_lookup(cee287fc,cee28818,c065cb4b,cee287fc,c1d2eaf0) at vfs_cache_lookup+0x1dd
ufs_vnoperate(cee287fc) at ufs_vnoperate+0x13
lookup(cee28ac8,c1d2eaf0,c08ff500,1,0) at lookup+0x2e3
namei(cee28ac8,0,c1d2eaf0,0,0) at namei+0x204
vn_open_cred(cee28ac8,cee289a4,0,c1bd6680,ffffffff) at vn_open_cred+0x238
vn_open(cee28ac8,cee289a4,0,ffffffff) at vn_open+0x1e
linker_hints_lookup(c0888600,c,c1df1f88,8,c1df3160) at linker_hints_lookup+0x109
linker_search_module(c1df1f88,8,c1df3160,cee28b7c,c1df2ad0) at linker_search_module+0x43
linker_load_module(0,c1df1f88,c1a9dc00,c1df3160,0) at linker_load_module+0x80
linker_load_dependencies(c1a9dc00,c1a9dc00,4,c1df3000,cee28c04) at linker_load_dependencies+0x14a
link_elf_load_file(c088c9a0,c1950740,cee28c90) at link_elf_load_file+0x410
linker_load_file(c1950740,cee28cb0,400,0,c14a3000) at linker_load_file+0x91
linker_load_module(0,c14a3000,0,0,cee28cdc) at linker_load_module+0xb7
kldload(c1d2eaf0,cee28d14,1,4,296) at kldload+0xcb
syscall(2f,2f,2f,bfbfed80,bfbfed64) at syscall+0x213
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (304, FreeBSD ELF32, kldload), eip = 0x280cc257, esp = 0xbfbfe27c, ebp = 0xbfbfed1c ---
db> trace 1219
sched_switch(c1c02320,0,1) at sched_switch+0x16f
mi_switch(1,0) at mi_switch+0x264
sleepq_switch(c1942a0c,cedbd734,c0618cb1,c1942a0c,0) at sleepq_switch+0xe0
sleepq_wait(c1942a0c,0,0,0,c1942960) at sleepq_wait+0x30
msleep(c1942a0c,c08f0e3c,50,c082ffdf,0) at msleep+0x2f1
acquire(cedbd78c,1000040,600,c1c02320,0) at acquire+0x9e
debuglockmgr(c1942a0c,1010002,c1942960,c1c02320,c083145e) at debuglockmgr+0x3de
ufs_lock(cedbd7c4,cedbd7e0,c066ccf3,cedbd7c4,c08d28e0) at ufs_lock+0x4d
ufs_vnoperate(cedbd7c4) at ufs_vnoperate+0x13
debug_vn_lock(c1942960,10002,c1c02320,c0831d13,7ec) at debug_vn_lock+0xdb
vget(c1942960,2,c1c02320,117a,c1c02320) at vget+0xd3
vfs_cache_lookup(cedbd8a4,cedbd8c0,c065cb4b,cedbd8a4,c1c02320) at vfs_cache_lookup+0x1dd
ufs_vnoperate(cedbd8a4) at ufs_vnoperate+0x13
lookup(cedbda50,c1c02320,c08ff500,1,0) at lookup+0x2e3
namei(cedbda50,0,c1c02320,0,0) at namei+0x204
vn_open_cred(cedbda50,cedbda2c,0,c15d3780,ffffffff) at vn_open_cred+0x238
vn_open(cedbda50,cedbda2c,0,ffffffff) at vn_open+0x1e
linker_lookup_file(c0888600,c,c1506c00,5,0) at linker_lookup_file+0x104
linker_hints_lookup(c0888600,c,c1506c00,5,0) at linker_hints_lookup+0x50e
linker_search_module(c1506c00,5,0,400,0) at linker_search_module+0x43
linker_load_module(0,c1506c00,0,0,cedbdcdc) at linker_load_module+0x80
kldload(c1c02320,cedbdd14,1,4,292) at kldload+0xcb
syscall(2f,2f,2f,bfbfed8c,bfbfed70) at syscall+0x213
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (304, FreeBSD ELF32, kldload), eip = 0x280cc257, esp = 0xbfbfe28c, ebp = 0xbfbfed28 ---

------=_NextPart_000_0180_01C4B880.00BBF860--


