From: Dave Dolson
To: "'freebsd-stable@freebsd.org'"
Date: Tue, 29 Jul 2003 17:01:09 -0400
Subject: RE: kernel deadlock

To follow up, I've discovered that the system has exhausted its
"FFS node" malloc type.

From vmstat on the core file, the "FFS node" MALLOC type is full:

Memory statistics by type                               Type  Kern
        Type   InUse   MemUse  HighUse    Limit Requests Limit Limit Size(s)
...
    FFS node  409600  102400K  102400K  102400K  1138048     3     0  256
...

The stress test recursively creates a directory and then cd's into it,
trying to build a tree 1,000,000 levels deep.

You might say "don't do that", but this doesn't require any special
privilege, so it is a potential DoS attack available to any user.

I'm wondering why the MALLOC is done with M_WAITOK; it seems like an
allocation that could reasonably be allowed to fail.
Or, why aren't the cached inodes being purged?
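As a sanity check on the "FFS node" row, the figures can be multiplied out. This is only a sketch: the helper names are made up for illustration, and it assumes that Size(s) means every allocation of this type takes 256 bytes, that "K" means 1024 bytes, and that the debugger prints the first argument of the quoted malloc() frame (100) in hex.

```python
# Figures copied from the "FFS node" row of the vmstat output.
IN_USE = 409_600        # InUse: allocations currently held
ALLOC_SIZE = 256        # Size(s): bytes per allocation (assumed uniform)
MEM_USE_KB = 102_400    # MemUse column, in K
LIMIT_KB = 102_400      # Limit column, in K

# First argument of the malloc() frame in the quoted cron backtrace,
# assumed to be printed in hexadecimal.
MALLOC_ARG_HEX = "100"


def usage_kb(in_use: int, alloc_size: int) -> int:
    """Memory consumed by `in_use` allocations, in units of 1024 bytes."""
    return in_use * alloc_size // 1024


def headroom(limit_kb: int, used_kb: int, alloc_size: int) -> int:
    """Number of further allocations that still fit under the limit."""
    return (limit_kb - used_kb) * 1024 // alloc_size


used = usage_kb(IN_USE, ALLOC_SIZE)
print(f"computed use: {used}K  reported MemUse: {MEM_USE_KB}K  limit: {LIMIT_KB}K")
print(f"allocations that still fit: {headroom(LIMIT_KB, used, ALLOC_SIZE)}")
print(f"malloc request size: {int(MALLOC_ARG_HEX, 16)} bytes")
```

Under those assumptions the computed usage equals both MemUse and Limit, so there is no room left for even one more 256-byte allocation of this type, which is consistent with cron sleeping in malloc() on the "FFS node" wait channel.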
David Dolson (ddolson@sandvine.com, www.sandvine.com)

> -----Original Message-----
> From: Dave Dolson
> Sent: Tuesday, July 29, 2003 3:06 PM
> To: 'freebsd-stable@freebsd.org'
> Subject: kernel deadlock
>
>
> We have a reproducible problem with FreeBSD-4.7 which is
> apparently a deadlock.
> The system is undergoing a filesystem stress test.
>
> The machine is pingable, but console and most other features
> are unresponsive.
> The console debugger can be accessed.
> The following information is available with db's "ps".
> I suspect the wchan of "inode" to be what everything is waiting on.
> I'm not sure who is supposed to perform the waking.
>
> db> ps
>   pid proc     addr     uid ppid pgrp    flag stat wmesg    wchan    cmd
>   467 e75df000 e76d6000   0  141  141  000104    3 inode    c34ab600 sshd
>   466 e75df1a0 e76c9000  25  147  147  000104    3 inode    c34ab600 sendmail
>   465 e75df340 e76c4000   0  144  144  000104    3 inode    c34ab600 sendmail
>   464 e75df4e0 e76be000  25  147  147  000104    3 inode    c34ab600 sendmail
>   463 e75df680 e76ba000   0  144  144  000104    3 inode    c34ab600 sendmail
>   462 e75df820 e76b5000  25  147  147  000104    3 inode    c34ab600 sendmail
>   461 e75df9c0 e76b0000   0  144  144  000104    3 inode    c34ab600 sendmail
>   460 e75dfb60 e76ac000  25  147  147  000104    3 inode    c34ab600 sendmail
>   459 e75dfd00 e76a7000   0  144  144  000104    3 inode    c34ab600 sendmail
>   458 e75dfea0 e76a3000  25  147  147  000104    3 inode    c34ab600 sendmail
>   457 e75e0040 e769e000   0  144  144  000104    3 inode    c34ab600 sendmail
>   456 e75e01e0 e7698000  25  147  147  000104    3 inode    c34ab600 sendmail
>   455 e75e0380 e7693000   0  144  144  000104    3 inode    c34ab600 sendmail
>   454 e75e0520 e768f000  25  147  147  000104    3 inode    c34ab600 sendmail
>   453 e75e06c0 e768b000   0  144  144  000104    3 inode    c34ab600 sendmail
>   452 e75e0860 e7685000  25  147  147  000104    3 inode    c34ab600 sendmail
>   451 e75e0a00 e7681000   0  144  144  000104    3 inode    c34ab600 sendmail
>   450 e75e0ba0 e767d000  25  147  147  000104    3 inode    c34ab600 sendmail
>   449 e75e0d40 e7678000   0  144  144  000104    3 inode    c34ab600 sendmail
>   448 e75e0ee0 e7671000  25  147  147  000104    3 inode    c34ab600 sendmail
>   447 e75e1080 e766d000   0  144  144  000104    3 inode    c34ab600 sendmail
>   446 e75e1220 e7669000  25  147  147  000104    3 inode    c34ab600 sendmail
>   445 e75e13c0 e7664000   0  144  144  000104    3 inode    c34ab600 sendmail
>   444 e75e1560 e7660000  25  147  147  000104    3 inode    c34ab600 sendmail
>   443 e75e1700 e765b000   0  144  144  000104    3 inode    c34ab600 sendmail
>   442 e75e18a0 e7656000  25  147  147  000104    3 inode    c34ab600 sendmail
>   441 e75e1a40 e7652000   0  144  144  000104    3 inode    c34ab600 sendmail
>   440 e75e1be0 e764c000  25  147  147  000104    3 inode    c34ab600 sendmail
>   439 e75e1d80 e7647000   0  144  144  000104    3 inode    c34ab600 sendmail
>   438 e75e1f20 e7642000  25  147  147  000104    3 inode    c34ab600 sendmail
>   437 e75e20c0 e763e000   0  144  144  000104    3 inode    c34ab600 sendmail
>   436 e75e2260 e763a000  25  147  147  000104    3 inode    c34ab600 sendmail
>   435 e75e2400 e7635000   0  144  144  000104    3 inode    c34ab600 sendmail
>   434 e75e25a0 e7630000  25  147  147  000104    3 inode    c34ab600 sendmail
>   433 e75e2740 e762c000   0  144  144  000104    3 inode    c34ab600 sendmail
>   432 e75e28e0 e7626000  25  147  147  000104    3 inode    c34ab600 sendmail
>   431 e75e2a80 e7621000   0  144  144  000104    3 inode    c34ab600 sendmail
>   430 e75e2c20 e761c000  25  147  147  000104    3 inode    c34ab600 sendmail
>   429 e75e2dc0 e7618000   0  144  144  000104    3 inode    c34ab600 sendmail
>   428 e75e2f60 e7613000  25  147  147  000104    3 inode    c34ab600 sendmail
>   427 e75e3100 e760c000   0  144  144  000104    3 inode    c34ab600 sendmail
>   426 e75e32a0 e7608000  25  147  147  000104    3 inode    c34ab600 sendmail
>   425 e75e3440 e7602000   0  144  144  000104    3 inode    c34ab600 sendmail
>   424 e75e35e0 e75fc000  25  147  147  000104    3 inode    c34ab600 sendmail
>   423 e75e3780 e75f8000   0  144  144  000104    3 inode    c34ab600 sendmail
>   422 e75e3920 e75f4000  25  147  147  000104    3 inode    c34ab600 sendmail
>   421 e75e3ac0 e75ee000   0  144  144  000104    3 inode    c34ab600 sendmail
>   420 e75e3c60 e75ea000  25  147  147  000104    3 inode    c34ab600 sendmail
>   419 e75e3e00 e75e6000   0  144  144  000104    3 inode    c34ab600 sendmail
>   418 dc358ea0 e75dc000  25  147  147  000104    3 inode    c34ab600 sendmail
>   417 dc359040 e75d7000   0  144  144  000104    3 inode    c34ab600 sendmail
>   416 dc3591e0 e75d1000  25  147  147  000104    3 inode    c34ab600 sendmail
>   415 dc359380 e75cd000   0  144  144  000104    3 inode    c34ab600 sendmail
>   414 dc359520 e75c8000  25  147  147  000104    3 inode    c34ab600 sendmail
>   413 dc3596c0 e75c4000   0  144  144  000104    3 inode    c34ab600 sendmail
>   412 dc359860 e75bf000  25  147  147  000104    3 inode    c34ab600 sendmail
>   411 dc359a00 e75ba000   0  144  144  000104    3 inode    c34ab600 sendmail
>   410 dc359ba0 e75b6000  25  147  147  000104    3 inode    c34ab600 sendmail
>   409 dc359d40 e75b2000   0  144  144  000104    3 inode    c34ab600 sendmail
>   408 dc359ee0 e75aa000  25  147  147  000104    3 inode    c34ab600 sendmail
>   407 dc35a080 e75a6000   0  144  144  000104    3 inode    c34ab600 sendmail
>   406 dc35a220 e75a2000  25  147  147  000104    3 inode    c34ab600 sendmail
>   405 dc35a3c0 e759d000   0  144  144  000104    3 inode    c34ab600 sendmail
>   404 dc35a560 e7598000  25  147  147  000104    3 inode    c34ab600 sendmail
>   403 dc35af20 e03f3000   0  144  144  000104    3 inode    c34ab600 sendmail
>   402 dc35a700 e2877000   0   99   99  000004    3 inode    c34ab600 dhclient
>   401 dc35b260 e03f0000   0  203  401 8000006    3 inode    c34ab600 bash
>   399 dc35aa40 e1366000   0  398  399  000014    3 FFS node c0350140 cron
>   398 dc35a8a0 e135b000   0  139  139  000004    3 ppwait   dc35a8a0 cron
>   302 dc35abe0 e0402000   0  137  302 4004004    3 ffsvgt   c03695e8 tclsh83
>   277 dc35ad80 e03fe000   0  137  277 4004084    3 poll     c037c1a0 tclsh83
>   203 dc35b8e0 e03d6000   0  202  203  004086    3 wait     dc35b8e0 bash
>   202 dc35c440 e036e000   0    1  202  004186    3 wait     dc35c440 login
>   191 dc35c2a0 e0376000   0    1    7  000086    3 select   c037c1a0 snmpd
>   173 dc35b0c0 e03e8000   0    1  173  000084    3 nanslp   c03646b0 siocontrol
>   167 dc35b400 e03e4000   0    1  167  000084    3 nanslp   c03646b0 wddt
>   147 dc35b5a0 e03df000  25    1  147 2000184    3 pause    e03df260 sendmail
>   144 dc35b740 e03da000   0    1  144  000184    3 select   c037c1a0 sendmail
>   141 dc35ba80 e03d2000   0    1  141  000104    3 inode    c34ab600 sshd
>   139 dc35bc20 e0397000   0    1  139  000004    3 inode    c35f4300 cron
>   137 dc35bdc0 e0392000   0    1  137  000084    3 select   c037c1a0 inetd
>   122 dc35bf60 e0382000   0    1  122  000004    3 inode    c34ab600 syslogd
>    99 dc35c100 e037e000   0    1   99  000084    3 wait     dc35c100 dhclient
>     6 dc35c5e0 defd1000   0    0    0  000204    3 vlrup    dc35c5e0 vnlru
>     5 dc35c780 defce000   0    0    0  000204    3 syncer   c037c0c8 syncer
>     4 dc35c920 defcb000   0    0    0  000204    3 psleep   c036487c bufdaemon
>     3 dc35cac0 defc8000   0    0    0  000204    3 psleep   c0372fc0 vmdaemon
>     2 dc35cc60 defc5000   0    0    0  000204    3 psleep   c0351e58 pagedaemon
>     1 dc35ce00 dc361000   0    0    1  004284    3 wait     dc35ce00 init
>     0 c037b4a0 c040d000   0    0    0  000204    3 sched    c037b4a0 swapper
>
> The hung tasks look like this:
>
> db> t 446
> mi_switch(c34ab600,1000040,0,0,ffffffff) at mi_switch+0x1c8
> tsleep(c34ab600,8,c031a54a,0,c34ab600) at tsleep+0x1d1
> acquire(c34ab600,1000040,600,c34ab600,20002) at acquire+0xbc
> lockmgr(c34ab600,1030002,defc4e6c,e75e1220,defc4e00) at lockmgr+0x2cc
> vop_stdlock(e766bd28,e766bd38,c01fa02c,e766bd28,defc4e00) at vop_stdlock+0x42
> ufs_vnoperate(e766bd28) at ufs_vnoperate+0x15
> vn_lock(defc4e00,20002,e75e1220) at vn_lock+0x9c
> lookup(e766bed0,0,e766bed0,e766bed0,e75e1220) at lookup+0x81
> namei(e766bed0,0,cb9c0a40,e766bed0,e766be18) at namei+0x19d
> vn_open(e766bed0,1,1a4,3,e75e1220) at vn_open+0x1ed
> open(e75e1220,e766bf80,0,80e3500,0) at open+0xc4
> syscall2(2f,2f,2f,0,80e3500) at syscall2+0x20d
> Xint0x80_syscall() at Xint0x80_syscall+0x2b
>
> It might be here?
> Cron is waiting on memory:
>
> db> t 399
> mi_switch(c0350140,c0363440,c0350140,c02d535c,ffffffff) at mi_switch+0x1c8
> tsleep(c0350140,2,c031a400,0,c368f228) at tsleep+0x1d1
> malloc(100,c0350140,0,c368f228,c35f4300) at malloc+0x1cd
> ffs_vget(c35dca00,ec17,e1368cbc,0,defc3900) at ffs_vget+0xa0
> ufs_lookup(e1368d20,e1368d34,c01ec562,e1368d20,e036c00a) at ufs_lookup+0xb47
> ufs_vnoperate(e1368d20,e036c00a,defc3900,e1368ef8,e1368d20) at ufs_vnoperate+0x15
> vfs_cache_lookup(e1368d78,e1368d88,c01efb71,e1368d78,defc4e00) at vfs_cache_lookup+0x2c2
> ufs_vnoperate(e1368d78,defc4e00,cb9ded00,e1368ef8,dc35aa40) at ufs_vnoperate+0x15
> lookup(e1368ed0,0,e1368ed0,e1368ed0,dc35aa40) at lookup+0x2e1
> namei(e1368ed0,0,cb9cc7c0,e1368ed0,c02d298b) at namei+0x19d
> vn_open(e1368ed0,1,1a4,3,dc35aa40) at vn_open+0x1ed
> open(dc35aa40,e1368f80,68108dec,6811b380,4) at open+0xc4
> syscall2(2f,2f,2f,4,6811b380) at syscall2+0x20d
> Xint0x80_syscall() at Xint0x80_syscall+0x2b
>
>
> Can anyone suggest what the bug might be or how to proceed
> with debugging?
>
> Thanks in advance,
> David Dolson (ddolson@sandvine.com, www.sandvine.com)