From owner-freebsd-current@FreeBSD.ORG Fri Dec 23 13:32:26 2005 Return-Path: X-Original-To: freebsd-current@freebsd.org Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D0E5216A41F for ; Fri, 23 Dec 2005 13:32:26 +0000 (GMT) (envelope-from dsh@vlink.ru) Received: from deliver.smtp.vlink.ru (vlink-1.avtlg.ru [83.239.142.33]) by mx1.FreeBSD.org (Postfix) with ESMTP id 00F7943D53 for ; Fri, 23 Dec 2005 13:32:25 +0000 (GMT) (envelope-from dsh@vlink.ru) Received: from smtp.smtp.vlink.ru (clamav.smtp.vlink.ru [192.168.4.1]) by deliver.smtp.vlink.ru (Postfix) with ESMTP id 12211FED4D4 for ; Fri, 23 Dec 2005 16:32:24 +0300 (MSK) Received: from neva.vlink.ru (neva.vlink.ru [217.107.252.29]) by smtp.smtp.vlink.ru (Postfix) with ESMTP id A8CCD1009806 for ; Fri, 23 Dec 2005 16:32:23 +0300 (MSK) Received: from neva.vlink.ru (localhost [127.0.0.1]) by neva.vlink.ru (8.13.4/8.13.4) with ESMTP id jBNDWDOw060566 for ; Fri, 23 Dec 2005 16:32:13 +0300 (MSK) (envelope-from dsh@vlink.ru) Received: (from dsh@localhost) by neva.vlink.ru (8.13.4/8.13.4/Submit) id jBNDWDQ9060563; Fri, 23 Dec 2005 16:32:13 +0300 (MSK) (envelope-from dsh@vlink.ru) To: freebsd-current@freebsd.org From: Denis Shaposhnikov Date: Fri, 23 Dec 2005 16:32:13 +0300 Message-ID: <87bqz8hrc2.fsf@neva.vlink.ru> User-Agent: Gnus/5.1007 (Gnus v5.10.7) XEmacs/21.4.18 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 X-Virus-Scanned: ClamAV using ClamSMTP Subject: current freeze from time to time X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Dec 2005 13:32:27 -0000 Hi! I have a problem with CURRENT. It freezes from time to time on SMP dual Xeon system. For some processes I see MWCHAN = ufs and "D" in the STAT. And I can't kill such processes even with -9. And system can't kill them too on shutdown. So, system can't do shutdown and wait forever after "All buffers synced" message. At this moment I've entered to KDB do "show lockedvnods": Locked vnodes 0xc687cb58: tag ufs, type VDIR usecount 1, writecount 0, refcount 2 mountedhere 0 flags () v_object 0xcb5b1934 ref 0 pages 0 lock type ufs: EXCL (count 1) by thread 0xc795d600 (pid 74686) with 1 pending ino 2072602, on dev ad4s1g 0xc687ca50: tag ufs, type VDIR usecount 31, writecount 0, refcount 32 mountedhere 0 flags () v_object 0xc85d2744 ref 0 pages 1 lock type ufs: EXCL (count 1) by thread 0xc7683d80 (pid 74178) with 6 pending ino 2072603, on dev ad4s1g 0xc687c948: tag ufs, type VDIR usecount 2, writecount 0, refcount 3 mountedhere 0 flags () v_object 0xc875d000 ref 0 pages 1 lock type ufs: EXCL (count 1) by thread 0xc91f3300 (pid 65610) with 1 pending ino 2072615, on dev ad4s1g 0xc691f420: tag ufs, type VDIR usecount 2, writecount 0, refcount 3 mountedhere 0 flags () v_object 0xc8a773e0 ref 0 pages 1 lock type ufs: EXCL (count 1) by thread 0xc68e5780 (pid 519) with 1 pending ino 2072680, on dev ad4s1g 0xc691f318: tag ufs, type VDIR usecount 3, writecount 0, refcount 4 mountedhere 0 flags () v_object 0xc8a7b2e8 ref 0 pages 1 lock type ufs: EXCL (count 1) by thread 0xc7019780 (pid 74103) with 2 pending ino 2072795, on dev ad4s1g 0xc69bb528: tag ufs, type VDIR usecount 2, writecount 0, refcount 3 mountedhere 0 flags () v_object 0xc7890744 ref 0 pages 1 lock type ufs: EXCL (count 1) by thread 0xc91f4600 (pid 74129) with 1 pending ino 2072767, on dev ad4s1g Locked vnodes 0xc687cb58: tag ufs, type VDIR usecount 1, writecount 0, refcount 2 mountedhere 0 flags () v_object 0xcb5b1934 ref 0 pages 0 lock type ufs: EXCL (count 1) by thread 0xc795d600 (pid 74686) with 1 pending ino 2072602, on dev ad4s1g 0xc687ca50: tag ufs, type VDIR usecount 31, writecount 0, refcount 32 mountedhere 0 flags () v_object 0xc85d2744 ref 0 pages 1 lock type ufs: EXCL (count 1) by thread 0xc7683d80 (pid 74178) with 6 pending ino 2072603, on dev ad4s1g 0xc687c948: tag ufs, type VDIR usecount 2, writecount 0, refcount 3 mountedhere 0 flags () v_object 0xc875d000 ref 0 pages 1 lock type ufs: EXCL (count 1) by thread 0xc91f3300 (pid 65610) with 1 pending ino 2072615, on dev ad4s1g 0xc691f420: tag ufs, type VDIR usecount 2, writecount 0, refcount 3 mountedhere 0 flags () v_object 0xc8a773e0 ref 0 pages 1 lock type ufs: EXCL (count 1) by thread 0xc68e5780 (pid 519) with 1 pending ino 2072680, on dev ad4s1g 0xc691f318: tag ufs, type VDIR usecount 3, writecount 0, refcount 4 mountedhere 0 flags () v_object 0xc8a7b2e8 ref 0 pages 1 lock type ufs: EXCL (count 1) by t(kgdb) After that I've done "call doadump" and got vmcore. ps show me: (kgdb) ps During symbol reading, Incomplete CFI data; unspecified registers at 0xc04d97eb. pid proc uid ppid pgrp flag stat comm wchan 74686 c9464000 0 1 1 000000 1 sh ufs c687caa8 74195 c970d000 0 3074 74195 4000100 1 sshd ufs c687caa8 74178 c7682adc 0 3074 74178 004000 1 sshd ufs c687c9a0 74129 c9b82adc 1008 1 5504 004000 1 parser3.cgi ufs c691f370 74103 c70b5458 1008 1 5504 000000 1 httpd ufs c69bb580 65610 c92c0458 1005 1 65610 004000 1 sftp-server ufs c691f478 5518 c6247458 1008 1 5516 004002 1 perl5.8.7 ufs c687caa8 3081 c7523d08 0 1 3081 000000 1 cron ufs c687caa8 3074 c7682d08 0 1 3074 000100 1 sshd ufs c687caa8 3016 c7523adc 0 1 3016 000000 1 syslogd ufs c687caa8 519 c68e4d08 80 1 518 000100 1 nginx ufs c691f370 34 c6260000 0 0 0 000204 1 schedcpu - e88b3cf0 33 c62438b0 0 0 0 000204 1 syncer ktsusp c6243938 32 c6243adc 0 0 0 000204 1 vnlru ktsusp c6243b64 31 c6243d08 0 0 0 000204 1 bufdaemon ktsusp c6243d90 30 c6244000 0 0 0 00020c 1 pagezero pgzero c06c21a0 29 c624422c 0 0 0 000204 1 vmdaemon psleep c06c1d08 28 c6244458 0 0 0 000204 1 pagedaemon psleep c06c1cc8 27 c602e684 0 0 0 000204 1 irq1: atkbd0 26 c602e8b0 0 0 0 000204 1 swi0: sio 25 c602eadc 0 0 0 000204 1 irq18: atapci1 24 c602ed08 0 0 0 000204 1 irq15: ata1 23 c6074000 0 0 0 000204 1 irq14: ata0 22 c607422c 0 0 0 000204 1 irq27: em1 21 c6074458 0 0 0 000204 1 irq26: em0 20 c6074684 0 0 0 000204 1 irq9: acpi0 19 c60748b0 0 0 0 000204 1 swi2: cambio 18 c6074adc 0 0 0 000204 1 swi6: task queue 9 c5fd822c 0 0 0 000204 1 acpi_task2 - c6061e40 8 c5fd8458 0 0 0 000204 1 acpi_task1 - c6061e40 7 c5fd8684 0 0 0 000204 1 acpi_task0 - c6061e40 17 c5fd88b0 0 0 0 000204 1 swi6: Giant taskq 6 c5fd8adc 0 0 0 000204 1 thread taskq - c6062040 16 c5fd8d08 0 0 0 000204 1 swi5: Fast taskq 5 c602e000 0 0 0 000204 1 kqueue taskq - c6062100 15 c602e22c 0 0 0 000204 1 yarrow - c06b1720 4 c602e458 0 0 0 000204 1 g_down - c06b1fc0 3 c5fd3000 0 0 0 000204 1 g_up - c06b1fbc 2 c5fd322c 0 0 0 000204 1 g_event - c06b1fb4 14 c5fd3458 0 0 0 000204 1 swi3: vm 13 c5fd3684 0 0 0 00020c 1 swi4: clock sio 12 c5fd38b0 0 0 0 000204 1 swi1: net 11 c5fd3adc 0 0 0 00020c 1 idle: cpu0 10 c5fd3d08 0 0 0 00020c 1 idle: cpu1 1 c5fd8000 0 0 1 004200 1 init ufs c687cbb0 0 c06b20c0 0 0 0 000200 1 swapper There is no member named p_pptr. Note, I've tried to check v_object addresses from "show lockedvnodes": (kgdb) p/x *0xcb5b1934 $1 = 0xc068f280 (kgdb) p/x *0xc85d2744 $2 = 0xc068f280 (kgdb) p/x *0xc875d000 $3 = 0xc068f280 (kgdb) p/x *0xc8a773e0 $4 = 0xc068f280 ... and so on. May be that's important? Could anybody help me to fix this problem? PS. The system works fine with debug.mpsafevfs=0. -- DSS5-RIPE DSS-RIPN 2:550/5068@fidonet 2:550/5069@fidonet mailto:dsh@vlink.ru http://neva.vlink.ru/~dsh/