From owner-freebsd-performance@FreeBSD.ORG Sun May 20 10:29:24 2007
From: "Harald Servat"
Date: Sun, 20 May 2007 12:29:23 +0200
To: freebsd-hackers@freebsd.org, freebsd-hpc@freebsd.org, freebsd-performance@freebsd.org
Subject: testers wanted for PAPI / FreeBSD
Hello,

I'm porting PAPI to FreeBSD, and I was wondering if you could give the package a try. It would be great to have more feedback than my laptop alone can provide :)

First of all, you can download the code at
http://code.google.com/p/papi-for-freebsd

Next, see man hwpmc(4) and compile a kernel with

options HWPMC_HOOKS
device hwpmc

(you'll also need device apic if you're running on an i386 machine). When you boot, dmesg should print something like

hwpmc: TSC/1/0x20 P6/2/0x1fe

Once the machine is up and running, untar the file you've downloaded, run ./configure and then run make (not make install). Could you send me the output of the following commands?

# dmesg | grep hwpmc
# utils/papi_avail
# utils/papi_decode
# utils/papi_native_avail
# ctests/low-level
# ctests/high-level

Thank you very much,

--
_________________________________________________________________
Empty your memory, with a free()... like a pointer!
If you cast a pointer to an integer, it becomes an integer,
if you cast a pointer to a struct, it becomes a struct.
The pointer can crash..., and can overflow.
Be a pointer my friend...
From owner-freebsd-performance@FreeBSD.ORG Sun May 20 23:11:33 2007
From: Jeff Roberson
Date: Sun, 20 May 2007 16:11:29 -0700 (PDT)
To: smp@freebsd.org, threads@freebsd.org, performance@freebsd.org
Subject: sched_lock && thread_lock() (fwd)

In case any of you missed it, I sent this mail to arch@. Please keep the discussion there.

Thanks,
Jeff

---------- Forwarded message ----------
Date: Sun, 20 May 2007 16:07:53 -0700 (PDT)
From: Jeff Roberson
To: arch@freebsd.org
Subject: sched_lock && thread_lock()

Attilio and I have been working on the growing problem of sched_lock contention on -CURRENT. Attilio has been moving the parts of the kernel that do not need to fall under the scheduler lock into separate locks.
Examples are the ldt/gdt lock and the clock lock, which were committed earlier, and the use of atomics for the vmcnt structure.

I have been working on an approach that uses thread locks rather than a global scheduler lock. The design is similar to Solaris's container locks, but the details are different. The basic idea is to have a pointer in the thread structure that points at the spinlock that protects the thread. This spinlock may be the scheduler lock, a turnstile lock, or a sleep queue lock. As the thread changes state from running to blocked on a lock or sleeping, the lock changes with it.

This has several advantages. The majority of the kernel simply calls thread_lock(), which figures out the details. The kernel then knows nothing of the particulars of the scheduler locks, and the schedulers are free to implement them in any way they like. Furthermore, in some cases the amount of locking is reduced, because locking the thread has the side effect of locking its container.

This patch does not implement per-cpu scheduler locks. It only changes the kernel to support this model. I have a fork of ULE in development that runs with per-cpu locks, but it is not ready yet. This means there should be very little change in system performance until the scheduler catches up. On a 2-CPU system the difference is unmeasurable, or nearly so, on every workload I have tested. On an 8-way Opteron system the results range from +10% on some reasonable workloads to -15% on super-smack. super-smack has some inherent problems, and I believe that regression does not reflect real performance problems with this patch.

Kris and I have also tested this extensively on a variety of machines, and I believe it to be fairly solid. The only remaining task is to fix rusage so that it does not rely on a global scheduler lock.
I am posting the patch here in case anyone with specific knowledge of turnstiles, sleepqueues, or signals would like to review it, and as a general heads-up to people interested in where the kernel is headed. It applies to -CURRENT as of just before my kern_clock.c commits. I will re-merge and update it again in the next few days, probably after we sort out rusage.

http://people.freebsd.org/~jeff/threadlock.diff

Thanks,
Jeff
_______________________________________________
freebsd-arch@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

From owner-freebsd-performance@FreeBSD.ORG Tue May 22 00:51:12 2007
From: security
Date: Mon, 21 May 2007 17:24:30 -0700
To: freebsd-performance@freebsd.org
Subject: [Fwd: asymetric speeds over gigE link]

I sent this to -net and didn't get much information, so I'll try here since there's some overlap.

Summary: I am using iperf to measure TCP throughput over gigE between a Linux box (Kubuntu Edgy) and a FreeBSD box, and the speed differs significantly depending on the direction of the data. Pushing data from the FreeBSD box to the Linux box, I average about 500 Mb/s. Pushing data from the Linux box to the FreeBSD box, I see about 300 Mb/s. So something seems to be limiting either the transmit side of the Linux box or the receive side of the FreeBSD box. Any suggestions on how to narrow down whose tuning needs more work would be welcome.

Just for grins, I tried raising txqueuelen on the Linux box to 1500, but it didn't help. I'd like to get closer to the theoretical speed of this link (I'd be happy with 600-700 Mb/s both ways), but the real puzzle is the difference.

The client is the Linux box @ 192.168.1.104 and the iperf server is the FreeBSD box @ 192.168.1.15. Neither box is CPU- or network-busy during testing, and each system has only one NIC. I tried larger window sizes, but they had only a minor effect. I chose the Intel Pro/1000s because of their good reputation in both the Linux and FreeBSD communities.
Polling is not compiled into the kernel.

Client connecting to 192.168.1.15, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 6] local 192.168.1.104 port 34788 connected with 192.168.1.15 port 5001
[ 6]  0.0-10.0 sec   369 MBytes   309 Mbits/sec
[ 5] local 192.168.1.104 port 5001 connected with 192.168.1.15 port 52963
[ 5]  0.0-10.0 sec   597 MBytes   500 Mbits/sec

The switch is a Netgear GS105 (5 port, supposedly wire speed; cables are Belkin 5e), and both systems are on this switch.

FreeBSD box:
FreeBSD 6.1-RELEASE-p10
1.9Ghz Athlon / 1 gig of main mem
Abit/nforce2 MB+chipset (onboard nic disabled in bios)
Intel Pro/1000GT NIC
sysctl.conf:
kern.ipc.maxsockbuf=8192000
net.inet.tcp.sendspace=262144
net.inet.tcp.recvspace=262144
sysctl reports:
kern.ipc.nmbjumbo16: 0
kern.ipc.nmbjumbo9: 0
kern.ipc.nmbjumbop: 0
kern.ipc.nmbclusters: 25600
net.inet.tcp.rfc1323: 1

em0: flags=8843 mtu 1500
        options=b
        inet6 fe80::20e:cff:feda:1a3c%em0 prefixlen 64 scopeid 0x1
        inet 192.168.1.15 netmask 0xffffff00 broadcast 192.168.1.255
        ether 00:0e:0c:da:1a:3c
        media: Ethernet autoselect (1000baseTX )
        status: active

Linux box:
Linux emperor 2.6.17-11-generic #2 SMP Tue Mar 13 23:32:38 UTC 2007 i686 GNU/Linux
Soyo Dragon+ MB / 1.9 Ghz Athlon / 1 gig main mem.
I run tcp_tune.sh at boot on the Linux box:

#!/bin/bash
# tcp_rmem / tcp_wmem: min, default and max per-socket buffer sizes, in bytes.
echo "10000 131072 262144" > /proc/sys/net/ipv4/tcp_rmem
echo "10000 131072 262144" > /proc/sys/net/ipv4/tcp_wmem
# tcp_mem: low, pressure and high thresholds for all TCP memory, in pages.
echo "131072 262144 8192000" > /proc/sys/net/ipv4/tcp_mem

/proc/sys/net/ipv4/tcp_window_scaling is 1
/proc/sys/net/ipv4/tcp_timestamps is 1

eth1      Link encap:Ethernet  HWaddr 00:0E:0C:DA:1A:3B
          inet addr:192.168.1.104  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20e:cff:feda:1a3b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8139539 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5638407 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:147638720 (140.7 MiB)  TX bytes:1109677958 (1.0 GiB)
          Base address:0xd800 Memory:e2020000-e2040000

Settings for eth1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: g
        Current message level: 0x00000007 (7)
        Link detected: yes

From owner-freebsd-performance@FreeBSD.ORG Tue May 22 02:51:54 2007
From: "Steven Hartland"
To: "security"
Date: Tue, 22 May 2007 03:47:39 +0100
Subject: Re: asymetric speeds over gigE link]

You might want to try setting:

net.inet.tcp.inflight.enable=0

Just changing this on our FreeBSD 6.2 boxes enabled them to achieve full line rate with ftp / proftpd transfers.

Steve

----- Original Message -----
From: "security"

> Switch is the Netgear GS105 (5 port, supposedly wire speed, cables are
> Belkin 5e), both systems are on this switch.
From owner-freebsd-performance@FreeBSD.ORG Tue May 22 06:29:18 2007
From: security
Date: Mon, 21 May 2007 23:29:20 -0700
To: Steven Hartland
Cc: freebsd-performance@freebsd.org
Subject: Re: asymetric speeds over gigE link]

Thanks Steve. I just tried it, and there was no change. I also rebuilt my kernel with polling and enabled it on the em0 device. It did lower interrupt time, but it made no real change in my numbers.

jim

Steven Hartland wrote:
> You might want to try setting:
> net.inet.tcp.inflight.enable=0
>
> Just changing this on our FreeBSD 6.2 boxes enabled them to achieve
> full line rate with ftp / proftpd transfers.
>
> Steve

From owner-freebsd-performance@FreeBSD.ORG Tue May 22 17:36:13 2007
From: "Andrew Edwards"
Date: Tue, 22 May 2007 13:35:09 -0400
Subject: RE: Ufs dead-locks on freebsd 6.2

It's been a couple of days with no response. How do I know whether anyone is looking into this problem?

> -----Original Message-----
> From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org]
> On Behalf Of Andrew Edwards
> Sent: Saturday, May 19, 2007 12:34 AM
> To: freebsd-fs@freebsd.org; freebsd-performance@freebsd.org
> Subject: RE: Ufs dead-locks on freebsd 6.2
>
> Fsck didn't help, but below is a list of processes that were stuck in
> disk wait. Also, one potential problem I've hit is that I have mrtg
> scripts that get launched from cron every minute.
> MRTG is supposed to have a locking mechanism to prevent the same script
> from running twice at the same time, but I suspect that because the
> filesystem was inaccessible, the cron jobs just kept piling up until the
> system eventually crashed. I caught it when the load avg. was at 620
> and killed all the crons I could. That brought the load avg. down to
> under 1, but system time is still at 30% of the processor and the disks
> are basically idle. I can still do an ls -l on the root of all my
> mounted ufs and nfs filesystems, but on one of them it takes
> considerably longer than on the rest. The rsync I was running copies
> into the /d2 fs.
>
> The system is still running. I can make tcp connections, and some things
> I run from inetd work, but ssh stops responding right away and I can't
> log on via the console. So I captured a core dump of the system and
> rebooted so that I could use it again. Are there any suggestions as to
> what to do next? I'm debating installing an adaptec raid and rebuilding
> the system to see if I get the same problem. My worry is that the intel
> raid drivers are causing this problem, and I have 4 other systems with
> the same card.
>
>
>   PID  TT  STAT      TIME COMMAND
>     2  ??  DL     0:04.86 [g_event]
>     3  ??  DL     2:05.90 [g_up]
>     4  ??  DL     1:07.95 [g_down]
>     5  ??  DL     0:00.00 [xpt_thrd]
>     6  ??  DL     0:00.00 [kqueue taskq]
>     7  ??  DL     0:00.00 [thread taskq]
>     8  ??  DL     0:06.96 [pagedaemon]
>     9  ??  DL     0:00.00 [vmdaemon]
>    15  ??  DL     0:22.28 [yarrow]
>    24  ??  DL     0:00.01 [usb0]
>    25  ??  DL     0:00.00 [usbtask]
>    27  ??  DL     0:00.01 [usb1]
>    29  ??  DL     0:00.01 [usb2]
>    36  ??  DL     1:28.73 [pagezero]
>    37  ??  DL     0:08.76 [bufdaemon]
>    38  ??  DL     0:00.54 [vnlru]
>    39  ??  DL     1:08.12 [syncer]
>    40  ??  DL     0:04.00 [softdepflush]
>    41  ??  DL     0:11.05 [schedcpu]
> 27182  ??  Ds     0:05.75 /usr/sbin/syslogd -l /var/run/log -l /var/named/var/run/log -b 127.0.0.1 -a 10.128.0.0/10
> 27471  ??  Is     0:01.10 /usr/local/bin/postmaster -D /usr/local/pgsql/data (postgres)
> 27594  ??  Is     0:00.04 /usr/libexec/ftpd -m -D -l -l
> 27602  ??  DL     0:00.28 [smbiod1]
> 96581  ??  D      0:00.00 cron: running job (cron)
> 96582  ??  D      0:00.00 cron: running job (cron)
> 96583  ??  D      0:00.00 cron: running job (cron)
> 96585  ??  D      0:00.00 cron: running job (cron)
> 96586  ??  D      0:00.00 cron: running job (cron)
> 96587  ??  D      0:00.00 cron: running job (cron)
> 96588  ??  D      0:00.00 cron: running job (cron)
> 96589  ??  D      0:00.00 cron: running job (cron)
> 96590  ??  D      0:00.00 cron: running job (cron)
> 96591  ??  D      0:00.00 cron: running job (cron)
> 96592  ??  D      0:00.00 cron: running job (cron)
> 96593  ??  D      0:00.00 cron: running job (cron)
> 96594  ??  D      0:00.00 cron: running job (cron)
> 96607  ??  D      0:00.00 cron: running job (cron)
> 96608  ??  D      0:00.00 cron: running job (cron)
> 96609  ??  D      0:00.00 cron: running job (cron)
> 96610  ??  D      0:00.00 cron: running job (cron)
> 96611  ??  D      0:00.00 cron: running job (cron)
> 96612  ??  D      0:00.00 cron: running job (cron)
> 96613  ??  D      0:00.00 cron: running job (cron)
> 96614  ??  D      0:00.00 cron: running job (cron)
> 96615  ??  D      0:00.00 cron: running job (cron)
> 96616  ??  D      0:00.00 cron: running job (cron)
> 96617  ??  D      0:00.00 cron: running job (cron)
> 96631  ??  D      0:00.00 cron: running job (cron)
> 96632  ??  D      0:00.00 cron: running job (cron)
> 96633  ??  D      0:00.00 cron: running job (cron)
> 96634  ??  D      0:00.00 cron: running job (cron)
> 96635  ??  D      0:00.00 cron: running job (cron)
> 96636  ??  D      0:00.00 cron: running job (cron)
> 96637  ??  D      0:00.00 cron: running job (cron)
> 96638  ??  D      0:00.00 cron: running job (cron)
> 96639  ??  D      0:00.00 cron: running job (cron)
> 96642  ??  D      0:00.00 cron: running job (cron)
> 96650  ??  D      0:00.00 cron: running job (cron)
> 29393  p0  D+    22:04.58 /usr/local/bin/rsync
>
> real    0m0.012s
> user    0m0.000s
> sys     0m0.010s
> /
>
> real    0m0.019s
> user    0m0.000s
> sys     0m0.016s
> /var
>
> real    0m0.028s
> user    0m0.008s
> sys     0m0.018s
> /diskless
>
> real    0m0.017s
> user    0m0.008s
> sys     0m0.007s
> /usr
>
> real    0m0.016s
> user    0m0.000s
> sys     0m0.015s
> /d2
>
> real    0m0.024s
> user    0m0.000s
> sys     0m0.023s
> /exports/home
>
> real    0m2.559s
> user    0m0.216s
> sys     0m2.307s
>
> -----Original Message-----
> From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org]
> On Behalf Of Andrew Edwards
> Sent: Friday, May 18, 2007 6:44 PM
> To: freebsd-fs@freebsd.org; freebsd-performance@freebsd.org
> Subject: RE: Ufs dead-locks on freebsd 6.2
>
> Okay, I let memtest run for a full day and there have been no memory
> errors. What do I do next? Just to be on the safe side I'll fsck all
> of my fs's and try to reproduce the problem again.
>
> I also don't know what zonelimit is. I see it on similarly configured
> machines running 5.4. I know it's related to networking, because network
> connections (ssh, ftp, both server and client side) work periodically,
> but eventually the box deadlocks. Should I start a separate thread on
> this? It happens about once every 30 days on two servers, although I
> haven't checked the exact timing.
>
> -----Original Message-----
> From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org]
> On Behalf Of Eric Anderson
> Sent: Friday, May 18, 2007 3:09 PM
> To: Kris Kennaway
> Cc: freebsd-fs@freebsd.org
> Subject: Re: Ufs dead-locks on freebsd 6.2
>
> On 05/18/07 14:00, Kris Kennaway wrote:
> > On Thu, May 17, 2007 at 11:38:20PM -0500, Eric Anderson wrote:
> >> On 05/17/07 12:47, Kostik Belousov wrote:
> >>> On Thu, May 17, 2007 at 01:03:37PM -0400, Andrew Edwards wrote:
> >>>> Here it is.
> >>>>
> >>>> db> show vnode 0xccd47984
> >>>> vnode 0xccd47984: tag ufs, type VDIR
> >>>>     usecount 5135, writecount 0, refcount 5137 mountedhere 0
> >>>>     flags (VV_ROOT)
> >>>>     v_object 0xcd02518c ref 0 pages 1
> >>>> #0 0xc0593f0d at lockmgr+0x4ed
> >>>> #1 0xc06b8e0e at ffs_lock+0x76
> >>>> #2 0xc0739787 at VOP_LOCK_APV+0x87
> >>>> #3 0xc0601c28 at vn_lock+0xac
> >>>> #4 0xc05ee832 at lookup+0xde
> >>>> #5 0xc05ee4b2 at namei+0x39a
> >>>> #6 0xc05e2ab0 at unp_connect+0xf0
> >>>> #7 0xc05e1a6a at uipc_connect+0x66
> >>>> #8 0xc05d9992 at soconnect+0x4e
> >>>> #9 0xc05dec60 at kern_connect+0x74
> >>>> #10 0xc05debdf at connect+0x2f
> >>>> #11 0xc0723e2b at syscall+0x25b
> >>>> #12 0xc070ee0f at Xint0x80_syscall+0x1f
> >>>>
> >>>>         ino 2, on dev amrd0s1a
> >>> It seems to be the sort of thing that cannot happen. VOP_LOCK()
> >>> returned 0, but the vnode was not really locked.
> >>>
> >>> Although claiming that kernel code cannot have such a bug is too
> >>> optimistic, I would first make sure that:
> >>> 1. You checked the memory of the machine.
> >>> 2. Your kernel is built from pristine sources.
> >>
> >> This looks precisely like a lock I was seeing on one of my NFS servers.
> >> Only one of the filesystems would cause it, but it was the same one
> >> each time, not necessarily under any kind of load. Things like
> >> mountd would get wedged in state 'ufs', and other things would get
> >> stuck in one of the lock states (I can't recall which).
> >
> > ...so you cannot conclude that it looks "precisely like" this case.
> >
> > Please, don't confuse bug reports by this kind of claim unless you
> > have made a detailed comparison of the debugging traces to yours.
>
>
> Understood - my mistake.
>
> Eric
>
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-performance@FreeBSD.ORG Sat May 26 01:01:09 2007
From: "Andrew Edwards"
Date: Fri, 25 May 2007 21:01:07 -0400
01:01:08.0285 (UTC) FILETIME=[580D9ED0:01C79F31] Cc: Subject: RE: Ufs dead-locks on freebsd 6.2 X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 26 May 2007 01:01:09 -0000 I guess I feel similar to Gore Jarold with his posting about being frustrated with ufs. I have a serious problem, it's preventing me from upgrading my production systems to freebsd 6+, I can reproduce this problem easily but I can't seem to get anyone to assist me. At the suggestion of one of our internal developers I have enabled memguard to help find the cause of the panic and I'm posting the current backtrace etc. as recommended from the developers handbook on debgging dead-locks. If someone can help me it would be greatly appreciated. db> bt Tracing pid 26543 tid 105117 td 0xd41c6a80 kdb_enter(c0785f13) at kdb_enter+0x2b vfs_badlock(c0785f2c,c0786051,cd3dc414) at vfs_badlock+0x47 assert_vop_locked(cd3dc414,c0786051) at assert_vop_locked+0x4a vop_lock_post(fa048bec,0,1002,cd3dc414,fa048c08,...) at vop_lock_post+0x2a VOP_LOCK_APV(c07dc2e0,fa048bec) at VOP_LOCK_APV+0xa0 vn_lock(cd3dc414,1002,d41c6a80,0,0,...) at vn_lock+0xac vn_statfile(cd3f0870,fa048c74,cd39f280,d41c6a80) at vn_statfile+0x63 kern_fstat(d41c6a80,3,fa048c74) at kern_fstat+0x35 fstat(d41c6a80,fa048d04) at fstat+0x19 syscall(3b,805003b,bfbf003b,8054000,8054000,...) 
at syscall+0x25b Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (189, FreeBSD ELF32, fstat), eip =3D 0x6814b08f, esp =3D 0xbfbfec2c, ebp =3D 0xbfbfeca8 --- db> show pcpu cpuid =3D 1 curthread =3D 0xd41c6a80: pid 26543 "cron" curpcb =3D 0xfa048d90 fpcurthread =3D none idlethread =3D 0xccb5c600: pid 10 "idle: cpu1" APIC ID =3D 6 currentldt =3D 0x50 spin locks held: db> show allpcpu Current CPU: 1 cpuid =3D 0 curthread =3D 0xcd01ec00: pid 898 "cron" curpcb =3D 0xf5ad7d90 fpcurthread =3D none idlethread =3D 0xccb5c780: pid 11 "idle: cpu0" APIC ID =3D 0 currentldt =3D 0x50 spin locks held: cpuid =3D 1 curthread =3D 0xd41c6a80: pid 26543 "cron" curpcb =3D 0xfa048d90 fpcurthread =3D none idlethread =3D 0xccb5c600: pid 10 "idle: cpu1" APIC ID =3D 6 currentldt =3D 0x50 spin locks held: db> show locks exclusive sx user map r =3D 0 (0xd4135734) locked @ /usr/src/sys/vm/vm_map.c:3074 db> show alllocks Process 26543 (cron) thread 0xd41c6a80 (105117) exclusive sx user map r =3D 0 (0xd4135734) locked @ /usr/src/sys/vm/vm_map.c:3074 db> show lockedvnods Locked vnodes 0xce00cc3c: tag ufs, type VREG usecount 1, writecount 1, refcount 11 mountedhere 0 flags () v_object 0xcdae9b58 ref 0 pages 96 lock type ufs: EXCL (count 1) by thread 0xccffb300 (pid 18081) with 1 pending#0 0xc0593f0d at lockmgr+0x4ed #1 0xc06b8e0e at ffs_lock+0x76 #2 0xc0739787 at VOP_LOCK_APV+0x87 #3 0xc0601c28 at vn_lock+0xac #4 0xc06015c4 at vn_write+0x138 #5 0xc05c4544 at dofilewrite+0x7c #6 0xc05c43e3 at kern_writev+0x3b #7 0xc05c4309 at write+0x45 #8 0xc0723e2b at syscall+0x25b #9 0xc070ee0f at Xint0x80_syscall+0x1f ino 494730, on dev amrd0s1d 0xccf936cc: tag ufs, type VDIR usecount 1, writecount 0, refcount 2494 mountedhere 0 flags (VV_ROOT) v_object 0xd18598c4 ref 0 pages 0 lock type ufs: EXCL (count 1) by thread 0xcdff7000 (pid 19300) with 2492 pending#0 0xc0593f0d at lockmgr+0x4ed #1 0xc06b8e0e at ffs_lock+0x76 #2 0xc0739787 at VOP_LOCK_APV+0x87 #3 0xc0601c28 at vn_lock+0xac #4 0xc05f5d12 at 
vget+0xbe #5 0xc05ed9f9 at vfs_hash_get+0x8d #6 0xc06b7b8f at ffs_vget+0x27 #7 0xc06c1435 at ufs_root+0x19 #8 0xc05eef1c at lookup+0x7c8 #9 0xc05ee4b2 at namei+0x39a #10 0xc0600a13 at vn_open_cred+0x5b #11 0xc06009b6 at vn_open+0x1e #12 0xc05fa126 at kern_open+0xb6 #13 0xc05fa03a at open+0x1a #14 0xc0723e2b at syscall+0x25b #15 0xc070ee0f at Xint0x80_syscall+0x1f ino 2, on dev amrd0s1d 0xcd00a000: tag ufs, type VDIR usecount 1, writecount 0, refcount 3 mountedhere 0 flags () v_object 0xce083210 ref 0 pages 0 lock type ufs: EXCL (count 1) by thread 0xcd574d80 (pid 89705) with 1 pending#0 0xc0593f0d at lockmgr+0x4ed #1 0xc06b8e0e at ffs_lock+0x76 #2 0xc0739787 at VOP_LOCK_APV+0x87 #3 0xc0601c28 at vn_lock+0xac #4 0xc05f5d12 at vget+0xbe #5 0xc05ea48e at cache_lookup+0x34a #6 0xc05ea9c2 at vfs_cache_lookup+0x92 #7 0xc0737847 at VOP_LOOKUP_APV+0x87 #8 0xc05eebf8 at lookup+0x4a4 #9 0xc05ee4b2 at namei+0x39a #10 0xc0600a13 at vn_open_cred+0x5b #11 0xc06009b6 at vn_open+0x1e #12 0xc05fa126 at kern_open+0xb6 #13 0xc05fa03a at open+0x1a #14 0xc0723e2b at syscall+0x25b #15 0xc070ee0f at Xint0x80_syscall+0x1f ino 494592, on dev amrd0s1d 0xcd23f15c: tag ufs, type VREG usecount 1, writecount 1, refcount 4 mountedhere 0 flags () v_object 0xcd16ec60 ref 0 pages 12 lock type ufs: EXCL (count 1) by thread 0xccff8300 (pid 713)#0 0xc0593f0d at lockmgr+0x4ed #1 0xc06b8e0e at ffs_lock+0x76 #2 0xc0739787 at VOP_LOCK_APV+0x87 #3 0xc0601c28 at vn_lock+0xac #4 0xc06015c4 at vn_write+0x138 #5 0xc05c4544 at dofilewrite+0x7c #6 0xc05c43e3 at kern_writev+0x3b #7 0xc05c4309 at write+0x45 #8 0xc0723e2b at syscall+0x25b #9 0xc070ee0f at Xint0x80_syscall+0x1f ino 494620, on dev amrd0s1d 0xd02a0984: tag ufs, type VREG usecount 1, writecount 1, refcount 3 mountedhere 0 flags () v_object 0xcd89fad4 ref 0 pages 3 lock type ufs: EXCL (count 1) by thread 0xccffb000 (pid 603)#0 0xc0593f0d at lockmgr+0x4ed #1 0xc06b8e0e at ffs_lock+0x76 #2 0xc0739787 at VOP_LOCK_APV+0x87 #3 0xc0601c28 at vn_lock+0xac #4 
0xc05fd950 at fsync+0x9c #5 0xc0723e2b at syscall+0x25b #6 0xc070ee0f at Xint0x80_syscall+0x1f ino 188453, on dev amrd0s1d 0xd0b442b8: tag ufs, type VREG usecount 1, writecount 1, refcount 2 mountedhere 0 flags () v_object 0xcffc88c4 ref 0 pages 1 lock type ufs: EXCL (count 1) by thread 0xcd590480 (pid 18304)#0 0xc0593f0d at lockmgr+0x4ed #1 0xc06b8e0e at ffs_lock+0x76 #2 0xc0739787 at VOP_LOCK_APV+0x87 #3 0xc0601c28 at vn_lock+0xac #4 0xc05fd950 at fsync+0x9c #5 0xc0723e2b at syscall+0x25b #6 0xc070ee0f at Xint0x80_syscall+0x1f ino 424403, on dev amrd0s1d 0xd24d3c3c: tag ufs, type VREG usecount 1, writecount 1, refcount 2 mountedhere 0 flags () v_object 0xd13b3ce4 ref 0 pages 1 lock type ufs: EXCL (count 1) by thread 0xcd59ec00 (pid 18735)#0 0xc0593f0d at lockmgr+0x4ed #1 0xc06b8e0e at ffs_lock+0x76 #2 0xc0739787 at VOP_LOCK_APV+0x87 #3 0xc0601c28 at vn_lock+0xac #4 0xc05fd950 at fsync+0x9c #5 0xc0723e2b at syscall+0x25b #6 0xc070ee0f at Xint0x80_syscall+0x1f ino 424405, on dev amrd0s1d 0xd09d6414: tag ufs, type VREG usecount 1, writecount 1, refcount 3 mountedhere 0 flags () v_object 0xcfd71b58 ref 0 pages 1 lock type ufs: EXCL (count 1) by thread 0xcd6cfc00 (pid 20083)#0 0xc0593f0d at lockmgr+0x4ed #1 0xc06b8e0e at ffs_lock+0x76 #2 0xc0739787 at VOP_LOCK_APV+0x87 #3 0xc0601c28 at vn_lock+0xac #4 0xc05fd950 at fsync+0x9c #5 0xc0723e2b at syscall+0x25b #6 0xc070ee0f at Xint0x80_syscall+0x1f ino 424406, on dev amrd0s1d 0xce0762b8: tag ufs, type VREG usecount 1, writecount 1, refcount 3 mountedhere 0 flags () v_object 0xd14fcdec ref 0 pages 1 lock type ufs: EXCL (count 1) by thread 0xcd01e000 (pid 17274)#0 0xc0593f0d at lockmgr+0x4ed #1 0xc06b8e0e at ffs_lock+0x76 #2 0xc0739787 at VOP_LOCK_APV+0x87 #3 0xc0601c28 at vn_lock+0xac #4 0xc05fd950 at fsync+0x9c #5 0xc0723e2b at syscall+0x25b #6 0xc070ee0f at Xint0x80_syscall+0x1f ino 424408, on dev amrd0s1d 0xce91f414: tag ufs, type VREG usecount 1, writecount 1, refcount 3 mountedhere 0 flags () v_object 0xd24af9cc ref 0 
pages 1 lock type ufs: EXCL (count 1) by thread 0xce03d780 (pid 20575)#0 0xc0593f0d at lockmgr+0x4ed #1 0xc06b8e0e at ffs_lock+0x76 #2 0xc0739787 at VOP_LOCK_APV+0x87 #3 0xc0601c28 at vn_lock+0xac #4 0xc05fd950 at fsync+0x9c #5 0xc0723e2b at syscall+0x25b #6 0xc070ee0f at Xint0x80_syscall+0x1f ino 424409, on dev amrd0s1d 0xd1701d98: tag ufs, type VREG usecount 1, writecount 1, refcount 3 mountedhere 0 flags () v_object 0xd0c8d39c ref 0 pages 1 lock type ufs: EXCL (count 1) by thread 0xcd6ce180 (pid 21143)#0 0xc0593f0d at lockmgr+0x4ed #1 0xc06b8e0e at ffs_lock+0x76 #2 0xc0739787 at VOP_LOCK_APV+0x87 #3 0xc0601c28 at vn_lock+0xac #4 0xc05fd950 at fsync+0x9c #5 0xc0723e2b at syscall+0x25b #6 0xc070ee0f at Xint0x80_syscall+0x1f ino 424411, on dev amrd0s1d 0xcedf7828: tag ufs, type VREG usecount 1, writecount 1, refcount 3 mountedhere 0 flags () v_object 0xd2c37948 ref 0 pages 1 lock type ufs: EXCL (count 1) by thread 0xcd576000 (pid 21114)#0 0xc0593f0d at lockmgr+0x4ed #1 0xc06b8e0e at ffs_lock+0x76 #2 0xc0739787 at VOP_LOCK_APV+0x87 #3 0xc0601c28 at vn_lock+0xac #4 0xc05fd950 at fsync+0x9c #5 0xc0723e2b at syscall+0x25b #6 0xc070ee0f at Xint0x80_syscall+0x1f ino 424412, on dev amrd0s1d 0xd1679570: tag ufs, type VREG usecount 1, writecount 1, refcount 69 mountedhere 0 flags () v_object 0xd1fd0318 ref 0 pages 9308 lock type ufs: EXCL (count 1) by thread 0xcd59ea80 (pid 65735)#0 0xc0593f0d at lockmgr+0x4ed #1 0xc06b8e0e at ffs_lock+0x76 #2 0xc0739787 at VOP_LOCK_APV+0x87 #3 0xc0601c28 at vn_lock+0xac #4 0xc06015c4 at vn_write+0x138 #5 0xc05c4544 at dofilewrite+0x7c #6 0xc05c43e3 at kern_writev+0x3b #7 0xc05c4309 at write+0x45 #8 0xc0723e2b at syscall+0x25b #9 0xc070ee0f at Xint0x80_syscall+0x1f ino 112037039, on dev amrd1s1d db> alltrace Tracing command pid 26548 tid 105114 td 0xd41c7000 *** error reading from address b4ba72d3 *** -----Original Message----- From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org] On Behalf Of Andrew Edwards Sent: 
Tuesday, May 22, 2007 1:35 PM To: freebsd-fs@freebsd.org; freebsd-performance@freebsd.org Subject: RE: Ufs dead-locks on freebsd 6.2 It's been a couple of days with no response, how do I know if anyone is looking into this problem? > -----Original Message----- > From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org] > On Behalf Of Andrew Edwards > Sent: Saturday, May 19, 2007 12:34 AM > To: freebsd-fs@freebsd.org; freebsd-performance@freebsd.org > Subject: RE: Ufs dead-locks on freebsd 6.2 >=20 > Fsck didn't help but below is a list of processes that were stuck in=20 > disk. Also, one potential problem I've hit is I have mrtg scripts that > get launched from cron every min. MRTG is supposed to have a locking=20 > mechanism to prevent the same script from running at the same time but I > suspect since the filesystem was unaccessible the cron jobs just kept=20 > piling up and piling up until the system would eventually crash. I=20 > caught it when the load avg. was at 620 and killed all the cron's I=20 > could. That brought the load avg. down to under 1 however system is=20 > still taking up 30% of the processor time and the disks are basically=20 > idle. I can still do an ls -l on the root of all my mounted ufs and nfs > filesystems but on one it's taking a considerable amount longer than the > rest. This particular rsync that I was running is copying into the /d2 > fs. >=20 > The system is still running and I can make tpc connections and=20 > somethings I have running from inetd work but ssh stops responding right > away and I can't logon via the console. So, I've captured a core dump > of the system and rebooted so that I could use it again. Are there any > suggestion as to what to do next? I'm debaiting installing an adaptec > raid and rebuilding the system to see if I get the same problem, my=20 > worry is that it's the intel raid drivers that are causing this problem > and I have 4 other systems with the same card. 
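MRTG's own locking code is not shown in this thread, so as a rough illustration only: a common way for a cron-launched job to avoid piling up is to take a non-blocking exclusive flock(2) lock on a lock file and exit at once if a previous run still holds it. The sketch below assumes a hypothetical helper, acquire_single_instance_lock(), and a lock-file path chosen by the caller; it is not MRTG's actual mechanism. It would likely not have helped in this case: if the lock file lives on the wedged filesystem, open(2) can itself sleep on a vnode lock during the path lookup before flock(2) is ever reached, which is consistent with the cron processes shown stuck in state D.

```c
#include <fcntl.h>     /* open(), O_RDWR, O_CREAT */
#include <sys/file.h>  /* flock(), LOCK_EX, LOCK_NB */
#include <unistd.h>    /* close(), unlink() */

/*
 * Hypothetical helper (not part of MRTG or FreeBSD): try to become the
 * only running instance of a periodic job.
 *
 * Returns an open descriptor holding an exclusive flock(2) lock on
 * `path`, or -1 if another instance already holds the lock or the lock
 * file cannot be opened. Keep the descriptor open for the life of the
 * job; the lock is released when the last descriptor referring to it
 * is closed, which includes process exit.
 */
int
acquire_single_instance_lock(const char *path)
{
	int fd;

	/* Create the lock file if needed; its contents are never used. */
	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd == -1)
		return (-1);

	/* LOCK_NB: fail immediately instead of queueing behind the holder. */
	if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
		close(fd);
		return (-1);
	}
	return (fd);
}
```

A job would call this first and simply exit when it returns -1. Placing the lock file on a different filesystem from the one the job works on reduces, but does not remove, the blocking risk described above.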
>
>
>   PID  TT  STAT     TIME COMMAND
>     2  ??  DL    0:04.86 [g_event]
>     3  ??  DL    2:05.90 [g_up]
>     4  ??  DL    1:07.95 [g_down]
>     5  ??  DL    0:00.00 [xpt_thrd]
>     6  ??  DL    0:00.00 [kqueue taskq]
>     7  ??  DL    0:00.00 [thread taskq]
>     8  ??  DL    0:06.96 [pagedaemon]
>     9  ??  DL    0:00.00 [vmdaemon]
>    15  ??  DL    0:22.28 [yarrow]
>    24  ??  DL    0:00.01 [usb0]
>    25  ??  DL    0:00.00 [usbtask]
>    27  ??  DL    0:00.01 [usb1]
>    29  ??  DL    0:00.01 [usb2]
>    36  ??  DL    1:28.73 [pagezero]
>    37  ??  DL    0:08.76 [bufdaemon]
>    38  ??  DL    0:00.54 [vnlru]
>    39  ??  DL    1:08.12 [syncer]
>    40  ??  DL    0:04.00 [softdepflush]
>    41  ??  DL    0:11.05 [schedcpu]
> 27182  ??  Ds    0:05.75 /usr/sbin/syslogd -l /var/run/log -l /var/named/var/run/log -b 127.0.0.1 -a 10.128.0.0/10
> 27471  ??  Is    0:01.10 /usr/local/bin/postmaster -D /usr/local/pgsql/data (postgres)
> 27594  ??  Is    0:00.04 /usr/libexec/ftpd -m -D -l -l
> 27602  ??  DL    0:00.28 [smbiod1]
> 96581  ??  D     0:00.00 cron: running job (cron)
> 96582  ??  D     0:00.00 cron: running job (cron)
> 96583  ??  D     0:00.00 cron: running job (cron)
> 96585  ??  D     0:00.00 cron: running job (cron)
> 96586  ??  D     0:00.00 cron: running job (cron)
> 96587  ??  D     0:00.00 cron: running job (cron)
> 96588  ??  D     0:00.00 cron: running job (cron)
> 96589  ??  D     0:00.00 cron: running job (cron)
> 96590  ??  D     0:00.00 cron: running job (cron)
> 96591  ??  D     0:00.00 cron: running job (cron)
> 96592  ??  D     0:00.00 cron: running job (cron)
> 96593  ??  D     0:00.00 cron: running job (cron)
> 96594  ??  D     0:00.00 cron: running job (cron)
> 96607  ??  D     0:00.00 cron: running job (cron)
> 96608  ??  D     0:00.00 cron: running job (cron)
> 96609  ??  D     0:00.00 cron: running job (cron)
> 96610  ??  D     0:00.00 cron: running job (cron)
> 96611  ??  D     0:00.00 cron: running job (cron)
> 96612  ??  D     0:00.00 cron: running job (cron)
> 96613  ??  D     0:00.00 cron: running job (cron)
> 96614  ??  D     0:00.00 cron: running job (cron)
> 96615  ??  D     0:00.00 cron: running job (cron)
> 96616  ??  D     0:00.00 cron: running job (cron)
> 96617  ??  D     0:00.00 cron: running job (cron)
> 96631  ??  D     0:00.00 cron: running job (cron)
> 96632  ??  D     0:00.00 cron: running job (cron)
> 96633  ??  D     0:00.00 cron: running job (cron)
> 96634  ??  D     0:00.00 cron: running job (cron)
> 96635  ??  D     0:00.00 cron: running job (cron)
> 96636  ??  D     0:00.00 cron: running job (cron)
> 96637  ??  D     0:00.00 cron: running job (cron)
> 96638  ??  D     0:00.00 cron: running job (cron)
> 96639  ??  D     0:00.00 cron: running job (cron)
> 96642  ??  D     0:00.00 cron: running job (cron)
> 96650  ??  D     0:00.00 cron: running job (cron)
> 29393  p0  D+   22:04.58 /usr/local/bin/rsync
>
> real 0m0.012s
> user 0m0.000s
> sys 0m0.010s
> /
>
> real 0m0.019s
> user 0m0.000s
> sys 0m0.016s
> /var
>
> real 0m0.028s
> user 0m0.008s
> sys 0m0.018s
> /diskless
>
> real 0m0.017s
> user 0m0.008s
> sys 0m0.007s
> /usr
>
> real 0m0.016s
> user 0m0.000s
> sys 0m0.015s
> /d2
>
> real 0m0.024s
> user 0m0.000s
> sys 0m0.023s
> /exports/home
>
> real 0m2.559s
> user 0m0.216s
> sys 0m2.307s
>