From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 12 04:47:33 2009 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1A12C1065670; Sat, 12 Sep 2009 04:47:33 +0000 (UTC) (envelope-from linda.messerschmidt@gmail.com) Received: from mail-qy0-f200.google.com (mail-qy0-f200.google.com [209.85.221.200]) by mx1.freebsd.org (Postfix) with ESMTP id ADB998FC19; Sat, 12 Sep 2009 04:47:32 +0000 (UTC) Received: by qyk38 with SMTP id 38so1416699qyk.27 for ; Fri, 11 Sep 2009 21:47:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=E0ES0qzX1s/QQHqzNRUrEmLteYbJ5VeX2hoOHPWWUg8=; b=stI2a9NCwc458GiVAL7onTljaIwWDcQLtpWXJN74tA/5qGt/7WEVou34mtg2CgHZHB QOovx+6zhUOd+76ukA1gWp1FvyV//3aq0pczq6fXcdIYupB0PKyj2Av71yr2QY3qjWzp WCptlpP9YO4L5W6mXdjQdPJQlTJeHs5MqyFkg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=u6UEqjLwxIaCAl8xDIukGAj/ufoEowGSIX73EPaV9QWW3Hth0EgSsHTatd3UbmZ2zf mMtKtRExEiHzs0PR4fQsP0ZFCyhuqCuDbhuwXYBpoqMQAEaCOKI2U5taEURdDLPGP+35 ALsZuPH/FPED64my6nLoETbjTuD5Cg2pYrCQ8= MIME-Version: 1.0 Received: by 10.229.119.69 with SMTP id y5mr1469532qcq.100.1252730851756; Fri, 11 Sep 2009 21:47:31 -0700 (PDT) In-Reply-To: <4AAB1E34.2060908@elischer.org> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200909111102.14503.jhb@freebsd.org> <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com> <200909111506.47309.jhb@freebsd.org> <237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com> <237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com> <4AAB1E34.2060908@elischer.org> Date: Sat, 12 Sep 2009 00:47:31 -0400 Message-ID: <237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com> From: Linda Messerschmidt To: Julian Elischer Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-hackers@freebsd.org Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Sep 2009 04:47:33 -0000 On Sat, Sep 12, 2009 at 12:06 AM, Julian Elischer wro= te: > does the system have a serial console? how about a normal console /keyboa= rd? It has an IP KVM. > how often deos it hang? and for =A0how long? Well, this is interesting. I got really frustrated with the other approach, so I thought I'd thin a machine down absolutely as far as I could, eliminate every possible source of delay, and see what happens. I killed everything... cron, RPC, NFS, devd, gmon, nrpe, everything. The Apache and its exerciser are now the only things running on the machine, and the Apache is only touching an md0 swap device mounted on /mnt. I *still* get the hangs. It hangs for all sorts of different periods, but the duration of the stall is approximately inversely proportional to the chance of seeing it. To get a short delay, you need wait only a little bit. If you want a 2-3 second delay, you may have to wait 15-20 minutes. *However* in order to answer your question, I changed up the test program, which up til now has been cycling requests every 50 ms until it gets one >2s, at which point it sysctls to stop ktr and aborts. Now it prints the timestamp of all "too long" requests. But I also dropped the threshold for "too long" from 2s to 100ms, since with everything on RAM disk, there's no longer any reason to expect a request to take more than 1-2ms in the worst case. The results are pretty profound: 1252729876: request 82 131ms 1252729883: request 210 388ms 1252729890: request 338 380ms 1252729897: request 466 388ms 1252729904: request 594 404ms 1252729919: request 849 810ms 1252729926: request 977 386ms 1252729933: request 1105 370ms 1252729940: request 1233 366ms 1252729947: request 1361 400ms 1252729961: request 1617 746ms 1252729968: request 1744 477ms 1252729975: request 1872 388ms 1252729982: request 2000 380ms 1252729989: request 2128 384ms 1252729996: request 2256 395ms It goes on and on like this, I get a 380-400ms stall every seven seconds. I have had a few come back higher, in the 750-850ms range, usually after missing a beat: 1252729897: request 466 388ms 1252729904: request 594 404ms 1252729919: request 849 810ms 1252729926: request 977 386ms 1252730010: request 2512 416ms 1252730017: request 2640 390ms 1252730031: request 2896 774ms 1252730038: request 3023 431ms 1252730454: request 10568 378ms 1252730461: request 10696 397ms 1252730475: request 10952 733ms 1252730482: request 11080 366ms So far, nothing over 1s. So what happens every seven seconds??