From owner-freebsd-hackers@FreeBSD.ORG  Mon Sep  7 10:59:58 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B04D11065679
	for <freebsd-hackers@freebsd.org>; Mon,  7 Sep 2009 10:59:58 +0000 (UTC)
	(envelope-from rivanr@gmail.com)
Received: from mail-bw0-f206.google.com (mail-bw0-f206.google.com
	[209.85.218.206])
	by mx1.freebsd.org (Postfix) with ESMTP id 3C9498FC1C
	for <freebsd-hackers@freebsd.org>; Mon,  7 Sep 2009 10:59:58 +0000 (UTC)
Received: by bwz2 with SMTP id 2so278940bwz.43
	for <freebsd-hackers@freebsd.org>; Mon, 07 Sep 2009 03:59:57 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:date:from
	:user-agent:mime-version:to:subject:content-type
	:content-transfer-encoding;
	bh=VTpENWKrHuY+cvqqBVH0oyIJUs0WYio5TdfOsBbe7ds=;
	b=lzFmTN6GDKausJdDWnNzxnd52ga98lx7W+yceOrhY73ipjnpRpy3MTIJmzFQm4tX7R
	f9i8FcJyIr0Y1e/rff+Y6t39G4UmKwxEFmYyP7+lUBL6L/wN4/TcW2NJhgs8GvqKVZKh
	doLurhHmKIEHdHP3UI+xkcqzpaEMMtlfUjsbs=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=message-id:date:from:user-agent:mime-version:to:subject
	:content-type:content-transfer-encoding;
	b=KOTxQFgiMTzyx2QQ5Hx9PyXMOVKSD/jiHZPgS2a3bcURAZE/kmS80iiZPpEfNHrmrp
	qDCfchmzl5jZqvzYuaaVV1jUUeWRAdLGhqu6ZAAQobfzM+gO7wyZ+4ngwoPn55U0kCyO
	zHe5I4oTECN40UF2uQ6jJKDH/bpZQUIVgMn+8=
Received: by 10.204.8.13 with SMTP id f13mr11940352bkf.150.1252321197018;
	Mon, 07 Sep 2009 03:59:57 -0700 (PDT)
Received: from azdaja.softwarehood.com ([95.180.33.218])
	by mx.google.com with ESMTPS id p9sm6949840fkb.37.2009.09.07.03.59.52
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Mon, 07 Sep 2009 03:59:52 -0700 (PDT)
Message-ID: <4AA4E7A7.60503@gmail.com>
Date: Mon, 07 Sep 2009 12:59:51 +0200
From: Ivan Radovanovic <rivanr@gmail.com>
User-Agent: Thunderbird 2.0.0.22 (X11/20090708)
MIME-Version: 1.0
To: freebsd-hackers@FreeBSD.org
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Kernel panic caused by fork
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 07 Sep 2009 10:59:58 -0000

I was testing how FreeBSD behaves when running many threads at the same 
time (it performs excellently), and then wanted to see how the system 
handles a program that forks itself without limit. I wrote a 
very simple program:

#include <sys/types.h>
#include <unistd.h>

int main() {
  while(1)
    fork();
  return 0;
}

After running this program I got a kernel panic with the message
"get_pv_entry: increase vm.pmap.shpgperproc"
IMHO it is not a good idea to bring the entire system down when one process 
misbehaves in this way; it would be much better to kill the offending 
process and write this message to the system log. I am not sure whether 
the panic is caused by the forking itself or by the system trying to 
create a new process once the maxproc limit has already been reached: 
the system only prints a warning that the maxproc limit has been 
reached, and it panics only when I try to start a new process (such as ps).
The system is FreeBSD 7.2-STABLE.

kernel backtrace:

(kgdb) bt
#0  doadump () at pcpu.h:196
#1  0xc05fc477 in boot (howto=260) at ../../../kern/kern_shutdown.c:418
#2  0xc05fc782 in panic (fmt=Variable "fmt" is not available.
) at ../../../kern/kern_shutdown.c:574
#3  0xc087bccf in get_pv_entry (pmap=0xca0cb43c, try=0)
    at ../../../i386/i386/pmap.c:2067
#4  0xc087c0db in pmap_insert_entry (pmap=Variable "pmap" is not available.
) at ../../../i386/i386/pmap.c:2203
#5  0xc087f08e in pmap_enter (pmap=0xca0cb43c, va=671973376, access=1 '\001',
    m=Variable "m" is not available.
) at ../../../i386/i386/pmap.c:3114
#6  0xc082a947 in vm_fault (map=0xca0cb3b0, vaddr=671973376,
    fault_type=1 '\001', fault_flags=0) at ../../../vm/vm_fault.c:891
#7  0xc0881acb in trap_pfault (frame=0xefc1bd38, usermode=1, eva=671975739)
    at ../../../i386/i386/trap.c:828
#8  0xc0882420 in trap (frame=0xefc1bd38) at ../../../i386/i386/trap.c:396
#9  0xc086724b in calltrap () at ../../../i386/i386/exception.s:166
#10 0x280d893b in ?? ()
Previous frame inner to this frame (corrupt stack?)


From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep  8 09:09:17 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id ED459106568D
	for <freebsd-hackers@freebsd.org>; Tue,  8 Sep 2009 09:09:17 +0000 (UTC)
	(envelope-from rivanr@gmail.com)
Received: from mail-bw0-f206.google.com (mail-bw0-f206.google.com
	[209.85.218.206])
	by mx1.freebsd.org (Postfix) with ESMTP id 76C698FC27
	for <freebsd-hackers@freebsd.org>; Tue,  8 Sep 2009 09:09:17 +0000 (UTC)
Received: by bwz2 with SMTP id 2so796133bwz.43
	for <freebsd-hackers@freebsd.org>; Tue, 08 Sep 2009 02:09:16 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:date:from
	:user-agent:mime-version:to:cc:subject:references:in-reply-to
	:content-type:content-transfer-encoding;
	bh=hxTcQgRU1X7ndqKzHO671baC+88AFv9jCbHhJqyRr/s=;
	b=SbLurKoJcbcEHPaDzbc2LnfxfxdqXlXputOU02jpXXNfMTuAHpNocLQpnouLKEzIkq
	d8MuLZjha1TT9IMe1UgHtvNexVT0zJD32nnsJBKR7E+Nrl4Pt7hByySpU2yeXCuSotY/
	9ARa0CrYN8oxNHHM3YmBRGw67FmFkyIkS7mgg=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=message-id:date:from:user-agent:mime-version:to:cc:subject
	:references:in-reply-to:content-type:content-transfer-encoding;
	b=u6oOSazMB20bE1RH+uEByZWpkpKRZQW6Jp86hUl+ZzvvoHz/D2mPTKxlHSxH8NXsdU
	CtQUqv3weVpvS9jeMzsSiwTbbgoaoIPuut3auanxasRbJz5W+B9qDAVuiEA9rnIZX2Ld
	w0V8Q1NNcxqzXnnO3L5JmTl5ZjwiGrf0PF8kk=
Received: by 10.103.78.35 with SMTP id f35mr6526804mul.89.1252400956551;
	Tue, 08 Sep 2009 02:09:16 -0700 (PDT)
Received: from azdaja.softwarehood.com ([95.180.33.218])
	by mx.google.com with ESMTPS id w5sm170879mue.4.2009.09.08.02.09.15
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Tue, 08 Sep 2009 02:09:16 -0700 (PDT)
Message-ID: <4AA61F3A.3040802@gmail.com>
Date: Tue, 08 Sep 2009 11:09:14 +0200
From: Ivan Radovanovic <rivanr@gmail.com>
User-Agent: Thunderbird 2.0.0.22 (X11/20090708)
MIME-Version: 1.0
To: Jan Mikkelsen <janm-freebsd-hackers@transactionware.com>
References: <4AA4E7A7.60503@gmail.com>
	<E71733B7-16FB-435E-90BD-4869831CC61C@transactionware.com>
In-Reply-To: <E71733B7-16FB-435E-90BD-4869831CC61C@transactionware.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@FreeBSD.org
Subject: Re: Kernel panic caused by fork
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Sep 2009 09:09:18 -0000

Jan Mikkelsen wrote:
> A quick observation: This is not "one process misbehaving", it is a 
> large number of processes misbehaving.  From an administrative point 
> of view, I think the response is "call setrlimit(RLIMIT_NPROC, ...)", 
> otherwise the expected behaviour is for your machine to stop making 
> forward progress.
>
> Having said that, I agree that panics are bad and it would be nice if 
> fork() returned EAGAIN, again and again and again.  Or perhaps the 
> machine should just panic ...
The fork(2) manual page says this about errors:
     [EAGAIN]           The system-imposed limit on the total number of pro-
                        cesses under execution would be exceeded.  The limit
                        is given by the sysctl(3) MIB variable KERN_MAXPROC.
                        (The limit is actually ten less than this except for
                        the super user).

It seems the idea is to leave room for 10 more processes so that root can 
kill the offending process. The limits on my system (I am running a pretty 
much generic kernel) are
kern.maxproc: 6164
kern.maxprocperuid: 5547
so if only two users are active on the system at the same time 
(as was the case when I ran this test), there is still room for more than 500 
processes (6164 - 5547 = 617) after one user hits his limit, so I don't 
think it should panic.
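
The EAGAIN behaviour described in the manual page excerpt above can be
handled in a program roughly as follows. This is only a minimal sketch:
spawn_bounded() is a hypothetical helper name, not anything from the
FreeBSD tree, and the children exit at once instead of forking again.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: create at most max_children children and stop
 * early if fork() fails. Returns the number of children created, or -1
 * if fork() failed for a reason other than EAGAIN.
 */
int spawn_bounded(int max_children)
{
    int created = 0;
    int failed = 0;     /* set when fork() fails with something else */

    while (created < max_children) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);   /* child: exit at once, never fork again */
        if (pid == -1) {
            if (errno != EAGAIN)
                failed = 1;
            break;      /* a process limit was hit, or another error */
        }
        created++;
    }

    /* Reap every child so no zombies keep occupying process slots. */
    for (int i = 0; i < created; i++)
        wait(NULL);

    return failed ? -1 : created;
}
```

A caller that gets back fewer children than it asked for knows a process
limit was reached and can back off instead of retrying in a tight loop.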

Regards,
Ivan

From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep  8 09:19:57 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AB1E6106566B
	for <freebsd-hackers@FreeBSD.org>; Tue,  8 Sep 2009 09:19:56 +0000 (UTC)
	(envelope-from janm-freebsd-hackers@transactionware.com)
Received: from mail.transactionware.com (mail.transactionware.com
	[203.14.245.7]) by mx1.freebsd.org (Postfix) with SMTP id EA38B8FC16
	for <freebsd-hackers@FreeBSD.org>; Tue,  8 Sep 2009 09:19:55 +0000 (UTC)
Received: (qmail 20344 invoked from network); 8 Sep 2009 08:53:23 -0000
Received: from midgard.transactionware.com (192.168.1.55)
	by dm.transactionware.com with SMTP; 8 Sep 2009 08:53:23 -0000
Received: (qmail 24315 invoked by uid 907); 8 Sep 2009 08:53:13 -0000
Received: from jmmacpro.transactionware.com (HELO
	jmmacpro.transactionware.com) (192.168.1.33)
	by midgard.transactionware.com (qpsmtpd/0.82) with ESMTP;
	Tue, 08 Sep 2009 18:53:13 +1000
Mime-Version: 1.0 (Apple Message framework v1075.2)
Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes
From: Jan Mikkelsen <janm-freebsd-hackers@transactionware.com>
In-Reply-To: <4AA4E7A7.60503@gmail.com>
Date: Tue, 8 Sep 2009 18:53:13 +1000
Content-Transfer-Encoding: 7bit
Message-Id: <E71733B7-16FB-435E-90BD-4869831CC61C@transactionware.com>
References: <4AA4E7A7.60503@gmail.com>
To: Ivan Radovanovic <rivanr@gmail.com>
X-Mailer: Apple Mail (2.1075.2)
Cc: freebsd-hackers@FreeBSD.org
Subject: Re: Kernel panic caused by fork
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Sep 2009 09:19:57 -0000

Hi,

On 07/09/2009, at 8:59 PM, Ivan Radovanovic wrote:
...
> After running this program I got kernel panic with message
> "get_pv_entry: increase vm.pmap.shpgperproc"
> IMHO it is not very good idea to bring entire system down if one  
> process misbehaves in this way, it is maybe much better to kill  
> offending process and to send this message to system log. I am not  
> sure whether the panic is actually caused by process forking forever  
> or when the system tries to create new process when maxproc limit is  
> already reached (since system is only printing warning message that  
> maxproc limit is reached and it only panics when I try to start new  
> process (like ps)).

A quick observation: This is not "one process misbehaving", it is a  
large number of processes misbehaving.  From an administrative point  
of view, I think the response is "call setrlimit(RLIMIT_NPROC, ...)",  
otherwise the expected behaviour is for your machine to stop making  
forward progress.
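
A minimal sketch of the setrlimit(RLIMIT_NPROC, ...) call mentioned above,
assuming a POSIX-like system that defines RLIMIT_NPROC (FreeBSD and Linux
both do). The helper name cap_nproc() is made up for illustration.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: lower this process's soft RLIMIT_NPROC to at most
 * `wanted`. The value is clamped to the hard limit, because a soft limit
 * above the hard limit is rejected. Returns 0 on success, -1 on error
 * (errno is set by getrlimit() or setrlimit()).
 */
int cap_nproc(rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) == -1)
        return -1;
    /* Keep the hard limit unchanged and only move the soft limit. */
    rl.rlim_cur = (wanted < rl.rlim_max) ? wanted : rl.rlim_max;
    return setrlimit(RLIMIT_NPROC, &rl);
}
```

The limit is inherited by children, so a shell or login wrapper that calls
something like cap_nproc(100) before starting untrusted programs bounds
how far a fork loop can get. The limit is typically not enforced for the
superuser.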

Having said that, I agree that panics are bad and it would be nice if  
fork() returned EAGAIN, again and again and again.  Or perhaps the  
machine should just panic ...

Regards,

Jan.


From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep  8 10:49:01 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0D04C106566B
	for <freebsd-hackers@freebsd.org>; Tue,  8 Sep 2009 10:49:01 +0000 (UTC)
	(envelope-from crquan@gmail.com)
Received: from mail-vw0-f189.google.com (mail-vw0-f189.google.com
	[209.85.212.189])
	by mx1.freebsd.org (Postfix) with ESMTP id B66378FC12
	for <freebsd-hackers@freebsd.org>; Tue,  8 Sep 2009 10:49:00 +0000 (UTC)
Received: by vws27 with SMTP id 27so2164930vws.3
	for <freebsd-hackers@freebsd.org>; Tue, 08 Sep 2009 03:48:59 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:in-reply-to:references
	:date:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=szK4piwxZJOSD7eJUZr4kFjPU0HWd8qRGAGMkWcrb/o=;
	b=VDnvYXDZm2rdgiUdRDJ9HTRPR66aF9ZZj98USIh6oyQ+LIA58gp+tNCQeggjEOIJlk
	L3cB7SXdZ0djPSidE4nEtmvqYlx/a4Ov0TWqeYTyDK6cwjWttiWSfkYylumolYQcQrZb
	loG96ZVObKetxHpOx49Uq3nM7m/tFtAOOweRY=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	b=JjMqDL05wZvRrDva1EMOuNoPykUAHYsIpYy2mXb7S7plv28g/KBAj0KOQwq6CTimP+
	IyZqXWOj3FYrRJpFxyalduEbUDO82/cU0IUSgLsloBc+KuRzjAwIquSxvh9Y8NlHlVFx
	I1IAuxpXlt+zkd1hDCY7dXeRjcB7x5HhUYE6E=
MIME-Version: 1.0
Received: by 10.220.111.80 with SMTP id r16mr14808551vcp.76.1252405323791; 
	Tue, 08 Sep 2009 03:22:03 -0700 (PDT)
In-Reply-To: <4AA4E7A7.60503@gmail.com>
References: <4AA4E7A7.60503@gmail.com>
Date: Tue, 8 Sep 2009 18:22:03 +0800
Message-ID: <91b13c310909080322s21e0fb02o423434206e5f96f6@mail.gmail.com>
From: Cheng Renquan <crquan@gmail.com>
To: Ivan Radovanovic <rivanr@gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-hackers@freebsd.org
Subject: Re: Kernel panic caused by fork
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Sep 2009 10:49:01 -0000

On Mon, Sep 7, 2009 at 6:59 PM, Ivan Radovanovic<rivanr@gmail.com> wrote:
> I was testing FreeBSD's behavior when running many threads at the same time
> (and I find it performs excellent) when I wanted to test how system will
> behave towards program that spawns itself too many times. I wrote a very
> simple program
>
> #include <sys/types.h>
> #include <unistd.h>
>
> int main() {
>  while(1)
>   fork();
>  return 0;
> }
>
> After running this program I got kernel panic with message
> "get_pv_entry: increase vm.pmap.shpgperproc"
> IMHO it is not very good idea to bring entire system down if one process
> misbehaves in this way, it is maybe much better to kill offending process
> and to send this message to system log. I am not sure whether the panic is
> actually caused by process forking forever or when the system tries to
> create new process when maxproc limit is already reached (since system is
> only printing warning message that maxproc limit is reached and it only
> panics when I try to start new process (like ps)).
> System is FreeBSD 7.2-STABLE

This is just the "fork bomb" problem; no operating system kernel
deals with it well:

http://en.wikipedia.org/wiki/Fork_bomb

And it's really a system administration problem rather than a kernel problem.

-- 
Cheng Renquan (程任全), from Shenzhen, China

From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep  8 11:12:31 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9CE8B1065692
	for <freebsd-hackers@freebsd.org>; Tue,  8 Sep 2009 11:12:31 +0000 (UTC)
	(envelope-from joachim.kuebart@gmx.net)
Received: from mail.gmx.net (mail.gmx.net [213.165.64.20])
	by mx1.freebsd.org (Postfix) with SMTP id E998B8FC0C
	for <freebsd-hackers@freebsd.org>; Tue,  8 Sep 2009 11:12:30 +0000 (UTC)
Received: (qmail invoked by alias); 08 Sep 2009 10:45:47 -0000
Received: from cpc2-oxfd10-0-0-cust569.oxfd.cable.ntl.com (EHLO
	localhost.localdomain) [81.110.34.58]
	by mail.gmx.net (mp023) with SMTP; 08 Sep 2009 12:45:47 +0200
X-Authenticated: #31053830
X-Provags-ID: V01U2FsdGVkX18EDvR4tg8ESiTFil7raplmwee9qQCr8Epi6XruYd
	NmbfAGbmsP1HiO
From: Joachim Kuebart <joachim.kuebart@gmx.net>
To: freebsd-hackers@freebsd.org
Content-Type: text/plain
Date: Tue, 08 Sep 2009 11:45:45 +0100
Message-Id: <1252406745.778.22.camel@yacht>
Mime-Version: 1.0
X-Mailer: Evolution 2.26.3nb1 
Content-Transfer-Encoding: 7bit
X-Y-GMX-Trusted: 0
X-FuHaFi: 0.71
Subject: License change
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Sep 2009 11:12:31 -0000

Hi,

much to my embarrassment, I noticed recently that there is a file
authored by me using the 4-clause BSD license in the FreeBSD tree. The
file src/sys/dev/sound/pci/es137x.c uses the 4-clause BSD license while
the accompanying .h file uses a kind of 3-clause BSD license that I
apparently made up at the time.

I would like to change the license of es137x.c to the 3-clause BSD
license. Unfortunately I cannot prove that I'm in fact the original
author because the e-mail address given in the file is no longer active.
If this means that the license cannot be changed anymore, that's
unfortunate, but I guess it's the way it has to be...

Best regards,

Joachim


From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep  8 16:24:38 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 949C9106568F
	for <freebsd-hackers@freebsd.org>; Tue,  8 Sep 2009 16:24:38 +0000 (UTC)
	(envelope-from julian@elischer.org)
Received: from outR.internet-mail-service.net (outr.internet-mail-service.net
	[216.240.47.241])
	by mx1.freebsd.org (Postfix) with ESMTP id 54DDB8FC1B
	for <freebsd-hackers@freebsd.org>; Tue,  8 Sep 2009 16:24:38 +0000 (UTC)
Received: from idiom.com (mx0.idiom.com [216.240.32.160])
	by out.internet-mail-service.net (Postfix) with ESMTP id 68924B3F80;
	Tue,  8 Sep 2009 09:24:38 -0700 (PDT)
X-Client-Authorized: MaGic Cook1e
X-Client-Authorized: MaGic Cook1e
X-Client-Authorized: MaGic Cook1e
Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38])
	by idiom.com (Postfix) with ESMTP id 8CF042D6010;
	Tue,  8 Sep 2009 09:24:37 -0700 (PDT)
Message-ID: <4AA68544.8050102@elischer.org>
Date: Tue, 08 Sep 2009 09:24:36 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Thunderbird 2.0.0.23 (Macintosh/20090812)
MIME-Version: 1.0
To: Cheng Renquan <crquan@gmail.com>
References: <4AA4E7A7.60503@gmail.com>
	<91b13c310909080322s21e0fb02o423434206e5f96f6@mail.gmail.com>
In-Reply-To: <91b13c310909080322s21e0fb02o423434206e5f96f6@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org, Ivan Radovanovic <rivanr@gmail.com>
Subject: Re: Kernel panic caused by fork
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Sep 2009 16:24:38 -0000

Cheng Renquan wrote:
> On Mon, Sep 7, 2009 at 6:59 PM, Ivan Radovanovic<rivanr@gmail.com> wrote:
>> I was testing FreeBSD's behavior when running many threads at the same time
>> (and I find it performs excellent) when I wanted to test how system will
>> behave towards program that spawns itself too many times. I wrote a very
>> simple program
>>
>> #include <sys/types.h>
>> #include <unistd.h>
>>
>> int main() {
>>  while(1)
>>   fork();
>>  return 0;
>> }
>>
>> After running this program I got kernel panic with message
>> "get_pv_entry: increase vm.pmap.shpgperproc"
>> IMHO it is not very good idea to bring entire system down if one process
>> misbehaves in this way, it is maybe much better to kill offending process
>> and to send this message to system log. I am not sure whether the panic is
>> actually caused by process forking forever or when the system tries to
>> create new process when maxproc limit is already reached (since system is
>> only printing warning message that maxproc limit is reached and it only
>> panics when I try to start new process (like ps)).
>> System is FreeBSD 7.2-STABLE
> 
> It's just the "fork bomb" problem, all operating system kernels cannot
> deal with it well,
> 
> http://en.wikipedia.org/wiki/Fork_bomb

It's more of a tuning problem, I think.  The system should tune itself so 
that MAXPROC is hit before critical resources are exhausted.
Having said that, there are a lot of resources that need to be watched.


> 
> And it's really a system administration problem rather than a kernel problem,
> 


From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep  8 16:42:05 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0621E1065693
	for <freebsd-hackers@freebsd.org>; Tue,  8 Sep 2009 16:42:05 +0000 (UTC)
	(envelope-from rivanr@gmail.com)
Received: from mail-fx0-f210.google.com (mail-fx0-f210.google.com
	[209.85.220.210])
	by mx1.freebsd.org (Postfix) with ESMTP id 8937F8FC19
	for <freebsd-hackers@freebsd.org>; Tue,  8 Sep 2009 16:42:04 +0000 (UTC)
Received: by fxm6 with SMTP id 6so2654763fxm.43
	for <freebsd-hackers@freebsd.org>; Tue, 08 Sep 2009 09:42:03 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:date:from
	:user-agent:mime-version:to:cc:subject:references:in-reply-to
	:content-type:content-transfer-encoding;
	bh=KAAXQlCGfvZBk5/xQSnFbGnJGbnW2VRqjAgYP3vPNp4=;
	b=TXye/zWSz+ixau724D8GUpclawstnCCDQuiTGetPwFOQdjRMmRXQ8TNC+FCUMuP7Az
	5ysGFOyg3tUTl4JmCGE0yBv+nNr+a7RWEAi+zn1V5gkxnFijyuzWKC6SHW29VebhK54o
	8PLaL8VpHrptIvHj7iaRPkkBISc9inMRaptNE=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=message-id:date:from:user-agent:mime-version:to:cc:subject
	:references:in-reply-to:content-type:content-transfer-encoding;
	b=dWL/g8dwoc8yvUjGwVZBwAYQmPctSqPHYI8MJ1qT1cqKko/0gul1bruDtPiG7nBDum
	fZLkTuYJc+zU/xbMhNyExxo4EEApZxbfpVl0zeesnYpM1BzgBVzB2ktx/a5JX/WilFX5
	L3lIuqJm1bvsf/OvTb0w+Du8Vva+mVQXdvs0o=
Received: by 10.102.14.4 with SMTP id 4mr6747394mun.2.1252428123402;
	Tue, 08 Sep 2009 09:42:03 -0700 (PDT)
Received: from azdaja.softwarehood.com ([95.180.33.218])
	by mx.google.com with ESMTPS id i7sm208783mue.48.2009.09.08.09.42.02
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Tue, 08 Sep 2009 09:42:02 -0700 (PDT)
Message-ID: <4AA68959.6000808@gmail.com>
Date: Tue, 08 Sep 2009 18:42:01 +0200
From: Ivan Radovanovic <rivanr@gmail.com>
User-Agent: Thunderbird 2.0.0.22 (X11/20090708)
MIME-Version: 1.0
To: Julian Elischer <julian@elischer.org>
References: <4AA4E7A7.60503@gmail.com>
	<91b13c310909080322s21e0fb02o423434206e5f96f6@mail.gmail.com>
	<4AA68544.8050102@elischer.org>
In-Reply-To: <4AA68544.8050102@elischer.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org, Cheng Renquan <crquan@gmail.com>
Subject: Re: Kernel panic caused by fork
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Sep 2009 16:42:05 -0000

Julian Elischer wrote:
> Cheng Renquan wrote:
>> On Mon, Sep 7, 2009 at 6:59 PM, Ivan Radovanovic<rivanr@gmail.com> 
>> wrote:
>>> I was testing FreeBSD's behavior when running many threads at the 
>>> same time
>>> (and I find it performs excellent) when I wanted to test how system 
>>> will
>>> behave towards program that spawns itself too many times. I wrote a 
>>> very
>>> simple program
>> It's just the "fork bomb" problem, all operating system kernels cannot
>> deal with it well,
>>
>> http://en.wikipedia.org/wiki/Fork_bomb
> It's more a tuning problem I think.  The system should tune itself so 
> that MAXPROC is hit before critical resources are exhausted I think.
> Having said that, there are a lot of resources that need to be watched.
After reading that nice Wikipedia article and learning about the 
bash one-liner, I wanted to check whether it really works, but I didn't want 
to bring the system down again (and create a crash dump and so on). So 
I limited the number of processes for a single user by running
sysctl kern.maxprocperuid=1000
as root, and after that I started bash and typed
:(){ :|:& };:
as a normal user.
The first thing to notice: there were more than 4000 spawned bash processes 
(why, if I set the limit to 1000 per user ID?). However, the system didn't 
crash and I was eventually able to recover with
/bin/kill -9 -- -1234
where 1234 is the process group ID of the bash process.
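
The recovery step above, signalling a whole process group at once, looks
roughly like this from C. This is a sketch only: kill_group() and
demo_kill_group() are hypothetical names, and the demo uses a single idle
child in its own group in place of a real fork bomb.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send SIGKILL to every process in group pgid, like kill -9 -- -pgid. */
int kill_group(pid_t pgid)
{
    return kill(-pgid, SIGKILL);
}

/*
 * Hypothetical demonstration: start one child in its own process group,
 * kill that group, and return the signal that terminated the child
 * (or -1 on error).
 */
int demo_kill_group(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);      /* child: become leader of a new group */
        for (;;)
            pause();        /* sleep until a signal arrives */
    }
    setpgid(pid, pid);      /* parent sets it too, closing the race */
    if (kill_group(pid) == -1)
        kill(pid, SIGKILL); /* fall back so the child is not left behind */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

A negative first argument to kill() addresses the process group rather
than a single process, which is why one call can reach every process that
was forked inside that group.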

Regards,
Ivan


From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep  8 21:01:49 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3CEB4106568F
	for <freebsd-hackers@freebsd.org>; Tue,  8 Sep 2009 21:01:49 +0000 (UTC)
	(envelope-from freebsd-hackers@m.gmane.org)
Received: from lo.gmane.org (lo.gmane.org [80.91.229.12])
	by mx1.freebsd.org (Postfix) with ESMTP id BDFCE8FC13
	for <freebsd-hackers@freebsd.org>; Tue,  8 Sep 2009 21:01:48 +0000 (UTC)
Received: from list by lo.gmane.org with local (Exim 4.50) id 1Ml7ox-00056T-Fz
	for freebsd-hackers@freebsd.org; Tue, 08 Sep 2009 23:01:47 +0200
Received: from 93-138-19-116.adsl.net.t-com.hr ([93.138.19.116])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-hackers@freebsd.org>; Tue, 08 Sep 2009 23:01:47 +0200
Received: from ivoras by 93-138-19-116.adsl.net.t-com.hr with local (Gmexim
	0.1 (Debian)) id 1AlnuQ-0007hv-00
	for <freebsd-hackers@freebsd.org>; Tue, 08 Sep 2009 23:01:47 +0200
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-hackers@freebsd.org
From: Ivan Voras <ivoras@freebsd.org>
Date: Tue, 08 Sep 2009 23:00:58 +0200
Lines: 54
Message-ID: <h86gn2$ghr$1@ger.gmane.org>
References: <4AA4E7A7.60503@gmail.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="------------enigF560D7BCDCFD4CBFF39C050C"
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: 93-138-19-116.adsl.net.t-com.hr
User-Agent: Thunderbird 2.0.0.23 (Windows/20090812)
In-Reply-To: <4AA4E7A7.60503@gmail.com>
X-Enigmail-Version: 0.96.0
Sender: news <news@ger.gmane.org>
Subject: Re: Kernel panic caused by fork
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Sep 2009 21:01:49 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enigF560D7BCDCFD4CBFF39C050C
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Ivan Radovanovic wrote:
> I was testing FreeBSD's behavior when running many threads at the same
> time (and I find it performs excellent) when I wanted to test how system
> will behave towards program that spawns itself too many times. I wrote a
> very simple program
>
> #include <sys/types.h>
> #include <unistd.h>
>
> int main() {
>  while(1)
>    fork();
>  return 0;
> }

A simple fork bomb. Hmm, it should not crash, and if it does crash it's
a regression. I've "tested" fork bombs on 7-STABLE and early 8-CURRENT
and they behaved as expected: they stopped at the maxproc limit.

I don't currently have a spare 7-STABLE machine, but I have just run it
on an 8-BETA2 one and the maxproc limit still works, though as expected the
console is almost unusable for anything except switching (i.e. processes
don't get to receive input very often). A lot of them are in "locked"
state with "*vm ob" as state/channel name.

I couldn't clean the system from the fork bomb with "killall" as root.

Can you describe your machine? Mine is an Atom-based (slow) netbook with 1
GB RAM.


--------------enigF560D7BCDCFD4CBFF39C050C
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqmxhAACgkQldnAQVacBcjzUwCfeBvJ/Kd6zFakn6qP9BNBH9TS
1i4An09wFsbLJ7vgoyQjZ4n+sx6oBGZG
=uppB
-----END PGP SIGNATURE-----

--------------enigF560D7BCDCFD4CBFF39C050C--


From owner-freebsd-hackers@FreeBSD.ORG  Wed Sep  9 17:01:34 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 95A6F1065676
	for <freebsd-hackers@freebsd.org>; Wed,  9 Sep 2009 17:01:34 +0000 (UTC)
	(envelope-from a_best01@uni-muenster.de)
Received: from zivm-out3.uni-muenster.de (ZIVM-OUT3.UNI-MUENSTER.DE
	[128.176.192.18])
	by mx1.freebsd.org (Postfix) with ESMTP id BE95E8FC1E
	for <freebsd-hackers@freebsd.org>; Wed,  9 Sep 2009 17:01:33 +0000 (UTC)
X-IronPort-AV: E=Sophos;i="4.44,359,1249250400"; d="scan'208";a="12815829"
Received: from zivmaildisp2.uni-muenster.de (HELO
	ZIVMAILUSER04.UNI-MUENSTER.DE) ([128.176.188.143])
	by zivm-relay3.uni-muenster.de with ESMTP; 09 Sep 2009 19:01:31 +0200
Received: by ZIVMAILUSER04.UNI-MUENSTER.DE (Postfix, from userid 149459)
	id CE9971B0096; Wed,  9 Sep 2009 19:01:31 +0200 (CEST)
Date: Wed, 09 Sep 2009 19:01:31 +0200 (CEST)
From: Alexander Best <alexbestms@math.uni-muenster.de>
Sender: <a_best01@uni-muenster.de>
Organization: Westfaelische Wilhelms-Universitaet Muenster
To: <freebsd-hackers@FreeBSD.org>
Message-ID: <permail-2009090917013180e26a0b0000570b-a_best01@message-id.uni-muenster.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Buffer overflow detected by REDZONE with linuxulator
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 09 Sep 2009 17:01:34 -0000

hi there,

i've installed emulators/linux_dist-gentoo-stage3 and grabbed a snapshot from
the ltp git repository (http://ltp.sourceforge.net/). as expected some tests
failed because i'm using compat.linux.osrelease: 2.6.16 which is still missing
a few linux syscalls, ipcs and ioctls.

however i also noticed REDZONE reporting buffer overflows. i'm only a user and
not a developer so i don't know if the ltp is to be blamed or if the problem
lies within the linuxulator.

i'm running 9.0-CURRENT (r196879). as i mentioned before i'm using 2.6 linux
kernel emulation. here are the buffer overflow reports:

Sep  9 14:12:42 otaku kernel: REDZONE: Buffer overflow detected. 9 bytes
corrupted after 0xcc28c483 (3 bytes allocated).
Sep  9 14:12:42 otaku kernel: Allocation backtrace:
Sep  9 14:12:42 otaku kernel: #0 0xc0709aaa at redzone_setup+0x3a
Sep  9 14:12:42 otaku kernel: #1 0xc05bc673 at malloc+0x1c3
Sep  9 14:12:42 otaku kernel: #2 0xc07428b8 at linux_getsockaddr+0x48
Sep  9 14:12:42 otaku kernel: #3 0xc0742eb8 at linux_socketcall+0x178
Sep  9 14:12:42 otaku kernel: #4 0xc0772f56 at syscall+0x2a6
Sep  9 14:12:42 otaku kernel: #5 0xc07568b0 at Xint0x80_syscall+0x20
Sep  9 14:12:42 otaku kernel: Free backtrace:
Sep  9 14:12:42 otaku kernel: #0 0xc0709a3a at redzone_check+0x17a
Sep  9 14:12:42 otaku kernel: #1 0xc05bc32d at free+0x5d
Sep  9 14:12:42 otaku kernel: #2 0xc0742ef0 at linux_socketcall+0x1b0
Sep  9 14:12:42 otaku kernel: #3 0xc0772f56 at syscall+0x2a6
Sep  9 14:12:42 otaku kernel: #4 0xc07568b0 at Xint0x80_syscall+0x20
Sep  9 14:20:08 otaku kernel: REDZONE: Buffer overflow detected. 4 bytes
corrupted after 0xcc2538ea (106 bytes allocated).
Sep  9 14:20:08 otaku kernel: Allocation backtrace:
Sep  9 14:20:08 otaku kernel: #0 0xc0709aaa at redzone_setup+0x3a
Sep  9 14:20:08 otaku kernel: #1 0xc05bc673 at malloc+0x1c3
Sep  9 14:20:08 otaku kernel: #2 0xc063a902 at unp_connect+0x162
Sep  9 14:20:08 otaku kernel: #3 0xc063d6c9 at uipc_connect+0x49
Sep  9 14:20:08 otaku kernel: #4 0xc062fde2 at soconnect+0x52
Sep  9 14:20:08 otaku kernel: #5 0xc0638eb6 at kern_connect+0x96
Sep  9 14:20:08 otaku kernel: #6 0xc0742c7b at linux_connect+0x3b
Sep  9 14:20:08 otaku kernel: #7 0xc0742f22 at linux_socketcall+0x1e2
Sep  9 14:20:08 otaku kernel: #8 0xc0772f56 at syscall+0x2a6
Sep  9 14:20:08 otaku kernel: #9 0xc07568b0 at Xint0x80_syscall+0x20
Sep  9 14:20:08 otaku kernel: Free backtrace:
Sep  9 14:20:08 otaku kernel: #0 0xc0709a3a at redzone_check+0x17a
Sep  9 14:20:08 otaku kernel: #1 0xc05bc32d at free+0x5d
Sep  9 14:20:08 otaku kernel: #2 0xc063bfb2 at uipc_detach+0x242
Sep  9 14:20:08 otaku kernel: #3 0xc0632a7e at sofree+0x22e
Sep  9 14:20:08 otaku kernel: #4 0xc0632f26 at soclose+0x386
Sep  9 14:20:08 otaku kernel: #5 0xc0617c49 at soo_close+0x29
Sep  9 14:20:08 otaku kernel: #6 0xc0598b13 at _fdrop+0x43
Sep  9 14:20:08 otaku kernel: #7 0xc059ab90 at closef+0x290
Sep  9 14:20:08 otaku kernel: #8 0xc059af22 at kern_close+0x102
Sep  9 14:20:08 otaku kernel: #9 0xc059b09a at close+0x1a
Sep  9 14:20:08 otaku kernel: #10 0xc0772f56 at syscall+0x2a6
Sep  9 14:20:08 otaku kernel: #11 0xc07568b0 at Xint0x80_syscall+0x20
Sep  9 14:20:09 otaku kernel: REDZONE: Buffer overflow detected. 4 bytes
corrupted after 0xccc653ea (106 bytes allocated).
Sep  9 14:20:09 otaku kernel: Allocation backtrace:
Sep  9 14:20:09 otaku kernel: #0 0xc0709aaa at redzone_setup+0x3a
Sep  9 14:20:09 otaku kernel: #1 0xc05bc673 at malloc+0x1c3
Sep  9 14:20:09 otaku kernel: #2 0xc063a902 at unp_connect+0x162
Sep  9 14:20:09 otaku kernel: #3 0xc063d6c9 at uipc_connect+0x49
Sep  9 14:20:09 otaku kernel: #4 0xc062fde2 at soconnect+0x52
Sep  9 14:20:09 otaku kernel: #5 0xc0638eb6 at kern_connect+0x96
Sep  9 14:20:09 otaku kernel: #6 0xc0742c7b at linux_connect+0x3b
Sep  9 14:20:09 otaku kernel: #7 0xc0742f22 at linux_socketcall+0x1e2
Sep  9 14:20:09 otaku kernel: #8 0xc0772f56 at syscall+0x2a6
Sep  9 14:20:09 otaku kernel: #9 0xc07568b0 at Xint0x80_syscall+0x20
Sep  9 14:20:09 otaku kernel: Free backtrace:
Sep  9 14:20:09 otaku kernel: #0 0xc0709a3a at redzone_check+0x17a
Sep  9 14:20:09 otaku kernel: #1 0xc05bc32d at free+0x5d
Sep  9 14:20:09 otaku kernel: #2 0xc063bfb2 at uipc_detach+0x242
Sep  9 14:20:09 otaku kernel: #3 0xc0632a7e at sofree+0x22e
Sep  9 14:20:09 otaku kernel: #4 0xc0632f26 at soclose+0x386
Sep  9 14:20:09 otaku kernel: #5 0xc0617c49 at soo_close+0x29
Sep  9 14:20:09 otaku kernel: #6 0xc0598b13 at _fdrop+0x43
Sep  9 14:20:09 otaku kernel: #7 0xc059ab90 at closef+0x290
Sep  9 14:20:09 otaku kernel: #8 0xc059af22 at kern_close+0x102
Sep  9 14:20:09 otaku kernel: #9 0xc059b09a at close+0x1a
Sep  9 14:20:09 otaku kernel: #10 0xc0772f56 at syscall+0x2a6
Sep  9 14:20:09 otaku kernel: #11 0xc07568b0 at Xint0x80_syscall+0x20
Sep  9 14:20:09 otaku kernel: REDZONE: Buffer overflow detected. 4 bytes
corrupted after 0xcf45a9ea (106 bytes allocated).
Sep  9 14:20:09 otaku kernel: Allocation backtrace:
Sep  9 14:20:09 otaku kernel: #0 0xc0709aaa at redzone_setup+0x3a
Sep  9 14:20:09 otaku kernel: #1 0xc05bc673 at malloc+0x1c3
Sep  9 14:20:09 otaku kernel: #2 0xc063a902 at unp_connect+0x162
Sep  9 14:20:09 otaku kernel: #3 0xc063d6c9 at uipc_connect+0x49
Sep  9 14:20:09 otaku kernel: #4 0xc062fde2 at soconnect+0x52
Sep  9 14:20:09 otaku kernel: #5 0xc0638eb6 at kern_connect+0x96
Sep  9 14:20:09 otaku kernel: #6 0xc0742c7b at linux_connect+0x3b
Sep  9 14:20:09 otaku kernel: #7 0xc0742f22 at linux_socketcall+0x1e2
Sep  9 14:20:09 otaku kernel: #8 0xc0772f56 at syscall+0x2a6
Sep  9 14:20:09 otaku kernel: #9 0xc07568b0 at Xint0x80_syscall+0x20
Sep  9 14:20:09 otaku kernel: Free backtrace:
Sep  9 14:20:09 otaku kernel: #0 0xc0709a3a at redzone_check+0x17a
Sep  9 14:20:09 otaku kernel: #1 0xc05bc32d at free+0x5d
Sep  9 14:20:09 otaku kernel: #2 0xc063bfb2 at uipc_detach+0x242
Sep  9 14:20:09 otaku kernel: #3 0xc0632a7e at sofree+0x22e
Sep  9 14:20:09 otaku kernel: #4 0xc0632f26 at soclose+0x386
Sep  9 14:20:09 otaku kernel: #5 0xc0617c49 at soo_close+0x29
Sep  9 14:20:09 otaku kernel: #6 0xc0598b13 at _fdrop+0x43
Sep  9 14:20:09 otaku kernel: #7 0xc059ab90 at closef+0x290
Sep  9 14:20:09 otaku kernel: #8 0xc059b55a at fdfree+0x3ea
Sep  9 14:20:09 otaku kernel: #9 0xc05a57b3 at exit1+0x513
Sep  9 14:20:09 otaku kernel: #10 0xc05d17f4 at sigexit+0xa14
Sep  9 14:20:09 otaku kernel: #11 0xc05d19fd at postsig+0x1dd
Sep  9 14:20:09 otaku kernel: #12 0xc0608fca at ast+0x35a
Sep  9 14:20:09 otaku kernel: #13 0xc0757174 at doreti_ast+0x17
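
For readers unfamiliar with this kind of report, the following is a
minimal userland sketch of the guard-byte idea behind such a detector:
pad each allocation with a known fill pattern and count the bytes that
no longer match when the buffer is checked. The names guarded_alloc and
guarded_check, and the GUARD_SIZE and GUARD_FILL values, are assumptions
chosen for illustration; this is not the kernel's REDZONE code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 16       /* hypothetical guard length in bytes */
#define GUARD_FILL 0x42     /* hypothetical fill byte */

/* Allocate `size` usable bytes followed by GUARD_SIZE guard bytes. */
void *
guarded_alloc(size_t size)
{
    unsigned char *p = malloc(size + GUARD_SIZE);

    if (p == NULL)
        return (NULL);
    memset(p + size, GUARD_FILL, GUARD_SIZE);
    return (p);
}

/*
 * Count the guard bytes after the buffer that no longer hold
 * GUARD_FILL.  A non-zero result means something wrote past the
 * `size` bytes the caller asked for.
 */
size_t
guarded_check(const void *buf, size_t size)
{
    const unsigned char *g;
    size_t corrupted = 0;

    assert(buf != NULL);
    g = (const unsigned char *)buf + size;
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (g[i] != GUARD_FILL)
            corrupted++;
    return (corrupted);
}
```

Under this model, a report such as "9 bytes corrupted after ... (3 bytes
allocated)" would be consistent with a caller writing 12 bytes into a
buffer obtained with guarded_alloc(3).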

cheers.
alex

From owner-freebsd-hackers@FreeBSD.ORG  Thu Sep 10 06:55:57 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 77FB1106566B
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 06:55:57 +0000 (UTC)
	(envelope-from guomingyan@gmail.com)
Received: from mail-ew0-f208.google.com (mail-ew0-f208.google.com
	[209.85.219.208])
	by mx1.freebsd.org (Postfix) with ESMTP id 0F45A8FC15
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 06:55:56 +0000 (UTC)
Received: by ewy4 with SMTP id 4so46332ewy.36
	for <freebsd-hackers@freebsd.org>; Wed, 09 Sep 2009 23:55:56 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:date:message-id:subject
	:from:to:cc:content-type;
	bh=/iep6HpKpJcqErenzwyaOadoxp17hjbJkHPLZhI57PU=;
	b=Yd/Am6pxWv8xeQwhEoKnKO0oDX61OakPa4r1QpKBgZ86s0YR7dnIJJZNlqd+1sRVqG
	a04VmQdYw+C/64rOdI9HFqgy55P/P5yArD1mzDEam80iUZtoEa/0OBwl2JvX1sm260sb
	k9vGm8BJAX+UH/L0vxLPlXyVLQegaePYjrQm8=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:date:message-id:subject:from:to:cc:content-type;
	b=LrHvrh9UOcDwCUSlYCO8kP1MeBPlCn+19K9efzhExAgFJRa2mhA5QG5AA++9Vx3AkQ
	TOdz4uyN0hiLoFWsf8wUE3Lfqju1GdAx0EKASbYveGdTJdx5cFSulriA7UDvgMsbfGUE
	Tx4pY40mfsbnO8v8/dSq+dQTFeGK/TlHvtu+I=
MIME-Version: 1.0
Received: by 10.210.9.5 with SMTP id 5mr432788ebi.78.1252564013540; Wed, 09 
	Sep 2009 23:26:53 -0700 (PDT)
Date: Wed, 9 Sep 2009 23:26:53 -0700
Message-ID: <1fa17f810909092326l1271df94t1dea5ac9d5deba1b@mail.gmail.com>
From: MingyanGuo <guomingyan@gmail.com>
To: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: LI Xin <delphij@delphij.net>
Subject: How to prevent other CPU from accessing a set of pages before
	calling pmap_remove_all function
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 10 Sep 2009 06:55:57 -0000

Hi all,

I find that the function pmap_remove_all for amd64 has a time window
between reading & clearing the PTE flags (accessed flag and dirty flag) and
invalidating its TLB entry on other CPUs. After some discussion with Li
Xin (cc'ed), I think all the processes that are using the PTE being removed
should be blocked before calling pmap_remove_all; otherwise another CPU may
dirty the page without setting the dirty flag before the TLB entry is
flushed. But I cannot find how to block them before calling the function.
I read the function vm_pageout_scan in vm/vm_pageout.c but cannot find the
exact method it uses. Or have I just misunderstood the semantics of
pmap_remove_all?

Thanks in advance.

Regards,
MingyanGuo

From owner-freebsd-hackers@FreeBSD.ORG  Thu Sep 10 06:57:25 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BA420106568B
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 06:57:25 +0000 (UTC)
	(envelope-from guomingyan@gmail.com)
Received: from mail-ew0-f208.google.com (mail-ew0-f208.google.com
	[209.85.219.208])
	by mx1.freebsd.org (Postfix) with ESMTP id 4735F8FC0A
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 06:57:25 +0000 (UTC)
Received: by ewy4 with SMTP id 4so47106ewy.36
	for <freebsd-hackers@freebsd.org>; Wed, 09 Sep 2009 23:57:24 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:in-reply-to:references
	:date:message-id:subject:from:to:cc:content-type;
	bh=Db1QftQ0KNHlMqva0vqufJQ8othqJc8dYGIiu2VPh8E=;
	b=VxbpdtfzHib8DGridxuqW2L/r0KecgGW5iolzuDcQ/eNaPxVJB8CvSXoJKIWphKK77
	RoBuVpVR2JyqeQPyfZEolvQxQ1RclYYsQg4kX793DlrBRctPYFyBrVch4NWGFjLtKVIu
	awV+IuBkxaMMscHnzrGn+FtdNNHJnkUhGv3BU=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=ssmvhIECriRTZQTmIJ2aXm4f/tOJM8sG6NN1DrTXSqqox9NRspj5yb+o+yi0zWd32j
	WCLQl7pl6Dy7vJmPnlBIY6TusOoP1b23TdhSNu35EewgRxA5/2siAGfOlDqee0fxe5uL
	GaKiH2p9fY77luVelIfWDw0iEHDoPDWwd7LxM=
MIME-Version: 1.0
Received: by 10.211.172.8 with SMTP id z8mr1256202ebo.92.1252565844410; Wed, 
	09 Sep 2009 23:57:24 -0700 (PDT)
In-Reply-To: <1fa17f810909092326l1271df94t1dea5ac9d5deba1b@mail.gmail.com>
References: <1fa17f810909092326l1271df94t1dea5ac9d5deba1b@mail.gmail.com>
Date: Wed, 9 Sep 2009 23:57:24 -0700
Message-ID: <1fa17f810909092357x8625182q970f8fb6aa76e7a9@mail.gmail.com>
From: MingyanGuo <guomingyan@gmail.com>
To: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: LI Xin <delphij@delphij.net>
Subject: Re: How to prevent other CPU from accessing a set of pages before 
	calling pmap_remove_all function
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 10 Sep 2009 06:57:25 -0000

On Wed, Sep 9, 2009 at 11:26 PM, MingyanGuo <guomingyan@gmail.com> wrote:

> Hi all,
>
> I find that function pmap_remove_all for arch amd64 works with a time
> window between reading & clearing the PTE flags(access flag and dirty flag)
> and invalidating its TLB entry on other CPU. After some discussion with Li
> Xin(cced), I think all the processes that are using the PTE being removed
> should be blocked before calling pmap_remove_all, or other CPU may dirty the
> page but does not set the dirty flag before the TLB entry is flushed. But I
> can not find how to block them to call the function. I read the function
> vm_pageout_scan in file vm/vm_pageout.c but can not find the exact method it
> used.  Or I just misunderstood the semantics of function pmap_remove_all ?
>
> Thanks in advance.
>
> Regards,
> MingyanGuo
>

Sorry for the noise. I understand the logic now. There is no time window
problem between reading & clearing the PTE and invalidating it on other
CPUs, even if another CPU is using the PTE. I misunderstood the logic.

Regards,
MingyanGuo

From owner-freebsd-hackers@FreeBSD.ORG  Thu Sep 10 12:08:51 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9F43D106566C
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 12:08:51 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (skuns.zoral.com.ua [91.193.166.194])
	by mx1.freebsd.org (Postfix) with ESMTP id 3839B8FC14
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 12:08:50 +0000 (UTC)
Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua
	[10.1.1.148])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id n8AC8Cv8004405
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 10 Sep 2009 15:08:12 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.3/8.14.3) with ESMTP id
	n8AC8CYQ077994; Thu, 10 Sep 2009 15:08:12 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.3/8.14.3/Submit) id n8AC8BU6077993; 
	Thu, 10 Sep 2009 15:08:11 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Thu, 10 Sep 2009 15:08:11 +0300
From: Kostik Belousov <kostikbel@gmail.com>
To: MingyanGuo <guomingyan@gmail.com>
Message-ID: <20090910120811.GH47688@deviant.kiev.zoral.com.ua>
References: <1fa17f810909092326l1271df94t1dea5ac9d5deba1b@mail.gmail.com>
	<1fa17f810909092357x8625182q970f8fb6aa76e7a9@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="spzFwWfYKRzjK1rH"
Content-Disposition: inline
In-Reply-To: <1fa17f810909092357x8625182q970f8fb6aa76e7a9@mail.gmail.com>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: freebsd-hackers@freebsd.org, LI Xin <delphij@delphij.net>
Subject: Re: How to prevent other CPU from accessing a set of pages before
	calling pmap_remove_all function
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 10 Sep 2009 12:08:51 -0000


--spzFwWfYKRzjK1rH
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Sep 09, 2009 at 11:57:24PM -0700, MingyanGuo wrote:
> On Wed, Sep 9, 2009 at 11:26 PM, MingyanGuo <guomingyan@gmail.com> wrote:
>=20
> > Hi all,
> >
> > I find that function pmap_remove_all for arch amd64 works with a time
> > window between reading & clearing the PTE flags(access flag and dirty f=
lag)
> > and invalidating its TLB entry on other CPU. After some discussion with=
 Li
> > Xin(cced), I think all the processes that are using the PTE being remov=
ed
> > should be blocked before calling pmap_remove_all, or other CPU may dirt=
y the
> > page but does not set the dirty flag before the TLB entry is flushed. B=
ut I
> > can not find how to block them to call the function. I read the function
> > vm_pageout_scan in file vm/vm_pageout.c but can not find the exact meth=
od it
> > used.  Or I just misunderstood the semantics of function pmap_remove_al=
l ?
> >
> > Thanks in advance.
> >
> > Regards,
> > MingyanGuo
> >
>=20
> Sorry for the noise. I understand the logic now. There is no time window
> problem between reading & clearing the PTE and invalidating it on other C=
PU,
> even if other CPU is using the PTE.  I misunderstood the logic.

Hmm. What would happen in the following scenario?

Assume that the page m is mapped by the vm map active on CPU1, and that
CPU1 has a cached TLB entry for some writable mapping of this page,
but neither the TLB entry nor the PTE has the dirty bit set.

Then, assume that the following sequence of events occurs:

CPU1:						CPU2:
					call pmap_remove_all(m)
					clear pte
write to the address mapped
    by m [*]
					invalidate the TLB,
					    possibly making IPI to CPU1

I assume that at the point marked [*], we can
- either lose the dirty bit, as CPU1 (atomically) sets the dirty bit
  in the cleared pte.
  Besides not properly tracking the modification status of the page,
  it could also cause the page table page to be modified, which would
  create a non-zero page with the PG_ZERO flag set.
- or CPU1 re-reads the PTE when setting the dirty bit, and generates
  #pf since the valid bit in the PTE is zero.

Intel documentation mentions that dirty or accessed bit updates are done
with a locked cycle, which definitely means that the PTE is re-read, but
I cannot find whether the valid bit is rechecked.
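
To make the two alternatives concrete, here is a small software model
of them using C11 atomics. It is only a sketch under stated assumptions:
the bit values and the names model_pte_load_clear,
model_set_dirty_blind and model_set_dirty_checked are invented for
illustration, and the model says nothing about which behaviour real
hardware implements.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout for the model; not taken from any header. */
#define MODEL_PTE_V  UINT64_C(0x001)    /* valid */
#define MODEL_PTE_RW UINT64_C(0x002)    /* writable */
#define MODEL_PTE_M  UINT64_C(0x040)    /* dirty */

/* CPU2 side: atomically fetch the old PTE and leave it cleared. */
uint64_t
model_pte_load_clear(_Atomic uint64_t *pte)
{
    assert(pte != NULL);
    return (atomic_exchange(pte, 0));
}

/*
 * First alternative: CPU1 ORs the dirty bit into whatever the PTE
 * now holds.  After a clear, this leaves a stray dirty bit in an
 * otherwise zero entry, and the remover never saw the page as dirty.
 */
void
model_set_dirty_blind(_Atomic uint64_t *pte)
{
    assert(pte != NULL);
    atomic_fetch_or(pte, MODEL_PTE_M);
}

/*
 * Second alternative: CPU1 re-reads the PTE and only sets the dirty
 * bit if the entry is still valid.  Returns false where the model
 * would raise a page fault instead.
 */
bool
model_set_dirty_checked(_Atomic uint64_t *pte)
{
    uint64_t old;

    assert(pte != NULL);
    old = atomic_load(pte);
    do {
        if ((old & MODEL_PTE_V) == 0)
            return (false);     /* would be a #pf */
    } while (!atomic_compare_exchange_weak(pte, &old,
        old | MODEL_PTE_M));
    return (true);
}
```

In the model, running model_pte_load_clear first and then
model_set_dirty_blind leaves the entry equal to MODEL_PTE_M, which
corresponds to the modified page table page described above, while
model_set_dirty_checked returns false and leaves the entry at zero.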

--spzFwWfYKRzjK1rH
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (FreeBSD)

iEYEARECAAYFAkqo7CoACgkQC3+MBN1Mb4gRGgCgscvKZFeh4uPhTADH2tERZtVh
Y98AnR/9HAbNm6DqTmKYv+LtC/FaJGMW
=gKPs
-----END PGP SIGNATURE-----

--spzFwWfYKRzjK1rH--

From owner-freebsd-hackers@FreeBSD.ORG  Thu Sep 10 16:46:46 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BBC7E106566C
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 16:46:46 +0000 (UTC)
	(envelope-from linda.messerschmidt@gmail.com)
Received: from mail-qy0-f200.google.com (mail-qy0-f200.google.com
	[209.85.221.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 72E278FC0A
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 16:46:46 +0000 (UTC)
Received: by qyk38 with SMTP id 38so243758qyk.27
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 09:46:45 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:in-reply-to:references
	:date:message-id:subject:from:to:content-type
	:content-transfer-encoding;
	bh=qNNycoBbxJH+ZU6thx3OtaGneU72NfUzfypTEP61b7M=;
	b=oRtvXL05FO1Dr8oUhpD7S6gq/x8zP2zwXR9zSSxKIfyDM8+5K+ka0XdmLodX68r2wc
	x+ieA+OS9ZP6j2vT3ksLghif121SDK6JALfyQx1QLlPRbl9L6rvrGUoGFsfmWSpTwHB4
	pc9V3kpjpV6xVjvCSB1hp45yQugqSQEJdTKd0=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:content-type:content-transfer-encoding;
	b=a5bl9pVhf/TFuUu8EynmSRq//JYFyjkMh23gjhd4zfpx2zjrjx+ZAfRXbA0qs59jjq
	RSrGrMHA5MgyYSjLSKIW0j/ME8uI3etXF+1NRn4wzZZpwgP55UzqkngafeLgJwEdEFU7
	rGy+4SLKbDgtSLo+pQbLQ/JTbAW6VZeJMeuS8=
MIME-Version: 1.0
Received: by 10.229.106.83 with SMTP id w19mr973707qco.72.1252601205724; Thu, 
	10 Sep 2009 09:46:45 -0700 (PDT)
In-Reply-To: <200908271729.55213.jhb@freebsd.org>
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>
	<200908261642.59419.jhb@freebsd.org>
	<237c27100908271237y66219ef4o4b1b8a6e13ab2f6c@mail.gmail.com>
	<200908271729.55213.jhb@freebsd.org>
Date: Thu, 10 Sep 2009 12:46:45 -0400
Message-ID: <237c27100909100946q3d186af3h66757e0efff307a5@mail.gmail.com>
From: Linda Messerschmidt <linda.messerschmidt@gmail.com>
To: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 10 Sep 2009 16:46:46 -0000

On Thu, Aug 27, 2009 at 5:29 PM, John Baldwin<jhb@freebsd.org> wrote:
> Ah, cool, what you want to do is use KTR with KTR_SCHED and then use
> schedgraph.py (src/tools/sched) to get a visual picture of what the box d=
oes
> during a hang. =A0The timestamps in KTR are TSC cycle counts rather than =
an
> actual wall time which is why they look off. =A0If you have a lot of even=
ts you
> may want to use a larger KTR_ENTRIES size btw (I use 1048576 (2 ^ 20) her=
e at
> work to get large (multiple-second) traces).

I'm still working on this.

I enabled KTR and set it up to log KTR_SCHED events.  Then, I wrote a
script to exercise the HTTP server that actually ran on that machine,
and set it to issue "sysctl debug.ktr.cpumask=3D0" and abort if a
request took over 2 seconds.  28,613 requests later, it tripped over
one that took 2007ms.

(Just a refresher: this is a static file being served by an Apache
process that has nothing else to do but serve this file on a
relatively unloaded machine.)

I don't have access to any machines that can run X, so I did the best
I could to examine it from the shell.

First, this machine has two CPUs, so I split up the KTR results on a
per-CPU basis so I could look at each individually.

With KTR_ENTRIES set to 1048576, I got about 53 seconds of data with
just KTR_SCHED enabled.  Since I was interested in a 2.007 second
period of time right at the end, I hacked it down to the last 3.795
seconds.

In the 3.795 seconds captured in the trace period on CPU 0 that
includes the entire 2.007 second stall, CPU 0 was idle for 3.175
seconds.

In the same period, CPU 1 was idle for 3.2589 seconds.

I did the best I could to manually page through all the scheduling
activity on both CPUs during that 3.7 second time, and I didn't see
anything really disruptive.  Mainly idle, with jumps into the clock
and ethernet kernel threads, as well as httpd.

If I understand that correctly and have done everything right, that
means that whatever happened, it wasn't related to CPU contention or
scheduling issues of any sort.
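
The per-CPU idle figures above can be computed mechanically. The sketch
below shows one way to do it, assuming the trace has already been
reduced to a simplified record of (timestamp, CPU, idle-or-not); the
struct sched_event layout and the function idle_seconds are
hypothetical and are not the KTR record format. Because the timestamps
are cycle counts, the TSC frequency in Hz has to be supplied by the
caller (on FreeBSD the machdep.tsc_freq sysctl is one possible source).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified scheduler trace record. */
struct sched_event {
    uint64_t tsc;   /* timestamp in TSC cycles */
    int      cpu;   /* CPU the event was logged on */
    int      idle;  /* non-zero if the CPU switched to its idle thread */
};

/*
 * Sum the time `cpu` spent idle across ev[0..n-1], which must be
 * sorted by ascending tsc.  The state set by each event for that CPU
 * is assumed to last until the same CPU's next event.  `tsc_freq`
 * is the TSC frequency in Hz and must be non-zero.
 */
double
idle_seconds(const struct sched_event *ev, size_t n, int cpu,
    uint64_t tsc_freq)
{
    uint64_t idle_cycles = 0, last_tsc = 0;
    int have_last = 0, last_idle = 0;

    assert(tsc_freq != 0);
    for (size_t i = 0; i < n; i++) {
        if (ev[i].cpu != cpu)
            continue;
        /* Charge the interval since this CPU's previous event. */
        if (have_last && last_idle)
            idle_cycles += ev[i].tsc - last_tsc;
        last_tsc = ev[i].tsc;
        last_idle = ev[i].idle;
        have_last = 1;
    }
    return ((double)idle_cycles / (double)tsc_freq);
}
```

Time after a CPU's last recorded event is not counted, so the result is
a lower bound when the trace ends while that CPU is idle.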

So, a couple of follow-up questions:

First, what else should I be looking at?  I built the kernel with kind
of a lot of KTR flags
(KTR_LOCK|KTR_SCHED|KTR_PROC|KTR_INTR|KTR_CALLOUT|KTR_UMA|KTR_SYSC)
but enabling them all produces enough output that even 1048576 entries
doesn't always go back two seconds; the volume of data is all but
unmanageable.

Second, is there any way to correlate the process address reported by
the KTR scheduler entries back to a PID?  It'd be nice to be able to
view the scheduler activity just for the process I'm interested in,
but I can't figure out which one it is. :)

Thanks!

From owner-freebsd-hackers@FreeBSD.ORG  Thu Sep 10 17:21:02 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BA2BF1065670
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 17:21:02 +0000 (UTC)
	(envelope-from guomingyan@gmail.com)
Received: from mail-pz0-f235.google.com (mail-pz0-f235.google.com
	[209.85.222.235])
	by mx1.freebsd.org (Postfix) with ESMTP id 99E648FC12
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 17:21:02 +0000 (UTC)
Received: by pzk24 with SMTP id 24so2571pzk.3
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 10:21:02 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20090910120811.GH47688@deviant.kiev.zoral.com.ua>
Received: by 10.114.54.8 with SMTP id c8mr909442waa.1.1252603261728; Thu, 10 
	Sep 2009 10:21:01 -0700 (PDT)
Message-ID: <001636b149b575c79204733c6c1c@google.com>
Date: Thu, 10 Sep 2009 17:21:01 +0000
From: guomingyan@gmail.com
To: Kostik Belousov <kostikbel@gmail.com>, MingyanGuo <guomingyan@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed; delsp=yes
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-hackers@freebsd.org, LI Xin <delphij@delphij.net>
Subject: Re: Re: How to prevent other CPU from accessing a set of pages
	before calling pmap_remove_all functi
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 10 Sep 2009 17:21:02 -0000

On Sep 10, 2009 5:08am, Kostik Belousov <kostikbel@gmail.com> wrote:
> On Wed, Sep 09, 2009 at 11:57:24PM -0700, MingyanGuo wrote:
> > On Wed, Sep 9, 2009 at 11:26 PM, MingyanGuo <guomingyan@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I find that function pmap_remove_all for arch amd64 works with a time
> > > window between reading & clearing the PTE flags(access flag and dirty flag)
> > > and invalidating its TLB entry on other CPU. After some discussion with Li
> > > Xin(cced), I think all the processes that are using the PTE being removed
> > > should be blocked before calling pmap_remove_all, or other CPU may dirty the
> > > page but does not set the dirty flag before the TLB entry is flushed. But I
> > > can not find how to block them to call the function. I read the function
> > > vm_pageout_scan in file vm/vm_pageout.c but can not find the exact method it
> > > used. Or I just misunderstood the semantics of function pmap_remove_all ?
> > >
> > > Thanks in advance.
> > >
> > > Regards,
> > > MingyanGuo
> > >
> >
> > Sorry for the noise. I understand the logic now. There is no time window
> > problem between reading & clearing the PTE and invalidating it on other CPU,
> > even if other CPU is using the PTE. I misunderstood the logic.
>
> Hmm. What would happen for the following scenario.
>
> Assume that the page m is mapped by vm map active on CPU1, and that
> CPU1 has cached TLB entry for some writable mapping of this page,
> but neither TLB entry not PTE has dirty bit set.
>
> Then, assume that the following sequence of events occur:
>
> CPU1:						CPU2:
> 					call pmap_remove_all(m)
> 					clear pte
> write to the address mapped
>     by m [*]
> 					invalidate the TLB,
> 					    possibly making IPI to CPU1
>
> I assume that at the point marked [*], we can
> - either loose the dirty bit, while CPU1 (atomically) sets the dirty bit
>   in the cleared pte.
>   Besides not properly tracking the modification status of the page,
>   it could also cause the page table page to be modified, that would
>   create non-zero page with PG_ZERO flag set.
> - or CPU1 re-reads the PTE entry when setting the dirty bit, and generates
>   #pf since valid bit in PTE is zero.
>
> Intel documentation mentions that dirty or accessed bits updates are done
> with locked cycle, that definitely means that PTE is re-read, but I cannot
> find whether valid bit is rechecked.


I am not an architecture expert, but from a programmer's view,
I *think* using the 'in memory' PTE structure for the first write to
that PTE is more reasonable. To set the dirty bit, a CPU has to access
memory with locked cycles, so using the 'in memory' PTE structure should
add little performance burden and be friendlier to software. However, this
is just my guess; I am reading the manuals to see whether they describe it.
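
To picture the two outcomes listed in the quoted scenario, here is a toy
Python model. It is only a sketch: the bit values follow the x86 PTE layout
(valid = bit 0, writable = bit 1, dirty = bit 6), but the function names and
the two "hardware" policies are hypothetical, and it says nothing about what
a real CPU actually does at [*].

```python
# Toy model of the interleaving in the quoted scenario; not real hardware.
PG_V, PG_RW, PG_M = 0x001, 0x002, 0x040   # valid, writable, dirty (modified)


class PageFault(Exception):
    """Raised when the modelled CPU finds the in-memory PTE invalid."""


def write_through_stale_tlb(pte_in_memory, recheck_valid):
    """CPU1 writes via a cached, writable, not-yet-dirty TLB entry.

    Returns the in-memory PTE value after the dirty-bit update.
    """
    if recheck_valid and not (pte_in_memory & PG_V):
        raise PageFault("valid bit is clear in the in-memory PTE")
    # Without the recheck, the dirty bit is OR-ed into whatever is in memory.
    return pte_in_memory | PG_M


def run_scenario(recheck_valid):
    """Replay the CPU1/CPU2 sequence under one assumed hardware policy."""
    pte = 0x1000 | PG_V | PG_RW    # clean, writable mapping of page m
    pte = 0                        # CPU2: pmap_remove_all(m) clears the PTE
    try:
        pte = write_through_stale_tlb(pte, recheck_valid)   # CPU1: [*]
        outcome = "dirty bit set in cleared PTE"
    except PageFault:
        outcome = "page fault on CPU1"
    # CPU2 would invalidate the TLB here (possibly via an IPI to CPU1).
    return pte, outcome
```

With recheck_valid=False the cleared PTE ends up non-zero, which is the
first case above; with recheck_valid=True the write faults and the PTE stays
zero, which is the second case.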

Regards,
MingyanGuo

From owner-freebsd-hackers@FreeBSD.ORG  Thu Sep 10 17:30:33 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5725F1065676
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 17:30:33 +0000 (UTC)
	(envelope-from rysto32@gmail.com)
Received: from ey-out-2122.google.com (ey-out-2122.google.com [74.125.78.26])
	by mx1.freebsd.org (Postfix) with ESMTP id DD1E08FC15
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 17:30:32 +0000 (UTC)
Received: by ey-out-2122.google.com with SMTP id 4so109081eyf.9
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 10:30:32 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:in-reply-to:references
	:date:message-id:subject:from:to:cc:content-type;
	bh=qfC62rcGOhakT2L1Uk3XZaRRqpTlPu9kZfRokq4KVbc=;
	b=JVC1lE1YiDAQwVXuqACyAmSj+FdFtlCG8bl91DBXA1xdQkN5Dy6IrlIMNbj4RoKhLv
	4UV+txk4oPRlKjt8ItD1DGo/DApMdEAU7We6/gbA9E1AuV6rV3pBSbz5aco0ShdRWDNd
	hv369YrYclMGtG5Z5HJq/xxRt3lUk0lmm0+sU=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=JKDXuAJ+wr7TflaEA5X4P9K/2ZIp50XUJXIL2Mp7JEISJMQBDbSP5tiJbqsIkNCCP8
	15ktA+EuZAodzb91p0nsonacfWRNxTl9ATyP4VOBZoVm8/2MrUhcVBuftaSoLQDzDBZv
	RvxGPHm+g6SX9PZoqnA0kEi11odXxYpAhFGxg=
MIME-Version: 1.0
Received: by 10.210.101.1 with SMTP id y1mr2004016ebb.67.1252601836555; Thu, 
	10 Sep 2009 09:57:16 -0700 (PDT)
In-Reply-To: <237c27100909100946q3d186af3h66757e0efff307a5@mail.gmail.com>
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>
	<200908261642.59419.jhb@freebsd.org>
	<237c27100908271237y66219ef4o4b1b8a6e13ab2f6c@mail.gmail.com>
	<200908271729.55213.jhb@freebsd.org>
	<237c27100909100946q3d186af3h66757e0efff307a5@mail.gmail.com>
Date: Thu, 10 Sep 2009 12:57:16 -0400
Message-ID: <bc2d970909100957y6d7fd707g9f3184165f8cb766@mail.gmail.com>
From: Ryan Stone <rysto32@gmail.com>
To: Linda Messerschmidt <linda.messerschmidt@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-hackers@freebsd.org
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 10 Sep 2009 17:30:33 -0000

You should be able to run schedgraph.py on a windows machine with python
installed.  It works just fine for me on XP.

From owner-freebsd-hackers@FreeBSD.ORG  Thu Sep 10 18:36:53 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4D1311065676
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 18:36:53 +0000 (UTC)
	(envelope-from linda.messerschmidt@gmail.com)
Received: from mail-qy0-f195.google.com (mail-qy0-f195.google.com
	[209.85.221.195])
	by mx1.freebsd.org (Postfix) with ESMTP id 02A4B8FC13
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 18:36:52 +0000 (UTC)
Received: by qyk33 with SMTP id 33so4139qyk.14
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 11:36:52 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:in-reply-to:references
	:date:message-id:subject:from:to:content-type
	:content-transfer-encoding;
	bh=0y6GXBQfPRSM1p1wkRQwAVsAH8YmHNuWOwsX3BEG/WI=;
	b=ePreWNDHWFINBiLRHPtluE9R/oqDMRYHYqkUOLhUvbbaLNsyrfqVZh+1p5icGNiHsr
	5xk0XrusKD1r3Ap5zDvna4ex/mYa/7CyvKJRNQ1bDtEPlSbfGhYXqQ44nTYP2B5ie7Ot
	ftQzIb3r5GskmK12tjibjCGJ09+ULEJ3cHV1A=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:content-type:content-transfer-encoding;
	b=iI0nOwUwBydW70PEh046n3uUiE/5zp9JJSEfBAjVcZig2CGGkf4gftp7ylEiGMO0W5
	1YfWwRfQYGDyYLzPc0kFj/mKdBJiwkRVSkFBIpKTct9U8VzSAy/giYespRTEBI/wC6Q9
	QQyVzd3qs8DdUtDZ+T42WH7HywMW/xE/4F0qw=
MIME-Version: 1.0
Received: by 10.229.39.69 with SMTP id f5mr1039615qce.107.1252607363953; Thu, 
	10 Sep 2009 11:29:23 -0700 (PDT)
In-Reply-To: <bc2d970909100957y6d7fd707g9f3184165f8cb766@mail.gmail.com>
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>
	<200908261642.59419.jhb@freebsd.org>
	<237c27100908271237y66219ef4o4b1b8a6e13ab2f6c@mail.gmail.com>
	<200908271729.55213.jhb@freebsd.org>
	<237c27100909100946q3d186af3h66757e0efff307a5@mail.gmail.com>
	<bc2d970909100957y6d7fd707g9f3184165f8cb766@mail.gmail.com>
Date: Thu, 10 Sep 2009 14:29:23 -0400
Message-ID: <237c27100909101129y28771061o86db3c6a50a640eb@mail.gmail.com>
From: Linda Messerschmidt <linda.messerschmidt@gmail.com>
To: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 10 Sep 2009 18:36:53 -0000

On Thu, Sep 10, 2009 at 12:57 PM, Ryan Stone<rysto32@gmail.com> wrote:
> You should be able to run schedgraph.py on a windows machine with python
> installed.  It works just fine for me on XP.

Don't have any of those either, but I *did* get it working on a Mac
right out of the box.  Should have thought of that sooner. :)

The output looks pretty straightforward, but there are a couple of
things I find odd.

First, there's a point right around what I estimate to be the problem
time where schedgraph.py indicates gmond (the Ganglia monitor) was
running uninterrupted for a period of exactly 1 second.  However, it
also indicates that both CPUs' idle tasks were *also* running almost
continuously during that time (subject to clock/net interrupts), and
that the run queue on both CPUs was zero for most of that second
while gmond was allegedly running.

Second, the interval I graphed was about nine seconds.  During that
time, the PHP command line script made a whole lot of requests: it
usleeps 50ms between requests, and non-broken requests average about
1.4ms.  So even with the stalled request chopping 2 seconds off the
end, there should be somewhere in the neighborhood of 130 requests
during the graphed period.  But that php process doesn't appear in the
schedgraph output at all.

So that doesn't make a whole lot of sense to me.
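
For reference, the "about 130" figure can be checked with a few lines of
Python. This is only a rough sketch using the numbers above; the function
and variable names are made up for the illustration.

```python
# Back-of-the-envelope estimate of how many requests should show up.
def expected_requests(interval_s, stall_s, sleep_s, request_s):
    """Requests that fit in the interval once the stalled time is removed."""
    return (interval_s - stall_s) / (sleep_s + request_s)


estimate = expected_requests(
    interval_s=9.0,     # graphed interval, about nine seconds
    stall_s=2.0,        # time lost to the stalled request
    sleep_s=0.050,      # usleep between requests
    request_s=0.0014,   # average non-broken request
)
print(round(estimate))  # roughly 136
```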

I'll try to get another trace and see if that happens the same way again.

From owner-freebsd-hackers@FreeBSD.ORG  Thu Sep 10 18:46:47 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7B6B3106568F
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 18:46:47 +0000 (UTC)
	(envelope-from julian@elischer.org)
Received: from outN.internet-mail-service.net (outn.internet-mail-service.net
	[216.240.47.237])
	by mx1.freebsd.org (Postfix) with ESMTP id 5C9A58FC1C
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 18:46:47 +0000 (UTC)
Received: from idiom.com (mx0.idiom.com [216.240.32.160])
	by out.internet-mail-service.net (Postfix) with ESMTP id 34EDEB9888;
	Thu, 10 Sep 2009 11:46:47 -0700 (PDT)
X-Client-Authorized: MaGic Cook1e
X-Client-Authorized: MaGic Cook1e
Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38])
	by idiom.com (Postfix) with ESMTP id B3EFB2D6021;
	Thu, 10 Sep 2009 11:46:46 -0700 (PDT)
Message-ID: <4AA94995.6030700@elischer.org>
Date: Thu, 10 Sep 2009 11:46:45 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Thunderbird 2.0.0.23 (Macintosh/20090812)
MIME-Version: 1.0
To: Linda Messerschmidt <linda.messerschmidt@gmail.com>
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>	<200908261642.59419.jhb@freebsd.org>	<237c27100908271237y66219ef4o4b1b8a6e13ab2f6c@mail.gmail.com>	<200908271729.55213.jhb@freebsd.org>	<237c27100909100946q3d186af3h66757e0efff307a5@mail.gmail.com>	<bc2d970909100957y6d7fd707g9f3184165f8cb766@mail.gmail.com>
	<237c27100909101129y28771061o86db3c6a50a640eb@mail.gmail.com>
In-Reply-To: <237c27100909101129y28771061o86db3c6a50a640eb@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 10 Sep 2009 18:46:47 -0000

Linda Messerschmidt wrote:
> On Thu, Sep 10, 2009 at 12:57 PM, Ryan Stone<rysto32@gmail.com> wrote:
>> You should be able to run schedgraph.py on a windows machine with python
>> installed.  It works just fine for me on XP.
> 
> Don't have any of those either, but I *did* get it working on a Mac
> right out of the box.  Should have thought of that sooner. :)
> 
> The output looks pretty straightforward, but there are a couple of
> things I find odd.
> 
> First, there's a point right around what I estimate to be the problem
> time where schedgraph.py indicates gmond (the Ganglia monitor) was
> running uninterrupted for a period of exactly 1 second.  However, it
> also indicates that both CPUs' idle tasks were *also* running almost
> continuously during that time (subject to clock/net interrupts), and
> that the run queue on both CPUs was zero for most of that second
> while gmond was allegedly running.

I've noticed that schedgraph tends to show the idle threads slightly
skewed one way or the other.  I think there is a cumulative rounding
error in the way they are drawn due to the fact that they are run so
often.  Check the raw data and I think you will find that you just
need to imagine the idle threads slightly to the left or right a bit.
The longer the trace and the further to the right you are looking
the more "out" the idle threads appear to be.

I saw this on both Linux and Mac python implementations.

> 
> Second, the interval I graphed was about nine seconds.  During that
> time, the PHP command line script made a whole lot of requests: it
> usleeps 50ms between requests, and non-broken requests average about
> 1.4ms.  So even with the stalled request chopping 2 seconds off the
> end, there should be somewhere in the neighborhood of 130 requests
> during the graphed period.  But that php process doesn't appear in the
> schedgraph output at all.
> 
> So that doesn't make a whole lot of sense to me.
> 
> I'll try to get another trace and see if that happens the same way again.


From owner-freebsd-hackers@FreeBSD.ORG  Thu Sep 10 19:12:37 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7AD3E106568D
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 19:12:37 +0000 (UTC)
	(envelope-from linda.messerschmidt@gmail.com)
Received: from qw-out-2122.google.com (qw-out-2122.google.com [74.125.92.26])
	by mx1.freebsd.org (Postfix) with ESMTP id 232A98FC16
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 19:12:36 +0000 (UTC)
Received: by qw-out-2122.google.com with SMTP id 3so135323qwe.7
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 12:12:36 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:in-reply-to:references
	:date:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=sVbC4Lh/vy1vEFmT5CDa2TzOaYb0hOzpY+MQ/76B9/o=;
	b=bXy5eGDEziznh0qxMQHOITc7/6oY067mEczlDrGNHzpXI5H5/8uzVobA9IQVlsu8PF
	yTpbqEhyup07vyGbD4nDiWduOJVKPWBWYvjIHvtP2YM3IN316ZRJ4vrvGgslfyTiqpR9
	mRkLfv676H3PDECL5xxoRH1KaehQ8BAyVA+uE=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	b=hAbRRNHIbN5FNsWXrFkokNuriZ2GDaghcxj09vCcujsynDtzirliPNJKhE6WaL3CyY
	isbRNo7xaU+egxKj1xUc34Ko+jgWYvTRxbYu6C5aDVk/rnNGqz1Jg6MD3yeEK4FW9Kwx
	SsXgSSGTdd1yTDkaCf0jPIbM7V7TgAQyW5KvM=
MIME-Version: 1.0
Received: by 10.229.118.135 with SMTP id v7mr1052007qcq.62.1252609655432; Thu, 
	10 Sep 2009 12:07:35 -0700 (PDT)
In-Reply-To: <4AA94995.6030700@elischer.org>
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>
	<200908261642.59419.jhb@freebsd.org>
	<237c27100908271237y66219ef4o4b1b8a6e13ab2f6c@mail.gmail.com>
	<200908271729.55213.jhb@freebsd.org>
	<237c27100909100946q3d186af3h66757e0efff307a5@mail.gmail.com>
	<bc2d970909100957y6d7fd707g9f3184165f8cb766@mail.gmail.com>
	<237c27100909101129y28771061o86db3c6a50a640eb@mail.gmail.com>
	<4AA94995.6030700@elischer.org>
Date: Thu, 10 Sep 2009 15:07:35 -0400
Message-ID: <237c27100909101207q73f0c513r60dd5ab83fdfd083@mail.gmail.com>
From: Linda Messerschmidt <linda.messerschmidt@gmail.com>
To: Julian Elischer <julian@elischer.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-hackers@freebsd.org
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 10 Sep 2009 19:12:37 -0000

On Thu, Sep 10, 2009 at 2:46 PM, Julian Elischer<julian@elischer.org> wrote:
> I've noticed that schedgraph tends to show the idle threads slightly
> skewed one way or the other.  I think there is a cumulative rounding
> error in the way they are drawn due to the fact that they are run so
> often.  Check the raw data and I think you will find that you just
> need to imagine the idle threads slightly to the left or right a bit.

No, there's no period anywhere in the trace where either idle thread
didn't run for an entire second.

I'm pretty sure schedgraph is throwing in some nonsense results.  I
did capture a second, larger, dataset after a 2.1s stall, and
schedgraph includes an httpd process that supposedly spent 58 seconds
on the run queue.  I don't know if it's a dropped record or a parsing
error or what.

I do think on this second graph I can kind of see the *end* of the
stall, because all of a sudden a ton of processes... everything from
sshd to httpd to gmond to sh to vnlru to bufdaemon to fdc0... comes
off of whatever it's waiting on and hits the run queue.  The combined
run queues for both processors spike up to 32 tasks at one point and
then rapidly tail off as things return to normal.

That pretty much matches the behavior shown by ktrace in my initial
post, where everything goes to sleep on something-or-other in the
kernel, and then at the end of the stall, everything wakes up at the
same time.

I think this means the problem is somehow related to locking, rather
than scheduling.

From owner-freebsd-hackers@FreeBSD.ORG  Fri Sep 11 01:34:32 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 03A9C1065679
	for <freebsd-hackers@freebsd.org>; Fri, 11 Sep 2009 01:34:32 +0000 (UTC)
	(envelope-from linda.messerschmidt@gmail.com)
Received: from qw-out-2122.google.com (qw-out-2122.google.com [74.125.92.25])
	by mx1.freebsd.org (Postfix) with ESMTP id AE13F8FC12
	for <freebsd-hackers@freebsd.org>; Fri, 11 Sep 2009 01:34:31 +0000 (UTC)
Received: by qw-out-2122.google.com with SMTP id 3so224843qwe.7
	for <freebsd-hackers@freebsd.org>; Thu, 10 Sep 2009 18:34:31 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:in-reply-to:references
	:date:message-id:subject:from:to:content-type;
	bh=AcOA6yWvOzZohD7QvXPRhGb02MXgGs1sLdcUkXvI6w0=;
	b=ILyy6PH3CMZKLolRDXFlBsWrPcISlN0qfaTR8nPGeSvtx/C79R9wLcUDlJpecn2Yv8
	LlcoRl1/mSORdDstPfnKx6vOL0SpOfznsUAR0qP36qyAWQoEf19b2RnUkK+Z72Yrl1MB
	W+yMpCpHbrDJxaHdCYpo6lKheRCG45EWlK4iQ=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:content-type;
	b=ACqCSRvS+vFOMHOGRcC9ZqKV4rJqtTuMqyvEOOOLYiICoGzF4zeZyddMoBRqVwGWda
	uOUSiMkgk1WK8BzdTIBMIHVQvtPyyh7U4TAyeDXGLGNOsYS2AgwEAYuDkT8HGYpFGqoz
	7iQLoe0DmjhCOpRX5bw+2/XDTvAuLbfDI7t0E=
MIME-Version: 1.0
Received: by 10.229.9.147 with SMTP id l19mr1146347qcl.65.1252632870963; Thu, 
	10 Sep 2009 18:34:30 -0700 (PDT)
In-Reply-To: <237c27100909101207q73f0c513r60dd5ab83fdfd083@mail.gmail.com>
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>
	<200908261642.59419.jhb@freebsd.org>
	<237c27100908271237y66219ef4o4b1b8a6e13ab2f6c@mail.gmail.com>
	<200908271729.55213.jhb@freebsd.org>
	<237c27100909100946q3d186af3h66757e0efff307a5@mail.gmail.com>
	<bc2d970909100957y6d7fd707g9f3184165f8cb766@mail.gmail.com>
	<237c27100909101129y28771061o86db3c6a50a640eb@mail.gmail.com>
	<4AA94995.6030700@elischer.org>
	<237c27100909101207q73f0c513r60dd5ab83fdfd083@mail.gmail.com>
Date: Thu, 10 Sep 2009 21:34:30 -0400
Message-ID: <237c27100909101834g49438707l96fa58df5f717945@mail.gmail.com>
From: Linda Messerschmidt <linda.messerschmidt@gmail.com>
To: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Sep 2009 01:34:32 -0000

Just to follow up, I've been doing some testing with masking for
KTR_LOCK rather than KTR_SCHED.

I'm having trouble with this because I have the KTR buffer size set to
1048576 entries, and with only KTR_LOCK enabled, this isn't enough for
even a full second of tracing; the sample I'm working with now is just
under 0.9s.  It's an average of one entry every 2001 TSC ticks.  That
*seems* like a lot of locking activity, but some of the lock points
are only a couple of lines apart, so maybe it's just incredibly
verbose.
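
As a sanity check on those numbers, a quick Python calculation (a sketch
only; the 0.9s figure is approximate, so the results are too):

```python
# Rough arithmetic on the KTR_LOCK sample described above.
entries = 1048576        # KTR buffer size in entries
ticks_per_entry = 2001   # average TSC ticks between entries
trace_seconds = 0.9      # "just under 0.9s", treated as approximate

total_ticks = entries * ticks_per_entry           # TSC ticks spanned by the buffer
implied_tsc_hz = total_ticks / trace_seconds      # works out to about 2.3 GHz
entries_per_second = entries / trace_seconds      # about 1.17 million per second

print(f"{implied_tsc_hz / 1e9:.2f} GHz, {entries_per_second:,.0f} entries/s")
```

So the buffer fills at well over a million lock events per second, and the
tick count is consistent with a TSC running at a bit over 2.3 GHz.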

Since it's so much data and I'm still working on a way to correlate it
(lockgraph.py?), all I've got so far is a list of what trace points
are coming up the most:

51927 src/sys/kern/kern_lock.c:215  (_lockmgr UNLOCK mtx_unlock() when flags & LK_INTERLOCK)
48033 src/sys/kern/vfs_subr.c:2284  (vdropl UNLOCK)
41548 src/sys/kern/vfs_subr.c:2187  (vput VI_LOCK)
29359 src/sys/kern/vfs_subr.c:2067  (vget VI_LOCK)
29358 src/sys/kern/vfs_subr.c:2079  (vget VI_UNLOCK)
23799 src/sys/nfsclient/nfs_subs.c:755  (nfs_getattrcache mtx_lock)
23460 src/sys/nfsclient/nfs_vnops.c:645  (nfs_getattr mtx_unlock)
23460 src/sys/nfsclient/nfs_vnops.c:642  (nfs_getattr mtx_lock)
23460 src/sys/nfsclient/nfs_subs.c:815  (nfs_getattrcache mtx_unlock)
23138 src/sys/kern/vfs_cache.c:345  (cache_lookup CACHE_LOCK)
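
A tally like the one above can be produced with a short Python script along
these lines. This is a minimal sketch, not the exact commands used: it
assumes a text dump of the KTR buffer in which each record contains a
"file.c:line" source location somewhere on the line, and the function name
is made up.

```python
import re
from collections import Counter

# Matches a source location such as "src/sys/kern/vfs_subr.c:2284".
LOCATION = re.compile(r"(\S+\.[ch]):(\d+)")


def top_trace_points(lines, n=10):
    """Count records per source location and return the n most frequent."""
    counts = Counter()
    for line in lines:
        match = LOCATION.search(line)
        if match:   # lines without a recognisable location are skipped
            counts[f"{match.group(1)}:{match.group(2)}"] += 1
    return counts.most_common(n)
```

It takes any iterable of lines, so an open file object for the dump can be
passed in directly.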

Unfortunately, it kind of sounds like I'm on my way to answering "why
is this system slow?" even though it really isn't slow.  (And I rush
to point out that the Apache process in question doesn't at any point
in its life touch NFS, though some of the other ones on the machine
do.)

In order to be the cause of my Apache problem, all this goobering
around with NFS would have to be relatively infrequent but so intense
that it shoves everything else out of the way.  I'm skeptical, but I'm
sure one of you guys can offer a more informed opinion.

The only other thing I can think of is maybe all this is running me
out of something I need (vnodes?) so everybody else blocks until it
finishes and lets go of whatever finite resource it's using up?  But
that doesn't make a ton of sense either, because why would a lack of
vnodes cause stalls in accept() or select() in unrelated processes?

Not sure if I'm going in the right direction here or not.

From owner-freebsd-hackers@FreeBSD.ORG  Fri Sep 11 15:19:28 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 00660106566C
	for <freebsd-hackers@freebsd.org>; Fri, 11 Sep 2009 15:19:27 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id B04AB8FC1A
	for <freebsd-hackers@freebsd.org>; Fri, 11 Sep 2009 15:19:27 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 40AC646B03;
	Fri, 11 Sep 2009 11:19:27 -0400 (EDT)
Received: from jhbbsd.hudson-trading.com (unknown [209.249.190.8])
	by bigwig.baldwin.cx (Postfix) with ESMTPA id 772D48A01B;
	Fri, 11 Sep 2009 11:19:26 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-hackers@freebsd.org
Date: Fri, 11 Sep 2009 11:02:14 -0400
User-Agent: KMail/1.9.7
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>
	<237c27100909101207q73f0c513r60dd5ab83fdfd083@mail.gmail.com>
	<237c27100909101834g49438707l96fa58df5f717945@mail.gmail.com>
In-Reply-To: <237c27100909101834g49438707l96fa58df5f717945@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200909111102.14503.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1
	(bigwig.baldwin.cx); Fri, 11 Sep 2009 11:19:26 -0400 (EDT)
X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00,RDNS_NONE
	autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx
Cc: Linda Messerschmidt <linda.messerschmidt@gmail.com>
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Sep 2009 15:19:28 -0000

On Thursday 10 September 2009 9:34:30 pm Linda Messerschmidt wrote:
> Just to follow up, I've been doing some testing with masking for
> KTR_LOCK rather than KTR_SCHED.
> 
> I'm having trouble with this because I have the KTR buffer size set to
> 1048576 entries, and with only KTR_LOCK enabled, this isn't enough for
> even a full second of tracing; the sample I'm working with now is just
> under 0.9s.  It's an average of one entry every 2001 TSC ticks.  That
> *seems* like a lot of locking activity, but some of the lock points
> are only a couple of lines apart, so maybe it's just incredibly
> verbose.
> 
> Since it's so much data and I'm still working on a way to correlate it
> (lockgraph.py?), all I've got so far is a list of what trace points
> are coming up the most:
> 
> 51927 src/sys/kern/kern_lock.c:215  (_lockmgr UNLOCK mtx_unlock() when flags & LK_INTERLOCK)
> 48033 src/sys/kern/vfs_subr.c:2284  (vdropl UNLOCK)
> 41548 src/sys/kern/vfs_subr.c:2187  (vput VI_LOCK)
> 29359 src/sys/kern/vfs_subr.c:2067  (vget VI_LOCK)
> 29358 src/sys/kern/vfs_subr.c:2079  (vget VI_UNLOCK)
> 23799 src/sys/nfsclient/nfs_subs.c:755  (nfs_getattrcache mtx_lock)
> 23460 src/sys/nfsclient/nfs_vnops.c:645  (nfs_getattr mtx_unlock)
> 23460 src/sys/nfsclient/nfs_vnops.c:642  (nfs_getattr mtx_lock)
> 23460 src/sys/nfsclient/nfs_subs.c:815  (nfs_getattrcache mtx_unlock)
> 23138 src/sys/kern/vfs_cache.c:345  (cache_lookup CACHE_LOCK)
> 
> Unfortunately, it kind of sounds like I'm on my way to answering "why
> is this system slow?" even though it really isn't slow.  (And I rush
> to point out that the Apache process in question doesn't at any point
> in its life touch NFS, though some of the other ones on the machine
> do.)
> 
> In order to be the cause of my Apache problem, all this goobering
> around with NFS would have to be relatively infrequent but so intense
> that it shoves everything else out of the way.  I'm skeptical, but I'm
> sure one of you guys can offer a more informed opinion.
> 
> The only other thing I can think of is maybe all this is running me
> out of something I need (vnodes?) so everybody else blocks until it
> finishes and lets go of whatever finite resource it's using up?  But
> that doesn't make a ton of sense either, because why would a lack of
> vnodes cause stalls in accept() or select() in unrelated processes?
> 
> Not sure if I'm going in the right direction here or not.

Try turning off KTR_LOCK for spin mutexes (just force LO_QUIET on in 
mtx_init() if MTX_SPIN is set) and use a schedgraph.py from the latest 
RELENG_7.  It knows how to parse KTR_LOCK events and drop event "bars" for 
locks showing when they are held.  A more recent schedgraph.py might also 
fix the bugs you were seeing with the idle threads looking too long (esp. at 
the start and end of graphs).

-- 
John Baldwin

From owner-freebsd-hackers@FreeBSD.ORG  Fri Sep 11 15:35:17 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 21EB41065672
	for <freebsd-hackers@freebsd.org>; Fri, 11 Sep 2009 15:35:17 +0000 (UTC)
	(envelope-from julian@elischer.org)
Received: from outY.internet-mail-service.net (outy.internet-mail-service.net
	[216.240.47.248])
	by mx1.freebsd.org (Postfix) with ESMTP id 086C48FC14
	for <freebsd-hackers@freebsd.org>; Fri, 11 Sep 2009 15:35:16 +0000 (UTC)
Received: from idiom.com (mx0.idiom.com [216.240.32.160])
	by out.internet-mail-service.net (Postfix) with ESMTP id C6CC99DA80;
	Fri, 11 Sep 2009 08:35:16 -0700 (PDT)
X-Client-Authorized: MaGic Cook1e
X-Client-Authorized: MaGic Cook1e
X-Client-Authorized: MaGic Cook1e
Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38])
	by idiom.com (Postfix) with ESMTP id 322E02D6026;
	Fri, 11 Sep 2009 08:35:16 -0700 (PDT)
Message-ID: <4AAA6E32.2080609@elischer.org>
Date: Fri, 11 Sep 2009 08:35:14 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Thunderbird 2.0.0.23 (Macintosh/20090812)
MIME-Version: 1.0
To: John Baldwin <jhb@freebsd.org>
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>	<237c27100909101207q73f0c513r60dd5ab83fdfd083@mail.gmail.com>	<237c27100909101834g49438707l96fa58df5f717945@mail.gmail.com>
	<200909111102.14503.jhb@freebsd.org>
In-Reply-To: <200909111102.14503.jhb@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org,
	Linda Messerschmidt <linda.messerschmidt@gmail.com>
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Sep 2009 15:35:17 -0000

John Baldwin wrote:
>
> 
>  A more recent schedgraph.py might also 
> fix the bugs you were seeing with the idle threads looking too long (esp. at 
> the start and end of graphs).

not unless something has been fixed in the last week or so.


From owner-freebsd-hackers@FreeBSD.ORG  Fri Sep 11 17:35:02 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5A7AC1065672;
	Fri, 11 Sep 2009 17:35:02 +0000 (UTC)
	(envelope-from linda.messerschmidt@gmail.com)
Received: from qw-out-2122.google.com (qw-out-2122.google.com [74.125.92.24])
	by mx1.freebsd.org (Postfix) with ESMTP id 027A98FC22;
	Fri, 11 Sep 2009 17:35:01 +0000 (UTC)
Received: by qw-out-2122.google.com with SMTP id 3so398647qwe.7
	for <multiple recipients>; Fri, 11 Sep 2009 10:35:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:in-reply-to:references
	:date:message-id:subject:from:to:cc:content-type;
	bh=om33xvDen/rMS5cuPwK8tF3AE5NDy4Tb7L9akpe400E=;
	b=qO5DYVTFj2b6ubn+WG4fZs0iHebWu1ZWo8V2Uhq0E6IvOgwR716xZyhgVwd5u2LMq9
	FRyLN7ny7LLUs0hPiWu2bJUqi+h29KoPTqYhjltQ5j0gwpZ5pfoTC68eTLT1uU4yjaha
	QW2UF0exwPbMEyEJFiLORz3gGkQGZbGAgzoYk=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=TiWTjKD54Xr8FvMjJ+I0QAFRmsfDmvfBlVosAdz/hffLpWh9U6z1AX2VyOtk9PNSop
	a3bg1uU1rtv8qMjRc5ib+O772Kl6bPp9w8cgNkr7KwSuOII2vkPBAV/sAGkM8I9Fz2ut
	TxKwP4UgHV67t0jOf5Q2gDXd1J3YB70gb0V7A=
MIME-Version: 1.0
Received: by 10.229.23.212 with SMTP id s20mr1355284qcb.71.1252690501044; Fri, 
	11 Sep 2009 10:35:01 -0700 (PDT)
In-Reply-To: <200909111102.14503.jhb@freebsd.org>
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>
	<237c27100909101207q73f0c513r60dd5ab83fdfd083@mail.gmail.com>
	<237c27100909101834g49438707l96fa58df5f717945@mail.gmail.com>
	<200909111102.14503.jhb@freebsd.org>
Date: Fri, 11 Sep 2009 13:35:00 -0400
Message-ID: <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com>
From: Linda Messerschmidt <linda.messerschmidt@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-hackers@freebsd.org
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Sep 2009 17:35:02 -0000

On Fri, Sep 11, 2009 at 11:02 AM, John Baldwin <jhb@freebsd.org> wrote:
> Try turning off KTR_LOCK for spin mutexes (just force LO_QUIET on in
> mtx_init() if MTX_SPIN is set)

I have *no* idea what you just said. :)

Which is fine.  But more to the point, I have no idea how to do it. :)

> A more recent schedgraph.py might also
> fix the bugs you were seeing with the idle threads looking too long (esp. at
> the start and end of graphs).

We are already on RELENG_7 due to the KTR-enabling rebuild, so that'd
be the version we're using unless, as Julian observed, it's been fixed
in the past week or so.

Thanks!

From owner-freebsd-hackers@FreeBSD.ORG  Fri Sep 11 19:14:46 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4239A1065693
	for <freebsd-hackers@freebsd.org>; Fri, 11 Sep 2009 19:14:46 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 120668FC1C
	for <freebsd-hackers@freebsd.org>; Fri, 11 Sep 2009 19:14:46 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id A101346B35;
	Fri, 11 Sep 2009 15:14:45 -0400 (EDT)
Received: from jhbbsd.hudson-trading.com (unknown [209.249.190.8])
	by bigwig.baldwin.cx (Postfix) with ESMTPA id E42868A026;
	Fri, 11 Sep 2009 15:14:44 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Julian Elischer <julian@elischer.org>
Date: Fri, 11 Sep 2009 13:00:37 -0400
User-Agent: KMail/1.9.7
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>
	<200909111102.14503.jhb@freebsd.org>
	<4AAA6E32.2080609@elischer.org>
In-Reply-To: <4AAA6E32.2080609@elischer.org>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200909111300.37599.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1
	(bigwig.baldwin.cx); Fri, 11 Sep 2009 15:14:44 -0400 (EDT)
X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00,RDNS_NONE
	autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx
Cc: freebsd-hackers@freebsd.org,
	Linda Messerschmidt <linda.messerschmidt@gmail.com>
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Sep 2009 19:14:46 -0000

On Friday 11 September 2009 11:35:14 am Julian Elischer wrote:
> John Baldwin wrote:
> >
> > 
> >  A more recent schedgraph.py might also 
> > fix the bugs you were seeing with the idle threads looking too long (esp. at 
> > the start and end of graphs).
> 
> not unless something has been fixed in the last week or so.

Well, I wasn't sure how old of a schedgraph.py is being used.  7.1 would have 
the bugs, but I think 7.2 should be fine.

-- 
John Baldwin

From owner-freebsd-hackers@FreeBSD.ORG  Fri Sep 11 19:14:47 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E574C1065694
	for <freebsd-hackers@freebsd.org>; Fri, 11 Sep 2009 19:14:47 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id B5E9D8FC13
	for <freebsd-hackers@freebsd.org>; Fri, 11 Sep 2009 19:14:47 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 66A3946B52;
	Fri, 11 Sep 2009 15:14:47 -0400 (EDT)
Received: from jhbbsd.hudson-trading.com (unknown [209.249.190.8])
	by bigwig.baldwin.cx (Postfix) with ESMTPA id 7E1348A01B;
	Fri, 11 Sep 2009 15:14:46 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Linda Messerschmidt <linda.messerschmidt@gmail.com>
Date: Fri, 11 Sep 2009 15:06:47 -0400
User-Agent: KMail/1.9.7
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>
	<200909111102.14503.jhb@freebsd.org>
	<237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com>
In-Reply-To: <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200909111506.47309.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1
	(bigwig.baldwin.cx); Fri, 11 Sep 2009 15:14:46 -0400 (EDT)
X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00,RDNS_NONE
	autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx
Cc: freebsd-hackers@freebsd.org
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Sep 2009 19:14:48 -0000

On Friday 11 September 2009 1:35:00 pm Linda Messerschmidt wrote:
> On Fri, Sep 11, 2009 at 11:02 AM, John Baldwin <jhb@freebsd.org> wrote:
> > Try turning off KTR_LOCK for spin mutexes (just force LO_QUIET on in
> > mtx_init() if MTX_SPIN is set)
> 
> I have *no* idea what you just said. :)
> 
> Which is fine.  But more to the point, I have no idea how to do it. :)

Something like this:

Index: sys/kern/kern_mutex.c
===================================================================
--- sys/kern/kern_mutex.c       (.../mirror/FreeBSD/stable/7)   (revision 195943)
+++ sys/kern/kern_mutex.c       (.../stable/7)  (revision 195943)
@@ -747,6 +747,10 @@
        if (opts & MTX_NOPROFILE)
                flags |= LO_NOPROFILE;

+       /* XXX: Only log for regular mutexes. */
+       if (opts & MTX_SPIN)
+               flags |= LO_QUIET;
+
        /* Initialize mutex. */
        m->mtx_lock = MTX_UNOWNED;
        m->mtx_recurse = 0;

> > A more recent schedgraph.py might also
> > fix the bugs you were seeing with the idle threads looking too long (esp. at
> > the start and end of graphs).
> 
> We are already on RELENG_7 due to the KTR-enabling rebuild, so that'd
> be the version we're using unless, as Julian observed, it's been fixed
> in the past week or so.

Hmm.  It works well for me for doing traces.

-- 
John Baldwin

From owner-freebsd-hackers@FreeBSD.ORG  Fri Sep 11 23:14:23 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EF84A106566B;
	Fri, 11 Sep 2009 23:14:23 +0000 (UTC) (envelope-from jilles@stack.nl)
Received: from mx1.stack.nl (relay02.stack.nl [IPv6:2001:610:1108:5010::104])
	by mx1.freebsd.org (Postfix) with ESMTP id B7CB48FC0A;
	Fri, 11 Sep 2009 23:14:23 +0000 (UTC)
Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131])
	by mx1.stack.nl (Postfix) with ESMTP id CCBB335A829;
	Sat, 12 Sep 2009 01:14:22 +0200 (CEST)
Received: by snail.stack.nl (Postfix, from userid 1677)
	id B0850228CD; Sat, 12 Sep 2009 01:14:22 +0200 (CEST)
Date: Sat, 12 Sep 2009 01:14:22 +0200
From: Jilles Tjoelker <jilles@stack.nl>
To: Eygene Ryabinkin <rea-fbsd@codelabs.ru>
Message-ID: <20090911231422.GA41683@stack.nl>
References: <4A7B1DB0.1040602@FreeBSD.org>
	<ruze2YSbnZSTRbUbUJTTZyWDlyA@PR865FKBRXbdFBYZMkH1tmebpCc>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <ruze2YSbnZSTRbUbUJTTZyWDlyA@PR865FKBRXbdFBYZMkH1tmebpCc>
User-Agent: Mutt/1.5.18 (2008-05-17)
Cc: freebsd-hackers@freebsd.org, Doug Barton <dougb@FreeBSD.org>
Subject: Re: Problem in bin/sh stripping the * character through
	${expansion%}
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Sep 2009 23:14:24 -0000

On Fri, Aug 07, 2009 at 03:26:50AM +0400, Eygene Ryabinkin wrote:
> Thu, Aug 06, 2009 at 11:15:12AM -0700, Doug Barton wrote:
> > I came across this problem during a recent portmaster update. When
> > trying to strip off the * character using variable expansion in bin/sh
> > it doesn't work. Other "special" characters do work if they are
> > properly escaped.

> > The attached mini-script clearly shows the problem:

> > $ sh sh-strip-problem
> > var before stripping: foo\*
> > var after stripping: foo\*

> > var before stripping: foo\$
> > var after stripping: foo\

> According to the sh(1), it is not a problem.  Namely,
>  - \* being unquoted at all will produce a lone '*';
>  - '*' when treated as the smallest pattern, will result in a stripping
>    of a zero-length string -- it is the smallest pattern in the case of
>    '*' that matches anything.

That is indeed an explanation why it works that way, but I think it is
wrong. Generally, the shell command language avoids unnecessary levels
of quoting. In the POSIX spec, "Shell Command Language", note the part
about "${x#*}" (pattern) and ${x#"*"} (literal asterisk). Also compare
with  case $something in \*) echo asterisk;; esac  which matches a
literal asterisk.

Two PRs already exist for aspects of stripping: bin/57554 (double
quotes) and bin/117748 (trying to match pattern matching characters
literally).

> In order to strip the trailing star you should use
> -----
> var=${var%[*]}
> -----
> This gives you the pattern of '[*]' that is properly treated as the
> single star -- it's a weird way to escape the star in the patterns.

This is indeed a good workaround.
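
For reference, the difference between the pattern forms discussed above
can be sketched in a few lines of sh. This is only an illustration,
assuming a POSIX-conforming shell; the function names are made up for
the example and are not taken from portmaster or the attached script,
and a /bin/sh still affected by the PRs above may not behave the same
way for other quoting forms.

```sh
#!/bin/sh
# Hypothetical helpers, named for this example only.

# An unquoted '*' is a pattern. Its shortest suffix match is the empty
# string, so "%" removes nothing.
strip_pattern_star() {
    s=$1
    printf '%s\n' "${s%*}"
}

# '[*]' is a bracket expression containing only '*', so it matches
# exactly one literal asterisk.
strip_literal_star() {
    s=$1
    printf '%s\n' "${s%[*]}"
}

strip_pattern_star 'foo*'   # prints: foo*
strip_literal_star 'foo*'   # prints: foo
```

The bracket form is used here because it does not depend on how the
shell handles backslashes or double quotes inside the expansion.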

-- 
Jilles Tjoelker

From owner-freebsd-hackers@FreeBSD.ORG  Sat Sep 12 02:05:16 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D1C4F106568B;
	Sat, 12 Sep 2009 02:05:16 +0000 (UTC)
	(envelope-from linda.messerschmidt@gmail.com)
Received: from qw-out-2122.google.com (qw-out-2122.google.com [74.125.92.26])
	by mx1.freebsd.org (Postfix) with ESMTP id 740DE8FC1F;
	Sat, 12 Sep 2009 02:05:16 +0000 (UTC)
Received: by qw-out-2122.google.com with SMTP id 3so510164qwe.7
	for <multiple recipients>; Fri, 11 Sep 2009 19:05:15 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:in-reply-to:references
	:date:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=DB9dYl4Xb52O8ZSJQSPaSURbA1jWuHf6B81Kc2pwfoc=;
	b=ippOQgOWPszWVgs106f9osi5Cocz5c3jgPEs4LJKgF9GEfwxbZ5lG6URpYEUDOYNIT
	rEtiCMevutbV7eTvLq5HiDiYbehfMiVl6c3rKltug7PT/RuBNcoBv3BB+hz+EPUrcqDN
	3c1paR9+nv62uKXi/U7xLDeumVNiIvBCB7cGk=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	b=P1WCcfYrqNxK2SBukrZ1+Sckvf+7kC8uv/uO3MBMmszLVuL7V+JCEx8qR0GCGaVnhs
	EgTlhV9auh57UQbzkVanQF8DwUstFAjic8/eDGVWSddrMTiPQKhkcx5IQfJWqL1BfwYq
	t/mgIGDq3xOXBCBNjDFnjYe6gdfT61OB8xQqk=
MIME-Version: 1.0
Received: by 10.229.10.13 with SMTP id n13mr1443531qcn.103.1252721115838; Fri, 
	11 Sep 2009 19:05:15 -0700 (PDT)
In-Reply-To: <200909111506.47309.jhb@freebsd.org>
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>
	<200909111102.14503.jhb@freebsd.org>
	<237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com>
	<200909111506.47309.jhb@freebsd.org>
Date: Fri, 11 Sep 2009 22:05:15 -0400
Message-ID: <237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com>
From: Linda Messerschmidt <linda.messerschmidt@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-hackers@freebsd.org
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 12 Sep 2009 02:05:16 -0000

On Fri, Sep 11, 2009 at 3:06 PM, John Baldwin <jhb@freebsd.org> wrote:
> Something like this:

Ah, I understand now. :)

Got up to 17 seconds of trace with that change.

> Hmm.  It works well for me for doing traces.

It definitely works, it just always seems to have some-or-another
weird artifact.

But, with the lock info added, the locks that show big ugly gaping
multi-second "lock acquire" bars are: unp_mtx and so_rcv_sx.  I'm not
100% confident in this data yet, so I will try to get more data to
confirm, but if that offers any clues about where to look, I'm all
ears.

I'm also a bit hazy on what the dark grey vs. light grey background is about.

Thanks!

From owner-freebsd-hackers@FreeBSD.ORG  Sat Sep 12 03:55:36 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 530541065670;
	Sat, 12 Sep 2009 03:55:36 +0000 (UTC)
	(envelope-from linda.messerschmidt@gmail.com)
Received: from qw-out-2122.google.com (qw-out-2122.google.com [74.125.92.27])
	by mx1.freebsd.org (Postfix) with ESMTP id EC23E8FC14;
	Sat, 12 Sep 2009 03:55:35 +0000 (UTC)
Received: by qw-out-2122.google.com with SMTP id 3so522095qwe.7
	for <multiple recipients>; Fri, 11 Sep 2009 20:55:35 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:in-reply-to:references
	:date:message-id:subject:from:to:cc:content-type;
	bh=G01ABb/Ddx21tXgaeEnog8YW96nc8mzkPuBBMY514z8=;
	b=CRWDThaOrl+AKs0iLZiQ7daVKoGaSuUg2plrUrFgmR4RHQk08z3zib8jGur13nou8R
	osmlV7fiPd2XCei+vX6DXjfSu5Y8Uwyjsmis09f0NHDb3Mzn5+l8vG6W1hox3dA3EjH9
	iVn5RHJUM4xsBBtzI+mTBEaY6IB1T7raBCPTU=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=bQdWyuiZJAhuOx1PxoSrOvGCnpK1LJp3bQGhBG/sih+wv8t4XLxLmi+DiRGKBcNTnM
	el5RlrB1l0e8sM8/YLDzkUDuCh0RhYPAqo6r+xdhYQEMv1qYgIjqNsWC2qki0hvAIgHv
	OM31pptO9iACY+oEAI4ZDvSwslMi5FZQ8B84c=
MIME-Version: 1.0
Received: by 10.229.29.85 with SMTP id p21mr1488496qcc.101.1252727735381; Fri, 
	11 Sep 2009 20:55:35 -0700 (PDT)
In-Reply-To: <237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com>
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>
	<200909111102.14503.jhb@freebsd.org>
	<237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com>
	<200909111506.47309.jhb@freebsd.org>
	<237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com>
Date: Fri, 11 Sep 2009 23:55:35 -0400
Message-ID: <237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com>
From: Linda Messerschmidt <linda.messerschmidt@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-hackers@freebsd.org
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 12 Sep 2009 03:55:36 -0000

OK, I have learned that ktrdump looks up the name of the process
associated with a particular KSE at the time of the dump, so if
it's changed since tracing stopped, it will blissfully blame the wrong
process.  I understand why that's the case, but it still sucks for
troubleshooting. :(

This time, "pf task mtx" and "vnode_free_list" are the locks getting
the blame.  The processes fingered are an httpd (the root "parent"
of the one doing the work, which does nothing but select() for 1s and
wait to see if its children died), and vnlru.  No correlation at all
to the previous results, and this machine is now utterly quiescent
except for the httpd process and the PHP exerciser.  Hard to imagine
vnlru has 1s worth of running to do on a machine with 949 total vnodes
in use.

A third run produced a 997ms "lock acquire" for "buffer daemon lock,"
a 497ms one for ip6qlock (no, there's no IPv6 in use on this machine),
and an 8s (!!!) one on unp_mtx. bufdaemon had a 997ms "running" bar,
but according to the raw TSC values, that happened on the same CPU
1.999s *after* the 997ms buffer daemon lock acquire.

I really don't know where to go from here.  There's so little
consistency that I'm just not sure if the data is bad, the tool is
bad, the operator is bad, or there's some problem so fundamentally
horrible that all I'm seeing is random side effects.

From owner-freebsd-hackers@FreeBSD.ORG  Sat Sep 12 04:06:15 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 29CC41065670
	for <freebsd-hackers@freebsd.org>; Sat, 12 Sep 2009 04:06:15 +0000 (UTC)
	(envelope-from julian@elischer.org)
Received: from outQ.internet-mail-service.net (outq.internet-mail-service.net
	[216.240.47.240])
	by mx1.freebsd.org (Postfix) with ESMTP id 0B3508FC23
	for <freebsd-hackers@freebsd.org>; Sat, 12 Sep 2009 04:06:14 +0000 (UTC)
Received: from idiom.com (mx0.idiom.com [216.240.32.160])
	by out.internet-mail-service.net (Postfix) with ESMTP id D6174B9872;
	Fri, 11 Sep 2009 21:06:14 -0700 (PDT)
X-Client-Authorized: MaGic Cook1e
X-Client-Authorized: MaGic Cook1e
X-Client-Authorized: MaGic Cook1e
Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38])
	by idiom.com (Postfix) with ESMTP id 160F42D6011;
	Fri, 11 Sep 2009 21:06:14 -0700 (PDT)
Message-ID: <4AAB1E34.2060908@elischer.org>
Date: Fri, 11 Sep 2009 21:06:12 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Thunderbird 2.0.0.23 (Macintosh/20090812)
MIME-Version: 1.0
To: Linda Messerschmidt <linda.messerschmidt@gmail.com>
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>	<200909111102.14503.jhb@freebsd.org>	<237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com>	<200909111506.47309.jhb@freebsd.org>	<237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com>
	<237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com>
In-Reply-To: <237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 12 Sep 2009 04:06:15 -0000

Linda Messerschmidt wrote:
> OK, I have learned that ktrdump looks up the name of the process
> associated with a particular KSE at the time of the dump, so if
> it's changed since tracing stopped, it will blissfully blame the wrong
> process.  I understand why that's the case, but it still sucks for
> troubleshooting. :(
> 
> This time, "pf task mtx" and "vnode_free_list" are the locks getting
> the blame.  The processes fingered are an httpd (the root "parent"
> of the one doing the work, which does nothing but select() for 1s and
> wait to see if its children died), and vnlru.  No correlation at all
> to the previous results, and this machine is now utterly quiescent
> except for the httpd process and the PHP exerciser.  Hard to imagine
> vnlru has 1s worth of running to do on a machine with 949 total vnodes
> in use.
> 
> A third run produced a 997ms "lock acquire" for "buffer daemon lock,"
> a 497ms one for ip6qlock (no, there's no IPv6 in use on this machine),
> and an 8s (!!!) one on unp_mtx. bufdaemon had a 997ms "running" bar,
> but according to the raw TSC values, that happened on the same CPU
> 1.999s *after* the 997ms buffer daemon lock acquire.
> 
> I really don't know where to go from here.  There's so little
> consistency that I'm just not sure if the data is bad, the tool is
> bad, the operator is bad, or there's some problem so fundamentally
> horrible that all I'm seeing is random side effects.

does the system have a serial console? how about a normal console 
/keyboard?

how often does it hang? and for how long?
is there a chance that you could notice when it is hung and hit 
<CTL><ALT><ESC> and drop it into the debugger IN the hung state?

It is possible, if you have a serial port, to make a program that sends 
a char back and forth and, when the machine hangs, sends the magic 
sequence. (I think it's CR<tilde><CTL-B> for serial debugger break,
but I'm sure you can look up the kernel options and the chars in google.)

if you can drop the machine into DDB (the kernel debugger) in the
hung state, then there are lots of commands you can use to find out
what is wrong. jhb actually gave a short talk on the topic that I videoed
and put on youtube.

ps will show you what is actually running on which CPU and you can see 
what locks all the other processes are waiting on.
then you can examine those locks and see who owns them.


From owner-freebsd-hackers@FreeBSD.ORG  Sat Sep 12 04:47:33 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1A12C1065670;
	Sat, 12 Sep 2009 04:47:33 +0000 (UTC)
	(envelope-from linda.messerschmidt@gmail.com)
Received: from mail-qy0-f200.google.com (mail-qy0-f200.google.com
	[209.85.221.200])
	by mx1.freebsd.org (Postfix) with ESMTP id ADB998FC19;
	Sat, 12 Sep 2009 04:47:32 +0000 (UTC)
Received: by qyk38 with SMTP id 38so1416699qyk.27
	for <multiple recipients>; Fri, 11 Sep 2009 21:47:31 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:in-reply-to:references
	:date:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=E0ES0qzX1s/QQHqzNRUrEmLteYbJ5VeX2hoOHPWWUg8=;
	b=stI2a9NCwc458GiVAL7onTljaIwWDcQLtpWXJN74tA/5qGt/7WEVou34mtg2CgHZHB
	QOovx+6zhUOd+76ukA1gWp1FvyV//3aq0pczq6fXcdIYupB0PKyj2Av71yr2QY3qjWzp
	WCptlpP9YO4L5W6mXdjQdPJQlTJeHs5MqyFkg=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	b=u6UEqjLwxIaCAl8xDIukGAj/ufoEowGSIX73EPaV9QWW3Hth0EgSsHTatd3UbmZ2zf
	mMtKtRExEiHzs0PR4fQsP0ZFCyhuqCuDbhuwXYBpoqMQAEaCOKI2U5taEURdDLPGP+35
	ALsZuPH/FPED64my6nLoETbjTuD5Cg2pYrCQ8=
MIME-Version: 1.0
Received: by 10.229.119.69 with SMTP id y5mr1469532qcq.100.1252730851756; Fri, 
	11 Sep 2009 21:47:31 -0700 (PDT)
In-Reply-To: <4AAB1E34.2060908@elischer.org>
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>
	<200909111102.14503.jhb@freebsd.org>
	<237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com>
	<200909111506.47309.jhb@freebsd.org>
	<237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com>
	<237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com>
	<4AAB1E34.2060908@elischer.org>
Date: Sat, 12 Sep 2009 00:47:31 -0400
Message-ID: <237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com>
From: Linda Messerschmidt <linda.messerschmidt@gmail.com>
To: Julian Elischer <julian@elischer.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-hackers@freebsd.org
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 12 Sep 2009 04:47:33 -0000

On Sat, Sep 12, 2009 at 12:06 AM, Julian Elischer <julian@elischer.org> wrote:
> does the system have a serial console? how about a normal console /keyboard?

It has an IP KVM.

> how often does it hang? and for  how long?

Well, this is interesting.  I got really frustrated with the other
approach, so I thought I'd thin a machine down absolutely as far as I
could, eliminate every possible source of delay, and see what happens.
I killed everything... cron, RPC, NFS, devd, gmon, nrpe, everything.
The Apache and its exerciser are now the only things running on the
machine, and the Apache is only touching an md0 swap device mounted on
/mnt.  I *still* get the hangs.

It hangs for all sorts of different periods, but the duration of the
stall is approximately inversely proportional to the chance of seeing
it.  To get a short delay, you need wait only a little bit.  If you
want a 2-3 second delay, you may have to wait 15-20 minutes.

*However* in order to answer your question, I changed up the test
program, which up until now has been cycling requests every 50 ms until
it gets one >2s, at which point it sysctls to stop ktr and aborts.

Now it prints the timestamp of all "too long" requests.  But I also
dropped the threshold for "too long" from 2s to 100ms, since with
everything on RAM disk, there's no longer any reason to expect a
request to take more than 1-2ms in the worst case.

The results are pretty profound:

1252729876: request 82 131ms
1252729883: request 210 388ms
1252729890: request 338 380ms
1252729897: request 466 388ms
1252729904: request 594 404ms
1252729919: request 849 810ms
1252729926: request 977 386ms
1252729933: request 1105 370ms
1252729940: request 1233 366ms
1252729947: request 1361 400ms
1252729961: request 1617 746ms
1252729968: request 1744 477ms
1252729975: request 1872 388ms
1252729982: request 2000 380ms
1252729989: request 2128 384ms
1252729996: request 2256 395ms

It goes on and on like this, I get a 380-400ms stall every seven
seconds.  I have had a few come back higher, in the 750-850ms range,
usually after missing a beat:

1252729897: request 466 388ms
1252729904: request 594 404ms
1252729919: request 849 810ms
1252729926: request 977 386ms

1252730010: request 2512 416ms
1252730017: request 2640 390ms
1252730031: request 2896 774ms
1252730038: request 3023 431ms

1252730454: request 10568 378ms
1252730461: request 10696 397ms
1252730475: request 10952 733ms
1252730482: request 11080 366ms

So far, nothing over 1s.

So what happens every seven seconds??

From owner-freebsd-hackers@FreeBSD.ORG  Sat Sep 12 05:47:15 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4F52C106566C
	for <freebsd-hackers@freebsd.org>; Sat, 12 Sep 2009 05:47:15 +0000 (UTC)
	(envelope-from julian@elischer.org)
Received: from outW.internet-mail-service.net (outw.internet-mail-service.net
	[216.240.47.246])
	by mx1.freebsd.org (Postfix) with ESMTP id E79738FC16
	for <freebsd-hackers@freebsd.org>; Sat, 12 Sep 2009 05:47:14 +0000 (UTC)
Received: from idiom.com (mx0.idiom.com [216.240.32.160])
	by out.internet-mail-service.net (Postfix) with ESMTP id C3B71D1DCD;
	Fri, 11 Sep 2009 22:47:14 -0700 (PDT)
X-Client-Authorized: MaGic Cook1e
Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38])
	by idiom.com (Postfix) with ESMTP id 1C0E12D6018;
	Fri, 11 Sep 2009 22:47:14 -0700 (PDT)
Message-ID: <4AAB35E0.3000908@elischer.org>
Date: Fri, 11 Sep 2009 22:47:12 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Thunderbird 2.0.0.23 (Macintosh/20090812)
MIME-Version: 1.0
To: Linda Messerschmidt <linda.messerschmidt@gmail.com>
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>	
	<200909111102.14503.jhb@freebsd.org>	
	<237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com>	
	<200909111506.47309.jhb@freebsd.org>	
	<237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com>	
	<237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com>	
	<4AAB1E34.2060908@elischer.org>
	<237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com>
In-Reply-To: <237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 12 Sep 2009 05:47:15 -0000

Linda Messerschmidt wrote:
> On Sat, Sep 12, 2009 at 12:06 AM, Julian Elischer <julian@elischer.org> wrote:
>> does the system have a serial console? how about a normal console /keyboard?
> 
> It has an IP KVM.
> 
>> how often does it hang? and for how long?
> 
> Well, this is interesting.  I got really frustrated with the other
> approach, so I thought I'd thin a machine down absolutely as far as I
> could, eliminate every possible source of delay, and see what happens.
>  I killed everything... cron, RPC, NFS, devd, gmon, nrpe, everything.
> The Apache and its exerciser are now the only things running on the
> machine, and the Apache is only touching an md0 swap device mounted on
> /mnt.  I *still* get the hangs.

ok now we need to describe the hang..  if you can predictably get a 
hang every 7 seconds does this mean that it doesn't respond to 
keyboard for a moment every 7 seconds? or that it doesn't accept 
packets every 7 seconds? if you lean on the A key, do you see echo 
stop every 7 seconds for a moment?

Or is it just the apache process that hangs?

Does the watching process that you refer to below also hang?
would it hang if it tried to access the disk?
if the watching process is on the same machine, does it only trigger 
AFTER the request has taken a long time or could it time out with a 
select DURING the delayed response? (another way of asking "how hung
is 'hung'?")


> 
> It hangs for all sorts of different periods, but the duration of the
> stall is approximately inversely proportional to the chance of seeing
> it.  To get a short delay, you need wait only a little bit.  If you
> want a 2-3 second delay, you may have to wait 15-20 minutes.
> 
> *However* in order to answer your question, I changed up the test
> program, which up til now has been cycling requests every 50 ms until
> it gets one >2s, at which point it sysctls to stop ktr and aborts.
> 
> Now it prints the timestamp of all "too long" requests.  But I also
> dropped the threshold for "too long" from 2s to 100ms, since with
> everything on RAM disk, there's no longer any reason to expect a
> request to take more than 1-2ms in the worst case.
> 
> The results are pretty profound:
> 
> 1252729876: request 82 131ms
> 1252729883: request 210 388ms
> 1252729890: request 338 380ms
> 1252729897: request 466 388ms
> 1252729904: request 594 404ms
> 1252729919: request 849 810ms
> 1252729926: request 977 386ms
> 1252729933: request 1105 370ms
> 1252729940: request 1233 366ms
> 1252729947: request 1361 400ms
> 1252729961: request 1617 746ms
> 1252729968: request 1744 477ms
> 1252729975: request 1872 388ms
> 1252729982: request 2000 380ms
> 1252729989: request 2128 384ms
> 1252729996: request 2256 395ms
> 
> It goes on and on like this, I get a 380-400ms stall every seven
> seconds.  I have had a few come back higher, in the 750-850ms range,
> usually after missing a beat:
> 
> 1252729897: request 466 388ms
> 1252729904: request 594 404ms
> 1252729919: request 849 810ms
> 1252729926: request 977 386ms
> 
> 1252730010: request 2512 416ms
> 1252730017: request 2640 390ms
> 1252730031: request 2896 774ms
> 1252730038: request 3023 431ms
> 
> 1252730454: request 10568 378ms
> 1252730461: request 10696 397ms
> 1252730475: request 10952 733ms
> 1252730482: request 11080 366ms
> 
> So far, nothing over 1s.
> 
> So what happens every seven seconds??


From owner-freebsd-hackers@FreeBSD.ORG  Sat Sep 12 06:52:52 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B033C1065676;
	Sat, 12 Sep 2009 06:52:52 +0000 (UTC)
	(envelope-from linda.messerschmidt@gmail.com)
Received: from mail-qy0-f200.google.com (mail-qy0-f200.google.com
	[209.85.221.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 50A588FC15;
	Sat, 12 Sep 2009 06:52:52 +0000 (UTC)
Received: by qyk38 with SMTP id 38so1442528qyk.27
	for <multiple recipients>; Fri, 11 Sep 2009 23:52:51 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:in-reply-to:references
	:date:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=t/N0lDkppVuny6+YPau70tbzO/7VQqp55IgXZcc6wIs=;
	b=gQWtXNqzFf/JV/izYVgfiPz3rI8ixeC7LMGI11POKgvyYtVKfzajHuOxnYmZWoCAfe
	I/Ec0rBIBsUv3AkUwqxS2kHMJmg0o5Jdyu8XF9U36U+P3OiKqv2ORr5PAwZ4lBBq9umo
	4hxcAnzA1AnFiZQzVdKnSaJ7SRVo4SM/eW6sU=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	b=bk24Dba7Zudem1d7T0d5vQ6hBdESwNmefkaITp0Ct2+NYwFF1cOYs/jSHR4nUJJnUn
	I/C08RAeKbzG6tRAVIsBE9Kyb+CrthFjY6gvuJFmB2lpktz7g6gCKGo8UHFDz5Jcrbx+
	Mu3RNpFxt+vkN70GIBnLURStyPgwI9SsV21jw=
MIME-Version: 1.0
Received: by 10.229.106.83 with SMTP id w19mr1573556qco.72.1252738371579; Fri, 
	11 Sep 2009 23:52:51 -0700 (PDT)
In-Reply-To: <4AAB35E0.3000908@elischer.org>
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>
	<200909111102.14503.jhb@freebsd.org>
	<237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com>
	<200909111506.47309.jhb@freebsd.org>
	<237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com>
	<237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com>
	<4AAB1E34.2060908@elischer.org>
	<237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com>
	<4AAB35E0.3000908@elischer.org>
Date: Sat, 12 Sep 2009 02:52:51 -0400
Message-ID: <237c27100909112352k5504357dge725c8f905ee650a@mail.gmail.com>
From: Linda Messerschmidt <linda.messerschmidt@gmail.com>
To: Julian Elischer <julian@elischer.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-hackers@freebsd.org
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 12 Sep 2009 06:52:52 -0000

On Sat, Sep 12, 2009 at 1:47 AM, Julian Elischer <julian@elischer.org> wrote:
> ok now we need to describe the hang..  if you can predictably get a hang
> every 7 seconds does this mean that it doesn't respond to keyboard for a
> moment every 7 seconds?

It's possible.

> or that it doesn't accept packets every 7 seconds?

It appears that it accepts & responds to at least pings; I was able to
do an every-0.1-seconds ping through a bevy of 300-1900ms stalls with:

2323 packets transmitted, 2323 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.120/1.019/5.979/0.288 ms

As best as I could tell, schedgraph also showed that the clock
interrupt and the em0 interrupt always got serviced on time.  Pretty
much seems like it's userspace that's getting put on hold.

> Or is it just the apache process that hangs?

This is where I started from.  In the original post (way long ago
now), I described how pretty much every process on the system went
into the kernel for something and stalled there, and then when the
stall ends, they all unblock at once.  I posted some examples via
ktrace that I sadly no longer have the source data for.

> Does the watching process that you refer to below also hang?

I don't think I can say for sure.  If I have it print every request, I
see visual stalls in the output from time to time where no stalled
request is reported, which could indicate either that a stall occurred
outside the request or that my shoddy Internet connection has 100ms
latency and consistent 1% packet loss, which it does.

I did write a short C program that just select()s on stdin for 100ms
over and over and aborts if it takes more than 125ms to go through the
loop; it never aborts, even through 1s+ stalls and the loop times it
reports are consistently 110ms regardless of what else is going on,
which I don't think is unexpected.  However, I'm not sure why that
differs from the behavior of the "master" Apache processes, which
select() for 1 second all day long, but do appear to be affected.
Maybe because they are selecting a network socket instead of a tty?  I
don't know.
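
For anyone who wants to repeat the select() check described above, here
is a rough sketch of that kind of watcher.  The original was a short C
program that is not shown in this thread; this is a Python stand-in, and
the function name, parameters and return value are all made up for
illustration.

```python
import os
import select
import time


def watch_select(fd, timeout_s=0.100, limit_s=0.125, max_loops=None):
    """select() on fd over and over; raise if one pass takes too long.

    Returns the per-loop durations in seconds once max_loops passes have
    completed (or when fd hits end-of-file).  With max_loops=None it
    runs until a pass exceeds limit_s.
    """
    durations = []
    while max_loops is None or len(durations) < max_loops:
        start = time.monotonic()
        readable, _, _ = select.select([fd], [], [], timeout_s)
        if readable:
            # Drain pending input so the next select() blocks again.
            if not os.read(fd, 4096):
                break  # end-of-file: nothing left to wait on
        elapsed = time.monotonic() - start
        if elapsed > limit_s:
            raise RuntimeError("loop took %.0f ms" % (elapsed * 1000))
        durations.append(elapsed)
    return durations

# Hypothetical usage on a terminal: watch_select(0)   # 0 = stdin
```

Run against a tty, each pass should take a little over the 100ms
timeout unless the process itself is held up.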

Also, if I disable NTP, the system does not appear to lose time during
the stalls, which fits with the consistent clock interrupts I saw.

> would it hang if it tried to access the disk?

By using the md device, I believe I have removed the disk from the
equation; neither process is accessing it.

Even without doing that, if I leave iostat -w 1 running alongside the
test, there's no correlation between the tiny amount of disk activity
there is and observed stalls.

> if the watching process is on the same machine, does it only trigger AFTER
> the request has taken a long time or could it time out with a select DURING
> the delayed response? (another way of asking "how hung
> is 'hung'?")

It's just a PHP script using libcurl to request the file.  I only
moved it to the same machine in order to have it be able to write the
sysctl to stop the KTR traces I was doing.

If you're asking could the check script be modified to time out after,
say, 1 second, and if so, would it return during the hang or after it?
 I don't know.  My guess based on the earlier ktrace output is that it
would time out, but not return until the hang ended.  I'll see if
the curl lib exposes a configurable timeout and try it.

From owner-freebsd-hackers@FreeBSD.ORG  Sat Sep 12 06:55:09 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 403E41065676;
	Sat, 12 Sep 2009 06:55:09 +0000 (UTC)
	(envelope-from linda.messerschmidt@gmail.com)
Received: from mail-qy0-f200.google.com (mail-qy0-f200.google.com
	[209.85.221.200])
	by mx1.freebsd.org (Postfix) with ESMTP id D7AC58FC0A;
	Sat, 12 Sep 2009 06:55:08 +0000 (UTC)
Received: by qyk38 with SMTP id 38so1442943qyk.27
	for <multiple recipients>; Fri, 11 Sep 2009 23:55:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:in-reply-to:references
	:date:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=2+ZgbRPsxYeTS0xEmoVSWlqq+uFDF/XB4YoF8preiS0=;
	b=r+W7DlJAgpX4QS+7SJT+3m/l1Hfz6kL9wxw5ulRnwBGhfubYP9O256v2jE0btLYv2t
	xBDR6GIb2B063eXLI1hcPaqItkqARGp+AImXJ9x0FRqQTwfGbyfIi4oTX9j08A0VIjTM
	rkpgod/ESvrXjuQHRUabswJbN26eUmm/0VxUQ=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	b=CMq6b8nfMjN9eOYHAc0ChgP6WesfQjQOOGUNkNl1VOYkbdgcrhNIp/zC2mcAPk/r3Q
	Knpeyh/v+Kzd9njgh0x2luf0bTH45GsqX0vVndxUnUy3BzJ1Nux0IQTWMS8qDwoSL5bf
	1iXAjNIYl1P0Cm8FuaKVkZ3nGoYBMwLYoEYmM=
MIME-Version: 1.0
Received: by 10.229.119.69 with SMTP id y5mr1480758qcq.100.1252738508390; Fri, 
	11 Sep 2009 23:55:08 -0700 (PDT)
In-Reply-To: <237c27100909112352k5504357dge725c8f905ee650a@mail.gmail.com>
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>
	<200909111102.14503.jhb@freebsd.org>
	<237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com>
	<200909111506.47309.jhb@freebsd.org>
	<237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com>
	<237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com>
	<4AAB1E34.2060908@elischer.org>
	<237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com>
	<4AAB35E0.3000908@elischer.org>
	<237c27100909112352k5504357dge725c8f905ee650a@mail.gmail.com>
Date: Sat, 12 Sep 2009 02:55:08 -0400
Message-ID: <237c27100909112355xbf1354djfe0b562195546bca@mail.gmail.com>
From: Linda Messerschmidt <linda.messerschmidt@gmail.com>
To: Julian Elischer <julian@elischer.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-hackers@freebsd.org
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 12 Sep 2009 06:55:09 -0000

On Sat, Sep 12, 2009 at 2:52 AM, Linda Messerschmidt
<linda.messerschmidt@gmail.com> wrote:
> On Sat, Sep 12, 2009 at 1:47 AM, Julian Elischer <julian@elischer.org> wrote:
>> ok now we need to describe the hang..  if you can predictably get a hang
>> every 7 seconds does this mean that it doesn't respond to keyboard for a
>> moment every 7 seconds?
>
> It's possible.

Oops, I meant to explain that my ISP connection and personal sense of
time are probably not good enough to say one way or the other for
sure.  I do see stalls, but I can't say whether they are the same
stall or just a dropped packet somewhere along the way.

From owner-freebsd-hackers@FreeBSD.ORG  Sat Sep 12 07:52:23 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E81FA1065676;
	Sat, 12 Sep 2009 07:52:23 +0000 (UTC)
	(envelope-from linda.messerschmidt@gmail.com)
Received: from mail-qy0-f195.google.com (mail-qy0-f195.google.com
	[209.85.221.195])
	by mx1.freebsd.org (Postfix) with ESMTP id 88D448FC12;
	Sat, 12 Sep 2009 07:52:23 +0000 (UTC)
Received: by qyk33 with SMTP id 33so252696qyk.14
	for <multiple recipients>; Sat, 12 Sep 2009 00:52:22 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:in-reply-to:references
	:date:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=zMOGUudmOKhc8QWBvSLLYus+/xijt/SOWjTny66tLDY=;
	b=qnV0WdMjnL6DIUf1GAaNEdGuPnhXNX8sF/+K2q2U25pBs8TSWiUGlZ8ibjNSnEjb2L
	Rkyz+EbzNn4FOWdmCU4mh/M/Xyc50yvTnb80qgObAJT5DDqZrMYJEoqBjRI0QMTeijaT
	qvneSQRMPnoHxjW23gNwkvVfhZPJDqQBUZ+YI=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	b=sVWbpmSmsmdN9GKph/qP7/ZKJ7pl0UANg8yEuocBHDEsPuf3kBF4XgRFXY1OEbk7DR
	jeSXf7cSeFZo835MLK3qDvTs+SSt7gnHrA0+w3JVTAdzR63HAmkS/GRv8REwVE9hRYFP
	X6z+3EqWSz5nj6wsliVDgGqkDPTBQN9ePGYP8=
MIME-Version: 1.0
Received: by 10.229.9.147 with SMTP id l19mr1536685qcl.65.1252741942766; Sat, 
	12 Sep 2009 00:52:22 -0700 (PDT)
In-Reply-To: <237c27100909112352k5504357dge725c8f905ee650a@mail.gmail.com>
References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com>
	<200909111102.14503.jhb@freebsd.org>
	<237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com>
	<200909111506.47309.jhb@freebsd.org>
	<237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com>
	<237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com>
	<4AAB1E34.2060908@elischer.org>
	<237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com>
	<4AAB35E0.3000908@elischer.org>
	<237c27100909112352k5504357dge725c8f905ee650a@mail.gmail.com>
Date: Sat, 12 Sep 2009 03:52:22 -0400
Message-ID: <237c27100909120052k1db7e029xcf36e075865d29d8@mail.gmail.com>
From: Linda Messerschmidt <linda.messerschmidt@gmail.com>
To: Julian Elischer <julian@elischer.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-hackers@freebsd.org
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 12 Sep 2009 07:52:24 -0000

OK, first, I figured out the seven second thing.  I actually had
already found that particular issue earlier in the troubleshooting
process, but forgot all about it when I pulled in a second machine to
test with.  It was simply a case of setting Apache's
MaxRequestsPerChild to a very low value (128) in combination with only
allowing 1 access at a time.  128 requests * (50ms sleep + 2ms request
+ overhead) ~= 7s.  So that was just noise masking the real problem,
which is less frequent and less predictable.  Sorry for the red
herring. :(
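
The arithmetic can be checked against the log posted earlier in the
thread.  A small sketch (Python, helper names invented here; the
overhead figure is not stated anywhere, so it defaults to zero and is
purely an assumption) that computes the expected recycle period and the
gaps between consecutive slow requests:

```python
# First batch of slow requests, copied from the earlier message.
LOG = """\
1252729876: request 82 131ms
1252729883: request 210 388ms
1252729890: request 338 380ms
1252729897: request 466 388ms
1252729904: request 594 404ms
1252729919: request 849 810ms
1252729926: request 977 386ms
1252729933: request 1105 370ms
1252729940: request 1233 366ms
1252729947: request 1361 400ms
1252729961: request 1617 746ms
1252729968: request 1744 477ms
1252729975: request 1872 388ms
1252729982: request 2000 380ms
1252729989: request 2128 384ms
1252729996: request 2256 395ms
"""


def period_s(max_requests, sleep_ms, request_ms, overhead_ms=0.0):
    """Seconds for one child to serve max_requests sequential requests."""
    return max_requests * (sleep_ms + request_ms + overhead_ms) / 1000.0


def parse(log):
    """Turn 'ts: request N Xms' lines into (timestamp, request_no, ms)."""
    rows = []
    for line in log.splitlines():
        ts, _, num, ms = line.split()
        rows.append((int(ts.rstrip(":")), int(num), int(ms[:-2])))
    return rows


def deltas(rows):
    """(seconds, requests) elapsed between consecutive slow requests."""
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(rows, rows[1:])]


print(period_s(128, 50, 2))     # lower bound, before any overhead
for gap in deltas(parse(LOG)):
    print(gap)
```

Most gaps come out as 128 requests and 7 seconds, and the "missed beat"
entries are roughly double that, which fits a child being recycled
every MaxRequestsPerChild requests.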

On Sat, Sep 12, 2009 at 2:52 AM, Linda Messerschmidt
<linda.messerschmidt@gmail.com> wrote:
> If you're asking could the check script be modified to time out after,
> say, 1 second, and if so, would it return during the hang or after it?
>  I don't know.  My guess based on the earlier ktrace output is that it
> would time out, but not return until the hang ended.  I'll see if
> the curl lib exposes a configurable timeout and try it.

This proved to be quite easy to do.  I ran the script twice, once with
the timeout and once without.

Without timeout:
1252741492: request 910 101ms
1252741567: request 2133 1429ms
1252741603: request 2722 146ms

With 1s timeout:
1252741492: request 1078 106ms
1252741567: request 2302 1010ms (<--- Timeout)
1252741567: request 2303 273ms   (<--- after 50ms sleep, goes back to end of stall)
1252741603: request 2892 136ms

As you can see, the two scripts experience stalls in pretty much
lockstep, but the script itself does not appear affected, so it's just
on the Apache side.
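
For reference, the checking loop is roughly the following.  The real
script is PHP using libcurl and is not reproduced here; this is only a
Python sketch of the same idea, with invented names (probe, http_fetch)
and a placeholder URL in the usage comment.

```python
import time
import urllib.request


def http_fetch(url, timeout_s=1.0):
    """Fetch url once; raises an OSError subclass on timeout or failure."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        resp.read()


def probe(fetch, count, sleep_s=0.050, slow_ms=100):
    """Call fetch() count times, sleeping sleep_s between calls.

    Returns (unix_time, request_no, elapsed_ms, failed) for every
    request that failed (for example, timed out) or took longer than
    slow_ms.
    """
    slow = []
    for n in range(1, count + 1):
        start = time.monotonic()
        failed = False
        try:
            fetch()
        except OSError:
            # Timeouts and connection errors are reported, not fatal.
            failed = True
        elapsed_ms = int((time.monotonic() - start) * 1000)
        if failed or elapsed_ms > slow_ms:
            slow.append((int(time.time()), n, elapsed_ms, failed))
        time.sleep(sleep_s)
    return slow

# Hypothetical usage (placeholder URL):
# for ts, n, ms, failed in probe(lambda: http_fetch("http://localhost/f"), 3000):
#     print("%d: request %d %dms%s" % (ts, n, ms, " (timeout)" if failed else ""))
```

If the client were stalled along with the server, a request with a 1s
timeout would be expected to come back noticeably later than 1s instead
of at about 1010ms as above.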

From owner-freebsd-hackers@FreeBSD.ORG  Sat Sep 12 13:54:00 2009
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EC0A3106568D
	for <hackers@freebsd.org>; Sat, 12 Sep 2009 13:54:00 +0000 (UTC)
	(envelope-from des@des.no)
Received: from tim.des.no (tim.des.no [194.63.250.121])
	by mx1.freebsd.org (Postfix) with ESMTP id B277B8FC0C
	for <hackers@freebsd.org>; Sat, 12 Sep 2009 13:54:00 +0000 (UTC)
Received: from ds4.des.no (des.no [84.49.246.2])
	by smtp.des.no (Postfix) with ESMTP id BFC2E6D418
	for <hackers@freebsd.org>; Sat, 12 Sep 2009 13:53:59 +0000 (UTC)
Received: by ds4.des.no (Postfix, from userid 1001)
	id 96406844A5; Sat, 12 Sep 2009 15:53:59 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: hackers@freebsd.org
Date: Sat, 12 Sep 2009 15:53:59 +0200
Message-ID: <86fxasl154.fsf@ds4.des.no>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.92 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: 
Subject: DDB capture buffer
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 12 Sep 2009 13:54:01 -0000

The default maximum size of the DDB capture buffer is 48 kB.  This is
ridiculously low; it's not even nearly enough to capture the output from
the first example in textdump(4):

    script kdb.enter.panic=textdump set; capture on; show allpcpu; bt;
         ps; alltrace; show alllocks; call doadump; reset

Would anyone object to increasing it to 1 MB?  DDB is opt-in, so it will
only affect people who compile it into their kernel (or -CURRENT users
who don't compile it out; they have it coming).

DES
-- 
Dag-Erling Smørgrav - des@des.no