From owner-freebsd-current@FreeBSD.ORG  Tue Jul 21 10:59:44 2009
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 42C9F106566B;
	Tue, 21 Jul 2009 10:59:44 +0000 (UTC)
	(envelope-from spambox@haruhiism.net)
Received: from fujibayashi.jp (karas.fujibayashi.jp [77.221.159.4])
	by mx1.freebsd.org (Postfix) with ESMTP id F15DF8FC13;
	Tue, 21 Jul 2009 10:59:43 +0000 (UTC)
	(envelope-from spambox@haruhiism.net)
Received: from [192.168.0.10] (datacenter.telecombusinessconsulting.net
	[77.221.137.211])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by fujibayashi.jp (Postfix) with ESMTPSA id 8F62678F53;
	Tue, 21 Jul 2009 14:59:41 +0400 (MSD)
Message-ID: <4A659F98.2060007@haruhiism.net>
Date: Tue, 21 Jul 2009 14:59:36 +0400
From: Kamigishi Rei <spambox@haruhiism.net>
User-Agent: Thunderbird 2.0.0.22 (Windows/20090605)
MIME-Version: 1.0
To: freebsd-current@FreeBSD.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Lawrence Stewart <lstewart@freebsd.org>
Subject: [follow-up] Fatal trap 12 in r195146+ in netisr_queue_internal

Hello, hope you're having a good day,

I've been researching the issue I mentioned in my last message in the
"r194546 amd64: kernel panic in tcp_sack.c" thread since July 07, and
here are some of the findings:
The fatal trap triggers inside mtx_lock_sleep() during a dereference of
the owner pointer (a pointer to struct thread, obtained as
m->mtx_lock & ~MTX_FLAGMASK). The code goes like this (shortened):

v = m->mtx_lock;
if (v == MTX_UNOWNED) { turnstile_cancel(ts); continue; }
owner = (struct thread *)(v & ~MTX_FLAGMASK);
if (TD_IS_RUNNING(owner)) { turnstile_cancel(ts); continue; }

Everything goes fine until - under heavy load on an interface, usually - 
we reach a point where:

1. m->mtx_lock is 4 (== MTX_UNOWNED).
2. v is assigned mtx_lock's value (4 == MTX_UNOWNED).
3. condition (v == MTX_UNOWNED) fails.
4. owner is assigned an address from v.
5. the dereference of owner faults, as v holds a bogus value which is
not inside the kernel address space.

The only affected variable is v; I've added temporary variables around
it (i.e. uint64ptr_t foo1, v, foo2;) and those variables are not
altered, even though v has moved 64 bits further inside the stack.

That is not the only point at which the variable is altered; by adding
debugging lines along the code I've seen multiple cases of v and
mtx_lock being changed during the execution of mtx_lock_sleep().
Moreover, my own test variables were changing inside it.
I had the following structure for tests:
1. At the start of the function, foo1 = 0.
2. Before lock_profile_obtain_lock_failed, foo1 = 1.
3. After lock_profile_obtain_lock_failed, foo1 = 2.
4. Before (v == MTX_UNOWNED) conditional, foo1 = 3.
During tests, foo1 changed values inside this range (0..3) several 
times; during heavy lo0/em0 local traffic load, these conditionals 
failed multiple (up to 100) times in 2-5 seconds.

v gets changed like that as well, but in 99.99% of cases it gets
assigned a value that points into the kernel memory area, so the
dereference works.

Is this behaviour (variables changing their value inside a single 
function call) correct?

--
Kamigishi Rei
KREI-RIPE