From: Kamigishi Rei
Date: Tue, 21 Jul 2009 14:59:36 +0400
To: freebsd-current@FreeBSD.org
Cc: Lawrence Stewart
Subject: [follow-up] Fatal trap 12 in r195146+ in netisr_queue_internal

Hello, hope you're having a good day.

I've been researching the issue I mentioned in my last message in the "r194546 amd64: kernel panic in tcp_sack.c" thread since July 07, and here are some of the findings.

The fatal trap triggers inside mtx_lock_sleep() while dereferencing a pointer (owner, which points to the struct thread at m->mtx_lock & ~MTX_FLAGMASK).
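For readers who have not looked at the lock word before: the owner pointer and the flag bits share a single word, so the owner is recovered by masking off the low bits. Below is a minimal userspace sketch of that decoding, not the kernel code itself. The flag values are an assumption based on sys/mutex.h of that time (only MTX_UNOWNED == 4 is confirmed by this report), and struct thread and both helper functions are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag values; only MTX_UNOWNED == 4 is confirmed by the report. */
#define MTX_RECURSED	((uintptr_t)0x1)
#define MTX_CONTESTED	((uintptr_t)0x2)
#define MTX_UNOWNED	((uintptr_t)0x4)
#define MTX_FLAGMASK	(MTX_RECURSED | MTX_CONTESTED | MTX_UNOWNED)

/* Hypothetical stand-in for the kernel's struct thread. */
struct thread {
	int td_id;
};

/* True when the lock word says that no thread holds the mutex. */
static bool
lock_word_is_unowned(uintptr_t v)
{
	return (v == MTX_UNOWNED);
}

/*
 * Strip the flag bits from a lock word to obtain the owner pointer.
 * Returns NULL for an unowned lock instead of a pointer built from flags.
 */
static struct thread *
lock_word_owner(uintptr_t v)
{
	if (lock_word_is_unowned(v))
		return (NULL);
	return ((struct thread *)(v & ~MTX_FLAGMASK));
}
```

With these definitions, a word of 4 decodes as unowned, while an aligned address with MTX_CONTESTED set decodes back to that address. If the unowned check is skipped for a word of 4, the mask leaves 0, which is not a valid thread pointer.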
The relevant code in mtx_lock_sleep() goes like this (shortened):

    v = m->mtx_lock;
    if (v == MTX_UNOWNED) {
            turnstile_cancel(ts);
            continue;
    }
    owner = (struct thread *)(v & ~MTX_FLAGMASK);
    if (TD_IS_RUNNING(owner)) {
            turnstile_cancel(ts);
            continue;
    }

Everything goes fine until, usually under heavy load on an interface, we reach a point where:

1. m->mtx_lock is 4 (== MTX_UNOWNED).
2. v is assigned mtx_lock's value (4 == MTX_UNOWNED).
3. The condition (v == MTX_UNOWNED) fails.
4. owner is assigned an address derived from v.
5. The dereference fails because v holds a bogus value that is not inside the kernel address space.

The only affected variable is v. I've added temporary variables around it (i.e. uintptr_t foo1, v, foo2;) and those variables are not altered, even though v has moved 64 bits further inside the stack.

That point is not the only place where the variable is altered: by adding debugging lines throughout the code I've seen multiple cases of v and mtx_lock changing during the execution of mtx_lock_sleep(). Moreover, my own test variables were changing inside it. I had the following structure for tests:

1. At the start of the function, foo1 = 0.
2. Before lock_profile_obtain_lock_failed, foo1 = 1.
3. After lock_profile_obtain_lock_failed, foo1 = 2.
4. Before the (v == MTX_UNOWNED) conditional, foo1 = 3.

During tests, foo1 changed values inside this range (0..3) several times; during heavy lo0/em0 local traffic load, these conditionals failed multiple (up to 100) times in 2-5 seconds. v gets changed like that as well, but in 99.99% of cases it gets assigned a value that references the kernel memory area, so the dereference works.

Is this behaviour (variables changing their value inside a single function call) correct?

--
Kamigishi Rei
KREI-RIPE