From: "Christian S.J. Peron"
Date: Sat, 28 Oct 2006 14:15:25 -0500
To: stable@freebsd.org
Cc: kris@freebsd.org, Pawel Jakub Dawidek, uspoerlein@gmail.com
Subject: Re: RELENG_6: I/O deadlock under load
Message-ID: <4543AC4D.3090308@freebsd.org>
In-Reply-To: <4543AA79.4050903@freebsd.org>

Sorry, I forgot to include the chunk of code from the gmirror worker
thread which made me suspect this could be the problem:

[..]
	/* Get first request from the queue. */
	mtx_lock(&sc->sc_queue_mtx);
	bp = bioq_first(&sc->sc_queue);
	if (bp == NULL) {
		if ((sc->sc_flags &
		    G_MIRROR_DEVICE_FLAG_DESTROY) != 0) {
			mtx_unlock(&sc->sc_queue_mtx);
			if (g_mirror_try_destroy(sc)) {
				curthread->td_pflags &= ~TDP_GEOM;
				G_MIRROR_DEBUG(1, "Thread exiting.");
				kthread_exit(0);
			}
			mtx_lock(&sc->sc_queue_mtx);
		}
		sx_xunlock(&sc->sc_lock);
		/*
		 * XXX: We can miss an event here, because an event
		 *      can be added without sx-device-lock and without
		 *      mtx-queue-lock. Maybe I should just stop using
		 *      dedicated mutex for events synchronization and
		 *      stick with the queue lock?
		 *      The event will hang here until next I/O request
		 *      or next event is received.
		 */
		MSLEEP(sc, &sc->sc_queue_mtx, PRIBIO | PDROP, "m:w1",
		    timeout * hz);
		sx_xlock(&sc->sc_lock);
		G_MIRROR_DEBUG(5, "%s: I'm here 4.", __func__);
		continue;
	}
	bioq_remove(&sc->sc_queue, bp);
	mtx_unlock(&sc->sc_queue_mtx);

Christian S.J. Peron wrote:
>
> It almost looks as if a user frequently runs gmirror(8) to query the
> status of their array. Under a high load situation, the worker is
> busy, so at one unlucky moment, gmirror(8) is run:
>
> (1) gmirror(8) waits for sc->sc_lock owned by the worker
> (2) The worker then drops the lock
> (3) gmirror(8) proceeds
> (4) Worker wakes up and waits for sc->sc_lock
> (5) Only gmirror(8) never releases it, because it's waiting on a
>     resource (presumably owned by the worker thread)?
>
> I am not certain this is correct, so I have included pjd in the CC
> loop, hoping he can help shed some light on the subject :)
>
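
To make the "we can miss an event" remark in the worker's XXX comment more concrete, here is a minimal userland sketch using POSIX threads. It is not the gmirror code and not a proposed patch. The names demo_softc, demo_init, demo_post and demo_wait are all hypothetical, and a pthread mutex and condition variable stand in for sc_queue_mtx and the msleep channel. It only illustrates the idea the comment itself floats: if whoever posts work takes the same mutex the worker sleeps on, and the worker re-checks for work under that mutex before sleeping, an event that arrives just before the sleep is not lost.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the softc; not the real gmirror structure. */
struct demo_softc {
	pthread_mutex_t	queue_mtx;	/* plays the role of sc_queue_mtx */
	pthread_cond_t	queue_cv;	/* plays the role of the sleep channel */
	int		pending;	/* queued requests plus events */
};

void
demo_init(struct demo_softc *sc)
{
	int error;

	error = pthread_mutex_init(&sc->queue_mtx, NULL);
	assert(error == 0);
	error = pthread_cond_init(&sc->queue_cv, NULL);
	assert(error == 0);
	sc->pending = 0;
}

/*
 * Post one unit of work.  The counter is changed and the wakeup is sent
 * while holding the mutex the worker sleeps on, so the worker either
 * sees pending != 0 before it sleeps or is woken after it sleeps.
 */
void
demo_post(struct demo_softc *sc)
{
	pthread_mutex_lock(&sc->queue_mtx);
	sc->pending++;
	pthread_cond_signal(&sc->queue_cv);
	pthread_mutex_unlock(&sc->queue_mtx);
}

/*
 * Wait up to timeout_sec seconds for work.  Returns true if one unit
 * was taken, false if the wait timed out with nothing pending.
 */
bool
demo_wait(struct demo_softc *sc, int timeout_sec)
{
	struct timespec deadline;
	bool got = false;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_sec;

	pthread_mutex_lock(&sc->queue_mtx);
	/* Check under the mutex before sleeping, and again after waking. */
	while (sc->pending == 0) {
		if (pthread_cond_timedwait(&sc->queue_cv, &sc->queue_mtx,
		    &deadline) != 0)
			break;		/* timed out or failed */
	}
	if (sc->pending > 0) {
		sc->pending--;
		got = true;
	}
	pthread_mutex_unlock(&sc->queue_mtx);
	return (got);
}
```

Calling demo_post() and then demo_wait() returns at once with the work, even though the post happened before the worker reached its sleep. With nothing posted, demo_wait() simply times out, much like the "timeout * hz" argument in the excerpt above. Whether the real event path can be changed to take sc_queue_mtx is a question for pjd; this only shows why it would close the window.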