Date: Sat, 28 Oct 2006 14:15:25 -0500
From: "Christian S.J. Peron" <csjp@freebsd.org>
To: stable@freebsd.org
Cc: kris@freebsd.org, Pawel Jakub Dawidek <pjd@FreeBSD.org>, uspoerlein@gmail.com
Subject: Re: RELENG_6: I/O deadlock under load
Message-ID: <4543AC4D.3090308@freebsd.org>
In-Reply-To: <4543AA79.4050903@freebsd.org>
References: <7ad7ddd90610262337q25afcf0ej7610d0e1b4ff202d@mail.gmail.com>
 <20061028175240.GB1519@roadrunner.q.local>
 <4543AA79.4050903@freebsd.org>

Sorry, I forgot to include the chunk of code from the gmirror worker
thread which made me suspect this could be the problem:

[..]
	/* Get first request from the queue. */
	mtx_lock(&sc->sc_queue_mtx);
	bp = bioq_first(&sc->sc_queue);
	if (bp == NULL) {
		if ((sc->sc_flags & G_MIRROR_DEVICE_FLAG_DESTROY) != 0) {
			mtx_unlock(&sc->sc_queue_mtx);
			if (g_mirror_try_destroy(sc)) {
				curthread->td_pflags &= ~TDP_GEOM;
				G_MIRROR_DEBUG(1, "Thread exiting.");
				kthread_exit(0);
			}
			mtx_lock(&sc->sc_queue_mtx);
		}
		sx_xunlock(&sc->sc_lock);
		/*
		 * XXX: We can miss an event here, because an event
		 *      can be added without sx-device-lock and without
		 *      mtx-queue-lock. Maybe I should just stop using
		 *      dedicated mutex for events synchronization and
		 *      stick with the queue lock?
		 *      The event will hang here until next I/O request
		 *      or next event is received.
		 */
		MSLEEP(sc, &sc->sc_queue_mtx, PRIBIO | PDROP, "m:w1",
		    timeout * hz);
		sx_xlock(&sc->sc_lock);
		G_MIRROR_DEBUG(5, "%s: I'm here 4.", __func__);
		continue;
	}
	bioq_remove(&sc->sc_queue, bp);
	mtx_unlock(&sc->sc_queue_mtx);

Christian S.J. Peron wrote:
>
> It almost looks as if a user frequently runs gmirror(8) to query the
> status of their array. Under a high load situation, the worker is
> busy, so at one unlucky moment, gmirror(8) is run:
>
> (1) gmirror(8) waits for sc->sc_lock owned by the worker
> (2) The worker then drops the lock
> (3) gmirror(8) proceeds
> (4) Worker wakes up and waits for sc->sc_lock
> (5) Only gmirror(8) never will release it, because it's waiting on a
>     resource (presumably owned by the worker thread)?
>
> I am not certain this is correct, so I have included pjd in the CC
> loop, hoping he can help shed some light on the subject :)