Date:      Mon, 23 Jun 2008 14:51:52 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        James Gritton <jamie@gritton.org>
Cc:        freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: FreeBSD 6.3 deadlock (vm_map?) with DDB output
Message-ID:  <200806231451.52340.jhb@freebsd.org>
In-Reply-To: <485A81FF.1000000@gritton.org>
References:  <20080615112318.146C1F18512@mx.npubs.com> <200806180917.05689.jhb@freebsd.org> <485A81FF.1000000@gritton.org>

On Thursday 19 June 2008 11:57:51 am James Gritton wrote:
> John Baldwin wrote:
> > On Sunday 15 June 2008 07:23:19 am Stef Walter wrote:
> >   
> >> I've been trying to track down a deadlock on some newish production
> >> servers running FreeBSD 6.3-RELEASE-p2. The deadlock occurs on a
> >> specific (although mundane) hardware configuration, and each of several
> >> servers running this hardware deadlocks about once per week.
> >>
> >> I suspect that this is not hardware related, based on a (naive)
> >> perusal of the attached stack traces.
> >>
> >> Forgive me if my interpretation of this is all wrong, but I'm pretty
> >> desperate for help. So here's my basic understanding of the deadlock:
> >>
> >> These processes seem to be waiting on the page queue mutex:
> >>  sendmail (in vm_mmap > vm_map_find > vm_map_insert > vm_map_pmap_enter)
> >>  bsnmpd (in malloc, uma_large_malloc > page_alloc > kmem_malloc)
> >>  httpd (in trap > trap_pfault > vm_fault)
> >>  [g_up] (in g_vfs_done > bufdone)
> >>
> >> The page queue mutex is held by rsync process:
> >>  rsync (in trap > trap_pfault > vm_fault > pmap_enter)
> >>
> >> Rsync kernel process (in pmap_enter) was interrupted while holding the
> >> page queue lock?
> >>
> >>
> >> Giant is enabled in loader.conf due to the needs of the pf firewall when
> >> dealing with user credentials lookups. I do not believe that Giant plays
> >> into this deadlock. Kernel config attached.
> >>
> >> Any and all help or info is welcome. Thanks in advance.
> >>     
> >
> > Try this change:
> >
> > jhb         2007-10-27 22:07:40 UTC
> >
> >   FreeBSD src repository
> >
> >   Modified files:
> >     sys/kern             sched_4bsd.c
> >   Log:
> >   Change the roundrobin implementation in the 4BSD scheduler to trigger a
> >   userland preemption directly from hardclock() via sched_clock() when a
> >   thread uses up a full quantum instead of using a periodic timeout to cause
> >   a userland preemption every so often.  This fixes a potential deadlock
> >   when IPI_PREEMPTION isn't enabled where softclock blocks on a lock held
> >   by a thread pinned or bound to another CPU.  The current thread on that
> >   CPU will never be preempted while softclock is blocked.
> >
> >   Note that ULE already drives its round-robin userland preemption from
> >   sched_clock() as well and always enables IPI_PREEMPT.
> >
> >   MFC after:      1 week
> >
> >   Revision  Changes    Path
> >   1.108     +8 -29     src/sys/kern/sched_4bsd.c
> >
> > We use it at work on 6.x.  W/o this fix, round-robin stops working on 4BSD 
> > when softclock() (swi4: clock) blocks on a lock like Giant.
> >   
> 
> I've been seeing similar troubles on 6.2 and I'll have to give this a 
> try as we upgrade to 6.3.  I notice "MFC after: 1 week" in the log; it's 
> been a week - any chance of seeing this fix rolled into 6.x?

If people confirm it fixes issues I will MFC it.  There was some pushback when 
I first committed it so I waited on the MFC.
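
For readers unfamiliar with the idea in the commit log, here is a minimal
sketch of tick-driven round-robin: the per-tick clock handler counts down the
running thread's quantum and flags it for preemption itself, so nothing
depends on a softclock timeout that could be stuck behind a lock.  This is
not the code from sched_4bsd.c; struct toy_thread, toy_quantum_reset(),
toy_sched_clock() and the 10-tick quantum are all hypothetical names and
values chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread state; not the kernel's struct thread. */
struct toy_thread {
    int  ticks_left;    /* clock ticks remaining in the current quantum */
    bool need_resched;  /* set when the thread should yield to a peer */
};

/* Assumed quantum length in clock ticks (illustrative value only). */
enum { TOY_QUANTUM_TICKS = 10 };

/* Give a thread a fresh quantum, e.g. when it is switched in. */
static void toy_quantum_reset(struct toy_thread *td)
{
    td->ticks_left = TOY_QUANTUM_TICKS;
    td->need_resched = false;
}

/*
 * Called once per clock tick for the thread running on this CPU.
 * The decision to preempt is made here, in the tick path, rather than
 * in a separately scheduled timeout handler.
 * Returns true if this tick used up the quantum.
 */
static bool toy_sched_clock(struct toy_thread *td)
{
    assert(td->ticks_left > 0);
    if (--td->ticks_left > 0)
        return false;               /* quantum not yet exhausted */
    td->need_resched = true;        /* request a userland preemption */
    td->ticks_left = TOY_QUANTUM_TICKS;
    return true;
}
```

With these assumed values, the tenth consecutive call to toy_sched_clock()
for the same thread sets need_resched; a real scheduler would act on such a
flag on the return to user mode.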

-- 
John Baldwin


