Date:      Mon, 15 Feb 2016 20:49:56 +0300
From:      Slawa Olhovchenkov <slw@zxy.spb.ru>
To:        Giuseppe Lettieri <g.lettieri@iet.unipi.it>
Cc:        Luigi Rizzo <rizzo@iet.unipi.it>, Adrian Chadd <adrian.chadd@gmail.com>, "stable@freebsd.org" <stable@freebsd.org>
Subject:   Re: 82576 + NETMAP + VLAN
Message-ID:  <20160215174956.GD68298@zxy.spb.ru>
In-Reply-To: <56C1F69C.5010004@iet.unipi.it>
References:  <CA%2BhQ2%2BiD3X9wR8exw2p-9G8pPNHCQtLdMdJJXU78PDrQaWBH7w@mail.gmail.com> <56B9E398.1060105@iet.unipi.it> <20160210115937.GA37895@zxy.spb.ru> <56BB3C20.600@iet.unipi.it> <20160210135318.GL68298@zxy.spb.ru> <56BC505F.7080309@iet.unipi.it> <20160211133428.GM68298@zxy.spb.ru> <56C1EA66.807@iet.unipi.it> <20160215151318.GQ68298@zxy.spb.ru> <56C1F69C.5010004@iet.unipi.it>

On Mon, Feb 15, 2016 at 05:02:36PM +0100, Giuseppe Lettieri wrote:

> On 15/02/2016 16:13, Slawa Olhovchenkov wrote:
> > On Mon, Feb 15, 2016 at 04:10:30PM +0100, Giuseppe Lettieri wrote:
> >
> >> Hi Slawa,
> >>
> >> I think WITNESS is seeing a false positive, since those two are always
> >> different mutexes.
> >>
> >> The actual deadlock you experience should be caused by something else. I
> >
> > Are you sure? When the deadlock occurs, I see threads waiting on nm_kn_lock.
> 
> The deadlock I mentioned still involves nm_kn_locks; sorry if I was not 
> clear about that. I am just saying that we never try to take the same 
> lock that we are already holding.
> 
> Nonetheless, there are indeed problems in the path that WITNESS has 
> seen. The problem is that pipes have to notify the other end while 
> called by kevent. kevent holds the nm_kn_lock on the TX src ring and the 
> notification takes the nm_kn_lock on the RX dst ring.

Thanks for the clarification.

> >
> >> have not been able to reproduce it locally (I have not tried that hard,
> >> to be honest). I am pretty sure that there is a lock inversion - one
> >> that may cause real deadlocks - when you use netmap pipes+kqueue and you
> >> don't pass NETMAP_NO_TX_POLL at NIOCREGIF time. The attached patch
> >> should solve this particular problem, but there may be others. Could you
> >> please try it?
> >
> > Should I try it with or without WITNESS?
> 
> I am trying to see if the actual deadlock disappears, so disable WITNESS 
> if it slows down the system and masks the real deadlock. Otherwise, 
> leave it on.

OK. With and without WITNESS, I currently don't see the deadlock.

Just for the record, here are two LORs, which may already be well known:

lock order reversal:
 1st 0xfffffe0172c6fa78 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:3130
 2nd 0xfffff8005ca81000 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:280
KDB: stack backtrace:
#0 0xffffffff809702b0 at kdb_backtrace+0x60
#1 0xffffffff8098825e at witness_checkorder+0xc7e
#2 0xffffffff8093e137 at _sx_xlock+0x47
#3 0xffffffff80b75d6a at ufsdirhash_add+0x3a
#4 0xffffffff80b78b40 at ufs_direnter+0x6a0
#5 0xffffffff80b815ab at ufs_makeinode+0x56b
#6 0xffffffff80b7d5dd at ufs_create+0x2d
#7 0xffffffff80e33311 at VOP_CREATE_APV+0xa1
#8 0xffffffff809e2009 at vn_open_cred+0x3b9
#9 0xffffffff809db30f at kern_openat+0x26f
#10 0xffffffff80d0e8a4 at amd64_syscall+0x2d4
#11 0xffffffff80cf4f5b at Xfast_syscall+0xfb
lock order reversal:
 1st 0xfffff80049138d50 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2415
 2nd 0xfffffe0172cb1b80 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_vnops.c:262
 3rd 0xfffff800a6832d50 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2415
KDB: stack backtrace:
#0 0xffffffff809702b0 at kdb_backtrace+0x60
#1 0xffffffff8098825e at witness_checkorder+0xc7e
#2 0xffffffff80918dd8 at __lockmgr_args+0x738
#3 0xffffffff80b71594 at ffs_lock+0x84
#4 0xffffffff80e3512b at VOP_LOCK1_APV+0xab
#5 0xffffffff809e28f3 at _vn_lock+0x43
#6 0xffffffff809d42fb at vget+0x5b
#7 0xffffffff809c8c51 at vfs_hash_get+0xe1
#8 0xffffffff80b6d0a0 at ffs_vgetf+0x40
#9 0xffffffff80b64c50 at softdep_sync_buf+0x300
#10 0xffffffff80b72296 at ffs_syncvnode+0x226
#11 0xffffffff80b4b6b3 at ffs_truncate+0x683
#12 0xffffffff80b78c99 at ufs_direnter+0x7f9
#13 0xffffffff80b808eb at ufs_mkdir+0x86b
#14 0xffffffff80e34987 at VOP_MKDIR_APV+0xa7
#15 0xffffffff809dfca9 at kern_mkdirat+0x209
#16 0xffffffff80d0e8a4 at amd64_syscall+0x2d4
#17 0xffffffff80cf4f5b at Xfast_syscall+0xfb


> >
> >> Cheers,
> >> Giuseppe
> >>
> >> On 11/02/2016 14:34, Slawa Olhovchenkov wrote:
> >>> On Thu, Feb 11, 2016 at 10:11:59AM +0100, Giuseppe Lettieri wrote:
> >>>
> >>>> On 10/02/2016 14:53, Slawa Olhovchenkov wrote:
> >>>>> On Wed, Feb 10, 2016 at 02:33:20PM +0100, Giuseppe Lettieri wrote:
> >>>>>
> >>>>>> On 10/02/2016 12:59, Slawa Olhovchenkov wrote:
> >>>>>>> Can you also look at the second issue?
> >>>>>>>
> >>>>>>> PS: What do you need from me? Should I open a PR?
> >>>>>>
> >>>>>> Could you provide some example code that triggers the issue?
> >>>>>
> >>>>> It is about 700 lines of code (not very clear); maybe I can describe it instead?
> >>>>
> >>>> I just need some code to trigger the problem locally. Don't worry about
> >>>> the clarity and the line count, unless you cannot share the code for
> >>>> other reasons.
> >>>
> >>> I have attached the source.
> >>> Run it as "prog if1 if2".
> >>> I get `acquiring duplicate lock of same type: "nm_kn_lock"` immediately
> >>> after start.
> >>> The deadlock may occur immediately after start, or it may need
> >>> traffic flooding.
> >>>
> >>
> >>
> >> --
> >> Dr. Ing. Giuseppe Lettieri
> >> Dipartimento di Ingegneria della Informazione
> >> Universita' di Pisa
> >> Largo Lucio Lazzarino 1, 56122 Pisa - Italy
> >> Ph. : (+39) 050-2217.649 (direct) .599 (switch)
> >> Fax : (+39) 050-2217.600
> >> e-mail: g.lettieri@iet.unipi.it
> >
> >> Index: dev/netmap/netmap.c
> >> ===================================================================
> >> --- dev/netmap/netmap.c	(revision 287671)
> >> +++ dev/netmap/netmap.c	(working copy)
> >> @@ -2378,7 +2378,7 @@
> >>   	 * XXX should also check cur != hwcur on the tx rings.
> >>   	 * Fortunately, normal tx mode has np_txpoll set.
> >>   	 */
> >> -	if (priv->np_txpoll || want_tx) {
> >> +	if ((priv->np_txpoll && !is_kevent) || want_tx) {
> >>   		/*
> >>   		 * The first round checks if anyone is ready, if not
> >>   		 * do a selrecord and another round to handle races.
> >
> 
> 



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160215174956.GD68298>