Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 22 Jun 2005 15:03:53 +0200
From:      Andre Oppermann <andre@freebsd.org>
To:        Gleb Smirnoff <glebius@FreeBSD.org>
Cc:        qingli@FreeBSD.org, sam@FreeBSD.org, Jeremie Le Hen <jeremie@le-hen.org>, freebsd-stable@FreeBSD.org
Subject:   Re: panic in RELENG_5 UMA
Message-ID:  <42B961B9.7A5856B3@freebsd.org>
References:  <20050621070427.GA738@obiwan.tataz.chchile.org> <20050621090701.GB34406@cell.sick.ru> <20050621105154.GA36538@cell.sick.ru>

next in thread | previous in thread | raw e-mail | index | archive | help
Gleb Smirnoff wrote:
> 
> [ cc'ing parties involved in this part of code]
> 
> On Tue, Jun 21, 2005 at 01:07:01PM +0400, Gleb Smirnoff wrote:
> T> On Tue, Jun 21, 2005 at 09:04:27AM +0200, Jeremie Le Hen wrote:
> T> J> #25 0xc05a0a0b in m_freem (mb=0x0) at uma.h:304
> T> J> No locals.
> T> J> #26 0xc05ee0d5 in arpresolve (ifp=0xc1a5b000, rt0=0xc1d44000, m=0xc1be7200,
> T> J>     dst=0xd6d3fa94, desten=0xd6d3fa2c "/??]??????w??")
> T> J>     at ../../../netinet/if_ether.c:442
> T> J>         la = (struct llinfo_arp *) 0xc1a75a00
> T> J>         sdl = (struct sockaddr_dl *) 0xc2128910
> T> J>         error = -1038972656
> T> J>         rt = (struct rtentry *) 0xc1d44000
> T>
> T> IMHO, this looks like a race. The route is not locked, when
> T> its llinfo is edited.
> T>
> T> Probably the mbuf was freed when arp reply arrived and la_hold was send.
> T> Look into in_arpinput() near 736:
> T>
> T>                         (*ifp->if_output)(ifp, la->la_hold, rt_key(rt), rt);
> T>                         la->la_hold = 0;
> T>
> T> Yeah, I have just triggered another panic running 15 instances of this script on
> T> SMP box:
> T>
> T> (
> T> while (true); do
> T>      arp -d 81.19.64.111  >/dev/null 2>&1;
> T>      ping -c 1 -t 1 81.19.64.111 >/dev/null 2>&1;
> T> done
> T> ) &
> T>
> T> But my duplicate free is in fxp_txeof(). This means that output thread has
> T> won the race.
> 
> I suppose that the attached patch closes your race. However, there is still
> race between RTM_DELETE and output path. The above script still drops kernel
> to panic, but the other one. Output path works with already freed llinfo:
> 
> #28 0xc0507000 in m_freem (mb=0x0) at mbuf.h:410
> #29 0xc053fde3 in arpresolve (ifp=0xc2012800, rt0=0xc22fcdec, m=0xc25a8000, dst=0xe720bb28,
>     desten=0xe720bacc "uøbÀ+\001") at /usr/src/sys/netinet/if_ether.c:443
> #30 0xc0538078 in ether_output (ifp=0xc2012800, m=0xc25a8000, dst=0xe720bb28, rt0=0xc22fcdec)
>     at /usr/src/sys/net/if_ethersubr.c:173
> #31 0xc054b5b4 in ip_output (m=0xc25a8000, opt=0xc25a80ac, ro=0xe720bb24, flags=0x20, imo=0x0, inp=0xc25eb5a0)
>     at /usr/src/sys/netinet/ip_output.c:772
> #32 0xc054d36b in rip_output (m=0xc25a8000, so=0x0, dst=0x0) at /usr/src/sys/netinet/raw_ip.c:320
> #33 0xc054de7b in rip_send (so=0xc248c914, flags=0x0, m=0xc25a8000, nam=0xc218d410, control=0x0, td=0xc224d7d0)
>     at /usr/src/sys/netinet/raw_ip.c:785
> #34 0xc050a30f in sosend (so=0xc248c914, addr=0xc218d410, uio=0xe720bc3c, top=0xc25a8000, control=0x0, flags=0x0,
>     td=0xc224d7d0) at /usr/src/sys/kern/uipc_socket.c:827
> 
> (kgdb) frame 29
> #29 0xc053fde3 in arpresolve (ifp=0xc2012800, rt0=0xc22fcdec, m=0xc25a8000, dst=0xe720bb28,
>     desten=0xe720bacc "uøbÀ+\001") at /usr/src/sys/netinet/if_ether.c:443
> 443                     m_freem(la->la_hold);
> (kgdb) p *la
> $3 = {
>   la_le = {
>     le_next = 0xdeadc0de,
>     le_prev = 0xdeadc0de
>   },
>   la_rt = 0xdeadc0de,
>   la_hold = 0xdeadc0de,
>   la_preempt = 0xc0de,
>   la_asked = 0xdead
> }
> 
> Fixing this one is harder. We take la from unlocked rtentry obtained via
> rt_check(), or from arplookup(). The latter drops lock on rtentry, too.
> Then we do some work and use this la. It may have already been freed in
> arp_rtrequest(), the RTM_DELETE case.
> 
> I see two approaches here:
> 
> 1) Protecting llinfo with route lock. In this case we need rt_check()
> to return locked *rt (just reference won't help). We also need
> arplookup() to return locked rt. And do not unlock it withing all
> arpresolve() and a big part of in_arpinput() functions.

I think for 5-stable this is the way to go.

> 2) Add mutex to llinfo_arp. I'm afraid this will hurt performance.

The new ARP stuff should fix these issues, however it is not ready yet.
At the moment it looks like it wont make it right away into 6.0 but go
into 7-current and then MFC'd back for 6.1R.

-- 
Andre



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?42B961B9.7A5856B3>