Date:      Mon, 7 Nov 2011 23:25:34 +0000 (UTC)
From:      "Bjoern A. Zeeb" <bz@FreeBSD.ORG>
To:        Maxim Sobolev <sobomax@sippysoft.com>
Cc:        freebsd-net@freebsd.org, Robert Watson <rwatson@FreeBSD.ORG>, Jack Vogel <jfvogel@gmail.com>
Subject:   Re: Panic in the udp_input() under heavy load
Message-ID:  <alpine.BSF.2.00.1111072324340.4603@ai.fobar.qr>
In-Reply-To: <4EB86866.9060102@sippysoft.com>
References:  <4EB804D2.2090101@FreeBSD.org> <alpine.BSF.2.00.1111071818250.4603@ai.fobar.qr> <4EB86276.6080801@sippysoft.com> <4EB86866.9060102@sippysoft.com>

On Mon, 7 Nov 2011, Maxim Sobolev wrote:

> On 11/7/2011 2:57 PM, Maxim Sobolev wrote:
>> On 11/7/2011 10:24 AM, Bjoern A. Zeeb wrote:
>>> Unlikely; the inp is properly locked there and the udp info attach
>>> better still be valid there; your problem is most likely elsewhere;
>>> try to see if you have other threads and see what they do at the same
>>> time, etc. You would need to race with udp_detach(); you also want
>>> to make sure that the inp still looks sane from either ddb or a dump
>>> and we are not talking about random memory corruption here.
>> 
>> Well, as you can see from the trace it points pretty strongly to that
>> piece of code. And as I said this panic is completely reproducible,
>> we've seen it at least 5 times to date in exactly this location.
>> Unfortunately the trace is rather long so we could not capture it in
>> full before, until we've switched to the 80x50 mode.
>> 
>> If it was a memory corruption it would be just a random fault, while here
>> we have it failing at this point reliably.
>> 
>> Unfortunately the panic happens in the driver thread context (I
>> believe), so the KDB/dump is not working. After panicking the machine
>> just hangs there. Keyboard is not working and I need to do a hard reset.
>> 
>> Is there any other explanation that you can think of? Is it possible for
>> some other portion of the code (i.e. network driver, DMA engine etc) to
>> trash this structure by writing something off bound? Or something along
>> the lines?
>
> OK, I've put the following catch to prove the case:
>
>        up = intoudpcb(inp);
>        if (up == NULL) {
>                printf("BZZT! Something is terribly wrong, up == NULL!\n");
>                INP_RUNLOCK(inp);
>                goto badunlocked;
>        }
>        if (up->u_tun_func == NULL) {
>
> I am going to give it a spin on two busiest boxes and see if I can log 
> anything.

Now if you are clever you'd also log the inp there, as the above will
only prove that something is wrong but still not help us figure out
what.
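
A rough userland sketch of what "logging the inp" could look like. The
struct below is a hypothetical stand-in, not the kernel's struct inpcb;
its field names are assumptions chosen to mirror the sort of state
(protocol block pointer, socket back pointer, flags, reference count)
that would show whether the PCB was already detached. In the kernel the
same idea would be an extended printf() in the up == NULL branch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the kernel's struct inpcb. Only a few
 * fields that might reveal a detached or stale PCB are mocked here.
 */
struct mock_inpcb {
	void		*inp_ppcb;	/* protocol block (the udpcb); NULL in the failing case */
	void		*inp_socket;	/* back pointer to the owning socket */
	int		 inp_flags;	/* state flags */
	unsigned int	 inp_refcount;	/* reference count */
};

/*
 * Format the interesting parts of the inp into buf instead of only
 * reporting that up == NULL. Returns what snprintf() returns.
 */
static int
format_inp_state(char *buf, size_t len, const struct mock_inpcb *inp)
{
	return (snprintf(buf, len,
	    "BZZT! up == NULL: inp %p ppcb %p socket %p flags 0x%x refcount %u",
	    (const void *)inp, inp->inp_ppcb, inp->inp_socket,
	    (unsigned int)inp->inp_flags, inp->inp_refcount));
}
```

With the flags and reference count in the log line it becomes possible
to tell a PCB that is being torn down from one that has been overwritten
with garbage, which the bare "up == NULL" message cannot do.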

/bz

-- 
Bjoern A. Zeeb                                 You have to have visions!
          Stop bit received. Insert coin for new address family.


