Date:      Sun, 1 Apr 2012 17:27:58 +0200
From:      Marius Strobl <marius@alchemy.franken.de>
To:        David Cross <dcrosstech@gmail.com>
Cc:        freebsd-sparc64@freebsd.org
Subject:   Re: 9.0-RELEASE, SPARC64, Ultra10, dummynet hard hang
Message-ID:  <20120401152758.GA25442@alchemy.franken.de>
In-Reply-To: <CAM9edeOMGwp-Kx3k5kfPB0w6eB-APKAvEE-ERq%2BaL1ggDW5D-w@mail.gmail.com>
References:  <CAM9edePAidH4Mp24MraBKQZ7S=j7d4qs=P=k_q3v5L4KR8CA-A@mail.gmail.com> <CAM9edeOMGwp-Kx3k5kfPB0w6eB-APKAvEE-ERq%2BaL1ggDW5D-w@mail.gmail.com>

On Fri, Mar 30, 2012 at 10:37:51PM -0400, David Cross wrote:
> Ok.. to follow up on my own question, I have tracked it down!
> 
> So, the problem is an unaligned memory access in the "burst" parameter of
> dn_link.  A printf of it on my system gives:
> &(p->burst)=0x0xfffff80002c48f7c
> 
> burst is an uint64_t.. that isn't 64bit aligned.
> 
> This raises a few questions:
> 1) Why isn't it being autoaligned, doesn't gcc do this (I am almost
> positive it does (or it should) (I have no /etc/make.conf, completely stock
> options)

The compiler is free to assume native alignment of the struct members
for optimization, i.e. to use 64-bit accesses for 64-bit members instead
of 8 8-bit accesses. On architectures with strict alignment
requirements, i.e. with FreeBSD everything but x86, this requires that
the memory address accessed is also 64-bit aligned, though. The problem
is that dummynet(4) contains a lot of broken code that casts arbitrary
chunks of memory to structs without the memory necessarily fulfilling
the alignment of these structs. There's nothing wrong with the compiler
or its settings in this regard; it's the code that only has a chance of
working on x86.
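
To make the alignment point concrete, here is a minimal sketch that
checks the address quoted above. The helper name misalignment() is
hypothetical and not part of dummynet(4); it only shows why a 64-bit
load from that address traps on a strict-alignment CPU.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes by which addr exceeds the
 * previous multiple of align. A result of 0 means addr is suitably
 * aligned. align must be a power of two.
 */
static unsigned
misalignment(uint64_t addr, unsigned align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((unsigned)(addr & (align - 1)));
}
```

For the reported address 0xfffff80002c48f7c the low byte is 0x7c = 124,
so the address is 4-byte aligned but 4 bytes off an 8-byte boundary.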

> 2) Why is this causing a _deadlock_? (note kernel debugger _does_ work..
> which was a boon in getting to "close" to where the problem was in the
> first place)
> 3) Since it does cause a deadlock, it means that a bus-fault handler is
> being called that _doesn't_ panic.. and doesn't return correctly?
> 4) since it's not tripping a RED error, it's not looping the handler.
> and

Turns out that I once broke panicking on certain fatal exceptions while
in the kernel, which is fixed in r233747. Now dummynet(4) triggers a
panic again as it should.

> 5) given all of the above.. what's the fix?  I am modifying dn_link to be
> 64 bit aligned (manually).. but this feels like the wrong approach (though
> it will hopefully get me what I want for 'now'.
> 

The correct fix is to copy the random memory byte-wise into instances
with the expected alignment and to use the latter instead, as in the
following patch:
http://people.freebsd.org/~marius/dummynet_unfuck_dn_link.diff
This only fixes the tip of the iceberg, though; potentially all of these
types of erroneous casts in dummynet(4) blow up on !x86.
An acceptable band-aid actually allowing these casts would be to declare
the structs as packed, which forces byte-accesses as a side-effect.
Given that the layout of struct dn_link isn't well thought out, this
would break the ABI, though.
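
As a rough illustration of both approaches (not the contents of the
patch linked above), the sketch below uses a hypothetical struct
link_params standing in for struct dn_link; its fields and the helper
link_params_load() are assumptions made for the example. The first
part copies the bytes into a properly aligned instance with memcpy()
instead of casting the buffer pointer. The second part declares a
packed variant using the GCC/Clang attribute to show how packing
moves the 64-bit member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct dn_link; the real layout differs. */
struct link_params {
	uint32_t bandwidth;
	uint64_t burst;
};

/*
 * Copy-out approach: buf may have any alignment. memcpy() copies the
 * bytes into lp, which the compiler has aligned correctly, so the
 * later 64-bit access to lp.burst is safe.
 */
static struct link_params
link_params_load(const void *buf, size_t len)
{
	struct link_params lp;

	assert(len >= sizeof(lp));
	memcpy(&lp, buf, sizeof(lp));
	return (lp);
}

/*
 * Band-aid approach: a packed struct has alignment 1, so the compiler
 * must not assume burst is 64-bit aligned. Packing also removes the
 * padding before burst, which changes the layout.
 */
struct link_params_packed {
	uint32_t bandwidth;
	uint64_t burst;
} __attribute__((__packed__));
```

With the packed variant, burst sits at offset 4 and the struct is 12
bytes long. On ABIs that align uint64_t to 8 bytes, the unpacked struct
places burst at offset 8 instead, which is the kind of layout change
that breaks binary compatibility with existing userland.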

Marius