Date: Sun, 1 Apr 2012 17:27:58 +0200
From: Marius Strobl <marius@alchemy.franken.de>
To: David Cross <dcrosstech@gmail.com>
Cc: freebsd-sparc64@freebsd.org
Subject: Re: 9.0-RELEASE, SPARC64, Ultra10, dummynet hard hang
Message-ID: <20120401152758.GA25442@alchemy.franken.de>
In-Reply-To: <CAM9edeOMGwp-Kx3k5kfPB0w6eB-APKAvEE-ERq%2BaL1ggDW5D-w@mail.gmail.com>
References: <CAM9edePAidH4Mp24MraBKQZ7S=j7d4qs=P=k_q3v5L4KR8CA-A@mail.gmail.com> <CAM9edeOMGwp-Kx3k5kfPB0w6eB-APKAvEE-ERq%2BaL1ggDW5D-w@mail.gmail.com>
On Fri, Mar 30, 2012 at 10:37:51PM -0400, David Cross wrote:
> Ok.. to follow up on my own question, I have tracked it down!
>
> So, the problem is an unaligned memory access in the "burst" parameter of
> dn_link. A printf of it on my system gives:
> &(p->burst)=0x0xfffff80002c48f7c
>
> burst is an uint64_t.. that isn't 64bit aligned.
>
> This raises a few questions:
> 1) Why isn't it being autoaligned, doesn't gcc do this (I am almost
> positive it does (or it should) (I have no /etc/make.conf, completely stock
> options)

The compiler is free to assume native alignment of the struct members
for optimization, i.e. to use a single 64-bit access for a 64-bit member
instead of eight 1-byte accesses. Architectures with strict alignment
requirements, i.e. with FreeBSD everything but x86, require that the
memory address accessed also is 64-bit aligned though. The problem is
that dummynet(4) consists of a lot of broken code that casts random
chunks of memory to structs, with the memory not necessarily fulfilling
the alignment of these structs. There's nothing wrong with the compiler
or its settings in this regard; it's the code that only has a chance of
working on x86.

> 2) Why is this causing a _deadlock_? (note kernel debugger _does_ work..
> which was a boon in getting to "close" to where the problem was in the
> first place)
> 3) Since it does cause a deadlock, it means that a bus-fault handler is
> being called that _doesn't_ panic.. and doesn't return correctly?
> 4) since its not tripping a RED error, its not looping the handler.
> and

Turns out that I once broke panicking on certain fatal exceptions while
in the kernel, which is fixed in r233747. Now dummynet(4) triggers a
panic again as it should.

> 5) given all of the above.. what's the fix? I am modifying dn_link to be
> 64 bit aligned (manually).. but this feels like the wrong approach (though
> it will hopefully get me what I want for 'now'.
>

The correct fix is to copy the random memory byte-wise into instances
with the expected alignment and to use the latter instead, like in the
following patch:
http://people.freebsd.org/~marius/dummynet_unfuck_dn_link.diff
This only fixes the tip of the iceberg though; potentially all of these
types of erroneous casts in dummynet(4) blow up on !x86.
An acceptable band-aid actually allowing these casts would be to declare
the structs as packed, which forces byte-accesses as a side-effect.
Given that the layout of struct dn_link isn't thought out well, this
would break the ABI though.

Marius
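
For illustration only, the byte-wise copy approach described above could look roughly like the following userland sketch. It is not taken from the linked patch: struct demo_link and read_burst_safely() are hypothetical names, the layout is a made-up stand-in for struct dn_link, and memcpy() is used here simply as one way to do a byte-wise copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for struct dn_link; the real structure has a
 * different layout. The only point is that it has a 64-bit member.
 */
struct demo_link {
    uint32_t link_nr;
    uint32_t flags;
    uint64_t burst;     /* needs 8-byte alignment for a 64-bit load */
};

/*
 * Read the burst member of a demo_link stored at an arbitrary address.
 * Instead of casting raw to (struct demo_link *) and dereferencing it,
 * copy the bytes into a local instance, which the compiler places at a
 * properly aligned address, and access that instance instead.
 */
static uint64_t
read_burst_safely(const unsigned char *raw)
{
    struct demo_link aligned;

    assert(raw != NULL);
    memcpy(&aligned, raw, sizeof(aligned));
    return (aligned.burst);
}
```

A caller would pass a pointer into whatever buffer holds the structure, e.g. read_burst_safely(buf + 4), regardless of whether that address happens to be 8-byte aligned. The cost is one extra copy per access, which is the price for not depending on the alignment of the source memory.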