Date:      Fri, 23 Dec 2011 16:20:44 -0800
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Charlie Martin <crmartin@sgi.com>
Cc:        Eric Richards <erichards@sgi.com>, Larry Fenske <LFenske@sgi.com>, freebsd-stable@FreeBSD.org, "Peter W. Morreale" <morreale@sgi.com>
Subject:   Re: PRINTF_BUFR_SIZE=4096?
Message-ID:  <20111224002044.GA30339@icarus.home.lan>
In-Reply-To: <4EF50882.9080609@sgi.com>
References:  <4EF3B790.5050509@sgi.com> <20111223000705.GA6242@icarus.home.lan> <4EF4FED2.6020909@sgi.com> <20111223225445.GA29093@icarus.home.lan> <4EF50882.9080609@sgi.com>

On Fri, Dec 23, 2011 at 04:02:26PM -0700, Charlie Martin wrote:
> Thanks, Jeremy, I really was trying to keep you from needing to dig
> this out.  This is inherited code with some very peculiar
> intermittent panics, so you can imagine that I would be interested
> in specifics of the odd behavior.  Sadly, I don't think we're seeing
> any stack overflows.

I say this politely, not condescendingly: your last statement indicates
you don't quite understand the nature of what having a large-ish
stack-based buffer could do to the kernel.  This is not userland.

I'm pretty sure the issues you're seeing (the devfs stuff) are fixed or
improved in RELENG_8, but I say that without being able to point you to
a specific commit.  My reasoning is that there have been a *ton* of
improvements in devfs in RELENG_8 onward, and these will almost
certainly not be backported to RELENG_7.

You are very, *very* adamant about stating "we cannot upgrade", and it
is my opinion that as long as you don't upgrade, you're going to be
"stuck" with these kinds of bugs.  Therefore, it would be worth your time
to put forth efforts in testing RELENG_8 (not 8.2-RELEASE please;
seriously, go with 8.2-STABLE (RELENG_8), just trust me on this) in a
test environment and see how things go for you.  I think you will be
pleased with the results.  You'll also get much more attentive/better
support from the community/developers since RELENG_8 is supported,
while RELENG_7 (especially -PRERELEASE) is losing more and more
attention.  Its Security EOL is sometime early next year, and I hope
you're aware of that fact as well.

What I'm getting at here, without getting political: you need to start
considering devoting resources to help with upgrading.  For the sake
of example, we have a FreeBSD RELENG_6 box (6.4-STABLE) in our cluster
that has actively been up for 385 days (went down a year ago because of
co-lo maintenance I was doing on power conduits).  If this machine
suddenly panic'd, would I report the bug to -stable and so on?  No.  I
would suck it up.

> On 12/23/2011 03:54 PM, Jeremy Chadwick wrote:
> >On Fri, Dec 23, 2011 at 03:21:06PM -0700, Charlie Martin wrote:
> 
> >When I was doing FreeBSD "stuff" as part of the Project, I added this to
> >my Commonly Reported Issues wiki page since it comes up quite often.
> >Search for "BUFR".
> >
> >http://wiki.freebsd.org/BugBusting/Commonly_reported_issues
> >
> 
> I will note that all the "Commonly reported" page says is "set the
> value to 256" and point to three examples of people seeing garbled
> output.

There's some history here for why that is.... kind of.  I'll try to
explain:

For many years, PRINTF_BUFR_SIZE was not defined in any of the default
kernel configs.  It was mentioned in /sys/conf/NOTES, but did not ship
in GENERIC, etc.

Then, after more and more people (since the FreeBSD 6.x days) began
reporting interspersed kernel output, more and more developers started
finding it annoying too (both the reports and the problem itself; let me
tell you, it makes using ddb to debug a kernel crash in real-time
painful), the option was added to the default kernels since it *does*
improve things a little bit (better than nothing).

The value 256 is something *I personally* chose, because 128 was simply
not improving things "enough" on our systems.  256 made a bigger
difference.  The reason it still remains as 128 in the stock kernel
configs is due to the issue I mentioned in my previous post, re:
developers having justified concerns over the implications of increasing
this value too high.

I want readers of this thread to understand something: my previous
paragraph should not be read as "the higher the value, the better off
you are".  I have not actually *looked* at the code to see how it works.
I tend to trust folks who know more about the implications (especially
in kernel space) of large static buffers, but even in userland I
understand the difference and implications of doing char buf[65536];
rather than char *buf = calloc(1, 65536);.
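
For illustration only, here is a minimal userland sketch of those two
allocation styles.  It is not taken from the kernel source; the names
BUF_SIZE, fill_on_stack and fill_on_heap are made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 65536  /* same size as in the text above */

/* Automatic (stack) buffer: all 64 KiB come out of the calling
 * thread's stack for the duration of the call.  There is no return
 * value to check if the stack turns out to be too small. */
size_t fill_on_stack(char c)
{
    char buf[BUF_SIZE];

    assert(c != '\0');              /* strlen() below relies on this */
    memset(buf, c, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    return strlen(buf);
}

/* Heap buffer: only the pointer lives on the stack, and a failed
 * allocation is visible to the caller as a NULL return from calloc(). */
size_t fill_on_heap(char c)
{
    char *buf = calloc(1, BUF_SIZE);
    size_t len;

    assert(c != '\0');
    if (buf == NULL)
        return 0;                   /* allocation failed, nothing filled */
    memset(buf, c, BUF_SIZE - 1);   /* last byte stays '\0' from calloc */
    len = strlen(buf);
    free(buf);
    return len;
}
```

Both functions return 65535 when called with any non-NUL character.
The difference is where the memory comes from: the first one needs
64 KiB of stack per call, which a userland process can usually afford
and a kernel thread, with its much smaller stack, generally cannot.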

TL;DR -- Don't just go increasing this value to something gigantic in
hopes that the larger value means you can solve the problem.  It won't
solve the problem entirely.  For now, *knowing* about interspersed
output is enough.  I'll also point out that Solaris 10 (not sure about
OpenIndiana) also has this problem (we see it at work on occasion), so
FreeBSD isn't alone.
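
A toy userland model of why a bigger buffer helps but does not cure
the problem.  This is NOT the kernel's implementation: the function
name interleave, the strict turn-taking between two writers, and the
"emit at most bufsz bytes per turn" rule are all assumptions made for
illustration.

```c
#include <assert.h>
#include <string.h>

/* Interleave two messages onto one "console" string.  The writers
 * take strict turns, and each emits at most bufsz bytes per turn,
 * which models (as an assumption) a per-call buffer that is flushed
 * whenever it fills up. */
void interleave(const char *a, const char *b, size_t bufsz,
                char *out, size_t outsz)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t ia = 0, ib = 0, o = 0, n;

    assert(bufsz > 0 && outsz > 0);

    while ((ia < la || ib < lb) && o + 1 < outsz) {
        /* writer A flushes up to bufsz bytes */
        for (n = 0; n < bufsz && ia < la && o + 1 < outsz; n++)
            out[o++] = a[ia++];
        /* writer B flushes up to bufsz bytes */
        for (n = 0; n < bufsz && ib < lb && o + 1 < outsz; n++)
            out[o++] = b[ib++];
    }
    out[o] = '\0';
}
```

With the messages "AAAA" and "bbbb", a bufsz of 1 yields "AbAbAbAb",
2 yields "AAbbAAbb", and 4 or more yields "AAAAbbbb".  In this model
a message stays intact only if it fits in the buffer; anything longer
still gets split.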

P.S. -- No one on this list should *ever* feel obliged to "cut me some
slack" because of holidays.  For example, for the past 10 years I have
worked on every single US holiday including Christmas.  I consider them
just like any other day.  Maybe it's because I'm not married, don't have
kids, don't have a tree, etc., instead preferring to stick with relying
on nostalgia/old memories of childhood Christmases and stuff like that.
That's just how I am.  :-)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |



