From:      Daniel Lang <dl@leo.org>
To:        Greg Lehey <grog@lemis.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: kern/21148: multiple crashes while using vinum
Message-ID:  <20010104105845.A14755@atrbg11.informatik.tu-muenchen.de>
In-Reply-To: <20010104105428.D4336@wantadilla.lemis.com>; from grog@lemis.com on Thu, Jan 04, 2001 at 01:25:57AM +0000
References:  <200101012239.f01MdiH40906@freefall.freebsd.org> <20010103145232.B10169@atrbg11.informatik.tu-muenchen.de> <20010104105428.D4336@wantadilla.lemis.com>

Dear Greg,

Greg Lehey wrote on Thu, Jan 04, 2001 at 01:25:57AM +0000:
[..]
> As my closing message says, the reason I closed the PR was:
> 
> >> No feedback from submitter.
> 
> I sent you a message on 10 September 2000 asking for additional
> information.  I received none.  There's no reason to get all upset
I sent _two_ direct replies. If you did not receive
them, then maybe some MX had a problem? If you decided they still did
not contain the information you need, or were
malformed/mutilated, etc., a short hint would have been appreciated.
The first was:

[..]
Date: Sun, 10 Sep 2000 16:18:11 +0200
Message-ID: <20000910161811.A56954@atrbg11.informatik.tu-muenchen.de>
[..]
Still in the archives at:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=4693+0+archive/2000/freebsd-bugs/20000917.freebsd-bugs

It includes three stack traces, this time with full debugging symbols.

The next day, I sent a followup:
Date: Mon, 11 Sep 2000 13:54:21 +0200
Message-ID: <20000911135421.C58840@atrbg11.informatik.tu-muenchen.de>

http://docs.freebsd.org/cgi/getmsg.cgi?fetch=77649+0+archive/2000/freebsd-bugs/20000917.freebsd-bugs

Both were sent to you personally and to freebsd-bugs, the
second one as a PR follow-up too. Alas, I omitted freebsd-gnats-submit
on the first reply.

> now, or make claims about my intentions.  This was just a dead PR, and
> you've made it clear, both before and now, that you have no intention
> of following up on it.  This is not a question of "ignorant morons" or
> "trust".
Sorry, I'm not upset, even if it sounded that way, and I would
not dare to speculate about your intentions. My apologies.
My memory and my mail folders tell me that I did try to be helpful
and did indeed follow your request from September 10 to provide
you with additional information. I have not heard from you since
then. If the reason is that my replies still did not meet
your requirements, I'm sorry, but I would have needed some more
hints. IMHO the missing information was a valid backtrace, which
I supplied. So my intention then was indeed to follow up
on it. After I did not hear anything from you, I even emailed
Søren Schmidt, the ATA guy, because I suspected that the
bug was in the ATA driver (which turned out to be unlikely, because
SCSI-only systems showed similar problems).
Even now, I'm still interested in helping to fix the problem,
but I may not be able to help with crash dumps at the moment.

[..]
> Yes, this looks very much like the other issues.  But you must
> understand that there's nothing I can do without further information.
Agreed.

[..]
> The trouble with that is that this only happens when the system is
> very active, and there are thousands of potential buffer headers which
> could be trashed.  I do have a trace facility within Vinum, but even
> with that it's difficult to figure out what's going on.
No doubt.
[..]
> If you mean that the same part of the buffer header gets smashed every
> time, yes, this is reliably reproducible (well, in other words, when
> it happens (at random), it happens in the same place every time).  It
> may mean that Vinum is doing it, but as far as I can tell it's always
> 6 words being zeroed out, and I don't do that anywhere in Vinum.  The
> other possibility, which I consider most likely, is that the data
> structures accidentally get freed and used by some other driver (or,
> possibly, that some other driver freed them first and then continued
> using them).  This would explain the observed correlation with the fxp
> driver.
This is indeed interesting, and maybe a reason why dmesg is
not utterly useless. ;)

My boxes have an fxp NIC as well:
[..]
fxp0: <Intel Pro 10/100B/100+ Ethernet> port 0xff80-0xff9f mem 0xfe900000-0xfe9fffff,0xfe2ff000-0xfe2fffff irq 11 at device 12.0 on pci0
[..]

[..]
> Well, I sent you a message on 10 September 2000, asking for additional
> information.  You didn't send it to me.
See above. I at least tried. :-)

[..]
> Correct.  I have no doubt about it.  But some bugs are difficult to
> find, and I need help.
OK. Here we are. Unfortunately, we should have had this
discussion back in September, when the issue was more
current to many of us. :-/

However, I've got a twin box, which is not in production
at the moment but currently runs Slowlaris X86.
I'm going to put a current FreeBSD on it, and if I find
some time and enough disks, I will set up a RAID-5 again.
Maybe we can still find the nasty bugger.
Unfortunately, I cannot tell when I will find time to do this;
it may still take a month or two.

Best regards,
 Daniel
-- 
IRCnet: Mr-Spock         - ceterum censeo Microsoftinem esse delendam -  
*Daniel Lang * dl@leo.org * +49 89 289 25735 * http://www.leo.org/~dl/*


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
