Date:      Thu, 7 May 1998 22:22:51 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        tom@sdf.com (Tom)
Cc:        tlambert@primenet.com, beng@lcs.mit.edu, freebsd-hackers@FreeBSD.ORG
Subject:   Re: Network problem with 2.2.6-STABLE
Message-ID:  <199805072222.PAA13813@usr01.primenet.com>
In-Reply-To: <Pine.BSF.3.95q.980507000533.28352B-100000@misery.sdf.com> from "Tom" at May 7, 98 00:22:57 am

> > >   Question:  why does it segfault several seconds after producing the
> > > "abort?" prompt?
> > 
> > Because it is busy dumping core because of the explicit call to "abort"
> > because you did not set "yflag" using the "-y" command line option.
> >
> > > It had already detected the "hole in map" problem?
> > 
> > Yes.  It won't call panic (a function in utilities.c) if it hasn't
> > panic'ed.
> 
>   That is bizarre.  So it prints a "y/n" prompt and calls the panic
> function anyhow?

Look at the code.

You could argue that this was a cosmetic bug.

> > One possible reason for this problem *could* be your limits on the
> > account doing the restore (see login.conf).
> 
>   I run restore as root.

Root is limited by login.conf, just like everyone else.
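
One way to see what limits a process actually ended up with is to ask
the kernel from inside that login session. The following is only a
sketch, in Python, and not anything restore itself does: the mapping
from login.conf capability names (datasize, stacksize, coredumpsize,
openfiles) to rlimit constants, and the helper names, are assumptions
made for illustration.

```python
import resource

# Assumed pairing of login.conf capability names with rlimit constants.
LIMITS = {
    "datasize": resource.RLIMIT_DATA,
    "stacksize": resource.RLIMIT_STACK,
    "coredumpsize": resource.RLIMIT_CORE,
    "openfiles": resource.RLIMIT_NOFILE,
}


def fmt(value):
    """Render an rlimit value, spelling out the 'no limit' sentinel."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} for the limits listed above."""
    return {name: resource.getrlimit(res) for name, res in LIMITS.items()}


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name:14s} soft={fmt(soft)} hard={fmt(hard)}")
```

Run as root from the same kind of session that runs restore, a small
soft datasize would be worth ruling out before blaming anything else.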


>   Media is good.  Dump to disk file does same thing.  I've proved that the
> tape media and tape drive are not a factor.

OK.  Now we need proof that it isn't the raw disk device, the EIDE
controller, the EIDE drive, the cable being too long, etc.

If you can do that, we still haven't localized it to whether it's
dump writing bad data or restore thinking there's bad data when there
isn't.


>   Except that most of those items you listed are the same thing.  I've
> eliminated the tape drive, the tape media, the tape SCSI controller,
> the cable etc., leaving only three things:
> 
> - raw access to a partition on EIDE
> - EIDE controller
> - dump/restore
> 
>   #2 is doubtful, as it would be corrupting data all the time, and fsck
> never reports a problem, unless the controller has an interesting 
> fail-when-accessed-by-dump bug.

Or a fail-under-heavy-access bug.

You still need to md5 an unmounted raw device, and run fsck, a bunch
of times to make sure you are getting the same answer every time in
both cases, to eliminate the raw disk driver.
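
The md5 half of that check can be sketched as below. This is a rough
Python illustration rather than a tested tool: the function names, the
1 MB read size, and the device path in the usage comment are all
hypothetical, and the partition is assumed to be unmounted so nothing
changes it between runs.

```python
import hashlib
import sys


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of everything readable from path."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def repeated_digests(path, runs=5):
    """Read the same path several times and return every digest."""
    return [md5_of(path) for _ in range(runs)]


def is_stable(digests):
    """True when every run produced the same digest."""
    return len(set(digests)) == 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python rawcheck.py /dev/rwd0s1a 10
    path = sys.argv[1]
    runs = int(sys.argv[2]) if len(sys.argv) > 2 else 5
    results = repeated_digests(path, runs)
    for i, d in enumerate(results, 1):
        print(f"run {i}: {d}")
    print("stable" if is_stable(results) else "MISMATCH between runs")
```

A mismatch between runs points at the read path (driver, controller,
drive, cable); identical digests only say sequential reads are
repeatable.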

This still won't eliminate some possible non-sequential access bugs
that could be introduced either by the driver, the disk cache, or the
VM system.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message


