Date:      Sat, 23 Nov 2002 03:46:45 -0500
From:      Scott Sipe <cscotts@mindspring.com>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        John Baldwin <jhb@FreeBSD.org>, current@FreeBSD.org
Subject:   Re: DP2 Fatal Trap
Message-ID:  <200211230346.45917.cscotts@mindspring.com>
In-Reply-To: <3DDF305B.7C468ED6@mindspring.com>
References:  <XFMail.20021122105737.jhb@FreeBSD.org> <200211222210.24713.cscotts@mindspring.com> <3DDF305B.7C468ED6@mindspring.com>

On Saturday 23 November 2002 02:38 am, Terry Lambert wrote:
> Scott Sipe wrote:
> > Alright, this is pretty frustrating.  I've installed DP2 4 or 5 times now
> > (each time reformatting).
> >
> > The first time the installation program acted really weird and didn't do
> > the install correctly.
>
> Not useful information, but expected, based on your other reports.
> Most people would accuse you of overclocking.  8-).

I'm actually underclocking (and have been).

> > The third time I think was with the trap 12 when I started tcsh...
> > now I reinstalled and it SEEMS to be working fine.  At least enough
> > that I was able to compile a custom kernel, and compile most of
> > the gnome suite from ports too (and then to remove it ;).
>
> The trap 12 is a real problem.  The useful information you posted
> before was the traceback.  The fact that the error occurred where
> no such error should be possible is indicative of a hardware
> problem: either bad RAM, or a cooked CPU (usually a result of
> overclocking), or a CPU bug from the vendor, or a problem with
> the data as it was transferred from the hard drive (a disk or
> controller problem; rarely, a driver problem, though it's not
> likely, since you got as far as you did).

As mentioned earlier, I'm not overclocking. (Plus, I am running Stable and
WinXP with no stability problems.)

> > There was ONE problem I had -- one of the g++ include files
> > (limits) had one line that was corrupted and I could fix.
> > the line was like:
> >
> > coint name_more; (something like that)
> >
> > when it should have been
> > const int name_more10;
>
> If this was actually it, then it dropped 32 bits on its way
> into cache.  Did you try rebooting, to see if the file "healed
> itself"?  This would support the theory of a disk/controller/driver
> error.  Is this maybe a CMD 640B or similar IDE/ATAPI controller?
> Note that this could also be a result of a dirty cache line and/or
> a CPU bug, but it's more likely to *change* characters, rather
> than deleting them.  The correct thing to do would probably be to
> "hd" the file, to see if the characters were not there, or if they
> were converted to non-displayed characters (e.g. four "0x00"'s).
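Terry's suggested check (dump the file and see whether the missing characters are simply gone or were replaced by zero bytes) can be sketched as below. This is a hypothetical helper, not part of FreeBSD or of `hd`: the function names are invented for illustration, the dump format is only loosely modeled on `hd`'s canonical output, and the default run length of 4 is an assumption chosen to match the "four 0x00's" case described above.

```python
def hexdump_lines(data: bytes, base: int = 0, width: int = 16) -> list[str]:
    """Format bytes loosely like `hd` output: offset, hex bytes, printable ASCII."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes (including 0x00) are shown as "." in the ASCII column.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{base + i:08x}  {hex_part:<{width * 3 - 1}}  |{ascii_part}|")
    return lines


def find_nul_runs(data: bytes, min_len: int = 4) -> list[tuple[int, int]]:
    """Return (offset, length) for every run of at least min_len consecutive 0x00 bytes."""
    runs = []
    start = None
    for i, b in enumerate(data):
        if b == 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - start))
            start = None
    # A run that reaches end-of-file is not closed by the loop above.
    if start is not None and len(data) - start >= min_len:
        runs.append((start, len(data) - start))
    return runs


def scan_file(path: str, min_len: int = 4, context: int = 16) -> list[str]:
    """Report each NUL run in the file, followed by a small hex dump around it."""
    with open(path, "rb") as f:
        data = f.read()
    report = []
    for off, length in find_nul_runs(data, min_len):
        report.append(f"{length} NUL bytes at offset {off:#x}")
        lo = max(0, off - context)
        hi = min(len(data), off + length + context)
        report.extend(hexdump_lines(data[lo:hi], base=lo))
    return report
```

Printing each line returned by `scan_file(path)`, where `path` is the suspect header, would distinguish the two cases Terry describes: an empty report means no zeroed bytes were found (the characters were dropped or changed), while a reported run points at bytes that were zeroed in place.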

I didn't make a backup copy of the bad file (or note down the errors), or try
rebooting, which in retrospect would have been a good idea. Sorry, I just
fixed the file and saved it so I could compile some ports, and that worked.

I have an IWILL KK266 motherboard, which has an "AMI MegaRaid" controller and a
VIA Apollo KT133A chipset.  The FreeBSD drive is primary master ad0 on the
VIA IDE channel (both Current and Stable are on the same disk).  I have a DVD
drive and a CD-RW on the secondary channel.  Then there are 2 hard disks, one on
each channel of the RAID controller (I use the BIOS to alternate which drives
are used for booting: the RAID or the IDE).

Some pertinent parts of my STABLE dmesg:
atapci0: <VIA 82C686 ATA100 controller> port 0xc000-0xc00f at device 7.1 on pci0
atapci0: Correcting VIA config for southbridge data corruption bug
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
atapci1: <CMD 649 ATA100 controller> port 0xe800-0xe80f,0xe400-0xe403,0xe000-0xe007,0xdc00-0xdc03,0xd800-0xd807 irq 10 at device 16.0 on pci0
ata2: at 0xd800 on atapci1
ata3: at 0xe000 on atapci1

ad0: 76319MB <MAXTOR 4K080H4> [155061/16/63] at ata0-master UDMA100
ad1: 114440MB <WDC WD1200JB-75CRA0> [232514/16/63] at ata2-master UDMA100
ad2: 114440MB <WDC WD1200JB-75CRA0> [232514/16/63] at ata3-master UDMA100
acd0: DVD-ROM <Pioneer DVD-ROM ATAPIModel DVD-106S 0109> at ata1-master PIO4
acd1: CD-RW <LITE-ON LTR-32123S> at ata1-slave PIO4

> > (line 1710 iirc)
> >
> > so basically it seems like I'm getting random data corruption at random
> > times with random results.  fwiw, I'm running stable off the same
> > harddisk as I type this.
>
> Stable has this problem?  Yes or No?

No, Stable has none of the problems I see in Current.  The install never
messed up, and there has been no data corruption, no traps, and few core dumps
(none like the ones I experienced in Current).  I've been running Stable on
this computer since early in the 4.x series and haven't seen this kind of
problem before.

> > sorry if this throws a wrench in things again.
>
> No, it doesn't.  But it would help if you answered the question
> about whether or not Stable has the problem, too, and the first
> three questions I asked.  If it's the CPU bug, I *can* provide a
> kernel that fixes the problem, I believe.  I just have to be able
> to create a kernel that has the problem, first, and since my hardware
> doesn't have the problem locally, that leaves your hardware, for the
> testing.

1) Yes it happened with a generic kernel straight off the DP2 install CD.

2) I had the problems directly off a CD burned from the DP2 ISO image, so can
that tell you what you need to know about the CVS date, or do you want me to
do more?

3) Yes, I'm at college on a fast connection (though with a limited upload), so
if you need to, I can set up an FTP login for you on my computer.

thanks,
Scott
