Date:      Tue, 22 Jan 2002 15:56:19 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Peter Wemm <peter@wemm.org>
Cc:        Andrew Gallatin <gallatin@cs.duke.edu>, alpha@FreeBSD.ORG
Subject:   Re: Is anybody actually able to netboot at the moment?
Message-ID:  <3C4DFC23.F5391D2D@mindspring.com>
References:  <20020122234007.1983E3BAD@overcee.wemm.org>

Peter Wemm wrote:
> > Actually, there's a bug in the one's complement case on the
> > FreeBSD checksum calculation, sometimes.  I was able to see
> > incorrect checksums on a number of packets.  I think it's in
> > the incremental update code, but since it doesn't seem to
> > stop things from working, I never tracked down the source of
> > the Ethereal traces where I saw this.
> 
> Terry, what crack are you smoking this time?  We don't do incremental
> checksums in the libstand code.  That stuff is as simple and as unoptimized
> as it gets.

The bug is on transmit, not on receive, Peter.  8-).  If
checksum validation were working on receive, packets with bad
checksums would be dropped and the load would stop.

To see if this is the problem, it would be wise to capture a
failed boot attempt with Ethereal, which flags checksum
errors on packets on the wire.
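
For reference, a minimal sketch of the kind of check a capture
tool applies, assuming the standard RFC 1071 Internet checksum.
The function name and the sample IPv4 header are illustrative
assumptions; this is not the libstand or kernel code.

```python
def internet_checksum(data: bytes) -> int:
    """One's complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"          # odd length: pad with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold the carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Illustrative 20-byte IPv4 header; checksum field (bytes 10-11) is 0xb861.
header = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")

# Summing a header that already carries a correct checksum yields 0.
header_ok = internet_checksum(header) == 0

# Recomputing with the checksum field zeroed gives the value to transmit.
zeroed = header[:10] + b"\x00\x00" + header[12:]
expected_field = internet_checksum(zeroed)
```

A sender that gets this wrong on transmit would show up in the
trace as a nonzero result for the header as captured.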

As always, this may or may not be the problem at all, but in
the spirit of Sherlock Holmes...

> The alpha problems were in boot1 (the 7.5K loader) and that shares no
> code with netboot at all.

OK.  I typically don't use netboot, so I can believe this...


> I have experimented with alignment in the ethernet frame send code.. it
> seems that we are trying to send with 2-byte alignment for the bootp case.
> Fixing it doesn't seem to make much difference.  However, I wonder if SRM
> is doing some length rounding or something because the lengths are not 4 or
> 8 byte multiples for the bootp queries but are for the working rarp
> queries.  However, even that doesn't make sense because it sometimes works.
> I'm more suspicious of interactions between the tulip cards when being
> driven by SRM and the switch at the moment.

OK, another shot in the dark.  The first 16 bit NE1000 cards
had an interesting problem: if you sent an odd number of
bytes, the card would do an even-length bus transfer anyway,
so the last two bytes would be byte-swapped when you went to
checksum them, and you'd sum some garbage byte instead of the
right byte.

The fix for this was to always send an even number of bytes,
even if the payload was an odd length, to get around the
problem.
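
A small sketch of why the trailing byte matters, assuming the
RFC 1071 one's complement sum over big-endian 16-bit words.
The helper names, the 3-byte payload, and the 0xAB "garbage"
value are hypothetical, chosen only for illustration.

```python
def fold_sum(data: bytes) -> int:
    """One's complement sum of an even-length byte string."""
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum(payload: bytes, pad: int = 0x00) -> int:
    """Checksum with an explicit pad byte for odd-length payloads."""
    if len(payload) % 2:
        payload += bytes([pad])
    return ~fold_sum(payload) & 0xFFFF


payload = b"\x01\x02\x03"                 # odd length

# Correct behavior: the missing byte is treated as zero.
good = checksum(payload)

# A stray byte picked up by an even-width transfer changes the sum.
garbage = checksum(payload, pad=0xAB)

# Byte-swapping only the final word also changes the sum.
swapped_tail = payload[:2] + b"\x00" + payload[2:]
swapped = ~fold_sum(swapped_tail) & 0xFFFF
```

Padding the payload to an even length with a zero byte before
it goes out makes the last word well defined no matter how the
card transfers it.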

Maybe this is a byte-order problem?

If it is, the place to fix it is on the server (again), by
making it pad packets out to a 2 (or 4 or 8?) byte boundary
so that the received packets are transferred as a unit, but
only the payload portion is checked.
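
As a rough illustration of that padding, here is a sketch that
zero-pads a packet to a chosen boundary.  The function names
and the 2/4/8 boundaries are assumptions for the example, not
anything the server does today.  With the RFC 1071 sum, zero
padding leaves the checksum unchanged, so checking only the
payload and checking the padded unit give the same answer.

```python
def pad_to_boundary(packet: bytes, boundary: int) -> bytes:
    """Append zero bytes until len(packet) is a multiple of boundary."""
    remainder = len(packet) % boundary
    if remainder == 0:
        return packet
    return packet + b"\x00" * (boundary - remainder)


def internet_checksum(data: bytes) -> int:
    """One's complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


payload = b"\x10\x20\x30\x40\x50"         # 5 bytes, hypothetical data
padded = {b: pad_to_boundary(payload, b) for b in (2, 4, 8)}

# Zero words add nothing to a one's complement sum.
unchanged = all(
    internet_checksum(p) == internet_checksum(payload)
    for p in padded.values()
)
```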

This "fix" would only apply if the packets sent on the wire
were good in both directions (i.e. it's still time for an
Ethereal trace taken by an otherwise uninvolved third-party
machine).

Hope this helps... I'm waving my hands as fast as I can... ;^)

-- Terry
