From owner-freebsd-hackers  Tue Jun  1  9:14:28 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from skynet.ctr.columbia.edu (skynet.ctr.columbia.edu [128.59.64.70])
	by hub.freebsd.org (Postfix) with SMTP id E58C315084
	for <hackers@freebsd.org>; Tue,  1 Jun 1999 09:13:46 -0700 (PDT)
	(envelope-from wpaul@skynet.ctr.columbia.edu)
Received: (from wpaul@localhost) by skynet.ctr.columbia.edu (8.6.12/8.6.9) id MAA06119; Tue, 1 Jun 1999 12:13:37 -0400
From: Bill Paul <wpaul@skynet.ctr.columbia.edu>
Message-Id: <199906011613.MAA06119@skynet.ctr.columbia.edu>
Subject: Re: xl driver for 3Com
To: maret@axis.de (Alexander Maret)
Date: Tue, 1 Jun 1999 12:13:35 -0400 (EDT)
Cc: hackers@freebsd.org
In-Reply-To: <91DA20EC3C3DD211833400A0245A4EA9BA0E4F@erlangen01.axis.de> from "Alexander Maret" at Jun 1, 99 08:58:30 am
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Content-Length: 10765     
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Of all the gin joints in all the towns in all the world, Alexander Maret 
had to walk into mine and say:

> Hi,
> 
> > Well maybe FreeBSD is transmitting packets much faster than Linux. :)
> > You still haven't actually measured the transfer speed, so there's
> > no way for us to know.
> 
> Well, I'll do and report the results to you.

Actually, there's another difference between the behavior of the FreeBSD
and Linux drivers that could affect this. There may be a similar difference
between the FreeBSD and LoseNT drivers too, but I'm only vaguely familiar
with how LoseNT (or rather, NDIS miniport) drivers work so I can't be
sure about that.

Basically, the driver has one particular entry point for initiating
packet transmission. In Linux, this entry point gets handed a single
packet (stored in an skbuff) and a pointer to the device structure
for that particular driver instance. In FreeBSD, the start routine
gets handed a pointer to the ifnet structure for the interface (which
is associated with the driver). The ifnet structure in turn contains
the send queue which may have several packets queued for transmission
in the form of mbufs (the maximum number depends on the size of 
ifq_maxlen, which the driver can set at initialization time).

The difference is that in Linux, the driver sets up a single DMA/transmit
sequence at a time, because the transmit routine only gets access to one
packet at a time. In FreeBSD, the driver has access to the entire send
queue, where there may be several packets waiting. The driver can then
handle the send queue as it sees fit: it can pop the first packet off
the queue and transmit it, then wait for it to complete before moving
on to the next packet, or it can pop a whole series of packets off the
send queue and set up a large DMA transfer where all of the packets will
get transferred at once, rather than one at a time. (The driver may
still program the NIC to signal successful transmission of each frame
in the transfer just to make sure things are working right, but it
may also choose just to have the NIC acknowledge the last packet in
the transfer in order to reduce the number of interrupts. The xl driver
requests an interrupt only for the last frame in a DMA transfer.)
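
To make the two strategies concrete, here is a rough user-space sketch
of a start routine working on a send queue. This is not the actual xl
driver code: every toy_ name, the queue capacity and the ring size are
made up for illustration. With max_batch set to 1 it behaves like the
one-packet-at-a-time case; with a larger value it drains several
packets into one transfer and flags only the last descriptor for a
completion interrupt.

```c
#include <stddef.h>

#define TOY_QLEN      32   /* hypothetical send queue capacity (cf. ifq_maxlen) */
#define TOY_RING_SIZE  8   /* hypothetical number of TX descriptors */

/* Toy send queue: a ring buffer of packet ids (ids must be >= 0). */
struct toy_sendq {
    int    pkts[TOY_QLEN];
    size_t head;
    size_t len;
};

/* Toy TX descriptor: which packet it carries, and whether the NIC
 * should raise an interrupt when this frame has been sent. */
struct toy_desc {
    int pkt_id;
    int want_intr;
};

/* Append a packet id; returns 0 on success, -1 if the queue is full. */
int toy_enqueue(struct toy_sendq *q, int id)
{
    if (q->len == TOY_QLEN)
        return -1;
    q->pkts[(q->head + q->len) % TOY_QLEN] = id;
    q->len++;
    return 0;
}

/* Pop the oldest packet id; returns -1 if the queue is empty. */
int toy_dequeue(struct toy_sendq *q)
{
    int id;

    if (q->len == 0)
        return -1;
    id = q->pkts[q->head];
    q->head = (q->head + 1) % TOY_QLEN;
    q->len--;
    return id;
}

/* Move up to max_batch packets from the send queue into the ring and
 * request an interrupt only for the last one. Returns the number of
 * descriptors filled in. */
size_t toy_start(struct toy_sendq *q, struct toy_desc *ring, size_t max_batch)
{
    size_t n = 0;
    int id;

    if (max_batch > TOY_RING_SIZE)
        max_batch = TOY_RING_SIZE;
    /* Short-circuit: nothing is dequeued once the batch is full. */
    while (n < max_batch && (id = toy_dequeue(q)) != -1) {
        ring[n].pkt_id = id;
        ring[n].want_intr = 0;
        n++;
    }
    if (n > 0)
        ring[n - 1].want_intr = 1;
    return n;
}
```

Calling toy_start(&q, ring, 1) sets up one frame and one interrupt per
call, while toy_start(&q, ring, TOY_RING_SIZE) sets up as many as eight
frames for a single interrupt.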

The FreeBSD driver also sets the transmit threshold for best performance.
The transmit start threshold specifies how many bytes should be transferred
to the NIC's memory before it will begin putting the data on the wire.
The idea is that transfer of data from the host to the NIC can proceed
simultaneously with transmission of data from the NIC to the wire: as
new data arrives in the NIC, it gets dumped onto the network as soon as
possible. Note however that this only works well if the host can keep
up. With slower systems, you may see transmit underruns where the NIC
wants to transmit but the data isn't ready yet. In this case, the driver
will increase the transmit start threshold and generate a message telling
what happened. Eventually, the threshold will be increased enough that
the transmit underrun condition will not appear anymore.
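
The way the threshold settles can be sketched in a few lines of
user-space C. This is only a model under stated assumptions: the toy_
names are hypothetical, the step size is a guess (this message does not
say how much the driver adds on each underrun), and "needed" stands in
for whatever threshold the host can actually keep up with. The starting
value of 60 and the ceiling of 1536 are the figures quoted in this
message.

```c
#define TOY_MIN_FRAMELEN   60  /* starting threshold (XL_MIN_FRAMELEN) */
#define TOY_MAX_THRESH   1536  /* whole packet buffered before sending */
#define TOY_THRESH_STEP    60  /* ASSUMPTION: increment per underrun */

/* Raise the threshold by one step, never going past the ceiling. */
unsigned toy_bump_thresh(unsigned thresh)
{
    if (thresh >= TOY_MAX_THRESH)
        return TOY_MAX_THRESH;
    thresh += TOY_THRESH_STEP;
    if (thresh > TOY_MAX_THRESH)
        thresh = TOY_MAX_THRESH;
    return thresh;
}

/* Model a host that underruns whenever the threshold is below
 * "needed". Each underrun bumps the threshold once. Returns the
 * threshold at which underruns stop and stores how many underruns
 * it took in *underruns. */
unsigned toy_settle(unsigned thresh, unsigned needed, unsigned *underruns)
{
    *underruns = 0;
    while (thresh < needed && thresh < TOY_MAX_THRESH) {
        thresh = toy_bump_thresh(thresh);
        (*underruns)++;
    }
    return thresh;
}
```

With these made-up numbers, a host that needs at least 200 bytes
buffered would go 60, 120, 180, 240 and settle after three underruns.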

This means that it's possible for the FreeBSD driver to transmit a
whole bunch of packets at once with very little time in between. If 
there's another host transmitting back at the same time, this also means
that you're more likely to see collisions. However it also means that
you get very fast transmissions, which is supposed to be a good thing.

Can you throttle back the xl driver? Well, yes, if you want. There are
two things you can do:

- Use a different default for the transmit start threshold. In xl_init(),
  the driver initializes sc->xl_tx_thresh to XL_MIN_FRAMELEN, which is 60.
  You can change this to 120 or 512, or even 1536 if you want to disable
  the threshold entirely and have the NIC wait until the whole packet has
  been DMAed into its memory before it starts a transmission.

- Make xl_start() only queue one packet at a time. This unfortunately
  requires some code changes (not big ones, but it's more than just
  changing a setting somewhere).

> > Grrr. I'm sorry, but I really don't think you're putting the pieces
> > together correctly. Setting the NT machine to full duplex should have
> > absolutely no effect on the FreeBSD host. It will completely screw up
> > performance since the LoseNT host will then no longer be set to match
> > the hub, but that's another problem. I strongly suspect that you're
> > not making the proper observations when your problem manifests and
> > just leaping to the conclusion that setting the LoseNT host to full
> > duplex crashes the FreeBSD host. 
> 
> I just tell you what i experienced.

Well, it's suspicious. It gives the impression that setting the LoseNT 
host to full duplex mode somehow angered the computer gods, prompting 
them to make your FreeBSD host spontaneously reboot. Also, there might 
be more to it. For example, if you were running the X Window system on 
the console at the time, then there may have been a panic message 
printed which you weren't able to see: if the kernel panics while the X 
server is running, the console does not get set back to text mode, so any 
panic message would be obscured. Usually the system will wait about 15 
seconds before it reboots after a panic, so the only evidence you will 
have that the kernel has crashed is that the console will be frozen for 
about 15 seconds before the system resets. In order to really see what's
going on, you either need to set up a serial console or enable crash
dumps (the kernel message buffer will get saved as part of the crash
dump, and you can either look at it with gdb or go rummaging around the
vmcore with strings -a).

If you aren't running X on the console, then maybe you've switched the system
to a virtual console other than 0: I'm pretty sure that kernel printf()s
(and panic messages) will only be printed on ttyv0, so if you're on ttyv1
you may not get any warning. And if you are on ttyv0 and still it blows
up without warning, then you need to investigate a bit to find the cause.
Observing the LEDs on the hub and on the NICs when you switch modes would
be a good idea. You might also try running tcpdump -i xl0 on the FreeBSD
host and watching what happens. You could also try compiling the kernel
with 'options DDB' to see if you can get the system to drop into the
debugger instead of resetting. Maybe the LoseNT host is sending some 
sort of chernobylgram when it changes modes. Maybe the hub is getting
confused and applying line voltage to the transmit and receive lines in
the FreeBSD host's port.

In general it's rare that the kernel will just reset without any warning;
most runtime kernel errors result in a page fault or some other trap which
will be signalled by a panic.

> > I don't think that's true.
> 
> Why are you ignoring problems. On the one hand you want people to use
> your driver and complain if they switch to other cards and on the other
> hand you comment their problems with: That's not true.

Put yourself in my shoes. You have somebody who claims that setting
his LoseNT machine to full duplex crashes his FreeBSD machine without
really providing enough clues to determine a probable cause. There's
no evidence that the problem is 100% repeatable (did you try it several 
times in a row and observe that it happened every time?) and no supporting
analysis to show any connection between the two events. All we have is:
action A was followed by event B, therefore A caused B. That's not enough:
you have to show that action A always leads to event B, and you have to
find enough evidence to show a plausible connection between A and B. 
Pressing the 'full duplex' button in the 3Com NIC configuration utility 
doesn't send a 'please crash now' message to the FreeBSD host (although 
I'm sure Micro$oft wouldn't mind if it did). There is some sequence of 
events that happens between action A and event B, and since I'm way over 
here I can't automatically know what those are.

As for the collisions, the fact that you have never observed them before
doesn't mean they are unusual. Again, you still don't say what kind of
transfer rates you observe. If the speed is comparable with what you've
observed with other machines (running at the same settings, that is),
then it's hard to say there's something really wrong.
 
> As I said in an earlier mail: If there are too many collisions
> my NT server crashes. Sometimes it doesn't crash but I can't send
> any data over the net either. I think it has something to do with the 
> TCP/IP protocol because I then can't ping the NT machine nor the FreeBSD
> server. I have to reboot NT and everything works fine again. I can
> easily reproduce this error. Even with another NT machine which shows
> the same problem.

You should try to run tcpdump on the FreeBSD host when attempting to
use ping, as well as observe the activity LEDs on the hub. When the problem
happens, try to run ping on the NT machine and see if you observe any
packets with tcpdump on the FreeBSD machine. The problem here is that you
state that the machines don't communicate, but we need to know why. The
NT host may be failing to transmit, or it may be failing to receive. Or
the FreeBSD host may be failing to transmit or receive. We need to know
which. Say you have three machines on the network: two NT hosts and the
FreeBSD server. Now you duplicate the problem with one NT host where it
can't ping anymore. Can the second NT host and the FreeBSD server still
communicate? Can the two NT hosts still communicate? If the second NT
machine and the FreeBSD server can still talk to each other, then the 
problem is with the first NT host. If the two NT hosts can still 
communicate, but neither of them can talk to the FreeBSD server then
the problem is with the FreeBSD server. If you have tcpdump running
on the FreeBSD server and you still see traffic coming from the NT hosts,
then the problem is that the FreeBSD server is receiving but not 
transmitting. In this case, you should do ifconfig xl0 and see if the
OACTIVE flag is set. If it is, it means the driver has used up all its
DMA descriptors and the NIC hasn't acknowledged transmissions so it hasn't
been able to free any of them. 
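
For what it's worth, the OACTIVE condition can be modelled in user
space like this. It is a sketch, not driver code: the real flag is
IFF_OACTIVE, while the toy_ names and the ring size of 4 are invented
for the example. The point is only that the flag goes up when every
descriptor is in flight, and comes back down only once the NIC
acknowledges at least one transmission.

```c
#define TOY_TX_RING 4     /* hypothetical number of TX descriptors */
#define TOY_OACTIVE 0x1   /* stand-in for IFF_OACTIVE */

struct toy_softc {
    int      in_flight;   /* descriptors handed to the NIC, not yet acked */
    unsigned flags;
};

/* Try to hand one packet to the NIC. Returns 1 if a descriptor was
 * available, or 0 if the ring is full, in which case OACTIVE is set. */
int toy_tx(struct toy_softc *sc)
{
    if (sc->in_flight == TOY_TX_RING) {
        sc->flags |= TOY_OACTIVE;
        return 0;
    }
    sc->in_flight++;
    return 1;
}

/* The NIC acknowledged n transmissions: free that many descriptors
 * and clear OACTIVE if at least one was actually freed. */
void toy_txeof(struct toy_softc *sc, int n)
{
    if (n < 0)
        n = 0;
    if (n > sc->in_flight)
        n = sc->in_flight;
    sc->in_flight -= n;
    if (n > 0)
        sc->flags &= ~TOY_OACTIVE;
}
```

If toy_txeof() never gets called with a nonzero count, which is the
model's version of a NIC that has stopped acknowledging transmissions,
the flag stays set and every later toy_tx() fails.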


-Bill

-- 
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================

