From owner-freebsd-hackers Tue Jun 1 9:14:28 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from skynet.ctr.columbia.edu (skynet.ctr.columbia.edu [128.59.64.70])
        by hub.freebsd.org (Postfix) with SMTP id E58C315084
        for ; Tue, 1 Jun 1999 09:13:46 -0700 (PDT)
        (envelope-from wpaul@skynet.ctr.columbia.edu)
Received: (from wpaul@localhost)
        by skynet.ctr.columbia.edu (8.6.12/8.6.9) id MAA06119;
        Tue, 1 Jun 1999 12:13:37 -0400
From: Bill Paul
Message-Id: <199906011613.MAA06119@skynet.ctr.columbia.edu>
Subject: Re: xl driver for 3Com
To: maret@axis.de (Alexander Maret)
Date: Tue, 1 Jun 1999 12:13:35 -0400 (EDT)
Cc: hackers@freebsd.org
In-Reply-To: <91DA20EC3C3DD211833400A0245A4EA9BA0E4F@erlangen01.axis.de> from "Alexander Maret" at Jun 1, 99 08:58:30 am
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Content-Length: 10765
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Of all the gin joints in all the towns in all the world, Alexander Maret
had to walk into mine and say:

> Hi,
>
> > Well maybe FreeBSD is transmitting packets much faster than Linux. :)
> > You still haven't actually measured the transfer speed, so there's
> > no way for us to know.
>
> Well, I'll do and report the results to you.

Actually, there's another difference between the behavior of the FreeBSD
and Linux drivers that could affect this. There may be a similar
difference between the FreeBSD and LoseNT drivers too, but I'm only
vaguely familiar with how LoseNT (or rather, NDIS miniport) drivers work
so I can't be sure about that.

Basically, the driver has one particular entry point for initiating
packet transmission. In Linux, this entry point gets handed a single
packet (stored in an skbuff) and a pointer to the device structure for
that particular driver instance. In FreeBSD, the start routine gets
handed a pointer to the ifnet structure for the interface (which is
associated with the driver).
The ifnet structure in turn contains the send queue which may have
several packets queued for transmission in the form of mbufs (the
maximum number depends on the size of ifq_maxlen, which the driver can
set at initialization time).

The difference is that in Linux, the driver sets up a single
DMA/transmit sequence at a time, because the transmit routine only gets
access to one packet at a time. In FreeBSD, the driver has access to the
entire send queue, where there may be several packets waiting. The
driver can then handle the send queue as it sees fit: it can pop the
first packet off the queue and transmit it, then wait for it to complete
before moving on to the next packet, or it can pop a whole series of
packets off the send queue and set up a large DMA transfer where all of
the packets will get transferred at once, rather than one at a time.
(The driver may still program the NIC to signal successful transmission
of each frame in the transfer just to make sure things are working
right, but it may also choose just to have the NIC acknowledge the last
packet in the transfer in order to reduce the number of interrupts. The
xl driver requests an interrupt only for the last frame in a DMA
transfer.)

The FreeBSD driver also sets the transmit threshold for best
performance. The transmit start threshold specifies how many bytes
should be transferred to the NIC's memory before it will begin putting
the data on the wire. The idea is that transfer of data from the host to
the NIC can proceed simultaneously with transmission of data from the
NIC to the wire: as new data arrives in the NIC, it gets dumped onto the
network as soon as possible. Note however that this only works well if
the host can keep up. With slower systems, you may see transmit
underruns where the NIC wants to transmit but the data isn't ready yet.
In this case, the driver will increase the transmit start threshold and
generate a message telling what happened.
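
To make the back-off concrete, here is a toy user-space model in C. It
is only a sketch, not code from the xl driver: the toy_* names, the
60-byte starting value and step, the 1536-byte ceiling, and the single
"host_lag" number standing in for how slow the host is are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_THRESH_START 60    /* assumed initial start threshold, bytes */
#define TOY_THRESH_STEP  60    /* assumed increment per underrun */
#define TOY_THRESH_MAX   1536  /* assumed ceiling: whole frame buffered */

/* Hypothetical per-interface state; not the real driver softc. */
struct toy_softc {
	int tx_thresh;	/* bytes the NIC must hold before it transmits */
	int underruns;	/* underruns seen so far */
};

static void
toy_init(struct toy_softc *sc)
{
	assert(sc != NULL);
	sc->tx_thresh = TOY_THRESH_START;
	sc->underruns = 0;
}

/*
 * Model one transmission. host_lag is the number of bytes the NIC needs
 * buffered up front so that a slow host never starves it mid-frame.
 * Returns 1 if the frame underran (and the threshold was raised),
 * 0 if it went out cleanly.
 */
static int
toy_transmit(struct toy_softc *sc, int host_lag)
{
	assert(sc != NULL);
	/* At the ceiling the whole frame is on the NIC: no underrun. */
	if (sc->tx_thresh >= host_lag || sc->tx_thresh >= TOY_THRESH_MAX)
		return 0;
	sc->underruns++;
	sc->tx_thresh += TOY_THRESH_STEP;
	if (sc->tx_thresh > TOY_THRESH_MAX)
		sc->tx_thresh = TOY_THRESH_MAX;
	printf("toy0: tx underrun, raising start threshold to %d bytes\n",
	    sc->tx_thresh);
	return 1;
}

/* Keep sending until a frame goes out cleanly; return the threshold. */
static int
toy_settle(struct toy_softc *sc, int host_lag)
{
	while (toy_transmit(sc, host_lag))
		;
	return sc->tx_thresh;
}
```

Calling toy_init() and then toy_settle() with a host_lag of 200 walks
the threshold up in 60-byte steps and prints one message per underrun.
A faster host (smaller host_lag) settles sooner and keeps more of the
overlap between DMA and transmission.
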
Eventually, the threshold will be increased enough that the transmit
underrun condition will not appear anymore.

This means that it's possible for the FreeBSD driver to transmit a whole
bunch of packets at once with very little time in between. If there's
another host transmitting back at the same time, this also means that
you're more likely to see collisions. However, it also means that you
get very fast transmissions, which is supposed to be a good thing.

Can you throttle back the xl driver? Well, yes, if you want. There are
two things you can do:

- Use a different default for the transmit start threshold. In
  xl_init(), the driver initializes sc->xl_tx_thresh to XL_MIN_FRAMELEN,
  which is 60. You can change this to 120 or 512, or even 1536 if you
  want to disable the threshold entirely and have the NIC wait until the
  whole packet has been DMAed into its memory before it starts a
  transmission.

- Make xl_start() only queue one packet at a time. This unfortunately
  requires some code changes (not big ones, but it's more than just
  changing a setting somewhere).

> > Grrr. I'm sorry, but I really don't think you're putting the pieces
> > together correctly. Setting the NT machine to full duplex should have
> > absolutely no effect on the FreeBSD host. It will completely screw up
> > performance since the LoseNT host will then no longer be set to match
> > the hub, but that's another problem. I strongly suspect that you're
> > not making the proper observations when your problem manifests and
> > just leaping to the conclusion that setting the LoseNT host to full
> > duplex crashes the FreeBSD host.
>
> I just tell you what i experienced.

Well, it's suspicious. It gives the impression that setting the LoseNT
host to full duplex mode somehow angered the computer gods, prompting
them to make your FreeBSD host spontaneously reboot. Also, there might
be more to it.
For example, if you were running the X Window system on the console at
the time, then there may have been a panic message printed which you
weren't able to see: if the kernel panics while the X server is running,
the console does not get set back to text mode, so any panic message
would be obscured. Usually the system will wait about 15 seconds before
it reboots after a panic, so the only evidence you will have that the
kernel has crashed is that the console will be frozen for about 15
seconds before the system resets. In order to really see what's going
on, you either need to set up a serial console or enable crash dumps
(the kernel message buffer will get saved as part of the crash dump, and
you can either look at it with gdb or go rummaging around the vmcore
with strings -a).

If you aren't running X on the console, then maybe you've set the system
to something besides virtual console 0: I'm pretty sure that kernel
printf()s (and panic messages) will only be printed on ttyv0, so if
you're on ttyv1 you may not get any warning.

And if you are on ttyv0 and still it blows up without warning, then you
need to investigate a bit to find the cause. Observing the LEDs on the
hub and on the NICs when you switch modes would be a good idea. You
might also try running tcpdump -i xl0 on the FreeBSD host and watching
what happens. You could also try compiling the kernel with 'options DDB'
to see if you can get the system to drop into the debugger instead of
resetting. Maybe the LoseNT host is sending some sort of chernobylgram
when it changes modes. Maybe the hub is getting confused and applying
line voltage to the transmit and receive lines in the FreeBSD host's
port. In general it's rare that the kernel will just reset without any
warning; most runtime kernel errors result in a page fault or some other
trap which will be signalled by a panic.

> > I don't think that's true.
>
> Why are you ignoring problems.
> On the one hand you want people to use
> your driver and complain if they switch to other cards and on the other
> hand you comment their problems with: That's not true.

Put yourself in my shoes. You have somebody who claims that setting his
LoseNT machine to full duplex crashes his FreeBSD machine without really
providing enough clues to determine a probable cause. There's no
evidence that the problem is 100% repeatable (did you try it several
times in a row and observe that it happened every time?) and no
supporting analysis to show any connection between the two events. All
we have is: action A was followed by event B, therefore A caused B.
That's not enough: you have to show that action A always leads to event
B, and you have to find enough evidence to show a plausible connection
between A and B. Pressing the 'full duplex' button in the 3Com NIC
configuration utility doesn't send a 'please crash now' message to the
FreeBSD host (although I'm sure Micro$oft wouldn't mind if it did).
There is some sequence of events that happens between action A and event
B, and since I'm way over here I can't automatically know what those
are.

As for the collisions, the fact that you have never observed them before
doesn't mean they are unusual. Again, you still don't say what kind of
transfer rates you observe. If the speed is comparable with what you've
observed with other machines (running at the same settings, that is),
then it's hard to say there's something really wrong.

> As I said in an earlier mail: If there are too many collisions
> my NT server crashes. Sometimes it doesn't crash but I can't send
> any data over the net either. I think it has something to do with the
> TCP/IP protocol because I then can't ping the NT machine nor the FreeBSD
> server. I have to reboot NT and everything works fine again. I can
> easily reproduce this error. Even with another NT machine which shows
> the same problem.

You should try to run tcpdump on the FreeBSD host when attempting to use
ping, as well as observe the activity LEDs on the hub. When the problem
happens, try to run ping on the NT machine and see if you observe any
packets with tcpdump on the FreeBSD machine. The problem here is that
you state that the machines don't communicate, but we need to know why.
The NT host may be failing to transmit, or it may be failing to receive.
Or the FreeBSD host may be failing to transmit or receive. We need to
know which.

Say you have three machines on the network: two NT hosts and the FreeBSD
server. Now you duplicate the problem with one NT host where it can't
ping anymore. Can the second NT host and the FreeBSD server still
communicate? Can the two NT hosts still communicate? If the second NT
machine and the FreeBSD server can still talk to each other, then the
problem is with the first NT host. If the two NT hosts can still
communicate, but neither of them can talk to the FreeBSD server, then
the problem is with the FreeBSD server.

If you have tcpdump running on the FreeBSD server and you still see
traffic coming from the NT hosts, then the problem is that the FreeBSD
server is receiving but not transmitting. In this case, you should do
ifconfig xl0 and see if the OACTIVE flag is set. If it is, it means the
driver has used up all its DMA descriptors and the NIC hasn't
acknowledged transmissions so it hasn't been able to free any of them.

-Bill

--
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!"
 - Ren Hoek, "Space Madness"
=============================================================================
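
The three-machine isolation test described in the message can be
written down as a small decision function. This is only a sketch in C
under stated assumptions: the toy_* names and the enum are hypothetical,
the three booleans are observations you would collect by hand (ping
between hosts, tcpdump on the FreeBSD side), and the last two outcomes
(FreeBSD not seeing any NT frames, and nothing talking to anything) are
extrapolations that the message does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical verdicts for the three-host test; names are made up. */
enum toy_verdict {
	SUSPECT_FIRST_NT,	/* second NT host and FreeBSD still talk */
	SUSPECT_BSD_TX,		/* FreeBSD receives but does not transmit */
	SUSPECT_BSD_RX,		/* assumption: FreeBSD sees no NT frames */
	SUSPECT_SHARED		/* assumption: hub or cabling, nobody talks */
};

/*
 * nt2_bsd_ok:         second NT host and FreeBSD server can communicate
 * nt1_nt2_ok:         the two NT hosts can communicate with each other
 * bsd_sees_nt_frames: tcpdump on the FreeBSD server shows NT traffic
 */
static enum toy_verdict
toy_triage(bool nt2_bsd_ok, bool nt1_nt2_ok, bool bsd_sees_nt_frames)
{
	if (nt2_bsd_ok)
		return SUSPECT_FIRST_NT;
	if (nt1_nt2_ok)
		return bsd_sees_nt_frames ? SUSPECT_BSD_TX : SUSPECT_BSD_RX;
	return SUSPECT_SHARED;
}

/* Suggested follow-up for each verdict, phrased as a hint only. */
static const char *
toy_next_step(enum toy_verdict v)
{
	switch (v) {
	case SUSPECT_FIRST_NT:
		return "look at the first NT host";
	case SUSPECT_BSD_TX:
		return "run ifconfig xl0 and check for the OACTIVE flag";
	case SUSPECT_BSD_RX:
		return "watch the hub and NIC LEDs while pinging";
	case SUSPECT_SHARED:
		return "check the hub and cabling";
	default:
		assert(!"unknown verdict");
		return "";
	}
}
```

For example, if the two NT hosts can still ping each other, neither
reaches the FreeBSD server, and tcpdump on the server still shows their
frames arriving, toy_triage(false, true, true) points at the transmit
side of the FreeBSD host, which is the case where the OACTIVE check
applies.
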