From owner-freebsd-current@FreeBSD.ORG Mon Nov 28 15:48:02 2005
X-Original-To: freebsd-current@FreeBSD.ORG
Delivered-To: freebsd-current@FreeBSD.ORG
Received: by hub.freebsd.org (Postfix, from userid 618)
    id 428F516A422; Mon, 28 Nov 2005 15:48:02 +0000 (GMT)
In-Reply-To: <20051128100731.GA9802@cs.rmit.edu.au> from Emil Mikulic at "Nov 28, 2005 09:07:32 pm"
To: emil@cs.rmit.edu.au (Emil Mikulic)
Date: Mon, 28 Nov 2005 15:48:02 +0000 (GMT)
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Message-Id: <20051128154802.428F516A422@hub.freebsd.org>
From: wpaul@FreeBSD.ORG (Bill Paul)
Cc: freebsd-current@FreeBSD.ORG, glebius@FreeBSD.ORG
Subject: Re: bge driver autoneg failure and system-wide stalls
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Mon, 28 Nov 2005 15:48:02 -0000

> On Mon, Nov 28, 2005 at 09:17:58AM +0000, Bill Paul wrote:
> > > On Fri, Nov 25, 2005 at 04:22:28PM +0300, Gleb Smirnoff wrote:
> > > > On Fri, Nov 25, 2005 at 01:20:41PM +1100, Emil Mikulic wrote:
> > In your original e-mail, you write:
> >
> > > I have a network port with bad wiring in the walls - a cable tester
> > > shows only wires 1,2,3 and 6 are actually connected.
> >
> > Actually, this is not 'bad' wiring. It's correct for 10/100 ethernet
>
> That was probably the thinking when the office got wired up initially.
>
> > as long as a) the cabling is actually cat5, and not moldy old cat3
> > or something, and b) the four wires are actually connected in the right
> > sequence. Pins 1 and 2 form one pair, and pins 3 and 6 form the second
> > pair. A typical installation may have the orange/orange+white pair
> > on pins 1 and 2, and the blue/blue+white pair on 3 and 6. And both
> > sides must match.
> > If it's not done this way, then while you may have
> > a DC path between all 4 pins on each side, you won't be getting the
> > proper noise cancellation effect of twisted pair cabling. This can
> > cause signal distortion, dropped packets, and possibly botched autoneg.
> >
> > You didn't say if you checked for this though, so we can't speculate
> > if this is really the problem.
>
> I can't see what's in the walls but I attached the two parts of the
> cable tester to both ends of the line and 1-2-3-6 on one end was 1-2-3-6
> on the other end.

It sounds correct then.

> > A couple things you neglected to mention (and which Gleb failed to
> > ask you about):
>
> (yeah, sorry about that. I spent like an hour with the original
> message in a text editor then after I sent it I kept remembering things
> I'd forgotten)
>
> > - Exactly what kind of switch is on the other end of this wiring?
>
> The switch is managed, and does gigabit. It's a Nortel but I don't
> know the exact model off by heart (I can look it up tomorrow if it
> matters)

It may, but only if that helps determine whether the management API lets
you turn off the ability to announce gigE support.

> > - Is the port that corresponds to this wall jack a gigabit ethernet
> >   port, or just 10/100?
>
> All the wiring in the walls is, as you said, for 100Mbit. The switch
> and my network card are gigabit.

Okay, that's what I suspected.

> > If it is a gigE port, then you're being silly.
>
> Yeah, I thought so. =(
>
> > 4 pairs are required for gigE. Period. The NWAY autonegotiation
> > exchange can take place over just 2 pairs, but the gigE signalling
> > scheme requires all 4 pairs to be present in order to establish a
> > link. If there's just two pairs connected, both sides will
> > announce that they support gigabit speeds, and both sides will try
> > configuring themselves for gigE operation, but no link will ever be
> > established.
>
> I understand that gigabit will never work over the current wiring.
> I've accepted that and moved on. =)
>
> But are you saying the autonegotiation will never work over the current
> wiring either? As in, it can't try a slower link speed?

Well, that's just it: autoneg is working just fine. Both link partners
are happily autonegotiating a gigabit link -- a gigabit link which will
never be established thanks to the two missing pairs. So both sides keep
retrying endlessly without getting anywhere.

The autoneg behavior is for both sides to announce all the modes they
support, and for each to compare what the link partner supports against
what it supports itself, and choose the best common mode. The selection
algorithm is to try 1000Mbps first, then 100 full duplex, then 100 half
duplex, then 10 full duplex, and finally 10 half duplex. Since both sides
claim they support 1000Mbps, that's the mode they'll try to use.

Unfortunately, when you do 'ifconfig bge0 media autoselect' on FreeBSD,
the interface will announce that it supports all three speeds (10, 100
and 1000). There's no ifconfig option to make it announce just 10 and 100,
which is what you'd need to do since those are the only modes which will
work on your existing cabling. And the switch will do the same. So both
sides will think they should be trying for the best possible mode, which
is 1000Mbps. But that mode won't work since those two pairs aren't
available.

So somehow, you have to convince one side or the other to only announce
10/100. If the switch has a management option to turn off gigE mode on
one of its ports, then that would be one way. If not, you can try
modifying brgphy.c as follows:

- Find the brgphy_mii_phy_auto() routine.
- It looks like this:

        int                     ktcr = 0;

        brgphy_loop(mii);
        brgphy_reset(mii);
        ktcr = BRGPHY_1000CTL_AFD|BRGPHY_1000CTL_AHD;
        if (brgphy_mii_model == MII_MODEL_xxBROADCOM_BCM5701)
                ktcr |= BRGPHY_1000CTL_MSE|BRGPHY_1000CTL_MSC;
        PHY_WRITE(mii, BRGPHY_MII_1000CTL, ktcr);
        ktcr = PHY_READ(mii, BRGPHY_MII_1000CTL);
        DELAY(1000);
        PHY_WRITE(mii, BRGPHY_MII_ANAR,
            BMSR_MEDIA_TO_ANAR(mii->mii_capabilities) | ANAR_CSMA);
        DELAY(1000);
        PHY_WRITE(mii, BRGPHY_MII_BMCR,
            BRGPHY_BMCR_AUTOEN | BRGPHY_BMCR_STARTNEG);
        PHY_WRITE(mii, BRGPHY_MII_IMR, 0xFF00);
        return (EJUSTRETURN);

- Make it look like this:

        int                     ktcr = 0;

#define ANNOUNCE_10_100_ONLY

        brgphy_loop(mii);
        brgphy_reset(mii);
#ifndef ANNOUNCE_10_100_ONLY
        ktcr = BRGPHY_1000CTL_AFD|BRGPHY_1000CTL_AHD;
        if (brgphy_mii_model == MII_MODEL_xxBROADCOM_BCM5701)
                ktcr |= BRGPHY_1000CTL_MSE|BRGPHY_1000CTL_MSC;
#endif
        PHY_WRITE(mii, BRGPHY_MII_1000CTL, ktcr);
        ktcr = PHY_READ(mii, BRGPHY_MII_1000CTL);
        DELAY(1000);
        PHY_WRITE(mii, BRGPHY_MII_ANAR,
            BMSR_MEDIA_TO_ANAR(mii->mii_capabilities) | ANAR_CSMA);
        DELAY(1000);
        PHY_WRITE(mii, BRGPHY_MII_BMCR,
            BRGPHY_BMCR_AUTOEN | BRGPHY_BMCR_STARTNEG);
        PHY_WRITE(mii, BRGPHY_MII_IMR, 0xFF00);
        return (EJUSTRETURN);

This should force it to announce that it only supports 10/100 modes.
In theory, this should get you a 100Mbps full duplex link with the
switch. When you actually get gigE cabling, remove the
'#define ANNOUNCE_10_100_ONLY' line.

> > If you manually override the autonegotiation in this case, you should
> > do "ifconfig bge0 media 100baseTX" only. Do not specify full duplex.
> > This won't work.
> >
> > [...]
> >
> > If you manually specify full, this will create a duplex mismatch, and
> > you'll get rotten throughput.
>
> 10baseT/UTP and 100baseTX worked in both half-duplex and full-duplex
> but I didn't check throughput...
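
To make the mode selection described earlier a bit more concrete, here
is a rough user-space sketch of the "best common mode" rule. It is only
an illustration: the ABL_* bit names and the best_common_mode() helper
are made up for this example (they are not the real MII register bits or
anything in brgphy.c), and the priority list is the simplified one given
above.

```c
#include <stdio.h>

/*
 * Hypothetical ability bits, for illustration only. These are NOT the
 * real ANAR/1000CTL register layouts used by the PHY driver.
 */
enum {
        ABL_10_HDX  = 1 << 0,
        ABL_10_FDX  = 1 << 1,
        ABL_100_HDX = 1 << 2,
        ABL_100_FDX = 1 << 3,
        ABL_1000    = 1 << 4,
        ABL_ALL     = 0x1f
};

/* Simplified priority order from the text, best mode first. */
static const unsigned int abl_priority[] = {
        ABL_1000, ABL_100_FDX, ABL_100_HDX, ABL_10_FDX, ABL_10_HDX
};

/*
 * Return the highest-priority mode announced by both sides,
 * or 0 if the two sides have no mode in common.
 */
unsigned int
best_common_mode(unsigned int local, unsigned int partner)
{
        unsigned int common = local & partner;
        size_t i;

        for (i = 0; i < sizeof(abl_priority) / sizeof(abl_priority[0]); i++)
                if (common & abl_priority[i])
                        return (abl_priority[i]);
        return (0);
}

/* Print the selected mode in readable form. */
void
report_mode(unsigned int mode)
{
        switch (mode) {
        case ABL_1000:    printf("1000Mbps\n"); break;
        case ABL_100_FDX: printf("100 full duplex\n"); break;
        case ABL_100_HDX: printf("100 half duplex\n"); break;
        case ABL_10_FDX:  printf("10 full duplex\n"); break;
        case ABL_10_HDX:  printf("10 half duplex\n"); break;
        default:          printf("no common mode\n"); break;
        }
}
```

With both sides announcing everything, best_common_mode(ABL_ALL, ABL_ALL)
picks ABL_1000, which is the situation on your wiring today. Masking
ABL_1000 out of either side's set makes the same call return ABL_100_FDX,
which is the outcome the brgphy.c change is aiming for.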
>
> Important bit:
>
> > Also, the DELAY(10) here can probably be replaced with a tsleep() or
> > something, which will allow the CPU to do other work while waiting for
> > the PHY instead of hard busywaiting and blocking up the whole system
> > (allowing a reschedule here should not hurt).
>
> This would be cool!
>
> I realise that I am doing stupid things with broken wiring and that it
> won't work, but if the "periodic lockup" problem could be fixed, that
> would be an improvement to the bge(4) driver, IMO.
>
> When I first noticed this symptom, it took quite a while to go from
> "FreeBSD is doing this annoying lockup thing every few seconds and I
> keep losing keypresses" to "oh, it's the network card / cabling /
> driver"

The above change should get you a link and will prevent the periodic
lockup from occurring. I need to actually test the tsleep() change before
I can give you a patch to try.

-Bill

--
=============================================================================
-Bill Paul            (510) 749-2329 | Senior Engineer, Master of Unix-Fu
                 wpaul@windriver.com | Wind River Systems
=============================================================================
              you're just BEGGING to face the moose
=============================================================================