Date:      Sun, 21 Mar 1999 23:40:20 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        tlambert@primenet.com, hasty@rah.star-gate.com, wes@softweyr.com, ckempf@enigami.com, wpaul@skynet.ctr.columbia.edu, freebsd-hackers@FreeBSD.ORG
Subject:   Re: Gigabit ethernet -- what am I doing wrong?
Message-ID:  <199903220740.XAA16463@apollo.backplane.com>
References:   <199903220545.WAA10719@usr01.primenet.com>

:What we're talking about here is overloading the equipment, and then
:having it fail in such a way that everyone who is loading it takes
:the hit "fairly".
:
:...
:
:>     head-of-queue blocking, a combination of software on the gigaswitch
:>     and the way the gigabit switch's hardware queues packets for transfer
:>     between cards. 
:
:Sounds like they failed to implement QOS mechanisms and source quench
:properly.  My general response to technology failures is that there is
:a responsible human, somewhere.  I know that they had two gigaswitches
:at one point in time, and it's obvious from a technical point of view
:that two gigaswitches are worse than one gigaswitch.

    The MAE-WEST gigaswitch failure has nothing to do with QOS or
    source quench.  There is no such thing as source quench on an
    FDDI/T3 switch.  Nobody in their right mind uses source quench
    in a router or switch matrix.

:>     The problem at MAE-WEST had absolutely nothing to do with this.  The
:>     problem occurred *inside* a *single* switch.  If one port overloaded on
:>     that switch, all the ports started having problems due to head-of-line
:>     blocking.
:
:Look.  You can only shove as many bits down a pipe as the pipe will
:take.  If it's one port that's killing you, then you start dropping
:packets to and from that port, and punish the port.
:
:While there were humans engaged in overcommit involved, I really have
:a hard time understanding a design that would allow humans doing what
:humans would obviously do, given the circumstances, to cause problems.
:
:If the thing can't handle N/2 ports running at some speed X on each
:port, then the ports shouldn't be run at speed X.

    The overcommit problem is not trivially solved when the blockage runs
    at the hardware level, because problems can occur without the router
    actually overcommitting the destination card's buffer.

    The scheduling problem relates heavily to avoiding DMA blockages
    in switch matrices.  DMA blockages occur even with full crossbars,
    and the problem cannot be solved simply by bumping up switch
    matrix performance.  The problem occurs when several source cards
    attempt to DMA packets to the same destination card.  Even if the
    destination card has sufficient buffer memory to hold the packets,
    and even if no overcommit ( time wise ) would occur, most cards
    typically cannot handle the bandwidth of several ( > 1 ) other
    cards sending to them simultaneously.  In fact, most switch
    matrices can only route one source to any given destination at a
    time -- the parallelism occurs because the switch matrix can route
    several sources to several different destinations simultaneously,
    not several sources to the same destination simultaneously.

    This creates a situation where a source card can block in DMA.  It is
    statistically possible for packets to be arranged such that switch
    performance is seriously degraded, increasing latency significantly
    ( even beyond what might be considered acceptable ) *EVEN* when buffer
    space is available.
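
    To make this concrete, here is a rough illustrative simulation in C
    ( purely hypothetical -- nothing resembling the gigaswitch's actual
    firmware ).  Each source card offers only the packet at the head of
    a single FIFO, and the crossbar grants each destination to at most
    one source per time slot.  With uniform random traffic the delivered
    fraction settles well below line rate, even though no buffer ever
    fills up:

/*
 * Hypothetical sketch of head-of-line blocking: every source card is
 * always backlogged, offers only its head packet, and each destination
 * accepts a DMA from at most one source per time slot.  A source whose
 * head packet targets a busy destination wastes the whole slot, even
 * though packets further back in its FIFO may target idle destinations.
 */
#include <stdio.h>
#include <stdlib.h>

#define NPORTS  8
#define NSLOTS  100000

int
main(void)
{
    int head_dst[NPORTS];   /* destination of each source's head packet */
    long delivered = 0;

    srandom(1);
    for (int s = 0; s < NPORTS; s++)
        head_dst[s] = random() % NPORTS;

    for (long slot = 0; slot < NSLOTS; slot++) {
        int dst_busy[NPORTS] = { 0 };

        for (int s = 0; s < NPORTS; s++) {
            int d = head_dst[s];

            if (!dst_busy[d]) {
                /* Crossbar grants destination d to source s. */
                dst_busy[d] = 1;
                delivered++;
                head_dst[s] = random() % NPORTS;  /* next head packet */
            }
            /* else: source s is blocked for this entire slot. */
        }
    }

    printf("delivered %.3f of line rate\n",
        (double)delivered / ((double)NSLOTS * NPORTS));
    return 0;
}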

    Switch scheduling is required to avoid this problem -- to prevent
    multiple sources from trying to DMA packets to the same destination
    at the same time in the first place, and instead to use those time
    slices to DMA packets going to other destinations.
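
    A rough sketch of that scheduling idea ( again purely illustrative,
    not any vendor's actual algorithm ): keep one queue per destination
    on each source card and let a simple round-robin matcher grant at
    most one source to each destination per slot, so a source whose
    first-choice destination is taken can still spend the slot sending
    to some other destination:

/*
 * Hypothetical sketch: per-destination ("virtual output") queues on
 * each source card plus a round-robin matcher.  Each destination is
 * granted to at most one source per slot, mirroring the crossbar
 * constraint, but a blocked source falls through to another
 * destination instead of stalling behind one hot port.
 */
#include <stdio.h>
#include <string.h>

#define NPORTS 8

/* voq[s][d] = number of packets queued on source s destined for d. */
static int voq[NPORTS][NPORTS];

/* Round-robin pointers so no destination is persistently favored. */
static int rr[NPORTS];

/*
 * One scheduling pass: match[s] = destination granted to source s this
 * slot, or -1 if the source stays idle.
 */
static void
schedule_slot(int match[NPORTS])
{
    int dst_taken[NPORTS];

    memset(dst_taken, 0, sizeof(dst_taken));

    for (int s = 0; s < NPORTS; s++) {
        match[s] = -1;
        for (int i = 0; i < NPORTS; i++) {
            int d = (rr[s] + i) % NPORTS;

            if (voq[s][d] > 0 && !dst_taken[d]) {
                match[s] = d;
                dst_taken[d] = 1;
                rr[s] = (d + 1) % NPORTS;
                break;
            }
        }
    }
}

int
main(void)
{
    int match[NPORTS];

    /* Three sources all have traffic for port 2, plus other traffic. */
    voq[0][2] = voq[1][2] = voq[3][2] = 4;
    voq[1][5] = 2;
    voq[3][7] = 1;

    schedule_slot(match);

    for (int s = 0; s < NPORTS; s++)
        if (match[s] >= 0)
            printf("slot 0: source %d -> destination %d\n", s, match[s]);
    return 0;
}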

    Buffer management requires scheduling too.  Not only that, it requires
    dynamic queue sizing on the source card, because it is on the source
    card where dropping a packet ( prior to the packet traversing the switch
    matrix and being enqueued in the destination buffer ) yields the best
    fairness when a destination is overcommitted.  Unfortunately, the best
    place to drop a packet when the source and destination speeds are
    mismatched is on the destination buffer queue.  

    The scheduler must deal with these two clashing problems as well, in
    order both to speed-match the ports AND to drop packets from the
    correct source(s) when the destination buffer is overcommitted.  The
    scheduler in a real router/switch must also deal with hardware DMA
    conflicts ( blockages ) and must stabilize buffer latency under a
    wide range of load conditions and port<->port combinations.
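
    A rough sketch of the source-side half of that tradeoff ( the limit
    formula and the contention count are made up for illustration ):
    shrink each source card's per-destination queue limit as more
    sources contend for that destination, so drops happen before the
    packet ever crosses the switch matrix and the loss is spread across
    the offending sources:

/*
 * Hypothetical sketch of source-side buffer management with dynamic
 * queue sizing.  Each source card enforces a per-destination queue
 * limit that shrinks as more sources advertise traffic for that
 * destination, dropping at the source rather than after the packet has
 * consumed a trip through the switch matrix.
 */
#include <stdbool.h>
#include <stdio.h>

#define QUEUE_MAX   256     /* per-destination limit when uncontended */

struct voq {
    int len;                /* packets currently queued for this dest */
};

/*
 * 'contenders' is the number of source cards currently advertising
 * traffic for this destination.  In a real switch it would come from
 * the scheduler's global view; here it is just a parameter.
 */
static int
queue_limit(int contenders)
{
    if (contenders < 1)
        contenders = 1;
    /* Shrink each source's share as the destination gets busier. */
    return QUEUE_MAX / contenders;
}

/* Returns true if the packet is accepted, false if it is dropped. */
static bool
voq_enqueue(struct voq *q, int contenders)
{
    if (q->len >= queue_limit(contenders))
        return false;       /* drop at the source, before the matrix */
    q->len++;
    return true;
}

int
main(void)
{
    struct voq q = { .len = 120 };

    /* Same queue depth, judged against two contention levels. */
    printf("1 contender:  %s\n", voq_enqueue(&q, 1) ? "accept" : "drop");
    printf("4 contenders: %s\n", voq_enqueue(&q, 4) ? "accept" : "drop");
    return 0;
}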

:>     The solution to this at MAE-WEST was to clamp down on the idiots who
:>     were selling transit at MAE-WEST and overcommitting their ports, plus
:
:With respect, technology should operate in the absence of human
:imposition of policy.  It should have been technically impossible
:for the idiots to successfully engage in the behaviour in the first
:place, and if it wasn't, then that's a design problem with the
:gigaswitches.

    With respect, you are assuming that the problems can be solved trivially. 
    These are *NOT* trivial problems.  Very not trivial problems.  Not even
    *CLOSE* to trivial problems.  I can't repeat this enough times.


:					Terry Lambert
:					terry@lambert.org

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>






