From owner-freebsd-net  Fri Aug  4 14:32: 1 2000
Delivered-To: freebsd-net@freebsd.org
Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [18.24.4.193])
	by hub.freebsd.org (Postfix) with ESMTP id 69EDF37B96E
	for <freebsd-net@FreeBSD.ORG>; Fri,  4 Aug 2000 14:31:51 -0700 (PDT)
	(envelope-from wollman@khavrinen.lcs.mit.edu)
Received: (from wollman@localhost)
	by khavrinen.lcs.mit.edu (8.9.3/8.9.3) id RAA33779;
	Fri, 4 Aug 2000 17:31:44 -0400 (EDT)
	(envelope-from wollman)
Date: Fri, 4 Aug 2000 17:31:44 -0400 (EDT)
From: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
Message-Id: <200008042131.RAA33779@khavrinen.lcs.mit.edu>
To: bmah@cisco.com
Cc: freebsd-net@FreeBSD.ORG
Subject: Re: cvs commit: src/sys/netinet tcp_output.c 
In-Reply-To: <200008042107.e74L7Tu13995@bmah-freebsd-0.cisco.com>
References: <200008042014.QAA05452@cholla.INRS-Telecom.UQuebec.CA>
	<200008042107.e74L7Tu13995@bmah-freebsd-0.cisco.com>
Sender: owner-freebsd-net@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

<<On Fri, 04 Aug 2000 14:07:29 -0700, bmah@cisco.com (Bruce A. Mah) said:

> A RED queue would drop a random packet from the full queue.

Actually, no.  A RED queue would drop a packet as it is enqueued, at
random intervals which depend on the current length of the queue.
Many people get this wrong when the idea is first explained to them; I
know I did.  Read the paper for a full explanation of how this works.[1]

> If you can get those flows to back off (e.g. TCP congestion
> control),

Specifically, I think the problem with Archie's patch is that it might
result in TCP not backing off at all.  RED doesn't deal well with
unresponsive flows.

-GAWollman

[1] In essence, when the average length x [2] exceeds threshold
th_min, start dropping packets at probability proportional to (x -
th_min).  When x exceeds threshold th_max, drop all packets.  There
are good mathematical tricks to make this very efficient for packets
which are not dropped.[3]

[2] As computed using a low-pass filter (exponential weighted moving
average).

[3] Here is some code I wrote a long time ago, written along the lines
suggested in the paper.  Some of it is probably wrong, since I never
finished testing it:

static inline int
if_enq_drop(struct ifqueue *ifq, struct mbuf *m)
{
        int marked;
        u_int newlen, new_avglen;

        if (ifq->ifq_instlen) {
                newlen = ifq->ifq_instlen + 1;
                new_avglen = ifq->ifq_avg +
                        ((newlen << (RED_AVG_SCALE - RED_W_Q_SHIFT))
                         - (ifq->ifq_avg >> RED_W_Q_SHIFT));
        } else {
                newlen = 1;
                new_avglen = if_red_unidle(ifq);
        }

        if (new_avglen < RED_MIN_TH_SCALED) {
                ifq->ifq_red_count = -1;
                marked = 0;
        } else {
                marked = if_red_domark(new_avglen, ifq);
        }
        if (marked) {
                ifq->ifq_drops++;
/* XXX DEBUG */
                printf("%p: marked a packet, instlen %u, avglen %u\n",
                       (void *)ifq,
                       ifq->ifq_instlen, ifq->ifq_avg);
/* XXX END DEBUG */
                return 0;
        }
        m->m_nextpkt = 0;
        if (ifq->ifq_tail == 0)
                ifq->ifq_head = m;
        else
                ifq->ifq_tail->m_nextpkt = m;
        ifq->ifq_tail = m;
        ifq->ifq_instlen = newlen;
        if (newlen > ifq->ifq_max_instlen)
                ifq->ifq_max_instlen = newlen;
        ifq->ifq_avg = new_avglen;
        return 1;
}

/*
 * We are called knowing that avglen is already over the threshold.
 * NB: the calculations involving R_OVER_PB are a bit suspect.  XXX
 * should test.
 */
int
if_red_domark(avglen, ifq)
        u_int avglen;
        struct ifqueue *ifq;
{
        u_int mark_prob;
        int mark_shift;
        int marked = 0;

        if (avglen < RED_MAX_TH_SCALED) {
                ifq->ifq_red_count++;
                mark_prob = (ifq->ifq_avg >> RED_PROB_C_1_SHIFT)
                                - RED_PROB_C_2_SCALED;
                /*
                 * We want to quickly approximate R/p_b, where
                 * R is a fixed-point between 0 and 1 with scale shift
                 * RED_RANDOM_SCALE, and p_b is a fixed-point between
                 * 0 and 1 with scale shift RED_AVG_SCALE.
                 * fls() gives us the highest-order bit set (numbered
                 * 1(LSB) to 32(MSB)) or zero if no bits are set.  This
                 * -log2(p_b), but for the scale factor.  We avoid using
                 * a zero value directly to make things less complicated.
                 */
                if (mark_prob == 0) mark_prob = 1; /* smallest representable */
                mark_shift = fls(mark_prob) - 1;
                /*
                 * Now, mark_shift is in the range [0..RED_AVG_SCALE];
                 * log2(p_b) would be in the range [-RED_AVG_SCALE..0],
                 * which is to say, it is mark_shift - RED_AVG_SCALE.
                 * Still with me?  C shifts only work in one direction,
                 * so what we really need is -log2(p_b), which is thus
                 * RED_AVG_SCALE - mark_shift.  But, at the same time,
                 * we also need to unscale R, so we subtract RED_RANDOM_SCALE
                 * from that shift factor (to avoid overflow).  That can
                 * sometimes turn the sign negative, so we have to be able
                 * to do two-way shifts anyway.
                 */
                mark_shift = RED_AVG_SCALE - mark_shift - RED_RANDOM_SCALE;

#define R_OVER_PB(R)    (int)(mark_shift > 0 ? \
                              (R) << mark_shift : (R) >> -mark_shift)
                if (ifq->ifq_red_count > 0 
                    && ifq->ifq_red_count >= R_OVER_PB(ifq->ifq_random)) {
                        marked = 1;
                        ifq->ifq_red_count = 0;
                }
                if (ifq->ifq_red_count == 0)
                        ifq->ifq_random = random() & RED_RANDOM_MASK;
        } else {
                marked = 1;
                ifq->ifq_red_count = -1;
        }

        return marked;
}



--
Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
wollman@lcs.mit.edu  | O Siem / The fires of freedom 
Opinions not those of| Dance in the burning flame
MIT, LCS, CRS, or NSA|                     - Susan Aglukark and Chad Irschick


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message