From owner-freebsd-net Fri Aug 4 14:32: 1 2000 Delivered-To: freebsd-net@freebsd.org Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [18.24.4.193]) by hub.freebsd.org (Postfix) with ESMTP id 69EDF37B96E for ; Fri, 4 Aug 2000 14:31:51 -0700 (PDT) (envelope-from wollman@khavrinen.lcs.mit.edu) Received: (from wollman@localhost) by khavrinen.lcs.mit.edu (8.9.3/8.9.3) id RAA33779; Fri, 4 Aug 2000 17:31:44 -0400 (EDT) (envelope-from wollman) Date: Fri, 4 Aug 2000 17:31:44 -0400 (EDT) From: Garrett Wollman Message-Id: <200008042131.RAA33779@khavrinen.lcs.mit.edu> To: bmah@cisco.com Cc: freebsd-net@FreeBSD.ORG Subject: Re: cvs commit: src/sys/netinet tcp_output.c In-Reply-To: <200008042107.e74L7Tu13995@bmah-freebsd-0.cisco.com> References: <200008042014.QAA05452@cholla.INRS-Telecom.UQuebec.CA> <200008042107.e74L7Tu13995@bmah-freebsd-0.cisco.com> Sender: owner-freebsd-net@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org < A RED queue would drop a random packet from the full queue. Actually, no. A RED queue would drop a packet as it is enqueued, at random intervals which depend on the current length of the queue. Many people get this wrong when the idea is first explained to them; I know I did. Read the paper for a full explanation of how this works.[1] > If you can get those flows to back off (e.g. TCP congestion > control), Specifically, I think the problem with Archie's patch is that it might result in TCP not backing off at all. RED doesn't deal well with unresponsive flows. -GAWollman [1] In essence, when the average length x [2] exceeds threshold th_min, start dropping packets at probability proportional to (x - th_min). When x exceeds threshold th_max, drop all packets. There are good mathematical tricks to make this very efficient for packets which are not dropped.[3] [2] As computed using a low-pass filter (exponential weighted moving average). [3] Here is some code I wrote a long time ago, written along the lines suggested in the paper. Some of it is probably wrong, since I never finished testing it: static inline int if_enq_drop(struct ifqueue *ifq, struct mbuf *m) { int marked; u_int newlen, new_avglen; if (ifq->ifq_instlen) { newlen = ifq->ifq_instlen + 1; new_avglen = ifq->ifq_avg + ((newlen << (RED_AVG_SCALE - RED_W_Q_SHIFT)) - (ifq->ifq_avg >> RED_W_Q_SHIFT)); } else { newlen = 1; new_avglen = if_red_unidle(ifq); } if (new_avglen < RED_MIN_TH_SCALED) { ifq->ifq_red_count = -1; marked = 0; } else { marked = if_red_domark(new_avglen, ifq); } if (marked) { ifq->ifq_drops++; /* XXX DEBUG */ printf("%p: marked a packet, instlen %u, avglen %u\n", (void *)ifq, ifq->ifq_instlen, ifq->ifq_avg); /* XXX END DEBUG */ return 0; } m->m_nextpkt = 0; if (ifq->ifq_tail == 0) ifq->ifq_head = m; else ifq->ifq_tail->m_nextpkt = m; ifq->ifq_tail = m; ifq->ifq_instlen = newlen; if (newlen > ifq->ifq_max_instlen) ifq->ifq_max_instlen = newlen; ifq->ifq_avg = new_avglen; return 1; } /* * We are called knowing that avglen is already over the threshold. * NB: the calculations involving R_OVER_PB are a bit suspect. XXX * should test. */ int if_red_domark(avglen, ifq) u_int avglen; struct ifqueue *ifq; { u_int mark_prob; int mark_shift; int marked = 0; if (avglen < RED_MAX_TH_SCALED) { ifq->ifq_red_count++; mark_prob = (ifq->ifq_avg >> RED_PROB_C_1_SHIFT) - RED_PROB_C_2_SCALED; /* * We want to quickly approximate R/p_b, where * R is a fixed-point between 0 and 1 with scale shift * RED_RANDOM_SCALE, and p_b is a fixed-point between * 0 and 1 with scale shift RED_AVG_SCALE. * fls() gives us the highest-order bit set (numbered * 1(LSB) to 32(MSB)) or zero if no bits are set. This * -log2(p_b), but for the scale factor. We avoid using * a zero value directly to make things less complicated. */ if (mark_prob == 0) mark_prob = 1; /* smallest representable */ mark_shift = fls(mark_prob) - 1; /* * Now, mark_shift is in the range [0..RED_AVG_SCALE]; * log2(p_b) would be in the range [-RED_AVG_SCALE..0], * which is to say, it is mark_shift - RED_AVG_SCALE. * Still with me? C shifts only work in one direction, * so what we really need is -log2(p_b), which is thus * RED_AVG_SCALE - mark_shift. But, at the same time, * we also need to unscale R, so we subtract RED_RANDOM_SCALE * from that shift factor (to avoid overflow). That can * sometimes turn the sign negative, so we have to be able * to do two-way shifts anyway. */ mark_shift = RED_AVG_SCALE - mark_shift - RED_RANDOM_SCALE; #define R_OVER_PB(R) (int)(mark_shift > 0 ? \ (R) << mark_shift : (R) >> -mark_shift) if (ifq->ifq_red_count > 0 && ifq->ifq_red_count >= R_OVER_PB(ifq->ifq_random)) { marked = 1; ifq->ifq_red_count = 0; } if (ifq->ifq_red_count == 0) ifq->ifq_random = random() & RED_RANDOM_MASK; } else { marked = 1; ifq->ifq_red_count = -1; } return marked; } -- Garrett A. Wollman | O Siem / We are all family / O Siem / We're all the same wollman@lcs.mit.edu | O Siem / The fires of freedom Opinions not those of| Dance in the burning flame MIT, LCS, CRS, or NSA| - Susan Aglukark and Chad Irschick To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-net" in the body of the message