Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 5 May 2002 14:05:52 -0700 (PDT)
From:      Brett Glass <brett@lariat.org>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   bin/37777: A single lost packet on a (userland) PPP connection causes long-term disruption and a "storm" of requests and acknowledgements on the CCP layer
Message-ID:  <200205052105.g45L5qxP081138@nwww.freebsd.org>

next in thread | raw e-mail | index | archive | help

>Number:         37777
>Category:       bin
>Synopsis:       A single lost packet on a (userland) PPP connection causes long-term disruption and a "storm" of requests and acknowledgements on the CCP layer
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun May 05 14:10:02 PDT 2002
>Closed-Date:
>Last-Modified:
>Originator:     Brett Glass
>Release:        4.5-RELEASE-P4
>Organization:
LARIAT
>Environment:
All machines tested are using 4.5-RELEASE-P4.
>Description:
We are using PPPoE over a wireless connection which very occasionally
drops packets -- typically a few an hour. However, whenever a dropped
packet occurs, the link (which is using "deflate" compression) stalls
for long periods (sometimes seconds, sometimes tens of minutes). Turning
on logging of CCP (set log +ccp) reveals that, instead of resetting the
compression dictionary and gracefully continuing, the two nodes are sending a "storm" of redundant refresh requests and cascading errors. The problem does not seem to be limited to PPPoE or wireless communications, but rather crops up more obviously in these situations because the underlying transport is not reliable. (Modern modems virtually always correct errors unless the connection is dropped completely.) The same problem could crop up on a congested hardwired Ethernet. PPP does not assume a reliable transport, and so should be sufficiently resilient to deal with occasional lost packets.

When a packet is dropped and CCP logging is enabled, the PPP log shows a long stream of messages such as the following. (Note that the first indicates the dropped packet.) The cascade of errors can last for a long time. The glitch from which the messages below are an excerpt continued for more than 10 minutes. (Apologies for the long log excerpt, but it helps to show the severity and longevity of the outage and the thrashing which is occurring.)

May  5 13:19:42 <daemon.info> workhorse ppp[1067]: CCP: DeflateInput: Seq error: Got 397, expected 396
May  5 13:19:42 <daemon.info> workhorse ppp[1067]: CCP: deflink: SendResetReq(5) state = Opened
May  5 13:19:42 <daemon.info> workhorse ppp[1067]: CCP: deflink: SendResetReq(5) state = Opened
May  5 13:19:42 <daemon.info> workhorse ppp[1067]: CCP: deflink: RecvResetAck(5) state = Opened
May  5 13:19:42 <daemon.info> workhorse ppp[1067]: CCP: Deflate: Input channel reset
May  5 13:19:42 <daemon.info> workhorse ppp[1067]: CCP: deflink: RecvResetAck(5) state = Opened
May  5 13:19:42 <daemon.info> workhorse ppp[1067]: CCP: deflink: Duplicate ResetAck (resetting again)
May  5 13:19:42 <daemon.info> workhorse ppp[1067]: CCP: Deflate: Input channel reset
May  5 13:20:05 <daemon.info> workhorse ppp[1067]: CCP: deflink: RecvResetReq(187) state = Opened
May  5 13:20:05 <daemon.info> workhorse ppp[1067]: CCP: Deflate: Output channel reset
May  5 13:20:05 <daemon.info> workhorse ppp[1067]: CCP: deflink: SendResetAck(187) state = Opened
May  5 13:20:05 <daemon.info> workhorse ppp[1067]: CCP: deflink: RecvResetReq(187) state = Opened
May  5 13:20:05 <daemon.info> workhorse ppp[1067]: CCP: Deflate: Output channel reset
May  5 13:20:05 <daemon.info> workhorse ppp[1067]: CCP: deflink: SendResetAck(187) state = Opened
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: DeflateInput: Seq error: Got 383, expected 382
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: SendResetReq(6) state = Opened
May  5 13:20:19 <daemon.info> workhorse last message repeated 3 times
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: RecvResetReq(188) state = Opened
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: Deflate: Output channel reset
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: SendResetAck(188) state = Opened
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: SendResetReq(6) state = Opened
May  5 13:20:19 <daemon.info> workhorse last message repeated 5 times
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: RecvResetReq(188) state = Opened
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: Deflate: Output channel reset
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: SendResetAck(188) state = Opened
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: Deflate: Input channel reset
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: Duplicate ResetAck (resetting again)
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: Deflate: Input channel reset
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: Unexpected ResetAck (id 6) ignored
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: Unexpected ResetAck (id 6) ignored
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: Unexpected ResetAck (id 6) ignored
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: Unexpected ResetAck (id 6) ignored
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: Unexpected ResetAck (id 6) ignored
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: Unexpected ResetAck (id 6) ignored
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 <daemon.info> workhorse ppp[1067]: CCP: deflink: Unexpected ResetAck (id 6) ignored

[Snip]


>How-To-Repeat:
Establish a PPPoE or PPP-over-UDP link between two computers and disrupt the link for a brief period. (Alternatively, one may instrument the code so that a packet is blocked now and then on the way to an interface to achieve the same effect.)
>Fix:
None known at this time. At first, we theorized that the problem was specific to the (preferred) "deflate" compression algorithm. However, while connections established using the "predictor-1" algorithm were better behaved, sometimes showed some thrashing behavior as well, with multiple renegotiations after only a single dropped packet.

Turning off ALL compression (by disabling and denying all available algorithms) is a workaround, but a highly inefficent one. (One of our primary motivations for the use of PPPoE is to compress the data stream, of which approximately 30%, according to our measurements, is highly compressible HTML.) Our tests seem to indicate that the problem is most likely in the compression portion of the code -- perhaps the state machine(s) related to CCP. They may also suggest a need for a frame retransmission mechanism within PPP that does not require a complete reset of the compression layer.
>Release-Note:
>Audit-Trail:
>Unformatted:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200205052105.g45L5qxP081138>