Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 10 Aug 1999 08:46:20 +0100
From:      Brian Somers <brian@FreeBSD.org.uk>
To:        Rowan Crowe <rowan@sensation.net.au>
Cc:        freebsd-isp@FreeBSD.ORG
Subject:   Re: ppp + multilink problem - followup 
Message-ID:  <199908100746.IAA00773@keep.lan.Awfulhak.org>
In-Reply-To: Your message of "Tue, 10 Aug 1999 00:32:21 %2B1000." <Pine.BSF.4.01.9908100014430.16934-100000@velvet.sensation.net.au> 

next in thread | previous in thread | raw e-mail | index | archive | help
> Just as a followup here's the relevant part of ppp.log on the server and
> my commentary...
> 
> Aug 10 00:14:22 core1-mel ppp[29387]: Phase: Using interface: tun0
> Aug 10 00:14:22 core1-mel ppp[29387]: Phase: deflink: Created in closed state
> Aug 10 00:14:22 core1-mel ppp[29213]: Phase: Using interface: tun1
> Aug 10 00:14:22 core1-mel ppp[29387]: Phase: bundle: Establish
> Aug 10 00:14:22 core1-mel ppp[29213]: Phase: deflink: Created in closed state
> Aug 10 00:14:22 core1-mel ppp[29387]: Phase: deflink: closed -> opening
> Aug 10 00:14:22 core1-mel ppp[29387]: Phase: 1: Cloned in closed state
> Aug 10 00:14:22 core1-mel ppp[29387]: Phase: 2: Cloned in closed state
> Aug 10 00:14:22 core1-mel ppp[29387]: Warning: remove: Cannot delete links that aren't closed
> Aug 10 00:14:22 core1-mel ppp[29387]: Warning: link deflink remove: Failed 2
> 
> ...is this a warning I can ignore or an indication of a problem?

It's as it says.  When ppp starts in -direct mode, it uses descriptor 
0 as the incoming link.  The default name is ``deflink''.  You're 
trying to delete this link, but it's open so ppp refuses.

You shouldn't be trying to delete it - and you shouldn't be cloning 
links wither unless you're going to use them.  This'll stop ppp from 
exiting....

> Aug 10 00:14:22 core1-mel ppp[29387]: Phase: PPP Started (direct mode).
> Aug 10 00:14:22 core1-mel ppp[29213]: Phase: bundle: Establish
> Aug 10 00:14:22 core1-mel ppp[29213]: Phase: deflink: closed -> opening
> Aug 10 00:14:22 core1-mel ppp[29213]: Phase: 1: Cloned in closed state
> Aug 10 00:14:22 core1-mel ppp[29213]: Phase: 2: Cloned in closed state
> Aug 10 00:14:22 core1-mel ppp[29213]: Warning: remove: Cannot delete links that aren't closed
> Aug 10 00:14:22 core1-mel ppp[29213]: Warning: link deflink remove: Failed 2
> Aug 10 00:14:22 core1-mel ppp[29213]: Phase: PPP Started (direct mode).
> Aug 10 00:14:23 core1-mel ppp[29387]: Phase: deflink: Connected!
> Aug 10 00:14:23 core1-mel ppp[29387]: Phase: deflink: opening -> lcp
> Aug 10 00:14:23 core1-mel ppp[29213]: Phase: deflink: Connected!
> Aug 10 00:14:23 core1-mel ppp[29213]: Phase: deflink: opening -> lcp
> Aug 10 00:14:23 core1-mel ppp[29387]: Phase: mp: Listening on /var/run/ppp--01-6d70
> Aug 10 00:14:23 core1-mel ppp[29387]: Phase:     First link: deflink
> Aug 10 00:14:23 core1-mel ppp[29387]: Phase: deflink: lcp -> open
> Aug 10 00:14:23 core1-mel ppp[29387]: Phase: bundle: Network
> Aug 10 00:14:24 core1-mel ppp[29213]: Phase: mp: Transfer link on /var/run/ppp--01-6d70
> Aug 10 00:14:24 core1-mel ppp[29387]: Phase: Receiving datalink
> Aug 10 00:14:24 core1-mel ppp[29213]: Phase: Transmitting datalink deflink
> Aug 10 00:14:24 core1-mel ppp[29213]: Phase: bundle: Dead
> Aug 10 00:14:24 core1-mel ppp[29387]: Phase: Rename link deflink to deflink-1
> Aug 10 00:14:24 core1-mel ppp[29387]: Phase: deflink-1: Transferred in lcp state
> Aug 10 00:14:24 core1-mel ppp[29387]: Phase: deflink-1: lcp -> open
> Aug 10 00:14:24 core1-mel ppp[29866]: Phase: 29213 -> 29866: Passed session control
> 
> Multilink successfully established, the 2 PIDs of ppp are 29387 and 29866

What really happened:

pid 29213 transferred it's link to pid 29387.  29387 now does all the 
work.  Normally, if you hadn't cloned things, ppp would now change 
it's process name ``session owner'' and wait to be killed by 29387 
(when it's finished with the link).

However, because 29213 still owns some other links (the ones you 
cloned), it fork()s and 29213 becomes the session owner, but the new 
pid (29866) continues to run with the cloned links.... after all, 
these could subsequently be changed to auto links and another 
connection could be established etc etc.

> *pulls out cable of one of the modems at the client end*
> 
> The client end has detected the carrier drop after a few seconds, but the
> server end is still sitting there doing nothing. According to 'who' the
> modem is still connected at the server end.

What's supposed to happen under these circumstances:

  29387 does a TIOCMGET 1 second later and notices that the link has no 
  carrier.  It does the equivalent of a ``down'' and realizes that the 
  link has a process id associated with it.  It sends a signal to 29213.

  29213 exits allowing the getty to be resurrected by init on the link. 
  Because the link is type ``direct'', 29387 automatically removes the 
  link.

  The end result:  There's a getty running, 29387 knows nothing about 
  the link and 29866 continues to fester.

> As soon as the first link drops all IP traffic stops, and it will not
> resume until both the stray PPP processes at the server are killed off and
> the entire thing is redialled.

This is because ppp still thinks there are two links and it's queued 
data on the dead link.  Until this data arrives at the other end for 
reassembly (which it never will), the data stream will be blocked.

This is the crux of the prolem - if we ignore the stray process 
that's the result of the ``clone''.

I've asked over and over that you enable debug logging so that we can 
see if ppp thinks that the link supports carrier.  It would also be 
nice if you could enable a diagnostic port (``set server /tmp/tun%d "" 
0177'' would be good) and connect to the relevant process and do a 
``link deflink-1 show physical''.  It should say near the top if 
carrier was found on the link.

> After a few minutes the lack of LQR replies causes this...
> 
> Aug 10 00:19:57 core1-mel ppp[29387]: Phase: deflink-1: ** Too many LQR packets lost **
> Aug 10 00:19:57 core1-mel ppp[29387]: Phase: deflink-1: open -> lcp
> Aug 10 00:19:57 core1-mel ppp[29387]: Phase: deflink-1: Disconnected!
> Aug 10 00:19:57 core1-mel ppp[29387]: Phase: deflink-1: Connect time: 333 secs: 1075 octets in, 781 octets out
> Aug 10 00:19:57 core1-mel ppp[29387]: Phase:  total 5 bytes/sec, peak 43 bytes/sec on Tue Aug 10 00:19:57 1999
> Aug 10 00:19:57 core1-mel ppp[29387]: Phase: deflink-1: lcp -> closed
> 
> ppp pid 29387 has detected the link is down, but it's still running:
> 
> 29387  a1  Ss+    0:01.39 /usr/sbin/ppp -direct mp

29387 should still has the original link that it was started with - 
that wasn't disconnected was it ?

> I'm wondering why it doesn't exit, am I doing something wrong in the
> config? Perhaps it's sitting there in passive mode waiting for a new
> connection to start? (even though it never will as long as it's attached
> to cuaa1 which is now an offline modem)
> 
> If it helps this is what top says about 29387...
> 
>   PID USERNAME PRI NICE SIZE    RES STATE    TIME   WCPU    CPU COMMAND
> 29387 sbruns    2   0   852K  1368K select   0:02  0.00%  0.00% ppp
> 
> Thank you all for your suggestions so far. By the way, this is the ppp
> that came with 3.2-RELEASE (recompiled on the 2.2.8-R machines) -
> unfortunately it didn't fix the problem.
> 
> BTW, why does the originating ppp occasionally do strange things like
> initiate a *new* dial 15-20 sec after I kill -TERM it? Is this the correct
> way to shut it down? (I just did it then and it's dialled about 4-6 times
> before exiting)

Depends on your link mode.  If you connect to the diagnostic socket 
(created with ``set server'' above) and do a ``show links'' or ``link 
* show physical'' it'll tell you the mode.  ``link * show datalink'' 
should tell you about redials and stuff.

> 
> Cheers.
> 
> 
> --
> Rowan Crowe                              http://www.rowan.sensation.net.au/
> Sensation Internet Services                    http://www.sensation.net.au/
> Melbourne, Australia                                 Phone: +61-3-9388-9260

-- 
Brian <brian@Awfulhak.org>                        <brian@FreeBSD.org>
      <http://www.Awfulhak.org>;                   <brian@OpenBSD.org>
Don't _EVER_ lose your sense of humour !          <brian@FreeBSD.org.uk>




To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-isp" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199908100746.IAA00773>