Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 22 Mar 2002 08:59:41 -0800 (PST)
From:      Todd Hayton <thayton@torrentnet.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/36209: read() system call never returns in some cases
Message-ID:  <200203221659.g2MGxfu43089@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help

>Number:         36209
>Category:       kern
>Synopsis:       read() system call never returns in some cases
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Mar 22 09:00:06 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     Todd Hayton
>Release:        FreeBSD 2.2.2
>Organization:
Ericsson IPI
>Environment:
>Description:
Essentially the problem is this - about 1 in every 100 copies using
scp (scp somefile user@remosthost:somefile), the entire machine will
hang. Pings will work for a short time, but you can't telnet into
the machine. Wait long enough and the pings will eventually fail as
well.

I started doing ktrace's of each scp, and the last system call to
get recorded is always read() and it's always a read on a file
descriptor created by pipe():

ktrace 1
--------
cognac> kdump -f scp_ktrace.out | tail -20
  3725 scp      CALL  old.sigaction(0xd,0x7fbfdd00,0x7fbfdcf4)
  3725 scp      RET   old.sigaction 0
  3725 scp      CALL  pipe
  3725 scp      RET   pipe 9
  3725 scp      CALL  pipe
  3725 scp      RET   pipe 11/0xb
  3725 scp      CALL  pipe
  3725 scp      RET   pipe 13/0xd
  3725 scp      CALL  close(0x9)
  3725 scp      RET   close 0
  3725 scp      CALL  close(0xa)
  3725 scp      RET   close 0
  3725 scp      CALL  fork
  3725 scp      RET   fork 3726/0xe8e
  3725 scp      CALL  close(0xb)
  3725 scp      RET   close 0
  3725 scp      CALL  close(0xe)
  3725 scp      RET   close 0
  3725 scp      CALL  read(0xd,0x7fbfdcbf,0x1)
  3725 scp      PSIG  SIGINT SIG_DFL

ktrace 2
--------
cognac> kdump -f scp_ktrace2.out | tail -20
  2456 scp      RET   chdir 0
  2456 scp      CALL  old.sigaction(0xd,0x7fbfdd30,0x7fbfdd24)
  2456 scp      RET   old.sigaction 0
  2456 scp      CALL  pipe
  2456 scp      RET   pipe 9
  2456 scp      CALL  pipe
  2456 scp      RET   pipe 11/0xb
  2456 scp      CALL  pipe
  2456 scp      RET   pipe 13/0xd
  2456 scp      CALL  close(0x9)
  2456 scp      RET   close 0
  2456 scp      CALL  close(0xa)
  2456 scp      RET   close 0
  2456 scp      CALL  fork
  2456 scp      RET   fork 2457/0x999
  2456 scp      CALL  close(0xb)
  2456 scp      RET   close 0
  2456 scp      CALL  close(0xe)
  2456 scp      RET   close 0
  2456 scp      CALL  read(0xd,0x7fbfdcef,0x1)

ktrace 3
--------
cognac> kdump -f scp_ktrace3.out | tail -20
   727 scp      RET   chdir 0
   727 scp      CALL  old.sigaction(0xd,0x7fbfdd30,0x7fbfdd24)
   727 scp      RET   old.sigaction 0
   727 scp      CALL  pipe
   727 scp      RET   pipe 9
   727 scp      CALL  pipe
   727 scp      RET   pipe 11/0xb
   727 scp      CALL  pipe
   727 scp      RET   pipe 13/0xd
   727 scp      CALL  close(0x9)
   727 scp      RET   close 0
   727 scp      CALL  close(0xa)
   727 scp      RET   close 0
   727 scp      CALL  fork
   727 scp      RET   fork 728/0x2d8
   727 scp      CALL  close(0xb)
   727 scp      RET   close 0
   727 scp      CALL  close(0xe)
   727 scp      RET   close 0
   727 scp      CALL  read(0xd,0x7fbfdcef,0x1)

We're using F-secure's version of SSH. I looked through the
code and it looks as if SSH actually gets some random noise from
the output of commands: it forks() off some children to run some
commands, and then uses the pipes to read the results of the commands
from the child back into the parent. It's these reads that seem to 
never return (and eventually hang the system).

I glanced around on the web and have seen references to pipe_read()
and pipe_write() containing race conditions:

http://www.geocrawler.com/mail/msg.php3?msg_id=2172760&list=159

So, I'm wondering if this is a manifestation of that...the fact that
it takes so many repeated attempts for the lockup to occur would
seem to be characteristic of a race condition...
>How-To-Repeat:
      
>Fix:
      
>Release-Note:
>Audit-Trail:
>Unformatted:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200203221659.g2MGxfu43089>