Date:      Wed, 6 Jan 1999 17:35:14 -0500 (EST)
From:      Alfred Perlstein <bright@hotjobs.com>
To:        "C. Stephen Gunn" <csg@physics.purdue.edu>
Cc:        freebsd-hackers@FreeBSD.ORG, ajk@physics.purdue.edu, crh@physics.purdue.edu, jonsmith@physics.purdue.edu, bp@physics.purdue.edu, ab@eas.purdue.edu
Subject:   Re: NFS problems under 3.0-RELEASE
Message-ID:  <Pine.BSF.4.05.9901061732110.37756-100000@bright.fx.genx.net>
In-Reply-To: <199901062142.QAA13257@galileo.physics.purdue.edu>

On Wed, 6 Jan 1999, C. Stephen Gunn wrote:

> 
> We've been experiencing crashes/hangs with NFS here recently, and
> finally got the chance to try and debug it today. Here's what we
> know so far:
> 
>  1) It happens during NFS writes. In our specific case, it happens
>     when writing files to my home directory automounted off my
>     machine when I roam to other workstations in our department.
> 
>  2) If you increase the frequency of the writes, you can make it
>     crash (actually hang) more easily.  While our method of choice was
>     to run Netscape (the DB file I/O usually kills it), I've had
>     it hang a couple of times when writing files with vi, or sending
>     mail.  Again, it only pertains to NFS.
> 
> I finally got the chance today to install a DDB kernel, and get a
> dump/backtrace after the system hung.  The backtrace showed that
> the kernel was in the middle of an nfs_vinvalbuf() call which in
> turn called vinvalbuf().
> 
> Here's the deal: the backtrace showed valid parameters for the
> call to vinvalbuf() from inside nfs_vinvalbuf(), but somehow the
> vnode parameter to vinvalbuf() apparently got smashed.
> 
> We'd hang in the middle of the tsleep() loop at the beginning of
> vinvalbuf() since we weren't paying attention to tsleep()'s error
> code of ERESTART.  I merged 2-3 lines from current to check the
> error, and the hangs go away.
> 
> This still doesn't address the problem though.  Someone is smashing
> this vnode pointer, usually with 0x0100, as far as I can tell.
> I've not had the time to digest all of the nfs/vfs changes that
> have happened since 3.0 release, but the CVS logs didn't seem to
> indicate changes that might address this.

Yes, the hang was fixed; I haven't seen any data corruption or crashes
here since the fix went in.  Are you sure your debugging was correct, or
that the fix you applied didn't fix the vnode smashing?

You may still experience hangs if a program is paged out and receives a
signal while trying to page in off of NFS.  I'm unsure if this is fixed.
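
For illustration only, the tsleep() problem described in the quoted message
can be modelled in user space.  This is a rough sketch, not the kernel code
and not the actual change merged from -current: fake_vnode, fake_tsleep(),
SKETCH_ERESTART and both wait functions are made-up names, and the iteration
cap exists only so the "hung" loop terminates in the model.

```c
/* Hypothetical stand-in for the kernel's ERESTART; value chosen for the sketch. */
#define SKETCH_ERESTART (-1)

/* Hypothetical, heavily simplified stand-in for a vnode with pending I/O. */
struct fake_vnode {
    int io_in_progress;  /* nonzero while buffers are still busy */
    int signal_pending;  /* nonzero once a signal is posted to the sleeper */
};

/*
 * Model of an interruptible sleep.  With a signal pending it returns an
 * error immediately and the I/O state is unchanged.  Otherwise it pretends
 * the I/O finished and the sleeper was woken normally.
 */
static int fake_tsleep(struct fake_vnode *vp)
{
    if (vp->signal_pending)
        return SKETCH_ERESTART;
    vp->io_in_progress = 0;
    return 0;
}

/*
 * Loop that ignores the sleep's return value.  With a signal pending the
 * condition never clears, so it spins until the cap.  Returns the number
 * of iterations performed.
 */
static int wait_ignoring_error(struct fake_vnode *vp, int max_iters)
{
    int n = 0;

    while (vp->io_in_progress && n < max_iters) {
        (void)fake_tsleep(vp);
        n++;
    }
    return n;
}

/*
 * Loop that checks the return value and bails out on error.  Stores the
 * iteration count in *iters and returns 0 or the sleep error.
 */
static int wait_checking_error(struct fake_vnode *vp, int *iters)
{
    *iters = 0;
    while (vp->io_in_progress) {
        int error = fake_tsleep(vp);

        (*iters)++;
        if (error)
            return error;
    }
    return 0;
}
```

With signal_pending set, wait_ignoring_error() runs all the way to its cap,
while wait_checking_error() returns the error after a single pass.  As the
quoted message says, this only removes the hang; it does nothing about
whatever is smashing the vnode pointer.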

Alfred Perlstein - Programmer, HotJobs Inc. - www.hotjobs.com
-- There are operating systems, and then there's FreeBSD.
-- http://www.freebsd.org/                        3.0-current

> At least it doesn't crash now, but it's probably a bug that's still
> out there.
> 
>  - Steve
> 
> --
> C. Stephen Gunn, Computer Systems Engineer         <csg@physics.purdue.edu>
> Physics Computer Network, Purdue University    
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message
> 