Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 5 Nov 1998 09:55:46 -0500 (EST)
From:      stark@FreeBSD.ORG
To:        FreeBSD-gnats-submit@FreeBSD.ORG
Subject:   kern/8580: Hanging NFS pagein in nfs_bio.c (2.2.7, workaround patch)
Message-ID:  <199811051455.JAA27573@starkhome.cs.sunysb.edu>

next in thread | raw e-mail | index | archive | help

>Number:         8580
>Category:       kern
>Synopsis:       Hanging NFS pagein in nfs_bio.c (2.2.7, workaround patch)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Nov  5 07:20:00 PST 1998
>Last-Modified:
>Originator:     Gene Stark
>Organization:
SUNY at Stony Brook CS Dept.
>Release:        FreeBSD 2.2.7-RELEASE i386
>Environment:

	FreeBSD 2.2.7-RELEASE on a system with user files located
	on an NFS server.  Filesystem buffer size 8192.

>Description:

	A race condition involving ptrace and NFS causes the system
	to loop in nfs_bioread() while attempting to page in text segment.

>How-To-Repeat:

	Run a certain program under gdb
	Interrupt the program by ^C
	Install a breakpoint
	Re-run the program (so that it is killed and restarts
		right away)
	System loops in kernel trying to page in (over NFS)
		the text page that has the breakpoint in it.

The bug is timing-dependent -- I can reproduce the hang a certain
fraction of the time, but it seems to require other system load
to be present.  To get a core dump I had to turn on RB_NOSYNC,
because otherwise the system hangs syncing the FS.

The looping occurs between "again:" and the "goto again" in the
following portion of nfs_bioread():

		if (not_readin && n > 0) {
			if (on < bp->b_validoff || (on + n) > bp->b_validend) {
				bp->b_flags |= B_NOCACHE;
				if (bp->b_dirtyend > 0) {
				    if ((bp->b_flags & B_DELWRI) == 0)
					panic("nfsbioread");
				    if (VOP_BWRITE(bp) == EINTR)
					return (EINTR);
				} else
				    brelse(bp);
				goto again;
			}
		}

The system hangs because it is spinning here -- maybe for general
robustness the CPU should be yielded if possible in this loop.

The reason for the looping is that the desired page of bp is outside
the valid range.  Stepping through the code with DDB (below), shows
that each time around the loop, the buffer bp is not in the cache,
and is getting reconstituted.  I am using a bsize of 8192, so there
are two pages per buffer.  The second page is the one that is desired.
Examining the VM pages in the buffer shows the first one valid,
but the second one invalid, with m->busy == 1, but with PG_BUSY not
set.  Thus, it seems that pagein is supposed to be occurring.
Either the pagein never got started, or else somehow it's completion
was missed or forgotten.

What seems to be the case is that when the breakpoint is set,
a COW occurs, moving? the VM page from the underlying object
(assocated with a vnode/NFS pager) to the top-level, anonymous
object.  Then this process terminates, and a new process is
restarted, freeing the top-level object but leaving no VM page
in the underlying object.  When the new process tries to fetch
from this page, it faults.  Somehow there is a race condition
involving the pagein of this page, leading to the observed looping.

(DDB trace showing loop follows)
---------
After 72 instructions (0 loads, 0 stores),
Stopped at      _vm_map_simplify_entry+0x175:   call    _vm_object_deallocate
db> trace
_vm_map_simplify_entry(f09f5f00,f1466e80) at _vm_map_simplify_entry+0x175
__vm_map_clip_start(f09f5f00,f1466e80,f3172000) at __vm_map_clip_start+0x14
_vm_map_delete(f09f5f00,f3172000,f3174000,f03983ec,efbffd14) at _vm_map_delete+0x3a
_bfreekva(f2d567b4) at _bfreekva+0x22
_brelse(f2d567b4) at _brelse+0x300
_nfs_bioread(f13d3780,efbffe44,8,f102f780,1) at _nfs_bioread+0x716
_nfs_getpages(efbffe80) at _nfs_getpages+0xc8
_vnode_pager_getpages(f1478f80,efbfff38,1,0,efbfff58) at _vnode_pager_getpages+0x64
_vm_pager_get_pages(f1478f80,efbfff38,1,0) at _vm_pager_get_pages+0x1f
_vm_fault(f13eb000,a000,1,0,0) at _vm_fault+0x5be
_trap_pfault(efbfffbc,1) at _trap_pfault+0xe8
_trap(27,27,9f68,a748,efbfd0fc) at _trap+0x12f
calltrap() at calltrap+0x15
--- trap 0xc, eip = 0x2000e8a9, esp = 0xefbfd0c8, ebp = 0xefbfd0fc ---
db> step ,100
Stopped at      __vm_map_clip_end+0xaf: call    _vm_object_reference
db> trace
__vm_map_clip_end(f09f5f00,f1466e80,f3174000) at __vm_map_clip_end+0xaf
_vm_map_delete(f09f5f00,f3172000,f3174000,f03983ec,efbffd14) at _vm_map_delete+0x93
_bfreekva(f2d567b4) at _bfreekva+0x22
_brelse(f2d567b4) at _brelse+0x300
_nfs_bioread(f13d3780,efbffe44,8,f102f780,1) at _nfs_bioread+0x716
_nfs_getpages(efbffe80) at _nfs_getpages+0xc8
_vnode_pager_getpages(f1478f80,efbfff38,1,0,efbfff58) at _vnode_pager_getpages+0x64
_vm_pager_get_pages(f1478f80,efbfff38,1,0) at _vm_pager_get_pages+0x1f
_vm_fault(f13eb000,a000,1,0,0) at _vm_fault+0x5be
_trap_pfault(efbfffbc,1) at _trap_pfault+0xe8
_trap(27,27,9f68,a748,efbfd0fc) at _trap+0x12f
calltrap() at calltrap+0x15
--- trap 0xc, eip = 0x2000e8a9, esp = 0xefbfd0c8, ebp = 0xefbfd0fc ---
db> unti
After 10 instructions (0 loads, 0 stores),
Stopped at      _vm_object_reference+0x4e:      ret
db> 
After 6 instructions (0 loads, 0 stores),
Stopped at      __vm_map_clip_end+0xbb: ret
db> 
After 28 instructions (0 loads, 0 stores),
Stopped at      _vm_map_delete+0x157:   call    _pmap_remove
db> 
After 19 instructions (0 loads, 0 stores),
Stopped at      _pmap_remove+0x46:      call    _get_ptbase
db> 
After 11 instructions (0 loads, 0 stores),
Stopped at      _get_ptbase+0x30:       ret
db> 
After 51 instructions (0 loads, 0 stores),
Stopped at      _pmap_remove+0xd3:      ret
db> 
After 4 instructions (0 loads, 0 stores),
Stopped at      _vm_map_delete+0x163:   call    _vm_map_entry_delete
db> 
After 20 instructions (0 loads, 0 stores),
Stopped at      _vm_map_entry_delete+0x3b:      call    _vm_object_deallocate
db> 
After 20 instructions (0 loads, 0 stores),
Stopped at      _vm_object_deallocate+0x1b3:    ret
db> 
After 4 instructions (0 loads, 0 stores),
Stopped at      _vm_map_entry_delete+0x45:      call    _vm_map_entry_dispose
db> 
After 18 instructions (0 loads, 0 stores),
Stopped at      _vm_map_entry_dispose+0x81:     ret
db> 
After 5 instructions (0 loads, 0 stores),
Stopped at      _vm_map_entry_delete+0x50:      ret
db> 
After 16 instructions (0 loads, 0 stores),
Stopped at      _vm_map_delete+0x185:   ret
db> 
After 4 instructions (0 loads, 0 stores),
Stopped at      _bfreekva+0x2d: ret
db> 
After 18 instructions (0 loads, 0 stores),
Stopped at      _brelse+0x45b:  ret
db> 
After 21 instructions (0 loads, 0 stores),
Stopped at      _nfs_bioread+0x5d6:     call    _nfs_getcacheblk
db> 
After 20 instructions (0 loads, 0 stores),
Stopped at      _nfs_getcacheblk+0x81:  call    _getblk
db> 
After 24 instructions (0 loads, 0 stores),
Stopped at      _getblk+0x5f:   call    _gbincore
db> 
After 23 instructions (0 loads, 0 stores),
Stopped at      _gbincore+0x37: ret
db> 
After 9 instructions (0 loads, 0 stores),
Stopped at      _getblk+0x13c:  call    _getnewbuf
db> 
After 19 instructions (0 loads, 0 stores),
Stopped at      _getnewbuf+0x41:        call    _bremfree
db> 
After 33 instructions (0 loads, 0 stores),
Stopped at      _bremfree+0x7e: ret
db> 
After 46 instructions (0 loads, 0 stores),
Stopped at      _getnewbuf+0x2d4:       call    _bfreekva
db> 
After 9 instructions (0 loads, 0 stores),
Stopped at      _bfreekva+0x2d: ret
db> 
After 7 instructions (0 loads, 0 stores),
Stopped at      _getnewbuf+0x2e7:       call    _vm_map_findspace
db> 
After 202 instructions (0 loads, 0 stores),
Stopped at      _vm_map_findspace+0xb6: ret
db> 
After 21 instructions (0 loads, 0 stores),
Stopped at      _getnewbuf+0x339:       call    _vm_map_insert
db> 
After 26 instructions (0 loads, 0 stores),
Stopped at      _vm_map_insert+0x5c:    call    _vm_map_lookup_entry
db> 
After 38 instructions (0 loads, 0 stores),
Stopped at      _vm_map_lookup_entry+0x6e:      ret
db> 
After 66 instructions (0 loads, 0 stores),
Stopped at      _vm_map_insert+0x250:   ret
db> 
After 12 instructions (0 loads, 0 stores),
Stopped at      _getnewbuf+0x356:       ret
db> 
After 12 instructions (0 loads, 0 stores),
Stopped at      _getblk+0x196:  calll   *%eax
db> 
After 10 instructions (0 loads, 0 stores),
Stopped at      _nfs_islocked+0x1b:     ret
db> 
After 6 instructions (0 loads, 0 stores),
Stopped at      _getblk+0x1a3:  call    _gbincore
db> 
After 23 instructions (0 loads, 0 stores),
Stopped at      _gbincore+0x37: ret
db> 
After 9 instructions (0 loads, 0 stores),
Stopped at      _getblk+0x1cf:  call    _bgetvp
db> 
After 40 instructions (0 loads, 0 stores),
Stopped at      _bgetvp+0x8c:   ret
db> 
After 40 instructions (0 loads, 0 stores),
Stopped at      _getblk+0x25f:  call    _allocbuf
db> 
After 90 instructions (0 loads, 0 stores),
Stopped at      _allocbuf+0x412:        call    _vm_page_lookup
db> 
After 52 instructions (0 loads, 0 stores),
Stopped at      _vm_page_lookup+0x96:   ret
db> 
After 30 instructions (0 loads, 0 stores),
Stopped at      _allocbuf+0x51a:        call    _vfs_buf_set_valid
db> 
After 67 instructions (0 loads, 0 stores),
Stopped at      _vfs_buf_set_valid+0x87:        ret
db> 
After 3 instructions (0 loads, 0 stores),
Stopped at      _allocbuf+0x523:        call    _vm_page_wire
db> 
After 13 instructions (0 loads, 0 stores),
Stopped at      _vm_page_wire+0x27:     call    _vm_page_unqueue
db> 
After 34 instructions (0 loads, 0 stores),
Stopped at      _vm_page_unqueue+0x7a:  ret
db> 
After 16 instructions (0 loads, 0 stores),
Stopped at      _vm_page_wire+0x60:     ret
db> 
After 32 instructions (0 loads, 0 stores),
Stopped at      _allocbuf+0x412:        call    _vm_page_lookup
db> 
After 59 instructions (0 loads, 0 stores),
Stopped at      _vm_page_lookup+0x96:   ret
db> 
After 30 instructions (0 loads, 0 stores),
Stopped at      _allocbuf+0x51a:        call    _vfs_buf_set_valid
db> 
After 32 instructions (0 loads, 0 stores),
Stopped at      _vfs_buf_set_valid+0x87:        ret
db> 
After 3 instructions (0 loads, 0 stores),
Stopped at      _allocbuf+0x523:        call    _vm_page_wire
db> 
After 13 instructions (0 loads, 0 stores),
Stopped at      _vm_page_wire+0x27:     call    _vm_page_unqueue
db> 
After 33 instructions (0 loads, 0 stores),
Stopped at      _vm_page_unqueue+0x7a:  ret
db> 
After 16 instructions (0 loads, 0 stores),
Stopped at      _vm_page_wire+0x60:     ret
db> 
After 27 instructions (0 loads, 0 stores),
Stopped at      _allocbuf+0x585:        call    _pmap_qenter
db> 
After 53 instructions (0 loads, 0 stores),
Stopped at      _pmap_qenter+0x54:      ret
db> 
After 24 instructions (0 loads, 0 stores),
Stopped at      _allocbuf+0x5d2:        ret
db> 
After 7 instructions (0 loads, 0 stores),
Stopped at      _getblk+0x26d:  ret
db> 
After 14 instructions (0 loads, 0 stores),
Stopped at      _nfs_getcacheblk+0xaa:  ret
db> 
After 41 instructions (0 loads, 0 stores),
Stopped at      _nfs_bioread+0x711:     call    _brelse
db> 
After 77 instructions (0 loads, 0 stores),
Stopped at      _brelse+0x1d1:  call    _vm_page_test_dirty
db> 
After 9 instructions (0 loads, 0 stores),
Stopped at      _vm_page_test_dirty+0x21:       ret
db> 
After 32 instructions (0 loads, 0 stores),
Stopped at      _brelse+0x1d1:  call    _vm_page_test_dirty
db> 
After 8 instructions (0 loads, 0 stores),
Stopped at      _vm_page_test_dirty+0x10:       call    _pmap_is_modified
db> 
After 5 instructions (0 loads, 0 stores),
Stopped at      _pmap_is_modified+0x8:  call    _pmap_testbit
db> 
After 37 instructions (0 loads, 0 stores),
Stopped at      _pmap_testbit+0x100:    ret
db> 
After 2 instructions (0 loads, 0 stores),
Stopped at      _pmap_is_modified+0xe:  ret
db> 
After 5 instructions (0 loads, 0 stores),
Stopped at      _vm_page_test_dirty+0x21:       ret
db> 
After 7 instructions (0 loads, 0 stores),
Stopped at      _brelse+0x1e4:  call    _vm_page_set_invalid
db> 
After 37 instructions (0 loads, 0 stores),
Stopped at      _vm_page_set_invalid+0x81:      ret
db> 
After 8 instructions (0 loads, 0 stores),
Stopped at      _brelse+0x1fd:  call    _pmap_page_protect
db> 
After 10 instructions (0 loads, 0 stores),
Stopped at      _pmap_page_protect+0x21:        call    _pmap_remove_all
db> 
After 39 instructions (0 loads, 0 stores),
Stopped at      _pmap_remove_all+0x18b: ret
db> 
After 2 instructions (0 loads, 0 stores),
Stopped at      _pmap_page_protect+0x27:        ret
db> 
After 21 instructions (0 loads, 0 stores),
Stopped at      _brelse+0x263:  call    _vfs_vmio_release
db> 
After 14 instructions (0 loads, 0 stores),
Stopped at      _vfs_vmio_release+0x2b: call    _vm_page_unwire
db> 
After 29 instructions (0 loads, 0 stores),
Stopped at      _vm_page_unwire+0x75:   ret
db> 
After 27 instructions (0 loads, 0 stores),
Stopped at      _vfs_vmio_release+0x2b: call    _vm_page_unwire
db> 
After 29 instructions (0 loads, 0 stores),
Stopped at      _vm_page_unwire+0x75:   ret
db> 
After 17 instructions (0 loads, 0 stores),
Stopped at      _vfs_vmio_release+0x11a:        call    _pmap_qremove
db> 
After 27 instructions (0 loads, 0 stores),
Stopped at      _pmap_qremove+0x32:     ret
db> 
After 8 instructions (0 loads, 0 stores),
Stopped at      _vfs_vmio_release+0x141:        call    _brelvp
db> 
After 36 instructions (0 loads, 0 stores),
Stopped at      _brelvp+0x7d:   ret
db> 
After 6 instructions (0 loads, 0 stores),
Stopped at      _vfs_vmio_release+0x14d:        ret
db> 
After 35 instructions (0 loads, 0 stores),
Stopped at      _brelse+0x2fb:  call    _bfreekva
db> 
After 12 instructions (0 loads, 0 stores),
Stopped at      _bfreekva+0x1d: call    _vm_map_delete
db> 
After 12 instructions (0 loads, 0 stores),
Stopped at      _vm_map_delete+0x14:    call    _vm_map_lookup_entry
db> 
After 26 instructions (0 loads, 0 stores),
Stopped at      _vm_map_lookup_entry+0x6e:      ret
db> 
After 10 instructions (0 loads, 0 stores),
Stopped at      _vm_map_delete+0x35:    call    __vm_map_clip_start
db> 
After 9 instructions (0 loads, 0 stores),
Stopped at      __vm_map_clip_start+0xf:        call    _vm_map_simplify_entry
db> 
After 72 instructions (0 loads, 0 stores),
Stopped at      _vm_map_simplify_entry+0x175:   call    _vm_object_deallocate


>Fix:
	
WORKAROUND

I have been running the following patch for about a month on a moderately
loaded system that currently averages about 60 users during peak periods
each day.  There can be up to 15 or 20 students doing debugging with
GDB at a time.  The printf is exercised several times per day, more
frequently if lots of people are doing debugging.  The system has not
wedged since the patch was installed, and I have noticed no other ill
effects.

						Gene Stark
						stark@freebsd.org


# cvs diff -c nfs_bio.c
Index: nfs_bio.c
===================================================================
RCS file: /A/cvs/src/sys/nfs/nfs_bio.c,v
retrieving revision 1.28.2.7
diff -c -r1.28.2.7 nfs_bio.c
*** nfs_bio.c   1998/01/28 00:26:54     1.28.2.7
--- nfs_bio.c   1998/10/07 00:16:17
***************
*** 356,363 ****
                                        panic("nfsbioread");
                                    if (VOP_BWRITE(bp) == EINTR)
                                        return (EINTR);
!                               } else
                                    brelse(bp);
                                goto again;
                        }
                }
--- 356,387 ----
                                        panic("nfsbioread");
                                    if (VOP_BWRITE(bp) == EINTR)
                                        return (EINTR);
!                               } else {
! #ifdef NFS_PTRACE_BUG
!                                 /*
!                                  * E. Stark - 9/30/98
!                                  * Avoid spinning here on a partial
!                                  * buffer, by invalidating the buffer
!                                  * and yielding the CPU for awhile.
!                                  * There is a race condition involving
!                                  * ptrace writes to text segment that
!                                  * seems to exercise this bug.
!                                  */
!                                   int ret;
!                                   bp->b_flags |= B_INVAL;
                                    brelse(bp);
+                                   if((ret =
+                                       tsleep (0xdeadbeef, PRIBIO | PCATCH,
+                                               "nfsptbug", hz/10)) != 0) {
+                                     if(ret != EWOULDBLOCK) {
+                                       printf("nfs_bioread: nfs_ptrace_bug (ret=0x%x)\n", ret);
+                                       return(ret);
+                                     }
+                                   }
+ #else
+                                   brelse(bp);
+ #endif
+                               }
                                goto again;
                        }
                }

>Audit-Trail:
>Unformatted:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199811051455.JAA27573>