Date:      Tue, 28 May 2002 12:37:36 -0700 (PDT)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        "Robert Blayzor" <rblayzor@inoc.net>
Cc:        <freebsd-stable@FreeBSD.ORG>
Subject:   Re: Swap_pager error
Message-ID:  <200205281937.g4SJbalA024380@apollo.backplane.com>
References:   <008201c20673$37ac9c60$6f00000a@z0.inoc.net>

    This message:

:swap_pager: indefinite wait buffer:  #amrd/0x20001, blkno:272, size:4096

    Occurs when the kernel tries to write a page of memory to swap and
    the write is still not complete after 20 seconds.  This type of
    error typically occurs if the hard drive has gone all flaky or if
    hard errors exist in the swap partition.  If so, the 'dmesg' output
    should show a hard disk I/O warning or error message.

    fsck only checks filesystems, and then only for corrupt data (it doesn't
    check for bad blocks).  Fsck does not check swap.  I recommend that you
    scan your hard drive partitions for errors using dd.  'pstat -s' will
    tell you what your swap is mounted on.  For example:

    apollo:/usr/src/sys> pstat -s
    Device          1K-blocks     Used    Avail Capacity  Type
    /dev/rda0s1b      1048448      244  1048204     0%    Interleaved

    To read every block on a partition use 'dd' on the partition:

	dd if=/dev/da0s1b of=/dev/null bs=32k

	(long wait.  The drive light should be saturated.  Run 'iostat da0 1'
	in another window to observe the disk transfer activity).
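
    If you have more than one swap device, the same read test can be
    looped over everything 'pstat -s' lists.  The following is only a
    sketch: swap_devices and scan_device are hypothetical helper names,
    and it assumes the 'pstat -s' column layout shown above (device name
    in the first column).  It passes the name to dd exactly as pstat
    reports it, so adjust the name if your system differs (in the
    example above pstat shows rda0s1b while the dd reads da0s1b).

```shell
#!/bin/sh
# Sketch: read every block of each swap device reported by 'pstat -s'.
# Assumption: the device name is the first column of 'pstat -s' output.
# swap_devices and scan_device are hypothetical helper names.

# Read 'pstat -s'-style output on stdin and print the first column of
# every line that looks like a /dev path (skips the header line).
swap_devices() {
    awk '$1 ~ /^\/dev\// { print $1 }'
}

# Read one device end to end and discard the data.  This only reads.
scan_device() {
    dd if="$1" of=/dev/null bs=32k
}

# Only start scanning when asked to, e.g.:  sh scan_swap.sh run
if [ "${1:-}" = "run" ]; then
    pstat -s | swap_devices | while read -r dev; do
        echo "scanning $dev"
        scan_device "$dev" || echo "dd reported an error on $dev" >&2
    done
fi
```

    Run it as root with the 'run' argument, and keep 'iostat' going in
    another window as described above.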

    You can do this on any partition, including filesystem partitions, and
    the system can be live when you do it (since all you are doing is reading
    the raw blocks off the disk).  You can also run 'dd' on the entire disk
    (e.g. /dev/da0 rather than /dev/da0s<X><Y>) but then if you get errors
    you may not be able to figure out which logical partition they occurred in.

    In any case, if the machine is otherwise idle you should see a fairly
    uniform data transfer rate in the iostat output while the dd is going
    on.  For example, on one of my machines I get:

iostat da0 1 
...
 tin tout  KB/t tps  MB/s  us ni sy in id
   9  765  0.00   0  0.00   0  0  0  0100
   9  461  8.00   1  0.01   2  0  0  0 98
   3   49  0.00   0  0.00   0  0  0  0100
   0   43  0.00   0  0.00   0  0  0  0100
      tty             da0             cpu
 tin tout  KB/t tps  MB/s  us ni sy in id
   4  169 31.69 388 12.01   0  0  2  0 98	<<< start dd test
   0   42 31.88 999 31.11   0  0  1  0 99
   0   43 32.00 1043 32.58   2  0  1  2 96
   0   44 32.00 1006 31.43   0  0  6  0 94
...
   1   75 32.00 1050 32.83   1  0  2  1 96
   0   43 32.00 1051 32.86   0  0  2  1 97
   2   44 32.00 1042 32.55   0  0  3  2 95
   6  223 32.00 1053 32.92   0  0  1  0 99
   0   44 32.00 1051 32.86   1  0  2  1 96
   0   43 31.98 1033 32.25   0  0  2  1 98
   0  174 32.00 906 28.31   0  0  2  1 97	<<< dd finishes
   0   43  0.00   0  0.00   1  0  0  0 99

    If you see the transfer rate suddenly drop in the middle of the dd
    operation and then pick up again, the hard drive may have soft errors
    internally but still be able to eventually retrieve the block.  If the
    kernel ('dmesg' program and '/var/log/messages' log file) reports disk
    errors during your dd, then you may have a problem with one or more drives.
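
    If you would rather not watch the iostat window for the whole run,
    you can save its output to a file and look for dips afterwards.  The
    sketch below is one possible way to do that, not a standard tool:
    find_dips is a hypothetical helper, it assumes the single-disk iostat
    layout shown above (MB/s in the fifth column), and the default
    cutoff of half the peak rate is an arbitrary choice.

```shell
#!/bin/sh
# find_dips [FRACTION]
# Reads single-disk 'iostat <disk> 1' output on stdin and prints every
# sample whose MB/s falls below FRACTION of the peak rate while lying
# between the first and last samples that reached that rate, i.e. a
# drop in the middle of the transfer followed by a recovery.
# Assumption: MB/s is the fifth column; header lines are skipped
# because their fifth field is not a decimal number.
find_dips() {
    awk -v frac="${1:-0.5}" '
        $5 ~ /^[0-9]+\.[0-9]+$/ {
            n++
            mbs[n] = $5 + 0
            if (mbs[n] > max) max = mbs[n]
        }
        END {
            limit = frac * max
            # Locate the busy window: first and last sample at full speed.
            for (i = 1; i <= n; i++)
                if (mbs[i] >= limit) { if (!first) first = i; last = i }
            # Report slow samples strictly inside that window.
            for (i = first + 1; i < last; i++)
                if (mbs[i] < limit)
                    printf "sample %d: %.2f MB/s\n", i, mbs[i]
        }'
}
```

    For example, 'iostat da0 1 > iostat.log' in one window during the
    dd, then 'find_dips 0.5 < iostat.log' afterwards.  No output means no
    sample inside the transfer fell below the cutoff.  The idle samples
    before and after the dd are outside the busy window and are ignored.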

						-Matt

: ( from "Robert Blayzor" <rblayzor@inoc.net> )
:We have a Dell PowerEdge 2550 server.  It's running FreeBSD4-stable
:(up'd just a couple of weeks ago).  It's an SMP box, 1GB of RAM, two
:3com Tigon2 Gigabit NIC cards and a PERC3/QC controller.
:
:We have two logical drives.  One is a RAID1 set of two 9GB drives which
:holds the operating system only.  The other is a 300GB RAID10 array.
:
:The box had been running fine for months when suddenly the box got hosed
:as we received tons of these errors on the console. (nothing logged to
:/var/log/messages)
:
:swap_pager: indefinite wait buffer:  #amrd/0x20001, blkno:272, size:4096
:
:The box only runs as an NFS/Samba server and nothing else.  It
:eventually just became useless and we had to reset the box hard.
:
:We ran FSCK and it reported no errors and the box came up normally.  We
:were considering running scanning on the OS disk containing the swap,
:but feel there really is no need to as the RAID controller is reporting
:no problems as well.
:
:Anyone have any suggestions on where to start looking for this problem?
:We've had this unit in service almost six months and this is the first
:time we've seen this.  Is there a way to "test" swap space in production
:other than writing something to gobble up memory and forcing the box to
:swap?





