Date:      Mon, 3 Jun 2013 15:34:26 -0700
From:      Jeremy Chadwick <jdc@koitsu.org>
To:        Ross Alexander <rwa@athabascau.ca>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: 9.1-current disk throughput stalls ?
Message-ID:  <20130603223425.GA51402@icarus.home.lan>
In-Reply-To: <alpine.BSF.2.00.1306031433130.1926@autopsy.pc.athabascau.ca>
References:  <alpine.BSF.2.00.1306030844360.79095@auwow.bogons> <20130603203146.GB49602@icarus.home.lan> <alpine.BSF.2.00.1306031433130.1926@autopsy.pc.athabascau.ca>

On Mon, Jun 03, 2013 at 03:48:30PM -0600, Ross Alexander wrote:
> On Mon, 3 Jun 2013, Jeremy Chadwick wrote:
> 
> >1. There is no such thing as 9.1-CURRENT.  Either you meant 9.1-STABLE
> >(what should be called stable/9) or -CURRENT (what should be called
> >head).
> 
> I wrote:
> >>The oldest kernel I have that shows the syndrome is -
> >>
> >>    FreeBSD aukward.bogons 9.1-STABLE FreeBSD 9.1-STABLE #59 r250498:
> >>    Sat May 11 00:03:15 MDT 2013
> >>    toor@aukward.bogons:/usr/obj/usr/src/sys/GENERIC  amd64
> 
> See above.  You're right, I shouldn't post after a 07:00 dentist's
> appt while my spouse is worrying me about the ins adjustor's report
> on the car damage :(.  Hey, I'm very fallible.  I'll try harder.
> 
> >2. Is there some reason you excluded details of your ZFS setup?
> >"zpool status" would be a good start.
> 
> Thanks for the useful hint as to what info you need to diagnose.
> 
> One of the machines ran a 5-drive raidz1 pool (Mnemosyne).
> 
> Another was a 2 drive gmirror, in the simplest possible gpart/gmirror setup.
> (Mnemosyne-sub-1.)
> 
> The third is a 2 drive ZFS mirror (raid-1), again in the simplest
> possible gpart/zfs manner (Aukward).
> 
> The fourth is a conceptually identical 2 drive ZFS raid-1, swapping
> to a zvol (Griffon.)
> 
> If you look on the FreeBSD wiki, the pages that say "bootable zfs
> gptzfsboot" and "bootable mirror" -
> 
> 	  https://wiki.freebsd.org/RootOnZFS
> 	  http://www.freebsdwiki.net/index.php/RAID1,_Software,_How_to_setup
> 
> Well, I just followed those in cookbook style (modulo device and pool
> names).  Didn't see any reason to be creative; I build for
> reliability, not performance.
> 
> Aukward is gpart/zfs raid-1 box #1:
> 
>     aukward:/u0/rwa > ls -l /dev/gpt
>     total 0
>     crw-r-----  1 root  operator  0x91 Jun  3 10:18 vol0
>     crw-r-----  1 root  operator  0x8e Jun  3 10:18 vol1
> 
>     aukward:/u0/rwa > zpool list -v
>     NAME           SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
>     ult_root       111G   108G  2.53G    97%  1.00x  ONLINE  -
>       mirror       111G   108G  2.53G         -
> 	gpt/vol0      -      -      -         -
> 	gpt/vol1      -      -      -         -
> 
>     aukward:/u0/rwa > zpool status
>       pool: ult_root
>      state: ONLINE
>       scan: scrub repaired 0 in 1h13m with 0 errors on Sun May  5 04:29:30 2013
>     config:
> 
> 	    NAME          STATE     READ WRITE CKSUM
> 	    ult_root      ONLINE       0     0     0
> 	      mirror-0    ONLINE       0     0     0
> 		gpt/vol0  ONLINE       0     0     0
> 		gpt/vol1  ONLINE       0     0     0
> 
>     errors: No known data errors
> 
> (Yes, that machine has no swap.  Has NEVER had swap, has 16 GB and
> uses maybe 10% at max load.  Has been running 9.x since prerelease
> days, FWTW.  The ARC is throttled to 2 GB; zfs-stats says I never get
> near using even that.  It's just the box that drives the radios,
> a ham radio hobby machine.)
> 
> Griffon is also gpart/zfs raid-1 -
> 
>     griffon:/u0/rwa > uname -a
> 	FreeBSD griffon.cs.athabascau.ca 9.1-STABLE FreeBSD 9.1-STABLE #25 r251062M:
> 	Tue May 28 10:39:13 MDT 2013
> 	toor@griffon.cs.athabascau.ca:/usr/obj/usr/src/sys/GENERIC
> 	amd64
> 
>     griffon:/u0/rwa > ls -l /dev/gpt
>     total 0
>     crw-r-----  1 root  operator  0x7b Jun  3 08:38 disk0
>     crw-r-----  1 root  operator  0x80 Jun  3 08:38 disk1
>     crw-r-----  1 root  operator  0x79 Jun  3 08:38 swap0
>     crw-r-----  1 root  operator  0x7e Jun  3 08:38 swap1
> 
> and the pool is fat and happy -
> 
>     griffon:/u0/rwa > zpool status -v
>       pool: pool0
>      state: ONLINE
>       scan: none requested
>     config:
> 
> 	    NAME           STATE     READ WRITE CKSUM
> 	    pool0          ONLINE       0     0     0
> 	      mirror-0     ONLINE       0     0     0
> 		gpt/disk0  ONLINE       0     0     0
> 		gpt/disk1  ONLINE       0     0     0
> 
>     errors: No known data errors
> 
> Note that swap is through a ZFS zvol:
> 
>     griffon:/u0/rwa > cat /etc/fstab
>     # Device        Mountpoint      FStype  Options         Dump    Pass#
>     #
>     #
>     /dev/zvol/pool0/swap none       swap    sw              0       0
> 
>     pool0           /               zfs     rw              0       0
>     pool0/tmp       /tmp            zfs     rw              0       0
>     pool0/var       /var            zfs     rw              0       0
>     pool0/usr       /usr            zfs     rw              0       0
>     pool0/u0        /u0             zfs     rw              0       0
> 
>     /dev/cd0        /cdrom          cd9660  ro,noauto       0       0
>     /dev/ada2s1d    /mnt0           ufs     rw,noauto       0       0
>     /dev/da0s1      /u0/rwa/camera  msdosfs rw,noauto       0       0
> 
> The machine has 32 GB and never swaps.  It runs virtualbox loads, anything
> from one to forty virtuals (little OpenBSD images.)  Load is always light.
> 
> As for the 5-drive raidz1 box (Mnemosyne), I first replaced the ZFS pool
> with a simple gpart/gmirror.  The gmirrored drives are known to be good.
> That *also* ran like mud.  Then I downgraded to 8.4-STABLE, GENERIC
> kernel, and it's just fine now, thanks.
> 
> I have the five raidz1 disks that were pulled sitting in a second 4
> core server chassis, on my desk, and they fail in that machine in the
> same way that the production box died.  I'm 150 km away and the power
> went down over the weekend at the remote site so I'll have to wait
> until tomorrow to send you those details.
> 
> For now, think cut-and-paste from freebsd wiki, nothing clever,
> everything as simple as possible.  Film at 11.
> 
> >3. Do any of your filesystems/pools have ZFS compression enabled, or
> >have in the past?
> 
> No; disk is too cheap to bother with that.
> 
> >4. Do any of your filesystems/pools have ZFS dedup enabled, or have in
> >the past?
> 
> No; disk is too cheap to bother with that.
> 
> >5. Does the problem go away after a reboot?
> 
> It goes away for a few minutes, and then comes back on little cat feet.
> Gradual slowdown.
> 
> >6. Can you provide smartctl -x output for both ada0 and ada1?  You will
> >need to install ports/sysutils/smartmontools for this.  The reason I'm
> >asking for this is there may be one of your disks which is causing I/O
> >transactions to stall for the entire pool (i.e. "single point of
> >annoyance").
> 
> Been down that path, good call; Mnemosyne (raidz1) checked clean as a
> whistle.  (Later) Griffon checks out clean, too.  Both -x and -a.
> Aukward might have an iffy device; I will schedule some self-tests and
> post everything, all neatly tabulated.
> 
> I've already fought a bad disk, and also just-slightly-iffy cables,
> in a ZFS context, and that time was nothing like this one.
> 
> >7. Can you remove ZFS from the picture entirely (use UFS only) and
> >re-test?  My guess is that this is ZFS behaviour, particularly the ARC
> >being flushed to disk, and your disks are old/slow.  (Meaning: you have
> >16GB RAM + 4 core CPU but with very old disks).
> 
> Already did that.  A gmirror 9.1 (Mnemosyne-sub-1) box slowly choked
> and died just like the ZFS instance did.  An 8.4-STABLE back-rev
> without hardware changes was the fix.
> 
> Also: I noticed that when I mounted the 9.1 raidz pool from an 8.4 flash
> fixit disk, everything ran quickly and stably.  I copied about 635 GB
> worth of ~3 GB .pcap files out of the raidz pool onto a SCSI UFS disk,
> and the ZFS disks were all about 75 to 80% busy for the ~8000 seconds
> the copy was running.  No slowdowns, no stalls.
> 
> BTW, I'd like to thank you for your kind interest, and please forgive
> my poor reporting skills - I'm at home, work is 150 km away, the phone
> keeps ringing, there are a lot of boxes, I'm sleep deprived, whine &
> snivel, grumble & moan ;)

All the above information is almost "too much".  There are now multiple
machines with multiple hardware devices (disks, controllers, etc.) and
different setups to try to figure out.  Each situation (system, etc.)
needs to be analysed individually.

So let's please focus on the one you called "aukward", because it's the
one we have some details for.  Please do not involve the other systems
at this point in time.

What we know at this point:

1. OS is amd64 [1]

2. System uses a 4-core CPU and has 16GB RAM [1]

3. Uses AHCI, driven by an ATI/AMD IXP700 [1]

4. Has two disks: ada0 and ada1, both of which are very old/slow
WD1200JD (120GB, SATA150, 8MB cache, 512-byte sectors, 7200rpm); I
have used these disks, so I speak from experience when I say old/slow [1]

5. Both disks use GPT partitioning [2], but we don't know the partition
layout ("gpart show {ada0,ada1}" would be helpful; see the sketch after
this list)

6. ZFS is involved [1][2]

7. ZFS setup is a mirror (RAID-1-like)

8. Root filesystem uses ZFS, but we don't know what your filesystem
layouts look like ("zfs get all" and "df -k" would be helpful; see the
sketch after this list) [2]

9. Neither compression nor dedup is used (good!!!) [2]

10. System does not use swap [2]

11. ARC is "throttled" to 2GB, but we don't know how you did this.  I
really need to see your sysctl.conf and loader.conf tunings (see the
sketch after this list) [2]

12. Rolling back to 8.4-STABLE (date/build unknown) apparently fixes
your issue (I would appreciate you running the system for 72 hours
before making this statement, and doing the *exact same things* on it
that cause the problem with 9.1-STABLE) [2]

13. Rebooting the system causes I/O to be fast again, for a little
while, then gradually gets worse [2].
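
To fill in #5, #8, and #11 in one pass, something like the following
(run as root on aukward) would cover it.  Note the vfs.zfs.arc_max line
is only my guess at how you capped the ARC at 2GB -- loader.conf is the
usual place for it -- so please show me your real files:

    # item 5: partition layout of both disks
    gpart show ada0 ada1

    # item 8: filesystem layout and usage
    zfs get all
    df -k

    # item 11: tunings.  The usual way to cap the ARC at 2GB is a
    # line like this in /boot/loader.conf (again, guessing):
    #   vfs.zfs.arc_max="2G"
    cat /boot/loader.conf /etc/sysctl.conf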

Pending things:

i) Need data from #5 above

ii) Need data from #8 above

iii) Need data from #11 above

iv) I still want to see smartctl -x output (commands in the sketch
after this list).  I do not need you to "run self-tests" --
respectfully, please just do what I ask.  Most people do not know how
to interpret/understand SMART results

v) I really wish you would not have rolled this system back to
8.4-STABLE.  For anyone to debug this, we need the system in a
consistent state.  Changing kernels/etc. mid-investigation makes the
results impossible to compare.

vi) Would appreciate seeing "sysctl -a | grep zfs" when the I/O is
fast (immediately after a reboot is fine) and again when the I/O is
very slow; a capture-and-diff sketch follows this list.  I do not care
about "zfs-stats".

vii) dmesg would also be useful (put it up on pastebin if you want).
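
A minimal way to collect iv, vi, and vii in one sitting (the file
names are just suggestions):

    # iv: full SMART output for both disks
    smartctl -x /dev/ada0 > /tmp/smart-ada0.txt
    smartctl -x /dev/ada1 > /tmp/smart-ada1.txt

    # vi: ZFS sysctls, once right after reboot (fast)...
    sysctl -a | grep zfs > /tmp/zfs-fast.txt
    # ...and again once the stall shows up, then diff the two:
    sysctl -a | grep zfs > /tmp/zfs-slow.txt
    diff -u /tmp/zfs-fast.txt /tmp/zfs-slow.txt

    # vii: dmesg
    dmesg > /tmp/dmesg.txt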

Please be aware the FreeBSD Wiki on ZFS is known to be outdated in many
regards.  I won't go into details.

I have so many gut feelings at this point about your problem that it
is almost unbearable.  The possibilities are nearly endless.  Answers
to the above could help narrow them down.

Finally, I wanted to briefly mention that your [2] repeatedly says
"load" with no indication of whether you mean CPU load or disk load.
Your phrasing indicates you're referring to CPU load, which is
unrelated to disk load.  For disk load, use "gstat -I500ms" (please
ignore the busy% column).  systat will not show you this in a
coherent manner; you need two windows up, preferably one with
"top -s 1" and the other with gstat.  You may be surprised at what's
going on behind the scenes with disk load.
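
For example (two terminals side by side; 500ms is just a sane
sampling interval):

    # window 1: per-device disk I/O, refreshed every 500ms
    gstat -I500ms

    # window 2: CPU/process view, 1-second refresh
    top -s 1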

[1]: http://lists.freebsd.org/pipermail/freebsd-stable/2013-June/073654.html
[2]: http://lists.freebsd.org/pipermail/freebsd-stable/2013-June/073662.html

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
