From owner-freebsd-bugs@FreeBSD.ORG Sun May 26 20:00:00 2013
Message-Id: <201305261951.r4QJpn9Z071712@oldred.FreeBSD.org>
Date: Sun, 26 May 2013 19:51:49 GMT
From: Klaus Weber
To: freebsd-gnats-submit@FreeBSD.org
Subject: kern/178997: Heavy disk I/O may hang system

>Number:         178997
>Category:       kern
>Synopsis:       Heavy disk I/O may hang system
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun May 26 20:00:00 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator:     Klaus Weber
>Release:        Tested with 9/STABLE, verified in 10-CURRENT
>Organization:
-
>Environment:
FreeBSD filepile 9.1-STABLE FreeBSD 9.1-STABLE #5 r250475: Sun May 12 19:14:21 CEST 2013 root@filepile:/usr/obj/usr/src/sys/FILEPILE amd64
(Kernel config has some drivers removed and debugging options added. Tested with GENERIC as well.)
>Description:
Heavy disk I/O (two bonnie++ processes working on the same disk simultaneously) causes an extreme degradation in disk throughput (combined throughput as observed in iostat drops to ~1-3 MB/sec). The problem shows when both bonnie++ processes are in the "Rewriting..." phase. Sometimes the system "hangs" (throughput reaches 0 MB/sec); in that case, the system does not recover from this state.
"Hang" means that the system responds to ping etc., processes that are already running will continue, but every command that requires disk accesses (even on disks other than the disk being tested) just hangs. If shutdown can be started without disk access (i.e. still in cache), the system will begin the shutdown, but fails to sync all buffers. Most testing was done on 9/Stable and an Areca ARC-1882 controller (arcmsr), but I have test-booted a 10.0-current kernel from early May ("kernel-10.0-CURRENT-5.5.2013"), and verified that the problem still exists there. I have also tested with with an on-board SATA port ("Intel Cougar Point AHCI SATA controller"); with this setup, I can only observe extreme throughput degradation, but no hangs (see below for details). Test-setup: da0 is a 600 GB volume on the Areca controller: da0 at arcmsr0 bus 0 scbus0 target 0 lun 0 da0: Fixed Direct Access SCSI-5 device da0: 600.000MB/s transfers da0: Command Queueing enabled da0: 572202MB (1171870720 512 byte sectors: 255H 63S/T 72945C) gpart create -s gpt da0 gpart add -s 500G -t freebsd-ufs -l test0 -a 1M da0 newfs -b 64k -f 8k /dev/da0p1 mount /dev/da0p1 /mnt && mkdir /mnt/t{1,2} && chmod g+w /mnt/t{1,2} To observe the problem, I start two bonnie++ instances (installed from ports) like this: 1) bonnie++ -s 64g -n 0 -d /mnt/t1 2) bonnie++ -s 64g -n 0 -d /mnt/t2 (in order to better see the various stages, I start the second bonnie++ about a minute after the first. There is nothing magic about the 64gb, the size should be large enough to keep the disk busy for a few minutes.) I then observe disk throughput for da0 with iostat -dC -w 1 da0 da0 cpu KB/t tps MB/s us ni sy in id [... first bonnie++ starts, "writing 1 byte at a time" phase ...] 64.00 6 0.37 0 0 12 0 88 64.00 4 0.25 0 0 13 0 87 64.00 6 0.37 0 0 12 0 88 [first bonnie++ enters "writing intelligently" phase] 63.97 3847 240.33 0 0 12 0 87 64.00 20824 1301.51 0 0 14 0 85 64.00 20449 1278.04 0 0 14 1 85 64.00 17840 1115.01 0 0 14 1 85 64.00 16595 1037.21 0 0 13 1 85 [this is my baseline for upper limit on throughput: ~1 GB/sec] [...second bonnie++ starts, no big changes] 64.00 12869 804.30 0 0 15 1 84 [second bonnie++ gets to "Writing intelligently" phase] 64.00 7773 485.78 0 0 9 0 90 64.00 7310 456.86 0 0 7 1 92 64.00 7430 464.40 0 0 10 0 89 64.00 7151 446.93 0 0 6 1 93 64.00 7410 463.10 0 0 8 0 91 [...] 64.00 7349 459.29 0 0 10 0 90 64.00 7478 467.35 0 0 7 0 93 64.00 7153 447.06 0 0 9 0 90 64.00 7335 458.41 0 0 7 0 93 64.00 7918 494.88 0 0 10 0 90 [first bonnie++ gets to "Rewriting" phase] 64.00 7426 464.10 0 0 12 1 87 64.00 7251 453.17 0 0 9 0 90 64.00 7036 439.75 0 0 6 1 93 64.00 7296 455.98 0 0 8 1 90 64.00 6808 425.51 0 0 6 1 93 64.00 7531 470.66 0 0 8 1 91 64.00 7323 457.67 0 0 8 0 91 64.00 7666 479.14 0 0 6 0 94 64.00 11921 745.07 0 0 12 1 87 64.00 12043 752.69 0 0 12 1 87 64.00 10814 675.89 0 0 11 1 89 64.00 10925 682.82 0 0 12 0 88 [...] 
[second bonnie++ starts "Rewriting" phase] 64.00 10873 679.56 0 0 10 1 89 64.00 12242 765.12 0 0 13 1 86 64.00 10739 671.19 0 0 11 0 88 64.00 3486 217.91 0 0 8 0 91 64.00 6327 395.42 0 0 14 0 86 64.00 6637 414.84 0 0 9 0 90 64.00 4931 308.19 0 0 10 0 89 64.00 4874 304.63 0 0 13 0 87 64.00 4322 270.11 0 0 18 0 82 64.00 4616 288.52 0 0 19 0 80 64.00 3567 222.96 0 0 20 0 80 64.00 3395 212.16 0 0 30 0 70 64.00 1694 105.89 0 0 33 0 67 64.00 1898 118.63 0 0 36 0 64 0.00 0 0.00 0 0 12 0 88 0.00 0 0.00 0 0 5 0 95 0.00 0 0.00 0 0 5 0 95 0.00 0 0.00 0 0 5 0 95 0.00 0 0.00 0 0 5 0 95 0.00 0 0.00 0 0 5 0 95 0.00 0 0.00 0 0 5 0 95 0.00 0 0.00 0 0 5 0 95 0.00 0 0.00 0 0 4 0 96 0.00 0 0.00 0 0 5 0 95 0.00 0 0.00 0 0 5 0 95 [etc. The system does not recover from this state. All attempts to do disk I/O hang.] When both bonnie++ processes are in their "Rewriting" phase, the system hangs within a few seconds. Both bonnie++ processes are in state "nbufkv". bufdaemon takes about 40% CPU time and is in state "qsleep" when not active. I have also briefly tested a single disk instead of a RAID volume. In that case, I could not get the system to hang, put performance drops from ~150 MB/sec to ~1-3 MB/sec. I do not know if that indicates a problem with the arcmsr driver, or whether I cannot create "enough" I/O load with a single disk: [same setup as above, but on single disk connected to onboard SATA port.] Baseline: 150 MB/sec (first bonnie++ in "Writing intelligently" phase): ada2 cpu KB/t tps MB/s us ni sy in id 128.00 1221 152.60 0 0 2 0 97 128.00 1173 146.60 0 0 2 0 98 128.00 1171 146.35 0 0 2 0 98 [... first bonnie++ in "Rewriting", second in "Writing intelligently" phase:] 127.83 1140 142.30 0 0 2 0 97 128.00 1121 140.11 0 0 2 0 97 128.00 1114 139.24 0 0 1 0 98 127.29 904 112.39 0 0 5 0 95 [second bonnie++ starts rewriting] 120.24 66 7.74 0 0 0 0 100 128.00 59 7.37 0 0 0 0 100 122.43 69 8.24 0 0 1 0 99 128.00 222 27.72 0 0 0 0 100 128.00 1122 140.23 0 0 3 0 97 128.00 1080 134.99 0 0 3 0 97 128.00 1157 144.61 0 0 3 0 97 127.75 504 62.94 0 0 2 0 98 128.00 59 7.37 0 0 1 0 99 128.00 708 88.54 0 0 2 0 98 128.00 969 121.13 0 0 11 0 89 127.89 1161 144.99 0 0 3 0 97 127.60 479 59.63 0 0 1 0 99 128.00 596 74.55 0 0 2 0 98 128.00 1169 146.10 0 0 3 0 97 127.88 1076 134.37 0 0 2 0 97 127.89 1202 150.10 0 0 3 0 97 128.00 1074 134.24 0 0 4 0 96 128.00 1039 129.87 0 0 8 0 92 128.00 913 114.14 0 0 10 0 90 127.73 896 111.78 0 0 11 0 89 128.00 930 116.26 0 0 10 0 90 127.93 909 113.57 0 0 11 0 89 128.00 887 110.89 0 0 13 0 87 128.00 743 92.91 0 0 18 0 81 128.00 548 68.56 0 0 23 0 76 128.00 523 65.43 0 0 24 0 76 127.89 587 73.36 0 0 21 0 79 127.89 586 73.24 0 0 22 0 78 128.00 465 58.07 0 0 28 0 72 127.89 558 69.74 0 0 25 0 75 127.79 595 74.30 0 0 21 0 79 128.00 593 74.18 0 0 23 0 77 128.00 546 68.31 0 0 22 0 78 127.70 428 53.32 0 0 32 0 68 128.00 561 70.18 0 0 25 0 75 128.00 645 80.67 0 0 22 0 78 127.78 582 72.68 0 0 23 0 77 114.68 518 58.07 0 0 27 0 73 103.33 497 50.20 0 0 29 0 71 110.00 128 13.74 0 0 35 0 65 65.94 33 2.12 0 0 37 0 63 74.97 286 20.92 0 0 36 0 64 105.83 306 31.59 0 0 34 0 66 95.12 399 37.03 0 0 30 0 70 108.57 336 35.59 0 0 29 0 71 127.22 246 30.53 0 0 41 0 59 104.16 255 25.91 0 0 32 0 68 103.68 528 53.51 0 0 23 0 77 127.14 373 46.27 0 0 29 0 71 90.65 341 30.16 0 0 35 0 65 127.30 368 45.70 0 0 30 0 70 66.56 25 1.62 0 0 37 0 63 66.00 32 2.06 0 0 38 0 62 94.99 95 8.80 0 0 38 0 62 94.00 433 39.71 0 0 35 0 65 107.66 666 70.06 0 0 26 0 74 119.43 478 55.69 0 0 27 0 73 95.64 441 41.15 0 0 28 0 72 116.45 449 51.01 0 0 
I have also briefly tested a single disk instead of a RAID volume. In that case, I could not get the system to hang, but performance drops from ~150 MB/sec to ~1-3 MB/sec. I do not know if that indicates a problem with the arcmsr driver, or whether I cannot create "enough" I/O load with a single disk:

[same setup as above, but on a single disk connected to an onboard SATA port]

Baseline: 150 MB/sec (first bonnie++ in "Writing intelligently" phase):

            ada2             cpu
  KB/t   tps    MB/s   us ni sy in id
128.00  1221  152.60    0  0  2  0 97
128.00  1173  146.60    0  0  2  0 98
128.00  1171  146.35    0  0  2  0 98
[... first bonnie++ in "Rewriting", second in "Writing intelligently" phase:]
127.83  1140  142.30    0  0  2  0 97
128.00  1121  140.11    0  0  2  0 97
128.00  1114  139.24    0  0  1  0 98
127.29   904  112.39    0  0  5  0 95
[second bonnie++ starts rewriting]
120.24    66    7.74    0  0  0  0 100
128.00    59    7.37    0  0  0  0 100
122.43    69    8.24    0  0  1  0 99
128.00   222   27.72    0  0  0  0 100
128.00  1122  140.23    0  0  3  0 97
128.00  1080  134.99    0  0  3  0 97
128.00  1157  144.61    0  0  3  0 97
127.75   504   62.94    0  0  2  0 98
128.00    59    7.37    0  0  1  0 99
128.00   708   88.54    0  0  2  0 98
128.00   969  121.13    0  0 11  0 89
127.89  1161  144.99    0  0  3  0 97
127.60   479   59.63    0  0  1  0 99
128.00   596   74.55    0  0  2  0 98
128.00  1169  146.10    0  0  3  0 97
127.88  1076  134.37    0  0  2  0 97
127.89  1202  150.10    0  0  3  0 97
128.00  1074  134.24    0  0  4  0 96
128.00  1039  129.87    0  0  8  0 92
128.00   913  114.14    0  0 10  0 90
127.73   896  111.78    0  0 11  0 89
128.00   930  116.26    0  0 10  0 90
127.93   909  113.57    0  0 11  0 89
128.00   887  110.89    0  0 13  0 87
128.00   743   92.91    0  0 18  0 81
128.00   548   68.56    0  0 23  0 76
128.00   523   65.43    0  0 24  0 76
127.89   587   73.36    0  0 21  0 79
127.89   586   73.24    0  0 22  0 78
128.00   465   58.07    0  0 28  0 72
127.89   558   69.74    0  0 25  0 75
127.79   595   74.30    0  0 21  0 79
128.00   593   74.18    0  0 23  0 77
128.00   546   68.31    0  0 22  0 78
127.70   428   53.32    0  0 32  0 68
128.00   561   70.18    0  0 25  0 75
128.00   645   80.67    0  0 22  0 78
127.78   582   72.68    0  0 23  0 77
114.68   518   58.07    0  0 27  0 73
103.33   497   50.20    0  0 29  0 71
110.00   128   13.74    0  0 35  0 65
 65.94    33    2.12    0  0 37  0 63
 74.97   286   20.92    0  0 36  0 64
105.83   306   31.59    0  0 34  0 66
 95.12   399   37.03    0  0 30  0 70
108.57   336   35.59    0  0 29  0 71
127.22   246   30.53    0  0 41  0 59
104.16   255   25.91    0  0 32  0 68
103.68   528   53.51    0  0 23  0 77
127.14   373   46.27    0  0 29  0 71
 90.65   341   30.16    0  0 35  0 65
127.30   368   45.70    0  0 30  0 70
 66.56    25    1.62    0  0 37  0 63
 66.00    32    2.06    0  0 38  0 62
 94.99    95    8.80    0  0 38  0 62
 94.00   433   39.71    0  0 35  0 65
107.66   666   70.06    0  0 26  0 74
119.43   478   55.69    0  0 27  0 73
 95.64   441   41.15    0  0 28  0 72
116.45   449   51.01    0  0 29  0 71
102.22   503   50.26    0  0 28  0 72
127.62   509   63.50    0  0 29  0 71
111.84   388   42.33    0  0 29  0 71

(i.e. disk throughput jumps wildly between full speed and ~1% of maximum, despite continuous I/O pressure)

Unfortunately, I cannot use kdb on this machine - "ALT-CTRL-ESC" does nothing, and "sysctl debug.kdb.enter=1" just says "debug.kdb.enter: 0 -> 0". "sysctl debug.kdb.panic=1" does panic the machine, but the keyboard is not working.

I have, however, logged the sysctl vfs subtree every second while the problem occurs (a sketch of the logging loop is included after the excerpt below). As long as both bonnie++ processes are in the "Writing intelligently..." phase, vfs.numdirtybuffers stays mostly between 80-100. When the first bonnie++ goes to "Rewriting..." (at 3:40 below), the value starts to increase (about 150-350 new dirty buffers per second). When the second bonnie++ goes to "Rewriting..." as well (at 4:24), this speeds up a lot (about 3000-5000 new dirty buffers/sec). A few seconds later, vfs.numdirtybuffers reaches its limit and remains there until the machine is rebooted:

00-03-28.log:vfs.numdirtybuffers: 107
00-03-29.log:vfs.numdirtybuffers: 106
00-03-30.log:vfs.numdirtybuffers: 105
00-03-31.log:vfs.numdirtybuffers: 105
00-03-32.log:vfs.numdirtybuffers: 110
00-03-33.log:vfs.numdirtybuffers: 112
00-03-34.log:vfs.numdirtybuffers: 110
00-03-35.log:vfs.numdirtybuffers: 115
00-03-36.log:vfs.numdirtybuffers: 117
00-03-37.log:vfs.numdirtybuffers: 118
00-03-38.log:vfs.numdirtybuffers: 118
00-03-39.log:vfs.numdirtybuffers: 118
00-03-40.log:vfs.numdirtybuffers: 118
[first bonnie++ starts Rewriting]
00-03-41.log:vfs.numdirtybuffers: 294
00-03-42.log:vfs.numdirtybuffers: 404
00-03-43.log:vfs.numdirtybuffers: 959
00-03-44.log:vfs.numdirtybuffers: 1071
00-03-45.log:vfs.numdirtybuffers: 1155
00-03-46.log:vfs.numdirtybuffers: 1253
00-03-47.log:vfs.numdirtybuffers: 1366
00-03-48.log:vfs.numdirtybuffers: 1457
00-03-49.log:vfs.numdirtybuffers: 1549
00-03-50.log:vfs.numdirtybuffers: 1680
00-03-51.log:vfs.numdirtybuffers: 1904
00-03-52.log:vfs.numdirtybuffers: 2136
00-03-53.log:vfs.numdirtybuffers: 2363
00-03-54.log:vfs.numdirtybuffers: 2503
00-03-55.log:vfs.numdirtybuffers: 3023
00-03-56.log:vfs.numdirtybuffers: 3268
00-03-57.log:vfs.numdirtybuffers: 3601
00-03-58.log:vfs.numdirtybuffers: 3743
00-03-59.log:vfs.numdirtybuffers: 3904
00-04-00.log:vfs.numdirtybuffers: 4106
00-04-01.log:vfs.numdirtybuffers: 4325
00-04-02.log:vfs.numdirtybuffers: 4461
00-04-03.log:vfs.numdirtybuffers: 4643
00-04-04.log:vfs.numdirtybuffers: 5063
00-04-05.log:vfs.numdirtybuffers: 5202
00-04-06.log:vfs.numdirtybuffers: 5456
00-04-07.log:vfs.numdirtybuffers: 5593
00-04-08.log:vfs.numdirtybuffers: 5819
00-04-09.log:vfs.numdirtybuffers: 6092
00-04-10.log:vfs.numdirtybuffers: 6249
00-04-11.log:vfs.numdirtybuffers: 6412
00-04-12.log:vfs.numdirtybuffers: 6591
00-04-13.log:vfs.numdirtybuffers: 6698
00-04-14.log:vfs.numdirtybuffers: 6853
00-04-15.log:vfs.numdirtybuffers: 7031
00-04-16.log:vfs.numdirtybuffers: 7192
00-04-17.log:vfs.numdirtybuffers: 7387
00-04-18.log:vfs.numdirtybuffers: 7609
00-04-19.log:vfs.numdirtybuffers: 7757
00-04-20.log:vfs.numdirtybuffers: 7888
00-04-21.log:vfs.numdirtybuffers: 8156
00-04-22.log:vfs.numdirtybuffers: 8362
00-04-23.log:vfs.numdirtybuffers: 8558
[second bonnie++ goes Rewriting as well]
00-04-24.log:vfs.numdirtybuffers: 11586
00-04-25.log:vfs.numdirtybuffers: 16325
00-04-26.log:vfs.numdirtybuffers: 24333
00-04-27.log:vfs.numdirtybuffers: 29058
00-04-28.log:vfs.numdirtybuffers: 33752
00-04-29.log:vfs.numdirtybuffers: 38792
00-04-30.log:vfs.numdirtybuffers: 42663
00-04-31.log:vfs.numdirtybuffers: 47305
00-04-33.log:vfs.numdirtybuffers: 52070
00-04-34.log:vfs.numdirtybuffers: 52066
00-04-35.log:vfs.numdirtybuffers: 52063
00-04-37.log:vfs.numdirtybuffers: 52098
00-04-39.log:vfs.numdirtybuffers: 52098
00-04-41.log:vfs.numdirtybuffers: 52097
00-04-44.log:vfs.numdirtybuffers: 52097
00-04-46.log:vfs.numdirtybuffers: 52098
00-04-49.log:vfs.numdirtybuffers: 52096
00-04-52.log:vfs.numdirtybuffers: 52098
00-04-54.log:vfs.numdirtybuffers: 52096
00-04-57.log:vfs.numdirtybuffers: 52098
00-05-00.log:vfs.numdirtybuffers: 52096
[etc.]

vfs.hidirtybuffers: 52139
vfs.lodirtybuffers: 26069
(the machine has 32 GB RAM)
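A per-second capture like the one above can be produced with a loop along these lines (a minimal sketch; the actual script used is not part of this report, and the log file naming is only illustrative):

#!/bin/sh
# Illustrative: dump the whole vfs sysctl subtree once per second,
# one file per second, so values can be correlated with the iostat output.
while :; do
    sysctl vfs > "$(date +%H-%M-%S).log"
    sleep 1
done

Extracting one value over time is then a matter of e.g. "grep vfs.numdirtybuffers *.log", which is how the excerpt above is formatted.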
(I could upload the archive of the vfs logs somewhere, so you can check other values over time, if that would be helpful.)

I have found http://freebsd.1045724.n5.nabble.com/7-1-RELEASE-I-O-hang-td3966945.html , which seems to describe the same issue (also in combination with a RAID controller, albeit a different one). The symptoms described in r189878 (issues in bufdaemon, causing processes to hang in "nbufkv") sound similar.

Please let me know if there is any other info that I can provide.
>How-To-Repeat:
gpart create -s gpt da0
gpart add -s 500G -t freebsd-ufs -l test0 -a 1M da0
newfs -b 64k -f 8k /dev/da0p1
mount /dev/da0p1 /mnt && mkdir /mnt/t{1,2} && chmod g+w /mnt/t{1,2}

Start two bonnie++ instances (installed from ports) like this:

1) bonnie++ -s 64g -n 0 -d /mnt/t1
2) bonnie++ -s 64g -n 0 -d /mnt/t2

Observe disk throughput for da0 with

iostat -dC -w 1 da0

in particular when both bonnie++ processes are in their "Rewriting..." phase.
>Fix:
No fix known.
>Release-Note:
>Audit-Trail:
>Unformatted: