Date:      Mon, 2 Sep 2013 13:29:29 +0200
From:      Nils Pascal Illenseer <ni@vm.ag>
To:        freebsd-questions@freebsd.org
Subject:   Re: System hangs for several minutes (disk IO related)
Message-ID:  <4A7374B9-1940-4380-A306-4804B7C93188@vm.ag>
In-Reply-To: <20130730171938.GA3602@aurora.oekb.co.at>
References:  <20130730171938.GA3602@aurora.oekb.co.at>

Hi,

I see similar hangs on one of our Supermicro servers.
We have a ZFS RAID (mirrored striped vdevs), and when I use "zfs receive"
to receive snapshots, the whole system hangs for ten minutes or even
longer at the end.

Kernel: latest (9.2-RC3)
Controller: Adaptec 6805 RAID controller; it provides the disks to ZFS as JBOD

/var/log/messages and dmesg do not show anything related to the hangs.

I hope this helps in analyzing the issue further.

Regards,
Nils Pascal Illenseer


------------------------------ < Cut here > ------------------------------

Copyright (c) 1992-2013 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.2-RC3 #0 r254795: Sat Aug 24 20:25:04 UTC 2013
    root@bake.isc.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
gcc version 4.2.1 20070831 patched [FreeBSD]
CPU: AMD Opteron(tm) Processor 6376                  (2300.05-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x600f20  Family = 0x15  Model = 0x2  Stepping = 0
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x3e98320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x1ebbfff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,XOP,SKINIT,WDT,LWP,FMA4,<b17>,NodeId,TBM,Topology,<b23>,<b24>>
  Standard Extended Features=0x8
  TSC: P-state invariant, performance statistics
real memory  = 137438953472 (131072 MB)
avail memory = 133006090240 (126844 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <050713 APIC1654>
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 1 package(s) x 16 core(s)
...
aacraid0: <Adaptec RAID Controller> mem 0xfd800000-0xfdbfffff,0xfd7bf800-0xfd7bffff,0xfd7bf400-0xfd7bf4ff irq 28 at device 0.0 on pci1
aacraid0: Enable Raw I/O
aacraid0: Enable 64-bit array
aacraid0: New comm. interface type1 enabled
aacraid0: Adaptec 6805, aacraid driver 3.1.1-1
aacraidp0 on aacraid0
aacraidp1 on aacraid0
aacraidp2 on aacraid0
aacraidp3 on aacraid0

------------------------------ < Cut here > ------------------------------


On 30.07.2013 at 19:19, Ewald Jenisch <a@jenisch.at> wrote:

> Hi,
>
> I'm seeing rather strange behavior on an HP DL585 G5 wrt. disk IO:
>
> When there's any disk io the machine completely freezes, i.e. no
> console input possible, no screen output - complete hang. After some
> minutes the box comes back to normal again - but sure enough with the
> next disk io it freezes again.
>
> To give you a typical example: While a "portsnap fetch extract" was
> running I did a "sync". Normally this should complete in a matter of
> milliseconds to seconds in the worst case - but dig this:
>
> # date;time sync;date
> Tue Jul 30 09:57:38 CEST 2013
> 0.000u 0.311s 9:54.69 0.0%      4+161k 0+1287io 0pf+0w
> Tue Jul 30 10:07:38 CEST 2013
> #
>
> No, this is not a typo - it really took nearly ten minutes (!) for the
> sync to complete. In the meantime every window and all activity
> (console, screen output etc.) is completely blocked. ('portsnap fetch
> extract' is only given as an example here - the lockup occurs
> whenever there is disk io, for example from tar.)
>
> We're speaking about a machine with decent hardware here; here's an
> excerpt from "dmesg":
>
> ------------------------------ < Cut here > ------------------------------
>
> FreeBSD 9.2-BETA2 #0 r253750: Mon Jul 29 11:07:04 CEST 2013
>    root@sniff-rz2:/usr/obj/usr/src/sys/GENERIC amd64
> gcc version 4.2.1 20070831 patched [FreeBSD]
> CPU: Quad-Core AMD Opteron(tm) Processor 8358 SE (2411.16-MHz K8-class CPU)
>  Origin = "AuthenticAMD"  Id = 0x100f23  Family = 0x10  Model = 0x2  Stepping = 3
>  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>  Features2=0x802009<SSE3,MON,CX16,POPCNT>
>  AMD Features=0xee400800<SYSCALL,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
>  AMD Features2=0x7ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS>
>  TSC: P-state invariant
> real memory  = 137438953472 (131072 MB)
> avail memory = 132973432832 (126813 MB)
> Event timer "LAPIC" quality 400
> ACPI APIC Table: <HP     ProLiant>
> FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
> ...
> ciss0: <HP Smart Array P400> port 0x3000-0x30ff mem 0xd9e00000-0xd9efffff,0xd9df0000-0xd9df0fff irq 16 at device 0.0 on pci8
> ciss0: PERFORMANT Transport
> ...
> da0 at ciss0 bus 0 scbus2 target 0 lun 0
> da0: <COMPAQ RAID 1(1+0) OK> Fixed Direct Access SCSI-5 device
> da0: 135.168MB/s transfers
> da0: Command Queueing enabled
> da0: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
> da0: quirks=0x1<NO_SYNC_CACHE>
>
> ------------------------------ < Cut here > ------------------------------
>
> Kernel: Latest kernel as of yesterday (9.2Beta)
>
> BIOS: at the latest level (support pack as of spring 2013, which
> updated BIOS, iLO etc.). Aside from that I reset the BIOS to
> default values just to be sure.
>
> SmartArray P400 - Firmware 7.24 (latest)
>
> Harddisks: Two 146GB HDs running in RAID1 mode.  Already tried
> hot-swapping the disks - didn't change anything.
>
> Needless to say - no error messages in either dmesg or
> /var/log/messages :-(
>
> To me it looks like this is some sort of timing problem - but where
> should I start looking?
>
> Thanks much in advance for any help,
> -ewald
>




