From owner-freebsd-questions@FreeBSD.ORG  Tue Jul 30 17:19:49 2013
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 3F82822E
 for <freebsd-questions@freebsd.org>; Tue, 30 Jul 2013 17:19:49 +0000 (UTC)
 (envelope-from prvs=916e580da=a@jenisch.at)
Received: from mgaterz2.oekb.co.at (mgaterz2.oekb.co.at [143.245.5.112])
 (using TLSv1 with cipher RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 7CE6E2046
 for <freebsd-questions@freebsd.org>; Tue, 30 Jul 2013 17:19:48 +0000 (UTC)
Received: from exchhubcas2.oekb.co.at ([143.245.3.66])
 by mgaterz2.oekb.co.at with ESMTP/TLS/AES128-SHA; 30 Jul 2013 19:19:39 +0200
Received: from aurora.oekb.co.at (143.245.9.16) by
 internal-relay-exchhubcas2.oekb.co.at (143.245.3.65) with Microsoft SMTP
 Server id 14.2.318.4; Tue, 30 Jul 2013 19:19:38 +0200
Received: from aurora.oekb.co.at (localhost [127.0.0.1])	by aurora.oekb.co.at
 (8.14.7/8.14.7) with ESMTP id r6UHJcxX003645;
 Tue, 30 Jul 2013 19:19:38 +0200 (CEST)	(envelope-from a@jenisch.at)
Received: (from ej@localhost)	by aurora.oekb.co.at (8.14.7/8.14.7/Submit) id
 r6UHJcC5003644;	Tue, 30 Jul 2013 19:19:38 +0200 (CEST)	(envelope-from
 a@jenisch.at)
X-Authentication-Warning: aurora.oekb.co.at: ej set sender to a@jenisch.at
 using -f
Date: Tue, 30 Jul 2013 19:19:38 +0200
From: Ewald Jenisch <a@jenisch.at>
To: <freebsd-questions@freebsd.org>
Subject: System hangs for several minutes (disk IO related)
Message-ID: <20130730171938.GA3602@aurora.oekb.co.at>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-questions>, 
 <mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
 <mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Jul 2013 17:19:49 -0000

Hi,

I'm seeing rather strange behavior on an HP DL585 G5 wrt. disk IO:

When there's any disk IO the machine completely freezes, i.e. no
console input possible, no screen output - a complete hang. After some
minutes the box comes back to normal again - but sure enough, with the
next disk IO it freezes again.

To give you a typical example: While a "portsnap fetch extract" was
running I did a "sync". Normally this should complete in a matter of
milliseconds to seconds in the worst case - but dig this:

# date;time sync;date
Tue Jul 30 09:57:38 CEST 2013
0.000u 0.311s 9:54.69 0.0%      4+161k 0+1287io 0pf+0w
Tue Jul 30 10:07:38 CEST 2013
#

No, this is not a typo - it really took nearly ten minutes (!) for the
sync to complete. In the meantime every window and all activity
(console, screen output etc.) is completely blocked. ('portsnap fetch
extract' is only given as an example here - the lockup occurs
whenever there is disk IO, for example from tar etc.)
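
To repeat the measurement with other commands, a minimal sketch in
plain sh could look like the following (the transcript above comes
from the csh built-in "time"). The helper name "elapsed" is
hypothetical, not a system utility, and it only has one-second
resolution because it relies on "date +%s":

```shell
#!/bin/sh
# Hypothetical helper: run a command, discard its output, and print
# the wall-clock time it took in whole seconds.
elapsed() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Usage while some disk load (e.g. portsnap or tar) is running:
#   echo "sync took $(elapsed sync) seconds"
```

On a healthy box the reported value for sync should be 0 or 1; during
the hang described above it would be in the hundreds.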

We're talking about a machine with decent hardware here. This is an
excerpt from "dmesg":

------------------------------ < Cut here > ------------------------------

FreeBSD 9.2-BETA2 #0 r253750: Mon Jul 29 11:07:04 CEST 2013
    root@sniff-rz2:/usr/obj/usr/src/sys/GENERIC amd64
gcc version 4.2.1 20070831 patched [FreeBSD]
CPU: Quad-Core AMD Opteron(tm) Processor 8358 SE (2411.16-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f23  Family = 0x10  Model = 0x2  Stepping = 3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee400800<SYSCALL,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x7ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS>
  TSC: P-state invariant
real memory  = 137438953472 (131072 MB)
avail memory = 132973432832 (126813 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <HP     ProLiant>
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
...
ciss0: <HP Smart Array P400> port 0x3000-0x30ff mem 0xd9e00000-0xd9efffff,0xd9df0000-0xd9df0fff irq 16 at device 0.0 on pci8
ciss0: PERFORMANT Transport
...
da0 at ciss0 bus 0 scbus2 target 0 lun 0
da0: <COMPAQ RAID 1(1+0) OK> Fixed Direct Access SCSI-5 device 
da0: 135.168MB/s transfers
da0: Command Queueing enabled
da0: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
da0: quirks=0x1<NO_SYNC_CACHE>

------------------------------ < Cut here > ------------------------------

Kernel: Latest kernel as of yesterday (9.2-BETA2, r253750)

BIOS: At the latest level. I installed the Support Pack as of spring
2013, which updated BIOS, iLO etc. Aside from that I reset the BIOS to
default values just to be sure.

SmartArray P400 - Firmware 7.24 (latest)

Hard disks: Two 146GB HDs running in RAID 1 mode. I already tried
hot-swapping the disks - it didn't change anything.
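
As a sanity check that the logical volume in the dmesg excerpt matches
these disks, the da0 sector count can be converted by hand. This is
only an arithmetic sketch using the two numbers from the da0 line
(286677120 sectors of 512 bytes); the variable names are made up, and
it assumes a shell with 64-bit arithmetic:

```shell
#!/bin/sh
# Values copied from the da0 line in dmesg.
sectors=286677120
sector_size=512

bytes=$((sectors * sector_size))
mib=$((bytes / 1048576))       # binary megabytes, as dmesg reports them
gb=$((bytes / 1000000000))     # decimal gigabytes, as on the drive label

echo "$bytes bytes = $mib MiB = about $gb GB"
# prints: 146778685440 bytes = 139979 MiB = about 146 GB
```

So the 139979MB that the controller reports is consistent with one
146GB disk's worth of mirrored space.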

Needless to say - no error messages etc. in either dmesg or
/var/log/messages :-(

To me it looks like this is some sort of timing problem - but where
should I start looking?

Thanks much in advance for any help,
-ewald