Date:      Fri, 1 Sep 2006 18:42:38 +0200
From:      "Patrick M. Hausen" <hausen@punkt.de>
To:        freebsd-stable@freebsd.org
Subject:   LSI/amr driver controller cache problem?
Message-ID:  <20060901164238.GA66726@hugo10.ka.punkt.de>

Hi, all!

We just put a brand new Intel SSR212CC box into production.
This is basically a standard server with 2 LSI SATA RAID
controllers and 12 drive bays in 2 rack units of height.

Intel sells it as a storage product. There's a variant
of Windows 2003 server that turns this box into an iSCSI
target.

We want to use it for disk based backup with Amanda.
The system runs 6-STABLE at the moment.

amr0: <LSILogic MegaRAID 1.53> mem 0xfbef0000-0xfbefffff,
	0xfcd00000-0xfcdfffff irq 72 at device 14.0 on pci6
amr0: <LSILogic Intel(R) RAID Controller SRCS28X>
	Firmware 814C, BIOS H431, 128MB RAM
amr1: <LSILogic MegaRAID 1.53> mem 0xfbff0000-0xfbffffff,
	0xfcf00000-0xfcffffff irq 96 at device 14.0 on pci8
amr1: <LSILogic Intel(R) RAID Controller SRCS28X>
	Firmware 814C, BIOS H431, 128MB RAM
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 1907348MB (3906248704 sectors) RAID 5 (optimal)

Since the two RAID controllers come with a battery backup for
their cache memory, I configured the logical drive with write
back cache policy and the individual disk drives' write caches
off.

After cvsup and build/installworld, I noticed strange
Sendmail failures (signal 11) on the box.

Reinstalling Sendmail fixed the problem. Just to make sure,
I did installworld again and rebooted - Sendmail signal 11.

Then it dawned on me that Sendmail is the last binary installed
and written to the logical drive in the installworld process.
I can reproduce the problem any time: installworld, reboot,
Sendmail broken. Installworld or just reinstall Sendmail, don't
reboot, everything's fine. No matter if I use "reboot" or
"shutdown -r".
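
One way to confirm that the installed files really change across the
reboot (rather than Sendmail failing for some other reason) would be to
checksum them right after installworld and again after the machine comes
back up. The following is only a minimal sketch under assumptions: the
function names and the manifest format are made up for illustration, and
the manifest has to live on storage that is not behind the suspect
controller, otherwise it may be lost along with the other late writes.

```python
import hashlib
import json


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, or None if it cannot be read."""
    digest = hashlib.sha256()
    try:
        with open(path, "rb") as fh:
            # Read in chunks so large binaries do not have to fit in memory.
            for block in iter(lambda: fh.read(chunk_size), b""):
                digest.update(block)
    except OSError:
        return None
    return digest.hexdigest()


def record(manifest_path, paths):
    """Before the reboot: store a digest for every path in a JSON manifest."""
    manifest = {str(p): sha256_of(p) for p in paths}
    with open(manifest_path, "w") as fh:
        json.dump(manifest, fh, indent=2, sort_keys=True)
    return manifest


def verify(manifest_path):
    """After the reboot: return the sorted paths whose digest no longer matches."""
    with open(manifest_path) as fh:
        manifest = json.load(fh)
    return sorted(p for p, digest in manifest.items() if sha256_of(p) != digest)
```

Calling record() with the paths of the last binaries written by
installworld, rebooting, and then calling verify() on the same manifest
should list exactly the files whose contents did not survive. An empty
list would point away from lost writes.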

Is it possible that the amr driver does not issue the necessary
flush command to the controller (probably first part of the
problem) and additionally the controller loses its cache
content at the following system reset despite its BBU
(second part of problem - iir controllers by ICP Vortex handle
a system reset just fine, syncing the drives during boot)?

Any ideas? I don't have a different explanation. A coworker
suggested a possible, as yet unknown UFS2 problem with large
filesystems, but /usr is not large on this box. /var is.

The last couple of writes before a system reboot are lost.
Reliably. I will set the controller's cache policy back to
"write through", but I'm still not sleeping well ...


Thanks,
Patrick

P.S. As a side note: no problems at all with the em(4) driver so
     far on this one.
-- 
punkt.de GmbH         Internet - Dienstleistungen - Beratung
Vorholzstr. 25        Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe       http://punkt.de


