From owner-freebsd-geom@FreeBSD.ORG  Sat Sep 27 14:57:41 2008
Return-Path: <owner-freebsd-geom@FreeBSD.ORG>
Delivered-To: freebsd-geom@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DF3AC1065686
	for <freebsd-geom@freebsd.org>; Sat, 27 Sep 2008 14:57:41 +0000 (UTC)
	(envelope-from auryn@zirakzigil.org)
Received: from aurynhome1sv1.zirakzigil.org (mail.zirakzigil.org
	[82.63.178.63]) by mx1.freebsd.org (Postfix) with SMTP id DD5AC8FC14
	for <freebsd-geom@freebsd.org>; Sat, 27 Sep 2008 14:57:39 +0000 (UTC)
	(envelope-from auryn@zirakzigil.org)
Received: (qmail 98748 invoked by uid 98); 27 Sep 2008 14:30:57 -0000
Received: from 192.168.229.11 by aurynhome1sv1.zirakzigil.org (envelope-from
	<auryn@zirakzigil.org>, uid 89) with qmail-scanner-1.25 ( 
	Clear:RC:1(192.168.229.11):. 
	Processed in 0.043597 secs); 27 Sep 2008 14:30:57 -0000
X-Qmail-Scanner-Mail-From: auryn@zirakzigil.org via
	aurynhome1sv1.zirakzigil.org
X-Qmail-Scanner: 1.25 (Clear:RC:1(192.168.229.11):. Processed in 0.043597 secs)
Received: from unknown (HELO aurynhome1ws2.zirakzigil.org)
	(postmaster@zirakzigil.org@192.168.229.11)
	by 0 with SMTP; 27 Sep 2008 14:30:57 -0000
Message-ID: <48DE439C.4050505@zirakzigil.org>
Date: Sat, 27 Sep 2008 16:30:52 +0200
From: Giulio Ferro <auryn@zirakzigil.org>
User-Agent: Thunderbird 2.0.0.0 (X11/20070513)
MIME-Version: 1.0
To: freebsd-geom@freebsd.org,  pjd@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Problems with journal?
X-BeenThere: freebsd-geom@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: GEOM-specific discussions and implementations
	<freebsd-geom.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-geom>,
	<mailto:freebsd-geom-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-geom>
List-Post: <mailto:freebsd-geom@freebsd.org>
List-Help: <mailto:freebsd-geom-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-geom>,
	<mailto:freebsd-geom-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 27 Sep 2008 14:57:42 -0000

I'm experiencing very serious delay issues in 2 production servers.

The server are dual intel quad core
o.s. : freebsd 7 stable (yesterday) amd64, mfi driver for 2 physical 
controllers:
1) two sas disks in mirror (mfi0)
2) one dell md1000 disk enclosure with 15 sata disks in raid 60 (total 
~7 TB) (mfi1)

I created a single partition for the second disk, I put the geom_journal 
on it,
I formatted it with ufs and mounted it on /data directory.

from /etc/fstab
/dev/mfid1.journal      /data           ufs     rw,async        2       0

Everything seemed to work nice until I put the servers in production under
a heavy load.

In fact, I found that something was wrong when the standard maintenance 
script
/etc/periodic/daily/450.status-security
would very nearly freeze the server at night.
When this happened the server would get incredibly slow, and it took minutes
for me to ssh login and kill the process hanging the system.

The process which hung the system was a simple find on the /data partition.
During this the machine became unresponsive, and the db which stays on the
same partition didn't serve requests any more.

During this hang a "top" command would show a 20% system and 100% for
the find process.

In order to check whether the problem was with the journal, I enabled 
debugging:
sysctl kern.geom.journal.debug=1
I started the "killer" script and waited.

Nothing happened for several minutes, then the problem would show its 
ugly face
again and here is what I got in the logs:

(normal behavior)
Sep 27 15:20:38 dcsdata1 kernel: GEOM_JOURNAL[1]: Sync time of /data: 
0.043653s
Sep 27 15:20:38 dcsdata1 kernel: GEOM_JOURNAL[1]: BIO_FLUSH time of 
mfid1: 0.000001s
Sep 27 15:20:38 dcsdata1 kernel: GEOM_JOURNAL[1]: Suspend time of /data: 
0.087174s
Sep 27 15:20:38 dcsdata1 kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Sep 27 15:20:38 dcsdata1 kernel: GEOM_JOURNAL[1]: Switch time of mfid1: 
0.003278s
Sep 27 15:20:38 dcsdata1 kernel: GEOM_JOURNAL[1]: Entire switch time: 
0.172709s
Sep 27 15:20:38 dcsdata1 kernel: GEOM_JOURNAL[1]: Data has been copied.
Sep 27 15:20:48 dcsdata1 kernel: GEOM_JOURNAL[1]: Entire switch time: 
0.000005s
Sep 27 15:20:58 dcsdata1 kernel: GEOM_JOURNAL[1]: Entire switch time: 
0.000005s
Sep 27 15:21:08 dcsdata1 kernel: GEOM_JOURNAL[1]: Msync time of /data: 
0.029507s
Sep 27 15:21:08 dcsdata1 kernel: GEOM_JOURNAL[1]: Sync time of /data: 
0.053562s
Sep 27 15:21:08 dcsdata1 kernel: GEOM_JOURNAL[1]: BIO_FLUSH time of 
mfid1: 0.000002s
Sep 27 15:21:08 dcsdata1 kernel: GEOM_JOURNAL[1]: Suspend time of /data: 
0.086662s
Sep 27 15:21:08 dcsdata1 kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Sep 27 15:21:08 dcsdata1 kernel: GEOM_JOURNAL[1]: Switch time of mfid1: 
0.003178s
Sep 27 15:21:08 dcsdata1 kernel: GEOM_JOURNAL[1]: Entire switch time: 
0.183835s
Sep 27 15:21:08 dcsdata1 kernel: GEOM_JOURNAL[1]: Data has been copied.

(starts abnormal behavior)
Sep 27 15:24:40 dcsdata1 kernel: GEOM_JOURNAL[1]: Suspend time of /data: 
181.380868s
Sep 27 15:24:40 dcsdata1 kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Sep 27 15:24:40 dcsdata1 kernel: GEOM_JOURNAL[1]: Switch time of mfid1: 
0.003105s
Sep 27 15:24:40 dcsdata1 kernel: GEOM_JOURNAL[1]: Entire switch time: 
181.473032s
Sep 27 15:24:40 dcsdata1 kernel: GEOM_JOURNAL[1]: Data has been copied.
Sep 27 15:24:50 dcsdata1 kernel: GEOM_JOURNAL[1]: Entire switch time: 
0.000007s
Sep 27 15:25:00 dcsdata1 kernel: GEOM_JOURNAL[1]: Msync time of /data: 
0.029305s
Sep 27 15:25:00 dcsdata1 kernel: GEOM_JOURNAL[1]: Sync time of /data: 
0.045913s
Sep 27 15:25:00 dcsdata1 kernel: GEOM_JOURNAL[1]: BIO_FLUSH time of 
mfid1: 0.000001s


After this no more logs would show, and after a long time I finally 
killed the "find" process.
In the logs I got this:
Sep 27 15:42:30 dcsdata1 kernel: GEOM_JOURNAL[1]: Suspend time of /data: 
1050.197022s
Sep 27 15:42:30 dcsdata1 kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Sep 27 15:42:30 dcsdata1 kernel: GEOM_JOURNAL[1]: Switch time of mfid1: 
0.003276s
Sep 27 15:42:30 dcsdata1 kernel: GEOM_JOURNAL[1]: Entire switch time: 
1050.282609s
Sep 27 15:42:30 dcsdata1 kernel: GEOM_JOURNAL[1]: Data has been copied.


and then everything got normal again:
Sep 27 15:42:40 dcsdata1 kernel: GEOM_JOURNAL[1]: Msync time of /data: 
0.026546s
Sep 27 15:42:40 dcsdata1 kernel: GEOM_JOURNAL[1]: Sync time of /data: 
0.043528s
Sep 27 15:42:40 dcsdata1 kernel: GEOM_JOURNAL[1]: BIO_FLUSH time of 
mfid1: 0.000002s
Sep 27 15:42:40 dcsdata1 kernel: GEOM_JOURNAL[1]: Suspend time of /data: 
0.041300s
Sep 27 15:42:40 dcsdata1 kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Sep 27 15:42:40 dcsdata1 kernel: GEOM_JOURNAL[1]: Switch time of mfid1: 
0.005283s
Sep 27 15:42:40 dcsdata1 kernel: GEOM_JOURNAL[1]: Entire switch time: 
0.133709s
Sep 27 15:42:40 dcsdata1 kernel: GEOM_JOURNAL[1]: Data has been copied.
Sep 27 15:42:50 dcsdata1 kernel: GEOM_JOURNAL[1]: Msync time of /data: 
0.025201s
Sep 27 15:42:50 dcsdata1 kernel: GEOM_JOURNAL[1]: Sync time of /data: 
0.036892s
Sep 27 15:42:50 dcsdata1 kernel: GEOM_JOURNAL[1]: BIO_FLUSH time of 
mfid1: 0.000001s
Sep 27 15:42:50 dcsdata1 kernel: GEOM_JOURNAL[1]: Suspend time of /data: 
0.036874s
Sep 27 15:42:50 dcsdata1 kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Sep 27 15:42:50 dcsdata1 kernel: GEOM_JOURNAL[1]: Switch time of mfid1: 
0.002075s
Sep 27 15:42:50 dcsdata1 kernel: GEOM_JOURNAL[1]: Entire switch time: 
0.108044s
Sep 27 15:42:50 dcsdata1 kernel: GEOM_JOURNAL[1]: Data has been copied.
Sep 27 15:43:01 dcsdata1 kernel: GEOM_JOURNAL[1]: Msync time of /data: 
0.026738s
Sep 27 15:43:01 dcsdata1 kernel: GEOM_JOURNAL[1]: Sync time of /data: 
0.036973s
Sep 27 15:43:01 dcsdata1 kernel: GEOM_JOURNAL[1]: BIO_FLUSH time of 
mfid1: 0.000001s
Sep 27 15:43:01 dcsdata1 kernel: GEOM_JOURNAL[1]: Suspend time of /data: 
0.035504s
Sep 27 15:43:01 dcsdata1 kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Sep 27 15:43:01 dcsdata1 kernel: GEOM_JOURNAL[1]: Switch time of mfid1: 
0.003822s
Sep 27 15:43:01 dcsdata1 kernel: GEOM_JOURNAL[1]: Entire switch time: 
0.110049s
Sep 27 15:43:01 dcsdata1 kernel: GEOM_JOURNAL[1]: Data has been copied.


 From what I can see the mfi driver and the hardware below it are not to 
blame for the abysmal performance,
but the journal probably is.

I hope any of you can help me look in the right direction, and point me 
to any further tests to try or tunable
to set...

Thanks in advance.