Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 10 Jul 2012 00:00:24 -0400
From:      Charles Owens <cowens@greatbaysoftware.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: ? IO performance regression, post 8.1
Message-ID:  <4FFBA8D8.3040604@greatbaysoftware.com>
In-Reply-To: <201206221022.57632.jhb@freebsd.org>
References:  <4FDABA0B.5030702@greatbaysoftware.com> <201206150804.46341.jhb@freebsd.org> <4FE3DA14.9090506@greatbaysoftware.com> <201206221022.57632.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help

Charles Owens
Great Bay Software, Inc.
v: 603.617.4844   m: 603.866.0860

On 6/22/12 10:22 AM, John Baldwin wrote:
> On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote:
>> On 6/15/12 8:04 AM, John Baldwin wrote:
>>> On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
>>>> Hello FreeBSD folk,
>>>>
>>>> We're seeing what appears to be a storage performance regression as we
>>>> try to move from 8.1 (i386) to 8.3.   We looked at 8.2 also and it
>>>> appears that the regression happened between 8.1 and 8.2.
>>>>
>>>> Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs.
>>>> Storage is a LSI MegaSAS 1078 controller (mfi) in a RAID-10
>>>> configuration, using UFS + geom_journal for filesystem.
>>>>
>>>> Postgresql performance, as seen via pgbench, dropped by approx 20%.
>>>> This testing was done with our usual PAE-enabled kernels.  We then went
>>>> back to GENERIC kernels and did comparisons using "bonnie", results
>>>> below.  Following that is a kernel boot log.
>>>>
>>>> Notably, we're seeing this regression only with our RAID mfi(4) based
>>>> systems.  Notably, from looking at FreeBSD source changelogs it appears
>>>> that the mfi(4) code has seen some changes since 8.1.
>>> Between 8.1 and 8.2 mfi has not had any significant changes.  The only changes
>>> made to sys/dev/mfi were to add a new constant:
>>>
>>>> svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi
>>> svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
>>> Index: mfireg.h
>>> ===================================================================
>>> --- mfireg.h    (.../8.1/sys/dev/mfi)   (revision 237134)
>>> +++ mfireg.h    (.../8.2/sys/dev/mfi)   (revision 237134)
>>> @@ -975,7 +975,9 @@
>>>           MFI_PD_STATE_OFFLINE = 0x10,
>>>           MFI_PD_STATE_FAILED = 0x11,
>>>           MFI_PD_STATE_REBUILD = 0x14,
>>> -       MFI_PD_STATE_ONLINE = 0x18
>>> +       MFI_PD_STATE_ONLINE = 0x18,
>>> +       MFI_PD_STATE_COPYBACK = 0x20,
>>> +       MFI_PD_STATE_SYSTEM = 0x40
>>>    };
>>>    
>>>    union mfi_ld_ref {
>>>
>>> The difference in write performance must be due to something else.  You
>>> mentioned you are using UFS + gjournal.  I think gjournal uses BIO_FLUSH, so I
>>> wonder if this is related:
>>>
>>> ------------------------------------------------------------------------
>>> r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines
>>>
>>> MFC 212160:
>>>
>>> Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
>>> Add the BIO_ORDERED flag for struct bio and update bio clients to use it.
>>>
>>> The barrier semantics of bioq_insert_tail() were broken in two ways:
>>>
>>>    o In bioq_disksort(), an added bio could be inserted at the head of
>>>      the queue, even when a barrier was present, if the sort key for
>>>      the new entry was less than that of the last queued barrier bio.
>>>
>>>    o The last_offset used to generate the sort key for newly queued bios
>>>      did not stay at the position of the barrier until either the
>>>      barrier was de-queued, or a new barrier (which updates last_offset)
>>>      was queued.  When a barrier is in effect, we know that the disk
>>>      will pass through the barrier position just before the
>>>      "blocked bios" are released, so using the barrier's offset for
>>>      last_offset is the optimal choice.
>>>
>>> sys/geom/sched/subr_disk.c:
>>> sys/kern/subr_disk.c:
>>>           o Update last_offset in bioq_insert_tail().
>>>
>>>           o Only update last_offset in bioq_remove() if the removed bio is
>>>             at the head of the queue (typically due to a call via
>>>             bioq_takefirst()) and no barrier is active.
>>>
>>>           o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
>>>             set prev to the barrier and cur to it's next element.  Now that
>>>             last_offset is kept at the barrier position, this change isn't
>>>             strictly necessary, but since we have to take a decision branch
>>>             anyway, it does avoid one, no-op, loop iteration in the while
>>>             loop that immediately follows.
>>>
>>>           o In bioq_disksort(), bypass the normal sort for bios with the
>>>             BIO_ORDERED attribute and instead insert them into the queue
>>>             with bioq_insert_tail().  bioq_insert_tail() not only gives
>>>             the desired command order during insertion, but also provides
>>>             barrier semantics so that commands disksorted in the future
>>>             cannot pass the just enqueued transaction.
>>>
>>> sys/sys/bio.h:
>>>           Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.
>>>
>>> sys/cam/ata/ata_da.c:
>>> sys/cam/scsi/scsi_da.c
>>>           Use an ordered command for SCSI/ATA-NCQ commands issued in
>>>           response to bios with the BIO_ORDERED flag set.
>>>
>>> sys/cam/scsi/scsi_da.c
>>>           Use an ordered tag when issuing a synchronize cache command.
>>>
>>>           Wrap some lines to 80 columns.
>>>
>>> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>> sys/geom/geom_io.c
>>>           Mark bios with the BIO_FLUSH command as BIO_ORDERED.
>>>
>>> Sponsored by:   Spectra Logic Corporation
>>> ------------------------------------------------------------------------
>>>
>>> Can you try perhaps commenting out the 'bp->bio_flags |= BIO_ORDERED' line
>>> changed in geom_io.c in 8.2?  That would be effectively reverting this
>>> portion of the diff:
>>>
>>> Index: geom_io.c
>>> ===================================================================
>>> --- geom_io.c   (.../8.1/sys/geom)      (revision 237134)
>>> +++ geom_io.c   (.../8.2/sys/geom)      (revision 237134)
>>> @@ -265,6 +265,7 @@
>>>           g_trace(G_T_BIO, "bio_flush(%s)", cp->provider->name);
>>>           bp = g_alloc_bio();
>>>           bp->bio_cmd = BIO_FLUSH;
>>> +       bp->bio_flags |= BIO_ORDERED;
>>>           bp->bio_done = NULL;
>>>           bp->bio_attribute = NULL;
>>>           bp->bio_offset = cp->provider->mediasize;
>>>
>> John... thanks for the suggestion.  I've built and tested a kernel with
>> this change made.  Result:  no change (same performance as with
>> 8.2-GENERIC). Any thoughts as to where to go next?
> Hmm.  That seemed the most plausible candidate when I looked at this.
>
> Do you use quotas (there is one change in UFS related to quotas)?
>
> There are 5 changes that involve sys/kern/vfs_bio.c in 8.2:
> 209459, 212229, 212562, 212583, and 213890.
>
> Can you possibly test out kernels from stable/8 at those revisions on an 8.1
> world and see if you can narrow it down futher?
>
> Barring that, can you do a binary search of kernels from stable/8 between 8.1
> and 8.2 on an 8.1 world to see which commit caused the change in write
> performance?
>

I've been sidetracked for a bit... and am now starting to work through 
the revisions you've suggested.  Ahead of that, I've reinstalled the 
system with filesystems configured *without* geom_journal and repeated 
tests with both 8.1 and 8.2 kernels. Finding:  no issue -- in fact 8.2 
performs slightly better than 8.1, as we might hope for in general.  
(bonnie output below)  Not sure if this really helps us narrow things 
down, but geom_journal is certainly part of the story.

I'll have more results in the next day or so.

8.1 GENERIC
               -------Sequential Output-------- ---Sequential Input-- --Random--
               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
           100 196050 99.9 344414 43.4 535679 43.2 152666 99.7 3233854 100.0 251145.9 215.2

8.2 GENERIC
               -------Sequential Output-------- ---Sequential Input-- --Random--
               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
           100 190687 97.8 436428 43.0 537990 42.9 155766 98.3 4268089 100.0 231347.6 240.6





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4FFBA8D8.3040604>