Date:      Mon, 2 May 2011 16:36:01 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Jan Koum <jan@whatsapp.com>
Cc:        freebsd-fs@freebsd.org, Chris Peiffer <chris@whatsapp.com>
Subject:   Re: very strange IO issue with FreeBSD 8 and SSD
Message-ID:  <20110502233601.GA29710@icarus.home.lan>
In-Reply-To: <BANLkTin-qEoxxFbjJkDaA_-UZMkza08NNQ@mail.gmail.com>
References:  <BANLkTin-qEoxxFbjJkDaA_-UZMkza08NNQ@mail.gmail.com>

On Mon, May 02, 2011 at 03:28:23PM -0700, Jan Koum wrote:
> hello,
> 
> we are seeing some strange activity on our FreeBSD systems running
> 8.2-PRERELEASE snapshot from early december
> 
> our system has 4 Intel SSD drives (64GB each) connected directly to the
> motherboard through AHCI:
> 
> ad4: setting UDMA100
> ad4: 61057MB <SSDSA2SH064G1GC INTEL 045C8860> at ata2-master UDMA100 SATA 3Gb/s
> ad4: 125045424 sectors [124053C/16H/63S] 16 sectors/interrupt 1 depth queue
> [...]
> ad7: setting UDMA100
> ad7: 61057MB <SSDSA2SH064G1GC INTEL 045C8860> at ata3-slave UDMA100 SATA 3Gb/s
> ad7: 125045424 sectors [124053C/16H/63S] 16 sectors/interrupt 1 depth queue
> 
> $ df -h
> Filesystem     Size    Used   Avail Capacity  Mounted on
> /dev/ad4s1a     57G     24G     29G    45%    /
> /dev/ad5a       58G     17G     36G    32%    /d2
> /dev/ad7a       58G     17G     36G    32%    /d4
> /dev/ad6a       58G     17G     36G    32%    /d3
> 
> so far, so good, right?  this is where things get very bizarre: our
> application receives data from the network and writes it to disk.  on
> average each file grows to about 7 Kbytes, while an average append is
> 300-400 bytes.
> 
> netstat shows about 700-800 Kbytes/s of input, and our application log
> shows we write about 500 Kbytes each second.  however, when i run iostat
> we see upwards of 10MB a second written to disk (if not more).  for example:
> 
> $ iostat -KC -x 1
>                         extended device statistics             cpu
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b  us ni sy in id
> ad4        9.0 423.3    45.2  4410.1    0  84.3  11   5  0  5  1 89
> ad5        9.0 420.7    44.9  4237.4    0  82.3  11
> ad6        9.0 420.6    45.1  4254.4    0  81.1  11
> ad7        9.0 420.3    44.9  4225.7    0  83.8  11
>                         extended device statistics             cpu
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b  us ni sy in id
> ad4       14.9 157.9    79.5  1108.4    0  31.7  18   8  0  5  1 86
> ad5       15.9 1480.8    63.6 18886.1    0  36.4  19
> ad6       20.9 154.9    93.4  1032.9    0   7.4   4
> ad7       19.9 216.5    63.6  1450.0    0   9.2   4
>                         extended device statistics             cpu
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b  us ni sy in id
> ad4       20.9 169.2   115.4  1271.7    0  39.3  13   9  0  4  1 85
> ad5       21.9 1179.1   129.4 11598.1    0  34.6  14
> ad6       14.9 140.3    39.8   925.4    0   9.4   3
> ad7       15.9 213.9    33.8  1610.0    0   7.9   3
>                         extended device statistics             cpu
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b  us ni sy in id
> ad4       15.9 403.6    53.7  3208.6    0  30.0  10   8  0  6  1 85
> ad5       16.9 709.7    47.7  4691.6    0  20.2   9
> ad6       23.9 321.1    97.4  2262.3    0  12.9   7
> ad7       14.9 421.4    51.7  3437.2    0  13.3   7
> 
> (apologies in advance for bad formatting)
> 
> so, here we are, looking at iostat output and trying to figure out how
> it can be this bad and where the discrepancy is coming from.  a few things
> to get out of the way: no, we do not have TRIM enabled yet (we would need
> to upgrade the OS for that), but we don't think TRIM would make such a big
> difference.  we also know that we can newfs with -b 512 -f 4096, but again,
> we don't think that would account for such a large IO discrepancy.
>
> any thoughts on what this could be?  has anybody seen anything similar
> before?  10MB of metadata for 500K worth of disk writes?  that can't be....
> right?
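
To put a number on the discrepancy, here is a small Python sketch that sums the w/s and kw/s columns of the four iostat samples quoted above across all four disks and compares the total with the 500 Kbytes/s the application reports. It assumes iostat's -K kilobytes and the application's "Kbytes" are the same unit, and the 350-byte average append (midpoint of 300-400) is also an assumption. By this arithmetic the disks absorb roughly 13,600 to 22,500 KB/s per sample, about 27x to 45x the application rate, at roughly 7 to 11 KB per write request.

```python
"""Quantify the iostat-vs-application write discrepancy from the quoted samples."""

APP_WRITE_KB_PER_S = 500.0  # application-reported write rate (from the thread)
AVG_APPEND_BYTES = 350.0    # ASSUMPTION: midpoint of the 300-400 byte appends

# Device rows from the four `iostat -KC -x 1` samples, one blank line between
# samples; the trailing cpu columns on each sample's first row are ignored.
IOSTAT_TEXT = """\
ad4        9.0 423.3    45.2  4410.1    0  84.3  11   5  0  5  1 89
ad5        9.0 420.7    44.9  4237.4    0  82.3  11
ad6        9.0 420.6    45.1  4254.4    0  81.1  11
ad7        9.0 420.3    44.9  4225.7    0  83.8  11

ad4       14.9 157.9    79.5  1108.4    0  31.7  18   8  0  5  1 86
ad5       15.9 1480.8    63.6 18886.1    0  36.4  19
ad6       20.9 154.9    93.4  1032.9    0   7.4   4
ad7       19.9 216.5    63.6  1450.0    0   9.2   4

ad4       20.9 169.2   115.4  1271.7    0  39.3  13   9  0  4  1 85
ad5       21.9 1179.1   129.4 11598.1    0  34.6  14
ad6       14.9 140.3    39.8   925.4    0   9.4   3
ad7       15.9 213.9    33.8  1610.0    0   7.9   3

ad4       15.9 403.6    53.7  3208.6    0  30.0  10   8  0  6  1 85
ad5       16.9 709.7    47.7  4691.6    0  20.2   9
ad6       23.9 321.1    97.4  2262.3    0  12.9   7
ad7       14.9 421.4    51.7  3437.2    0  13.3   7
"""


def summarize(text):
    """Return (total w/s, total kw/s, KB per write, x app rate) per sample."""
    results = []
    for block in text.strip().split("\n\n"):
        writes = kb_written = 0.0
        for line in block.splitlines():
            fields = line.split()
            # Columns: device r/s w/s kr/s kw/s wait svc_t %b [cpu ...]
            writes += float(fields[2])
            kb_written += float(fields[4])
        results.append((writes, kb_written, kb_written / writes,
                        kb_written / APP_WRITE_KB_PER_S))
    return results


if __name__ == "__main__":
    est_appends = APP_WRITE_KB_PER_S * 1024 / AVG_APPEND_BYTES
    print(f"estimated application appends/s: {est_appends:.0f}")
    for i, (w, kw, per_write, amp) in enumerate(summarize(IOSTAT_TEXT), 1):
        print(f"sample {i}: {w:7.1f} w/s {kw:8.1f} KB/s "
              f"{per_write:5.2f} KB/write {amp:5.1f}x app rate")
```

Comparing the total w/s with the estimated append count is only a rough hint; it says nothing about *why* each request is larger than the append that caused it.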

I would recommend trying ahci.ko instead of ataahci.ko.  Your device
names will change (ad4 --> ada0, ad5 --> ada1, etc.).  Just add
ahci_load="yes" to /boot/loader.conf, reboot into single-user mode, fix
/etc/fstab and related configuration files, and that's all you should
have to do.
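
The fstab part of that switch is a mechanical rename, so it can be prepared before rebooting.  The Python sketch below rewrites /dev/adN references to /dev/adaM while keeping slice and partition suffixes (ad4s1a --> ada0s1a).  Only ad4 --> ada0 and ad5 --> ada1 come from the note above; ad6 --> ada2 and ad7 --> ada3 are a HYPOTHETICAL extrapolation of "etc.", and the real numbering should be confirmed from dmesg after booting with ahci.ko.

```python
import re

# HYPOTHETICAL mapping: ad4/ad5 follow the message; ad6/ad7 are extrapolated
# from "etc." and must be checked against dmesg once ahci.ko is loaded.
AD_TO_ADA = {4: 0, 5: 1, 6: 2, 7: 3}

# "/dev/ad" followed by a unit number.  "/dev/ada0" never matches, because
# the character right after "ad" must be a digit, so the rewrite is idempotent.
AD_DEVICE = re.compile(r"/dev/ad(\d+)")


def rename_ad_devices(text, mapping=AD_TO_ADA):
    """Rewrite /dev/adN (plus any slice/partition suffix) to /dev/adaM."""
    def repl(match):
        unit = int(match.group(1))
        if unit not in mapping:
            return match.group(0)  # unknown unit: leave it for manual review
        return f"/dev/ada{mapping[unit]}"
    return AD_DEVICE.sub(repl, text)
```

Python is not part of the FreeBSD base system, so one option is to generate the new file while still running multi-user and write it to a staging copy (a hypothetical /etc/fstab.ahci, say), then just copy it into place from single-user mode.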

We use Intel SSDs (X25-M 80GB) in our servers, also backed by UFS2 with
softupdates.  Controllers are Intel ICH7R (in AHCI mode) and Intel ICH9R
(also in AHCI mode).  We *did not* apply any 4K alignment when making
the partitions.  We use ahci.ko.  I haven't tested write speeds and all
that, but the disks work fine.

You might also try comparing iostat output to gstat output, though gstat
refreshes the screen continually, which makes this a little difficult.

I would recommend running "gstat -I500ms -f '^ad[0-9]$'" and watching
closely.  Change the regex, of course, if you switch to ahci.ko.

If you want to compare benchmarks, I need to know exactly what to do to
reproduce the issue you're describing.  I would prefer the traffic not come
off the network (e.g. use dd or bonnie++ or something) so we can rule out
problems there.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |