Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 17 Sep 2012 09:00:22 +0100
From:      Matt Burke <mattblists@icritical.com>
To:        <freebsd-fs@FreeBSD.org>
Subject:   [patch] DTrace disk IO
Message-ID:  <5056D896.8060607@icritical.com>

next in thread | raw e-mail | index | archive | help
I've recently been trying to get a grip on measuring disk IO latency (per transaction), and have found it to be rather difficult given the asynchronous nature of the beast, and also I can't find a way of translating the bio start of transaction timestamps to anything I can use in DTrace when pulling them out.

So I knocked up this little patch against releng/9.1 to put a couple of DTrace probes in the right places to pick up crucial data like the now+then timestamps while they're present.

The predicate on the probe is needed to pick up the right firing - for reasons I've not been able to fathom because gstat et al give correct data, devstat_end_transaction is called multiple times for a given operation - from g_disk_done(), then g_io_deliver() - without anything useful in the bio struct (device name, number, etc). There also seem to be a lot of firings coming from the following path which I don't understand, again without anything useful in the bio:
              kernel`devstat_end_transaction+0x13b
              kernel`g_io_deliver+0x1b0
              kernel`g_io_schedule_up+0xa6
              kernel`g_up_procbody+0x5c
              kernel`fork_exit+0x11f
              kernel`0xffffffff80c1c3fe

Catching flushes is also proving problematic. It would seem that devstat_end_transaction_bio() is called, but the bio and devstat structs are virtually empty. bp->bio_dev, bp->bio_disk, ds->device_name, ds->device_number, ds_unit_number are all null/empty, so I know that one disk has flushed, and I know how long it took, but I can't find out which disk it was.


Thoughts?



Index: sys/kern/subr_devstat.c
===================================================================
--- sys/kern/subr_devstat.c	(revision 240481)
+++ sys/kern/subr_devstat.c	(working copy)
@@ -29,6 +29,7 @@
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
+#include "opt_kdtrace.h"
 #include <sys/param.h>
 #include <sys/kernel.h>
 #include <sys/systm.h>
@@ -41,9 +42,22 @@
 #include <sys/conf.h>
 #include <vm/vm.h>
 #include <vm/pmap.h>
+#include <sys/sdt.h>
 
 #include <machine/atomic.h>
 
+SDT_PROVIDER_DEFINE(devstat);
+SDT_PROBE_DEFINE(devstat, subr_devstat, devstat_end_transaction, stat, stat);
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, stat, 0, "struct devstat *");
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, stat, 1, "uint32_t");
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, stat, 2, "struct bintime *");
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, stat, 3, "struct bintime *");
+SDT_PROBE_DEFINE(devstat, subr_devstat, devstat_end_transaction_bio, stat, stat);
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction_bio, stat, 0, "struct devstat *");
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction_bio, stat, 1, "struct bio *");
+
+
+
 static int devstat_num_devs;
 static long devstat_generation = 1;
 static int devstat_version = DEVSTAT_VERSION;
@@ -312,6 +326,8 @@
 
 	ds->end_count++;
 	atomic_add_rel_int(&ds->sequence0, 1);
+
+	SDT_PROBE(devstat, subr_devstat, devstat_end_transaction, stat, ds, bytes, now, then, 0);
 }
 
 void
@@ -332,6 +348,8 @@
 	else 
 		flg = DEVSTAT_NO_DATA;
 
+	SDT_PROBE(devstat, subr_devstat, devstat_end_transaction_bio, stat, ds, bp, 0, 0, 0);
+
 	devstat_end_transaction(ds, bp->bio_bcount - bp->bio_resid,
 				DEVSTAT_TAG_SIMPLE, flg, NULL, &bp->bio_t0);
 }




Sample dtrace script:
=====================

BEGIN
{
        bio_cmds[1] =  "READ";
        bio_cmds[2] =  "WRITE";
        bio_cmds[4] =  "DELETE";
        bio_cmds[8] =  "GETATTR";
        bio_cmds[16] = "FLUSH";
}

devstat::devstat_end_transaction_bio:
{
        self->bio = args[1];
}

devstat::devstat_end_transaction:
/self->bio && args[0]->device_number/
{
        diff_frac = args[2]->frac - args[3]->frac;
        diff_ufrac = (diff_frac < 0) ? (args[3]->frac - args[2]->frac) : diff_frac;
        diff = (1000000000*(diff_ufrac>>32))>>32;

        printf("%d\t%s%d\t%s\t%d\t0.%09d\n", timestamp,
                args[0]->device_name, args[0]->unit_number,
                bio_cmds[self->bio->bio_cmd],
                args[1],
                diff
                );
}


-- 
The information contained in this message is confidential and intended for the addressee only. If you have received this message in error, or there are any problems with its content, please contact the sender. 

iCritical is a trading name of Critical Software Ltd. Registered in England: 04909220.
Registered Office: IC2, Keele Science Park, Keele, Staffordshire, ST5 5NH.

This message has been scanned for security threats by iCritical. www.icritical.com




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5056D896.8060607>