From owner-p4-projects@FreeBSD.ORG Tue Jul 3 21:23:07 2007 Return-Path: X-Original-To: p4-projects@freebsd.org Delivered-To: p4-projects@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 32767) id 2313516A469; Tue, 3 Jul 2007 21:23:07 +0000 (UTC) X-Original-To: perforce@FreeBSD.org Delivered-To: perforce@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C4AC516A41F for ; Tue, 3 Jul 2007 21:23:06 +0000 (UTC) (envelope-from lulf@FreeBSD.org) Received: from repoman.freebsd.org (repoman.freebsd.org [69.147.83.41]) by mx1.freebsd.org (Postfix) with ESMTP id B350E13C487 for ; Tue, 3 Jul 2007 21:23:06 +0000 (UTC) (envelope-from lulf@FreeBSD.org) Received: from repoman.freebsd.org (localhost [127.0.0.1]) by repoman.freebsd.org (8.13.8/8.13.8) with ESMTP id l63LN67A027569 for ; Tue, 3 Jul 2007 21:23:06 GMT (envelope-from lulf@FreeBSD.org) Received: (from perforce@localhost) by repoman.freebsd.org (8.13.8/8.13.8/Submit) id l63LN66F027566 for perforce@freebsd.org; Tue, 3 Jul 2007 21:23:06 GMT (envelope-from lulf@FreeBSD.org) Date: Tue, 3 Jul 2007 21:23:06 GMT Message-Id: <200707032123.l63LN66F027566@repoman.freebsd.org> X-Authentication-Warning: repoman.freebsd.org: perforce set sender to lulf@FreeBSD.org using -f From: Ulf Lilleengen To: Perforce Change Reviews Cc: Subject: PERFORCE change 122803 for review X-BeenThere: p4-projects@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: p4 projects tree changes List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Jul 2007 21:23:07 -0000 http://perforce.freebsd.org/chv.cgi?CH=122803 Change 122803 by lulf@lulf_carrot on 2007/07/03 21:22:13 - Clean up code and remove debug printfs. - Add initial testplan draft. I also have some testscripts that will be comitted when they're ready. - Add my scratchpad TODO. Affected files ... .. //depot/projects/soc2007/lulf/TESTPLAN#1 add .. //depot/projects/soc2007/lulf/TODO#3 edit .. //depot/projects/soc2007/lulf/gvinum_fixup/sbin/gvinum/gvinum.c#11 edit .. //depot/projects/soc2007/lulf/gvinum_fixup/sys/geom/vinum/geom_vinum_create.c#2 edit Differences ... ==== //depot/projects/soc2007/lulf/TODO#3 (text+ko) ==== @@ -8,3 +8,148 @@ 6. Make sure other parts is function correctly, and implement what perhaps is not implemented yet. 7. Run some tests to make sure the new gvinum code-base is good enough. + + +### MY internal scratchpad. This might be outdated, but I'm trying to mark the +things I've finished with DONE. + +1. Find out if we can support adding subdisks to a raid5 plex. + +2. Need a way to tell if we're adding a subdisk to a degraded raid5 plex or not. +This means we only add subdisks to a raid5 plex if the original size is larger +than the size during attachment. One way is to have a original size of the plex. +Another way is to allow a subdisk to attach itself to a plex _if_ the plex is +degraded and all other subdisks are up. The best would probably be to have a +pled->origsize field to keep the original size of the plex intact. This could +expand if we later want to expand a raid5 array. DONE. + +3. Must make sure that we don't use a subdisks plex if it's detached (same with +plex and volume). NO + +4. Make sure gv_plex_size uses the smallest stripe... makes it possible to have +unequal sized subdisks, but only parts of it is used... ?NO + +5. To support mounted raid5 rebuild... we need to check if both rebuild and +normal request BIO's can work side by side... As long as it is in the degraded +state perhaps? + +6. To get syncing working... we need a way to read BIOs from one plex and write +them to another plex. One way of doing this is that we issue a read-bio for one +plex, and when that bio returns, it will issue a BIO with the data to be written +to a destination plex. Either, issue all reads at once, or issue one read, and +let the 'done'-routines handle issueing of the next bios. +In both ways, we need a way to tell the destination of a write, or where the +original data came from. DONE + +Todo: +* Need to check if it works to use bio_caller1. Doesn't, its used, but + bio_caller2 works... for now +* Try to make send_parity work for sync as well. NO, created another one. +* Restructure gv_sync_completed and split it a bit up to create a better + abstraction. DONE +* Test it. DONE +* Make sure we don't have write between syncing. + Krav: + - Skal kunne lese av mirror mens mountet. + - Skal kunne skrive til mirror mens mountet. + - Må delaye writes. + - Reads som kommer når vi har noe i queuen må også delayes. + + To achieve this, we can - for each BIO that comes down to a volume - + check that if one plex is syncing, we put the BIO in a wait-queue that + is only processed when NO plexes are in GV_PLEX_SYNCING state. + +7. Investigate problem with setting plex state to up even with forcing. Fixed. +8. Check why panic when using mirror. + It's because a BIO is shared between all plexes for a write. Se my + comment in the hack in gv_bio_done for more explanation. +9. Check weird states showing up when doing rebuild/sync etc and prevent from +config being listed up all the time. Config not listed anymore. +10. Make sure we can use bio_caller2 in sync. +11. Change event structure to include non-pointer parameters for flags etc. +12. Since GV_SD_INITIALIZING is used by gv_sync and gv_rebuild, we should +perhaps update s->initialized, or create a new state called syncing or +something. Because gv_list gives a weird output when s->initialized is not used. +Perhaps have another state. + +13. Don't crash when we try rebuild on a up plex with only down/stale subdisks. +Perhaps it's okay since we're forcing the state anyway and the user literally +asks for it. + +14. When a raid5 plex is created, the subdisks are stale and the plex is down, +but it shouldn't. When a mirror is created, the second one is stale, but it +shouldn't. + +15. Implement initialization. Done + +16. Initially, parity is wrong... so must remember to rebuild, but should this +be necessary? No, init is used + + +* What may this problem be? Me using bio_caller1? No, we don't have enough bits +* for our flags :) fixed. +* Got a recursing lock issue. Fixed. + +* Syncing problem. Lukas proposed: +> +> Well, that's even easier than the RAID5 rebuild problem: reads are only +> served from the "good" plex, writes go through to all plexes, as the new +> data would be written anyway (after a read from the "good" plex). Some +> locking would be advisable, though. + +The flaw with this is that if a write is done on a plex, it can be overwritten +by a sync write, so it will have to wait for the sync to be done, and then +overwrite the sync. We could perhaps examine if the offset is greater than +current synced offset, and then we again have to make sure all plexes are beyond +that offset. + +Suggestion for handling mounted rebuilds: +Imagine this situation: you have rebuilt the parity up to offset X of +the RAID5 plex. I/O that requests something from below X can go through +normally, I/O that requests something beyond X needs to run like in +degraded mode. The trick is to get the locking right, so that the +rebuild process and the normal I/O don't interfere when they are around +the some offset. + +* It's already checked that if a plex is syncing and synced > the offset + + length, then the request can be served. +* Otherwise, the request will be served like a degraded or no-parity. The + problem is that if NOPARITY or REBUILD is detected (begause s->state is not + UP), data will be written without parity, and we'd have to rebuild parity + or write the data afterwards. + +15. Must test and debug! Check why hanging while rebuilding and copying at the +same time (initiating copy first) + +* Bug #1: When trying to newfs on a degraded volume where the _first_ subdisk is + down, i get cg 0: bad magic number. Hmmm.... + Actually, it's a bug in RAID5 degraded write/read code. We got a data + corruption bug. Fixed, we forgot to write correct parity data. + + +* Check if we need to check for REBUILD flag when syncing. + +ISSUES: +* We don't but delayed requests due to rebuild on the global queue, since + A) They shouldn't be run by all plexes. + B) They were issued before new I/Os, so they should be sendt down before + them. +* Delayed requests due to REBUILD currently use it's own BIO queue. Perhaps we + should rework the way plex queues are used. + +18. Rewrite rename, move. DONE. + + +19. Concat/mirror/stripe + * Right now, we ask the kernel for each drive. Another way could be to + tell the kernel to send the whole config, and create parse routines + for it. + + * Fix padding in gv_volume for userland. DONE + +20. Make sync/rebuild requests also get delayed due to rebuild requests? + +20. When to trondheim. Write papers for status report. + +21. Updateman-page + ==== //depot/projects/soc2007/lulf/gvinum_fixup/sbin/gvinum/gvinum.c#11 (text+ko) ==== @@ -375,7 +375,6 @@ gvinum_concat(int argc, char **argv) { - printf("Preparing\n"); if (argc < 2) { warnx("usage:\tconcat [-fv] [-n name] drives\n"); return; @@ -396,7 +395,6 @@ plexes = subdisks = volumes = 0; drives = 1; - printf("Trying to create drive on %s\n", device); /* Strip away eventual /dev/ in front. */ if (strncmp(device, "/dev/", 5) == 0) @@ -406,7 +404,6 @@ if (drivename == NULL) return (NULL); - fprintf(stderr, "Drivename %s is okay\n", drivename); req = gctl_get_handle(); gctl_ro_param(req, "class", -1, "VINUM"); gctl_ro_param(req, "verb", -1, "create"); @@ -422,12 +419,10 @@ gctl_ro_param(req, "volumes", sizeof(int), &volumes); gctl_ro_param(req, "plexes", sizeof(int), &plexes); gctl_ro_param(req, "subdisks", sizeof(int), &subdisks); - fprintf(stderr, "Creating drive request sent to kernel...\n"); errstr = gctl_issue(req); if (errstr != NULL) warnx("error creating drive: %s", errstr); gctl_free(req); - printf("Done creating drive %s on %s\n", drivename, device); return (drivename); } @@ -474,11 +469,8 @@ } /* Find a free volume name. */ - if (volname == NULL) { - fprintf(stderr, "Finding name\n"); + if (volname == NULL) volname = find_name("gvinumvolume", GV_TYPE_VOL, GV_MAXVOLNAME); - fprintf(stderr, "Found name %s\n", volname); - } /* Then we send a request to actually create the volumes. */ gctl_ro_param(req, "verb", -1, verb); @@ -523,7 +515,6 @@ comment[0] = '\0'; /* Find a name. Fetch out configuration first. */ - printf("Fetching configuration\n"); req = gctl_get_handle(); gctl_ro_param(req, "class", -1, "VINUM"); gctl_ro_param(req, "verb", -1, "getconfig"); @@ -536,7 +527,6 @@ } gctl_free(req); - printf("Got configuration:\n"); printf(buf); begin = 0; len = strlen(buf); @@ -553,7 +543,6 @@ if (buf[i] == '\n' || buf[i] == '\0') { ptr = buf + begin; strlcpy(line, ptr, (i - begin) + 1); - printf("Processing line: .%s.\n", line); begin = i + 1; switch (type) { case GV_TYPE_DRIVE: @@ -572,9 +561,7 @@ } if (name == NULL) continue; - printf("Found a name: .%s.\n", name); if (!strcmp(sname, name)) { - printf("Conflicts, try next\n"); conflict = 1; /* XXX: Could quit the loop earlier. */ } @@ -928,45 +915,6 @@ if (errstr) warnx("%s\n", errstr); gctl_free(req); -/* do { - rv = 0; - req = gctl_get_handle(); - gctl_ro_param(req, "class", -1, "VINUM"); - gctl_ro_param(req, "verb", -1, "parityop"); - gctl_ro_param(req, "flags", sizeof(int), &flags); - gctl_ro_param(req, "rebuild", sizeof(int), &rebuild); - gctl_rw_param(req, "rv", sizeof(int), &rv); - gctl_rw_param(req, "offset", sizeof(off_t), &offset); - gctl_ro_param(req, "plex", -1, argv[0]); - errstr = gctl_issue(req); - if (errstr) { - warnx("%s\n", errstr); - gctl_free(req); - break; - } - gctl_free(req); - if (flags & GV_FLAG_V) { - printf("\r%s at %s ... ", msg, - gv_roughlength(offset, 1)); - } - if (rv == 1) { - printf("Parity incorrect at offset 0x%jx\n", - (intmax_t)offset); - if (!rebuild) - break; - } - fflush(stdout); - - Clear the -f flag. - flags &= ~GV_FLAG_F; - } while (rv >= 0);*/ - -/* if ((rv == 2) && (flags & GV_FLAG_V)) { - if (rebuild) - printf("Rebuilt parity on %s\n", argv[0]); - else - printf("%s has correct parity\n", argv[0]); - }*/ } void ==== //depot/projects/soc2007/lulf/gvinum_fixup/sys/geom/vinum/geom_vinum_create.c#2 (text+ko) ==== @@ -54,18 +54,15 @@ sc = gp->softc; dcount = 0; - printf("Starting to create concat device\n"); vol = gctl_get_param(req, "name", NULL); if (vol == NULL) { gctl_error(req, "volume's not given"); return; } - printf("Got first parameter name: %s\n", vol); flags = gctl_get_paraml(req, "flags", sizeof(*flags)); drives = gctl_get_paraml(req, "drives", sizeof(*drives)); - printf("Got both flags and drives\n"); if (drives == NULL) { gctl_error(req, "drives not given"); return; @@ -85,18 +82,11 @@ p->stripesize = 0; gv_post_event(sc, GV_EVENT_CREATE_PLEX, p, NULL, 0, 0); - /* XXX: We'll might have a problem making sure all of these things exists - * when we come into the event loop. */ - - printf("We have created the volume and plex. Time to create subdisks\n"); - /* Drives are first (only right now) priority */ + /* Drives are first (right now) priority */ for (dcount = 0; dcount < *drives; dcount++) { snprintf(buf, sizeof(buf), "drive%d", dcount); drive = gctl_get_param(req, buf, NULL); - printf("We're looking for drive %s\n", drive); d = gv_find_drive(sc, drive); - printf("Creating subdisk for drive %s on device %s\n", d->name, - d->device); if (d == NULL) { gctl_error(req, "No such drive '%s'", drive); continue; @@ -110,6 +100,7 @@ s->size = -1; gv_post_event(sc, GV_EVENT_CREATE_SD, s, NULL, 0, 0); } + gv_post_event(sc, GV_EVENT_SAVE_CONFIG, sc, NULL, 0, 0); } /* @@ -130,26 +121,21 @@ dcount = 0; scount = 0; pcount = 0; - printf("Starting to create mirror device\n"); vol = gctl_get_param(req, "name", NULL); if (vol == NULL) { gctl_error(req, "volume's not given"); return; } - printf("Got first parameter name: %s\n", vol); flags = gctl_get_paraml(req, "flags", sizeof(*flags)); drives = gctl_get_paraml(req, "drives", sizeof(*drives)); - printf("Got both flags and drives\n"); - /* XXX: Remove when we have support for drivegroups. */ if (drives == NULL) { gctl_error(req, "drives not given"); return; } /* We must have an even number of drives. */ - /* XXX: We need supported for striped mirror as well. */ if (*drives % 2 != 0) { gctl_error(req, "must have an even number of drives"); return; @@ -186,10 +172,7 @@ for (dcount = pcount; dcount < *drives; dcount += 2) { snprintf(buf, sizeof(buf), "drive%d", dcount); drive = gctl_get_param(req, buf, NULL); - printf("We're looking for drive %s\n", drive); d = gv_find_drive(sc, drive); - printf("Creating subdisk for drive %s on device %s\n", - d->name, d->device); if (d == NULL) { gctl_error(req, "No such drive '%s'", drive); /* XXX: Should we fail instead? */ @@ -208,6 +191,7 @@ scount++; } } + gv_post_event(sc, GV_EVENT_SAVE_CONFIG, sc, NULL, 0, 0); } /* @@ -227,7 +211,6 @@ sc = gp->softc; dcount = 0; pcount = 0; - printf("Starting to create striped device\n"); vol = gctl_get_param(req, "name", NULL); if (vol == NULL) { gctl_error(req, "volume's not given"); @@ -236,8 +219,6 @@ flags = gctl_get_paraml(req, "flags", sizeof(*flags)); drives = gctl_get_paraml(req, "drives", sizeof(*drives)); - printf("Got both flags and drives\n"); - /* XXX: Remove when we have support for drivegroups. */ if (drives == NULL) { gctl_error(req, "drives not given"); return; @@ -268,10 +249,7 @@ for (dcount = 0; dcount < *drives; dcount++) { snprintf(buf, sizeof(buf), "drive%d", dcount); drive = gctl_get_param(req, buf, NULL); - printf("We're looking for drive %s\n", drive); d = gv_find_drive(sc, drive); - printf("Creating subdisk for drive %s on device %s\n", d->name, - d->device); if (d == NULL) { gctl_error(req, "No such drive '%s'", drive); continue; @@ -285,4 +263,5 @@ s->size = -1; gv_post_event(sc, GV_EVENT_CREATE_SD, s, NULL, 0, 0); } + gv_post_event(sc, GV_EVENT_SAVE_CONFIG, sc, NULL, 0, 0); }