FreeBSD Mail Archives

Date:      Fri, 18 Apr 2003 18:28:39 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Marko Zec <zec@tel.fer.hr>
Cc:        freebsd-stable@FreeBSD.org
Subject:   Re: PATCH: Forcible delaying of UFS (soft)updates
Message-ID:  <3EA0A647.BEC5931A@mindspring.com>
References:  <3E976EBD.C3E66EF8@tel.fer.hr> <200304182243.05739.zec@tel.fer.hr> <3EA06C07.A34F1C31@mindspring.com> <200304182348.58356.zec@tel.fer.hr>

Marko Zec wrote:
> > The main technical (not philosophical) problem with the patch
> > as it sits is that you can cause the soft updates wheel to wrap
> > around.
> 
> No, that just cannot happen. You are probably confusing rushjob with
> syncer_delayno, which gets reset to 0 each time it reaches the value of
> syncer_maxdelay. The rushjob variable simply tells the syncer how many times
> it should iterate _sequentially_ through the softupdates queues before
> getting to sleep on lbolt.

Obviously I am not explaining myself correctly.  I guess the next
step would be to offer my own patch set for doing what you are
trying to do.  Before I do that, let me try one more time.

I think that it is important that the value of syncer_delayno
needs to continue to be incremented once a second, and that the
modified sched_sync(), which with your patch no longer does this,
needs to used it's own counter.

In other words, I think that you need to implement a two handed
clock algorithm, to keep the buckets from getting too deep with
work items, and in case there is some dependency which is not
being accounted for that has been working because there is an
implicit delay of 1 second or more in vn_syncer_add_to_worklist()
calls (your patch would break this, so without taking this into
account, we would have to retest all of soft updates).

> > Then when you write things out, they write out of order.
> 
> Uhh.. NO!

Uh, yes; potentially they do.  See the implicit dependency
situation described above.  There are other cases, too, but
they are much more complicated.  I wish Kirk would speak up
in more technical detail about the problems you are potentially
introducing; they require a deep understanding of the soft
updates code.

> > The purpose of the wheel is to allow placing of operations at
> > some relative offset in the future to an outstanding operation,
> > to ensure ordering.
> 
> True. And this has not changed with my patch.

No, it has changed.  It's changed both in the depth of the
queue entries, and it's changed in the relative spacing of
implicit dependencies, and it's changed in the relative depth,
for two or more dependent operations with future offsets.

In the depth case, when the code runs, is going to stall the
system for a really long time, relatively, because there are
a number of worklists which are *substantially* deep, because
vn_syncer_add_to_worklist() was using a syncer_delano that has
been assumed to be updated once a second, and never changed
during your stall.  This means that the worklist represented by
syncer_workitem_pending[syncer_delayno] is going to contain
*almost all work* that was enqueued in the interim.

The problem with this is that in the for(;;) loop in sched_sync()
in the "if (LIST_FIRST(slp) == vp)" code block, you are likely
to run yourself into a panic.  See the comment about "sync_fsync()
moves it to a different slot so we are safe"?  That comment is no
longer true.

> > No matter what else you do, you can not allow the wheel to
> > "wrap".  Because the offsets are "future relative", that means
> > that you have to flush at some number of wheel entries equal
> > to:
> >
> >       wrap_boundary - the_largest_potential_future_offset - 1.
> >
> > Making the wheel bigger is probably acceptable, but then you
> > will exacerbate the memory problem that rushjob was invented
> > to resolve (please do a "cvs log" and look at the checkin
> > comments; I still believe it was "dillon" who made the change).
> 
> Where from did you get the idea I'm making the wheel bigger? The size of the
> softupdates "wheel" is determined by the value of syncer_maxdelay, which not
> only I haven't touched at all, but is also completely unrelated to the
> rushjob variable.

I didn't get the idea you were making the wheel bigger.  That's
the problem: you probably need to make the wheel bigger, so that
the [new!] second hand on the two handed clock has more time until
it runs into first hand on the clock.  You will have to do this so
you can bound the vn_syncer_add_to_worklist() add delay to something
less than "syncer_maxdelay - 2"; I suggest "syncer_maxdelay / 2", as
a first approximation (remember this needs to be a power of 2, due
to syncer_mask).

You also want to count workitem insertions and removals, so you have
a total count.  This is easy: it's already protected by sync_mtx,
so all you need is a static global counter.

When the counter gets to a certain size (configurable), you have too
much memory tied up in the work queue -- so you flush it.

> If it is of any relevance for this discussion, I want to add that I've been
> running my system with extended delaying all the time for the last two weeks
> (even when on AC power). I have had absolutely no problems nor have lost a
> single bit of data, even during the most stresfull tests such as untarring of
> huge archives, or making the kernel etc. Not to mention this is also my
> primary and "production" machine, with all my e-mail on it etc.

Write some code that specifically stresses a specific FS dependency
on a set of files, iteratively, over and over again.  Then close all
the files, and call "sync", and wait.

Or run your test, and then unmount the FS on which the test was
running, before your delayed fsync gets a change to run, and then
do a shutdown.  When the system comes back up, check the data to
see if it's what it's supposed to be.

Basically, you are going to have to provide something *other than*
"rushjob" to be able to cause unmounts and other "special" code to
be able to force the fsync (consider removable media, like flash,
if nothing else).

-- Terry

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3EA0A647.BEC5931A>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation