Date:      Mon, 24 Jan 2000 14:59:42 -0800
From:      "David Fuchs" <beastie@beastie.net>
To:        "Sean Heber" <sean@fifthace.com>, <freebsd-questions@freebsd.org>
Subject:   Re: Update regarding stuck file systems
Message-ID:  <003d01bf66be$b97168e0$0201a8c0@uniserve.com>
References:  <Pine.BSF.4.10.10001241456160.2386-100000@marvin.fifthace.com>

Well... just think: what happens on your server between 1:00am and
2:30am, other than your backup script?

By default the cron process starts its daily server maintenance at
precisely 1:00.  I know that when I'm around the server during this time
it's making a hell of a racket...  you should probably try turning
system maintenance off for one night just to see if it solves the
problem.
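
For reference, the schedule lives in /etc/crontab.  The excerpt below is
what the defaults usually look like on 3.x (from memory -- check your own
file for the exact minutes, which land squarely in that window):

```shell
# /etc/crontab excerpt -- the periodic(8) maintenance schedule
# (times shown are the usual FreeBSD 3.x defaults; verify against
# your own /etc/crontab before changing anything)
59  1  *  *  *  root  periodic daily
30  3  *  *  6  root  periodic weekly
30  5  1  *  *  root  periodic monthly
# To test one night, comment out the "daily" line and signal cron:
#   kill -HUP `cat /var/run/cron.pid`
```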

I know it isn't a definite answer, but it should bring you a step closer
to the solution. :)

-David Fuchs

----- Original Message -----
From: Sean Heber <sean@fifthace.com>
To: <freebsd-questions@freebsd.org>
Sent: Monday, January 24, 2000 1:50 PM
Subject: Update regarding stuck file systems


> Ok, you may remember my previous e-mail about this a few days ago..  I
> have since done a LOT of testing.  I don't have much of a conclusion
> (which is why I'm writing again).
>
> As you may recall, my system had an odd problem.  If I ran my backup
> script (which tars files on one hard drive and puts them on another hard
> drive), all file system access stopped.  So, the box would still be up,
> top would still be running on the console, but nothing would work because
> the OS couldn't seem to read from the drive.
>
> The kicker, though, is no error messages.  Nothing in the logs.  Nothing
> on the console.  It would just stop and the processes would happily wait
> for data from the drives, but none would ever come.
>
> So, after a whole lot of swearing and Dew drinking, I have narrowed it
> down only slightly.  It seems that for some reason this only happens
> around 1:00 - 2:30 AM or so.  Never any other times.
>
> For example, as I write this a backup is being performed.  For testing
> purposes I've been running one backup after another since 8:00 AM (3:30 PM
> now).  No problems at all.
>
> I can't think of any reason why this would fail in the early morning hours
> and never any other time.  It's not uptime related since just yesterday I
> had the box up and down (while testing this) and everything was going
> great.  When I tried to run the backup again around 1:30AM, it died.  I
> was forced to hit the reset button.  Once the system came back up, I
> figured I would try to narrow things more.  So, I unloaded vinum on my two
> IDE backup drives (see below), reformatted one and gave it the same mount
> point.  (So the backup would still work.  I don't need all that space just
> yet.)  Once that was done, vinum was not loaded and I gave it another
> shot.  The backup froze again.  The box had only been up about 30 minutes.
>
> The first night I made the backup process, I put it at the end of my
> daily.local cron script.  It runs at 1:59 or something like that.  Before
> that time, the box was up for 2 days.  That first night brought it down
> with a frozen file system.
>
> The night after, I gave the backup script its own entry in crontab for
> 3:00AM.  It worked just fine.  When I woke up in the morning things still
> worked.
>
> Just the other night I changed the cron's run time to 12:05 AM.  That also
> made it through the night just fine.
>
> Does any of this make any sense?  It doesn't to me.
>
> I suppose I have two basic questions here:
> 1) Is there any way to make this work aside from the obvious "Don't run it
> between 1:00 and 2:30 AM"?  Because this really bothers me.  I have no
> idea if heavy server load would cause this to happen or if this is just a
> backup problem due to something stupid I'm doing.
>
> 2) I really need a better backup method.  The idea originally was to have
> a duplicate structure on the backup drive as well as the main drive so
> that in the event of a disk failure the broken drive could just be
> unplugged.  Is that reasonable?  Obviously using tar the way I am doesn't
> really allow this.  The catch (at least it seems like one to me) is the
> drives are all different sizes.. (see below)
>
>
> Ok, the famed "below":
>
>
> Running FreeBSD 3.3-RELEASE (I had 3.4-STABLE before.  Don't ask.  Long
> story.  But the problem is still the same in either case.)
> SMP Kernel
> 256 MB RAM
> Dual PII-400Mhz
> Currently sitting in my room with no other active users and no outside
> activity via web or anything (it's still being configured, after all)
>
> Drives:
> SCSI id6: 4.5 GB (boot: /, /usr, swap)
> SCSI id9: 9.0 GB (backup: /eddie)
> IDE bus1master: 37 GB (data: /sites)
> IDE bus1slave: none
> IDE bus2master: 25 GB (backup1)
> IDE bus2slave: 20 GB (backup2)
>
> The last two backup drives are concatenated using vinum.  Mounted as
> /wowbagger.
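
(For the archives: a two-drive concat like that is normally built from a
vinum(8) config file roughly like the one below.  This is a sketch from
memory, and the device names are guesses -- substitute your real slices.)

```shell
# vinum configuration sketch -- concatenate two IDE partitions into
# one volume.  Device names are assumptions, not taken from the
# message above.  Load with: vinum create /etc/vinum.backup.conf
drive back1 device /dev/wd2s1e
drive back2 device /dev/wd3s1e
volume vinum0
  plex org concat
    sd length 0 drive back1
    sd length 0 drive back2
```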
>
> The idea is that everything on the boot SCSI drive could be on the backup
> SCSI drive, and the same for the IDE.  This layout is like this because
> our original plan was to have the ability to unplug the broken drive and
> get things back up with minimum pain.  But using tar sort of defeats the
> purpose--which is why I would like some more suggestions.  :-)
>
> The backup script does this right now:
>
> echo "Backup /:"
> tar -cslpf /eddie/root.tar /
> echo
>
> # Backup by itself to be handy, maybe.
> echo "Backup /usr/local:"
> tar -clspf /eddie/usr.local.tar /usr/local
> echo
>
> echo "Backup all of /usr:"
> tar -clpsf /eddie/usr.tar /usr
> echo
>
> echo "Backup /sites:"
> tar -clpsf /wowbagger/sites.tar /sites
> echo
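
On question 2: instead of writing one big archive per file system, you
could mirror the tree with a tar pipe -- that keeps a browsable duplicate
structure (so a dead drive can just be swapped out) and never creates a
single huge file.  A sketch below, demonstrated on throwaway temp
directories; on the real box the source would be e.g. /sites and the
target a directory on /wowbagger (those paths come from the message, the
rest is an assumption):

```shell
# Mirror a directory tree via a tar pipe rather than a single archive.
# Throwaway directories stand in for the real source and backup drives.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/a/b"
echo "hello" > "$src/a/b/file.txt"
# -c create, -p preserve permissions, -f - use stdout/stdin as the
# archive; the subshell cd keeps all paths relative.
(cd "$src" && tar -cpf - .) | (cd "$dst" && tar -xpf -)
# The copy is a normal directory tree, not an archive:
cat "$dst/a/b/file.txt"
```

The same pipe works across mount points, so the backup stays directly
mountable in an emergency, which tar-into-one-file does not give you.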
>
> Make sense?  One thing I just realized, though, is that I might hit that
> famed 2GB file limit.  I imagine FreeBSD is prone to this?  Oh well.  I
> need a better method anyway..
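
If you do stick with single archives, one way around a per-file size cap
is to pipe tar through split(1).  The demo below uses tiny sizes on temp
directories; on the real box it would be something like
"split -b 1024m - /eddie/usr.tar." (that size and path are assumptions,
not from the script above):

```shell
# Cap archive piece size by splitting the tar stream.
src=$(mktemp -d)
out=$(mktemp -d)
# Make a 64KB file so the stream is big enough to split.
dd if=/dev/zero of="$src/big" bs=1024 count=64 2>/dev/null
# tar writes to stdout; split chops the stream into 16KB pieces
# named usr.tar.aa, usr.tar.ab, ...
(cd "$src" && tar -cf - .) | split -b 16k - "$out/usr.tar."
# The pieces concatenate back into a valid archive:
cat "$out"/usr.tar.* | tar -tf -
```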
>
> Just so you know, here's the current df:
>
> Filesystem        1K-blocks     Used    Avail Capacity  Mounted on
> /dev/da0s1a           99183    45741    45508    50%    /
> /dev/da0s1e         3713364   507654  2908641    15%    /usr
> /dev/da1s1e         8679993  1227161  6758433    15%    /eddie
> /dev/wd0s1e        35503710   449097 32214317     1%    /sites
> /dev/vinum/vinum0  43643010   996729 39154841     2%    /wowbagger
> procfs                    4        4        0   100%    /proc
>
>
> As you can see, the partitions that are being backed up are not over 2GB,
> so that shouldn't be the problem right now.
>
> Anyway..  I'm looking for some input here.  It's very very hard to make
> this problem happen.  I can try all day and nothing will come of it, but
> wait until 1:30AM or so, and it happens almost (key word) every time.  Is
> something deadlocking?  Perhaps something to do with SMP?  Or am I doing
> something terribly stupid?  (feel free to flame..  I need to learn
> sometime, right? :-)
>
> I hope someone has a clue of where to start digging, at least.  The last
> e-mail generated one response.  The person suggested I try removing drives
> one by one from the equation.  I'm going to attempt that tonight in more
> detail.  The problem is, setting the clock to 1:30 AM myself doesn't
> seem to matter.  Maybe it's tied to the BIOS time...  Or perhaps it's not
> time related at all and just really really coincidental that it happens
> around that time all the time regardless of how long the box was up, how
> hot it is, etc.
>
> l8r
> Sean
>
> PS> ARG!!!!  (This has been driving me nuts for the past 4.5 days now)
>
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-questions" in the body of the message
>






