Date:      Wed, 18 Mar 2015 12:29:03 +0100
From:      Polytropon <freebsd@edvax.de>
To:        CK <nibbana@gmx.us>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: thrashing + lost files
Message-ID:  <20150318122903.05e189f8.freebsd@edvax.de>
In-Reply-To: <0M8Nme-1ZTz3v3hkQ-00vvpg@mail.gmx.com>
References:  <0M8Nme-1ZTz3v3hkQ-00vvpg@mail.gmx.com>

On Wed, 18 Mar 2015 03:05:28 -0800, CK wrote:
> > On Tue, 17 Mar 2015 23:56:25 -0800, CK wrote:
> > > I would like any thoughts or ideas on how to prevent the following problem,
> > > because it is making my computer completely unusable, wasting many efforts.
> > > I am using this mail-list because freebsd.forums.org has become completely
> > > unusable to those with dial-up connections, requiring 10 seconds for each
> > > character typed ...  no exaggeration.
> >
> > A common reply would be: "Who still uses dial-up anyway?" ;-)
> 
> 10s of millions in the USA.  High-speed internet is way too expensive,
> over $100/mo where I live.  Over $1200/yr.  Easily 5-10% of take-home
> salary for many minimal wage workers.

Here in Germany, many people believe that the USA is a
"technology utopia", a "magical wonderland" where people
earn high wages and have the fastest Internet in the world... :-)



> > > The result is the loss of many critical files from a hard drive, as if a "rm
> > > *" was done in the home directory.  This occurs after the thrashing when
> > > Xwindow is accidentally shutdown with Opera open with many javascript page tabs,
> > > eg, being a memory pig - consuming 1/2 of RAM (256M), which after dumping
> > > core, writes a large amount of data (crashlog) even after Xwindow is down:
> > >
> > > pid 1118 (opera), uid 1001: exited on signal 11 (core dumped)
> >
> > I thought Opera would simply write a core dump, well, still
> > several 100s of MB though...
> 
> Interestingly, the core dump was deleted out of the home directory. I caught a
> quick glimpse of it doing "ls" before it was deleted. As I said, it was
> exactly like "rm *".  Dot files were left intact.

Oh, that's surprising! I also had that experience once - home
directory empty (!) _except_ dot files (and other directories),
just like "rm *" had been issued... very strange...



> At first, I thought it was a bug with journaling/soft-updates, so I disabled
> those things with tunefs (to the best of my memory).  But now it has happened
> again.

I can't imagine it has to do with that. Massive file loss can
occur when a directory inode has been damaged: fsck will then
remove the directory altogether. But it's possible to rescue
the files' _contents_, because fsck reconnects the orphaned
inodes to lost+found/ under their inode numbers. So their names
are lost, but their contents are kept.



> The drive was being written to for about 1 minute by the Opera
> crashlog/coredump.  About 45 seconds after Xwindow was already down.

This kind of crash indicates a significant problem. Are you
sure the drives are fully intact? Check with "smartctl -a" just
to be sure. And even if it sounds stupid: check the cables.



> > > FSCK RESULTS:
> > > ------------
> > > Of interest, is that each time fsck was run, more files were lost!
> > >
> > > # fsck -t ufs -p /dev/ada0p6.eli
> > > /dev/ada0p6.eli: NO WRITE ACCESS
> > > /dev/ada0p6.eli: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
> >
> > This message should alert you. Don't just preen the disk.
> > In this mode, only a subset of errors will be detected,
> > and not all of them can be corrected. You should actually
> > perform
> >
> > 	# fsck -t ufs -f /dev/ada0p6.eli
> 
> Thanks, I didn't think of using the -f option.

The -f option *f*orces a *f*ull check. You can even run the
command twice. The 2nd run should then report "no errors", and
the file system stays marked clean.
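
The "run it until it reports nothing" routine can also be scripted. A minimal Python sketch, assuming the file system is unmounted, that fsck_ffs prints "***** FILE SYSTEM WAS MODIFIED *****" after a pass that changed something, and that answering "yes" to every question (-y) is acceptable to you; if you want to decide each repair yourself, run fsck by hand instead. The function names are illustrative, and any non-zero exit status stops the loop on purpose so a human can look at it.

```python
import subprocess
from typing import Callable, Tuple

# Assumed marker printed by fsck_ffs after a pass that modified the file system.
MODIFIED_MARKER = "FILE SYSTEM WAS MODIFIED"


def run_fsck(device: str) -> Tuple[int, str]:
    """One forced (-f), non-interactive (-y) UFS check; returns (exit status, output)."""
    result = subprocess.run(["fsck", "-t", "ufs", "-f", "-y", device],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return result.returncode, result.stdout


def fsck_until_clean(device: str, max_passes: int = 3,
                     runner: Callable[[str], Tuple[int, str]] = run_fsck) -> int:
    """Repeat the check until a pass changes nothing; return the number of passes run."""
    for attempt in range(1, max_passes + 1):
        status, output = runner(device)
        print(output, end="")
        if status != 0:
            # Stop on any failure instead of guessing what fsck meant.
            raise RuntimeError(f"{device}: fsck exited with status {status}")
        if MODIFIED_MARKER not in output:
            return attempt
    raise RuntimeError(f"{device}: still modified after {max_passes} passes")


if __name__ == "__main__":
    import sys
    print("clean after", fsck_until_clean(sys.argv[1]), "pass(es)")
```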



> After reading a paper by
> Marshall McKusick on fsck, it was my understanding that "preen mode" only
> fixed errors that could be fixed with 100% accuracy.

I also read that famous paper to gain a better understanding
of how UFS works and what fsck does. Data loss teaches you a
lot of fundamental knowledge. :-)



> > There are several errors shown:
> >
> > > INCORRECT BLOCK COUNT I=2327435 (8 should be 0)
> > > [...]
> > > UNREF FILE I=2327428  OWNER=abc MODE=100600
> > > [...]
> > > UNREF FILE I=2327439  OWNER=abc MODE=100600
> > > [...]
> > > FREE BLK COUNT(S) WRONG IN SUPERBLK
> > > [...]
> > > SUMMARY INFORMATION BAD
> > > [...]
> > > BLK(S) MISSING IN BIT MAPS
> 
> I lost about 8 files, a lot of legal research/work, in case that is what the
> (8 should be 0) is citing.

The question is: Is the data still there? The file being gone
(its inode entry) does not necessarily mean that the data is
no longer on the disk. Everything is on the disk as long as
it hasn't been overwritten.

When I found out that one of my files (which I worked a whole
day on) was gone (0 bytes) after a freeze + reboot + fsck, I
immediately forced a r/o mount on the /home partition and
grepped for some text fragment I could remember. I found the
block it was in, dumped that block, and trimmed it to
become the original file again. The data wasn't lost, it was
fully intact. But not referenced (!) anymore.
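
The search-and-dump part of that can be done with a few lines of Python instead of grep. A minimal sketch, assuming the partition is unmounted or mounted read-only, and that reads from a raw FreeBSD device must be block-aligned (hence unbuffered I/O, a chunk size that is a multiple of the sector size, and an aligned dump window). `find_pattern` and `dump_window` are illustrative names. The device is only ever opened for reading, but run it from a directory on a *different* disk, or the dumped files could overwrite the very blocks you are trying to rescue.

```python
from typing import Iterator, Tuple


def find_pattern(path: str, pattern: bytes, chunk_size: int = 1 << 20) -> Iterator[int]:
    """Yield the absolute byte offset of every occurrence of `pattern` in `path`.

    The last len(pattern) - 1 bytes of each chunk are carried into the next one,
    so matches that straddle a chunk boundary are found exactly once.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1
    carry = b""
    base = 0  # absolute offset of buf[0]
    with open(path, "rb", buffering=0) as dev:  # unbuffered: read sizes stay aligned
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:
                break
            buf = carry + chunk
            idx = buf.find(pattern)
            while idx >= 0:
                yield base + idx
                idx = buf.find(pattern, idx + 1)
            keep = min(overlap, len(buf))
            carry = buf[len(buf) - keep:]
            base += len(buf) - keep


def dump_window(path: str, offset: int, radius: int = 8192,
                block: int = 4096) -> Tuple[int, bytes]:
    """Return (start, data) for a block-aligned window of bytes around `offset`."""
    start = max(0, (offset - radius) // block * block)
    end = (offset + radius + block - 1) // block * block
    with open(path, "rb", buffering=0) as dev:
        dev.seek(start)
        return start, dev.read(end - start)


if __name__ == "__main__":
    import sys
    device, fragment = sys.argv[1], sys.argv[2].encode()
    for hit in find_pattern(device, fragment):
        start, data = dump_window(device, hit)
        out = f"block-{hit}.bin"  # written to the current directory: use another disk!
        with open(out, "wb") as f:
            f.write(data)
        print(f"match at {hit}, wrote {len(data)} bytes from offset {start} to {out}")
```

Each dump still has to be trimmed by hand, as described above; the window size is a guess and may need to grow for larger files.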



> > Unmount the partition, let fsck do its job. :-)
> 
> fsck -t ufs -f /dev/ada0p6.eli only reported that
> everything was clean.

So at _this_ point in time the file system was consistent.
Do you maybe have background_fsck="YES" in /etc/rc.conf?
Set it to "NO". Always perform file system checks _prior_
to accessing a file system, whether r/o or r/w. This may take
some time, but you have to find a balance between time and
data safety that reflects your priorities. :-)
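
To see which value is actually in effect, a rough Python sketch like the following can help. It assumes the usual FreeBSD lookup order (/etc/defaults/rc.conf, then /etc/rc.conf, then /etc/rc.conf.local, last assignment wins) and that the shipped default in /etc/defaults/rc.conf is "YES". It only understands plain name="value" lines: rc.conf is really sourced by sh, so variable expansion or conditionals are not handled. The function names are made up.

```python
import os
import re
from typing import Iterable, Optional, Sequence

# Plain sh assignments such as: background_fsck="NO"   # comment
ASSIGNMENT = re.compile(r"""^\s*([A-Za-z_]\w*)=(?:"([^"]*)"|'([^']*)'|([^\s#;]*))""")

RC_FILES = ("/etc/defaults/rc.conf", "/etc/rc.conf", "/etc/rc.conf.local")


def last_assignment(lines: Iterable[str], name: str) -> Optional[str]:
    """Return the value of the last simple assignment to `name`, or None."""
    value = None
    for line in lines:
        m = ASSIGNMENT.match(line)
        if m and m.group(1) == name:
            # Exactly one of the three value groups (double, single, unquoted) matched.
            value = next(g for g in m.groups()[1:] if g is not None)
    return value


def effective_value(name: str, files: Sequence[str] = RC_FILES) -> Optional[str]:
    """Apply the files in order; a later assignment overrides an earlier one."""
    value = None
    for path in files:
        if os.path.exists(path):
            with open(path) as f:
                found = last_assignment(f, name)
            if found is not None:
                value = found
    return value


if __name__ == "__main__":
    print("background_fsck =", effective_value("background_fsck"))
```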



> > Copy files to a different disk (or maybe even external storage,
> > such as USB sticks) temporarily, just to be sure.
> 
> Yes, I do this of course, with a USB SDRAM device. But I still lose days of
> work, because I can't back up every minute.

You could automate this - but on the other hand, when a
crash appears, this might also affect the backup process
and its results.
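
One way to limit that risk is to never overwrite the previous backup: copy into a temporary directory on the backup medium and rename it only when the copy has finished, so an interrupted run leaves an obviously incomplete ".partial" directory next to the last good snapshot. A minimal Python sketch of that idea; the paths, naming scheme and schedule are assumptions, and it does not prune old snapshots or clean up ".partial" leftovers with other timestamps.

```python
import os
import shutil
import sys
import time
from pathlib import Path
from typing import Optional


def snapshot(src: str, dest_root: str, stamp: Optional[str] = None) -> Path:
    """Copy `src` to dest_root/<stamp>, making it visible only once the copy is complete."""
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    final = Path(dest_root) / stamp
    partial = Path(dest_root) / (stamp + ".partial")
    if partial.exists():
        shutil.rmtree(partial)     # left over from an interrupted run with the same stamp
    shutil.copytree(src, partial)  # copy2 by default: keeps modification times
    os.sync()                      # flush the copied data before publishing it
    os.rename(partial, final)      # atomic within one file system
    os.sync()
    return final


if __name__ == "__main__":
    # Hypothetical cron entry:  */30 * * * *  python3 snapshot.py /home/abc/legal /mnt/usb/backup
    print(snapshot(sys.argv[1], sys.argv[2]))
```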



> This should not happen at all.

Yes, it sounds very unusual.



> I
> have used FreeBSD for 20 years, since 1995, and I never had problems like this
> before - and I have the same hardware since 2003, which I ran FreeBSD 4.11 on
> until recently.  But only now does this problem occur.  Certainly, there is a
> bug somewhere.  My gut feeling is that something is allowing Opera to do
> things it should not do, or something in the filesystem layers is breaking
> under the stress of Opera's crash dumps.

I'd think it's somewhere filesystem-related. I have tortured
Opera with approx. 100 tabs open with "Flash" content and JS
stuff in it. No crash, it just started swapping heavily. Sometimes
I can get Opera to crash, but it successfully "resumes". However,
when my system freezes (due to a faulty GPU) and Opera has been
running, sometimes the bookmarks are lost. That's why I tend to
copy them to ~/ from time to time, just to be sure. In a few
cases, the Opera settings are also reset. A copy of ~/.opera
is helpful. Maybe it's just program design that got worse, like
first reading a file into memory, then keeping that file open,
maybe modifying it (or not), and upon program exit, writing the
memory content back to the file. When normal program termination
is never reached, a damaged or empty file is left behind. I have
no idea what makes people write software that way, but it seems
to be "modern" now...
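
For comparison, the usual way to avoid leaving a damaged or empty file behind is to never rewrite the file in place: write the new content to a temporary file in the same directory, fsync it, and rename it over the old one. A minimal Python sketch of that pattern (not a claim about how Opera actually saves its files; `atomic_write` is an illustrative name). After a crash at any point you find either the complete old file or the complete new one.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data`; after a crash you find the old or the new file, never half of one."""
    directory = os.path.dirname(os.path.abspath(path))
    try:
        mode = os.stat(path).st_mode & 0o7777   # keep the old file's permissions
    except FileNotFoundError:
        mode = None                             # new file: mkstemp's 0600 is kept
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            if mode is not None:
                os.fchmod(f.fileno(), mode)
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # the data is on disk before the name points to it
        os.replace(tmp, path)      # atomic rename within the same file system
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)           # make the rename itself durable
    finally:
        os.close(dir_fd)
```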




-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


