Date:      Thu, 06 Jan 2005 13:33:41 -0600
From:      "Joseph Koenig (jWeb)" <joe@jwebmedia.com>
To:        Joe Koenig <joe@jwebmedia.com>, FreeBSD Mailing List <freebsd-questions@freebsd.org>
Cc:        Henry Miller <hmiller@intradyn.com>
Subject:   Re: Hardware or OS problem? System Crashing...
Message-ID:  <BE02EEB5.156EC%joe@jwebmedia.com>
In-Reply-To: <BE02B574.15688%joe@jwebmedia.com>

>> On 1/5/2005 at 09:14 Joseph Koenig (jWeb) wrote:
>> 
>>> Hi,
>>> 
>>> We have a system that is currently giving us some trouble. The system is
>>> FreeBSD 4.9. It's a 2 GHz system with 1MB RAM and (here's the kicker) 73GB
>>> RAID 1 ATA drives. The system serves as a web/database server dedicated to
>>> 1 site. Daily the system goes out and downloads real estate listings (via
>>> shell scripts and cURL) and processes them (via PHP into MySQL). Also,
>>> nightly the system downloads a zipped set of images (probably around
>>> 400-500) and processes them into thumbnails (PHP scripts calling
>>> ImageMagick). Over the last week or two, the system is crashing and
>>> rebooting into single user mode. It's not consistently during updates, or
>>> resizing of images, or anything like that. Yesterday, it crashed with 99%
>>> processor idle and load averages of 0.00 0.00 0.00 -- I was watching a
>>> 'top' when the machine died. When it boots into single user mode, an fsck
>>> must be run, which identified a few corrupt JPEG files -- however, the
>>> sysadmin who reboots it never tells me which files they are. The sysadmin
>>> is convinced it is a FreeBSD problem and says that Linux will not crash
>>> because of a corrupt file and if it does, will not boot into single user
>>> mode and he will be able to access it remotely to do the fsck. About 3-4
>>> weeks ago, one of the drives in the mirror set crashed and had to be
>>> replaced. I'm not convinced that drives are not to blame for these issues.
>>> Is there any way to verify that? Is it possible a corrupt JPEG on the
>>> drive could cause the system to crash randomly? What can I do to correctly
>>> identify the problem so that we can fix it and not change the OS? Thanks,
>> 
>> The sysadmin has no clue about either Linux or FreeBSD!
>> 
>> A corrupt JPEG cannot cause a crash of the OS, for any real OS. (If it
>> does, it is a bug in the OS, but I doubt one exists.) Real OS includes
>> Windows XP, Linux, and FreeBSD.
>> 
>> However, an OS crash can cause a corrupt JPEG!
>> 
>> Either Linux or FreeBSD may boot into single user mode when the
>> filesystem is corrupt. What your sysadmin means is that with one of
>> the newer filesystems Linux uses journaling, which is much less likely
>> to enter this situation, but it still can happen. With soft updates
>> FreeBSD is in the same situation as Linux, but soft updates is
>> (generally, there are exceptions) better than journaling. There is
>> soft updates in FreeBSD 4.9, but I'm not sure how to enable it, or how
>> good it is. (In 5.3 it is awesome!)
>> 
>> I suspect hardware.
>> 
>> I'd burn memtest to a CD, and run that for a few hours to see if
>> something is identified.   Memtest won't catch everything, but it does
>> a pretty good job.
>> 
>> Also look at other factors.  Does the HVAC kick in when this happens?
>> Is someone hitting the panic stop switch?  Situations like that have
>> happened, and they can take a while to debug.  They are not likely, but
>> don't rule them out.
>> 
>> FreeBSD 4.9 is fairly old at this point.   You should seriously
>> consider upgrading to 4.11 (due out in a few weeks), or 5.3 (my
>> recommendation, but a much more involved upgrade).
>> 
> 
> In addition to the original problem stated above, we are seeing a number of
> problems like "...in free(): warning: modified (page-) pointer" and "...in
> free(): warning: chunk is already free". I have the admin running a memtest
> today, but wanted to make sure these errors were not indicative of something
> else going on. Thanks,
> 

Well, the sysadmin tells me that memtest passed. Does anyone have any
suggestions as to what could be causing the crashes? Thanks,
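
In the meantime, since fsck flags corrupt JPEGs but I never get the file
names, something like the following could at least list candidates. This is
only a rough sketch in Python, not something running on the server: the
function names and the /path/to/images directory are placeholders, and it only
checks that each file starts with the JPEG start-of-image marker (FF D8) and
ends with the end-of-image marker (FF D9). A file with trailing bytes after the
end marker would be reported too, so a hit means "look at this file", not
"this file is definitely corrupt".

```python
import os

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def looks_truncated(path):
    """Return True if the file lacks the JPEG start or end marker."""
    with open(path, "rb") as f:
        head = f.read(2)
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size < 4:
            # Too small to hold both markers.
            return True
        f.seek(-2, os.SEEK_END)
        tail = f.read(2)
    return head != SOI or tail != EOI


def find_suspect_jpegs(root):
    """Walk root and return a sorted list of .jpg/.jpeg files that look damaged."""
    suspects = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith((".jpg", ".jpeg")):
                full = os.path.join(dirpath, name)
                try:
                    if looks_truncated(full):
                        suspects.append(full)
                except OSError:
                    # An unreadable file is worth a look as well.
                    suspects.append(full)
    return sorted(suspects)


if __name__ == "__main__":
    # Placeholder directory; substitute the real image location.
    for path in find_suspect_jpegs("/path/to/images"):
        print(path)
```

Running it after each crash and comparing the lists would show whether the
same files keep turning up or whether new ones are damaged every time.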

Joe


