Date:      Sat, 19 Jan 2013 12:30:17 +0200
From:      Marin Atanasov Nikolov <dnaeon@gmail.com>
To:        Warren Block <wblock@wonkity.com>
Cc:        ml-freebsd-stable <freebsd-stable@freebsd.org>, Ian Lepore <ian@freebsd.org>, kpneal@pobox.com, Ronald Klop <ronald-freebsd8@klop.yi.org>
Subject:   Re: Spontaneous reboots on Intel i5 and FreeBSD 9.0
Message-ID:  <CAJ-UWtRRfCKg9GBR_ppvtjvJGadiOXMXBFBpX7tAvLEXDoZHQg@mail.gmail.com>
In-Reply-To: <alpine.BSF.2.00.1301181313560.1604@wonkity.com>
References:  <CAJ-UWtSANRMsOqwW9rJ6Eebta6=AiHeNO6fhPO0mhYhZiMmn4A@mail.gmail.com> <op.wq3zxn038527sy@ronaldradial.versatec.local> <alpine.BSF.2.00.1301180758460.96418@wonkity.com> <1358527685.32417.237.camel@revolution.hippie.lan> <20130118173602.GA76438@neutralgood.org> <alpine.BSF.2.00.1301181313560.1604@wonkity.com>

Hi,

Re-sending this one, as the image I attached was too large for the
mailing list; sorry about that :)

After starting the system last night I kept monitoring the memory usage
in case something strange turned up, and I noticed a significant drop in
free memory between 03:00 and 03:05. I've taken a screenshot of the
graph, which you can also see at the link below:

* http://users.unix-heaven.org/~dnaeon/memory-usage.jpg
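
In case it helps, here is roughly how I'm sampling the free memory (a
throwaway sh sketch; the log path and the one-minute interval are
arbitrary choices on my part):

#!/bin/sh
# Sample the free page count once a minute so it can be lined up
# with the periodic(8) run.  vm.stats.vm.v_free_count is in pages;
# multiply by hw.pagesize to get bytes.
while :; do
    printf '%s %s\n' "$(date '+%H:%M:%S')" \
        "$(sysctl -n vm.stats.vm.v_free_count)" >> /var/tmp/freemem.log
    sleep 60
done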

At 03:00 I can see that periodic(8) runs, but I don't see what could
have consumed so much free memory. I'm also running this system on ZFS
with daily rotating ZFS snapshots; the snapshot count is currently over
1000, and I'm not sure whether that could be a factor (a quick way to
count them is shown after the listings below). Here is the list of
periodic(8) daily scripts that run at 03:00:

% ls -1 /etc/periodic/daily
100.clean-disks
110.clean-tmps
120.clean-preserve
130.clean-msgs
140.clean-rwho
150.clean-hoststat
200.backup-passwd
210.backup-aliases
220.backup-pkgdb
300.calendar
310.accounting
330.news
400.status-disks
404.status-zfs
405.status-ata-raid
406.status-gmirror
407.status-graid3
408.status-gstripe
409.status-gconcat
420.status-network
430.status-rwho
440.status-mailq
450.status-security
460.status-mail-rejects
470.status-named
480.status-ntpd
490.status-pkg-changes
500.queuerun
800.scrub-zfs
999.local

% ls -1 /usr/local/etc/periodic/daily
402.zfSnap
403.zfSnap_delete
411.pkg-backup
smart
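
For reference, counting the snapshots is straightforward (the -H flag
suppresses the header line, so wc(1) sees only snapshot lines):

% zfs list -H -t snapshot | wc -l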

I'll keep monitoring the memory usage to see whether free memory drops
by more than 50% again on the next periodic(8) daily run. If the drop
keeps its current trend, the system should crash within the next 1-2
days; if that happens, and memory was low at the time, I'll start
debugging the periodic(8) scripts one by one to see which of them is
responsible, along the lines of the sketch below.
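
(Just a rough sketch, untested so far; the periodic scripts source
periodic.conf themselves, so running them directly should be fine.)

#!/bin/sh
# Run each daily script on its own and record the free page count
# before and after, to spot the one responsible for the drop.
# Free-page counts are noisy, but a big drop should still stand out.
for script in /etc/periodic/daily/* /usr/local/etc/periodic/daily/*; do
    before=$(sysctl -n vm.stats.vm.v_free_count)
    "$script" > /dev/null 2>&1
    after=$(sysctl -n vm.stats.vm.v_free_count)
    echo "$script: $before -> $after free pages"
done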

Thanks and regards,
Marin


On Fri, Jan 18, 2013 at 10:23 PM, Warren Block <wblock@wonkity.com> wrote:

> On Fri, 18 Jan 2013, kpneal@pobox.com wrote:
>
>> On Fri, Jan 18, 2013 at 09:48:05AM -0700, Ian Lepore wrote:
>>
>>> I tend to agree: a machine that used to be stable and starts
>>> rebooting spontaneously when nothing significant has changed is
>>> usually a sign of a failing power supply or memory.
>>>
>>
>> Agreed.
>>
>>> But I disagree about memtest86.  It's probably not completely without
>>> value, but to me its value is only negative:  if it tells you memory is
>>> bad, it is.  If it tells you it's good, you know nothing.  Over the
>>> years I've had 5 DIMMs fail.  memtest86 found the error in one of them,
>>> but said all the others were fine in continuous 48-hour tests.  I even
>>> tried running the tests on multiple systems.
>>>
>>> The thing that always reliably finds bad memory for me
>>> is /usr/ports/math/mprime run in test/benchmark mode.  It often takes 24
>>> or more hours of runtime, but it will find your bad memory.
>>>
>>
>> I've had "good" luck with gcc showing bad memory. If compiling a new
>> kernel
>> produces seg faults then I know I have a hardware problem. I've seen
>> compilers at work failing due to bad memory as well.
>>
>> Some problems only happen with particular access patterns.  So if a
>> compiler
>> works fine then, like memtest86, it doesn't say anything about the health
>> of the hardware.
>>
>
> Most test tools are like that.  They might diagnose something as bad, but
> they often can't prove it is good.  SMART has a reputation for not finding
> any problems on disks that are failing, and capacitors that aren't swollen
> or leaking still may not be working.
>
> But diagnostic tools can at least give a hint.  In my case, memtest
> indicated a problem, and a big one.  I removed one DIMM at random (there
> were only two) and the problems and memtest errors both went away.  Put
> the DIMM back, and both came back.
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>



-- 
Marin Atanasov Nikolov

dnaeon AT gmail DOT com
http://www.unix-heaven.org/


