Date:      Thu, 11 Jan 2018 13:07:11 -0600
From:      Valeri Galtsev <galtsev@kicp.uchicago.edu>
To:        Dave B <g8kbvdave@googlemail.com>, freebsd-questions@freebsd.org
Subject:   Re: freebsd-questions Digest, OT: Max system physical memory
Message-ID:  <5900c538-35b7-33ae-160c-0c070cdea4d2@kicp.uchicago.edu>
In-Reply-To: <eb5d205f-88c8-7c68-e856-d4a76e2ff4a6@googlemail.com>
References:  <mailman.112.1515672002.77530.freebsd-questions@freebsd.org> <0bce5e82-97ba-0a73-e261-c91473837737@googlemail.com> <b8ad49f4-59ad-e593-a1b6-ae470a2b0dee@kicp.uchicago.edu> <eb5d205f-88c8-7c68-e856-d4a76e2ff4a6@googlemail.com>



On 01/11/18 11:36, Dave B wrote:
> Memtest86 is anything but unstressful!
> 
> It runs a multitude of tests, designed to increase the noise level on
> the memory cell sense lines by way of specific address and data
> patterns.  DRAM is not unlike older memory technology in that respect,
> regarding noise on sense lines due to particular data/address patterns.
> 
> Yes, an OS can stress the RAM, but so can this.   That's why it can take
> an age to complete one full pass.
> 
> I have in the past found it to be the only tool to find single bit
> errors in several GByte of RAM.   On one (infamous) occasion, a user
> complained that one OS update wouldn’t "take", the system (you know the
> one) just tried and tried again over and over.    Other than that, the
> "system" as a whole was working just fine.  (The use was music and video
> editing, so not exactly a low stress environment!)
> 
> When they were able to release the machine to me, after cleaning out all
> the coolers etc, I ran Memtest86, and eventually, in the last few % of
> the last test, some hours after starting, it flagged a single bit
> error.   1 Bit, in over 4G bytes of memory!
> 
> If you doubt the stress it puts a system under, just monitor the cooling
> exhaust temperatures, and power consumption when tests are running!
> 
> Long story short, after identifying which of the 8 memory cards was the
> buggy one (by selective removal-relocation and retest etc, several days
> later...)  A new part was sourced, installed and tested successfully,
> all OK.
> 
> The very next full OS boot, the problematic security update ran and
> stuck just fine, followed by several others that must have been waiting
> for a dependency to be satisfied..
> 
> Just to be sure, I swapped that suspect card into another similar
> system, that booted and ran OK, but again Memtest86 eventually found a
> single bit error, right at the end of the last test.   The memory card
> went for recycling along with a load of other WEEE junk.
> 
> I'm sure it could happen, but in over 30 years in total that I've been
> working on this sort of hardware, professionally and at home, I've yet
> to find any OS fail due to RAM errors, that a "Proper" memory diagnostic
> tool could not find the cause..
> 
> Memtest86 might not be the be all and end all of all RAM tests, and of
> course it's x86 specific, but it's pretty damn close.   For the price
> though, it can't be beaten.  I've seen it find and identify problems
> that paid for diagnostics just ignore.  Often allowing me to repair
> systems that were declared BER by other (so called) professional data
> system engineers.
> 
> The downside, is the time it all takes, and of course, time == money.

Well, that is all well and good, and I agree. Memtest86 is everything you
say, and it is very powerful at catching single-bit errors, even those that
only happen through crosstalk between signal lines.

No one would attempt to undermine the validity of memtest86. But 
everything has its limitations, so:

What I attempted to say is: memtest86 does not create CPU load. If someone
who knows the details of memtest86 can chime in, please do, but as far as I
know things haven't changed lately and this is still valid. And CPU load and
a high workload on the machine's other electronic boards (or on components
of the system board) do affect the operation of memory controllers and
memory modules and the signals they produce. Every electrical engineer will
tell you that (yes, I do have an electrical engineering degree in addition
to a computer science one). I would think of it like this. When you are
running memtest86 you are sterilizing the operating room of a vet clinic.
But when an actual operation is performed in that room on a stray dog that
was just brought in from the street, the sterility level is quite different.

Let me attempt to make my point differently. Memtest86 catches all
memory errors that happen consistently. It may catch some errors that
are "transient", i.e. sometimes happen and sometimes don't, but catching
some of them may take an extremely large number of loops, and it still may
not catch some that can happen when the real system is running. Stressing
ALL components of the computer helps to increase the probability of
transient errors. Memtest86 is not designed to do so (and it is a
single-threaded program as far as I know, which is how I would program it
if I were to achieve the goal memtest86 has).
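
To put a rough number on "extremely large number of loops", here is a
small back-of-the-envelope sketch in Python. It assumes (and this is only an
assumption for illustration, not a measurement of any real DIMM) that a
transient error shows up independently in each pass with the same small
probability. The function names and the example probabilities are made up
for this sketch.

```python
import math


def detection_probability(p_per_pass: float, passes: int) -> float:
    """Chance that at least one of `passes` independent passes hits the error.

    Assumes each pass catches the transient error with the same probability
    `p_per_pass`, independently of the other passes (an idealization).
    """
    return 1.0 - (1.0 - p_per_pass) ** passes


def passes_needed(p_per_pass: float, confidence: float) -> int:
    """Smallest number of passes whose detection probability reaches `confidence`.

    Solves 1 - (1 - p)^n >= confidence for n, rounding up to a whole pass.
    """
    if not (0.0 < p_per_pass < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("both arguments must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_per_pass))


if __name__ == "__main__":
    # Hypothetical per-pass detection probabilities, chosen only to show
    # how quickly the required number of passes grows as p shrinks.
    for p in (0.01, 0.001, 0.0001):
        print(f"p = {p}: about {passes_needed(p, 0.95)} passes for 95% confidence")
```

Under this simple model the required number of passes grows roughly as 1/p,
so anything that raises the per-pass probability (such as loading the rest
of the system) pays off more than just looping longer.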

I hope I managed to make my point clear this time.

Valeri

> 
> Regards to All.
> 
> Dave B.
> 
>>> <<
> 
> On 11/01/18 15:57, Valeri Galtsev wrote:
>>
>>
>> On 01/11/18 06:14, Dave B via freebsd-questions wrote:
>>> That I suspect depends on how many physical address lines are available
>>> for it via the memory management system.
>>>
>>> If the PC documentation says 8G is the max', then that is probably all
>>> that'll be seen by the CPU, even if there is 16G installed.
>>>
>>> There is only one way to find out for sure, at your expense.
>>>
>>> Personally, I doubt it, and the info at
>>> https://support.hp.com/gb-en/document/c03363664  says it won't.
>>>
>>> If it does see 16G (or more) best park it in a corner and thrash it with
>>> a Live Boot Memtest86 CD for some days, to ensure it's working
>>> correctly.   (Let it run to full completion, it can take hours for a
>>> full test, even for 4G!)
>>
>> In addition to memtest86 I would also run much more stressful test,
>> namely have the machine booted into system, run multiple CPU and RAM
>> hungry stuff (make buildworld comes to my mind). The reason for that
>> would be: with signals on memory bus marginally out of specs, system
>> stress will help them being pushed to the limit, which "unstressful"
>> memtest86 will not do and may pass, though the failure on stressed
>> system still may happen.
>>
>> Just my $0.02
>>
>> Valeri
>>
>>>
>>> But even that, doesn’t fully exercise the memory management system in
>>> the same way an OS will.
>>>
>>> Chances are, if it works at all with the large memory modules, it'll
>>> only "See" and be able to use 8G.
>>>
>>> Have Fun.
>>>
>>> Dave B.
>>>
>>>
>>>
>>> On 11/01/18 12:00, freebsd-questions-request@freebsd.org wrote:
>>>> Subject:
>>>> OT: Max system physical memory
>>>> From:
>>>> Aryeh Friedman <aryeh.friedman@gmail.com>
>>>> Date:
>>>> 10/01/18 14:59
>>>>
>>>> To:
>>>> FreeBSD Mailing List <freebsd-questions@freebsd.org>
>>>>
>>>>
>>>> My computer (HP Pavilion P7-1234, FreeBSD 11.1-RELEASE [amd64]) has
>>>> 2 240 pin DIMM (DDR3, PC3-10600) the manual says the max memory is
>>>> 8 GB but I see some 16 GB packages (2x8GB).   If I put one or two of
>>>> these in will it see the extra memory?
>>>
>>> _______________________________________________
>>> freebsd-questions@freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-questions
>>> To unsubscribe, send any mail to
>>> "freebsd-questions-unsubscribe@freebsd.org"
>>>
>>

-- 
++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++


