Date:      Tue, 19 May 2020 10:12:33 -0500
From:      Valeri Galtsev <galtsev@kicp.uchicago.edu>
To:        Tim Daneliuk <tundra@tundraware.com>, freebsd-questions@freebsd.org
Subject:   Re: OT: Weird Hardware Problem
Message-ID:  <31c596ff-777d-47ff-83e8-3db9b3430d92@kicp.uchicago.edu>
In-Reply-To: <65f0c2dd-b431-9b9a-d256-1acf4801c771@tundraware.com>
References:  <0a9f810d-7b4b-f4e6-4b7c-716044a9cf69@tundraware.com> <a3aa3910-5f7e-813f-b6f8-b9ab12b1336d@qeng-ho.org> <20200519103835.00003914@seibercom.net> <e91cfa85-d7cf-637b-01e7-9fd6e870a103@kicp.uchicago.edu> <65f0c2dd-b431-9b9a-d256-1acf4801c771@tundraware.com>


On 5/19/20 9:54 AM, Tim Daneliuk wrote:
> On 5/19/20 9:52 AM, Valeri Galtsev wrote:
>>> Have you checked to see if an updated BIOS has been issued for your
>>> machine?
>>>
>> Arthur's and Jerry's suggestions almost certainly cover everything. Were it not for them, I would also suggest opening the machine and "re-seating" the SSD or hard drive, whichever device you have (and the RAM, unless it is soldered to the system board). I know the contacts are gold plated, but they still manage to oxidize somehow (though not much). Probably the gold plating is sometimes porous rather than a contiguous layer of gold.
>>
>> Valeri
> 
> I have done all the above :(
> 

The next thing I would try is to put the machine in the freezer for half
an hour (or whatever time is necessary to cool it to the core), then,
after taking it out of the freezer, attempt to reproduce the trouble.

The above is a test for micro crack(s) in the circuit board(s), which
may behave differently when the board is cold.

Next, I would inspect all places where heat sinks are supposed to
have thermal contact with chips. Overheating of some of the components
on the circuit board can trigger something like that.

While the thing is stripped, I would carefully inspect the boards for
suspicious areas. The first thing that happened after everyone went to
"lead-free" soldering was that the soldering alloys were not flexible,
and some places where components were supposed to be soldered were not
reliably soldered. These days that is overcome, but it may just be some
unlucky board... which I hope it is not. Basically, stressing things, as
you are doing by using filesystem encryption, which is accessed
constantly during installation, may reveal that sort of trouble through
non-uniform heating (some chips become much hotter than everything
else). When we needed to test CPU/RAM for that, compiling world or the
kernel was a good test.
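
As a rough illustration of the same idea on a much smaller scale (this
is only a sketch, not what we actually ran; the function names, buffer
size and pass count are made up for the example): repeat an identical
CPU- and memory-heavy computation several times and compare the results.
On healthy hardware every pass gives the same digest; a mismatch hints
at flaky CPU/RAM.

```python
import hashlib


def stress_pass(size_bytes: int, seed: bytes) -> str:
    """Build a deterministic buffer of size_bytes and return its SHA-256 hex digest."""
    # Expand the seed into a large buffer by chaining SHA-256 digests,
    # which keeps both the CPU and the memory busy.
    chunks = []
    block = seed
    total = 0
    while total < size_bytes:
        block = hashlib.sha256(block).digest()
        chunks.append(block)
        total += len(block)
    data = b"".join(chunks)[:size_bytes]
    return hashlib.sha256(data).hexdigest()


def run_stress(passes: int = 10, size_bytes: int = 4 << 20,
               seed: bytes = b"stress") -> list:
    """Repeat the same computation; return indexes of passes that disagree with the first."""
    reference = stress_pass(size_bytes, seed)
    mismatches = []
    for i in range(1, passes):
        if stress_pass(size_bytes, seed) != reference:
            mismatches.append(i)
    return mismatches


if __name__ == "__main__":
    bad = run_stress()
    print("mismatching passes:", bad if bad else "none")
```

This is far lighter than building world, so a clean run proves little;
it is the mismatches (or a crash or reboot while it runs) that are
informative.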

Incidentally, is your new AC adapter attached to the machine during the
test? What if you repeat it on battery (make sure to disable power
saving where possible)? Dell smartly detects an attached AC adapter, and
I suspect something coming from the AC adapter may trigger the same
thing (e.g., a reboot).

Good luck, and keep us posted.

Valeri

> 
> 

-- 
++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?31c596ff-777d-47ff-83e8-3db9b3430d92>