Date:      Wed, 2 Feb 2022 13:52:10 +1100
From:      MJ <mafsys1234@gmail.com>
To:        "freebsd-arm@freebsd.org" <freebsd-arm@FreeBSD.org>
Subject:   Re: Error detection for microSD-based swap, buildworld failures on pi3
Message-ID:  <9b604f9e-45bf-b197-562f-1f6381ee5515@gmail.com>
In-Reply-To: <E849CF0D-F894-44DF-AA45-B29761242AD9@yahoo.com>
References:  <20220129022255.GA59340@www.zefox.net> <6B822440-6F01-4578-803C-20A51DADF10C@yahoo.com> <20220130020546.GA63792@www.zefox.net> <1964F2B7-EC41-42C8-9C18-5E2B79EE0271@yahoo.com> <F4CAC6F9-B9E8-4BD3-BFA0-1706BE56A2AD@yahoo.com> <5B3DF910-23B1-4246-999E-0196E90269F2@yahoo.com> <20220131165333.GA69543@www.zefox.net> <9E0510D2-9FAC-4F01-89A3-E6D8C7C21FDA@yahoo.com> <20220131221405.GA70251@www.zefox.net> <14716537-6E22-44F5-B6AA-841E3EB2AD04@yahoo.com> <20220201161808.GA73977@www.zefox.net> <0e61e2d8-c65f-eb23-473f-69403e33da9e@gmail.com> <E849CF0D-F894-44DF-AA45-B29761242AD9@yahoo.com>



On 2/02/2022 12:25 pm, Mark Millard wrote:
> On 2022-Feb-1, at 16:47, MJ <mafsys1234@gmail.com> wrote:
> 
>> On 2/02/2022 3:18 am, bob prohaska wrote:
>>> [new subject, different emphasis, old problem]
>>> On Mon, Jan 31, 2022 at 03:06:01PM -0800, Mark Millard wrote:
>>>>
>>>> One thing that could fit the behavior is if small part(s)
>>>> of the system c++ compiler (or libraries it uses) were
>>>> corrupted on that specific media. In that case, nothing
>>>> elsewhere would replicate the failures but a lot might
>>>> work without using the corrupted part(s), making the
>>>> failures not random.
>>> [spaced for emphasis]
>>>> Checking on that is part of why
>>>> I'd hoped to get a lldb report for a .sh/.cpp pair
>>>> leading to failure on your RPi3* in question.
>>>>
>>> If/when the stable/13 Pi3 finishes its -j1 single-user
>>> build/install cycle I'll make a point of trying the
>>> .sh/.cpp test under lldb.
>>> For most of their operational history both troublesome Pi3
>>> systems have had some of their swap on microSD. If there
>>> is no error detection at all for microSD-based storage
>>
>> Is this true? I would have thought it used some form of error detection in the firmware or in
>> the controller.
> 
> The type of error and stage at which the error occurs matters.
> The firmware can not cover all issues that lead to corrupted
> content on media.

I did not say it covers all corruption. However, I would be very surprised if there were SD cards
whose controller does no error checking at all, whether BCH or some other form of ECC. That remains
my point.

> 
>>> then undetected corruption of data from swap is a real
>>> possibility. I expected that storage errors would be
>>> reported but maybe not, especially outside file systems.
>>
>> If indeed your suppositions are correct, would a file for swap be more prudent as it has to
>> go through the file system (UFS/VFS) to read/write to swap?
> 
> No. See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206048 and
> its comments #7 and #8.
> 

This seems to address potential memory over-use caused by a swapfile, not its safety relative to a
swap partition. I still contend the UFS file system has better protection against corruption than
a raw partition labelled swap. If Bob's requirement is a "safer" swap, then a file would be the
answer. Whether there are other issues to contend with is likely out of scope for this particular
discussion.


>>> Mechanical disks have some internal error detection and
>>> report explicitly when data can't be retrieved. As I think
>>> back on it at least one flash device (a USB thumb drive)
>>> failed silently, no reported errors but also no-write.
>>> That was on a filesystem, so the OS noticed and so did I.
>>
>> But this could "simply" be because one of the NAND blocks has failed, not that it could not
>> detect an error. Is there a lack of error detection in the driver handling USB thumb drives,
>> or a failure to report errors back to the kernel? I do not know.
> 
> Bob's context is reproducible at the same places in

No, he was talking about a "failed silently" event and this is what I was replying to.

I am not up to date with the earlier discussion of the llvm/clang failures.

> 
> Such is unlikely for hitting the same problem page(s)
> in the swap space each way things are run.

I couldn't agree more. The chances would seem remote, unless that partition sits on a part of the
SD card/USB drive that is failing and the USB driver is not picking up the errors reported by the
controller.
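
One way to probe for the "failing but unreported" case described above is a write-then-read-back
check against a scratch file on the suspect media. The following is only a minimal sketch under
stated assumptions: the function names, the 4096-byte block size and the scratch path are all
hypothetical and not taken from this thread, and a read-back may be served from the OS buffer cache
rather than from the media, so a clean result proves little without remounting or rebooting first.

```python
import hashlib
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size; not taken from the thread


def write_blocks(path, count, seed=b"swap-check"):
    """Write `count` deterministic blocks to `path`; return their SHA-256 digests."""
    digests = []
    with open(path, "wb") as f:
        for i in range(count):
            # Derive a reproducible pseudo-random block from the seed and index.
            block = hashlib.shake_256(seed + i.to_bytes(8, "little")).digest(BLOCK_SIZE)
            f.write(block)
            digests.append(hashlib.sha256(block).hexdigest())
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out to the device
    return digests


def verify_blocks(path, digests):
    """Read the blocks back; return the indices whose contents no longer match."""
    bad = []
    with open(path, "rb") as f:
        for i, expected in enumerate(digests):
            block = f.read(BLOCK_SIZE)
            if hashlib.sha256(block).hexdigest() != expected:
                bad.append(i)
    return bad


if __name__ == "__main__":
    # Hypothetical usage: a temporary directory stands in for the suspect media.
    with tempfile.TemporaryDirectory() as d:
        scratch = os.path.join(d, "scratch.bin")
        digests = write_blocks(scratch, 256)
        print("mismatched blocks:", verify_blocks(scratch, digests))
```

A mismatch at the same block indices across runs would point at the media rather than at a random
event; it says nothing about where in the stack the error went unreported.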

MJ


