Date:      Tue, 26 Jun 2018 07:37:59 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Jamie Landeg-Jones <jamie@catflap.org>, bob prohaska <fbsd@www.zefox.net>
Cc:        Warner Losh <imp@bsdimp.com>, freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: RPI3 swap experiments, was Re: GPT vs MBR for swap devices
Message-ID:  <A6986B21-FF6E-48F5-9F3A-06B3D2A92C55@yahoo.com>
In-Reply-To: <201806261040.w5QAeBKq035183@donotpassgo.dyslexicfish.net>
References:  <25F1A4BA-FBFC-4C32-85DD-5F5BA71A2B1A@yahoo.com> <20180620023253.GA89924@www.zefox.net> <a232ed45-a9a9-1017-72ed-720a6c7a8f03@sentry.org> <1D86911D-20D1-494A-822B-1C07C5598CB1@yahoo.com> <10CAC122-399D-459E-9153-ABD7E753777E@yahoo.com> <a2d7f4d3-0b6d-f82d-bae8-0988b0b54a8f@sentry.org> <20180623143218.GA6905@www.zefox.net> <03C2D3C4-6E90-4054-AF79-BD7FE2B7958D@yahoo.com> <20180624231020.GA11132@www.zefox.net> <C87C40CF-15B2-4137-892C-F2ADBAB32418@yahoo.com> <20180626052451.GA17293@www.zefox.net> <CANCZdfpXyzxzOZ8pqcRtuFsxYx5Jjs9oSL1ok2sGVPHdiB0qVQ@mail.gmail.com> <201806261040.w5QAeBKq035183@donotpassgo.dyslexicfish.net>



On 2018-Jun-26, at 3:40 AM, Jamie Landeg-Jones <jamie at catflap.org> wrote:

> Warner Losh <imp at bsdimp.com> wrote:
>
>>>> _vfs_done():da0d[WRITE(offset=51819347968, length=131072)]error = 5
>>>> g_vfs_done():da0d[WRITE(offset=51819479040, length=28672)]error = 5
>>>> g_vfs_done():da0d[READ(offset=59586936832, length=32768)]error = 5
>>>> g_vfs_done():vm_fault: pager read error, pid 823 (tcsh)
>>>
>>
>> The device is broken if you get this. Period. I don't know if it is
>> hardware, or software, but it is not a reliable storage device. Until
>> that's fixed, you'll continue to have a terrible experience with it.
>>
>
> [ ... ]
>
>> Sorry to sound so harsh, but the data has been consistent on this for
>> everything you've reported: it works for a while, then we get a bunch of
>> errors then a reboot. We need to start narrowing down which of these three
>> broad classes of root causes it is. I'd rank actual bad thumbdrive last on
>> the list. It's a tossup for me between missing quirk and a bug in the rpi
>> usb driver that manifests itself only under heavy load. IIRC, you said one
>> of rpi2/3 works and the other doesn't, which would suggest a usb bridge
>> driver problem...
>
> For what it's worth, I had the same errors on a rpi3 a few months ago, and
> eventually gave up "to sort it tomorrow" - it hasn't been powered on since, but
> I still want to get it working.
>
> The system would run fine, but give the vfs errors on the 128GB usb thumb
> drive every week - like clockwork, when one of the heavier periodic jobs ran.
>
> I was running the latest CURRENT at the time. The thumb drive works fine elsewhere,
> and indeed - did on the same hardware when I test installed a linux install,
> and thrashed the hell out of it.
>
> I'll fire it up again - hopefully I'll still have the same results, and with 2
> of us, we may find the cause quicker.
>
> (n.b. i never had swap errors, but I can't recall if i ever configured swap on the usb
> drive)

The presence of the errors is a confounding variable for the other
issues being looked into.

It would likely be better for the effort to be split:

A) Looking into the drive errors and what range of contexts
   get them, hoping to find something to fix the issue (such
   as by adding a quirk).

B) Looking into the swapping and Out Of Memory process killing
   --but absent such errors being involved. (For now this might
   require a different instance of the same type of device
   or a different type of device.)

It seems too complicated to investigate (B) in a context
where the drive errors are also involved.

As I remember, Bob P. did reproduce drive errors even without
the problem drive being used for swapping. This too suggests
(A) as a separate activity.

If only one of the 2 is targeted first, (A) may be the
better one to pursue for those with reproducible examples.

For those with contexts that lack the drive errors, (B)
activity might show contrasting behavior because of the lack
of drive errors --or the behavior might be reproduced.
Cross-checking whether drive errors started showing up would
be appropriate.
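
As a rough aid for that cross-check, here is a minimal sketch that scans
saved console or log text for the g_vfs_done error lines of the form
quoted above. The function name find_drive_errors and the idea of feeding
it saved dmesg or /var/log/messages text are my own illustrative choices,
not an existing tool, and the pattern only covers the line format shown
in this thread.

```python
import re

# Matches the error lines quoted earlier in this thread, for example:
#   g_vfs_done():da0d[WRITE(offset=51819479040, length=28672)]error = 5
# The named groups are illustrative choices, not part of any FreeBSD tool.
G_VFS_ERROR = re.compile(
    r"g_vfs_done\(\):(?P<dev>[^\[\s]+)\[(?P<op>READ|WRITE)"
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]error = (?P<errno>\d+)"
)


def find_drive_errors(log_text):
    """Return one dict per g_vfs_done READ/WRITE error line in log_text."""
    errors = []
    for match in G_VFS_ERROR.finditer(log_text):
        entry = match.groupdict()
        # Convert the numeric fields so they can be sorted or compared.
        for key in ("offset", "length", "errno"):
            entry[key] = int(entry[key])
        errors.append(entry)
    return errors
```

An empty result from a (B) run would support treating that run as free
of the drive errors; a non-empty one would mean it belongs with (A).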

An interesting question for (A) is whether some drive benchmark
program(s) might reproduce the drive errors. If one were found,
the context for reproduction would be far simpler than buildworld
buildkernel use.
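
To make the idea concrete, here is a minimal sketch of a write-and-verify
loop that could be pointed at a file on the suspect drive. It is not a
known reproducer and not an existing benchmark: the function name
stress_file, its parameters, and the default sizes are all assumptions
for illustration. The 128 KiB default block size merely echoes the
length=131072 writes in the quoted errors.

```python
import errno
import hashlib
import os


def stress_file(path, block_size=128 * 1024, blocks=64, passes=1):
    """Write random blocks to path, fsync, then read back and verify.

    Returns a list of (pass_number, description) tuples, one per failure.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    failures = []
    for n in range(passes):
        digests = []
        try:
            with open(path, "wb") as f:
                for _ in range(blocks):
                    data = os.urandom(block_size)
                    digests.append(hashlib.sha256(data).digest())
                    f.write(data)
                f.flush()
                os.fsync(f.fileno())  # push the data toward the device
            with open(path, "rb") as f:
                for i, expected in enumerate(digests):
                    data = f.read(block_size)
                    if hashlib.sha256(data).digest() != expected:
                        failures.append((n, "data mismatch in block %d" % i))
        except OSError as e:
            # errno 5 (EIO) is what the quoted g_vfs_done lines report.
            name = errno.errorcode.get(e.errno, "?")
            failures.append((n, "I/O error %s (%s)" % (e.errno, name)))
    return failures
```

Note that the read-back may be satisfied from the buffer cache unless the
file is larger than RAM, so the write side is the more meaningful half of
the test. Write errors may also show up only on the console rather than
as a failure reported to the program.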


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



