Date:      Wed, 29 Jul 2015 16:43:33 -0700
From:      Kevin Oberman <rkoberman@gmail.com>
To:        Claude Buisson <clbuisson@orange.fr>, FreeBSD Stable ML <stable@freebsd.org>
Subject:   Re: suspend/resume regression
Message-ID:  <CAN6yY1tbvciVtOWBLvPA9xndtcnnv_xv1hOip8FV73YJDwS6tg@mail.gmail.com>
In-Reply-To: <55B957EA.1020801@orange.fr>
References:  <86oak289hv.fsf@gly.ftfl.ca> <86oaj9dnbo.fsf@gly.ftfl.ca> <CAN6yY1ubQhsCMqXqV9Fp0M9bLRZGGg07qt0Z3KZNSaxW80GAOg@mail.gmail.com> <12509399.h3RdpFfE1l@ralph.baldwin.cx> <CAN6yY1vaxDd6raCf5p%2BFh5Pw%2BYhHUc9VwRA258roAadozdRLuw@mail.gmail.com> <CAN6yY1uKNmDkhm2vntoRvVMLFhNfw04OCqsmc5ggdoRFeFbEMw@mail.gmail.com> <55B8CE0C.6040000@orange.fr> <CAN6yY1ti14C6u9pnkHFGsw5WKQ6YVxh=jYtTXPW2ckmOtVqHwQ@mail.gmail.com> <55B957EA.1020801@orange.fr>

On Wed, Jul 29, 2015 at 3:47 PM, Claude Buisson <clbuisson@orange.fr> wrote:

> On 07/29/2015 23:53, Kevin Oberman wrote:
>
>> On Wed, Jul 29, 2015 at 5:58 AM, Claude Buisson <clbuisson@orange.fr>
>> wrote:
>>
>>> On 07/26/2015 00:54, Kevin Oberman wrote:
>>>
>>>> John,
>>>>
>>>> I'm concerned that two issues may be getting conflated.
>>>>
>>>> The issue I thought we were looking at was the failure of some systems
>>>> (T520, X220, T430) to resume after a number of PCI enhancements were
>>>> MFCed. This is completely unrelated to the USB issue I was experiencing
>>>> when trying to test the problem on HEAD. The more I think about it, the
>>>> more I think that the USB "issue" is just how things need to work.
>>>>
>>>> Specifically, if you are booting a full, multi-user system from a
>>>> USB-connected drive, suspending and resuming will leave the system in an
>>>> untenable condition that will force a panic. At least, I don't see how
>>>> the OS can determine that the disk present on resume is unchanged from
>>>> the one present when the system was suspended. Modern disk IDs greatly
>>>> improve the situation, but I am unaware of any way to be sure that a
>>>> removable drive (such as a USB drive) has not been removed and plugged
>>>> into some other system that might have written to it. My knowledge of
>>>> such things is very dated, going back to my days doing kernel
>>>> programming about 25-30 years ago on VMS, so someone may have resolved
>>>> the issue, but I don't understand exactly how. I guess that the risk
>>>> might be low enough to just go ahead and pray that nobody did something
>>>> really, really stupid like unplugging the drive, plugging it in
>>>> elsewhere, and writing to it.
>>>>
>>>> The real issue is just resuming the system after r281874 was MFCed as
>>>> part of 284034. No USB-connected file systems are involved. I'm happy to
>>>> see that it has been reverted for 10.2, but clearly, these changes are
>>>> needed down the line and I hope the issue can be resolved well before
>>>> 11.0. (This assumes a 10.3 before 11.0 happens next year.)
>>>>
>>>>
>>> I have done some tests on my T530 at r285668 and had some (good and
>>> bad) surprises:
>>>
>>> 0) Historically, i915kms+drm2 could not be loaded from loader.conf
>>> without locking up the machine, but needed to be loaded from rc.conf
>>> (kld_list). Now these modules can be loaded from loader.conf.
>>>
>>> 1) Resume does not work with a non-patched kernel, but works when the
>>> MFC of r281874 is reverted (i.e. r285863 applied), both in console mode
>>> (vt) and in X.org.
>>>
>>> 2) And now my bad surprise: when i915kms+drm2+iic*+kbdmux are not
>>> loaded at all, suspend works (in console mode, of course), but resume
>>> does not, with both the non-patched and the patched kernel. After
>>> resume the screen stays black; I can log in to the system with ssh, but
>>> it can be neither powered off nor rebooted from another system.
>>> Furthermore, the log shows unidentified _USB_ devices at resume (which
>>> never appeared in any log before):
>>>
>>> Jul 29 12:28:12 watson devd: Executing '/etc/rc.suspend acpi 0x03'
>>> Jul 29 12:28:12 watson acpi: suspend at 20150729 12:28:12
>>> Jul 29 12:28:37 watson kernel: uhub0: at usbus0, port 1, addr 1 (disconnected)
>>> Jul 29 12:28:37 watson kernel: uhub1: at usbus1, port 1, addr 1 (disconnected)
>>> Jul 29 12:28:37 watson kernel: ugen1.2: <vendor 0x8087> at usbus1 (disconnected)
>>> Jul 29 12:28:37 watson kernel: uhub4: at uhub1, port 1, addr 2 (disconnected)
>>> Jul 29 12:28:37 watson kernel: ugen1.3: <Chicony Electronics Co., Ltd.> at usbus1 (disconnected)
>>> Jul 29 12:28:37 watson kernel: uhub2: at usbus2, port 1, addr 1 (disconnected)
>>> Jul 29 12:28:37 watson kernel: ugen2.2: <vendor 0x8087> at usbus2 (disconnected)
>>> Jul 29 12:28:37 watson kernel: uhub3: at uhub2, port 1, addr 2 (disconnected)
>>> Jul 29 12:28:37 watson kernel: ugen2.3: <Logitech> at usbus2 (disconnected)
>>> Jul 29 12:28:37 watson kernel: ums0: at uhub3, port 5, addr 3 (disconnected)
>>> Jul 29 12:28:37 watson kernel: acpi0: cleared fixed power button status
>>> Jul 29 12:28:37 watson kernel: em0: link state changed to DOWN
>>> Jul 29 12:28:37 watson kernel: xhci0: Port routing mask set to 0xffffffff
>>> Jul 29 12:28:37 watson kernel: uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
>>> Jul 29 12:28:37 watson kernel: uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2
>>> Jul 29 12:28:37 watson kernel: uhub2: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
>>> Jul 29 12:28:38 watson kernel: uhub0: 8 ports with 8 removable, self powered
>>> Jul 29 12:28:37 watson devd: Executing '/etc/rc.resume acpi 0x03'
>>> Jul 29 12:28:38 watson acpi: resumed at 20150729 12:28:38
>>> Jul 29 12:28:38 watson kernel: uhub2: 3 ports with 3 removable, self powered
>>> Jul 29 12:28:38 watson kernel: uhub1: 3 ports with 3 removable, self powered
>>> Jul 29 12:28:38 watson kernel: em0: link state changed to UP
>>> Jul 29 12:28:38 watson devd: Executing '/etc/rc.d/dhclient quietstart em0'
>>> Jul 29 12:28:39 watson kernel: ugen2.2: <vendor 0x8087> at usbus2
>>> Jul 29 12:28:39 watson kernel: uhub3: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus2
>>> Jul 29 12:28:39 watson kernel: ugen1.2: <vendor 0x8087> at usbus1
>>> Jul 29 12:28:39 watson kernel: uhub4: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus1
>>> Jul 29 12:28:40 watson kernel: uhub4: 6 ports with 6 removable, self powered
>>> Jul 29 12:28:41 watson kernel: uhub3: 8 ports with 8 removable, self powered
>>> Jul 29 12:28:41 watson kernel: ugen1.3: <Chicony Electronics Co., Ltd.> at usbus1
>>> Jul 29 12:28:41 watson devd: Executing 'logger Unknown USB device: vendor 0x04f2 product 0xb2ea bus uhub4'
>>> Jul 29 12:28:41 watson root: Unknown USB device: vendor 0x04f2 product 0xb2ea bus uhub4
>>> Jul 29 12:28:41 watson devd: Executing 'logger Unknown USB device: vendor 0x04f2 product 0xb2ea bus uhub4'
>>> Jul 29 12:28:41 watson root: Unknown USB device: vendor 0x04f2 product 0xb2ea bus uhub4
>>> Jul 29 12:28:41 watson kernel: ugen2.3: <Logitech> at usbus2
>>> Jul 29 12:28:41 watson devd: Executing 'logger Unknown USB device: vendor 0x046d product 0xc52b bus uhub3'
>>> Jul 29 12:28:41 watson root: Unknown USB device: vendor 0x046d product 0xc52b bus uhub3
>>> Jul 29 12:28:41 watson kernel: ums0: <Logitech USB Receiver, class 0/0, rev 2.00/24.00, addr 3> on usbus2
>>> Jul 29 12:28:41 watson kernel: ums0: 16 buttons and [XYZT] coordinates ID=2
>>> Jul 29 12:28:41 watson devd: Executing 'logger Unknown USB device: vendor 0x046d product 0xc52b bus uhub3'
>>> Jul 29 12:28:41 watson root: Unknown USB device: vendor 0x046d product 0xc52b bus uhub3
>>>
>>> I dare say that there is some mess somewhere..
>>>
>>> 4) Last-minute tests: I get the same resume problem as 3) supra when
>>> booting from a USB stick with an 11-CURRENT snapshot, both
>>> 20150330-r28086 and 20150722-r285794 (and cannot obtain anything useful
>>> from /var/log/messages).
>>>
>>> Claude Buisson
>>>
>>>
>> I am a bit confused by several things.
>>
>> 0) Yes, I understand that i915 can be loaded at boot time without the
>> display going away, but I am still unclear on why people do this (or
>> force its loading in any other way). IIRC, before the i915 code was
>> committed to HEAD, the kernel module has auto-loaded when X starts. I
>> just "startx". It also loads fine if you use a display manager (gdm,
>> xdm, kdm...) to start X.
>>
>> Is there some special reason that you need the module loaded prior to X
>> starting?
>>
>>
> I do not like having console display sizes varying with the direction of
> the wind, the phase of the moon, etc. So I try to load the module as soon
> as possible at boot, and find it in the same state when switching back
> from X to console mode.
>
> Surely anyone is permitted to have personal preferences?
>

Indeed you are. That is why I refused to use Gnome3 and now run MATE. A
good answer, though the font change never really bothered me. If it
bothers you, that is fine. Glad you can now load i915kms at boot.

>
>> 1) Here the confusion starts. You say that resume works as of r285863
>> after saying that it does not work with a non-patched kernel. These
>> statements seem self-contradictory. Are you saying that other patches
>> are required after r285863? Or are you talking about a distributed
>> binary kernel as opposed to STABLE, in which case, yes, you will need to
>> wait until next week for RC2.
>>
>>
> I say that:
>
> resume does not work on my T530 at r285668, and works on my T530 at
> r285668+r285863, i.e. the same system patched with r285863, with the
> i915kms, drm2, ... modules preloaded at boot.
>

I was misreading this. The '8' and '6' look too much alike when I don't
look closely enough.

It should work on r285863, and it does for me. So you just updated pci and
pccb to r285863 and left the remainder of the system at r285668. Sounds
good. That combo would work fine on my system, too.


>> 2) Yes, I was under the impression that vt(4) would allow suspend/resume
>> to work on vtys, but I have also found that to not be the case. I just
>> have not had the time or interest to pursue the issue or even do any
>> real testing. (In other words, I really don't care too much.)
>>
>>
> I care..
>
> And my observation seems to imply that resume with vt does not work (at
> least on this system) with vt simply in VGA mode. The next experiment
> will be to build a kernel with syscons.
>

> But I would be happy to get an explanation of the detection of
> nonexistent/unknown USB devices at resume, without any physical change.
>

The unknown devices are things connected to USB that FreeBSD does not
support. E.g. fingerprint reader, Bluetooth or maybe the camera. Possibly
others. Internal USB devices on laptops can be difficult to track down
since, being unknown, FreeBSD can't say much other than that they are
present.
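Since devd logs each unknown device more than once, a quick way to see how many distinct devices are involved is to reduce the "Unknown USB device" lines to unique vendor/product pairs. The following is a minimal Python sketch that assumes the unwrapped syslog line format shown in Claude's log; the function name unknown_usb_devices is hypothetical, and the resulting IDs would still have to be looked up (for example with usbconfig(8)) to identify the actual hardware.

```python
from __future__ import annotations

import re

# ASSUMPTION: lines look like the ones in the log above, e.g.
# "Jul 29 12:28:41 watson root: Unknown USB device: vendor 0x04f2 product 0xb2ea bus uhub4"
# Only the "root:" lines (written by logger) are matched, so the
# "devd: Executing 'logger ...'" lines are not double counted.
UNKNOWN_RE = re.compile(
    r"root: Unknown USB device: vendor (0x[0-9a-f]{4}) product (0x[0-9a-f]{4}) bus (\w+)"
)


def unknown_usb_devices(log_text: str) -> dict[tuple[str, str], str]:
    """Return {(vendor, product): bus} for each distinct unknown device."""
    found: dict[tuple[str, str], str] = {}
    for line in log_text.splitlines():
        m = UNKNOWN_RE.search(line)
        if m:
            vendor, product, bus = m.groups()
            found.setdefault((vendor, product), bus)  # keep first bus seen
    return found


if __name__ == "__main__":
    sample = """\
Jul 29 12:28:41 watson devd: Executing 'logger Unknown USB device: vendor 0x04f2 product 0xb2ea bus uhub4'
Jul 29 12:28:41 watson root: Unknown USB device: vendor 0x04f2 product 0xb2ea bus uhub4
Jul 29 12:28:41 watson root: Unknown USB device: vendor 0x04f2 product 0xb2ea bus uhub4
Jul 29 12:28:41 watson root: Unknown USB device: vendor 0x046d product 0xc52b bus uhub3
Jul 29 12:28:41 watson root: Unknown USB device: vendor 0x046d product 0xc52b bus uhub3
"""
    for (vendor, product), bus in sorted(unknown_usb_devices(sample).items()):
        print(f"{vendor}:{product} on {bus}")
    # prints:
    # 0x046d:0xc52b on uhub3
    # 0x04f2:0xb2ea on uhub4
```

On the log excerpt in this thread, the eight devd/logger lines collapse to just these two vendor/product pairs.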

It seems unlikely that this is a suspend/resume issue. You might look at
usbconfig(8) to try to learn more. If the system does not fully resume
(blank screen), try to ssh in from another system or your phone and run
usbconfig(8) to compare. (If you have not run ssh on a cell phone, it's
interesting. Really, really tiny characters.) I use the ssh client
implementation in ConnectBot on Android.
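Comparing usbconfig(8) listings taken before suspend and after resume can be automated with a short diff. This Python sketch assumes each device line of the listing has the form "ugenX.Y: <description> at usbusN, ..."; that format, the helper names parse_usbconfig and diff_snapshots, and the sample strings are assumptions for illustration, not captured output. Devices are keyed by description and bus rather than by ugen address, on the assumption that re-created devices may not get the same address back.

```python
from __future__ import annotations

import re
from collections import Counter

# ASSUMPTION: a usbconfig(8) device line looks like
#   "ugen1.3: <Chicony Electronics Co., Ltd.> at usbus1, cfg=0 md=HOST ..."
LINE_RE = re.compile(r"^(ugen\d+\.\d+): <([^>]*)> at (usbus\d+)")


def parse_usbconfig(text: str) -> Counter:
    """Count devices by (description, bus), ignoring the ugenX.Y address."""
    devices: Counter = Counter()
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            _ugen, desc, bus = m.groups()
            devices[(desc, bus)] += 1
    return devices


def diff_snapshots(before: str, after: str) -> tuple[Counter, Counter]:
    """Return (gone_after_resume, new_after_resume)."""
    b, a = parse_usbconfig(before), parse_usbconfig(after)
    return b - a, a - b  # Counter subtraction keeps only positive counts


if __name__ == "__main__":
    # Made-up snapshots in the assumed format, not real output.
    before = (
        "ugen1.3: <Chicony Electronics Co., Ltd.> at usbus1, cfg=0 md=HOST\n"
        "ugen2.3: <Logitech> at usbus2, cfg=0 md=HOST\n"
    )
    after = "ugen1.3: <Chicony Electronics Co., Ltd.> at usbus1, cfg=0 md=HOST\n"
    gone, new = diff_snapshots(before, after)
    print("gone:", dict(gone))
    print("new:", dict(new))
```

The two snapshots would be saved to files (one before suspend, one from an ssh session after resume) and passed in as strings; an empty result in both directions would suggest the USB topology came back unchanged.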

>> 4) This refers to "3)", but I only see 0, 1, 2, and 4. Where did 3 go?
>>
>>
> My error: replace 4) by 3) and 3) by 2) !
>
>> Look back in this thread for discussion of the issue. Simply stated, no
>> system can resume successfully when booted from a USB drive. The problem
>> is that, on suspend, all USB devices are removed from the system and on
>> resume they are re-created, but there is no way to assure that a device
>> that is present when the system is essentially "OFF" is consistent after
>> it is re-connected. This problem was one that I faced 30 years ago with
>> long-dead operating systems (RSX-11D, RSX-11M and IAS) and it has not
>> been resolved in all of those years. Thanks to the fact that all useful
>> drives are now "smart", there is hope that this can be resolved in some
>> way. It is certainly possible to confirm that the same physical drive is
>> available on resume, but it might be on a different device name, so this
>> will likely use gptid or some similar technique to do so.
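The identity check described here, finding the same disk again even if it returns under a different device name, can be sketched by matching on stable identifiers instead of names. In this minimal Python sketch, DiskRecord and match_after_resume are hypothetical (not an existing FreeBSD interface), and how the gptid and serial values would be collected is not shown. Note its limit: a match only says the same disk is back; it says nothing about whether that disk was written to elsewhere while the system was suspended.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DiskRecord:
    """Hypothetical per-disk identity snapshot."""
    device: str  # device name such as "da0"; may differ after resume
    gptid: str   # GPT GUID, as exposed by gptid labels
    serial: str  # serial number reported by the drive


def match_after_resume(
    before: list[DiskRecord], after: list[DiskRecord]
) -> dict[str, str | None]:
    """Map each pre-suspend device name to its post-resume name.

    Disks are matched on (gptid, serial), not on the device name, so a disk
    that comes back as da1 instead of da0 is still recognised. None means
    no disk with that identity reappeared.
    """
    by_identity = {(d.gptid, d.serial): d.device for d in after}
    return {d.device: by_identity.get((d.gptid, d.serial)) for d in before}


if __name__ == "__main__":
    # Made-up identities for illustration only.
    before = [DiskRecord("da0", "5f1c0a2e-0000-0000-0000-000000000001", "SN-A")]
    after = [DiskRecord("da1", "5f1c0a2e-0000-0000-0000-000000000001", "SN-A")]
    print(match_after_resume(before, after))  # {'da0': 'da1'}
```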
>>
>> Even then, how can you tell if the drive was moved to another system,
>> written to, and returned to the suspended system before it was restored?
>> Just don't expect it any time soon. It's a really hard problem that used
>> to be an impossible one. I suspect that it will probably always be at
>> least a bit unsafe, as it is today with removable SATA drives. (I can
>> suspend my system, unplug the system disk (on the second spindle on my
>> T520), plug it into another system, modify its contents, put it back,
>> and resume to a likely disaster.) USB just makes this disaster easier.
>> --
>>
>
> You are saying that resume cannot work when booting from a USB key,
> even without any physical change intervening between suspend and resume.
> So this part of my experiment cannot succeed.
>
> I take note
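The teardown and re-creation of USB devices across suspend/resume is visible in Claude's log: each device reported as "(disconnected)" at 12:28:37 attaches again a few seconds later. A minimal Python sketch that checks this pairing on unwrapped syslog lines follows; the regular expressions assume the kernel message format shown in that log, and the function names are hypothetical.

```python
from __future__ import annotations

import re

# Device names as they appear in the kernel lines: "uhub0", "ums0", "ugen1.2".
KERNEL_RE = re.compile(r"kernel: (ugen\d+\.\d+|[a-z]+\d+): (.*)$")
# An attach line carries a <description> and ends with "on usbusN" or "at usbusN".
ATTACH_RE = re.compile(r"^<[^>]*>.* (?:on|at) usbus\d+$")


def usb_detach_attach(log_text: str) -> tuple[set[str], set[str]]:
    """Return (detached, attached) device names found in syslog text."""
    detached: set[str] = set()
    attached: set[str] = set()
    for line in log_text.splitlines():
        m = KERNEL_RE.search(line.strip())
        if not m:
            continue
        name, rest = m.groups()
        if rest.endswith("(disconnected)"):
            detached.add(name)
        elif ATTACH_RE.match(rest):
            attached.add(name)
    return detached, attached


def not_reattached(log_text: str) -> set[str]:
    """Devices that were disconnected but never attached again."""
    detached, attached = usb_detach_attach(log_text)
    return detached - attached


if __name__ == "__main__":
    excerpt = "\n".join([
        "Jul 29 12:28:37 watson kernel: uhub0: at usbus0, port 1, addr 1 (disconnected)",
        "Jul 29 12:28:37 watson kernel: ugen2.3: <Logitech> at usbus2 (disconnected)",
        "Jul 29 12:28:37 watson kernel: uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0",
        "Jul 29 12:28:41 watson kernel: ugen2.3: <Logitech> at usbus2",
    ])
    print(not_reattached(excerpt))  # set()
```

Run over the full, unwrapped excerpt from Claude's message, the sketch should report no missing devices: all ten detached names (uhub0-uhub4, ugen1.2, ugen1.3, ugen2.2, ugen2.3, ums0) attach again. A boot disk on USB would go through the same detach/attach cycle, which is the situation Kevin describes.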
>
>> Kevin Oberman, Network Engineer, Retired
>> E-mail: rkoberman@gmail.com
>> PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
>>
>
> Claude Buisson, another "Network Engineer, Retired" ;-)
>

"Network Engineer, Retired" probably means that you were early to
networking. I started in the late 70s. It was so much fun that I gave up
kernel programming, and later moved from LANs to WANs. I helped build one
of the first production 100G national backbone networks just before
retiring. It was fun right until I left. (Well, most days.) I mostly dealt
with research nets such as RENATER.


