Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 29 Dec 2013 11:19:25 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        "freebsd-arch@freebsd.org Arch" <freebsd-arch@FreeBSD.org>
Subject:   Re: UART grab patch
Message-ID:  <7C0A2FDD-7531-4C9F-B89B-004682E05F80@bsdimp.com>
In-Reply-To: <20131229173041.E979@besplex.bde.org>
References:  <79C659B9-F402-42B6-8240-C4AB0A0EB92B@bsdimp.com> <20131229173041.E979@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help

On Dec 29, 2013, at 1:14 AM, Bruce Evans wrote:

> On Sat, 28 Dec 2013, Warner Losh wrote:
>=20
>> OK. Looks like my initial assessment of mountroot was too optimistic. =
With the exception of the imx driver, all the uart drivers in the tree =
fell victim to this bug. I hit a second one with the Raspberry Pi, and =
went looking.
>=20
> Please use the Unix newline character in mail.

Apple's Mail.app hates me. I don't see how to change it.

>> I've uploaded a patch to =
http://people.freebsd.org/~imp/uart-grab.diff that should solve the =
problem. I don't have all these systems to test on, so I'm hoping people =
can report back to me what works and what doesn't.  Boot -a with and =
without the patch for all serial consoles will tell you if it is =
working. Type stuff. If it appears exactly as you type it, then the =
patch is working. If not, please let me know.
>>=20
>> The basic strategy for all these is to disable RX interrupts when the =
console is grabbed, and enable them when it is ungrabbed. This has to be =
done in the hardware, since masking the interrupt at the CPU or PIC =
level will cause many UARTs to misbehave and/or prevent other interrupts =
from happening that are shared which can cause problems on some =
platforms.
>=20
> This is very broken on xx50 (3 copies which differ only in style).

Yea, I think we need to remerge them.

> There it doesn't actually disable interrupts using the interrupt =
enable
> register, but hacks on the fifo control register to try to reduce the
> rx fifo trigger level to 1.  It has to hard-code the latter register
> since it is not supposed to be available at this level.  You =
previously
> pointed out that the bug is smaller when the rx fifo is large, so this
> change makes the problem worse when its attempt succeeds.  It is not
> so bad when either the fifo doesn't exist (then it has no effect) or
> when the trigger level doesn't get reduced all the way to 1 (this
> happens in some non-default xx50 modes).

Doh! That's what I get for trying to do this without properly consulting
the data sheets.

> This is broken or fragile on most or all device types.  It "restores" =
to a
> hard-coded state instead of to the previous state.  Maybe uart fixates
> the state of the console device sufficently for the final state to be
> constant, but I doubt this. =20

For many of the UARTs, the state is fixed in attached, so blasting
out a fixed state is sufficient.

> Indeed, just for xx50, buggy hardware needs
> special handling for tx interrupts, and uart has this.  It sets the tx
> interrupt enable bit while transmitting and keeps it clear it =
otherwise.
> Thus the initial state of the interrupt enable register varies and the
> state restored should vary to match.  The correct state to restore
> for this register can possibly be determined by looking at the global
> state of the driver, but this would be complicated (it needs transient
> locking of software state all over; only hardware state is locked).

Yes. That's a flaw in the current implementation. We don't have the
software state passed around, and we should.

> In general, hardware state should be saved in *sc by grab and restored
> by ungrab.  Locking only this part of *sc across grab/ungrab is still
> difficult.  Parts of the driver that might invalidate the saved state
> must be locked out.

Yes. "sc" isn't saved in uart_devinfo for the console, so we can't do =
this
today. I'll work things up so that we can do that...

> The patch also "sets" the state to a hard-coded state instead of =
modifying
> the current state.  If this is done right, then it is correct since it
> amounts to an initialization of the driver to console mode.  But the
> parts of the patch that set the correct (interrupt control) register
> enable all the non-tx interrupts even if none were enabled before =
(these
> are later "restored").  This asks for interrupts that might cause =
problems.
> On xx50, this has a chance of being harmless only due to the =
implementation
> detail that the hardware has an interrupt identification register and =
the
> software uses this register so as to not see hardware state changes =
unless
> their interrupt has happened.

Good point...  I'd forgotten just how picky the nsXX50 parts are.

> As I mentioned previously, grab/ungrab were supposed to handle both
> directions of i/o, but are currently only called from higher levels
> for input.  Forcing rx interrupts off and tx interrupts on in grab
> is very input-centric.

For now, grab/ungrab is rx centric. Fixing that is beyond the scope of
what I'm trying to achieve.

> The locking is certainly incomplete.  I don't remember even locking
> against recursion in earlier changes in uart grab/ungrab.  (Note that
> there is no anti-recursion code at the top level, except for some
> deadlocking in cnputs()).

The proper fix to this is at the top level. I'll look into this as a =
separate issue.

> syscons grab has the buggy locking:
>=20
> 	if (scp->grabbed++ > 0)
> 		return;
>=20
> Bugs in this start with the access being non-atomic.  With atomic
> accesses, locking like this can work, but you might have to sprinkle
> too many atomic tests of the variable, using code like:
>=20
> 	uart_lock(...);
> 	if (scp->grabbed =3D=3D 0)
> 		normal_operation();
> 	else
> 		operation_reduced_to_avoid_confict_with_grabbed_mode();
> 	uart_unlock(...);

Concurrent access is verboten, so I'm not sure how useful this would =
be...
I'll have to think about this..

> I changed this to use a mutex because atomic accesses are too hard to
> use.  They would have to be spammed through normal_operation().  The
> lock must be held throughout.  The !normal_operation() case may still
> be complicated.  E.g., suppose you have a buggy grab routine that
> somehow allows an interrupt.  Normal operation is to service the
> interrupt so that it doesn't repeat.  Abnormal operation can't be to
> simply return from the interrupt handler, since then the interrupt
> might repeat endlessly or never occur again.  Here it would be better
> to do the normal operation and break the console operation.  Make this
> a feature and let invalid (but not very harmful) states occur
> occasionally and fix up the invalid states using polling.

True. This would make it more robust. However, given the extent of grabs
today, I'm not sure that this is worth fixing. I'll have to think about =
it.

Thanks for the review.

Warner=



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?7C0A2FDD-7531-4C9F-B89B-004682E05F80>