Date: Sun, 29 Dec 2013 19:14:32 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Warner Losh <imp@bsdimp.com> Cc: "freebsd-arch@freebsd.org Arch" <freebsd-arch@FreeBSD.org> Subject: Re: UART grab patch Message-ID: <20131229173041.E979@besplex.bde.org> In-Reply-To: <79C659B9-F402-42B6-8240-C4AB0A0EB92B@bsdimp.com> References: <79C659B9-F402-42B6-8240-C4AB0A0EB92B@bsdimp.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 28 Dec 2013, Warner Losh wrote: > OK. Looks like my initial assessment of mountroot was too optimistic. With the exception of the imx driver, all the uart drivers in the tree fell victim to this bug. I hit a second one with the Raspberry Pi, and went looking. Please use the Unix newline character in mail. > I've uploaded a patch to http://people.freebsd.org/~imp/uart-grab.diff that should solve the problem. I don't have all these systems to test on, so I'm hoping people can report back to me what works and what doesn't. Boot -a with and without the patch for all serial consoles will tell you if it is working. Type stuff. If it appears exactly as you type it, then the patch is working. If not, please let me know. > > The basic strategy for all these is to disable RX interrupts when the console is grabbed, and enable them when it is ungrabbed. This has to be done in the hardware, since masking the interrupt at the CPU or PIC level will cause many UARTs to misbehave and/or prevent other interrupts from happening that are shared which can cause problems on some platforms. This is very broken on xx50 (3 copies which differ only in style). There it doesn't actually disable interrupts using the interrupt enable register, but hacks on the fifo control register to try to reduce the rx fifo trigger level to 1. It has to hard-code the latter register since it is not supposed to be available at this level. You previously pointed out that the bug is smaller when the rx fifo is large, so this change makes the problem worse when its attempt succeeds. It is not so bad when either the fifo doesn't exist (then it has no effect) or when the trigger level doesn't get reduced all the way to 1 (this happens in some non-default xx50 modes). This is broken or fragile on most or all device types. It "restores" to a hard-coded state instead of to the previous state. Maybe uart fixates the state of the console device sufficently for the final state to be constant, but I doubt this. Indeed, just for xx50, buggy hardware needs special handling for tx interrupts, and uart has this. It sets the tx interrupt enable bit while transmitting and keeps it clear it otherwise. Thus the initial state of the interrupt enable register varies and the state restored should vary to match. The correct state to restore for this register can possibly be determined by looking at the global state of the driver, but this would be complicated (it needs transient locking of software state all over; only hardware state is locked). In general, hardware state should be saved in *sc by grab and restored by ungrab. Locking only this part of *sc across grab/ungrab is still difficult. Parts of the driver that might invalidate the saved state must be locked out. The patch also "sets" the state to a hard-coded state instead of modifying the current state. If this is done right, then it is correct since it amounts to an initialization of the driver to console mode. But the parts of the patch that set the correct (interrupt control) register enable all the non-tx interrupts even if none were enabled before (these are later "restored"). This asks for interrupts that might cause problems. On xx50, this has a chance of being harmless only due to the implementation detail that the hardware has an interrupt identification register and the software uses this register so as to not see hardware state changes unless their interrupt has happened. As I mentioned previously, grab/ungrab were supposed to handle both directions of i/o, but are currently only called from higher levels for input. Forcing rx interrupts off and tx interrupts on in grab is very input-centric. The locking is certainly incomplete. I don't remember even locking against recursion in earlier changes in uart grab/ungrab. (Note that there is no anti-recursion code at the top level, except for some deadlocking in cnputs()). syscons grab has the buggy locking: if (scp->grabbed++ > 0) return; Bugs in this start with the access being non-atomic. With atomic accesses, locking like this can work, but you might have to sprinkle too many atomic tests of the variable, using code like: uart_lock(...); if (scp->grabbed == 0) normal_operation(); else operation_reduced_to_avoid_confict_with_grabbed_mode(); uart_unlock(...); I changed this to use a mutex because atomic accesses are too hard to use. They would have to be spammed through normal_operation(). The lock must be held throughout. The !normal_operation() case may still be complicated. E.g., suppose you have a buggy grab routine that somehow allows an interrupt. Normal operation is to service the interrupt so that it doesn't repeat. Abnormal operation can't be to simply return from the interrupt handler, since then the interrupt might repeat endlessly or never occur again. Here it would be better to do the normal operation and break the console operation. Make this a feature and let invalid (but not very harmful) states occur occasionally and fix up the invalid states using polling. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20131229173041.E979>