Date: Fri, 3 May 2013 12:58:20 GMT
From: Luiz Otavio O Souza <loos.br@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: kern/178318: [patch] [arge] if_arge/bootp race under some circunstances
Message-ID: <201305031258.r43CwK4x023533@red.freebsd.org>
Resent-Message-ID: <201305031300.r43D01Es097971@freefall.freebsd.org>

>Number:         178318
>Category:       kern
>Synopsis:       [patch] [arge] if_arge/bootp race under some circunstances
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri May 03 13:00:01 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator:     Luiz Otavio O Souza
>Release:        -head r250121
>Organization:
>Environment:
FreeBSD rb433 10.0-CURRENT FreeBSD 10.0-CURRENT #61 r250121M: Fri May 3 09:45:51 BRT 2013 root@devel:/data/rb/rb433/obj/mips.mips/data/rb/rb433/src/sys/RSPRO mips
>Description:
I discovered (the hard way :) that adding some debug output to
arge_init_locked(), like the example below, causes bootp to fail.

Index: mips/atheros/if_arge.c
===================================================================
--- mips/atheros/if_arge.c	(revision 250121)
+++ mips/atheros/if_arge.c	(working copy)
@@ -1006,6 +1006,7 @@
 
 	ARGE_LOCK_ASSERT(sc);
 
+printf("%s: called\n", __func__);
 	arge_stop(sc);
 
 	/* Init circular RX list. */

Bootp loops for a while, printing the timeout message, until the kernel
panics:

arge0: link state changed to UP
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
panic: EFBIG
KDB: enter: panic
[ thread pid 0 tid 100000 ]
Stopped at      kdb_enter+0x4c: lui     at,0x8059
db>

After confirming that it really was the printf() that caused the problem,
I started to look at why arge_init() was being called twice between the
timeouts, and why that made bootp time out and fail to boot.

A few things contribute to this race. First, arge_init() forces a full
stop->start cycle every time it is called, so the following debug output
shows what happens:

bootpc_call: set netmask 0.0.0.0
arge_init_locked: called
bootpc_call: sosend()
bootpc_call: set netmask 255.0.0.0
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: set netmask 0.0.0.0
arge_init_locked: called
bootpc_call: sosend()
bootpc_call: set netmask 255.0.0.0
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: set netmask 0.0.0.0

If arge_init() isn't fast enough while resetting the driver on the second
netmask change (the one that follows sosend()), it will miss the bootp
response packet.
>How-To-Repeat:
Add something like this to arge_init_locked():

Index: mips/atheros/if_arge.c
===================================================================
--- mips/atheros/if_arge.c	(revision 250121)
+++ mips/atheros/if_arge.c	(working copy)
@@ -1006,6 +1006,7 @@
 
 	ARGE_LOCK_ASSERT(sc);
 
+printf("%s: called\n", __func__);
 	arge_stop(sc);
 
 	/* Init circular RX list. */

Add the following to the RSPRO kernel configuration:

Index: sys/mips/conf/RSPRO
===================================================================
--- sys/mips/conf/RSPRO	(revision 250121)
+++ sys/mips/conf/RSPRO	(working copy)
@@ -28,3 +28,12 @@
 
 # Boot off of flash
 options 	ROOTDEVNAME=\"ufs:redboot/rootfs.uzip\"
+options 	NFSCL
+options 	NFS_ROOT
+options 	BOOTP
+options 	BOOTP_NFSROOT
+options 	BOOTP_NFSV3
+options 	BOOTP_WIRED_TO=arge0
+options 	BOOTP_COMPAT
+
+

Then try to boot from bootp.
>Fix:
The fix is simply to refuse to proceed with the driver restart if the
driver is already 'up' and 'running'. There is no need to restart the
driver each time we change or add an IP address or netmask.
Then, if we only proceed when the driver is stopped, we no longer need to
force the stop->start cycle.

The leak that leads to the panic will be fixed in a subsequent PR.

Patch attached with submission follows:

Index: sys/mips/atheros/if_arge.c
===================================================================
--- sys/mips/atheros/if_arge.c	(revision 250121)
+++ sys/mips/atheros/if_arge.c	(working copy)
@@ -1006,7 +1006,8 @@
 
 	ARGE_LOCK_ASSERT(sc);
 
-	arge_stop(sc);
+	if ((ifp->if_flags & IFF_UP) && (ifp->if_drv_flags & IFF_DRV_RUNNING))
+		return;
 
 	/* Init circular RX list. */
 	if (arge_rx_ring_init(sc) != 0) {
>Release-Note:
>Audit-Trail:
>Unformatted:
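
To make the effect of the guard in the patch easier to see, here is a small
user-space sketch (not the driver code) that counts how many full resets one
bootpc_call() round would trigger with and without the early return. Every
sketch_* name, the SK_* flag values and the "resets" counter are hypothetical
stand-ins introduced for illustration; the only things taken from the report
are the up/running check and the two netmask changes per round seen in the
debug output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for IFF_UP / IFF_DRV_RUNNING; values are illustrative. */
#define SK_IFF_UP          0x1
#define SK_IFF_DRV_RUNNING 0x2

/* Hypothetical, heavily reduced model of the driver softc. */
struct sketch_softc {
	bool locked;        /* models the lock checked by ARGE_LOCK_ASSERT() */
	int  if_flags;
	int  if_drv_flags;
	int  resets;        /* number of full hardware (re)initializations */
};

/* Models arge_stop(): the interface is no longer running. */
void
sketch_stop(struct sketch_softc *sc)
{
	sc->if_drv_flags &= ~SK_IFF_DRV_RUNNING;
}

/* Behaviour before the patch: every call forces a stop->start cycle. */
void
sketch_init_old(struct sketch_softc *sc)
{
	assert(sc->locked);
	sketch_stop(sc);
	sc->resets++;       /* RX is unavailable while this happens */
	sc->if_drv_flags |= SK_IFF_DRV_RUNNING;
}

/* Behaviour with the patch: do nothing if already up and running. */
void
sketch_init_patched(struct sketch_softc *sc)
{
	assert(sc->locked);
	if ((sc->if_flags & SK_IFF_UP) &&
	    (sc->if_drv_flags & SK_IFF_DRV_RUNNING))
		return;
	sc->resets++;
	sc->if_drv_flags |= SK_IFF_DRV_RUNNING;
}

/*
 * Models one round from the debug output: the interface is already up,
 * and each of the two netmask changes ends up calling the init routine.
 * Returns how many full resets happened during the round.
 */
int
sketch_resets_per_round(void (*init)(struct sketch_softc *))
{
	struct sketch_softc sc = {
		.locked = true,
		.if_flags = SK_IFF_UP,
		.if_drv_flags = SK_IFF_DRV_RUNNING,
		.resets = 0,
	};

	init(&sc);          /* set netmask 0.0.0.0 */
	/* sosend() of the request would happen here */
	init(&sc);          /* set netmask 255.0.0.0 */
	return (sc.resets);
}
```

Calling sketch_resets_per_round(sketch_init_old) counts one reset per netmask
change, including the one right after the request is sent, which is the
window in which the reply can be lost. With sketch_init_patched the same
round performs no reset, because the interface is already up and running.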