Date:      Wed, 27 Oct 1999 22:40:00 -0400 (EDT)
From:      Andrew Gallatin <gallatin@cs.duke.edu>
To:        freebsd-hackers@freebsd.org
Cc:        freebsd-alpha@freebsd.org
Subject:   ip forwarding broken on alpha
Message-ID:  <14359.43410.495963.975277@grasshopper.cs.duke.edu>


I have an older AlphaStation 600 5/266 running -current (cvsupped
last week) which is set up as a router between two 100Mb networks.  When
the machine is pushed fairly hard (e.g. running a netperf -tUDP_STREAM
-- -m 100 across the router, about 10-20k 100-byte packets/sec) the
alpha falls over almost instantly.  I have not enabled any NAT or
firewall functionality, just IP forwarding.

It generally crashes in MCLGET down in the ethernet driver's receive
interrupt handler.  The driver doesn't seem to matter -- I've tried
Intel EtherExpress Pro 100Bs and 3Com 3c905C-TX Fast Etherlink XLs.  A
typical stack trace looks like this:

fatal kernel trap:

    trap entry = 0x2 (memory management fault)
    a0         = 0x826417b78f222
    a1         = 0x1
    a2         = 0x0
    pc         = 0xfffffc00004b31bc
    ra         = 0xfffffc00004b315c
    curproc    = 0

ddbprinttrap from 0xfffffc00004b31bc
ddbprinttrap(0x826417b78f222, 0x1, 0x0, 0x2)
panic: trap
panic
Stopped at      Debugger+0x2c:  ldq     ra,0(sp) <0xfffffe0005ab57d0>   <ra=0xfffffc00005042e0,sp=0xfffffe0005ab57d0>
db> tr
Debugger() at Debugger+0x2c
panic() at panic+0xf4
trap() at trap+0x5cc
xl_newbuf() at xl_newbuf+0x15c
(null)() at 0x4
db> c

This maps to pci/if_xl.c:1654.  But the if_xl driver is probably not
at fault, as I can crash just as easily in fxp_add_rfabuf() when using
Intel NICs.

Before trying the 3Com cards, I had been working under the assumption
that it was a problem with the fxp driver.  I instrumented the mbuf
routines somewhat (I hate debugging macros) and it seems the bad
access is due to mclfree getting trashed & replaced by a "random" bad
value (0x826417b78f222 in this panic).
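For illustration, here is a minimal user-space sketch of the kind of check
that can catch a trashed free-list head before it is dereferenced.  It is
not the actual mbuf instrumentation: the pool, the names (pool_init,
head_is_sane, cluster_get) and the sizes are all hypothetical, and
"freelist" merely plays the role of mclfree.

```c
#include <stddef.h>
#include <stdint.h>

#define NCLUSTERS 8      /* hypothetical pool size */
#define CLBYTES   2048   /* hypothetical cluster size */

/* Hypothetical stand-in for a cluster: a link while free, data while in use. */
union cluster {
    union cluster *next;
    char           buf[CLBYTES];
};

static union cluster pool[NCLUSTERS];
static union cluster *freelist;   /* plays the role of mclfree */

/* Thread every cluster onto the free list; pool[0] ends up at the head. */
void pool_init(void)
{
    freelist = NULL;
    for (int i = NCLUSTERS - 1; i >= 0; i--) {
        pool[i].next = freelist;
        freelist = &pool[i];
    }
}

/* Return 1 if p is NULL or points at the start of a cluster in the pool. */
int head_is_sane(const union cluster *p)
{
    uintptr_t a, lo, hi;

    if (p == NULL)
        return 1;
    a  = (uintptr_t)p;
    lo = (uintptr_t)&pool[0];
    hi = (uintptr_t)&pool[NCLUSTERS];
    return a >= lo && a < hi && (a - lo) % sizeof(union cluster) == 0;
}

/*
 * Pop one cluster.  If the head looks trashed, set *corrupt and return
 * NULL instead of following the bad pointer.
 */
union cluster *cluster_get(int *corrupt)
{
    union cluster *p = freelist;

    *corrupt = !head_is_sane(p);
    if (*corrupt || p == NULL)
        return NULL;
    freelist = p->next;
    return p;
}
```

With a check like this, a head value such as 0x826417b78f222 is reported at
allocation time rather than faulting on the load of p->next.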

This might be a red herring, but I've found that if I run the entire
ip_input path under splnet() (I added splnet() around the call to
ip_input() in ipintr()), things get a hell of a lot more stable.
Rather than crashing in a few seconds, it sometimes takes minutes.
And rather than an illegal access, I tend to run out of kernel stack
space (either a panic("possible stack overflow\n"); in
alpha/alpha/interrupt.c, or I end up in the SRM console after calling
halt from a PC which isn't in the kernel, which smells like an overrun
stack to me).  I'm not sure if this is related, or if it is a separate
problem entirely.
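The shape of the splnet() experiment described above can be sketched in user
space as follows.  This is only a toy model: cur_spl, splnet_model(),
splx_model(), ip_input_model() and ipintr_model() are hypothetical stand-ins,
not the kernel routines, and the model says nothing about why the change
helps.

```c
/* Hypothetical two-level model of the priority level. */
enum { SPL_NONE = 0, SPL_NET = 1 };

static int cur_spl = SPL_NONE;        /* current (modelled) priority level */
static int spl_seen_in_input = -1;    /* level observed inside ip_input_model() */

/* Raise the level to at least SPL_NET and return the previous level. */
int splnet_model(void)
{
    int old = cur_spl;

    if (cur_spl < SPL_NET)
        cur_spl = SPL_NET;
    return old;
}

/* Restore a previously saved level. */
void splx_model(int s)
{
    cur_spl = s;
}

/* Stand-in for ip_input(): just records the level it was called at. */
void ip_input_model(void)
{
    spl_seen_in_input = cur_spl;
}

/* Stand-in for ipintr(): each packet is processed with the level raised. */
void ipintr_model(int npackets)
{
    while (npackets-- > 0) {
        int s = splnet_model();   /* raise before entering the input path */
        ip_input_model();
        splx_model(s);            /* drop back to the saved level afterwards */
    }
}
```

The point of the save/restore pairing is that the level is raised only for
the duration of each ip_input call and always returns to its previous value.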

Since an x86 (PII@300MHz, 440LX motherboard, kernel built from the same
sources) is rock solid under the same workload, I suspect there's
something wrong that is alpha specific, but I'll be damned if I can
figure it out.

My best guess is that it has something to do with the different
interrupt structure on i386 & alpha.  As I understand it, the i386 can
mask off particular interrupt sources, but the alpha simply raises &
lowers the IPL, with the following levels available
(from alpha/include/alpha_cpu.h):

#define ALPHA_PSL_IPL_0         0x0000          /* all interrupts enabled */
#define ALPHA_PSL_IPL_SOFT      0x0001          /* software ints disabled */
#define ALPHA_PSL_IPL_IO        0x0004          /* I/O dev ints disabled */
#define ALPHA_PSL_IPL_CLOCK     0x0005          /* clock ints disabled */
#define ALPHA_PSL_IPL_HIGH      0x0006          /* all but mchecks disabled */
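
To make the contrast concrete, here is a deliberately simplified sketch of
the two models as described above.  The IPL constants are copied from the
header; intr_delivered() and irq_masked() are hypothetical helpers, and the
rule "an interrupt is taken only if its level is above the current IPL" is
an assumption of the model, not a statement about the PALcode.

```c
#define ALPHA_PSL_IPL_0         0x0000          /* all interrupts enabled */
#define ALPHA_PSL_IPL_SOFT      0x0001          /* software ints disabled */
#define ALPHA_PSL_IPL_IO        0x0004          /* I/O dev ints disabled */
#define ALPHA_PSL_IPL_CLOCK     0x0005          /* clock ints disabled */
#define ALPHA_PSL_IPL_HIGH      0x0006          /* all but mchecks disabled */

/*
 * Level-based model (alpha, as assumed here): an interrupt at level
 * intr_ipl is delivered only if it is strictly above the current IPL.
 */
int intr_delivered(int cur_ipl, int intr_ipl)
{
    return intr_ipl > cur_ipl;
}

/*
 * Mask-based model (i386, as assumed here): each source has its own bit,
 * so one irq can be blocked while another stays enabled.
 */
int irq_masked(unsigned int mask, int irq)
{
    return (int)((mask >> irq) & 1u);
}
```

In this model, raising the IPL to ALPHA_PSL_IPL_IO blocks every I/O device
at once, while a mask can block one source (say irq 8) and leave another
(say irq 16) enabled.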

Can anybody hazard a guess as to what's going on?  I've appended dmesg
output & my config file for completeness.

BTW, as long as the load is light, IP forwarding seems to work.  I
can't seem to make this happen using two 100Mb tulips in this box.  They
must copy on the input path due to DMA alignment problems, which slows
things down quite a bit because of the low memory bandwidth of this
machine.

Thanks,

Drew
------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
Duke University				Email: gallatin@cs.duke.edu
Department of Computer Science		Phone: (919) 660-6590


Copyright (c) 1992-1999 The FreeBSD Project.
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California. All rights reserved.
FreeBSD 4.0-CURRENT #4: Wed Oct 27 11:35:25 EDT 1999
    gallatin@torrent.cs.duke.edu:/usr/project/ari_scratch2/gallatin/src/sys/compile/ALPHA
AlphaStation 500 or 600 (KN20AA)
Digital AlphaStation 600 5/266, 266MHz
8192 byte page size, 1 processor.
CPU: EV5 (21164) major=5 minor=0
OSF PAL rev: 0x1000000020116
real memory  = 131940352 (128848K bytes)
avail memory = 122200064 (119336K bytes)
Preloaded elf kernel "kernel" at 0xfffffc0000674000.
cia0: ALCOR/ALCOR2, pass 2
pcib0: <2117x PCI host bus adapter> on cia0
pci0: <PCI bus> on pcib0
xl0: <3Com 3c905C-TX Fast Etherlink XL> irq 8 at device 7.0 on pci0
xl0: interrupting at CIA irq 8
xl0: Ethernet address: 00:50:da:09:3e:41
miibus0: <MII bus> on xl0
xlphy0: <3c905C 10/100 internal PHY> on miibus0
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcib1: <DEC 21050 PCI-PCI bridge> at device 8.0 on pci0
pci1: <PCI bus> on pcib1
de0: <Digital 21040 Ethernet> irq 16 at device 0.0 on pci1
de0: interrupting at CIA irq 16
de0: DEC 21040 [10Mb/s] pass 2.3
de0: address 08:00:2b:e7:e6:d6
isp0: <Qlogic ISP 1020/1040 PCI SCSI Adapter> irq 17 at device 1.0 on pci1
isp0: interrupting at CIA irq 17
isp0: invalid NVRAM header (aa,aa,aa,aa)
isp0: isp_mboxcmd sees mailbox int with 0x0 in mbox0
isp0: isp_mboxcmd sees mailbox int with 0x0 in mbox0
<..>
isp1: <Qlogic ISP 1020/1040 PCI SCSI Adapter> irq 18 at device 2.0 on pci1
isp1: interrupting at CIA irq 18
isp1: isp_mboxcmd sees mailbox int with 0x0 in mbox0
isp1: invalid NVRAM header (55,55,55,55)
isp1: isp_mboxcmd sees mailbox int with 0x0 in mbox0
isp1: isp_mboxcmd sees mailbox int with 0x0 in mbox0
de1: <Digital 21140 Fast Ethernet> irq 12 at device 9.0 on pci0
de1: interrupting at CIA irq 12
de1: DEC DE500-XA 21140 [10-100Mb/s] pass 1.1
de1: address 00:00:f8:00:99:ba
de1: enabling Full Duplex 100baseTX port
isab0: <Intel 82375EB PCI-EISA bridge> at device 10.0 on pci0
isa0: <ISA bus> on isab0
xl1: <3Com 3c905C-TX Fast Etherlink XL> irq 0 at device 11.0 on pci0
xl1: interrupting at CIA irq 0
xl1: Ethernet address: 00:50:da:09:42:41
miibus1: <MII bus> on xl1
xlphy1: <3c905C 10/100 internal PHY> on miibus1
xlphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl2: <3Com 3c905C-TX Fast Etherlink XL> irq 4 at device 12.0 on pci0
xl2: interrupting at CIA irq 4
xl2: Ethernet address: 00:50:da:09:3f:e8
miibus2: <MII bus> on xl2
xlphy2: <3c905C 10/100 internal PHY> on miibus2
xlphy2:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
mcclock0: <MC146818A real time clock> at port 0x70-0x71 on isa0
sio0 at port 0x3f8-0x3ff irq 4 on isa0
sio0: type 16550A, console
sio0: interrupting at ISA irq 4
sio1 at port 0x2f8-0x2ff irq 3 flags 0x80 on isa0
sio1: type 16550A
sio1: interrupting at ISA irq 3
fdc0: interrupting at ISA irq 6
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <keyboard controller (i8042)> at port 0x60-0x6f on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
atkbd0: interrupting at ISA irq 1
struct nfssvc_sock bloated (> 256bytes)
Try reducing NFS_UIDHASHSIZ
struct nfsuid bloated (> 128bytes)
Try unionizing the nu_nickname and nu_flag fields
Timecounter "alpha"  frequency 266671691 Hz
Waiting 3 seconds for SCSI devices to settle
isp0: driver initiated bus reset of bus 0
isp1: driver initiated bus reset of bus 0
de0: autosense failed: cable problem?
Creating DISK da0
Creating DISK da1
Creating DISK cd0
da0 at isp0 bus 0 target 0 lun 0
da0: <SEAGATE ST15150W 0023> Fixed Direct Access SCSI-2 device 
da0: 20.000MB/s transfers (10.000MHz, offset 12, 16bit), Tagged Queueing Enabled
da0: 4095MB (8388315 512 byte sectors: 255H 63S/T 522C)
da1 at isp0 bus 0 target 1 lun 0
da1: <SEAGATE ST32171W 0484> Fixed Direct Access SCSI-2 device 
da1: 20.000MB/s transfers (10.000MHz, offset 12, 16bit), Tagged Queueing Enabled
da1: 2062MB (4223444 512 byte sectors: 255H 63S/T 262C)
cd0 at isp0 bus 0 target 5 lun 0
cd0: <DEC RRD45   (C) DEC 1645> Removable CD-ROM SCSI-2 device 
cd0: 4.032MB/s transfers (4.032MHz, offset 12)
cd0: Attempt to query device size failed: NOT READY, Medium not present

#
machine		alpha
cpu		EV4
cpu		EV5
ident		ALPHA
maxusers	32

# Platforms supported
options		DEC_AXPPCI_33		# UDB, Multia, AXPpci33, Noname
options		DEC_EB164		# EB164, PC164, PC164LX, PC164SX
options		DEC_EB64PLUS		# EB64+, Aspen Alpine, etc
options		DEC_2100_A50		# AlphaStation 200, 250, 255, 400
options		DEC_KN20AA		# AlphaStation 500, 600
options		DEC_ST550		# Personal Workstation 433, 500, 600
options		DEC_ST6600		# xp1000, dp264, ds20, ds10, family
#options		DEC_3000_300		# DEC3000/300* Pelic* family
#options		DEC_3000_500		# DEC3000/[4-9]00 Flamingo/Sandpiper family

options		INET			#InterNETworking
options		FFS			#Berkeley Fast Filesystem
options		NFS			#Network Filesystem
options		MFS			#Memory Filesystem
options		MFS_ROOT		#Memory Filesystem as rootfs
options		MSDOSFS			#MSDOS Filesystem
options		CD9660			#ISO 9660 Filesystem
options		CD9660_ROOT		#CD-ROM usable as root device
options		FFS_ROOT		#FFS usable as root device [keep this!]
options		NFS_ROOT		#NFS usable as root device
options		PROCFS			#Process filesystem
options		COMPAT_43		#Compatible with BSD 4.3 [KEEP THIS!]
options		SCSI_DELAY=3000	#Be pessimistic about Joe SCSI device
options		UCONSOLE		#Allow users to grab the console
options   	SOFTUPDATES

# Standard busses
controller	pci0
controller	isa0

# A single entry for any of these controllers (ncr, ahb, ahc, amd) is
# sufficient for any number of installed devices.
controller	ncr0
controller	isp0
controller	ahc0
#controller	esp0

controller	scbus0

device		da0
device		sa0
device		pass0
device		cd0

#
# ATA and ATAPI devices
# This is work in progress, use at your own risk.
# It currently reuses the majors of wd.c and friends.
# It cannot co-exist with the old system in one kernel.
# You only need one "controller ata0" for it to find all
# PCI devices on modern machines.
controller	ata0
device		atadisk0	# ATA disk drives
device		atapicd0	# ATAPI CDROM drives
device		atapifd0	# ATAPI floppy drives
device		atapist0	# ATAPI tape drives

# real time clock
device		mcclock0 at isa0 port 0x70

controller	fdc0	at isa? port IO_FD1 irq 6 drq 2
disk		fd0	at fdc0 drive 0

controller	atkbdc0	at isa? port IO_KBD
device		atkbd0	at atkbdc? irq 1
device		psm0	at atkbdc? irq 12

device		vga0	at isa? port ? conflicts

# splash screen/screen saver
pseudo-device	splash

# syscons is the default console driver, resembling an SCO console
device		sc0	at isa?

device		sio0	at isa0 port IO_COM1 irq 4
device		sio1	at isa0 port IO_COM2 irq 3 flags 0x80

# MII bus support, required for some 10/100 NICs.
controller miibus0

# Operational PCI Ethernet drivers.
device al0
device ax0
device de0
device dm0
device fxp0
device le0
device mx0
device pn0
device rl0
device sf0
device sis0
device ste0
device tl0
device vr0
device wb0
device xl0

pseudo-device	loop
pseudo-device	ether
pseudo-device	sl	1
pseudo-device	ppp	1
pseudo-device	tun
pseudo-device	pty
pseudo-device	bpf	4

# KTRACE enables the system-call tracing facility ktrace(2).
# This adds 4 KB bloat to your kernel, and slightly increases
# the costs of each syscall.
options		KTRACE		#kernel tracing

# This provides support for System V shared memory and message queues.
#
options         SYSVSHM
options         SYSVMSG
options         SYSVSEM

#
# everything above is essentially GENERIC.  customizations below.
#

options         DDB
options         BREAK_TO_DEBUGGER



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-alpha" in the body of the message



