Date:      Fri, 6 May 2005 11:08:32 +0200 (CEST)
From:      Erik Norgaard <norgaard@math.ku.dk>
To:        questions@freebsd.org
Subject:   Spontaneous reboots
Message-ID:  <Pine.LNX.4.40.0505050921350.22295-100000@shannon.math.ku.dk>

Hi,

I am having tremendous problems keeping my FreeBSD 5 system up
and happy: I keep experiencing spontaneous reboots and crashes.

This is a looong story; I have been trying to figure out what's
causing the problem for two weeks now. I really appreciate your
patience and response if you make it all the way to the end :-)

The setup:

  FBSD---DSL---Internet

The DSL is a Thomson 510 ADSL router doing 1-1 NAT, no firewall.
The FBSD is configured with IPFilter firewall and running named,
postfix, cyrus-imap22 with virtual domains and apache with
virtual hosts, also to serve the local net (behind the DSL) it
runs dhcpd, ntpd and mysql.

Postfix, Cyrus-Imap and Apache are all configured with TLS
support and I have generated certificates using OpenSSL. This
system was installed in November and upgraded at the beginning of
January. I have had no problems for months.

Then - from the beginning:

On April 15, running FreeBSD 5.3-p5, I had two roughly
simultaneous events:

1) A huge number of incoming mail delivery attempts to addresses
   of the type randomchars@mydomain.com
2) Kernel panic, fatal trap 12

I had done no prior system tuning or changes.

Since then, uptime has been anywhere between 0 and >3 days - the
last obtained by stopping all services and disconnecting the
machine from the network.

1) By huge, I mean enough to suck up a 512kbps DSL connection,
but this should be far from enough to make FBSD cough or even
panic. Also, system load is always close to 0.00.

I have postfix handling mail and use cyrus-imap with virtual
domains as backend. Since postfix didn't know the hosted addresses,
cyrus rejected the mail. I created a list of existing addresses so
that mail could be rejected faster.

The illicit mail delivery attempts persist.

2) I followed the handbook and the kernel panic FAQ to investigate
the panic:

Fatal trap 12: Page fault while in kernel mode
Fault virtual address   = 0xc
Fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc053d638
stack pointer           = 0x10:0xcb4ddaec
frame pointer           = 0x10:0xcb4ddaf8
code segment            = base 0x0, limit 0xffff, type 0x1b
                         = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL=0
current process         = 28 (swi1:net)
trap number             = 12
panic: page fault

# nm -n /boot/kernel/kernel | grep  c053d6
c053d610 T m_copydata
c053d670 T m_dup
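
The lookup above can also be scripted. This is only a minimal
sketch: it assumes the nm output is sorted by address (-n) and
printed as fixed-width lowercase hex, and resolve_ip is a name made
up for illustration, not an existing tool.

```shell
# Map an instruction pointer to "symbol+offset".
# Usage: nm -n /boot/kernel/kernel | resolve_ip c053d638
# The address is given without the 0x prefix, in lowercase, and with
# the same number of digits as the nm output (assumption).
resolve_ip() {
    ip=$1
    # Keep the last text symbol whose address is at or below ip.
    # Appending "" forces a string comparison, which orders
    # fixed-width lowercase hex the same way as the numbers.
    match=$(awk -v ip="$ip" '
        ($1 "") <= (ip "") && ($2 == "T" || $2 == "t") { hit = $1 " " $3 }
        END { print hit }')
    [ -n "$match" ] || return 1
    base=${match%% *}
    name=${match#* }
    # Offset of the faulting instruction inside the function.
    printf '%s+0x%x\n' "$name" $(( 0x$ip - 0x$base ))
}
```

Fed the two nm lines above and the instruction pointer c053d638,
this prints m_copydata+0x28, i.e. the fault lies inside m_copydata.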

I no longer get this panic, however my system does not deserve the
predicate -STABLE. Somehow, I preferred the panic; at least it gave
some info for debugging. Now it reboots without a blip.

Disk errors:

The crashes _always_ cause disk errors that cannot be recovered
by the background fsck, particularly on /var where mail resides.
This may result in new reboots.

To solve this I have tried mounting drives read-only, unless
write permission was necessary. It turns out that postfix requires
write access to /, /usr and /var - the first two appear to be
related to TLS(?).

Also, I have set fsck_y_enable="yes" in rc.conf, so the disk is
thoroughly checked on boot after a crash.

I had dumpon set in my rc.conf, but this just filled the partition,
making things even worse. I have removed all kernel dumps and
also unnecessary data, as I understand disk performance may drop
when free disk space is below 15%.

The kernel:

The first kernel was a 5.3-p5 custom kernel. To make it easier to
debug I updated to -p8, GENERIC. No change. Following
suggestions by Kris K. I upgraded to 5.4-RC2.

This solved the panic - but the system still crashes, also after
updating to RC3 and RC4.

The system:

When upgrading to 5.4-RC2, I also built world. I then realized
that some ports may have been built against the old base, causing
new problems.

I have now deinstalled all ports. The system has been completely
updated, kernel and base, to 5.4-RC4. I have reinstalled the
minimal set of ports needed to serve my needs, at the versions
current in the ports tree as of May 3.

I still experience crashes.

Postfix:

I tried to limit the number of simultaneous deliveries handled.
No change.

When a connection is made postfix sends a lot of DNS queries to
verify that the sender hostname resolves to the IP, that the sender
domain exists, and that it is not in a block list.

IPFilter:

I have restricted access to port 25; now only a handful of
servers are permitted by the firewall. This has helped, uptime is
now hours rather than minutes, but I still have crashes.

I have reduced all timeouts to prevent the state table from
saturating, but no change.

If I open up for incoming mail, for a (any) /8 segment, the number
of connections explodes. Due to the limit on simultaneous
postfix processes, many time out. No change.

I am working on a black list based on the maillog, but this is
another project.
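
A first cut at such a list could be pulled from the maillog roughly
like this. It is only a sketch: it assumes postfix logs rejections
as "reject: RCPT from host[ip]: ..." lines, and rejected_clients is
a name made up for illustration.

```shell
# Count rejected delivery attempts per client IP, busiest first.
# Reads a mail log on stdin; prints "<count> <ip>" per line.
rejected_clients() {
    # Pull the bracketed client address out of each reject line
    # (log format is an assumption), then tally identical addresses.
    sed -n 's/.*reject: RCPT from [^[]*\[\([0-9.]*\)\].*/\1/p' |
        sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

Typical use would be: rejected_clients < /var/log/maillog | head -20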

DNS:

Since mail to mydomain.com is currently useless I have decided to
set the MX record to 127.0.0.1. This has stopped the illicit mail,
but also all legitimate mail to that domain - mostly this
gives me peace and bandwidth.

Hardware: (dmesg below)

I have tried changing the disk cable; I have a 2.5" disk with a
converter cable to standard IDE.

Also, I have tried the disk in my laptop and it appears stable,
but the testing period was limited.

I have tried both IDE connectors on the MB and both NICs. No
change.

Summary:

Despite all my attempts to solve the problem, my system is far
from STABLE. I still experience spontaneous crashes, although
less often.

It is my personal belief that there may be a hardware problem,
or persistent disk errors.

The reason is that although the traffic load saturates the
connection, it should not be enough to crash even limited hardware.
I have no more ideas on how to debug this.

Questions:

* Is there a disk tool for analysing the disk, marking sectors bad
  etc?
* How do I find the file if I know the inode number (as reported
  by fsck)?
* Can malformed packets cause FBSD to crash? Could the Thomson 510
  be accountable for such packets?
* Did I miss the obvious?
* Any ideas where to go now?
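
For the inode question, one approach would be find(1) with -inum,
restricted to the filesystem fsck complained about. A minimal
sketch; path_for_inode is a hypothetical helper name and the inode
number in the usage line is a placeholder:

```shell
# Print the path(s) of the file with a given inode number.
# Usage: path_for_inode <mount point> <inode number>
path_for_inode() {
    # -xdev keeps find on the named filesystem; inode numbers are
    # only unique within a single filesystem. Hard links to the
    # same inode are printed as separate paths.
    find "$1" -xdev -inum "$2" -print 2>/dev/null
}
```

For example: path_for_inode /var 123456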

All help is highly appreciated.

Thanks, Erik

Disk space: df
Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
/dev/ad0s1a    507630    76966   390054    16%    /
devfs               1        1        0   100%    /dev
/dev/ad0s1g  30859916 14228272 14162852    50%    /home
/dev/ad0s1f    507630       42   466978     0%    /tmp
/dev/ad0s1d  12186190  2134420  9076876    19%    /usr
/dev/ad0s1e  12186190  7689462  3521834    69%    /var
devfs               1        1        0   100%    /var/named/dev

last (24h):
norgaard         ttyp0    x.x.x.x  Fri  6 May 10:09   still logged in
norgaard         ttyp0    x.x.x.x  Fri  6 May 09:22 - 09:25  (00:03)
norgaard         ttyp0    charm            Fri  6 May 08:28 - 08:42  (00:13)
norgaard         ttyp0    charm            Fri  6 May 07:48 - 08:00  (00:11)
reboot           ~                         Fri  6 May 04:16
norgaard         ttyp1    charm            Thu  5 May 22:45 - 23:18  (00:32)
norgaard         ttyp0    charm            Thu  5 May 22:09 - crash  (06:07)
reboot           ~                         Thu  5 May 22:05
norgaard         ttyp0    charm            Thu  5 May 21:45 - crash  (00:20)
reboot           ~                         Thu  5 May 21:20
norgaard         ttyp1    charm            Thu  5 May 21:11 - crash  (00:09)
norgaard         ttyp0    charm            Thu  5 May 20:45 - crash  (00:35)
reboot           ~                         Thu  5 May 18:57
norgaard         ttyp0    x.x.x.x  Thu  5 May 18:23 - 18:23  (00:00)
reboot           ~                         Thu  5 May 18:22
norgaard         ttyp0    x.x.x.x  Thu  5 May 16:44 - crash  (01:37)
norgaard         ttyp0    x.x.x.x  Thu  5 May 15:44 - 16:13  (00:28)
norgaard         ttyp0    x.x.x.x  Thu  5 May 13:57 - 13:58  (00:00)
norgaard         ttyp0    x.x.x.x  Thu  5 May 13:38 - 13:51  (00:12)
norgaard         ttyp0    x.x.x.x  Thu  5 May 13:06 - 13:27  (00:21)
norgaard         ttyp0    x.x.x.x  Thu  5 May 10:53 - 11:00  (00:06)
reboot           ~                         Thu  5 May 10:43
norgaard         ttyp0    x.x.x.x  Thu  5 May 10:37 - crash  (00:06)
norgaard         ttyp0    x.x.x.x  Thu  5 May 10:14 - 10:22  (00:08)
reboot           ~                         Thu  5 May 10:06
norgaard         ttyp0    charm            Thu  5 May 08:38 - crash  (01:27)
reboot           ~                         Thu  5 May 08:38
norgaard         ttyp0    charm            Thu  5 May 07:53 - 07:54  (00:00)
norgaard         ttyp0    charm            Thu  5 May 07:52 - 07:52  (00:00)
reboot           ~                         Thu  5 May 07:17
reboot           ~                         Thu  5 May 04:59
norgaard         ttyp0    charm            Thu  5 May 04:17 - crash  (00:41)
reboot           ~                         Thu  5 May 04:16
shutdown         ~                         Thu  5 May 04:14
norgaard         ttyp0    charm            Thu  5 May 03:45 - shutdown  (00:28)
reboot           ~                         Thu  5 May 03:42
reboot           ~                         Thu  5 May 03:40
norgaard         ttyp0    charm            Thu  5 May 03:40 - crash  (00:00)
reboot           ~                         Thu  5 May 03:31
reboot           ~                         Thu  5 May 03:27
reboot           ~                         Thu  5 May 03:13
reboot           ~                         Thu  5 May 03:03
reboot           ~                         Thu  5 May 02:58
reboot           ~                         Thu  5 May 02:51
reboot           ~                         Thu  5 May 02:47
reboot           ~                         Thu  5 May 02:41
reboot           ~                         Thu  5 May 02:35
reboot           ~                         Thu  5 May 02:29
reboot           ~                         Thu  5 May 02:25
reboot           ~                         Thu  5 May 02:20
reboot           ~                         Thu  5 May 02:09
reboot           ~                         Thu  5 May 01:58
reboot           ~                         Thu  5 May 01:53
reboot           ~                         Thu  5 May 01:50
reboot           ~                         Thu  5 May 01:46
reboot           ~                         Thu  5 May 01:42
reboot           ~                         Thu  5 May 01:33
reboot           ~                         Thu  5 May 01:30
reboot           ~                         Thu  5 May 01:27
reboot           ~                         Thu  5 May 01:13
reboot           ~                         Thu  5 May 01:08
reboot           ~                         Thu  5 May 01:05
reboot           ~                         Thu  5 May 00:58
reboot           ~                         Thu  5 May 00:53
reboot           ~                         Thu  5 May 00:44
reboot           ~                         Thu  5 May 00:34
reboot           ~                         Thu  5 May 00:24
reboot           ~                         Thu  5 May 00:20
reboot           ~                         Thu  5 May 00:13
reboot           ~                         Wed  4 May 23:58
reboot           ~                         Wed  4 May 23:43
reboot           ~                         Wed  4 May 23:40
reboot           ~                         Wed  4 May 23:36
norgaard         ttyp0    charm            Wed  4 May 20:57 - 23:29  (02:31)

Note the reboots from Wed 4, 23:36 to Thu 5, 07:52 appeared to be
caused by postfix throttling due to a read-only mounted /usr.
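
To get a quick tally out of a listing like the one above, something
like this would do. It is a rough sketch that assumes the column
layout shown here (weekday, day and month in fields 3 to 5 of the
reboot records); reboots_per_day is a made-up name.

```shell
# Count "reboot" records per day in last(1) output read from stdin.
# Prints one "<weekday> <day> <month>: <count>" line per day.
reboots_per_day() {
    # Keep only the date fields of reboot records, then count
    # identical dates and move the count to the end of the line.
    awk '$1 == "reboot" { print $3, $4, $5 }' |
        sort | uniq -c |
        awk '{ print $2, $3, $4 ": " $1 }'
}
```

Typical use would be: last | reboots_per_day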

dmesg.today:

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RC4 #0: Tue May  3 14:07:30 CEST 2005
    root@top.daemonsecurity.com:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: VIA C3 Nehemiah+RNG (1002.28-MHz 686-class CPU)
  Origin = "CentaurHauls"  Id = 0x694  Stepping = 4
  Features=0x380b03d<FPU,DE,PSE,TSC,MSR,MTRR,PGE,CMOV,MMX,FXSR,SSE>
real memory  = 251592704 (239 MB)
avail memory = 236548096 (225 MB)
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <VT9174 AWRDACPI> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU (3 Cx states)> on acpi0
acpi_throttle0: <ACPI CPU Throttling> on cpu0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <VIA 862x (CLE266) host to PCI bridge> mem 0xd0000000-0xd7ffffff at device 0.0 on pci0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
vr0: <VIA VT6105 Rhine III 10/100BaseTX> port 0xd000-0xd0ff mem 0xde000000-0xde0000ff irq 12 at device 15.0 on pci0
miibus0: <MII bus> on vr0
ukphy0: <Generic IEEE 802.3u media interface> on miibus0
ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vr0: Ethernet address: 00:40:63:d4:89:72
uhci0: <VIA 83C572 USB controller> port 0xd400-0xd41f irq 11 at device 16.0 on pci0
usb0: <VIA 83C572 USB controller> on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <VIA 83C572 USB controller> port 0xd800-0xd81f irq 11 at device 16.1 on pci0
usb1: <VIA 83C572 USB controller> on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <VIA 83C572 USB controller> port 0xdc00-0xdc1f irq 9 at device 16.2 on pci0
usb2: <VIA 83C572 USB controller> on uhci2
usb2: USB revision 1.0
uhub2: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
pci0: <serial bus, USB> at device 16.3 (no driver attached)
isab0: <PCI-ISA bridge> at device 17.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <VIA 8235 UDMA133 controller> port 0xe000-0xe00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 17.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
pci0: <multimedia, audio> at device 17.5 (no driver attached)
vr1: <VIA VT6102 Rhine II 10/100BaseTX> port 0xe800-0xe8ff mem 0xde002000-0xde0020ff irq 11 at device 18.0 on pci0
miibus1: <MII bus> on vr1
ukphy1: <Generic IEEE 802.3u media interface> on miibus1
ukphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vr1: Ethernet address: 00:40:63:d4:89:71
fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0: <Standard parallel printer port> port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (EPP/NIBBLE) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
sio2: <16550A-compatible COM port> port 0x3e8-0x3ef irq 5 on acpi0
sio2: type 16550A
sio3: <16550A-compatible COM port> port 0x2e8-0x2ef irq 10 on acpi0
sio3: type 16550A
orm0: <ISA Option ROM> at iomem 0xc0000-0xcdfff on isa0
pmtimer0 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 1002278507 Hz quality 800
Timecounters tick every 10.000 msec
ad0: 57231MB <IC25N060ATMR04-0/MO3OAD4A> [116280/16/63] at ata0-master UDMA100
Mounting root from ufs:/dev/ad0s1a
WARNING: /home was not properly dismounted
WARNING: /tmp was not properly dismounted
WARNING: /usr was not properly dismounted
WARNING: /var was not properly dismounted
IP Filter: v3.4.35 initialized.  Default = pass all, Logging = enabled
Accounting enabled



GnuPG: http://www.locolomo.org/home/norgaard/norgaard.gpg.asc
pub  1024D/11D11F9E 2003-08-15 Erik Norgaard <norgaard@locolomo.org>
     Key fingerprint = C394 81C4 D137 EEE5 39BE  82D5 3E6B FB3E 11D1 1F9E




