Date:      Sat, 5 Jan 2013 00:53:36 +0100
From:      Marius Strobl <marius@alchemy.franken.de>
To:        Kurt Lidl <lidl@pix.net>
Cc:        freebsd-sparc64@freebsd.org
Subject:   Re: smartmontools panics 9.1-RELEASE on sunfire 240
Message-ID:  <20130104235336.GB37999@alchemy.franken.de>
In-Reply-To: <20130104051914.GA22613@pix.net>
References:  <20130104051914.GA22613@pix.net>


--ZPt4rx8FFjLCG7dd
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On Fri, Jan 04, 2013 at 12:19:15AM -0500, Kurt Lidl wrote:
> Greetings all --
> 
> I recently endeavored to install the same suite of ports on
> my sparc64 machines as I have installed on my amd64 hosts.
> 
> I installed the smartmontools from /usr/ports (it installed
> smartmontools-6.0), configured it thusly:
> 
> echo 'DEVICESCAN -a -m somealias@example.com' > \
> 	/usr/local/etc/smartd.conf
> 
> And then when I started the daemon:
> 
> /usr/local/etc/rc.d/smartd start
> Starting smartd.
> (pass0:ata2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 01 00
> (pass0:ata2:0:0:0): CAM status: ATA Status Error
> (pass0:ata2:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
> (pass0:ata2:0:0:0): RES: 51 04 00 00 00 00 00 00 00 01 00
> 
> And the kernel panic'd:
> 
> panic: trap: memory address not aligned (kernel)
> cpuid = 0
> KDB: stack backtrace:
> #0 0xc086c934 at trap+0x554
> Uptime: 2h0m44s
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
> 
> It's running the GENERIC kernel from 9.1-RELEASE.
> 
> lidl@host-2: uname -a
> FreeBSD [redacted] 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243836: Tue Dec  4 15:49:34 UTC 2012     root@heller.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  sparc64
> 
> It's completely repeatable.  The second time it panic'd, the
> machine successfully wrote a crashdump file, and savecore recovered it.
> Here's what kgdb had to say:
> 
> root@spork-1: kgdb /boot/kernel/kernel /var/crash/vmcore.0
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "sparc64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> panic: trap: memory address not aligned (kernel)
> cpuid = 0
> KDB: stack backtrace:
> #0 0xc086c934 at trap+0x554
> Uptime: 23m57s
> Dumping 8192 MB (4 chunks)
>   chunk at 0: 2147483648 bytes |
> 
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/opensolaris.ko
> Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from /boot/kernel/geom_mirror.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/geom_mirror.ko
> Reading symbols from /boot/kernel/aio.ko...Reading symbols from /boot/kernel/aio.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/aio.ko
> Reading symbols from /boot/kernel/accf_data.ko...Reading symbols from /boot/kernel/accf_data.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/accf_data.ko
> Reading symbols from /boot/kernel/accf_http.ko...Reading symbols from /boot/kernel/accf_http.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/accf_http.ko
> Reading symbols from /boot/kernel/pflog.ko...Reading symbols from /boot/kernel/pflog.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/pflog.ko
> Reading symbols from /boot/kernel/pf.ko...Reading symbols from /boot/kernel/pf.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/pf.ko
> #0  0x00000000c053199c in doadump (textdump=Variable "textdump" is not available.
> )
>     at /usr/src/sys/kern/kern_shutdown.c:259
> 259             savectx(&dumppcb);
> (kgdb) where
> #0  0x00000000c053199c in doadump (textdump=Variable "textdump" is not available.
> )
>     at /usr/src/sys/kern/kern_shutdown.c:259
> #1  0x00000000c05323a0 in kern_reboot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:448
> #2  0x00000000c053284c in panic (fmt=0xc0ab6a40 "trap: %s (kernel)")
>     at /usr/src/sys/kern/kern_shutdown.c:636
> #3  0x00000000c086c93c in trap (tf=0xea36b0b0)
>     at /usr/src/sys/sparc64/sparc64/trap.c:411
> #4  0x00000000c0099060 in tl1_trap ()
> #5  0x00000000c012aebc in ata_pio_read (request=0xfffff80003642800, length=512)
>     at bus.h:548
> #6  0x00000000c012ae0c in ata_pio_read (request=0xfffff800053927b0, length=512)
>     at /usr/src/sys/dev/ata/ata-lowlevel.c:838
> #7  0x00000000c012c004 in ata_end_transaction (request=dwarf2_read_address: Corrupted DWARF expression.
> )
>     at /usr/src/sys/dev/ata/ata-lowlevel.c:282
> #8  0x00000000c0128750 in ata_interrupt_locked (data=0xfffff80003642800)
>     at /usr/src/sys/dev/ata/ata-all.c:586
> #9  0x00000000c01287e8 in ata_interrupt (data=0xfffff80003642800)
>     at /usr/src/sys/dev/ata/ata-all.c:549
> #10 0x00000000c0130100 in ata_generic_intr (data=Variable "data" is not available.
> )
>     at /usr/src/sys/dev/ata/ata-pci.c:814
> #11 0x00000000c04ff8c0 in intr_event_execute_handlers (p=0xfffff800032a2dc8,
>     ie=0xfffff800033b3d00) at /usr/src/sys/kern/kern_intr.c:1262
> #12 0x00000000c05014a4 in ithread_loop (arg=0xfffff800036a15c0)
>     at /usr/src/sys/kern/kern_intr.c:1275
> #13 0x00000000c04fc548 in fork_exit (callout=0xc05013c0 <ithread_loop>,
>     arg=0xfffff800036a15c0, frame=0xea36b880)
>     at /usr/src/sys/kern/kern_fork.c:992
> #14 0x00000000c0099270 in fork_trampoline ()
> #15 0x00000000c0099270 in fork_trampoline ()
> Previous frame identical to this frame (corrupt stack?)
> (kgdb) up 5
> #5  0x00000000c012aebc in ata_pio_read (request=0xfffff80003642800, length=512)
>     at bus.h:548
> 548     bus.h: No such file or directory.
>         in bus.h
> (kgdb) up 1
> #6  0x00000000c012ae0c in ata_pio_read (request=0xfffff800053927b0, length=512)
>     at /usr/src/sys/dev/ata/ata-lowlevel.c:838
> 838         struct ata_channel *ch = device_get_softc(request->parent);
> (kgdb) list
> 833     }
> 834
> 835     static void
> 836     ata_pio_read(struct ata_request *request, int length)
> 837     {
> 838         struct ata_channel *ch = device_get_softc(request->parent);
> 839         uint8_t *addr;
> 840         int size = min(request->transfersize, length);
> 841         int resid;
> 842         uint8_t buf[2];
> (kgdb) p *request
> $1 = {dev = 0x0, parent = 0xfffff80003563400, unit = 0, u = {ata = {
>       command = 161 '¡', feature = 0, count = 1, lba = 0}, atapi = {
>       ccb = "¡\000\000\000\000\001\000\000\000\000\000\000\000\000\000",
>       sense = {error = 0 '\0', segment = 0 '\0', key = 0 '\0', cmd_info = 0,
>         sense_length = 0 '\0', cmd_specific_info = 0, asc = 0 '\0',
>         ascq = 0 '\0', replaceable_unit_code = 0 '\0', specific = 0 '\0',
>         specific1 = 0 '\0', specific2 = 0 '\0'}, saved_cmd = 0 '\0'}},
>   bytecount = 512, transfersize = 512, data = 0xdd551577 "", tag = 0,
>   flags = 2, dma = 0x0, status = 88 'X', error = 0 '\0', donecount = 0,
>   result = 0, callback = 0, done = {sema_mtx = {lock_object = {lo_name = 0x0,
>         lo_flags = 0, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0},
>     sema_cv = {cv_description = 0x0, cv_waiters = 0}, sema_waiters = 0,
>     sema_value = 0}, retries = 0, timeout = 20, callout = {c_links = {sle = {
>         sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0xc1950940}},
>     c_time = 1456010, c_arg = 0xfffff800053927b0,
>     c_func = 0xc012f040 <ata_timeout>, c_lock = 0xfffff80003642ac0,
>     c_flags = 22, c_cpu = 0}, task = {ta_link = {stqe_next = 0x0},
>     ta_pending = 0, ta_priority = 0, ta_func = 0, ta_context = 0x0},
>   bio = 0x0, this = 0, composite = 0x0, driver = 0x0, chain = {tqe_next = 0x0,
>     tqe_prev = 0x0}, ccb = 0xfffff80005375000}

Uhm, probably a userland buffer which isn't even 16-bit aligned.
If that's the cause, the attached patch should hopefully at least
prevent the panic. If it does, smartmontools still needs to be fixed,
though.

Marius


--ZPt4rx8FFjLCG7dd
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="cam_periph.c.diff"

Index: cam_periph.c
===================================================================
--- cam_periph.c	(revision 245046)
+++ cam_periph.c	(working copy)
@@ -744,6 +744,9 @@ cam_periph_mapmem(union ccb *ccb, struct cam_perip
 		if ((ccb->ccb_h.flags & CAM_DIR_MASK) == CAM_DIR_NONE)
 			return(0);
 
+		if ((uintptr_t)ccb->ataio.data_ptr % sizeof(uint16_t) != 0)
+			return (EINVAL);
+
 		data_ptrs[0] = &ccb->ataio.data_ptr;
 		lengths[0] = ccb->ataio.dxfer_len;
 		dirs[0] = ccb->ccb_h.flags & CAM_DIR_MASK;

--ZPt4rx8FFjLCG7dd--
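
The alignment test added by the patch above can be tried in isolation. What
follows is a minimal userland sketch, not kernel code: is_aligned() and
check_ata_data_ptr() are hypothetical names invented for illustration, and
only the modulo test and the EINVAL return mirror the diff. The request dump
earlier in the thread shows data = 0xdd551577, an odd address, which would
be consistent with the guess about a buffer that is not 16-bit aligned.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a FreeBSD API): returns nonzero when ptr is
 * a multiple of align bytes. align must be nonzero.
 */
static int
is_aligned(const void *ptr, size_t align)
{
	return ((uintptr_t)ptr % align == 0);
}

/*
 * Hypothetical stand-in for the check the patch adds to
 * cam_periph_mapmem(): reject a data pointer that is not aligned to
 * sizeof(uint16_t), since PIO transfers move 16-bit words.
 */
static int
check_ata_data_ptr(const void *data_ptr)
{
	if (!is_aligned(data_ptr, sizeof(uint16_t)))
		return (EINVAL);
	return (0);
}
```

Passing the address from the dump, check_ata_data_ptr((const void *)(uintptr_t)0xdd551577), yields EINVAL, while an even address such as 0xdd551578 yields 0. Rejecting the request this way only avoids the panic; the caller still has to supply a suitably aligned buffer.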


