Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 22 Aug 2014 11:07:55 -0700
From:      Neel Natu <neelnatu@gmail.com>
To:        Warner Losh <imp@bsdimp.com>
Cc:        "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, "src-committers@freebsd.org" <src-committers@freebsd.org>, Warner Losh <imp@freebsd.org>
Subject:   Re: svn commit: r270249 - head/sys/cam/ata
Message-ID:  <CAFgRE9HhqK77EXbs2HvW=fye6Vs_d=yrydho8F0mAf0AFnHE%2Bg@mail.gmail.com>
In-Reply-To: <118A680A-E4E4-4FEF-9C9C-44771F89A2D7@bsdimp.com>
References:  <201408202258.s7KMwDh3073409@svn.freebsd.org> <CAFgRE9GvOMp4EmryGVJvPdZiUcKU0cZ3aajTfrEu8TJkAk2d-g@mail.gmail.com> <0DAF2357-4BBA-4D5B-8F17-D61845BACDA5@bsdimp.com> <CAFgRE9HA29K3NFwRoTRWf-UHq-NWSt=jGN0hX=aAwrdV5Qbqkg@mail.gmail.com> <118A680A-E4E4-4FEF-9C9C-44771F89A2D7@bsdimp.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Hi Warner,

On Fri, Aug 22, 2014 at 6:13 AM, Warner Losh <imp@bsdimp.com> wrote:
>
> On Aug 21, 2014, at 11:58 PM, Neel Natu <neelnatu@gmail.com> wrote:
>
>> Hi Warner,
>>
>> On Thu, Aug 21, 2014 at 10:34 PM, Warner Losh <imp@bsdimp.com> wrote:
>>>
>>> On Aug 21, 2014, at 10:31 PM, Neel Natu <neelnatu@gmail.com> wrote:
>>>
>>>> Hi Warner,
>>>>
>>>> On Wed, Aug 20, 2014 at 3:58 PM, Warner Losh <imp@freebsd.org> wrote:
>>>>> Author: imp
>>>>> Date: Wed Aug 20 22:58:12 2014
>>>>> New Revision: 270249
>>>>> URL: http://svnweb.freebsd.org/changeset/base/270249
>>>>>
>>>>> Log:
>>>>> Turns out that IDENTIFY DEVICE and IDENTIFY PACKET DEVICE return data
>>>>> that's only mostly similar. Specifically word 78 bits are defined for
>>>>> IDENTIFY DEVICE as
>>>>>       5 Supports Hardware Feature Control
>>>>> while a IDENTIFY PACKET DEVICE defines them as
>>>>>       5 Asynchronous notification supported
>>>>> Therefore, only pay attention to bit 5 when we're talking to ATAPI
>>>>> devices (we don't use the hardware feature control at this time).
>>>>> Ignore it for ATA devices. Remove kludge that papered over this issue
>>>>> for Samsung SATA SSDs, since Micron drives also have the bit set and
>>>>> the error was caused by this bad interpretation of the spec (which is
>>>>> quite easy to do, since bits aren't normally overlapping like this).
>>>>>
>>>>> Modified:
>>>>> head/sys/cam/ata/ata_xpt.c
>>>>>
>>>>> Modified: head/sys/cam/ata/ata_xpt.c
>>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D
>>>>> --- head/sys/cam/ata/ata_xpt.c  Wed Aug 20 22:39:26 2014        (r270=
248)
>>>>> +++ head/sys/cam/ata/ata_xpt.c  Wed Aug 20 22:58:12 2014        (r270=
249)
>>>>> @@ -458,12 +458,18 @@ negotiate:
>>>>>                   0, 0x02);
>>>>>               break;
>>>>>       case PROBE_SETAN:
>>>>> -               /* Remember what transport thinks about AEN. */
>>>>> -               if (softc->caps & CTS_SATA_CAPS_H_AN)
>>>>> +               /*
>>>>> +                * Only ATAPI defines this bit to mean AEN, but remem=
ber
>>>>> +                * what transport thinks about AEN.
>>>>> +                */
>>>>> +               if ((softc->caps & CTS_SATA_CAPS_H_AN) &&
>>>>> +                   periph->path->device->protocol =3D=3D PROTO_ATAPI=
)
>>>>>                       path->device->inq_flags |=3D SID_AEN;
>>>>>               else
>>>>>                       path->device->inq_flags &=3D ~SID_AEN;
>>>>>               xpt_async(AC_GETDEV_CHANGED, path, NULL);
>>>>> +               if (periph->path->device->protocol !=3D PROTO_ATAPI)
>>>>> +                       break;
>>>>>               cam_fill_ataio(ataio,
>>>>>                   1,
>>>>>                   probedone,
>>>>> @@ -750,14 +756,6 @@ out:
>>>>>                       goto noerror;
>>>>>
>>>>>               /*
>>>>> -                * Some Samsung SSDs report supported Asynchronous No=
tification,
>>>>> -                * but return ABORT on attempt to enable it.
>>>>> -                */
>>>>> -               } else if (softc->action =3D=3D PROBE_SETAN &&
>>>>> -                   status =3D=3D CAM_ATA_STATUS_ERROR) {
>>>>> -                       goto noerror;
>>>>> -
>>>>> -               /*
>>>>>                * SES and SAF-TE SEPs have different IDENTIFY commands=
,
>>>>>                * but SATA specification doesn't tell how to identify =
them.
>>>>>                * Until better way found, just try another if first fa=
il.
>>>>>
>>>>
>>>> This change causes a panic for me on boot. Here is the boot log:
>>>>
>>>> ahci0: <Intel Patsburg AHCI SATA controller> port
>>>> 0xf050-0xf057,0xf040-0xf043,0xf030-0xf037,0xf020-0xf023,0xf000-0xf01f
>>>> mem 0xfbb21000-0xfbb217ff irq 18 at device 31.2 on pci0
>>>> ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
>>>> ahcich0: <AHCI channel> at channel 0 on ahci0
>>>> ahcich1: <AHCI channel> at channel 1 on ahci0
>>>> ahcich2: <AHCI channel> at channel 2 on ahci0
>>>> ahcich3: <AHCI channel> at channel 3 on ahci0
>>>> ahcich4: <AHCI channel> at channel 4 on ahci0
>>>> ahcich5: <AHCI channel> at channel 5 on ahci0
>>>> ahciem0: <AHCI enclosure management bridge> on ahci0
>>>> ...
>>>> xpt_action_default: CCB type 0xdeadc0de not supported
>>>> ...
>>>> run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_con=
fig
>>>> run_interrupt_driven_hooks: still waiting after 120 seconds for xpt_co=
nfig
>>>> run_interrupt_driven_hooks: still waiting after 180 seconds for xpt_co=
nfig
>>>> run_interrupt_driven_hooks: still waiting after 240 seconds for xpt_co=
nfig
>>>> run_interrupt_driven_hooks: still waiting after 300 seconds for xpt_co=
nfig
>>>> panic: run_interrupt_driven_config_hooks: waited too long
>>>> cpuid =3D 0
>>>> KDB: stack backtrace:
>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff=
81d92920
>>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffffff81d929d0
>>>> vpanic() at vpanic+0x189/frame 0xffffffff81d92a50
>>>> kassert_panic() at kassert_panic+0x139/frame 0xffffffff81d92ac0
>>>> boot_run_interrupt_driven_config_hooks() at
>>>> boot_run_interrupt_driven_config_hooks+0x111/frame 0xffffffff81d92b50
>>>> mi_startup()fffff81d92b70
>>>> btext() at btext+0x2c
>>>> KDB: enter: panic
>>>> [ thread pid 0 tid 100000 ]
>>>> Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
>>>> db>
>>>>
>>>> The peripheral in question is a SATA attached CDROM:
>>>>
>>>> % camcontrol devlist
>>>> <INTEL SSDSC2CW240A3 400i>         at scbus0 target 0 lun 0 (pass0,ada=
0)
>>>> <ATAPI iHAS524   C LL23>           at scbus2 target 0 lun 0 (cd0,pass1=
)
>>>> <WDC WD1000CHTZ-04JCPV0 04.06A00>  at scbus3 target 0 lun 0 (pass2,ada=
1)
>>>> <Corsair Neutron GTX SSD M306>     at scbus4 target 0 lun 0 (pass3,ada=
2)
>>>> <AHCI SGPIO Enclosure 1.00 0001>   at scbus6 target 0 lun 0 (ses0,pass=
4)
>>>>
>>>> pass1 at ahcich2 bus 0 scbus2 target 0 lun 0
>>>> pass1: <ATAPI iHAS524   C LL23> Removable CD-ROM SCSI-0 device
>>>> pass1: Serial Number 3524472 2N8225501140
>>>> pass1: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192=
bytes)
>>>>
>>>> The following patch fixes the panic.
>>>>
>>>> Index: sys/cam/ata/ata_xpt.c
>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>>> --- sys/cam/ata/ata_xpt.c       (revision 270249)
>>>> +++ sys/cam/ata/ata_xpt.c       (working copy)
>>>> @@ -468,7 +468,8 @@
>>>>               else
>>>>                       path->device->inq_flags &=3D ~SID_AEN;
>>>>               xpt_async(AC_GETDEV_CHANGED, path, NULL);
>>>> -               if (periph->path->device->protocol !=3D PROTO_ATAPI)
>>>> +               if (periph->path->device->protocol !=3D PROTO_ATAPI &&
>>>> +                   periph->path->device->protocol !=3D PROTO_SCSI)
>>>>                       break;
>>>>               cam_fill_ataio(ataio,
>>>>                   1,
>>>
>>> I think the more proper test is =3D=3D PROTO_ATA elsewhere, since that=
=E2=80=99s what
>>> distinguishes the ATA_IDENTIFY from the ATAPI_IDENTIFY.
>>>
>>>> However, there seem to be a couple of issues with the original patch:
>>>>
>>>> 1. The 'periph->path->device->protocol' is not initialized to
>>>> PROTO_ATAPI anywhere in the tree so the not-equal-to test is  a no-op.
>>>
>>> We test here to determine which identify command to send:
>>>
>>>                if (periph->path->device->protocol =3D=3D PROTO_ATA)
>>>                        ata_28bit_cmd(ataio, ATA_ATA_IDENTIFY, 0, 0, 0);
>>>                else
>>>                        ata_28bit_cmd(ataio, ATA_ATAPI_IDENTIFY, 0, 0, 0=
);
>>>
>>> and that is working to send the right command.
>>>
>>
>> Yes, but PROTO_ATA !=3D PROTO_ATAPI :-)
>>
>> Since we never initialize 'periph->path->device->protocol' to
>> 'PROTO_ATAPI' in -current:
>
> But this code appears to:
>
>         case PROBE_RESET:
>         {
>                 int sign =3D (done_ccb->ataio.res.lba_high << 8) +
>                     done_ccb->ataio.res.lba_mid;
>                 CAM_DEBUG(path, CAM_DEBUG_PROBE,
>                     ("SIGNATURE: %04x\n", sign));
>                 if (sign =3D=3D 0x0000 &&
>                     done_ccb->ccb_h.target_id !=3D 15) {
>                         path->device->protocol =3D PROTO_ATA;
>                         PROBE_SET_ACTION(softc, PROBE_IDENTIFY);
>                 } else if (sign =3D=3D 0x9669 &&
>                     done_ccb->ccb_h.target_id =3D=3D 15) {
>                         /* Report SIM that PM is present. */
>                         bzero(&cts, sizeof(cts));
>                         xpt_setup_ccb(&cts.ccb_h, path, CAM_PRIORITY_NONE=
);
>                         cts.ccb_h.func_code =3D XPT_SET_TRAN_SETTINGS;
>                         cts.type =3D CTS_TYPE_CURRENT_SETTINGS;
>                         cts.xport_specific.sata.pm_present =3D 1;
>                         cts.xport_specific.sata.valid =3D CTS_SATA_VALID_=
PM;
>                         xpt_action((union ccb *)&cts);
>                         path->device->protocol =3D PROTO_SATAPM;
>                         PROBE_SET_ACTION(softc, PROBE_PM_PID);
>                 } else if (sign =3D=3D 0xc33c &&
>                     done_ccb->ccb_h.target_id !=3D 15) {
>                         path->device->protocol =3D PROTO_SEMB;
>                         PROBE_SET_ACTION(softc, PROBE_IDENTIFY_SES);
>                 } else if (sign =3D=3D 0xeb14 &&
>                     done_ccb->ccb_h.target_id !=3D 15) {
>                         path->device->protocol =3D PROTO_SCSI;
>                         PROBE_SET_ACTION(softc, PROBE_IDENTIFY);
>                 } else {
>                         if (done_ccb->ccb_h.target_id !=3D 15) {
>                                 xpt_print(path,
>                                     "Unexpected signature 0x%04x\n", sign=
);
>                         }
>                         goto device_fail;
>                 }
>
> what am I missing?
>

In the snippet above 'protocol' is set to one of PROTO_ATA,
PROTO_SATAPM, PROTO_SEMB or PROTO_SCSI - none of which is PROTO_ATAPI
:-)

$ find sys -type f -exec grep -nH -w PROTO_ATAPI  {} \;
sys/cam/scsi/scsi_pass.c:355: if (cgd->protocol =3D=3D PROTO_SCSI ||
cgd->protocol =3D=3D PROTO_ATAPI)
sys/cam/cam_ccb.h:249: PROTO_ATAPI, /* AT Attachment Packetized Interface *=
/

best
Neel

>> if (protocol !=3D PROTO_ATAPI) equates to if (1)
>> if (protocol =3D=3D PROTO_ATAPI) equates to if (0)
>>
>> I was trying to say that any code that compares 'protocol' to
>> PROTO_ATAPI probably deserves a second look (e.g., the original patch
>> that triggered this panic).
>
> Yes, but I think you=E2=80=99re analysis was incorrect on this point :)
>
>>>> 2. It seems not right to break out of switch in 'probestart()' without
>>>> providing a way for 'probedone()' to be called. I believe that this
>>>> stops the state machine from making forward progress and results in
>>>> 'xpt_config()' not completing.
>>>
>>> That=E2=80=99s a problem, you=E2=80=99re right. Let me rework.
>>>
>>>> If you need more information to debug this some more or test a proper
>>>> fix then I am happy to help.
>>>
>>> Please try the one included here. I think it will address things. I=E2=
=80=99ve tried it on one system, and am trying it on others in parallel to =
sending this.
>>>
>>
>> Yup, works fine. Thanks for the quick fix!
>
> Will push it in. Thanks.
>
> Warner



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAFgRE9HhqK77EXbs2HvW=fye6Vs_d=yrydho8F0mAf0AFnHE%2Bg>