Date:      Mon, 30 Apr 2012 13:30:09 GMT
From:      Gavin Mu <gavin.mu@gmail.com>
To:        freebsd-sparc64@FreeBSD.org
Subject:   Re: sparc64/165025: [PATCH] zfsboot support for sparc64
Message-ID:  <201204301330.q3UDU95f093133@freefall.freebsd.org>

The following reply was made to PR sparc64/165025; it has been noted by GNATS.

From: Gavin Mu <gavin.mu@gmail.com>
To: Marius Strobl <marius@alchemy.franken.de>
Cc: bug-followup@freebsd.org
Subject: Re: sparc64/165025: [PATCH] zfsboot support for sparc64
Date: Mon, 30 Apr 2012 21:24:46 +0800

 On Mon, Apr 30, 2012 at 12:10 AM, Marius Strobl
 <marius@alchemy.franken.de> wrote:
 > On Sun, Apr 22, 2012 at 08:40:13PM +0000, Marius Strobl wrote:
 >> The following reply was made to PR sparc64/165025; it has been noted by GNATS.
 >>
 >> From: Marius Strobl <marius@alchemy.franken.de>
 >> To: Gavin Mu <gavin.mu@gmail.com>
 >> Cc: bug-followup@freebsd.org, Kurt Lidl <lidl@pix.net>
 >> Subject: Re: sparc64/165025: [PATCH] zfsboot support for sparc64
 >> Date: Sun, 22 Apr 2012 22:32:11 +0200
 >>
 >>  On Thu, Apr 12, 2012 at 10:27:31PM +0800, Gavin Mu wrote:
 >>  > On Mon, Mar 5, 2012 at 2:06 AM, Marius Strobl <marius@alchemy.franken.de> wrote:
 >>  > > Typically, opening and closing devices via OFW causes quite a delay,
 >>  > > the exact impact depends on the firmware version and the devices
 >>  > > involved though. Therefore, it would be advisable to keep using the
 >>  > > current approach of caching opened packages. In what way does this
 >>  > > fail with ZFS?
 >>  > The error message on Fire V100 is: Fast Data Access MMU Miss
 >>  >
 >>  > > Basically, IEEE 1275 just says that support for
 >>  > > opening a package more than once depends on the particular package
 >>  > > but nothing about concurrently opening different packages. Not
 >>  > > being able to concurrently open different packages also doesn't
 >>  > > make all that much of a sense as opening one package also means
 >>  > > to subsequentially open all the parents up to the root if not
 >>  > > already opened and I think to actually have tested opening disks
 >>  > > concurrently when writing the current code. Could this fail due
 >>  > > to one device actually being opened twice, once via the full path
 >>  > > and once via its alias?
 >>  > There is no such scene though the code lacks the checking for full
 >>  > path/devalias.
 >>  > I have tried many times to find the root cause but I think it is
 >>  > difficult without open firmware knowledge.
 >>  > currently I found that following scenes will cause this issue with my test code:
 >>  > 1. do OF_seek(ihandle_t a) just after OF_close(ihandle_t b). in real
 >>  > world, OF_seek(a) is the step to read ZFS data just after OF_close()
 >>  > another disk during zfs init/probe.
 >>  > 2. do OF_seek(ihandle_t a) just after OF_open("available controller
 >>  > without disk"). For example there is no disk3 on my machine though
 >>  > there is disk controller. OF_open("disk3:") will report:
 >>  > Can't read disk label.
 >>  > Can't open disk label package
 >>  >
 >>  > in ofw_disk.c, OF_close() has been commented out for powerpc
 >>  > architecture, and can not find detail reason from code history, so I
 >>  > am thinking if we need also disable OF_close() for sparc64.
 >>
 >>  Hrm, some OFW implementations might have reference counting bugs,
 >>  causing OF_close() to also close some parent(s) when these in fact
 >>  are still used by another opened child device. Have you tried how
 >>  it works when just commenting out the OF_close() in ofwd_close() but
 >>  leaving the rest of ofw_disk.c as is? If that works, we probably
 >>  can add a cleanup handler which closes all opened disk devices
 >>  before leaving the loader, still taking advantage of caching opened
 >>  disks.
 >>
 >
 > With the machines I have at hand, I can't reproduce this problem,
 > i.e. ofw_disk.c as is works just fine for booting from a mirror.
 > This suggests that what you are seeing actually is a bug in the
 > specific firmware implementation rather than a general limitation
 > imposed by OFW. Could you please give the following patch a try?
 > It implements what I've described above, i.e. combines both caching
 > opened devices and properly closing all opened disks when leaving
 > the loader.
 > http://people.freebsd.org/~marius/ofw_disk_close_on_cleanup.diff
 >
 > Marius
 >
 
 I am sorry, but I have lost access to the sparc machines because I
 am moving to a new job.
 I reviewed your modification, and I believe there is still an issue
 with opening a slice twice, once via the real device path and once via
 its devalias. ofw_disk cannot handle that case.
 My test machine was a Sun Fire V100, a very old machine, running the
 latest available firmware (which is itself quite old).
 I am not sure whether we can still find the same machine for testing.
 Alternatively, could we call for testers on several different hardware
 models before committing the code?
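
To make the double-open case concrete, here is a minimal userland sketch of one way to avoid it. It is not the code from ofw_disk.c or from the patch. Every name in it (the alias table, mock_open(), mock_close(), canonicalize(), disk_open(), disk_close(), disk_cleanup()) is hypothetical, and the firmware is replaced by mocks so that the example compiles and runs outside the loader. The idea is to resolve a devalias to its full path before looking in the cache, so both spellings of a slice share one opened instance, and to defer the real close to a single cleanup pass. A real implementation would presumably ask the firmware to resolve the alias (IEEE 1275 defines a "canon" client service for that) instead of using a static table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DISKS 8
#define PATH_LEN  128

/* Hypothetical alias table standing in for the firmware's /aliases node. */
struct alias { const char *name; const char *path; };
static const struct alias aliases[] = {
    { "disk0", "/pci@1f,0/ide@d/disk@0,0" },
    { "disk1", "/pci@1f,0/ide@d/disk@1,0" },
};

/* Mock firmware, so the sketch runs in userland (assumption, not OFW). */
static int mock_next_handle = 1;
static int mock_open_count;            /* instances currently open */

static int mock_open(const char *path)
{
    (void)path;
    mock_open_count++;
    return mock_next_handle++;
}

static void mock_close(int handle)
{
    (void)handle;
    mock_open_count--;
}

/* Expand "alias:args" to "/full/path:args"; full paths are copied as is. */
static int canonicalize(const char *dev, char *out, size_t len)
{
    size_t i, n;

    if (dev[0] == '/') {
        if (strlen(dev) >= len)
            return -1;
        strcpy(out, dev);
        return 0;
    }
    n = strcspn(dev, ":");             /* length of the alias part */
    for (i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++) {
        if (strlen(aliases[i].name) != n ||
            strncmp(dev, aliases[i].name, n) != 0)
            continue;
        if (strlen(aliases[i].path) + strlen(dev + n) >= len)
            return -1;
        strcpy(out, aliases[i].path);
        strcat(out, dev + n);          /* keep the ":slice" arguments */
        return 0;
    }
    return -1;                         /* unknown alias */
}

/* Cache of opened devices, keyed by canonical path; handle 0 = free slot. */
struct cached_disk { char path[PATH_LEN]; int handle; int refs; };
static struct cached_disk cache[MAX_DISKS];

static int disk_open(const char *dev)
{
    char path[PATH_LEN];
    int i, free_slot = -1;

    assert(dev != NULL);
    if (canonicalize(dev, path, sizeof(path)) != 0)
        return -1;
    for (i = 0; i < MAX_DISKS; i++) {
        if (cache[i].handle == 0) {
            if (free_slot < 0)
                free_slot = i;
        } else if (strcmp(cache[i].path, path) == 0) {
            cache[i].refs++;           /* same device: reuse the instance */
            return cache[i].handle;
        }
    }
    if (free_slot < 0)
        return -1;
    strcpy(cache[free_slot].path, path);
    cache[free_slot].handle = mock_open(path);
    cache[free_slot].refs = 1;
    return cache[free_slot].handle;
}

/* Drop one reference but keep the firmware instance open (cached). */
static void disk_close(int handle)
{
    int i;

    for (i = 0; i < MAX_DISKS; i++)
        if (cache[i].handle == handle && cache[i].refs > 0)
            cache[i].refs--;
}

/* Cleanup pass: really close every cached instance before leaving. */
static void disk_cleanup(void)
{
    int i;

    for (i = 0; i < MAX_DISKS; i++) {
        if (cache[i].handle != 0) {
            mock_close(cache[i].handle);
            memset(&cache[i], 0, sizeof(cache[i]));
        }
    }
}
```

With this arrangement, disk_open("disk0:a") and disk_open("/pci@1f,0/ide@d/disk@0,0:a") return the same handle and the mock firmware sees a single open. Whether deferring every close to the cleanup pass also avoids the "Fast Data Access MMU Miss" on the V100 firmware is untested here.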
 
 Regards,
 Gavin Mu


