Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 24 Oct 2014 22:45:46 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Rui Paulo <rpaulo@me.com>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Rui Paulo <rpaulo@FreeBSD.org>
Subject:   Re: svn commit: r273598 - in head: include sys/dev/acpica
Message-ID:  <20141024194546.GE1877@kib.kiev.ua>
In-Reply-To: <33decfcd-e77c-4e4c-8161-9f4a232213c6@me.com>
References:  <33decfcd-e77c-4e4c-8161-9f4a232213c6@me.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, Oct 24, 2014 at 07:33:07PM +0000, Rui Paulo wrote:
> On Oct 24, 2014, at 12:20 PM, Konstantin Belousov <kostikbel@gmail.com> wrote:
> 
> On Fri, Oct 24, 2014 at 06:39:16PM +0000, Rui Paulo wrote:
> > Author: rpaulo
> > Date: Fri Oct 24 18:39:15 2014
> > New Revision: 273598
> > URL: https://svnweb.freebsd.org/changeset/base/273598
> > 
> > Log:
> > HPET: create /dev/hpetN as a way to access HPET from userland.
> > 
> > In some cases, TSC is broken and special applications might benefit
> > from memory mapping HPET and reading the registers to count time.
> > Most often the main HPET counter is 32-bit only[1], so this only gives
> > the application a 300 second window based on the default HPET
> > interval.
> > Other applications, such as Intel's DPDK, expect /dev/hpet to be
> > present and use it to count time as well.
> > 
> > Although we have an almost userland version of gettimeofday() which
> > uses rdtsc in userland, it's not always possible to use it, depending
> > on how broken the multi-socket hardware is.
> Yes, and hpet userland mapping would be better handled through the same
> fake-vdso framework. As designed, it has discriminator to inform
> userspace about algorithm, and can happilly utilize HPET timecounter
> automatically mapped by kernel into the process address space.
> š
> I'm aware of that, but I found the vdso a bit confusing and decided to work on that later.
> 
> > +static int
> > +hpet_open(struct cdev *cdev, int oflags, int devtype, struct thread *td)
> > +{
> > + š š šstruct hpet_softc *sc;
> > +
> > + š š š šsc = cdev->si_drv1;
> > + š š š šif (!sc->mmap_allow)
> > + š š š š šreturn (EPERM);
> > + š š šif (atomic_cmpset_32(&sc->devinuse, 0, 1) == 0)
> > + š š š š š šreturn (EBUSY);
> This is extra-weird.
> The devinuse business disallows simultaneous opens, which prevents
> other process from opening and mapping. But if the original caller
> does mmap and close, second process now is allowed to open and mmap.
> 
> That said, why do you need this devinuse at all ?
> š
> Hmm, I wanted to avoid multiple mmap's, but that doesn't work like you said. šI may just remove this restriction.š
> 
This is probably best.

> > +static int
> > +hpet_mmap(struct cdev *cdev, vm_ooffset_t offset, vm_paddr_t *paddr,
> > + int nprot, vm_memattr_t *memattr)
> > +{
> > + š šstruct hpet_softc *sc;
> > +
> > + š šsc = cdev->si_drv1;
> > + š š š šif (offset > rman_get_size(sc->mem_res))
> > + š š š š š š š šreturn (EINVAL);
> > + š šif (!sc->mmap_allow_write && (nprot & PROT_WRITE))
> > + š š šreturn (EPERM);
> > + š š š*paddr = rman_get_start(sc->mem_res) + offset;
> What is the memattr for the backing page ? Is it set to non-cached
> mode somehow ? I was not able to find place where would this happen.
> š
> I expect it to be set to non-cached since it's a device anyway, but I don't know where it is. šDuring my testing, I did not see any problems with cached values, though.
> 
I am not claiming that it is wrong, only that I do not see an easy reason
why it is right.  Just printing the *memattr would provide the confidence.

> > + šsc->pdev = make_dev(&hpet_cdevsw, 0, UID_ROOT, GID_WHEEL,
> > + š š š š 0600, "hpet%d", device_get_unit(dev));
> > + š š š šif (sc->pdev) {
> > + š š š š š š š šsc->pdev->si_drv1 = sc;
> > + š š šsc->mmap_allow = 1;
> > + š š š š š š š šTUNABLE_INT_FETCH("hw.acpi.hpet.mmap_allow",
> > + š š š š š š &sc->mmap_allow);
> > + š š š š š šsc->mmap_allow_write = 1;
> > + š š š š š šTUNABLE_INT_FETCH("hw.acpi.hpet.mmap_allow_write",
> > + š š š š š š š š &sc->mmap_allow_write);
> > + š š š š š š š šSYSCTL_ADD_INT(device_get_sysctl_ctx(dev),
> > + š š š š š š š š SYSCTL_CHILDREN(device_get_sysctl_tree(dev)),
> > + š š š OID_AUTO, "mmap_allow",
> > + š š š š š CTLFLAG_RW, &sc->mmap_allow, 0,
> > + š š š š š š š š "Allow userland to memory map HPET");
> Why is mmap_allow is per-instance, while mmap_allow_write taken from
> the global tunable ?
> š
> Are you asking why there's no sysctl for it?

IMO the allow-write must be controllable per-instance, or just managed
by /dev/hpet* unix permissions.  Having one global knob, which is
consulted at the module load, is not flexible.

What is the use-case for writing to HPET page ? To manually micro-adjust
the timer ?  Then, you probably want to disable write for instance used
by system timecounter or eventtimer, but allow for HPET utilized by
other consumers.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20141024194546.GE1877>