Date:      Sat, 5 Aug 2017 11:28:48 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Colin Percival <cperciva@tarsnap.com>
Cc:        Hans Petter Selasky <hps@selasky.org>, cem@freebsd.org,  src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r321985 - head/sys/ofed/drivers/infiniband/core
Message-ID:  <20170805105048.J1055@besplex.bde.org>
In-Reply-To: <0100015dabf98abb-dde34a5d-d5bf-4dab-ae6f-897cd12526dc-000000@email.amazonses.com>
References:  <201708030918.v739IPVY034866@repo.freebsd.org> <CAG6CVpVL49nVqRs5atub=d2P39EGOqcNtx_Raa8fWtV=BFZXbw@mail.gmail.com> <bd19c6f8-fa97-c9c5-6318-1778e38dd0a9@selasky.org> <0100015dabf98abb-dde34a5d-d5bf-4dab-ae6f-897cd12526dc-000000@email.amazonses.com>

On Fri, 4 Aug 2017, Colin Percival wrote:

> On 08/03/17 23:28, Hans Petter Selasky wrote:
>> On 08/03/17 16:37, Conrad Meyer wrote:
>>> Is it not important that the subtraction and result are evaluated
>>> without truncation?
>>
>> ticks is a circular counter. Assuming time = 0 and jiffies = -1U, then "delay"
>> becomes a very large negative value, because long is used, and the delay <= 0
>> check is no longer working as expected.
>>
>> Casting to "int" or truncating is the right thing to do in this case.
>
> Signed integer overflow is undefined.  Using 'int' is liable to cause problems
> after 2^32 ticks.

There is no (new) overflow here.  Converting a higher rank type to int has
implementation-defined behaviour.  I have yet to see an implementation that
defines its behaviour except in its source code, but all implementations
that I have seen do the right thing of truncating 2's complement integers.
Perverse implementations would produce specific garbage values or perhaps
rand().  They just have to document this.

int overflows after 2^31-1 ticks if int is 32 bits.  It is using u_int that
is liable to cause problems after 2^32 ticks.  Using long and u_long causes
exactly the same problems if long is 32 bits.

>>>> Modified: head/sys/ofed/drivers/infiniband/core/addr.c
>>>> ==============================================================================
>>>> --- head/sys/ofed/drivers/infiniband/core/addr.c        Thu Aug  3 09:14:43
>>>> 2017        (r321984)
>>>> +++ head/sys/ofed/drivers/infiniband/core/addr.c        Thu Aug  3 09:18:25
>>>> 2017        (r321985)
>>>> @@ -187,10 +187,10 @@ EXPORT_SYMBOL(rdma_translate_ip);
>>>>
>>>>   static void set_timeout(unsigned long time)
>>>>   {
>>>> -       unsigned long delay;
>>>> +       int delay;      /* under FreeBSD ticks are 32-bit */
>>>>
>>>>          delay = time - jiffies;

There is no overflow here.  The RHS is calculated in u_long arithmetic which
has truncation instead of overflow.

First, jiffies is converted to u_long in a standards-defined way.  If
jiffies is non-negative, then its value is unchanged by the conversion.
I think jiffies can't be negative, but if it is then there are
complications -- the value must be changed; it is changed by adding a
multiple of ULONG_MAX + 1, which is normally 1, but it is not easy to see
that the multiple is always 1 (if that is the case).

Then the subtraction can't overflow, but it produces a garbage result
of time < jiffies.  As usual, using unsigned is a bug.  Here it breaks
calculations of negative differences.

>>>> -       if ((long)delay <= 0)
>>>> +       if (delay <= 0)
>>>>                  delay = 1;

Then we cast to long in the old code, and assign the RHS to int in the
new code.  Both are implementation-defined.  The cast does nothing different
from assignment except it might change compiler warnings.

The old code seems to be correct enough.  Suppose time is 59 and jiffies
is 60.  Then time - jiffies is 0xffffffffLU on 32-bit systems and
0xffffffffffffffffLU on 64-bit systems.  If the implementation-defined
behaviour were documented, then we would know that the result of casting
either of these to long is -1L.  This is < 0, so delay gets fixed up to
+1.  Similarly with the new code -- the assignment converts the huge
unsigned values to -1.

Problems with the type mismatch might occur later, but I don't see any.
On 64-bit systems, the difference can be really huge without tripping
the (long)delay <= 0 test.  Above INT_MAX, later truncation to int in
APIs using ticks will give garbage (possibly 0 or negative).  I guess
that is the problem.  But even on 32-bit systems, the difference can
be INT_MAX and that seems too large to be correct.  If jiffies are
1/1000 seconds, then it is the well-known magic number of 24+ days.  User
code might generate that, but kernel driver code shouldn't.

Bruce


