From owner-freebsd-questions@freebsd.org Fri Jun 3 04:16:48 2016 Return-Path: Delivered-To: freebsd-questions@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B2E5BB6874E for ; Fri, 3 Jun 2016 04:16:48 +0000 (UTC) (envelope-from sobomax@sippysoft.com) Received: from mail-oi0-x22a.google.com (mail-oi0-x22a.google.com [IPv6:2607:f8b0:4003:c06::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 82F921294 for ; Fri, 3 Jun 2016 04:16:48 +0000 (UTC) (envelope-from sobomax@sippysoft.com) Received: by mail-oi0-x22a.google.com with SMTP id k23so109646137oih.0 for ; Thu, 02 Jun 2016 21:16:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sippysoft-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:date:message-id:subject:from:to; bh=UuUa4dXL1nc6kcqGLYHhwZ8+IbnuB6uwDNezDYUB6qM=; b=PALyFcIuB5PPzuTvm9CxO1PXGTYU+vNmxXwUppovywJPTdjlZMmzmM/yhVyi3+TRQE hGdhzBU7O45vpIBMun7yM6/TfPlBItIXD8p+qscURFANU5wRAcESAkeagjNumkcpH7nG TRG4kK9hsz5qiv5jEs3C39v7O4/hp9KRry9gajx6fuu9fjY1ql3dBEuPcuQ4vPV2t/V1 z7t+vYZeeyAAOHbHVMYa70zy3UKWdoWzsmlqs75a8zhIvBvh8RXGj4QX8XGD/FvGS6zb pCEOGzbxiLFw1F/UDUjqgfqrP9fBO09XGFIKYPqqhVPpubYoEkLp28nefXrCCtEXiTWi gmmg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:date:message-id:subject:from :to; bh=UuUa4dXL1nc6kcqGLYHhwZ8+IbnuB6uwDNezDYUB6qM=; b=aLz2inqTJjktzLTz/BZwMRXJree7MjdzuXnAz8Yj6N6mupwAOUffwMVbFpyFNnc5cg YYkeCvrP1kOvahm7mzlaByJu95WZYWbaBkkl2xCNuqOzO2HbZNOq2mhS7QZvJSRrAN/2 weXxlKLVCjbRHEPET0jek7rrAkSffMiqSItpBerCuukP32oBDJbEsoi8bHax5Xa6GZ88 mu5IHYXuSzNDkf718dzUuDEBfiAMqXZiaBl9SDdehP1meap2ZtreB7TLWxPvlsNjkkO7 EZSU6ZLn4h/kSF3a9Mv8BexlyVLoANuo3i/abLisWELznAVCdtQh2L+QZIFRgXiwgc76 99dA== X-Gm-Message-State: ALyK8tIk5F55kq+V/mIk+oMxQ8oBiT0VCTTKVNiTo2d5rV5Dpu5XeOeM5G9phICnOZYpUzRFqlbKD4YYjSWBXbuB MIME-Version: 1.0 X-Received: by 10.202.241.67 with SMTP id p64mr652858oih.197.1464927407613; Thu, 02 Jun 2016 21:16:47 -0700 (PDT) Sender: sobomax@sippysoft.com Received: by 10.157.56.70 with HTTP; Thu, 2 Jun 2016 21:16:47 -0700 (PDT) Date: Thu, 2 Jun 2016 21:16:47 -0700 X-Google-Sender-Auth: _pVd8Ya48XkdEhMcg4Hhyl0Z93I Message-ID: Subject: CLOCK_MONOTONIC / CLOCK_UPTIME is not really monotonic between threads From: Maxim Sobolev To: FreeBSD Current , freebsd-threads@freebsd.org, freebsd-questions@freebsd.org, freebsd-arch@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-questions@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: User questions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 03 Jun 2016 04:16:48 -0000 Hi there, we have an application here which is trying to measure UDP command/response round-trip-time. It runs two posix threads (more actually, but that's probably irrelevant), one (let's call it A) that does high-level logic and the second one (B) that does network packet I/O. The sending side is done by first thread A forming the request, then calling the function clock_gettime(CLOCK_MONOTONIC) and passing the packet into the thread B. Obtained timestamp is stored with some logical transaction ID allowing us to pull that stored value later on when response arrives. Then we have a separate process that receives those requests, processing them and sending back some form of response. Upon receiving a response from the network, the network I/O thread (B) timestamps it by running clock_gettime(CLOCK_MONOTONIC) and passes the packet data along with that value via queue to the thread A for processing. So if we put things into timeline, what our app does would probably look something like the following: 1. Thread A generates request. 2. A calls clock_gettime(CLOCK_MONOTONIC), storing value as t1 internally 3. A passes packet to thread B 4. B sends out packet via sendto() to server process running on the same box (fully separate, not a thread) [some microseconds later] 5. B receives response from server with recvfrom() 6. B instantly calls clock_gettime(CLOCK_MONOTONIC), assigns returned value to t2 7. B passes packet data along with t2 to the A via queue 8. A picks up packet, parses it and retrieves corresponding t1 stored at step 2. 9. A calculates RTT by doing t2 - t1 assuming it's going to be positive... As you might have guessed if you are still reading, from time to time t2 - t1 comes out slightly negative! Provided it's not some obscure bug in our app, there is no way this could happen if clock_gettime(CLOCK_MONOTONIC) would work as advertised. Event (2) could not possibly happen earlier than (6), which is guaranteed by the fact that the request needs to be processed by the external entity first in order for the response to be seen by our app at all. I've added some logs and it seems to be confirming that the server only sees a single request, there is no chance for the client to receive some other packet and confuse it. I've also confirmed with tcpdump, which shows reasonable time delay between request and reply of few hundreds microseconds. I've checked all logic and I could not find any mistakes on my end here, so I added some logging for such events. The distribution appears to be centered around 0.00006s, but there are some events that go as far up as 0.012s. I've also tried using CLOCK_UPTIME_PRECISE instead, but it makes no difference whatsoever. My questions therefore are: 1. Is it intended/expected behavior of the said API? 2. Has anyone else bumped into this? 3. I know we are doing some clever optimizations using TSC to speed those APIs up, could be it related to that? 4. If the answer for (3) is yes, then what is the method to disable using TSC and use slower but possibly more reliable method? 5. Is there any way/plan to fix this long-term? 6. If the behavior is intended/expected, what is the maximum discrepancy we can expect from this effect? In general some time ago we've spend quite lot of time switching our app from using REALTIME into MONOTIME in the hope to eliminate any wizardry needed to deal with realtime possibly jumping back and forth due to NTP, leap seconds etc, it's shame that this is not working either. Apart from measuring command processing delay, that app does lot of high-volume voip call billing, so even such small discrepancy can easily build up into a bigger problem, not to mention the fact that every time we deal with duration we now need to have some check in place to make sure we don't have some negative values popping our of nowhere. Any hints, suggestions, pointers are appreciated. I can also give more debug information as needed. Thanks! -Max