From: Richard Sharpe
To: Eric van Gyzen
Cc: freebsd-net@freebsd.org, Alfred Perlstein
Date: Fri, 3 May 2013 19:01:18 -0700
Subject: Re: Seeing EINVAL from writev on 8.0 to a non-blocking socket even though the data seems to hit the wire

On Fri, May 3, 2013 at 10:18 AM, Richard Sharpe wrote:
> On Fri, May 3, 2013 at 7:39 AM, Eric van Gyzen wrote:
>> On 05/02/2013 19:00, Richard Sharpe wrote:
>>> On Thu, May 2, 2013 at 7:52 AM, Eric van Gyzen wrote:
>>>> On 05/02/2013 08:48, Richard Sharpe wrote:
>>>>> On Wed, May 1, 2013 at 9:34 PM, Alfred Perlstein wrote:
>>>>>> On 5/1/13 8:03 PM, Richard Sharpe wrote:
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> I am checking to see if there are any known bugs with respect to this
>>>>>>> in FreeBSD 8.0.
>>>>>>>
>>>>>>> Situation is that Samba 3.6.6 uses writev to a non-blocking socket to
>>>>>>> get the SMB2 requests on the wire.
>>>>>>>
>>>>>>> Intermittently, we see the writev return EINVAL even though the data
>>>>>>> has gotten on the wire. This I have verified by grabbing a capture and
>>>>>>> comparing the SMB Sequence number in the last outgoing packet on the
>>>>>>> wire vs the in-memory contents when we get EINVAL.
>>>>>>>
>>>>>>> Sometimes it occurs on a four-element IOVEC, sometimes we get EAGAIN
>>>>>>> on the four-element IOVEC and then we get EINVAL when retrying on a
>>>>>>> smaller IOVEC.
>>>>>>>
>>>>>>> Where should I look to check if there is some path where this might be
>>>>>>> happening? Is this even the correct mailing list?
>>>>>>>
>>>>>> What does the iovec look like when you get EINVAL? Can you sanity check
>>>>>> it? Is there anything special about it? (zero length vecs?)
>>>>>>
>>>>>> I think there are a few "maxvals" that if overrun cause EINVAL to be
>>>>>> returned. example is if your iovec is somehow huge or has many, many
>>>>>> elements.
>>>>> Can anyone tell me the call graph down to the TCP code?
>>>>>
>>>> writev kern/sys_generic.c
>>>> kern_writev
>>>> dofilewrite
>>>> fo_write in sys/file.h
>>>> soo_write in kern/sys_socket.c
>>>> sosend in kern/uipc_socket.c
>>>> sosend_generic
>>>> tcp_usr_send in netinet/tcp_usrreq.c
>>> Is there a tool that generates call graphs?
>>
>> I'm not aware of one that works in the kernel--other than the kernel
>> itself, of course. With DDB compiled in, you could set a breakpoint on,
>> say, tcp_output, and show the call stack with bt.
>>
>> Also, take a look at stack(9).
>>
>>> I have been able to demonstrate that I am getting EINVAL returned from
>>> writev kern/sys_generic.c, kern_writev, dofilewrite and soo_write,
>>> but when I add printfs to sosend/sosend_generic it becomes very hard
>>> to provoke this problem.
>>
>> So, either relocating code or changing the timing has changed the
>> behavior--a Heisenbug.
>>
>> If your code looks like this:
>>
>>     if (error == EINVAL)
>>         printf("you are here\n");
>>
>> You might add __predict_false, like this:
>>
>>     if (__predict_false(error == EINVAL))
>>         printf("you are here\n");
>>
>> That /might/ reduce the impact on runtime behavior.
>
> Thanks for that. The problem does not appear to be in the TCP or IP
> layers. Rather, it appears to be in the ixgbe driver.
>
> The problem takes a little more effort to provoke, but simple printfs
> are doing the job so far.

The version of the ixgbe driver we are using seems to set the max size
of a dma element to 65535 (IXGBE_TSO_SIZE) and, even though large
numbers of iovecs are sent where the last element is 65536 bytes in
size, sometimes this causes EINVAL to be returned ...

-- 
Regards,
Richard Sharpe
(何以解憂？唯有杜康。--曹操)
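
For anyone who wants to act on Alfred's suggestion to sanity check the iovec before calling writev, here is a minimal user-space sketch in C. It is not code from Samba or from the ixgbe driver. The names iov_check, ASSUMED_MAX_SEG and the IOV_* flags are made up for illustration. The 65535 figure is the limit reported above; whether a single iovec element ever maps onto a single DMA element in the driver is an assumption, so treat the IOV_OVER_SEG flag as a hint for logging, not as a diagnosis.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Assumption taken from the post: one DMA element is capped at 65535. */
#define ASSUMED_MAX_SEG 65535u

#ifndef IOV_MAX
#define IOV_MAX 1024            /* fallback only; real limit is per-platform */
#endif
#ifndef SSIZE_MAX
#define SSIZE_MAX LONG_MAX      /* fallback only, for strict ISO C builds */
#endif

/* Hypothetical result flags; several may be OR-ed together. */
enum iov_problem {
    IOV_OK             = 0,
    IOV_BAD_COUNT      = 1 << 0,  /* iovcnt <= 0 or iovcnt > IOV_MAX */
    IOV_ZERO_LEN       = 1 << 1,  /* at least one zero-length element */
    IOV_TOTAL_OVERFLOW = 1 << 2,  /* summed length exceeds SSIZE_MAX */
    IOV_OVER_SEG       = 1 << 3   /* an element is larger than ASSUMED_MAX_SEG */
};

/*
 * Inspect an iovec array without touching the buffers it points to.
 * Returns IOV_OK or a bitmask of the problems found.
 */
static int
iov_check(const struct iovec *iov, int iovcnt)
{
    int flags = IOV_OK;
    size_t total = 0;

    if (iov == NULL || iovcnt <= 0 || iovcnt > IOV_MAX)
        return IOV_BAD_COUNT;

    for (int i = 0; i < iovcnt; i++) {
        size_t len = iov[i].iov_len;

        if (len == 0)
            flags |= IOV_ZERO_LEN;
        if (len > ASSUMED_MAX_SEG)
            flags |= IOV_OVER_SEG;

        /* total never exceeds SSIZE_MAX, so the subtraction cannot wrap. */
        if (len > (size_t)SSIZE_MAX - total) {
            flags |= IOV_TOTAL_OVERFLOW;
            total = (size_t)SSIZE_MAX;
        } else {
            total += len;
        }
    }
    return flags;
}
```

Calling iov_check(iov, iovcnt) right after a writev that failed with EINVAL, and logging the returned flags together with the element lengths, would show whether the failing calls are the ones carrying a 65536-byte element.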