Message-Id: <201102100005.p1A05B8w002402@svn.freebsd.org>
From: Jeff Roberson
Date: Thu, 10 Feb 2011 00:05:11 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-projects@freebsd.org
Subject: svn commit: r218501 - in projects/ofed/head/sys: amd64/conf conf net ofed/drivers/infiniband/ulp/ipoib ofed/drivers/infiniband/ulp/sdp ofed/include/linux

Author: jeff
Date: Thu Feb 10 00:05:11 2011
New Revision: 218501
URL: http://svn.freebsd.org/changeset/base/218501

Log:
- Change ofed from a device line to an options line in the config so that we can test for it in the link layer address table to avoid the overhead when it is not compiled.
- Introduce some more ofed configuration options so you don't have to manually edit headers to enable debugging etc.
- Fix a bug with ipoib, when cm mode is enabled checksumming doesn't work on all cards.
- Ignore the linux admin flag for cm enabled, if it's compiled in, use it when the remote host supports it.
- Support transmitting mbufs with more than one sg entry in ipoib cm.
- Normalize the MTU settings and document whether they include the header or not in the ipoib.h file where the variables are defined.

Modified:
  projects/ofed/head/sys/amd64/conf/GENERIC
  projects/ofed/head/sys/conf/files
  projects/ofed/head/sys/conf/options
  projects/ofed/head/sys/net/if_llatbl.h
  projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib.h
  projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c
  projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_ib.c
  projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c
  projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
  projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
  projects/ofed/head/sys/ofed/drivers/infiniband/ulp/sdp/sdp.h
  projects/ofed/head/sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c
  projects/ofed/head/sys/ofed/include/linux/module.h

Modified: projects/ofed/head/sys/amd64/conf/GENERIC
==============================================================================
--- projects/ofed/head/sys/amd64/conf/GENERIC Thu Feb 10 00:01:50 2011 (r218500)
+++ projects/ofed/head/sys/amd64/conf/GENERIC Thu Feb 10 00:05:11 2011 (r218501)
@@ -59,6 +59,7 @@ options _KPOSIX_PRIORITY_SCHEDULING # P
 options PRINTF_BUFR_SIZE=128 # Prevent printf output being interspersed.
 options KBD_INSTALL_CDEV # install a CDEV entry in /dev
 options HWPMC_HOOKS # Necessary kernel hooks for hwpmc(4)
+device hwpmc
 options AUDIT # Security event auditing
 options MAC # TrustedBSD MAC Framework
 options FLOWTABLE # per-cpu routing cache
@@ -73,7 +74,7 @@ options GDB # Support remote GDB.
 options DEADLKRES # Enable the deadlock resolver
 options INVARIANTS # Enable calls of extra sanity checking
 options INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS
-options WITNESS # Enable checks to detect deadlocks and cycles
+#options WITNESS # Enable checks to detect deadlocks and cycles
 options WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
 options ALT_BREAK_TO_DEBUGGER
 options MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones
@@ -89,11 +90,12 @@ device acpi
 device pci
 
 # Infiniband Bus and drivers
-device infiniband
-device ipoib
-device mlx4
-device mthca
-device sdp
+options OFED # Infiniband protocol stack and support
+options SDP # Sockets Direct Protocol for infiniband
+device ipoib # IP over IB devices
+options IPOIB_CM # Use connect mode ipoib
+device mlx4 # ConnectX cards
+device mthca # Infinihost cards
 
 # Floppy drives
 device fdc

Modified: projects/ofed/head/sys/conf/files
==============================================================================
--- projects/ofed/head/sys/conf/files Thu Feb 10 00:01:50 2011 (r218500)
+++ projects/ofed/head/sys/conf/files Thu Feb 10 00:05:11 2011 (r218501)
@@ -2743,91 +2743,91 @@ nlm/nlm_prot_svc.c optional nfslockd |
 nlm/nlm_prot_xdr.c optional nfslockd | nfsd
 nlm/sm_inter_xdr.c optional nfslockd | nfsd
-# OpenFabrics Enterprise Distribution (infiniband)
-ofed/include/linux/linux_compat.c optional infiniband \
+# OpenFabrics Enterprise Distribution (Infiniband)
+ofed/include/linux/linux_compat.c optional ofed \
 no-depend compile-with "${OFED_C}"
-ofed/include/linux/linux_idr.c optional infiniband \
+ofed/include/linux/linux_idr.c optional ofed \
 no-depend compile-with "${OFED_C}"
-ofed/include/linux/linux_radix.c optional infiniband \
+ofed/include/linux/linux_radix.c optional ofed \
 no-depend compile-with "${OFED_C}"
-ofed/drivers/infiniband/core/addr.c optional infiniband \
+ofed/drivers/infiniband/core/addr.c optional ofed \
 no-depend \
 compile-with
"${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/agent.c optional infiniband \ +ofed/drivers/infiniband/core/agent.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/cache.c optional infiniband \ +ofed/drivers/infiniband/core/cache.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" # XXX Mad.c must be ordered before cm.c for sysinit sets to occur in # the correct order. -ofed/drivers/infiniband/core/mad.c optional infiniband \ +ofed/drivers/infiniband/core/mad.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/cm.c optional infiniband \ +ofed/drivers/infiniband/core/cm.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/cma.c optional infiniband \ +ofed/drivers/infiniband/core/cma.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/device.c optional infiniband \ +ofed/drivers/infiniband/core/device.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/fmr_pool.c optional infiniband \ +ofed/drivers/infiniband/core/fmr_pool.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/iwcm.c optional infiniband \ +ofed/drivers/infiniband/core/iwcm.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/local_sa.c optional infiniband \ +ofed/drivers/infiniband/core/local_sa.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/mad_rmpp.c optional infiniband \ +ofed/drivers/infiniband/core/mad_rmpp.c optional ofed \ no-depend \ compile-with "${OFED_C} 
-I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/multicast.c optional infiniband \ +ofed/drivers/infiniband/core/multicast.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/notice.c optional infiniband \ +ofed/drivers/infiniband/core/notice.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/packer.c optional infiniband \ +ofed/drivers/infiniband/core/packer.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/sa_query.c optional infiniband \ +ofed/drivers/infiniband/core/sa_query.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/smi.c optional infiniband \ +ofed/drivers/infiniband/core/smi.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/sysfs.c optional infiniband \ +ofed/drivers/infiniband/core/sysfs.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/ucm.c optional infiniband \ +ofed/drivers/infiniband/core/ucm.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/ucma.c optional infiniband \ +ofed/drivers/infiniband/core/ucma.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/ud_header.c optional infiniband \ +ofed/drivers/infiniband/core/ud_header.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/umem.c optional infiniband \ +ofed/drivers/infiniband/core/umem.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/user_mad.c optional infiniband \ 
+ofed/drivers/infiniband/core/user_mad.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/uverbs_cmd.c optional infiniband \ +ofed/drivers/infiniband/core/uverbs_cmd.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/uverbs_main.c optional infiniband \ +ofed/drivers/infiniband/core/uverbs_main.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/uverbs_marshall.c optional infiniband \ +ofed/drivers/infiniband/core/uverbs_marshall.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" -ofed/drivers/infiniband/core/verbs.c optional infiniband \ +ofed/drivers/infiniband/core/verbs.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" Modified: projects/ofed/head/sys/conf/options ============================================================================== --- projects/ofed/head/sys/conf/options Thu Feb 10 00:01:50 2011 (r218500) +++ projects/ofed/head/sys/conf/options Thu Feb 10 00:05:11 2011 (r218501) @@ -856,3 +856,11 @@ X86BIOS # Flattened device tree options FDT opt_platform.h FDT_DTB_STATIC opt_platform.h + +# OFED Infiniband stack +OFED opt_ofed.h +OFED_DEBUG_INIT opt_ofed.h +SDP opt_ofed.h +SDP_DEBUG opt_ofed.h +IPOIB_DEBUG opt_ofed.h +IPOIB_CM opt_ofed.h Modified: projects/ofed/head/sys/net/if_llatbl.h ============================================================================== --- projects/ofed/head/sys/net/if_llatbl.h Thu Feb 10 00:01:50 2011 (r218500) +++ projects/ofed/head/sys/net/if_llatbl.h Thu Feb 10 00:05:11 2011 (r218501) @@ -30,6 +30,8 @@ __FBSDID("$FreeBSD$"); #ifndef _NET_IF_LLATBL_H_ #define _NET_IF_LLATBL_H_ +#include "opt_ofed.h" + #include #include @@ -72,7 +74,9 @@ struct llentry { union { uint64_t mac_aligned; uint16_t mac16[3]; +#ifdef OFED uint8_t mac8[20]; /* IB 
needs 20 bytes. */ +#endif } ll_addr; /* XXX af-private? */ Modified: projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib.h ============================================================================== --- projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib.h Thu Feb 10 00:01:50 2011 (r218500) +++ projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib.h Thu Feb 10 00:05:11 2011 (r218501) @@ -37,6 +37,7 @@ #include "opt_inet.h" #include "opt_inet6.h" +#include "opt_ofed.h" #include #include @@ -92,7 +93,13 @@ #define INFINIBAND_ALEN 20 /* Octets in IPoIB HW addr */ #define MAX_MB_FRAGS (8192 / MCLBYTES) +#ifdef IPOIB_CM #define CONFIG_INFINIBAND_IPOIB_CM +#endif + +#ifdef IPOIB_DEBUG +#define CONFIG_INFINIBAND_IPOIB_DEBUG +#endif enum ipoib_flush_level { IPOIB_FLUSH_LIGHT, @@ -106,9 +113,7 @@ enum { IPOIB_UD_HEAD_SIZE = IB_GRH_BYTES + IPOIB_ENCAP_LEN, IPOIB_UD_RX_SG = 1, /* max buffer needed for 4K mtu */ - IPOIB_CM_MTU = (16 * 1024) - 0x14, - IPOIB_CM_BUF_SIZE = IPOIB_CM_MTU + IPOIB_ENCAP_LEN, - IPOIB_CM_HEAD_SIZE = IPOIB_CM_BUF_SIZE % PAGE_SIZE, + IPOIB_CM_MAX_MTU = MJUM16BYTES, IPOIB_CM_RX_SG = 1, /* We only allocate a single mbuf. */ IPOIB_RX_RING_SIZE = 256, IPOIB_TX_RING_SIZE = 128, @@ -129,7 +134,6 @@ enum { IPOIB_FLAG_SUBINTERFACE = 5, IPOIB_MCAST_RUN = 6, IPOIB_STOP_REAPER = 7, - IPOIB_FLAG_ADMIN_CM = 9, IPOIB_FLAG_UMCAST = 10, IPOIB_FLAG_CSUM = 11, @@ -196,11 +200,6 @@ struct ipoib_tx_buf { u64 mapping[MAX_MB_FRAGS]; }; -struct ipoib_cm_tx_buf { - struct mbuf *mb; - u64 mapping; -}; - struct ib_cm_id; struct ipoib_cm_data { @@ -258,11 +257,11 @@ struct ipoib_cm_tx { struct list_head list; struct ipoib_dev_priv *priv; struct ipoib_path *path; - struct ipoib_cm_tx_buf *tx_ring; + struct ipoib_tx_buf *tx_ring; unsigned tx_head; unsigned tx_tail; unsigned long flags; - u32 mtu; + u32 mtu; /* remote specified mtu, with grh. 
*/ }; struct ipoib_cm_rx_buf { @@ -291,7 +290,7 @@ struct ipoib_cm_dev_priv { struct ib_sge rx_sge[IPOIB_CM_RX_SG]; struct ib_recv_wr rx_wr; int nonsrq_conn_qp; - int max_cm_mtu; + int max_cm_mtu; /* Actual buf size. */ int num_frags; }; @@ -346,9 +345,9 @@ struct ipoib_dev_priv { union ib_gid local_gid; u16 local_lid; - unsigned int admin_mtu; - unsigned int mcast_mtu; - unsigned int max_ib_mtu; + unsigned int admin_mtu; /* User selected MTU, no GRH. */ + unsigned int mcast_mtu; /* Minus GRH bytes, from mcast group. */ + unsigned int max_ib_mtu; /* Without header, actual buf size. */ struct ipoib_rx_buf *rx_ring; @@ -414,8 +413,9 @@ struct ipoib_path { int valid; }; -#define IPOIB_UD_MTU(ib_mtu) (ib_mtu - IPOIB_ENCAP_LEN) -#define IPOIB_UD_BUF_SIZE(ib_mtu) (ib_mtu + IB_GRH_BYTES) +/* UD Only transmits encap len but we want the two sizes to be symmetrical. */ +#define IPOIB_UD_MTU(ib_mtu) (ib_mtu - IB_GRH_BYTES) +#define IPOIB_CM_MTU(ib_mtu) (ib_mtu - IPOIB_ENCAP_LEN) #define IPOIB_IS_MULTICAST(addr) ((addr)[4] == 0xff) @@ -501,6 +501,8 @@ void ipoib_path_iter_read(struct ipoib_p struct ipoib_path *path); #endif +int ipoib_change_mtu(struct ipoib_dev_priv *priv, int new_mtu); + int ipoib_mcast_attach(struct ipoib_dev_priv *priv, u16 mlid, union ib_gid *mgid, int set_qkey); @@ -515,6 +517,9 @@ void ipoib_pkey_poll(struct work_struct int ipoib_pkey_dev_delay_open(struct ipoib_dev_priv *priv); void ipoib_drain_cq(struct ipoib_dev_priv *priv); +int ipoib_dma_map_tx(struct ib_device *ca, struct ipoib_tx_buf *tx_req); +void ipoib_dma_unmap_tx(struct ib_device *ca, struct ipoib_tx_buf *tx_req); + void ipoib_set_ethtool_ops(struct ifnet *dev); int ipoib_set_dev_features(struct ipoib_dev_priv *priv, struct ib_device *hca); @@ -530,14 +535,12 @@ extern int ipoib_max_conn_qp; static inline int ipoib_cm_admin_enabled(struct ipoib_dev_priv *priv) { - return IPOIB_CM_SUPPORTED(IF_LLADDR(priv->dev)) && - test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags); + return 
IPOIB_CM_SUPPORTED(IF_LLADDR(priv->dev)); } static inline int ipoib_cm_enabled(struct ipoib_dev_priv *priv, uint8_t *hwaddr) { - return IPOIB_CM_SUPPORTED(hwaddr) && - test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags); + return IPOIB_CM_SUPPORTED(hwaddr); } static inline int ipoib_cm_up(struct ipoib_path *path) Modified: projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c ============================================================================== --- projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c Thu Feb 10 00:01:50 2011 (r218500) +++ projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c Thu Feb 10 00:05:11 2011 (r218501) @@ -32,6 +32,8 @@ #include "ipoib.h" +#ifdef CONFIG_INFINIBAND_IPOIB_CM + #include #include #include @@ -80,7 +82,7 @@ static void ipoib_cm_dma_unmap_rx(struct u64 mapping[IPOIB_CM_RX_SG]) { - ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE); + ib_dma_unmap_single(priv->ca, mapping[0], priv->cm.max_cm_mtu, DMA_FROM_DEVICE); } @@ -135,7 +137,7 @@ static struct mbuf *ipoib_cm_alloc_rx_mb struct mbuf *mb; int buf_size; - buf_size = IPOIB_CM_HEAD_SIZE + 12; + buf_size = priv->cm.max_cm_mtu; if (buf_size <= MCLBYTES) buf_size = MCLBYTES; else if (buf_size <= MJUMPAGESIZE) @@ -150,7 +152,7 @@ static struct mbuf *ipoib_cm_alloc_rx_mb return NULL; mapping[0] = ib_dma_map_single(priv->ca, mtod(mb, void *), - IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE); + buf_size, DMA_FROM_DEVICE); if (unlikely(ib_dma_mapping_error(priv->ca, mapping[0]))) { m_freem(mb); return NULL; @@ -293,18 +295,13 @@ static void ipoib_cm_init_rx_wr(struct i struct ib_recv_wr *wr, struct ib_sge *sge) { - int i; - for (i = 0; i < priv->cm.num_frags; ++i) - sge[i].lkey = priv->mr->lkey; - - sge[0].length = IPOIB_CM_HEAD_SIZE; - for (i = 1; i < priv->cm.num_frags; ++i) - sge[i].length = PAGE_SIZE; + sge[0].length = priv->cm.max_cm_mtu; + sge[0].lkey = priv->mr->lkey; wr->next = NULL; wr->sg_list = sge; - wr->num_sge = 
priv->cm.num_frags; + wr->num_sge = 1; } static int ipoib_cm_nonsrq_init_rx(struct ipoib_dev_priv *priv, @@ -388,7 +385,7 @@ static int ipoib_cm_send_rep(struct ipoi struct ib_cm_rep_param rep = {}; data.qpn = cpu_to_be32(priv->qp->qp_num); - data.mtu = cpu_to_be32(IPOIB_CM_BUF_SIZE); + data.mtu = cpu_to_be32(priv->cm.max_cm_mtu); rep.private_data = &data; rep.private_data_len = sizeof data; @@ -484,7 +481,7 @@ static int ipoib_cm_rx_handler(struct ib } } /* Adjust length of mb with fragments to match received data */ -static void mb_put_frags(struct mbuf *mb, unsigned int hdr_space, +static void mb_put_frags(struct mbuf *mb, unsigned int length, struct mbuf *tomb) { @@ -569,7 +566,7 @@ void ipoib_cm_handle_rx_wc(struct ipoib_ ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", wc->byte_len, wc->slid); - mb_put_frags(mb, IPOIB_CM_HEAD_SIZE, wc->byte_len, newmb); + mb_put_frags(mb, wc->byte_len, newmb); ++dev->if_opackets; dev->if_obytes += mb->m_pkthdr.len; @@ -577,6 +574,9 @@ void ipoib_cm_handle_rx_wc(struct ipoib_ mb->m_pkthdr.rcvif = dev; proto = *mtod(mb, uint16_t *); m_adj(mb, IPOIB_ENCAP_LEN); + if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok)) + mb->m_pkthdr.csum_flags = CSUM_IP_CHECKED | CSUM_IP_VALID; + IPOIB_MTAP_PROTO(dev, mb, proto); spin_unlock(&priv->lock); ipoib_demux(dev, mb, ntohs(proto)); @@ -601,38 +601,44 @@ repost: static inline int post_send(struct ipoib_dev_priv *priv, struct ipoib_cm_tx *tx, - unsigned int wr_id, - u64 addr, int len) + struct ipoib_tx_buf *tx_req, + unsigned int wr_id) { struct ib_send_wr *bad_wr; + struct mbuf *mb = tx_req->mb; + u64 *mapping = tx_req->mapping; + struct mbuf *m; + int i; - priv->tx_sge[0].addr = addr; - priv->tx_sge[0].length = len; - - priv->tx_wr.num_sge = 1; - priv->tx_wr.wr_id = wr_id | IPOIB_OP_CM; + for (m = mb, i = 0; m != NULL; m = m->m_next, i++) { + priv->tx_sge[i].addr = mapping[i]; + priv->tx_sge[i].length = m->m_len; + } + priv->tx_wr.num_sge = i; + priv->tx_wr.wr_id = 
wr_id | IPOIB_OP_CM; + priv->tx_wr.opcode = IB_WR_SEND; return ib_post_send(tx->qp, &priv->tx_wr, &bad_wr); } void ipoib_cm_send(struct ipoib_dev_priv *priv, struct mbuf *mb, struct ipoib_cm_tx *tx) { - struct ipoib_cm_tx_buf *tx_req; + struct ipoib_tx_buf *tx_req; struct ifnet *dev = priv->dev; - u64 addr; - m_adj(mb, INFINIBAND_ALEN); - if (unlikely(mb->m_pkthdr.len > tx->mtu)) { + m_adj(mb, sizeof(struct ipoib_pseudoheader)); + if (unlikely(mb->m_pkthdr.len > IPOIB_CM_MTU(tx->mtu))) { ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n", mb->m_pkthdr.len, tx->mtu); ++dev->if_oerrors; - ipoib_cm_mb_too_long(priv, mb, tx->mtu - IPOIB_ENCAP_LEN); + ipoib_cm_mb_too_long(priv, mb, IPOIB_CM_MTU(tx->mtu)); return; } ipoib_dbg_data(priv, "sending packet: head 0x%x length %d connection 0x%x\n", tx->tx_head, mb->m_pkthdr.len, tx->qp->qp_num); + /* * We put the mb into the tx_ring _before_ we call post_send() * because it's entirely possible that the completion handler will @@ -642,21 +648,22 @@ void ipoib_cm_send(struct ipoib_dev_priv */ tx_req = &tx->tx_ring[tx->tx_head & (ipoib_sendq_size - 1)]; tx_req->mb = mb; - addr = ib_dma_map_single(priv->ca, mtod(mb, void *), mb->m_pkthdr.len, - DMA_TO_DEVICE); - if (unlikely(ib_dma_mapping_error(priv->ca, addr))) { + if (unlikely(ipoib_dma_map_tx(priv->ca, tx_req))) { ++dev->if_oerrors; - m_freem(mb); + if (tx_req->mb) + m_freem(tx_req->mb); return; } - tx_req->mapping = addr; + if (mb->m_pkthdr.csum_flags & (CSUM_IP|CSUM_TCP|CSUM_UDP)) + priv->tx_wr.send_flags |= IB_SEND_IP_CSUM; + else + priv->tx_wr.send_flags &= ~IB_SEND_IP_CSUM; - if (unlikely(post_send(priv, tx, tx->tx_head & (ipoib_sendq_size - 1), - addr, mb->m_pkthdr.len))) { + if (unlikely(post_send(priv, tx, tx_req, tx->tx_head & (ipoib_sendq_size - 1)))) { ipoib_warn(priv, "post_send failed\n"); ++dev->if_oerrors; - ib_dma_unmap_single(priv->ca, addr, mb->m_pkthdr.len, DMA_TO_DEVICE); + ipoib_dma_unmap_tx(priv->ca, tx_req); m_freem(mb); } else { 
++tx->tx_head; @@ -676,7 +683,7 @@ void ipoib_cm_handle_tx_wc(struct ipoib_ struct ipoib_cm_tx *tx = wc->qp->qp_context; unsigned int wr_id = wc->wr_id & ~IPOIB_OP_CM; struct ifnet *dev = priv->dev; - struct ipoib_cm_tx_buf *tx_req; + struct ipoib_tx_buf *tx_req; unsigned long flags; ipoib_dbg_data(priv, "cm send completion: id %d, status: %d\n", @@ -690,7 +697,7 @@ void ipoib_cm_handle_tx_wc(struct ipoib_ tx_req = &tx->tx_ring[wr_id]; - ib_dma_unmap_single(priv->ca, tx_req->mapping, tx_req->mb->m_pkthdr.len, DMA_TO_DEVICE); + ipoib_dma_unmap_tx(priv->ca, tx_req); /* FIXME: is this right? Shouldn't we only increment on success? */ ++dev->if_opackets; @@ -720,7 +727,6 @@ void ipoib_cm_handle_tx_wc(struct ipoib_ tx->path = NULL; rb_erase(&path->rb_node, &priv->path_tree); list_del(&path->list); - ipoib_path_free(priv, path); } if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) { @@ -731,6 +737,8 @@ void ipoib_cm_handle_tx_wc(struct ipoib_ clear_bit(IPOIB_FLAG_OPER_UP, &tx->flags); spin_unlock_irqrestore(&priv->lock, flags); + if (path) + ipoib_path_free(priv, path); } } @@ -932,7 +940,7 @@ static struct ib_qp *ipoib_cm_create_tx_ .recv_cq = priv->recv_cq, .srq = priv->cm.srq, .cap.max_send_wr = ipoib_sendq_size, - .cap.max_send_sge = 1, + .cap.max_send_sge = MAX_MB_FRAGS, .sq_sig_type = IB_SIGNAL_ALL_WR, .qp_type = IB_QPT_RC, .qp_context = tx @@ -952,7 +960,7 @@ static int ipoib_cm_send_req(struct ipoi ipoib_dbg(priv, "cm send req\n"); data.qpn = cpu_to_be32(priv->qp->qp_num); - data.mtu = cpu_to_be32(IPOIB_CM_BUF_SIZE); + data.mtu = cpu_to_be32(priv->cm.max_cm_mtu); req.primary_path = pathrec; req.alternate_path = NULL; @@ -1065,7 +1073,7 @@ static void ipoib_cm_tx_destroy(struct i { struct ipoib_dev_priv *priv = p->priv; struct ifnet *dev = priv->dev; - struct ipoib_cm_tx_buf *tx_req; + struct ipoib_tx_buf *tx_req; unsigned long begin; ipoib_dbg(priv, "Destroy active connection 0x%x head 0x%x tail 0x%x\n", @@ -1092,8 +1100,7 @@ timeout: while ((int) 
p->tx_tail - (int) p->tx_head < 0) { tx_req = &p->tx_ring[p->tx_tail & (ipoib_sendq_size - 1)]; - ib_dma_unmap_single(priv->ca, tx_req->mapping, tx_req->mb->m_pkthdr.len, - DMA_TO_DEVICE); + ipoib_dma_unmap_tx(priv->ca, tx_req); m_freem(tx_req->mb); ++p->tx_tail; if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && @@ -1142,7 +1149,6 @@ static int ipoib_cm_tx_handler(struct ib tx->path = NULL; rb_erase(&path->rb_node, &priv->path_tree); list_del(&path->list); - ipoib_path_free(tx->priv, path); } if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) { @@ -1151,6 +1157,8 @@ static int ipoib_cm_tx_handler(struct ib } spin_unlock_irqrestore(&priv->lock, flags); + if (path) + ipoib_path_free(tx->priv, path); break; default: break; @@ -1182,7 +1190,9 @@ void ipoib_cm_destroy_tx(struct ipoib_cm { struct ipoib_dev_priv *priv = tx->priv; if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) { + spin_lock(&priv->lock); list_move(&tx->list, &priv->cm.reap_list); + spin_unlock(&priv->lock); queue_work(ipoib_workqueue, &priv->cm.reap_task); ipoib_dbg(priv, "Reap connection for gid %pI6\n", tx->path->pathrec.dgid.raw); @@ -1292,12 +1302,6 @@ ipoib_cm_mb_too_long(struct ipoib_dev_pr { int e = priv->cm.mb_queue.ifq_len; -/* XXX */ -#if 0 - if (mb->dst) - mb->dst->ops->update_pmtu(mb->dst, mtu); -#endif - IF_ENQUEUE(&priv->cm.mb_queue, mb); if (e == 0) queue_work(ipoib_workqueue, &priv->cm.mb_task); @@ -1403,13 +1407,12 @@ int ipoib_cm_dev_init(struct ipoib_dev_p attr.max_srq_sge = min_t(int, IPOIB_CM_RX_SG, attr.max_srq_sge); ipoib_cm_create_srq(priv, attr.max_srq_sge); if (ipoib_cm_has_srq(priv)) { - - priv->cm.max_cm_mtu = attr.max_srq_sge * PAGE_SIZE - 0x10; + priv->cm.max_cm_mtu = attr.max_srq_sge * MJUM16BYTES; priv->cm.num_frags = attr.max_srq_sge; ipoib_dbg(priv, "max_cm_mtu = 0x%x, num_frags=%d\n", priv->cm.max_cm_mtu, priv->cm.num_frags); } else { - priv->cm.max_cm_mtu = IPOIB_CM_MTU; + priv->cm.max_cm_mtu = IPOIB_CM_MAX_MTU; priv->cm.num_frags = 
IPOIB_CM_RX_SG; } @@ -1460,3 +1463,5 @@ void ipoib_cm_dev_cleanup(struct ipoib_d mtx_destroy(&priv->cm.mb_queue.ifq_mtx); } + +#endif /* CONFIG_INFINIBAND_IPOIB_CM */ Modified: projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_ib.c ============================================================================== --- projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_ib.c Thu Feb 10 00:01:50 2011 (r218500) +++ projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_ib.c Thu Feb 10 00:05:11 2011 (r218501) @@ -90,8 +90,7 @@ void ipoib_free_ah(struct kref *kref) static void ipoib_ud_dma_unmap_rx(struct ipoib_dev_priv *priv, u64 mapping[IPOIB_UD_RX_SG]) { - ib_dma_unmap_single(priv->ca, mapping[0], - IPOIB_UD_BUF_SIZE(priv->max_ib_mtu), + ib_dma_unmap_single(priv->ca, mapping[0], priv->max_ib_mtu, DMA_FROM_DEVICE); } @@ -132,7 +131,7 @@ static struct mbuf *ipoib_alloc_rx_mb(st /* * XXX Should be calculated once and cached. */ - buf_size = IPOIB_UD_BUF_SIZE(priv->max_ib_mtu); + buf_size = priv->max_ib_mtu; if (buf_size <= MCLBYTES) buf_size = MCLBYTES; else if (buf_size <= MJUMPAGESIZE) @@ -254,8 +253,7 @@ repost: "for buf %d\n", wr_id); } -static int ipoib_dma_map_tx(struct ib_device *ca, - struct ipoib_tx_buf *tx_req) +int ipoib_dma_map_tx(struct ib_device *ca, struct ipoib_tx_buf *tx_req) { struct mbuf *mb = tx_req->mb; u64 *mapping = tx_req->mapping; @@ -293,8 +291,7 @@ static int ipoib_dma_map_tx(struct ib_de return error; } -static void ipoib_dma_unmap_tx(struct ib_device *ca, - struct ipoib_tx_buf *tx_req) +void ipoib_dma_unmap_tx(struct ib_device *ca, struct ipoib_tx_buf *tx_req) { struct mbuf *mb = tx_req->mb; u64 *mapping = tx_req->mapping; @@ -413,11 +410,10 @@ void ipoib_send_comp_handler(struct ib_c mod_timer(&priv->poll_timer, jiffies); } -static inline int post_send(struct ipoib_dev_priv *priv, - unsigned int wr_id, - struct ib_ah *address, u32 qpn, - struct ipoib_tx_buf *tx_req, - void *head, int hlen) +static inline int 
+post_send(struct ipoib_dev_priv *priv, unsigned int wr_id, + struct ib_ah *address, u32 qpn, struct ipoib_tx_buf *tx_req, void *head, + int hlen) { struct ib_send_wr *bad_wr; struct mbuf *mb = tx_req->mb; @@ -466,9 +462,9 @@ ipoib_send(struct ipoib_dev_priv *priv, } m_adj(mb, hlen); } else { - if (unlikely(mb->m_pkthdr.len > priv->mcast_mtu + IPOIB_ENCAP_LEN)) { + if (unlikely(mb->m_pkthdr.len > priv->mcast_mtu)) { ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n", - mb->m_pkthdr.len, priv->mcast_mtu + IPOIB_ENCAP_LEN); + mb->m_pkthdr.len, priv->mcast_mtu); ++dev->if_oerrors; ipoib_cm_mb_too_long(priv, mb, priv->mcast_mtu); return; @@ -508,8 +504,9 @@ ipoib_send(struct ipoib_dev_priv *priv, dev->if_drv_flags |= IFF_DRV_OACTIVE; } - if (unlikely(post_send(priv, priv->tx_head & (ipoib_sendq_size - 1), - address->ah, qpn, tx_req, phead, hlen))) { + if (unlikely(post_send(priv, + priv->tx_head & (ipoib_sendq_size - 1), address->ah, qpn, + tx_req, phead, hlen))) { ipoib_warn(priv, "post_send failed\n"); ++dev->if_oerrors; --priv->tx_outstanding; @@ -518,8 +515,6 @@ ipoib_send(struct ipoib_dev_priv *priv, if (dev->if_drv_flags & IFF_DRV_OACTIVE) dev->if_drv_flags &= ~IFF_DRV_OACTIVE; } else { - /* dev->trans_start = jiffies; */ - address->last_send = priv->tx_head; ++priv->tx_head; } Modified: projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c ============================================================================== --- projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c Thu Feb 10 00:01:50 2011 (r218500) +++ projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c Thu Feb 10 00:05:11 2011 (r218501) @@ -221,14 +221,14 @@ ipoib_stop(struct ipoib_dev_priv *priv) return 0; } -static int +int ipoib_change_mtu(struct ipoib_dev_priv *priv, int new_mtu) { struct ifnet *dev = priv->dev; /* dev->if_mtu > 2K ==> connected mode */ if (ipoib_cm_admin_enabled(priv)) { - if (new_mtu > ipoib_cm_max_mtu(priv)) + 
if (new_mtu > IPOIB_CM_MTU(ipoib_cm_max_mtu(priv))) return -EINVAL; if (new_mtu > priv->mcast_mtu) @@ -596,7 +596,7 @@ path_rec_start(struct ipoib_dev_priv *pr p_rec = path->pathrec; p_rec.mtu_selector = IB_SA_GT; - switch (roundup_pow_of_two(dev->if_mtu + IPOIB_ENCAP_LEN)) { + switch (roundup_pow_of_two(dev->if_mtu + IB_GRH_BYTES)) { case 512: p_rec.mtu = IB_MTU_256; break; @@ -824,45 +824,6 @@ ipoib_dev_cleanup(struct ipoib_dev_priv priv->tx_ring = NULL; } -#if 0 -static int get_mb_hdr(struct mbuf *mb, void **iphdr, - void **tcph, u64 *hdr_flags, void *priv) -{ - unsigned int ip_len; - struct iphdr *iph; - - if (unlikely(mb->protocol != htons(ETH_P_IP))) - return -1; - - /* - * In the future we may add an else clause that verifies the - * checksum and allows devices which do not calculate checksum - * to use LRO. - */ - if (unlikely(mb->ip_summed != CHECKSUM_UNNECESSARY)) - return -1; - - /* Check for non-TCP packet */ - mb_reset_network_header(mb); - iph = ip_hdr(mb); - if (iph->protocol != IPPROTO_TCP) - return -1; - - ip_len = ip_hdrlen(mb); - mb_set_transport_header(mb, ip_len); - *tcph = tcp_hdr(mb); - - /* check if IP header and TCP header are complete */ - if (ntohs(iph->tot_len) < ip_len + tcp_hdrlen(mb)) - return -1; - - *hdr_flags = LRO_IPV4 | LRO_TCP; - *iphdr = iph; - - return 0; -} -#endif - static volatile int ipoib_unit; static struct ipoib_dev_priv * @@ -955,13 +916,13 @@ ipoib_set_dev_features(struct ipoib_dev_ priv->dev->if_hwassist = 0; priv->dev->if_capabilities = 0; +#ifndef CONFIG_INFINIBAND_IPOIB_CM if (priv->hca_caps & IB_DEVICE_UD_IP_CSUM) { set_bit(IPOIB_FLAG_CSUM, &priv->flags); priv->dev->if_hwassist = CSUM_IP | CSUM_TCP | CSUM_UDP; priv->dev->if_capabilities = IFCAP_HWCSUM | IFCAP_VLAN_HWCSUM; } -#if 0 if (priv->dev->features & NETIF_F_SG && priv->hca_caps & IB_DEVICE_UD_TSO) priv->dev->if_capabilities |= IFCAP_TSO4 | CSUM_TSO; #endif @@ -993,8 +954,8 @@ ipoib_add_port(const char *format, struc } /* MTU will be reset when mcast join 
happens */ - priv->dev->if_mtu = IPOIB_UD_MTU(priv->max_ib_mtu); - priv->mcast_mtu = priv->admin_mtu = priv->dev->if_mtu; + priv->dev->if_mtu = IPOIB_UD_MTU(priv->max_ib_mtu); + priv->mcast_mtu = priv->admin_mtu = priv->dev->if_mtu; result = ib_query_pkey(hca, port, 0, &priv->pkey); if (result) { Modified: projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_multicast.c ============================================================================== --- projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_multicast.c Thu Feb 10 00:01:50 2011 (r218500) +++ projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_multicast.c Thu Feb 10 00:05:11 2011 (r218501) @@ -550,12 +550,8 @@ void ipoib_mcast_join_task(struct work_s priv->mcast_mtu = priv->admin_mtu; spin_unlock_irq(&priv->lock); - if (!ipoib_cm_admin_enabled(priv)) { - /* - * dev_set_mtu(dev, min(priv->mcast_mtu, priv->admin_mtu)); - * XXX - */ - } + if (!ipoib_cm_admin_enabled(priv)) + ipoib_change_mtu(priv, min(priv->mcast_mtu, priv->admin_mtu)); ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n"); Modified: projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_verbs.c ============================================================================== --- projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_verbs.c Thu Feb 10 00:01:50 2011 (r218500) +++ projects/ofed/head/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_verbs.c Thu Feb 10 00:05:11 2011 (r218501) @@ -222,15 +222,8 @@ int ipoib_transport_dev_init(struct ipoi priv->tx_wr.send_flags = IB_SEND_SIGNALED; priv->rx_sge[0].lkey = priv->mr->lkey; - if (0 /* XXX ipoib_ud_need_sg(priv->max_ib_mtu)*/) { - priv->rx_sge[0].length = IPOIB_UD_HEAD_SIZE; - priv->rx_sge[1].length = PAGE_SIZE; - priv->rx_sge[1].lkey = priv->mr->lkey; - priv->rx_wr.num_sge = IPOIB_UD_RX_SG; - } else { - priv->rx_sge[0].length = IPOIB_UD_BUF_SIZE(priv->max_ib_mtu); - priv->rx_wr.num_sge = 1; - } + priv->rx_sge[0].length = 
priv->max_ib_mtu; + priv->rx_wr.num_sge = 1; priv->rx_wr.next = NULL; priv->rx_wr.sg_list = priv->rx_sge; Modified: projects/ofed/head/sys/ofed/drivers/infiniband/ulp/sdp/sdp.h ============================================================================== --- projects/ofed/head/sys/ofed/drivers/infiniband/ulp/sdp/sdp.h Thu Feb 10 00:01:50 2011 (r218500) +++ projects/ofed/head/sys/ofed/drivers/infiniband/ulp/sdp/sdp.h Thu Feb 10 00:05:11 2011 (r218501) @@ -3,6 +3,7 @@ #include "opt_ddb.h" #include "opt_inet.h" +#include "opt_ofed.h" #include #include @@ -51,10 +52,9 @@ #include #include -#define CONFIG_INFINIBAND_SDP_DEBUG 1 -#define CONFIG_INFINIBAND_SDP_DEBUG_DATA 1 - -#define SDP_DEBUG +#ifdef SDP_DEBUG +#define CONFIG_INFINIBAND_SDP_DEBUG +#endif #include "sdp_dbg.h" Modified: projects/ofed/head/sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c ============================================================================== --- projects/ofed/head/sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c Thu Feb 10 00:01:50 2011 (r218500) +++ projects/ofed/head/sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c Thu Feb 10 00:05:11 2011 (r218501) @@ -1957,5 +1957,5 @@ struct domain sdpdomain = { DOMAIN_SET(sdp); -int sdp_debug_level = 0; +int sdp_debug_level = 1; int sdp_data_debug_level = 0; Modified: projects/ofed/head/sys/ofed/include/linux/module.h ============================================================================== --- projects/ofed/head/sys/ofed/include/linux/module.h Thu Feb 10 00:01:50 2011 (r218500) +++ projects/ofed/head/sys/ofed/include/linux/module.h Thu Feb 10 00:05:11 2011 (r218501) @@ -49,9 +49,10 @@ static inline void _module_run(void *arg) { + void (*fn)(void); +#ifdef OFED_DEBUG_INIT char name[1024]; caddr_t pc; - void (*fn)(void); long offset; pc = (caddr_t)arg; @@ -59,7 +60,7 @@ _module_run(void *arg) printf("Running ??? (%p)\n", pc); else printf("Running %s (%p)\n", name, pc); - +#endif fn = arg; DROP_GIANT(); fn();
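
Note on the ipoib cm transmit change: the log entry "Support transmitting mbufs with more than one sg entry" corresponds to the new loop in post_send() that walks the mbuf chain and fills one tx_sge entry per mbuf, after ipoib_cm_send() has checked the packet length against IPOIB_CM_MTU(tx->mtu). The following is only a minimal userland sketch of that idea, not the driver code. The names struct buf, struct sge, fill_sge(), ENCAP_LEN, MAX_FRAGS and CM_MTU() are hypothetical stand-ins for struct mbuf, struct ib_sge, post_send(), IPOIB_ENCAP_LEN, MAX_MB_FRAGS and IPOIB_CM_MTU(). The 4-byte encapsulation length and the fragment limit of 4 are assumptions, and the explicit fragment-count check is an addition for the sketch that the diff does not show in post_send() itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENCAP_LEN 4	/* assumed stand-in for IPOIB_ENCAP_LEN */
#define MAX_FRAGS 4	/* assumed stand-in for MAX_MB_FRAGS */
#define CM_MTU(ib_mtu) ((ib_mtu) - ENCAP_LEN)	/* mirrors IPOIB_CM_MTU() */

/* Hypothetical stand-in for struct mbuf: one buffer in a chain. */
struct buf {
	struct buf *next;
	uint32_t len;
};

/* Hypothetical stand-in for struct ib_sge. */
struct sge {
	uint64_t addr;
	uint32_t length;
};

/*
 * Fill sge[] from the chain, one entry per buffer, in the same way the
 * new post_send() loop fills priv->tx_sge[].  mapping[i] is the DMA
 * address already obtained for the i-th buffer.
 *
 * Returns the number of sg entries used, or -1 if the total length
 * exceeds CM_MTU(ib_mtu) or the chain has more than MAX_FRAGS buffers
 * (the second check is an assumption added for this sketch).
 */
static int
fill_sge(const struct buf *head, const uint64_t *mapping, uint32_t ib_mtu,
    struct sge *sge)
{
	const struct buf *b;
	uint32_t total = 0;
	int i;

	assert(head != NULL && mapping != NULL && sge != NULL);

	/* Equivalent of the m_pkthdr.len > IPOIB_CM_MTU(tx->mtu) test. */
	for (b = head; b != NULL; b = b->next)
		total += b->len;
	if (total > CM_MTU(ib_mtu))
		return (-1);

	/* One sg entry per buffer in the chain. */
	for (b = head, i = 0; b != NULL; b = b->next, i++) {
		if (i == MAX_FRAGS)
			return (-1);
		sge[i].addr = mapping[i];
		sge[i].length = b->len;
	}
	return (i);
}
```

The return value plays the role of priv->tx_wr.num_sge in the diff. A two-buffer chain of 100 and 500 bytes yields two entries when the remote MTU is at least 604, and is rejected at 603 because the 4-byte encapsulation header is subtracted first.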