May 01, 2019

Chelsio Unified Wire Adapter on FreeBSD


Configure the Chelsio Offload Policy (COP), TCP Offload Engine (TOE) and a Firewall with cxgbetool

Chelsio provides some of the best line speed network adapters with complete driver support in FreeBSD. According to Chelsio, "The T6 is a highly integrated, hyper-virtualized 1/10/25/40/50/100 GbE controller with full offload support of a complete Unified Wire solution. T6 provides no-compromise performance with both low latency (sub 1µsec through hardware) and high bandwidth, limited only by the PCI bus. Furthermore, [the T6] scales to true 100 Gigabit line rate operation from a single TCP connection to thousands of connections, and allows simultaneous low latency and high bandwidth operation thanks to multiple physical channels through the ASIC."

These are some of our notes...

Chelsio On-Chip firewall

The Chelsio T5 and T6 adapters support filtering on the card itself, which we can set up as a hardware firewall similar to Pf's pass and block rules. The firewall rules can drop packets at full line speed (i.e. 100 gigabit) on the hardware interface itself, before interrupts are triggered or the packets get to FreeBSD and the Pf firewall. By offloading the packet filtering to the NIC we can save CPU time for other tasks like a webserver or the ZFS file system.

The following shell script is called "chelsio_rules_of_engagement.sh". The script programs the Chelsio card to allow certain traffic in on the external interface (port 0) and in on the internal LAN interface (port 1). Note that the filter rules only filter traffic coming into an interface. Traffic going out of an interface is unfiltered.

For this example we will be allowing traffic to our web server, ssh from an external machine to the server, and some traffic from inside the LAN to be NAT'd using Pf. It is important to remember that the rules are first match, meaning the packet will match the rule with the lowest "filter" number first. This is the reason the pass rules are at the top and the drop all rules follow. Let's go over some of the rules so you can understand our logic and then create your own rules.

#!/bin/sh
set -euf

#
# Chelsio Rules of Engagement (T520-CR and T520-BT)
# 

# drop fragments
cxgbetool t5nex0 filter 0  frag 1 action drop

# pass upper ports 32768 through 65535 (16bit mask)
# (sysctl net.inet.ip.portrange.first=32768)
cxgbetool t5nex0 filter 10 action pass dport 32768:0x8000
#cxgbetool t5nex0 filter 11 action pass dport 16384:0xc000
#cxgbetool t5nex0 filter 12 action pass dport  8192:0xe000
#cxgbetool t5nex0 filter 13 action pass dport  4096:0xf000

# pass http TCP
cxgbetool t5nex0 filter 20 iport 0 action pass dport 80 proto 6

# pass https TCP
cxgbetool t5nex0 filter 30 iport 0 action pass dport 443 proto 6

# pass https UDP
cxgbetool t5nex0 filter 40 iport 0 action pass dport 443 proto 17

# pass ssh from 1.2.3.4
cxgbetool t5nex0 filter 50 iport 0 action pass sip 1.2.3.4 dport 22 proto 6

# pass ping from 1.2.3.4
cxgbetool t5nex0 filter 60 iport 0 action pass sip 1.2.3.4 proto 1

# pass dhcp UDP
cxgbetool t5nex0 filter 70 iport 0 action pass sport 67 dport 68 proto 17

# pass ping from 10.10.10 
cxgbetool t5nex0 filter 80 iport 1 action pass sip 10.10.10.0/24 proto 1

# pass http TCP 10.10.10
cxgbetool t5nex0 filter 81 iport 1 action pass sip 10.10.10.0/24 dport 80 proto 6

# pass https TCP 10.10.10
cxgbetool t5nex0 filter 82 iport 1 action pass sip 10.10.10.0/24 dport 443 proto 6

# pass dns UDP 10.10.10
cxgbetool t5nex0 filter 83 iport 1 action pass sip 10.10.10.0/24 dport 53 proto 17

# drop all TCP  (hex=0x06 , ProtoNum=6)
cxgbetool t5nex0 filter 100 action drop proto 6

# drop all UDP  (hex=0x11 , ProtoNum=17)
cxgbetool t5nex0 filter 101 action drop proto 17

# drop all ICMP (hex=0x01 , ProtoNum=1)
cxgbetool t5nex0 filter 102 action drop proto 1

# Connection Offload Policies (COP)
#cxgbetool t5nex0 policy /root/chelsio_offload_policy

## EOF ##
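
To make the value:mask port syntax and the first match ordering easier to reason about, here is a small toy model in plain shell. It is only a sketch: the function names port_matches and first_match are our own, the rule table is a trimmed copy of the script above, and nothing here talks to the adapter.

```shell
#!/bin/sh
# Toy model only: mimics value:mask port matching and first-match rule
# order with shell arithmetic. It does not call cxgbetool.

# port_matches PORT VALUE MASK: true when (PORT & MASK) == (VALUE & MASK)
port_matches() {
  [ $(( $1 & $3 )) -eq $(( $2 & $3 )) ]
}

# first_match PROTO DPORT: print "index action" of the lowest-numbered
# rule that matches, or "none". Table columns: index proto dport mask action.
# In this model proto 0 means any protocol and mask 0 means any port.
first_match() {
  while read -r idx proto dport mask action; do
    if [ "$proto" -eq 0 ] || [ "$proto" -eq "$1" ]; then
      if port_matches "$2" "$dport" "$mask"; then
        echo "$idx $action"
        return 0
      fi
    fi
  done <<EOF
10 0 32768 0x8000 pass
20 6 80 0xffff pass
30 6 443 0xffff pass
100 6 0 0 drop
101 17 0 0 drop
EOF
  echo none
}

first_match 6 443     # TCP to port 443
first_match 6 25      # TCP to port 25
first_match 17 40000  # UDP to port 40000
```

Run as is, the three calls print "30 pass", "100 drop" and "10 pass". The same arithmetic shows why the commented-out rule 16384:0xc000 would cover ports 16384 through 32767: the mask keeps only the top two bits of the port number.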

Chelsio filter rule output

What does the output of the filter rules look like? Use cxgbetool to list the on-chip rules along with the hit counter for each rule. The hit counter ticks up every time a packet matches a rule.

$ cxgbetool t5nex0 filter list

Idx     Hits FCoE Port      vld:VLAN  Prot MPS Frag                  DIP                  SIP     DPORT     SPORT Action
   0        0  0/0  0/0 0:0000/0:0000 00/00 0/0  1/1    00000000/00000000    00000000/00000000 0000/0000 0000/0000 Drop
  10  1029025  0/0  0/0 0:0000/0:0000 00/00 0/0  0/0    00000000/00000000    00000000/00000000 8000/8000 0000/0000 Pass: Q=RSS
  20        0  0/0  0/7 0:0000/0:0000 06/ff 0/0  0/0    00000000/00000000    00000000/00000000 0050/ffff 0000/0000 Pass: Q=RSS
  30        0  0/0  0/7 0:0000/0:0000 06/ff 0/0  0/0    00000000/00000000    00000000/00000000 01bb/ffff 0000/0000 Pass: Q=RSS
  40        0  0/0  0/7 0:0000/0:0000 11/ff 0/0  0/0    00000000/00000000    00000000/00000000 01bb/ffff 0000/0000 Pass: Q=RSS
  50    11887  0/0  0/7 0:0000/0:0000 06/ff 0/0  0/0    00000000/00000000    c0a80504/ffffffff 0016/ffff 0000/0000 Pass: Q=RSS
  60       14  0/0  0/7 0:0000/0:0000 01/ff 0/0  0/0    00000000/00000000    c0a80504/ffffffff 0000/0000 0000/0000 Pass: Q=RSS
  70      286  0/0  0/7 0:0000/0:0000 11/ff 0/0  0/0    00000000/00000000    00000000/00000000 0044/ffff 0043/ffff Pass: Q=RSS
  80        0  0/0  1/7 0:0000/0:0000 01/ff 0/0  0/0    00000000/00000000    0a000a00/ffffff00 0000/0000 0000/0000 Pass: Q=RSS
  81      620  0/0  1/7 0:0000/0:0000 06/ff 0/0  0/0    00000000/00000000    0a000a00/ffffff00 0050/ffff 0000/0000 Pass: Q=RSS
  82        0  0/0  1/7 0:0000/0:0000 06/ff 0/0  0/0    00000000/00000000    0a000a00/ffffff00 01bb/ffff 0000/0000 Pass: Q=RSS
  83        3  0/0  1/7 0:0000/0:0000 11/ff 0/0  0/0    00000000/00000000    0a000a00/ffffff00 0035/ffff 0000/0000 Pass: Q=RSS
 100    82735  0/0  0/0 0:0000/0:0000 06/ff 0/0  0/0    00000000/00000000    00000000/00000000 0000/0000 0000/0000 Drop
 101     8515  0/0  0/0 0:0000/0:0000 11/ff 0/0  0/0    00000000/00000000    00000000/00000000 0000/0000 0000/0000 Drop
 102     6234  0/0  0/0 0:0000/0:0000 01/ff 0/0  0/0    00000000/00000000    00000000/00000000 0000/0000 0000/0000 Drop
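
The listing prints addresses, masks and ports in hexadecimal as value/mask pairs. As a reading aid, the following sketch converts those fields back to the familiar notation. The helper names hex_to_ip and hex_to_port are our own and are not part of cxgbetool.

```shell
#!/bin/sh
# hex_to_ip HEX: print an 8-digit hex IPv4 value as a dotted quad
hex_to_ip() {
  n=$(( 0x$1 ))
  printf '%d.%d.%d.%d\n' \
    $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) \
    $(( (n >> 8) & 255 )) $(( n & 255 ))
}

# hex_to_port HEX: print a 4-digit hex port as decimal
hex_to_port() {
  echo $(( 0x$1 ))
}

hex_to_ip c0a80504   # SIP value of filter 50
hex_to_ip ffffff00   # SIP mask of filter 80
hex_to_port 01bb     # DPORT value of filter 30
```

The three calls print 192.168.5.4, 255.255.255.0 and 443. A mask of ffffffff means the whole address must match, and ffffff00 is the /24 netmask.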

HELPFUL HINT: Make sure to take a look at our FreeBSD Tuning and Optimization performance page for 1gig and 10gig networks.

Chelsio TCP Offload Engine (TOE) Testing

The TCP Offload Engine (TOE) will allow the Chelsio hardware to completely offload the entire TCP connection into hardware. A connection using TOE will use less CPU time leaving more CPU resources for applications.

We ran some basic tests using a single wget download, first with the default FreeBSD TCP stack and then with the TCP connection offloaded to the Chelsio card using the TCP Offload Engine (TOE). In the tables below, TOE used roughly 1.3x to 6x less CPU time than the default TCP stack to download the same file. Wget was rate limited so we could compare CPU usage at different download speeds.

TCP TIMING: We noticed that short lived connections of less than 0.6 seconds will NOT use the Chelsio TCP Offload Engine (TOE), even if TOE is allowed universally or through the Chelsio Offload Policy (COP). We are not sure of the reason.

CPU usage of a single wget process downloading
a file with the FreeBSD TCP stack compared to
Chelsio TCP Offload Engine (TOE).
Lower CPU time is better.

FreeBSD 12 (wget) -> TOE or TCP -> Ubuntu 16.04 Nginx HTTP


File size : 487 MBytes
Source NIC: 1 Gbit/sec
MTU Size  : 1500 bytes

                CPU Time
             TOE       TCP
112 MB/s   0m0.370s  0m1.547s
 50 MB/s   0m0.420s  0m1.568s
 25 MB/s   0m0.363s  0m1.539s
 10 MB/s   0m0.266s  0m1.346s
  5 MB/s   0m0.247s  0m1.548s
  1 MB/s   0m0.264s  0m0.381s
500 KB/s   0m0.297s  0m0.473s


File Size : 487 MBytes
Source NIC: 10 Gbit/sec
MTU Size  : 9000 bytes

                CPU Time
             TOE       TCP
500 MB/s      -      0m0.562s
400 MB/s   0m0.191s  0m0.570s
300 MB/s   0m0.174s  0m0.681s
200 MB/s   0m0.174s  0m0.771s
100 MB/s   0m0.191s  0m0.702s
 10 MB/s   0m0.159s  0m0.801s
  5 MB/s   0m0.200s  0m0.756s
  1 MB/s   0m0.235s  0m0.431s
500 KB/s   0m0.336s  0m0.433s
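
To turn a row of the tables into a single ratio, divide the TCP time by the TOE time. The sketch below does that for times written in the "0m1.547s" format used above. The function names to_seconds and cpu_ratio are our own, and awk is used for the floating point division.

```shell
#!/bin/sh
# to_seconds 0m1.547s: print the time in seconds (minutes * 60 + seconds)
to_seconds() {
  t=${1%s}
  awk -v m="${t%%m*}" -v s="${t#*m}" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

# cpu_ratio TOE_TIME TCP_TIME: how many times more CPU time TCP used
cpu_ratio() {
  awk -v toe="$(to_seconds "$1")" -v tcp="$(to_seconds "$2")" \
    'BEGIN { printf "%.2f\n", tcp / toe }'
}

cpu_ratio 0m0.370s 0m1.547s   # 1 Gbit/sec table, 112 MB/s row
cpu_ratio 0m0.159s 0m0.801s   # 10 Gbit/sec table, 10 MB/s row
```

These two rows work out to 4.18 and 5.04, meaning the default TCP stack used about four to five times the CPU time of TOE at those speeds.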

Chelsio Offload Policy (COP)

The Chelsio Offload Policy (COP) manages when the TCP Offload Engine (TOE) takes effect, allowing the card to offload only the TCP connections you want offloaded and leave the other connections to the default FreeBSD TCP stack.

To apply the Chelsio Offload Policy (COP), run "cxgbetool t5nex0 policy /root/chelsio_offload_policy" once the chelsio_offload_policy file has been configured with your offload preferences. Make sure to add hw.cxgbe.cop_managed_offloading="1" to /boot/loader.conf so that TOE will only be enabled for connections defined in COP.

The Chelsio Offload Policy (COP) uses the following directives to tell the card which connections to apply offload logic to.

SECURITY NOTE: The Chelsio TCP Offload Engine (TOE) will completely bypass the FreeBSD TCP stack as well as any Chelsio filter rules. This means that traffic using TOE will NOT be filtered using our Chelsio Rules of Engagement filter rules or the Pf packet filter, nor will Pf log TOE connections. Netstat will show the connections using "netstat -np tcp" though.

Here are a few examples of using Chelsio Offload Policy (COP) config file:

# TOE only outgoing TCP connections. Incoming connections
# will still use FreeBSD's TCP stack including Pf.

$ cat /root/chelsio_offload_policy
[A] all => offload


# TOE incoming connections on port 80 and 443. If you have
# a Chelsio T6 card, TLS can also be offloaded to the card.
# T4 and T5 do not support TLS offloading.

$ cat /root/chelsio_offload_policy
[L] port 80 => offload
[L] port 443 => offload
[P] dst port 80 => offload
[P] dst port 443 => offload tls
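
Before loading a policy file it can help to check that every line has the expected shape. The sketch below is a shape check only, under our own assumptions: the function name check_cop_policy is hypothetical, the open-type letters A, P, L and D are taken from our reading of cxgbetool(8), and treating blank lines and "#" comments as ignorable is an assumption. It does not validate the match rule or the settings.

```shell
#!/bin/sh
# check_cop_policy FILE: flag lines that are neither blank, a "#" comment,
# nor shaped like "[X] rule => settings" with X one of A, P, L, D.
# Assumption: blank lines and "#" comments are acceptable in the file.
check_cop_policy() {
  bad=$(grep -nEv \
    -e '^[[:space:]]*(#.*)?$' \
    -e '^\[[APLD]\][[:space:]]+.+[[:space:]]+=>[[:space:]]+.+$' \
    "$1" || true)
  if [ -n "$bad" ]; then
    printf 'unexpected lines in %s:\n%s\n' "$1" "$bad" >&2
    return 1
  fi
}

# Example use with the path from this page (left commented out because it
# needs the adapter):
# check_cop_policy /root/chelsio_offload_policy && \
#   cxgbetool t5nex0 policy /root/chelsio_offload_policy
```

The function returns 0 when every line looks right, and otherwise prints the offending lines with their line numbers to stderr and returns 1.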

Questions?

Can Chelsio NAT replace Pf NAT ?

No. Chelsio Network Address Translation (NAT) is stateless NAT and FreeBSD's Pf is stateful NAT. Stateless NAT requires you to define every source IP address and port mapping to every destination IP address and port. Stateful NAT, like in Pf or IPFW, does all the mapping for you.

What kernel modules are needed to start the Chelsio NIC on boot ?

To boot FreeBSD with the Chelsio T5 network card, use the following directives in /boot/loader.conf. Make sure to add cop_managed_offloading when defining Chelsio Offload Policy (COP) rules.

$ cat /boot/loader.conf

# Chelsio T5 (cxl) kernel module
#
#t4fw_cfg_load="YES"
t5fw_cfg_load="YES"
#t6fw_cfg_load="YES"
if_cxgbe_load="YES"

# Chelsio Offload Policy (COP) manages TCP Offload Engine (TOE)
hw.cxgbe.cop_managed_offloading="1"

How can I enable the TCP Offload Engine (TOE) on boot ?

Use /etc/rc.local to load the TOE kernel driver on boot and then enable the Direct Data Placement (DDP) and Zero Copy sysctl variables. The following rc.local will load the t4_tom kernel module, enable TOE on both the cxl0 and cxl1 interfaces, enable DDP and ZCopy, and then apply our chelsio_rules_of_engagement rules from the top of this page.

$ cat /etc/rc.local

if [ -z "$(/sbin/kldstat | /usr/bin/grep 't4_tom\.ko')" ]; then
  /sbin/kldload t4_tom && \
  /sbin/ifconfig cxl0 toe && \
  /sbin/ifconfig cxl1 toe && \
  /sbin/sysctl -q dev.t5nex.0.toe.ddp=1 >/dev/null && \
  /sbin/sysctl -q dev.t5nex.0.toe.tx_zcopy=1 >/dev/null
fi

/root/chelsio_rules_of_engagement.sh

What is a "Can't set DCB Priority" error ?

TLDR: Disable the Link Layer Discovery Protocol (LLDP) on the switch.

Link Layer Discovery Protocol (LLDP) offload can be used for Data Center Bridging (DCB). The Data Center Bridging Capabilities Exchange Protocol (DCBX) is used to convey the capabilities and configuration between neighbors to ensure consistent configuration across the network. If LLDP is not configured properly then the DCB negotiation will fail and the Chelsio card will show the following errors before taking the interface offline. The easiest solution is to disable LLDP on the switch when LLDP is not needed.

# dmesg 
cxgb4 0000:04:00.4: Coming up as MASTER: Initializing adapter
cxgb4 0000:04:00.4: Successfully configured using Firmware Configuration File "Firmware Default", version 0x0, computed checksum 0x0
cxgb4 0000:04:00.4 eth0: Chelsio T520-CR rev 1 1000/10GBASE-R SFP+ RNIC PCIe x8 8 GT/s MSI-X
cxgb4 0000:02:00.4 enp2s0f4: SR module inserted
csiostor 0000:02:00.6: Port:0 - LINK DOWN
cxgb4 0000:02:00.4 enp2s0f4: link up, 10Gbps, full-duplex, Tx/Rx PAUSE
csiostor 0000:02:00.6: Port:0 - LINK UP
IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0f4: link becomes ready
cxgb4 0000:02:00.4 enp2s0f4: TX Packet without VLAN Tag on DCB Link
cxgb4 0000:02:00.4 enp2s0f4: TX Packet without VLAN Tag on DCB Link
cxgb4 0000:02:00.4 enp2s0f4: TX Packet without VLAN Tag on DCB Link

# console
command 0x8 in mailbox 4 timed out
Can't set DCB Priority on port 0, TX Queue 0: err=110
Can't set DCB Priority on port 0, TX Queue 1: err=110
Can't set DCB Priority on port 0, TX Queue 2: err=110
Can't set DCB Priority on port 0, TX Queue 3: err=110
Can't set DCB Priority on port 0, TX Queue 4: err=110
Can't set DCB Priority on port 0, TX Queue 5: err=110
Can't set DCB Priority on port 0, TX Queue 6: err=110
Can't set DCB Priority on port 0, TX Queue 7: err=110