TCP Segmentation Offloading

TCP offload engine (TOE) is a technology used in some network interface cards (NICs) to offload processing of the entire TCP/IP stack to the network controller. It is primarily used with high-speed network interfaces, such as gigabit Ethernet and 10 Gigabit Ethernet, where processing overhead of the network stack becomes significant. TOEs are often used as a way to reduce the overhead associated with Internet Protocol (IP) storage protocols such as iSCSI and Network File System (NFS).


Purpose

Originally TCP was designed for unreliable low-speed networks (such as early dial-up modems), but with the growth of the Internet in terms of backbone transmission speeds (using Optical Carrier, Gigabit Ethernet and 10 Gigabit Ethernet links) and faster and more reliable access mechanisms (such as DSL and cable modems), it is frequently used in data centers and desktop PC environments at speeds of over 1 gigabit per second. At these speeds the TCP software implementations on host systems require significant computing power. In the early 2000s, full-duplex gigabit TCP communication could consume more than 80% of a 2.4 GHz Pentium 4 processor, leaving little or no processing resources for the applications running on the system.

TCP is a connection-oriented protocol, which adds complexity and processing overhead. These aspects include:
* Connection establishment using the "3-way handshake" (SYNchronize; SYNchronize-ACKnowledge; ACKnowledge).
* Acknowledgment of packets as they are received by the far end, adding to the message flow between the endpoints and thus the protocol load.
* Checksum and sequence number calculations, again a burden for a general-purpose CPU to perform.
* Sliding window calculations for packet acknowledgement and congestion control.
* Connection termination.

Moving some or all of these functions to dedicated hardware, a TCP offload engine, frees the system's main CPU for other tasks. As of 2012, very few consumer network interface cards support TOE.
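To give a sense of why checksum work costs CPU time, here is a minimal Python sketch of the 16-bit one's-complement Internet checksum described in RFC 1071, which TCP and IP use. It touches every byte of the data, so its cost grows with throughput. It is a simplification: the real TCP checksum also covers a pseudo-header (source and destination addresses, protocol number and segment length), which is omitted here, and production stacks use wider or vectorised loops. The name `internet_checksum` is illustrative.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data = data + b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF  # one's complement of the folded sum


# The RFC 1071 worked example: the folded sum is 0xDDF2, so the checksum is 0x220D.
example = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
print(hex(internet_checksum(example)))  # 0x220d
```

A receiver can verify data by running the same sum with the checksum included; a result of zero means no error was detected.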


Freed-up CPU cycles

A generally accepted rule of thumb is that 1 hertz of CPU processing is required to send or receive 1 bit/s of TCP/IP. For example, 5 Gbit/s (625 MB/s) of network traffic requires 5 GHz of CPU processing. This implies that 2 entire cores of a 2.5 GHz multi-core processor will be required to handle the TCP/IP processing associated with 5 Gbit/s of TCP/IP traffic. Since Ethernet (10GE in this example) is bidirectional, it is possible to send and receive 10 Gbit/s simultaneously (for an aggregate throughput of 20 Gbit/s). Using the 1 Hz/(bit/s) rule, this equates to eight 2.5 GHz cores. Many of the CPU cycles used for TCP/IP processing are freed up by TCP/IP offload and may be used by the CPU (usually a server CPU) to perform other tasks such as file system processing (in a file server) or indexing (in a backup media server). In other words, a server with TCP/IP offload NICs can do more server work than a server without them.
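The rule-of-thumb arithmetic in this section can be reproduced with a few lines of Python. This encodes only the 1 Hz per bit/s heuristic stated in the text, not a performance model; `cores_needed` and the constants are illustrative names.

```python
def cores_needed(throughput_bps: int, core_hz: int) -> int:
    """Cores needed for TCP/IP processing under the 1 Hz per bit/s rule of thumb."""
    required_hz = throughput_bps  # 1 Hz of CPU for every 1 bit/s of traffic
    return -(-required_hz // core_hz)  # integer ceiling division


GBIT = 1_000_000_000
CORE_HZ = 2_500_000_000  # a 2.5 GHz core, as in the text's example

print(5 * GBIT // 8 // 1_000_000, "MB/s")    # 625 MB/s for 5 Gbit/s
print(cores_needed(5 * GBIT, CORE_HZ))       # 2 cores for 5 Gbit/s
print(cores_needed(2 * 10 * GBIT, CORE_HZ))  # 8 cores for 10GE in both directions
```

Integer ceiling division is used so that any fractional remainder still counts as a whole core.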


Reduction of PCI traffic

In addition to the protocol overhead that TOE can address, it can also address some architectural issues that affect a large percentage of host-based (server and PC) endpoints. Many older endpoint hosts are based on the PCI bus, which provides a standard interface for adding certain peripherals, such as network interfaces, to servers and PCs. PCI is inefficient for transferring small bursts of data from main memory across the PCI bus to the network interface ICs, but its efficiency improves as the data burst size increases. Within the TCP protocol, a large number of small packets are created (e.g. acknowledgements), and because these are typically generated on the host CPU and transmitted across the PCI bus and out the network physical interface, they reduce the host computer's I/O throughput. A TOE solution sits on the network interface, on the other side of the PCI bus from the host CPU, so it can address this I/O efficiency issue: the data to be sent across the TCP connection can be passed from the CPU to the TOE across the PCI bus in large bursts, and none of the smaller TCP packets have to traverse the PCI bus.
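Why burst size matters can be illustrated with a toy Python model in which every bus transaction pays a fixed overhead before its data phases. The overhead of 8 phases is a hypothetical round number chosen for illustration; real PCI costs depend on the chipset, latency timer, arbitration and wait states. Only the 4 bytes per data phase of a 32-bit PCI bus is taken as given. The function and constant names are illustrative.

```python
BYTES_PER_DATA_PHASE = 4  # a 32-bit PCI bus moves 4 bytes per data phase
OVERHEAD_PHASES = 8       # hypothetical fixed cost per burst (address, turnaround, arbitration)


def burst_efficiency(burst_bytes: int, overhead: int = OVERHEAD_PHASES) -> float:
    """Fraction of bus phases that carry payload for a single burst (toy model)."""
    data_phases = -(-burst_bytes // BYTES_PER_DATA_PHASE)  # round up to whole phases
    return data_phases / (data_phases + overhead)


# A minimum-size frame such as a bare ACK, a full Ethernet frame, and a 64 KiB send.
for size in (64, 1514, 65536):
    print(size, f"{burst_efficiency(size):.1%}")
```

Under any fixed per-transaction overhead, small frames such as acknowledgements spend a larger share of bus time on overhead, which is the effect the text attributes to host-generated TCP control packets.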


History

One of the first patents in this technology, for UDP offload, was issued to Auspex Systems in early 1990. Auspex founder Larry Boucher and a number of Auspex engineers went on to found Alacritech in 1997 with the idea of extending the concept of network stack offload to TCP and implementing it in custom silicon. They introduced the first parallel-stack full offload network card in early 1999; the company's SLIC (Session Layer Interface Card) was the predecessor to its current TOE offerings. Alacritech holds a number of patents in the area of TCP/IP offload. By 2002, as the emergence of TCP-based storage such as iSCSI spurred interest, it was said that "At least a dozen newcomers, most founded toward the end of the dot-com bubble, are chasing the opportunity for merchant semiconductor accelerators for storage protocols and applications, vying with half a dozen entrenched vendors and in-house ASIC designs." In 2005 Microsoft licensed Alacritech's patent base and, along with Alacritech, created the partial TCP offload architecture that has become known as TCP chimney offload. TCP chimney offload centers on the Alacritech "Communication Block Passing Patent". At the same time, Broadcom also obtained a license to build TCP chimney offload chips.


Types

Instead of replacing the TCP stack with a TOE entirely, there are alternative techniques to offload some operations in co-operation with the operating system's TCP stack. TCP checksum offload and large segment offload are supported by the majority of today's Ethernet NICs. Newer techniques like large receive offload and TCP acknowledgment offload are already implemented in some high-end Ethernet hardware, but are effective even when implemented purely in software.


Parallel-stack full offload

Parallel-stack full offload gets its name from the concept of two parallel TCP/IP stacks. The first is the main host stack, which is included with the host OS. The second, or "parallel stack", is connected between the application layer and the transport layer (TCP) using a "vampire tap". The vampire tap intercepts TCP connection requests by applications and is responsible for TCP connection management as well as TCP data transfer. Many of the criticisms listed under Support in Linux relate to this type of TCP offload.


HBA full offload

HBA (host bus adapter) full offload is found in iSCSI host adapters, which present themselves as disk controllers to the host system while connecting (via TCP/IP) to an iSCSI storage device. This type of TCP offload not only offloads TCP/IP processing but also offloads the iSCSI initiator function. Because the HBA appears to the host as a disk controller, it can only be used with iSCSI devices and is not appropriate for general TCP/IP offload.


TCP chimney partial offload

TCP chimney offload addresses the major security criticism of parallel-stack full offload. In partial offload, the main system stack controls all connections to the host. After a connection has been established between the local host (usually a server) and a foreign host (usually a client) the connection and its state are passed to the TCP offload engine. The heavy lifting of data transmit and receive is handled by the offload device. Almost all TCP offload engines use some type of TCP/IP hardware implementation to perform the data transfer without host CPU intervention. When the connection is closed, the connection state is returned from the offload engine to the main system stack. Maintaining control of TCP connections allows the main system stack to implement and control connection security.
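The handoff of connection ownership in partial offload can be sketched as a toy Python model. Every class and method here (`TcpState`, `HostStack`, `OffloadEngine`, `offload`, `release`) is hypothetical and does not correspond to the Windows TCP Chimney interface or any NIC driver API; the model tracks only two sequence-number fields and leaves out retransmission, windows and the FIN exchange. What it shows is the division of labour described in this section: the host stack creates and finally reclaims the connection state, while the engine only advances it during data transfer.

```python
from dataclasses import dataclass


@dataclass
class TcpState:
    """Hypothetical minimal per-connection state handed between the two stacks."""
    snd_nxt: int  # next sequence number to send
    rcv_nxt: int  # next sequence number expected from the peer


class OffloadEngine:
    """Stand-in for the NIC: it only moves data for connections it was handed."""

    def __init__(self) -> None:
        self.conns: dict = {}

    def accept(self, key, state: TcpState) -> None:
        self.conns[key] = state

    def send(self, key, payload: bytes) -> None:
        self.conns[key].snd_nxt += len(payload)  # data path runs on the "NIC"

    def receive(self, key, payload: bytes) -> None:
        self.conns[key].rcv_nxt += len(payload)

    def release(self, key) -> TcpState:
        return self.conns.pop(key)


class HostStack:
    """Stand-in for the OS stack: it owns connection setup and teardown."""

    def __init__(self, engine: OffloadEngine) -> None:
        self.engine = engine
        self.conns: dict = {}

    def connect(self, local, remote, isn: int, peer_isn: int):
        # The handshake stays in software; each SYN consumes one sequence number.
        key = (local, remote)
        self.conns[key] = TcpState(snd_nxt=isn + 1, rcv_nxt=peer_isn + 1)
        return key

    def offload(self, key) -> None:
        # Hand the established connection's state to the engine.
        self.engine.accept(key, self.conns.pop(key))

    def close(self, key) -> TcpState:
        # Take the state back; the FIN exchange itself is not modelled here.
        self.conns[key] = self.engine.release(key)
        return self.conns[key]


engine = OffloadEngine()
host = HostStack(engine)
conn = host.connect(("192.0.2.1", 80), ("198.51.100.7", 50000), isn=1000, peer_isn=5000)
host.offload(conn)
engine.send(conn, b"x" * 1460)
engine.receive(conn, b"y" * 100)
state = host.close(conn)
print(state)  # TcpState(snd_nxt=2461, rcv_nxt=5101)
```

Because the state only ever lives in one place at a time, the host stack remains the authority for which connections exist, which is the property the text credits with addressing the security criticism of full offload.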


Large receive offload

Large receive offload (LRO) is a technique for increasing inbound throughput of high-bandwidth network connections by reducing central processing unit (CPU) overhead. It works by aggregating multiple incoming packets from a single stream into a larger buffer before they are passed higher up the networking stack, thus reducing the number of packets that have to be processed. Linux implementations generally use LRO in conjunction with the New API (NAPI) to also reduce the number of interrupts. According to benchmarks, even implementing this technique entirely in software can increase network performance significantly. At one point, the Linux kernel supported LRO for TCP in software only. FreeBSD 8 supports LRO in hardware on adapters that support it. LRO should not operate on machines acting as routers, as it breaks the end-to-end principle and can significantly impact performance.
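A simplified Python sketch of the aggregation step: adjacent, in-order TCP segments of the same flow are merged into one buffer up to a size limit. This illustrates the idea under stated assumptions and is not the Linux or FreeBSD implementation. Real LRO also checks TCP flags, options and headers, tracks several flows at once rather than only the previous segment, handles sequence-number wraparound, and rebuilds a valid header for the merged packet. The `Segment` type and `lro_aggregate` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flow: tuple     # hypothetical flow key, e.g. (src, sport, dst, dport)
    seq: int        # sequence number of the first payload byte
    payload: bytes


def lro_aggregate(segments, max_bytes: int = 65536):
    """Coalesce back-to-back, in-order segments of one flow into larger buffers."""
    merged = []
    for seg in segments:
        last = merged[-1] if merged else None
        if (last is not None
                and last.flow == seg.flow
                and last.seq + len(last.payload) == seg.seq  # contiguous bytes only
                and len(last.payload) + len(seg.payload) <= max_bytes):
            last.payload += seg.payload  # append to the buffer being built
        else:
            merged.append(Segment(seg.flow, seg.seq, seg.payload))  # start a new buffer
    return merged


flow_a = ("10.0.0.1", 5000, "10.0.0.2", 80)
flow_b = ("10.0.0.3", 6000, "10.0.0.2", 80)
rx = [Segment(flow_a, 0, b"x" * 1448), Segment(flow_a, 1448, b"x" * 1448),
      Segment(flow_a, 2896, b"x" * 1448), Segment(flow_b, 0, b"y" * 100),
      Segment(flow_a, 4344, b"x" * 1448)]
for s in lro_aggregate(rx):
    print(s.flow[0], s.seq, len(s.payload))
```

Here five received segments reach the upper layers as three buffers. The check that bytes are contiguous is also why the text warns against LRO on routers: the merged buffer no longer matches the packets that were actually sent, so forwarding it would not preserve them end to end.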


Generic receive offload

Generic receive offload (GRO) implements a generalised LRO in software that is not restricted to TCP/IPv4 and does not have the issues created by LRO.


Large send offload

In computer networking, large send offload (LSO) is a technique for increasing egress throughput of high-bandwidth network connections by reducing CPU overhead. It works by passing a multipacket buffer to the network interface card (NIC). The NIC then splits this buffer into separate packets. The technique is also called TCP segmentation offload (TSO) or generic segmentation offload (GSO) when applied to TCP. LSO and LRO are independent, and use of one does not require the use of the other.

When a system needs to send large chunks of data out over a computer network, the chunks first need to be broken down into smaller segments that can pass through all the network elements, like routers and switches, between the source and destination computers. This process is referred to as segmentation. Often the TCP protocol in the host computer performs this segmentation. Offloading this work to the NIC is called TCP segmentation offload (TSO).

For example, a unit of 64 KiB (65,536 bytes) of data is usually segmented into 45 segments (44 of 1460 bytes and a final one of 1296 bytes) before it is sent through the NIC and over the network. With some intelligence in the NIC, the host CPU can hand over the 64 KiB of data to the NIC in a single transmit request; the NIC can then break that data down into segments of up to 1460 bytes, add the TCP, IP, and data link layer protocol headers (according to a template provided by the host's TCP/IP stack) to each segment, and send the resulting frames over the network. This significantly reduces the work done by the CPU. Many newer NICs support TSO. Some network cards implement TSO generically enough that it can be used for offloading fragmentation of other transport layer protocols, or for doing IP fragmentation for protocols that don't support fragmentation by themselves, such as UDP.
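The segmentation step in the 64 KiB example can be sketched in Python: one large buffer is cut into MSS-sized pieces, and each piece gets a header copied from a per-connection template with its own sequence number. The header fields (`src_port`, `dst_port`, `flags`, `payload_len`) and the `tso_segment` function are hypothetical simplifications. A real NIC also fills in IP lengths and identifiers, recomputes checksums, and treats flags such as PSH and FIN specially on the last segment.

```python
def tso_segment(data: bytes, header_template: dict, start_seq: int, mss: int = 1460):
    """Split one large send into MSS-sized frames, each with a header copied from a template."""
    frames = []
    for offset in range(0, len(data), mss):
        chunk = data[offset:offset + mss]
        header = dict(header_template)                # copy the per-connection fields
        header["seq"] = (start_seq + offset) % 2**32  # TCP sequence numbers wrap at 2^32
        header["payload_len"] = len(chunk)
        frames.append((header, chunk))
    return frames


# Hypothetical template fields standing in for what the host stack would provide.
template = {"src_port": 5000, "dst_port": 80, "flags": "ACK"}
frames = tso_segment(bytes(65536), template, start_seq=1)
print(len(frames), frames[0][0]["payload_len"], frames[-1][0]["payload_len"])  # 45 1460 1296
```

The host makes a single call with the whole buffer; the per-segment loop is the work that TSO moves onto the NIC.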


Support in Linux

Unlike other operating systems, such as FreeBSD, the Linux kernel does not include support for TOE (not to be confused with other types of network offload). While there are patches from hardware manufacturers such as Chelsio or QLogic that add TOE support, the Linux kernel developers are opposed to this technology for several reasons:
* Security: because TOE is implemented in hardware, patches must be applied to the TOE firmware, instead of just software, to address any security vulnerabilities found in a particular TOE implementation. This is further compounded by the newness and vendor-specificity of this hardware, as compared to a well-tested TCP/IP stack as is found in an operating system that does not use TOE.
* Limitations of hardware: because connections are buffered and processed on the TOE chip, resource starvation can more easily occur than with the generous CPU and memory available to the operating system.
* Complexity: TOE breaks the assumption that kernels make about having access to all resources at all times; details such as memory used by open connections are not available with TOE. TOE also requires very large changes to a networking stack in order to be supported properly, and even when that is done, features like quality of service and packet filtering might not work.
* Proprietary: TOE is implemented differently by each hardware vendor. This means more code must be rewritten to deal with the various TOE implementations, at a cost of the aforementioned complexity and, possibly, security. Furthermore, TOE firmware cannot be easily modified since it is closed-source.
* Obsolescence: each TOE NIC has a limited lifetime of usefulness, because system hardware rapidly catches up to, and eventually exceeds, TOE performance levels.


Suppliers

Much of the current work on TOE technology is by manufacturers of 10 Gigabit Ethernet interface cards, such as Broadcom, Chelsio Communications, Emulex, Mellanox Technologies and QLogic.


See also

* Scalable Networking Pack
* I/O Acceleration Technology (I/OAT)
* Energy Efficient Ethernet (EEE)
* Autonomous peripheral operation


External links

* "TCP Offload to the Rescue", article by Andy Currid at ACM Queue
* "TCP/IP offload Engine (TOE)", 10 Gigabit Ethernet Alliance, April 2002, http://line-provider.com/whitepapers/tcpip-offload-engine-toe/
* Windows Network Task Offload
* GSO in Linux
* Brief Description of LSO in Linux
* Case Studies of Performance issues with LSO and Traffic Shaping (Linux)

