The Berkeley Packet Filter (BPF) is a technology used in certain computer operating systems for programs that need to, among other things, analyze network traffic. It provides a raw interface to data link layers, permitting raw link-layer packets to be sent and received. In addition, if the driver for the network interface supports promiscuous mode, it allows the interface to be put into that mode so that all packets on the network can be received, even those destined to other hosts.
BPF supports filtering packets, allowing a userspace process to supply a filter program that specifies which packets it wants to receive. For example, a tcpdump process may want to receive only packets that initiate a TCP connection. BPF returns only packets that pass the filter that the process supplies. This avoids copying unwanted packets from the operating system kernel to the process, greatly improving performance. The filter program is in the form of instructions for a virtual machine, which are interpreted, or compiled into machine code by a just-in-time (JIT) mechanism and executed, in the kernel.
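As an illustration of the kind of decision such a filter makes, the following Python sketch tests whether a frame initiates a TCP connection (SYN set, ACK clear). It is not BPF code and is not how tcpdump implements the check; it assumes an Ethernet frame carrying IPv4 with no VLAN tag, and the function and helper names are hypothetical. A roughly comparable tcpdump expression would be `tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn`.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6
TCP_SYN = 0x02
TCP_ACK = 0x10
ETH_HEADER_LEN = 14  # assumes untagged Ethernet


def initiates_tcp_connection(frame: bytes) -> bool:
    """Return True for an Ethernet/IPv4 TCP segment with SYN set and ACK clear."""
    if len(frame) < ETH_HEADER_LEN + 20:
        return False
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return False
    ihl = (frame[ETH_HEADER_LEN] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < 20 or frame[ETH_HEADER_LEN + 9] != IPPROTO_TCP:
        return False
    # Non-first fragments carry no TCP header, so they cannot be inspected.
    (frag_field,) = struct.unpack_from("!H", frame, ETH_HEADER_LEN + 6)
    if frag_field & 0x1FFF:
        return False
    flags_offset = ETH_HEADER_LEN + ihl + 13  # TCP flags byte
    if len(frame) <= flags_offset:
        return False
    flags = frame[flags_offset]
    return (flags & (TCP_SYN | TCP_ACK)) == TCP_SYN


def build_test_frame(tcp_flags: int) -> bytes:
    """Hypothetical helper: a minimal, mostly zeroed Ethernet/IPv4/TCP frame."""
    eth = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4)
    ip = bytearray(20)
    ip[0] = 0x45            # version 4, header length 5 words
    ip[9] = IPPROTO_TCP
    tcp = bytearray(20)
    tcp[12] = 0x50          # data offset 5 words
    tcp[13] = tcp_flags
    return eth + bytes(ip) + bytes(tcp)
```

With a kernel-side filter, only frames for which an equivalent test succeeds would be copied to the process; the others are dropped before they cross into user space.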
BPF is sometimes used to refer to just the filtering mechanism, rather than to the entire interface. Some systems, such as Linux and Tru64 UNIX, provide a raw interface to the data link layer other than the BPF raw interface but use the BPF filtering mechanisms for that raw interface. The BPF filtering mechanism is available on most Unix-like operating systems.
The Linux kernel provides an extended version of the BPF filtering mechanism, called eBPF, which uses a JIT mechanism, and which is used for packet filtering, as well as for other purposes in the kernel. eBPF is also available for Microsoft Windows.
Raw data-link interface
BPF provides pseudo-devices that can be bound to a network interface; reads from the device will read buffers full of packets received on the network interface, and writes to the device will inject packets on the network interface.
In 2007, Robert Watson and Christian Peron added zero-copy buffer extensions to the BPF implementation in the FreeBSD operating system, allowing kernel packet capture in the device driver interrupt handler to write directly to user process memory in order to avoid the requirement for two copies for all packet data received via the BPF device. While one copy remains in the receipt path for user processes, this preserves the independence of different BPF device consumers, as well as allowing the packing of headers into the BPF buffer rather than copying complete packet data.
Filtering
BPF's filtering capabilities are implemented as an interpreter for a machine language for the BPF virtual machine, a 32-bit machine with fixed-length instructions, one accumulator, and one index register. Programs in that language can fetch data from the packet, perform arithmetic operations on data from the packet, and compare the results against constants or against data in the packet or test bits in the results, accepting or rejecting the packet based on the results of those tests.
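The machine described above can be sketched as a small Python interpreter. This is a minimal illustration covering only a handful of instructions, not a complete or authoritative implementation; the opcode values follow the classic BPF encoding, while the function name `run_filter`, the tuple layout `(code, jt, jf, k)`, and the example program are assumptions made for this sketch.

```python
import struct

# Opcode values from the classic BPF encoding (class | size | mode).
LD_H_ABS = 0x28   # A <- 16-bit big-endian value at packet[k]
LD_B_ABS = 0x30   # A <- byte at packet[k]
LD_B_IND = 0x50   # A <- byte at packet[X + k]
LDX_B_MSH = 0xB1  # X <- 4 * (packet[k] & 0x0f), e.g. the IPv4 header length
AND_K = 0x54      # A <- A & k
JEQ_K = 0x15      # jump jt if A == k, else jf
JSET_K = 0x45     # jump jt if A & k != 0, else jf
RET_K = 0x06      # return k (0 rejects, nonzero is the number of bytes to keep)


def run_filter(program, packet):
    """Run a list of (code, jt, jf, k) tuples against a packet."""
    a = x = 0        # accumulator and index register
    pc = 0
    while pc < len(program):
        code, jt, jf, k = program[pc]
        pc += 1      # jump offsets are relative to the next instruction
        try:
            if code == LD_H_ABS:
                a = struct.unpack_from("!H", packet, k)[0]
            elif code == LD_B_ABS:
                a = packet[k]
            elif code == LD_B_IND:
                a = packet[x + k]
            elif code == LDX_B_MSH:
                x = 4 * (packet[k] & 0x0F)
            elif code == AND_K:
                a &= k
            elif code == JEQ_K:
                pc += jt if a == k else jf
            elif code == JSET_K:
                pc += jt if a & k else jf
            elif code == RET_K:
                return k
            else:
                raise ValueError(f"unsupported opcode {code:#x}")
        except (IndexError, struct.error):
            return 0  # a load beyond the end of the packet rejects it
    return 0


# Example program: accept IPv4 packets whose protocol field is TCP.
IP_TCP_FILTER = [
    (LD_H_ABS, 0, 0, 12),      # A <- EtherType
    (JEQ_K, 0, 3, 0x0800),     # not IPv4: skip to the final "ret #0"
    (LD_B_ABS, 0, 0, 23),      # A <- IPv4 protocol field
    (JEQ_K, 0, 1, 6),          # not TCP: skip to the final "ret #0"
    (RET_K, 0, 0, 262144),     # accept, keeping up to 262144 bytes
    (RET_K, 0, 0, 0),          # reject
]
```

Calling `run_filter(IP_TCP_FILTER, frame)` on a raw Ethernet frame returns a nonzero snapshot length when the packet is accepted and 0 when it is rejected. Because jump offsets are unsigned and relative to the following instruction, control flow in this model only ever moves forward.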
BPF is often extended by "overloading" the load (ld) and store (st) instructions.
Traditional Unix-like BPF implementations can be used in userspace, despite being written for kernel-space. This is accomplished using preprocessor conditions.
Extensions and optimizations
Some projects use BPF instruction sets or execution techniques different from the originals.
Some platforms, including FreeBSD, NetBSD, and WinPcap, use a just-in-time (JIT) compiler to convert BPF instructions into native code in order to improve performance. Linux includes a BPF JIT compiler which is disabled by default.
Kernel-mode interpreters for that same virtual machine language are used in raw data link layer mechanisms in other operating systems, such as Tru64 Unix, and for socket filters in the Linux kernel and in the WinPcap and Npcap packet capture mechanisms.
Since version 3.18, the Linux kernel includes an extended BPF virtual machine with ten 64-bit registers, termed extended BPF (eBPF). It can be used for non-networking purposes, such as for attaching eBPF programs to various tracepoints. Since kernel version 3.19, eBPF filters can be attached to sockets, and, since kernel version 4.1, to traffic control classifiers for the ingress and egress networking data path. The original and obsolete version has been retroactively renamed to classic BPF (cBPF). Nowadays, the Linux kernel runs eBPF only, and loaded cBPF bytecode is transparently translated into an eBPF representation in the kernel before program execution. All bytecode is verified before running to prevent denial-of-service attacks. Until Linux 5.3, the verifier prohibited the use of loops, to prevent potentially unbounded execution times; more recent kernels permit loops with bounded execution time.
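The reasoning behind a "no loops" rule can be shown with a toy check in Python. This is not the Linux verifier, which analyses far more than jump directions; it is a simplified model under stated assumptions: a program is a list of hypothetical `(op, off)` pairs, where `op` is `"alu"`, `"jmp"` (a conditional jump), or `"exit"`, and a jump at index `i` targets index `i + 1 + off`. If every jump goes strictly forward and stays inside the program, the program counter can only increase, so execution ends after at most `len(program)` steps.

```python
def has_bounded_execution(program):
    """Toy check: accept only programs whose jumps all go forward and in range.

    `program` is a list of (op, off) pairs using a hypothetical encoding:
    op is "alu", "jmp" (conditional), or "exit"; off is a signed jump offset
    relative to the next instruction and is ignored for non-jumps.
    """
    n = len(program)
    # The final instruction must be an exit, otherwise execution could
    # fall off the end of the program.
    if n == 0 or program[-1][0] != "exit":
        return False
    for i, (op, off) in enumerate(program):
        if op == "jmp":
            target = i + 1 + off
            if off < 0 or target >= n:
                return False  # backward jump (possible loop) or out of range
    return True
```

The check is sufficient but not necessary for termination: it also rejects programs with loops that would in fact finish, which is the kind of restriction that bounded-loop support relaxes.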
A user-mode interpreter for BPF is provided with the libpcap/WinPcap/Npcap implementation of the pcap API, so that, when capturing packets on systems without kernel-mode support for that filtering mechanism, packets can be filtered in user mode; code using the pcap API will work on both types of systems, although, on systems where the filtering is done in user mode, all packets, including those that will be filtered out, are copied from the kernel to user space. That interpreter can also be used when reading a file containing packets captured using pcap.
Another user-mode interpreter is uBPF, which supports JIT and eBPF (without cBPF). Its code has been reused to provide eBPF support in non-Linux systems. Microsoft's eBPF on Windows builds on uBPF and the PREVAIL formal verifier. rBPF, a Rust rewrite of uBPF, is used by the Solana blockchain platform as the execution engine.
Programming
Classic BPF is generally emitted by a program from some very high-level textual rule describing the pattern to match. One such representation is found in libpcap. Classic BPF and eBPF can also be written either directly as machine code, or using an assembly language for a textual representation. Notable assemblers include Linux kernel's tool (cBPF), (cBPF), and the assembler (eBPF). The command can also act as a disassembler for both flavors of BPF. The assembly languages are not necessarily compatible with each other.
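To illustrate what writing classic BPF "directly as machine code" amounts to, the Python sketch below packs instructions into the fixed 8-byte layout used by Linux's `struct sock_filter` (a 16-bit opcode, two 8-bit jump offsets, and a 32-bit operand). The function names are hypothetical, and little-endian byte order is an assumption; a real program would use the host's native order. The example program accepts IPv4 packets and matches the four instructions that `tcpdump -dd ip` is commonly shown to print.

```python
import struct

# One classic BPF instruction: u16 code, u8 jt, u8 jf, u32 k (8 bytes).
INSN_FORMAT = "<HBBI"  # little-endian assumed for this sketch

IPV4_FILTER = [
    (0x28, 0, 0, 12),        # ldh [12]            load the EtherType
    (0x15, 0, 1, 0x0800),    # jeq #0x800 jt 0 jf 1
    (0x06, 0, 0, 0x40000),   # ret #262144         accept
    (0x06, 0, 0, 0),         # ret #0              reject
]


def encode(program):
    """Pack (code, jt, jf, k) tuples into raw classic BPF bytecode."""
    return b"".join(struct.pack(INSN_FORMAT, *insn) for insn in program)


def decode(bytecode):
    """Unpack raw classic BPF bytecode back into (code, jt, jf, k) tuples."""
    return list(struct.iter_unpack(INSN_FORMAT, bytecode))
```

An assembler for a textual representation ultimately produces the same tuples; the mnemonic syntax differs between tools, which is one reason the assembly languages are not interchangeable.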
eBPF bytecode has recently become a target of higher-level languages. LLVM added eBPF support in 2014, and GCC followed in 2019. Both toolkits allow compiling C and other supported languages to eBPF. A subset of P4 can also be compiled into eBPF using BCC, an LLVM-based compiler kit.
History
The original paper was written by Steven McCanne and Van Jacobson in 1992 while at Lawrence Berkeley Laboratory.
In August 2003, SCO Group publicly claimed that the Linux kernel was infringing Unix code which they owned. Programmers quickly discovered that one example they gave was the Berkeley Packet Filter, which in fact SCO never owned. SCO has not explained or acknowledged the mistake but the ongoing legal action may eventually force an answer.
Security concerns
The Spectre attack could leverage the Linux kernel's eBPF interpreter or JIT compiler to extract data from other kernel processes. A JIT hardening feature in the kernel mitigates this vulnerability.
Chinese computer security group Pangu Lab said the NSA used BPF to conceal network communications as part of a complex Linux backdoor.
See also
* Data link layer
* Proof-carrying code
* Express Data Path
External links
* – an example of conventional BPF
* eBPF.io - Introduction, Tutorials & Community Resources
* bpfc, a Berkeley Packet Filter compiler, Linux BPF JIT disassembler (part of netsniff-ng)
* for Linux kernel
* Linux filter documentation for both cBPF and eBPF bytecode formats
* ebpf-for-windows on GitHub: https://github.com/microsoft/ebpf-for-windows