NAME

     bpf - Berkeley Packet Filter

SYNOPSIS

     pseudo-device bpfilter

DESCRIPTION

     The Berkeley Packet Filter provides a raw interface to data link layers
     in a protocol-independent fashion.  All packets on the network, even
     those destined for other hosts, are accessible through this mechanism.

     The packet filter appears as a character special device, /dev/bpf0,
     /dev/bpf1, etc.  After opening the device, the file descriptor must be
     bound to a specific network interface with the BIOCSETIF ioctl.  A given
     interface can be shared between multiple listeners, and the filter
     underlying each descriptor will see an identical packet stream.

     A separate device file is required for each minor device.  If a file is
     in use, the open will fail and errno will be set to EBUSY.  The number of
     open files can be increased by creating additional device nodes with the
     MAKEDEV(8) script.
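
     Because a busy node fails with EBUSY, a program usually probes the
     device nodes in order until one opens.  The following is a minimal
     sketch of that loop, not code from bpf itself: open_first_free() is a
     hypothetical helper, and the path prefix and node count are supplied by
     the caller rather than assumed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Try prefix0, prefix1, ... and return the first descriptor that opens.
 * A busy node (EBUSY) is skipped; any other error ends the scan.
 * Returns -1 if nothing could be opened.
 */
int
open_first_free(const char *prefix, int max_nodes, int flags)
{
	char path[64];
	int i, fd;

	assert(prefix != NULL);
	for (i = 0; i < max_nodes; i++) {
		snprintf(path, sizeof(path), "%s%d", prefix, i);
		fd = open(path, flags);
		if (fd != -1)
			return fd;
		if (errno != EBUSY)
			break;
	}
	return -1;
}
```

     A caller might use open_first_free("/dev/bpf", 10, O_RDWR) and treat a
     return of -1 as "no free device node".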

     Associated with each open instance of a bpf file is a user-settable
     packet filter.  Whenever a packet is received by an interface, all file
     descriptors listening on that interface apply their filter.  Each
     descriptor that accepts the packet receives its own copy.

     Reads from these files return the next group of packets that have matched
     the filter.  To improve performance, the buffer passed to read must be
     the same size as the buffers used internally by bpf.  This size is
     returned by the BIOCGBLEN ioctl (see below), and under BSD, can be set
     with BIOCSBLEN.  Note that an individual packet larger than this size is
     necessarily truncated.

     The packet filter will support any link level protocol that has fixed
     length headers.  Currently, only Ethernet, SLIP, and PPP drivers have
     been modified to interact with bpf.

     Since packet data is in network byte order, applications should use the
     byteorder(3) macros to extract multi-byte values.
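
     As an illustration of the byte-order point, the sketch below reads a
     16-bit field from a captured frame with ntohs(3).  The helper name
     frame_u16() is hypothetical, and the use of offset 12 for the type
     field assumes an Ethernet link layer.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the 16-bit network-order field at byte offset off, in host order. */
uint16_t
frame_u16(const unsigned char *frame, size_t off)
{
	uint16_t v;

	assert(frame != NULL);
	memcpy(&v, frame + off, sizeof(v));	/* avoids an unaligned load */
	return ntohs(v);
}
```

     On an Ethernet frame, frame_u16(frame, 12) yields the type field in
     host byte order regardless of the machine's endianness.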

     A packet can be sent out on the network by writing to a bpf file
     descriptor.  Each descriptor can also have a user-settable filter for
     controlling the writes.  Only packets matching the filter are sent out of
     the interface.  The writes are unbuffered, meaning only one packet can be
     processed per write.

     Once a descriptor is configured, further changes to the configuration can
     be prevented using the BIOCLOCK ioctl.

   Ioctls
     The ioctl command codes below are defined in <net/bpf.h>.  All commands
     require these includes:

           #include <sys/types.h>
           #include <sys/time.h>
           #include <sys/ioctl.h>
           #include <net/bpf.h>

     Additionally, BIOCGETIF and BIOCSETIF require <sys/socket.h> and
     <net/if.h>.

     The (third) argument to the ioctl(2) call should be a pointer to the type
     indicated.

     BIOCGBLEN (int)
             Returns the required buffer length for reads on bpf files.

     BIOCSBLEN (u_int)
             Sets the buffer length for reads on bpf files.  The buffer must
             be set before the file is attached to an interface with
             BIOCSETIF.  If the requested buffer size cannot be accommodated,
             the closest allowable size will be set and returned in the
             argument.  A read call will result in EIO if it is passed a
             buffer that is not this size.

     BIOCGDLT (u_int)
             Returns the type of the data link layer underlying the attached
             interface.  EINVAL is returned if no interface has been
             specified.  The device types, prefixed with ``DLT_'', are
             defined in <net/bpf.h>.

     BIOCPROMISC
             Forces the interface into promiscuous mode.  All packets, not
             just those destined for the local host, are processed.  Since
             more than one file can be listening on a given interface, a
             listener that opened its interface non-promiscuously may receive
             packets promiscuously.  This problem can be remedied with an
             appropriate filter.

             The interface remains in promiscuous mode until all files
             listening promiscuously are closed.

     BIOCFLUSH
             Flushes the buffer of incoming packets and resets the statistics
             that are returned by BIOCGSTATS.

     BIOCLOCK
             This ioctl is designed to prevent the security issues associated
             with an open bpf descriptor in unprivileged programs.  Even with
             dropped privileges, an open bpf descriptor can be abused by a
             rogue program to listen on any interface on the system, send
             packets on these interfaces if the descriptor was opened
             read-write, and send signals to arbitrary processes using the
             signaling mechanism of bpf.  By allowing only ``known safe''
             ioctls, the BIOCLOCK ioctl prevents this abuse.  The allowable
             ioctls include BIOCGHDRCMPLT, TIOCGPGRP, and FIONREAD.  Use of
             any other ioctl is denied with error EPERM.  Once a descriptor
             is locked, it is not possible to unlock it.  A process with root
             privileges is not affected by the lock.

             A privileged program can open a bpf device, drop privileges, set
             the interface, filters and modes on the descriptor, and lock it.
             Once the descriptor is locked, the system is safe from further
             abuse through the descriptor.  Locking a descriptor does not
             prevent writes.  If the application does not need to send
             packets through bpf, it can open the device read-only to prevent
             writing.  If sending packets is necessary, a write-filter can be
             set before locking the descriptor to prevent arbitrary packets
             from being sent out.

     BIOCGETIF (struct ifreq)
             Returns the name of the hardware interface that the file is
             listening on.  The name is returned in the ifr_name field of the
             struct ifreq.  All other fields are undefined.

     BIOCSETIF (struct ifreq)
             Sets the hardware interface associated with the file.  This
             command must be performed before any packets can be read.  The
             device is indicated by name using the ifr_name field of the
             struct ifreq.  Additionally, performs the actions of BIOCFLUSH.

     BIOCSRTIMEOUT, BIOCGRTIMEOUT (struct timeval)
             Set or get the read timeout parameter.  The timeval specifies the
             length of time to wait before timing out on a read request.  This
             parameter is initialized to zero by open(2), indicating no
             timeout.

     BIOCGSTATS (struct bpf_stat)
             Returns the following structure of packet statistics:

                   struct bpf_stat {
                           u_int bs_recv;
                           u_int bs_drop;
                   };

             The fields are:

             bs_recv  Number of packets received by the descriptor since
                      opened or reset (including any buffered since the last
                      read call).

             bs_drop  Number of packets which were accepted by the filter but
                      dropped by the kernel because of buffer overflows (i.e.,
                      the application's reads aren't keeping up with the
                      packet traffic).

     BIOCIMMEDIATE (u_int)
             Enable or disable ``immediate mode'', based on the truth value of
             the argument.  When immediate mode is enabled, reads return
             immediately upon packet reception.  Otherwise, a read will block
             until either the kernel buffer becomes full or a timeout occurs.
             This is useful for programs like rarpd(8), which must respond to
             messages in real time.  The default for a new file is off.

     BIOCSETF (struct bpf_program)
             Sets the filter program used by the kernel to discard
             uninteresting packets.  An array of instructions and its length
             are passed in using the following structure:

                   struct bpf_program {
                           int bf_len;
                           struct bpf_insn *bf_insns;
                   };

             The filter program is pointed to by the bf_insns field, while its
             length in units of struct bpf_insn is given by the bf_len field.
             Also, the actions of BIOCFLUSH are performed.

             See section FILTER MACHINE for an explanation of the filter
             language.

     BIOCSETWF (struct bpf_program)
             Sets the filter program used by the kernel to filter the packets
             written to the descriptor before the packets are sent out on the
             network.  See BIOCSETF for a description of the filter program.
             This ioctl also acts as BIOCFLUSH.

             Note that the filter operates on the packet data written to the
             descriptor.  If the ``header complete'' flag is not set, the
             kernel sets the link-layer source address of the packet after
             filtering.

     BIOCVERSION (struct bpf_version)
             Returns the major and minor version numbers of the filter
             language currently recognized by the kernel.  Before installing a
             filter, applications must check that the current version is
             compatible with the running kernel.  Version numbers are
             compatible if the major numbers match and the application minor
             is less than or equal to the kernel minor.  The kernel version
             number is returned in the following structure:

                   struct bpf_version {
                           u_short bv_major;
                           u_short bv_minor;
                   };

             The current version numbers are given by BPF_MAJOR_VERSION and
             BPF_MINOR_VERSION from <net/bpf.h>.  An incompatible filter may
             result in undefined behavior (most likely, an error returned by
             ioctl(2) or haphazard packet matching).
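
             The compatibility rule above reduces to a two-part comparison.
             A minimal sketch follows; bpf_version_ok() is a hypothetical
             helper, and the caller is assumed to pass BPF_MAJOR_VERSION
             and BPF_MINOR_VERSION as the application values and the
             bv_major and bv_minor fields returned by BIOCVERSION as the
             kernel values.

```c
#include <stdbool.h>

/*
 * True when a filter built against (app_major, app_minor) may be
 * installed on a kernel reporting (kern_major, kern_minor).
 */
bool
bpf_version_ok(unsigned app_major, unsigned app_minor,
    unsigned kern_major, unsigned kern_minor)
{
	return app_major == kern_major && app_minor <= kern_minor;
}
```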

     BIOCSRSIG, BIOCGRSIG (u_int)
             Set or get the receive signal.  This signal will be sent to the
             process or process group specified by FIOSETOWN.  It defaults to
             SIGIO.

     BIOCSHDRCMPLT, BIOCGHDRCMPLT (u_int)
             Set or get the status of the ``header complete'' flag.  Set to
             zero if the link level source address should be filled in
             automatically by the interface output routine.  Set to one if the
             link level source address will be written, as provided, to the
             wire.  This flag is initialized to zero by default.

   Standard ioctls
     bpf now supports several standard ioctls which allow the user to do
     asynchronous and/or non-blocking I/O to an open bpf file descriptor.

     FIONREAD (int)
             Returns the number of bytes that are immediately available for
             reading.

     SIOCGIFADDR (struct ifreq)
             Returns the address associated with the interface.

     FIONBIO (int)
             Set or clear non-blocking I/O.  If the argument is non-zero,
             enable non-blocking I/O.  If the argument is zero, disable
             non-blocking I/O.  If non-blocking I/O is enabled, the return
             value of a read while no data is available will be 0.  The
             read behavior is different from performing non-blocking reads on
             other file descriptors, which will return -1 and set errno to
             EAGAIN if no data is available.  Note: setting this overrides the
             timeout set by BIOCSRTIMEOUT.

     FIOASYNC (int)
             Enable or disable asynchronous I/O.  When enabled (argument is
             non-zero), the process or process group specified by FIOSETOWN
             will start receiving SIGIO signals when packets arrive.  Note
             that you must perform an FIOSETOWN command in order for this to
             take effect, as the system will not do it by default.  The signal
             may be changed via BIOCSRSIG.

     FIOSETOWN, FIOGETOWN (int)
             Set or get the process or process group (if negative) that should
             receive SIGIO when packets are available.  The signal may be
             changed using BIOCSRSIG (see above).

   BPF header
     The following structure is prepended to each packet returned by read(2):

           struct bpf_hdr {
                   struct bpf_timeval bh_tstamp;
                   u_int32_t       bh_caplen;
                   u_int32_t       bh_datalen;
                   u_int16_t       bh_hdrlen;
           };

     The fields, stored in host order, are as follows:

     bh_tstamp
             Time at which the packet was processed by the packet filter.

     bh_caplen
             Length of the captured portion of the packet.  This is the
             minimum of the truncation amount specified by the filter and the
             length of the packet.

     bh_datalen
             Length of the packet off the wire.  This value is independent of
             the truncation amount specified by the filter.

     bh_hdrlen
             Length of the BPF header, which may not be equal to
             sizeof(struct bpf_hdr).

     The bh_hdrlen field exists to account for padding between the header and
     the link level protocol.  The purpose here is to guarantee proper
     alignment of the packet data structures, which is required on
     alignment-sensitive architectures and improves performance on many other
     architectures.  The packet filter ensures that the bpf_hdr and the
     network layer header will be word aligned.  Suitable precautions must be
     taken when accessing the link layer protocol fields on alignment
     restricted machines.  (This isn't a problem on an Ethernet, since the
     type field is a short falling on an even offset, and the addresses are
     probably accessed in a bytewise fashion.)

     Additionally, individual packets are padded so that each starts on a word
     boundary.  This requires that an application has some knowledge of how to
     get from packet to packet.  The macro BPF_WORDALIGN is defined in
     <net/bpf.h> to facilitate this process.  It rounds up its argument to the
     nearest word aligned value (where a word is BPF_ALIGNMENT bytes wide).
     For example, if p points to the start of a packet, this expression will
     advance it to the next packet:

           p = (char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen)

     For the alignment mechanisms to work properly, the buffer passed to
     read(2) must itself be word aligned.  malloc(3) will always return an
     aligned buffer.
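
     The stepping rule can be exercised without a capture device.  The
     sketch below walks a buffer laid out as described and counts the
     packets in it.  Everything prefixed with x_ or X_ is a local stand-in,
     not the real <net/bpf.h> definition: struct x_hdr assumes the
     timestamp is two 32-bit fields, and X_ALIGNMENT assumes a 4-byte word,
     whereas the real BPF_ALIGNMENT is platform-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X_ALIGNMENT	sizeof(uint32_t)	/* assumed word size */
#define X_WORDALIGN(x)	(((x) + (X_ALIGNMENT - 1)) & ~(X_ALIGNMENT - 1))

struct x_hdr {				/* stand-in for struct bpf_hdr */
	uint32_t tv_sec, tv_usec;	/* stand-in for bh_tstamp */
	uint32_t caplen;
	uint32_t datalen;
	uint16_t hdrlen;
};

/* Bytes of header actually stored, excluding trailing struct padding. */
#define X_HDR_MIN	(offsetof(struct x_hdr, hdrlen) + sizeof(uint16_t))

/* Count the complete packets in the len bytes returned by one read. */
size_t
count_packets(const unsigned char *buf, size_t len)
{
	struct x_hdr h;
	size_t off = 0, n = 0;

	assert(buf != NULL);
	while (off < len && len - off >= X_HDR_MIN) {
		memset(&h, 0, sizeof(h));
		memcpy(&h, buf + off, X_HDR_MIN);	/* alignment-safe copy */
		if (h.hdrlen < X_HDR_MIN || h.hdrlen > len - off)
			break;				/* malformed record */
		if (h.caplen > len - off - h.hdrlen)
			break;				/* truncated record */
		n++;
		/* Packet data starts at buf + off + h.hdrlen. */
		off += X_WORDALIGN((size_t)h.hdrlen + h.caplen);
	}
	return n;
}
```

     The header is copied out with memcpy(3) rather than dereferenced in
     place, so the walk does not itself depend on the buffer's alignment.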

   Filter machine
     A filter program is an array of instructions with all branches forwardly
     directed, terminated by a ``return'' instruction.  Each instruction
     performs some action on the pseudo-machine state, which consists of an
     accumulator, index register, scratch memory store, and implicit program
     counter.

     The following structure defines the instruction format:

           struct bpf_insn {
                   u_int16_t       code;
                   u_char          jt;
                   u_char          jf;
                   u_int32_t       k;
           };

     The k field is used in different ways by different instructions, and the
     jt and jf fields are used as offsets by the branch instructions.  The
     opcodes are encoded in a semi-hierarchical fashion.  There are eight
     classes of instructions: BPF_LD, BPF_LDX, BPF_ST, BPF_STX, BPF_ALU,
     BPF_JMP, BPF_RET, and BPF_MISC.  Various other mode and operator bits
     are logically OR'd into the class to give the actual instructions.  The
     classes and modes are defined in <net/bpf.h>.  Below are the semantics
     for each defined bpf instruction.  We use the convention that A is the
     accumulator, X is the index register, P[] packet data, and M[] scratch
     memory store.  P[i:n] gives the data at byte offset ``i'' in the packet,
     interpreted as a word (n=4), unsigned halfword (n=2), or unsigned byte
     (n=1).  M[i] gives the i'th word in the scratch memory store, which is
     only addressed in word units.  The memory store is indexed from 0 to
     BPF_MEMWORDS-1.  k, jt, and jf are the corresponding fields in the
     instruction definition.  ``len'' refers to the length of the packet.
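
     To make the semi-hierarchical encoding concrete, the sketch below
     splits an opcode back into its class, size, and mode bits.  The mask
     values and class numbering follow the traditional <net/bpf.h> layout
     and are an assumption here; the X_ macros and class_name() are local
     names used only for this illustration.

```c
#include <stdint.h>

/* Assumed bit layout: low 3 bits class, then size, then addressing mode. */
#define X_CLASS(code)	((code) & 0x07)
#define X_SIZE(code)	((code) & 0x18)
#define X_MODE(code)	((code) & 0xe0)

/* Name of the instruction class encoded in an opcode. */
const char *
class_name(uint16_t code)
{
	static const char *const names[8] = {
		"BPF_LD", "BPF_LDX", "BPF_ST", "BPF_STX",
		"BPF_ALU", "BPF_JMP", "BPF_RET", "BPF_MISC"
	};

	return names[X_CLASS(code)];
}
```

     Under these assumptions the opcode 0x28 decomposes into class 0x00,
     size 0x08, and mode 0x20, which corresponds to BPF_LD+BPF_H+BPF_ABS.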

     BPF_LD   These instructions copy a value into the accumulator.  The type
             of the source operand is specified by an ``addressing mode'' and
             can be a constant (BPF_IMM), packet data at a fixed offset
             (BPF_ABS), packet data at a variable offset (BPF_IND), the packet
             length (BPF_LEN), or a word in the scratch memory store
             (BPF_MEM).  For BPF_IND and BPF_ABS, the data size must be
             specified as a word (BPF_W), halfword (BPF_H), or byte (BPF_B).
             The semantics of all recognized BPF_LD instructions follow.

             BPF_LD+BPF_W+BPF_ABS              A <- P[k:4]
             BPF_LD+BPF_H+BPF_ABS              A <- P[k:2]
             BPF_LD+BPF_B+BPF_ABS              A <- P[k:1]
             BPF_LD+BPF_W+BPF_IND              A <- P[X+k:4]
             BPF_LD+BPF_H+BPF_IND              A <- P[X+k:2]
             BPF_LD+BPF_B+BPF_IND              A <- P[X+k:1]
             BPF_LD+BPF_W+BPF_LEN              A <- len
             BPF_LD+BPF_IMM                    A <- k
             BPF_LD+BPF_MEM                    A <- M[k]

     BPF_LDX  These instructions load a value into the index register.  Note
             that the addressing modes are more restricted than those of the
             accumulator loads, but they include BPF_MSH, a hack for
             efficiently loading the IP header length.

             BPF_LDX+BPF_W+BPF_IMM             X <- k
             BPF_LDX+BPF_W+BPF_MEM             X <- M[k]
             BPF_LDX+BPF_W+BPF_LEN             X <- len
             BPF_LDX+BPF_B+BPF_MSH             X <- 4*(P[k:1]&0xf)

     BPF_ST   This instruction stores the accumulator into the scratch memory.
             We do not need an addressing mode since there is only one
             possibility for the destination.

             BPF_ST                            M[k] <- A

     BPF_STX  This instruction stores the index register in the scratch memory
             store.

             BPF_STX                           M[k] <- X

     BPF_ALU  The ALU instructions perform operations between the accumulator
             and index register or constant, and store the result back in the
             accumulator.  For binary operations, a source mode is required
             (BPF_K or BPF_X).

             BPF_ALU+BPF_ADD+BPF_K             A <- A + k
             BPF_ALU+BPF_SUB+BPF_K             A <- A - k
             BPF_ALU+BPF_MUL+BPF_K             A <- A * k
             BPF_ALU+BPF_DIV+BPF_K             A <- A / k
             BPF_ALU+BPF_AND+BPF_K             A <- A & k
             BPF_ALU+BPF_OR+BPF_K              A <- A | k
             BPF_ALU+BPF_LSH+BPF_K             A <- A << k
             BPF_ALU+BPF_RSH+BPF_K             A <- A >> k
             BPF_ALU+BPF_ADD+BPF_X             A <- A + X
             BPF_ALU+BPF_SUB+BPF_X             A <- A - X
             BPF_ALU+BPF_MUL+BPF_X             A <- A * X
             BPF_ALU+BPF_DIV+BPF_X             A <- A / X
             BPF_ALU+BPF_AND+BPF_X             A <- A & X
             BPF_ALU+BPF_OR+BPF_X              A <- A | X
             BPF_ALU+BPF_LSH+BPF_X             A <- A << X
             BPF_ALU+BPF_RSH+BPF_X             A <- A >> X
             BPF_ALU+BPF_NEG                   A <- -A

     BPF_JMP  The jump instructions alter flow of control.  Conditional jumps
             compare the accumulator against a constant (BPF_K) or the index
             register (BPF_X).  If the result is true (or non-zero), the true
             branch is taken, otherwise the false branch is taken.  Jump
             offsets are encoded in 8 bits so the longest jump is 256
             instructions.  However, the jump always (BPF_JA) opcode uses the
             32-bit k field as the offset, allowing arbitrarily distant
             destinations.  All conditionals use unsigned comparison
             conventions.

             BPF_JMP+BPF_JA                    pc += k
             BPF_JMP+BPF_JGT+BPF_K             pc += (A > k) ? jt : jf
             BPF_JMP+BPF_JGE+BPF_K             pc += (A >= k) ? jt : jf
             BPF_JMP+BPF_JEQ+BPF_K             pc += (A == k) ? jt : jf
             BPF_JMP+BPF_JSET+BPF_K            pc += (A & k) ? jt : jf
             BPF_JMP+BPF_JGT+BPF_X             pc += (A > X) ? jt : jf
             BPF_JMP+BPF_JGE+BPF_X             pc += (A >= X) ? jt : jf
             BPF_JMP+BPF_JEQ+BPF_X             pc += (A == X) ? jt : jf
             BPF_JMP+BPF_JSET+BPF_X            pc += (A & X) ? jt : jf

     BPF_RET  The return instructions terminate the filter program and specify
             the amount of packet to accept (i.e., they return the truncation
             amount) or, for the write filter, the maximum acceptable size for
             the packet (i.e., the packet is dropped if it is larger than the
             returned amount).  A return value of zero indicates that the
             packet should be ignored/dropped.  The return value is either a
             constant (BPF_K) or the accumulator (BPF_A).

             BPF_RET + BPF_A                   Accept A bytes.
             BPF_RET + BPF_K                   Accept k bytes.
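
             The two meanings of the return value can be written as a pair
             of small helpers.  This is a sketch of the rules as stated
             above, not the kernel's code; captured_len() and
             write_permitted() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read filter: bytes delivered for a packet of wirelen bytes. */
uint32_t
captured_len(uint32_t ret, uint32_t wirelen)
{
	return ret < wirelen ? ret : wirelen;
}

/* Write filter: whether a packet of pktlen bytes may be sent. */
bool
write_permitted(uint32_t ret, uint32_t pktlen)
{
	return ret != 0 && pktlen <= ret;
}
```

             A read filter that returns (u_int)-1 therefore captures every
             accepted packet whole, since the minimum is always the packet
             length.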

     BPF_MISC The miscellaneous category was created for anything that doesn't
             fit into the above classes, and for any new instructions that
             might need to be added.  Currently, these are the register
             transfer instructions that copy the index register to the
             accumulator or vice versa.

             BPF_MISC+BPF_TAX                  X <- A
             BPF_MISC+BPF_TXA                  A <- X

     The bpf interface provides the following macros to facilitate array
     initializers:

           BPF_STMT (opcode, operand)

           BPF_JUMP (opcode, operand, true_offset, false_offset)
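
     The instruction semantics can be tried out in user space.  The sketch
     below is a deliberately small interpreter covering only the absolute
     and indexed loads, BPF_MSH, the JEQ and JSET jumps against a constant,
     and a constant return.  It is not the kernel's implementation.  The
     X_ names are local stand-ins for the <net/bpf.h> definitions, their
     numeric values are assumed to follow the traditional header, and the
     program is assumed to be well formed (forward jumps only, terminated
     by a return).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct x_insn {			/* stand-in for struct bpf_insn */
	uint16_t code;
	uint8_t  jt, jf;
	uint32_t k;
};

enum {
	X_LD = 0x00, X_LDX = 0x01, X_JMP = 0x05, X_RET = 0x06,	/* classes */
	X_W = 0x00, X_H = 0x08, X_B = 0x10,			/* sizes */
	X_ABS = 0x20, X_IND = 0x40, X_MSH = 0xa0,		/* modes */
	X_JEQ = 0x10, X_JSET = 0x40, X_K = 0x00			/* jumps */
};

/* Initializer macros in the style of BPF_STMT and BPF_JUMP. */
#define X_STMT(code, k)		{ (uint16_t)(code), 0, 0, (k) }
#define X_JUMP(code, k, jt, jf)	{ (uint16_t)(code), (jt), (jf), (k) }

/* Load n bytes (1, 2 or 4) at offset off, most significant byte first. */
static int
load(const uint8_t *p, size_t len, size_t off, size_t n, uint32_t *out)
{
	uint32_t v = 0;
	size_t i;

	if (off > len || n > len - off)
		return -1;
	for (i = 0; i < n; i++)
		v = (v << 8) | p[off + i];
	*out = v;
	return 0;
}

/* Run the program over a packet; returns bytes to accept, 0 to reject. */
uint32_t
run_filter(const struct x_insn *pc, const uint8_t *p, size_t len)
{
	uint32_t A = 0, X = 0, v;
	size_t n;

	assert(pc != NULL && p != NULL);
	for (;; pc++) {
		n = (pc->code & 0x18) == X_W ? 4 :
		    (pc->code & 0x18) == X_H ? 2 : 1;
		switch (pc->code) {
		case X_LD + X_W + X_ABS:
		case X_LD + X_H + X_ABS:
		case X_LD + X_B + X_ABS:
			if (load(p, len, pc->k, n, &A) != 0)
				return 0;
			break;
		case X_LD + X_W + X_IND:
		case X_LD + X_H + X_IND:
		case X_LD + X_B + X_IND:
			if (load(p, len, (size_t)X + pc->k, n, &A) != 0)
				return 0;
			break;
		case X_LDX + X_B + X_MSH:
			if (load(p, len, pc->k, 1, &v) != 0)
				return 0;
			X = 4 * (v & 0xf);
			break;
		case X_JMP + X_JEQ + X_K:
			pc += (A == pc->k) ? pc->jt : pc->jf;
			break;
		case X_JMP + X_JSET + X_K:
			pc += (A & pc->k) ? pc->jt : pc->jf;
			break;
		case X_RET + X_K:
			return pc->k;
		default:
			return 0;	/* instruction outside this subset */
		}
	}
}
```

     Jump offsets are relative to the instruction following the jump, which
     is why the loop's own increment is applied after jt or jf is added.
     A load that falls outside the packet rejects it, so a filter never
     reads past the captured data.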

FILES

     /dev/bpf[0-9]  BPF devices

EXAMPLES

     The following filter is taken from the Reverse ARP daemon.  It accepts
     only Reverse ARP requests.

           struct bpf_insn insns[] = {
                   BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
                   BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
                   BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
                       sizeof(struct ether_header)),
                   BPF_STMT(BPF_RET+BPF_K, 0),
           };

     This filter accepts only IP packets between host 128.3.112.15 and
     128.3.112.35.

           struct bpf_insn insns[] = {
                   BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
                   BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
                   BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
                   BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
                   BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
                   BPF_STMT(BPF_RET+BPF_K, 0),
           };

     Finally, this filter returns only TCP finger packets.  We must parse the
     IP header to reach the TCP header.  The BPF_JSET instruction checks that
     the IP fragment offset is 0 so we are sure that we have a TCP header.

           struct bpf_insn insns[] = {
                   BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
                   BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
                   BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
                   BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
                   BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
                   BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
                   BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
                   BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
                   BPF_STMT(BPF_RET+BPF_K, 0),
           };

SEE ALSO

     ioctl(2), read(2), select(2), signal(3), MAKEDEV(8), tcpdump(8)

     McCanne, S. and Jacobson V., An efficient, extensible, and portable
     network monitor.

HISTORY

     The Enet packet filter was created in 1980 by Mike Accetta and Rick
     Rashid at Carnegie-Mellon University.  Jeffrey Mogul, at Stanford, ported
     the code to BSD and continued its development from 1983 on.  Since then,
     it has evolved into the Ultrix Packet Filter at DEC, a STREAMS NIT module
     under SunOS 4.1, and BPF.

AUTHORS

     Steve McCanne of Lawrence Berkeley Laboratory implemented BPF in Summer
     1990.  Much of the design is due to Van Jacobson.

BUGS

     The read buffer must be of a fixed size (returned by the BIOCGBLEN
     ioctl).

     A file that does not request promiscuous mode may receive promiscuously
     received packets as a side effect of another file requesting this mode on
     the same hardware interface.  This could be fixed in the kernel with
     additional processing overhead.  However, we favor the model where all
     files must assume that the interface is promiscuous, and if so desired,
     must utilize a filter to reject foreign packets.

     Data link protocols with variable length headers are not currently
     supported.

OpenBSD 3.6                           May 23, 1991