% ip-cref.tex revision 71e5815105fb0b86af7df9c719f7c106f05f29c0
\documentstyle[12pt,twoside]{article}
\def\TITLE{IP Command Reference}
\input preamble
\begin{center}
\Large\bf IP Command Reference.
\end{center}


\begin{center}
{ \large Alexey~N.~Kuznetsov } \\
\em Institute for Nuclear Research, Moscow \\
\verb|kuznet@ms2.inr.ac.ru| \\
\rm April 14, 1999
\end{center}

\vspace{5mm}

\tableofcontents

\newpage

\section{About this document}

This document presents a comprehensive description of the \verb|ip| utility
from the \verb|iproute2| package. It is not a tutorial or a user's guide.
It is a {\em dictionary\/}, not explaining terms,
but translating them into other terms, which may also be unknown to the reader.
However, the document is self-contained and the reader, provided they have a
basic networking background, will find enough information
and examples to understand and configure Linux-2.2 IP and IPv6
networking.

This document is split into sections that explain \verb|ip| commands
and options, interpret \verb|ip| output and give a few examples.
More extensive examples and some topics that require more elaborate
discussion are in the appendix.

The paragraphs beginning with NB contain side notes, warnings about
bugs and design drawbacks. They may be skipped on first reading.

\section{{\tt ip} --- command syntax}

The generic form of an \verb|ip| command is:
\begin{verbatim}
ip [ OPTIONS ] OBJECT [ COMMAND [ ARGUMENTS ]]
\end{verbatim}
where \verb|OPTIONS| is a set of optional modifiers affecting the
general behaviour of the \verb|ip| utility or changing its output. All options
begin with the character \verb|'-'| and may be used in either long or abbreviated
form. Currently, the following options are available:

\begin{itemize}
\item \verb|-V|, \verb|-Version|

--- print the version of the \verb|ip| utility and exit.


\item \verb|-s|, \verb|-stats|, \verb|-statistics|

--- output more information. If the option
appears twice or more, the amount of information increases.
As a rule, the additional information consists of statistics or time values.


\item \verb|-f|, \verb|-family| followed by a protocol family
identifier: \verb|inet|, \verb|inet6| or \verb|link|.

--- force the use of the given protocol family. If the option is not present,
the protocol family is guessed from other arguments. If the rest of the command
line does not give enough information to guess the family, \verb|ip| falls back to the default
one, usually \verb|inet| or \verb|any|. \verb|link| is a special family
identifier meaning that no networking protocol is involved.

\item \verb|-4|

--- shortcut for \verb|-family inet|.

\item \verb|-6|

--- shortcut for \verb|-family inet6|.

\item \verb|-0|

--- shortcut for \verb|-family link|.


\item \verb|-o|, \verb|-oneline|

--- output each record on a single line, replacing line feeds
with the \verb|'\'| character. This is convenient when you want to
count records with \verb|wc| or to \verb|grep| the output. The trivial
script \verb|rtpr| converts the output back into readable form.

\item \verb|-r|, \verb|-resolve|

--- use the system's name resolver to print DNS names instead of
host addresses.

\begin{NB}
 Do not use this option when reporting bugs or asking for advice.
\end{NB}
\begin{NB}
 \verb|ip| never uses DNS to resolve names to addresses.
\end{NB}

\end{itemize}
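
The effect of \verb|-oneline| is easy to undo. What follows is a minimal
sketch of an \verb|rtpr|-like filter. It is not the \verb|rtpr| script
shipped with \verb|iproute2|, and the function name \verb|unfold_oneline| is
hypothetical. The sketch assumes that every \verb|'\'| in the input stands for a
line feed that \verb|ip| replaced.
\begin{verbatim}
#!/bin/sh
# unfold_oneline: hypothetical rtpr-like filter (not the rtpr script
# shipped with iproute2). It assumes every backslash in the input is a
# line feed replaced by "ip -o", so a literal backslash inside a record
# would be turned into a line break as well.
unfold_oneline() {
    tr '\\' '\n'
}

# Usage on a live system:
#   ip -o link show | wc -l                        # one line per device
#   ip -o link show | grep eth0 | unfold_oneline   # one record, unfolded
\end{verbatim}
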

\verb|OBJECT| is the object to manage or to get information about.
The object types currently understood by \verb|ip| are:

\begin{itemize}
\item \verb|link| --- network device
\item \verb|address| --- protocol (IP or IPv6) address on a device
\item \verb|neighbour| --- ARP or NDISC cache entry
\item \verb|route| --- routing table entry
\item \verb|rule| --- rule in the routing policy database
\item \verb|maddress| --- multicast address
\item \verb|mroute| --- multicast routing cache entry
\item \verb|tunnel| --- tunnel over IP
\end{itemize}

Again, the names of all objects may be written in full or
abbreviated form, f.e.\ \verb|address| is abbreviated as \verb|addr|
or just \verb|a|.

\verb|COMMAND| specifies the action to perform on the object.
The set of possible actions depends on the object type.
As a rule, it is possible to \verb|add|, \verb|delete| and
\verb|show| (or \verb|list|) objects, but some objects
do not allow all of these operations or have some additional commands.
The \verb|help| command is available for all objects. It prints
out a list of available commands and argument syntax conventions.

If no command is given, a default command is assumed.
Usually it is \verb|list| or, if the objects of this class
cannot be listed, \verb|help|.

\verb|ARGUMENTS| is a list of arguments to the command.
The arguments depend on the command and object. There are two types of arguments:
{\em flags\/}, consisting of a single keyword, and {\em parameters\/},
consisting of a keyword followed by a value. For convenience,
each command has a {\em default parameter\/}
whose keyword may be omitted. F.e.\ parameter \verb|dev| is the default
for the {\tt ip link} command, so {\tt ip link ls eth0} is equivalent
to {\tt ip link ls dev eth0}.
In the command descriptions below such parameters
are marked ``(default)''.

Almost all keywords may be abbreviated to their first few letters
(or even a single letter). The shortcuts are convenient when \verb|ip| is used interactively,
but they are not recommended in scripts or when reporting bugs
or asking for advice. ``Officially'' allowed abbreviations are listed
in the document body.
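
The abbreviation rule can be pictured as a prefix test. The helper
\verb|is_abbrev| below is a hypothetical sketch, not code from
\verb|iproute2|. It only checks that a word is a non-empty leading part
of a keyword. It does not model how \verb|ip| chooses between several
keywords that share a prefix. Not every listed short form is a
prefix, either: \verb|lst| is not a prefix of \verb|list|, so it has to be
recognized as a keyword of its own.
\begin{verbatim}
#!/bin/sh
# is_abbrev WORD KEYWORD: succeed if WORD is a non-empty leading part
# of KEYWORD, e.g. "addr" or "a" for "address". Hypothetical helper.
is_abbrev() {
    [ -n "$1" ] || return 1
    case "$2" in
        "$1"*) return 0 ;;   # "$1" is quoted, so it is matched literally
        *)     return 1 ;;
    esac
}

# Usage:
#   is_abbrev addr address && echo yes    # prints yes
#   is_abbrev lst list || echo no         # prints no: not a prefix
\end{verbatim}
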


\section{{\tt ip} --- error messages}

\verb|ip| may fail for one of the following reasons:

\begin{itemize}
\item
A syntax error on the command line: an unknown keyword, an incorrectly formatted
IP address, etc. In this case \verb|ip| prints an error message
and exits. As a rule, the error message will contain information
about the reason for the failure. Sometimes it also prints a help page.

\item
The arguments did not pass verification for self-consistency.

\item
\verb|ip| failed to compile a kernel request from the arguments
because the user did not give enough information.

\item
The kernel returned an error to some syscall. In this case \verb|ip|
prints the error message, as it is output with \verb|perror(3)|,
prefixed with a comment and a syscall identifier.

\item
The kernel returned an error to some RTNETLINK request.
In this case \verb|ip| prints the error message, as it is output
with \verb|perror(3)|, prefixed with ``RTNETLINK answers:''.

\end{itemize}

All the operations are atomic, i.e.\ if the \verb|ip| utility fails,
it does not change anything in the system. One harmful exception is
the \verb|ip link| command
(Sec.~\ref{IP-LINK}, p.~\pageref{IP-LINK}),
which may change only some of the device parameters given
on the command line.

It is difficult to list all the error messages (especially
syntax errors). However, as a rule, their meaning is clear
from the context of the command.

The most common mistakes are:

\begin{enumerate}
\item Netlink is not configured in the kernel. The message is:
\begin{verbatim}
Cannot open netlink socket: Invalid value
\end{verbatim}

\item RTNETLINK is not configured in the kernel. In this case
one of the following messages may be printed, depending on the command:
\begin{verbatim}
Cannot talk to rtnetlink: Connection refused
Cannot send dump request: Connection refused
\end{verbatim}

\item The \verb|CONFIG_IP_MULTIPLE_TABLES| option was not selected
when configuring the kernel. In this case any attempt to use the
\verb|ip| \verb|rule| command will fail, f.e.
\begin{verbatim}
kuznet@kaiser $ ip rule list
RTNETLINK error: Invalid argument
dump terminated
\end{verbatim}

\end{enumerate}


\section{{\tt ip link} --- network device configuration}
\label{IP-LINK}

\paragraph{Object:} A \verb|link| is a network device and the corresponding
commands display and change the state of devices.

\paragraph{Commands:} \verb|set| and \verb|show| (or \verb|list|).

\subsection{{\tt ip link set} --- change device attributes}

\paragraph{Abbreviations:} \verb|set|, \verb|s|.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|dev NAME| (default)

--- \verb|NAME| specifies the network device on which to operate.

\item \verb|up| and \verb|down|

--- change the state of the device to \verb|UP| or \verb|DOWN|.

\item \verb|arp on| or \verb|arp off|

--- change the \verb|NOARP| flag on the device.

\begin{NB}
This operation is {\em not allowed\/} if the device is in state \verb|UP|.
However, neither the \verb|ip| utility nor the kernel checks for this condition,
and changing this flag while the device is running can give
unpredictable results.
\end{NB}

\item \verb|multicast on| or \verb|multicast off|

--- change the \verb|MULTICAST| flag on the device.

\item \verb|dynamic on| or \verb|dynamic off|

--- change the \verb|DYNAMIC| flag on the device.

\item \verb|name NAME|

--- change the name of the device. This operation is not
recommended if the device is running or already has some addresses
configured.

\item \verb|txqueuelen NUMBER| or \verb|txqlen NUMBER|

--- change the transmit queue length of the device.

\item \verb|mtu NUMBER|

--- change the MTU of the device.

\item \verb|address LLADDRESS|

--- change the station address of the interface.

\item \verb|broadcast LLADDRESS|, \verb|brd LLADDRESS| or \verb|peer LLADDRESS|

--- change the link layer broadcast address or the peer address when
the interface is \verb|POINTOPOINT|.

\vskip 1mm
\begin{NB}
For most devices (f.e.\ for Ethernet) changing the link layer
broadcast address will break networking.
Do not use it if you do not understand what this operation really does.
\end{NB}

\item \verb|netns PID|

--- move the device to the network namespace associated with the process \verb|PID|.

\end{itemize}

\vskip 1mm
\begin{NB}
The \verb|PROMISC| and \verb|ALLMULTI| flags are considered
obsolete and should not be changed administratively, though
the {\tt ip} utility will allow that.
\end{NB}

\paragraph{Warning:} If multiple parameter changes are requested,
\verb|ip| aborts as soon as any of the changes fails, and the changes
made before the failure are not undone.
This is the only case when \verb|ip| can move the system to
an unpredictable state. The solution is to avoid changing
several parameters with one {\tt ip link set} call.
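
A script can follow this advice by issuing one {\tt ip link set} call per
parameter and stopping at the first failure. That way it knows exactly which
changes were applied. The helper \verb|link_set_each| below is a
hypothetical sketch. It assumes each change is passed as one argument, such as
\verb|"mtu 1400"|, and it relies on the shell to split that argument into words.
\begin{verbatim}
#!/bin/sh
# link_set_each DEV CHANGE...: run one "ip link set" per CHANGE
# (e.g. "mtu 1400", "arp off", "up") and stop at the first failure.
# Hypothetical helper; each CHANGE is split on whitespace into words.
link_set_each() {
    dev=$1
    shift
    for change in "$@"; do
        # $change is left unquoted on purpose: "arp off" -> arp off
        if ! ip link set dev "$dev" $change; then
            echo "failed at: ip link set dev $dev $change" >&2
            return 1
        fi
    done
    return 0
}

# Usage (as root): change NOARP only while the device is down
#   link_set_each eth0 down "arp off" up
\end{verbatim}
The usage line takes the device down before changing \verb|arp| and brings it
up again afterwards. This also respects the restriction on the \verb|NOARP|
flag noted above.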

\paragraph{Examples:}
\begin{itemize}
\item \verb|ip link set dummy address 00:00:00:00:00:01|

--- change the station address of the interface \verb|dummy|.

\item \verb|ip link set dummy up|

--- start the interface \verb|dummy|.

\end{itemize}


\subsection{{\tt ip link show} --- display device attributes}
\label{IP-LINK-SHOW}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|lst|, \verb|sh|, \verb|ls|,
\verb|l|.

\paragraph{Arguments:}
\begin{itemize}
\item \verb|dev NAME| (default)

--- \verb|NAME| specifies the network device to show.
If this argument is omitted, all devices are listed.

\item \verb|up|

--- only display running interfaces.

\end{itemize}


\paragraph{Output format:}

\begin{verbatim}
kuznet@alisa:~ $ ip link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
kuznet@alisa:~ $ ip link ls sit0
5: sit0@NONE: <NOARP,UP> mtu 1480 qdisc noqueue
    link/sit 0.0.0.0 brd 0.0.0.0
kuznet@alisa:~ $ ip link ls dummy
2: dummy: <BROADCAST,NOARP> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
kuznet@alisa:~ $
\end{verbatim}


The number before the first colon is an {\em interface index\/} or {\em ifindex\/}.
This number uniquely identifies the interface. It is followed by the {\em interface name\/}
(\verb|eth0|, \verb|sit0| etc.). The interface name is also
unique at any given moment. However, the interface may disappear from the
list (f.e.\ when the corresponding driver module is unloaded) and another
one with the same name may be created later. Besides that,
the administrator may change the name of any device with
\verb|ip| \verb|link| \verb|set| \verb|name|
to make it more intelligible.

The interface name may have another name or \verb|NONE| appended
after the \verb|@| sign. This means that this device is bound to some other
device,
i.e.\ packets sent through it are encapsulated and sent via the ``master''
device. If the name is \verb|NONE|, the master is unknown.
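
To illustrate this layout, the hypothetical helper \verb|parse_link_header|
below splits the first line of a record into the ifindex, the interface name
and the master name. It is a sketch that assumes the line has the form
\verb|INDEX: NAME[@MASTER]: <FLAGS> ...| shown above. It prints \verb|-|
when there is no \verb|@| suffix.
\begin{verbatim}
#!/bin/sh
# parse_link_header LINE: print "INDEX NAME MASTER" for a header line
# such as "5: sit0@NONE: <NOARP,UP> mtu 1480 qdisc noqueue".
# MASTER is "-" when no "@" suffix is present. Hypothetical helper.
parse_link_header() {
    index=${1%%:*}             # text before the first colon
    rest=${1#*": "}            # drop "INDEX: "
    ifname=${rest%%:*}         # "NAME" or "NAME@MASTER"
    case "$ifname" in
        *@*) master=${ifname#*@}; ifname=${ifname%%@*} ;;
        *)   master=- ;;
    esac
    echo "$index $ifname $master"
}
\end{verbatim}
For the sample output above, the sketch prints \verb|5 sit0 NONE| for
\verb|sit0| and \verb|3 eth0 -| for \verb|eth0|.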

Then we see the interface {\em mtu\/} (``maximum transmission unit''). This determines
the maximum size of data which can be sent as a single packet over this interface.

{\em qdisc\/} (``queuing discipline'') shows the queuing algorithm used
on the interface. In particular, \verb|noqueue| means that this interface
does not queue anything and \verb|noop| means that the interface is in blackhole
mode, i.e.\ all packets sent to it are immediately discarded.
{\em qlen\/} is the default transmit queue length of the device measured
in packets.

The interface flags are summarized in the angle brackets.

\begin{itemize}
\item \verb|UP| --- the device is turned on. It is ready to accept
packets for transmission and it may inject into the kernel packets received
from other nodes on the network.

\item \verb|LOOPBACK| --- the interface does not communicate with other
hosts. All packets sent through it will be returned
and nothing but bounced packets can be received.

\item \verb|BROADCAST| --- the device has the facility to send packets
to all hosts sharing the same link. A typical example is an Ethernet link.

\item \verb|POINTOPOINT| --- the link has only two ends with one node
attached to each end. All packets sent to this link will reach the peer
and all packets received by us came from this single peer.

If neither \verb|LOOPBACK| nor \verb|BROADCAST| nor \verb|POINTOPOINT|
is set, the interface is assumed to be NBMA (Non-Broadcast Multi-Access).
This is the most generic type of device and the most complicated one, because
a host attached to an NBMA link has no means to send to anyone
without additionally configured information.

\item \verb|MULTICAST| --- is an advisory flag indicating that the interface
is aware of multicasting, i.e.\ sending packets to some subset of neighbouring
nodes. Broadcasting is a particular case of multicasting, where the multicast
group consists of all nodes on the link. It is important to emphasize
that software {\em must not\/} interpret the absence of this flag as the inability
to use multicasting on this interface. Any \verb|POINTOPOINT| and
\verb|BROADCAST| link is multicasting by definition, because we have
direct access to all the neighbours and, hence, to any part of them.
Certainly, the use of high bandwidth multicast transfers is not recommended
on broadcast-only links because of high expense, but it is not strictly
prohibited.

\item \verb|PROMISC| --- the device listens to and feeds to the kernel all
traffic on the link, even if it is not destined for us, not broadcast
and not destined for a multicast group of which we are a member. Usually
this mode exists only on broadcast links and is used by bridges and for network
monitoring.

\item \verb|ALLMULTI| --- the device receives all multicast packets
wandering on the link. This mode is used by multicast routers.

\item \verb|NOARP| --- this flag is different from the other ones. It has
no invariant value and its interpretation depends on the network protocols
involved. As a rule, it indicates that the device needs no address
resolution and that the software or hardware knows how to deliver packets
without any help from the protocol stacks.

\item \verb|DYNAMIC| --- is an advisory flag indicating that the interface is
dynamically created and destroyed.

\item \verb|SLAVE| --- this interface is bonded to some other interfaces
to share link capacities.

\end{itemize}
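
The rule for recognizing NBMA interfaces can be sketched as a classifier over
the comma-separated flag list printed in the angle brackets. The helpers
\verb|has_flag| and \verb|link_kind| are hypothetical. The order of the checks
(\verb|LOOPBACK| first, then \verb|BROADCAST|, then \verb|POINTOPOINT|) is an
assumption of this sketch.
\begin{verbatim}
#!/bin/sh
# has_flag FLAGS FLAG: succeed if FLAG is an element of the
# comma-separated FLAGS list (so "ARP" does not match "NOARP").
has_flag() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# link_kind FLAGS: print loopback, broadcast, pointopoint or nbma
# (none of the three). Hypothetical helper; check order is assumed.
link_kind() {
    if has_flag "$1" LOOPBACK; then echo loopback
    elif has_flag "$1" BROADCAST; then echo broadcast
    elif has_flag "$1" POINTOPOINT; then echo pointopoint
    else echo nbma
    fi
}
\end{verbatim}
For the \verb|sit0| device shown earlier, whose flags are \verb|NOARP,UP|,
the sketch prints \verb|nbma|.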

\vskip 1mm
\begin{NB}
There are other flags but they are either obsolete (\verb|NOTRAILERS|)
or not implemented (\verb|DEBUG|) or specific to some devices
(\verb|MASTER|, \verb|AUTOMEDIA| and \verb|PORTSEL|). We do not discuss
them here.
\end{NB}


The second line contains information on the link layer addresses
associated with the device. The first word (\verb|ether|, \verb|sit|)
defines the interface hardware type. This type determines the format and semantics
of the addresses and is logically part of the address.
The default format of the station address and the broadcast address
(or the peer address for pointopoint links) is a
sequence of hexadecimal bytes separated by colons, but some link
types may have their own natural address format, f.e.\ addresses
of tunnels over IP are printed as dotted-quad IP addresses.

\vskip 1mm
\begin{NB}
  NBMA links have no well-defined broadcast or peer address;
  however, this field may contain useful information, f.e.\
  the address of the broadcast relay or the address of the ARP server.
\end{NB}
\begin{NB}
Multicast addresses are not shown by this command; see
\verb|ip maddr ls| in~Sec.~\ref{IP-MADDR} (p.~\pageref{IP-MADDR} of this
document).
\end{NB}


\paragraph{Statistics:} With the \verb|-statistics| option, \verb|ip| also
prints interface statistics:

\begin{verbatim}
kuznet@alisa:~ $ ip -s link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2449949362 2786187  0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    178558497  1783945  332     0       332     35172
kuznet@alisa:~ $
\end{verbatim}
The \verb|RX:| and \verb|TX:| lines summarize receiver and transmitter
statistics. They contain:
\begin{itemize}
\item \verb|bytes| --- the total number of bytes received or transmitted
on the interface. This number wraps around when it exceeds the maximum value
of the data type natural for the architecture, so continuous monitoring requires
a user level daemon that samples it periodically.
\item \verb|packets| --- the total number of packets received or transmitted
on the interface.
\item \verb|errors| --- the total number of receiver or transmitter errors.
\item \verb|dropped| --- the total number of packets dropped due to lack
of resources.
\item \verb|overrun| --- the total number of receiver overruns resulting
in dropped packets. As a rule, if the interface is overrun, it means
serious problems in the kernel or that your machine is too slow
for this interface.
\item \verb|mcast| --- the total number of received multicast packets. This counter
is only supported by a few devices.
\item \verb|carrier| --- the total number of link media failures, f.e.\ because
of lost carrier.
\item \verb|collsns| --- the total number of collision events
on Ethernet-like media. This number may have a different sense on other
link types.
\item \verb|compressed| --- the total number of compressed packets. This is
available only for links using VJ header compression.
\end{itemize}
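
A monitoring daemon has to allow for a counter wrapping between two samples.
The sketch below assumes a 32-bit counter that wraps at most once between
samples. Both are assumptions, since the real width depends on the
architecture. The helper name \verb|counter_delta| is hypothetical. It takes
the difference modulo $2^{32}$ and needs a shell with 64-bit arithmetic.
\begin{verbatim}
#!/bin/sh
# counter_delta OLD NEW: increase of a 32-bit free-running counter between
# two samples, assuming it wrapped at most once. Hypothetical helper;
# needs 64-bit shell arithmetic (bash, or dash on a 64-bit host).
counter_delta() {
    echo $(( ($2 - $1) & 0xFFFFFFFF ))
}

# Usage: sample the TX byte counter twice and print the increase
#   counter_delta 178558497 178560000    # prints 1503
#   counter_delta 4294967290 100         # prints 106 (one wrap)
\end{verbatim}
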


If the \verb|-s| option is entered twice or more,
\verb|ip| prints more detailed statistics on receiver
and transmitter errors.

\begin{verbatim}
kuznet@alisa:~ $ ip -s -s link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2449949362 2786187  0       0       0       0
    RX errors: length   crc     frame   fifo    missed
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    178558497  1783945  332     0       332     35172
    TX errors: aborted  fifo    window  heartbeat
               0        0       0       332
kuznet@alisa:~ $
\end{verbatim}
These error names are pure Ethernetisms. Other devices
may have non-zero values in these fields, but they may be
interpreted differently.


\section{{\tt ip address} --- protocol address management}

\paragraph{Abbreviations:} \verb|address|, \verb|addr|, \verb|a|.

\paragraph{Object:} The \verb|address| is a protocol (IP or IPv6) address attached
to a network device. Each device must have at least one address
to use the corresponding protocol. It is possible to have several
different addresses attached to one device. These addresses are not
discriminated, so the term {\em alias\/} is not quite appropriate
for them and we do not use it in this document.

The \verb|ip addr| command displays addresses and their properties,
adds new addresses and deletes old ones.

\paragraph{Commands:} \verb|add|, \verb|delete|, \verb|flush| and \verb|show|
(or \verb|list|).


\subsection{{\tt ip address add} --- add a new protocol address}
\label{IP-ADDR-ADD}

\paragraph{Abbreviations:} \verb|add|, \verb|a|.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|dev NAME|

\noindent--- the name of the device to add the address to.

\item \verb|local ADDRESS| (default)

--- the address of the interface. The format of the address depends
on the protocol. It is a dotted quad for IP and a sequence of hexadecimal halfwords
separated by colons for IPv6. The \verb|ADDRESS| may be followed by
a slash and a decimal number which encodes the network prefix length.


\item \verb|peer ADDRESS|

--- the address of the remote endpoint for pointopoint interfaces.
Again, the \verb|ADDRESS| may be followed by a slash and a decimal number,
encoding the network prefix length. If a peer address is specified,
the local address {\em cannot\/} have a prefix length. The network prefix is associated
with the peer rather than with the local address.


\item \verb|broadcast ADDRESS|

--- the broadcast address on the interface.

It is possible to use the special symbols \verb|'+'| and \verb|'-'|
instead of the broadcast address. In this case, the broadcast address
is derived by setting (\verb|'+'|) or resetting (\verb|'-'|) the host bits
of the interface prefix.

\vskip 1mm
\begin{NB}
Unlike \verb|ifconfig|, the \verb|ip| utility {\em does not\/} set any broadcast
address unless explicitly requested.
\end{NB}


\item \verb|label NAME|

--- each address may be tagged with a label string.
In order to preserve compatibility with Linux-2.0 net aliases,
this string must coincide with the name of the device or must be prefixed
with the device name followed by a colon.


\item \verb|scope SCOPE_VALUE|

--- the scope of the area where this address is valid.
The available scopes are listed in the file \verb|/etc/iproute2/rt_scopes|.
Predefined scope values are:

 \begin{itemize}
	\item \verb|global| --- the address is globally valid.
	\item \verb|site| --- (IPv6 only) the address is site local,
	i.e.\ it is valid inside this site.
	\item \verb|link| --- the address is link local, i.e.\
	it is valid only on this device.
	\item \verb|host| --- the address is valid only inside this host.
 \end{itemize}

Appendix~\ref{ADDR-SEL} (p.~\pageref{ADDR-SEL} of this document)
contains more details on address scopes.

\end{itemize}
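
For IPv4, the \verb|'+'| and \verb|'-'| rules for \verb|broadcast| reduce to
integer arithmetic on the 32-bit address. The sketch below is hypothetical
and IPv4-only. It builds the netmask from the prefix length and then sets
(\verb|'+'|) or clears (\verb|'-'|) the host bits. It does no input
validation, and the shell would read octets with leading zeros as octal.
\begin{verbatim}
#!/bin/sh
# IPv4-only sketch of "broadcast +" / "broadcast -": derive the address
# from ADDRESS/PREFIXLEN. Hypothetical helpers, no input validation.
# Needs 64-bit shell arithmetic.

ip2int() {                  # dotted quad -> 32-bit integer
    old_ifs=$IFS
    IFS=.
    set -- $1               # split "a.b.c.d" into four fields
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

int2ip() {                  # 32-bit integer -> dotted quad
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

netmask() {                 # prefix length -> netmask as an integer
    echo $(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF ))
}

derive_brd() {              # derive_brd ADDRESS/PREFIXLEN + | -
    addr=$(ip2int "${1%/*}")
    mask=$(netmask "${1#*/}")
    case "$2" in
        +) int2ip $(( (addr & mask) | (~mask & 0xFFFFFFFF) )) ;;  # host bits set
        -) int2ip $(( addr & mask )) ;;                          # host bits cleared
        *) return 1 ;;
    esac
}

# Usage:
#   derive_brd 10.0.0.1/24 +       # prints 10.0.0.255
#   derive_brd 10.0.0.1/24 -       # prints 10.0.0.0
#   int2ip "$(netmask 24)"         # prints 255.255.255.0
\end{verbatim}
Applied to \verb|193.233.7.90/24| with \verb|'+'|, the sketch gives
\verb|193.233.7.255|, the broadcast address shown for \verb|eth0| in the
{\tt ip address show} output.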

\paragraph{Examples:}
\begin{itemize}
\item \verb|ip addr add 127.0.0.1/8 dev lo brd + scope host|

--- add the usual loopback address to the loopback device.

\item \verb|ip addr add 10.0.0.1/24 brd + dev eth0 label eth0:Alias|

--- add the address 10.0.0.1 with prefix length 24 (i.e.\ netmask
\verb|255.255.255.0|), standard broadcast and label \verb|eth0:Alias|
to the interface \verb|eth0|.
\end{itemize}
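
The compatibility rule for \verb|label| can be written as a shell
\verb|case| test. The helper \verb|label_ok| is a hypothetical sketch of the
rule as stated above. It is not the check actually performed by \verb|ip|
or by the kernel.
\begin{verbatim}
#!/bin/sh
# label_ok DEV LABEL: succeed if LABEL equals DEV or starts with "DEV:",
# as required for compatibility with Linux-2.0 net aliases.
# Hypothetical helper illustrating the rule only.
label_ok() {
    case "$2" in
        "$1" | "$1":*) return 0 ;;
        *)             return 1 ;;
    esac
}
\end{verbatim}
With this helper, the label \verb|eth0:Alias| from the last example is
accepted for \verb|eth0|, while \verb|eth1:Alias| and \verb|eth0Alias| are not.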


\subsection{{\tt ip address delete} --- delete a protocol address}

\paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.

\paragraph{Arguments:} coincide with the arguments of \verb|ip addr add|.
The device name is a required argument. The rest are optional.
If no other arguments are given, the first address on the device is deleted.

\paragraph{Examples:}
\begin{itemize}
\item \verb|ip addr del 127.0.0.1/8 dev lo|

--- delete the loopback address from the loopback device.
It would be best not to repeat this experiment.

\item Disable IP on the interface \verb|eth0|:
\begin{verbatim}
  # Each pass deletes one IPv4 address from eth0; the loop stops
  # as soon as ip fails, f.e. when no address is left.
  while ip -f inet addr del dev eth0; do
    : nothing
  done
\end{verbatim}
Another method to disable IP on an interface, using {\tt ip addr flush},
may be found in Sec.~\ref{IP-ADDR-FLUSH}, p.~\pageref{IP-ADDR-FLUSH}.

\end{itemize}


\subsection{{\tt ip address show} --- display protocol addresses}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|lst|, \verb|sh|, \verb|ls|,
\verb|l|.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|dev NAME| (default)

--- the name of the device.

\item \verb|scope SCOPE_VAL|

--- only list addresses with this scope.

\item \verb|to PREFIX|

--- only list addresses matching this prefix.

\item \verb|label PATTERN|

--- only list addresses with labels matching the \verb|PATTERN|.
\verb|PATTERN| is a usual shell-style pattern.


\item \verb|dynamic| and \verb|permanent|

--- (IPv6 only) only list addresses installed due to stateless
address autoconfiguration or only list permanent (not dynamic) addresses.

\item \verb|tentative|

--- (IPv6 only) only list addresses which did not pass duplicate
address detection.

\item \verb|deprecated|

--- (IPv6 only) only list deprecated addresses.


\item \verb|primary| and \verb|secondary|

--- only list primary (or secondary) addresses.

\end{itemize}
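
Because \verb|PATTERN| is a shell-style pattern, you can preview a label
selection with the shell's own \verb|case| matching. This sketch assumes that
simple wildcards such as \verb|*| and \verb|?| behave the same way in both.
The helper \verb|label_matches| is hypothetical. When you pass a pattern to
\verb|ip| on the command line, quote it (as in \verb|"eth*"|) so that the
shell does not expand it against file names.
\begin{verbatim}
#!/bin/sh
# label_matches PATTERN LABEL: succeed if LABEL matches the shell-style
# PATTERN, e.g. "eth*". Hypothetical helper for previewing a selection.
label_matches() {
    # $1 is deliberately unquoted here so that it acts as a pattern.
    case "$2" in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# Usage:
#   label_matches 'eth*' eth0:Alias && echo selected
\end{verbatim}
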


\paragraph{Output format:}

\begin{verbatim}
kuznet@alisa:~ $ ip addr ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    inet 193.233.7.90/24 brd 193.233.7.255 scope global eth0
    inet6 3ffe:2400:0:1:2a0:ccff:fe66:1878/64 scope global dynamic
       valid_lft forever preferred_lft 604746sec
    inet6 fe80::2a0:ccff:fe66:1878/10 scope link
kuznet@alisa:~ $
\end{verbatim}

The first two lines coincide with the output of \verb|ip link ls|.
It is natural to interpret link layer addresses
as addresses of the protocol family \verb|AF_PACKET|.

Then the list of IP and IPv6 addresses follows, accompanied by
additional address attributes: scope value (see Sec.~\ref{IP-ADDR-ADD},
p.~\pageref{IP-ADDR-ADD} above), flags and the address label.

Address flags are set by the kernel and cannot be changed
administratively. Currently, the following flags are defined:

\begin{enumerate}
\item \verb|secondary|

--- the address is not used when selecting the default source address
of outgoing packets (Cf.\ Appendix~\ref{ADDR-SEL}, p.~\pageref{ADDR-SEL}).
An IP address becomes secondary if another address with the same
prefix bits already exists. The first address is primary.
It is the leader of the group of all secondary addresses. When the leader
is deleted, all secondaries are purged too.
There is a tweak in \verb|/proc/sys/net/ipv4/conf/<dev>/promote_secondaries|
which activates the promotion of secondaries when a primary is deleted.
To permanently enable this feature on all devices, add
\verb|net.ipv4.conf.all.promote_secondaries=1| to \verb|/etc/sysctl.conf|.
This tweak is available in Linux 2.6.15 and later.


\item \verb|dynamic|

--- the address was created due to stateless autoconfiguration~\cite{RFC-ADDRCONF}.
In this case the output also contains information on the lifetimes
of the address. After \verb|preferred_lft| expires, the address is
moved to the deprecated state. After \verb|valid_lft| expires, the address
is finally invalidated.

\item \verb|deprecated|

--- the address is deprecated, i.e.\ it is still valid, but cannot
be used by newly created connections.

\item \verb|tentative|

--- the address is not used because duplicate address detection~\cite{RFC-ADDRCONF}
has not yet completed or has failed.

\end{enumerate}
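
The lifetime rules for \verb|dynamic| addresses can be sketched as a small
state function. The helper \verb|addr_state| and the state name
\verb|preferred| are hypothetical. The sketch takes the time in seconds since
the lifetimes were set. Each lifetime is either a plain number of seconds
(without the \verb|sec| suffix that \verb|ip| prints) or the word
\verb|forever|, as in \verb|valid_lft forever| above.
\begin{verbatim}
#!/bin/sh
# expired ELAPSED LIFETIME: succeed if a finite LIFETIME has elapsed.
expired() {
    [ "$2" != forever ] && [ "$1" -ge "$2" ]
}

# addr_state ELAPSED PREFERRED_LFT VALID_LFT: print "preferred",
# "deprecated" or "invalid" for an autoconfigured address ELAPSED
# seconds after its lifetimes were set. Hypothetical helper.
addr_state() {
    if expired "$1" "$3"; then
        echo invalid
    elif expired "$1" "$2"; then
        echo deprecated
    else
        echo preferred
    fi
}
\end{verbatim}
In this sketch, an address whose \verb|valid_lft| is \verb|forever| is never
reported as \verb|invalid|. It still becomes \verb|deprecated| once a finite
\verb|preferred_lft| has elapsed.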
784
785
786\subsection{{\tt ip address flush} --- flush protocol addresses}
787\label{IP-ADDR-FLUSH}
788
789\paragraph{Abbreviations:} \verb|flush|, \verb|f|.
790
791\paragraph{Description:}This command flushes the protocol addresses
792selected by some criteria.
793
794\paragraph{Arguments:} This command has the same arguments as \verb|show|.
795The difference is that it does not run when no arguments are given.
796
797\paragraph{Warning:} This command (and other \verb|flush| commands
798described below) is pretty dangerous. If you make a mistake, it will
799not forgive it, but will cruelly purge all the addresses.
800
801\paragraph{Statistics:} With the \verb|-statistics| option, the command
802becomes verbose. It prints out the number of deleted addresses and the number
803of rounds made to flush the address list. If this option is given
804twice, \verb|ip addr flush| also dumps all the deleted addresses
805in the format described in the previous subsection.
806
807\paragraph{Example:} Delete all the addresses from the private network
80810.0.0.0/8:
809\begin{verbatim}
810netadm@amber:~ # ip -s -s a f to 10/8
8112: dummy    inet 10.7.7.7/16 brd 10.7.255.255 scope global dummy
8123: eth0    inet 10.10.7.7/16 brd 10.10.255.255 scope global eth0
8134: eth1    inet 10.8.7.7/16 brd 10.8.255.255 scope global eth1
814
815*** Round 1, deleting 3 addresses ***
816*** Flush is complete after 1 round ***
817netadm@amber:~ # 
818\end{verbatim}
819Another instructive example is disabling IP on all the Ethernets:
820\begin{verbatim}
821netadm@amber:~ # ip -4 addr flush label "eth*"
822\end{verbatim}
823And the last example shows how to flush all the IPv6 addresses
824acquired by the host from stateless address autoconfiguration
825after you enabled forwarding or disabled autoconfiguration.
826\begin{verbatim}
827netadm@amber:~ # ip -6 addr flush dynamic
828\end{verbatim}
829
830
831
\section{{\tt ip neighbour} --- neighbour/ARP tables management}

\paragraph{Abbreviations:} \verb|neighbour|, \verb|neighbor|, \verb|neigh|,
\verb|n|.

\paragraph{Object:} \verb|neighbour| objects establish bindings between protocol
addresses and link layer addresses for hosts sharing the same link.
Neighbour entries are organized into tables. The IPv4 neighbour table
is known by another name --- the ARP table.

The corresponding commands display neighbour bindings
and their properties, add new neighbour entries and delete old ones.

\paragraph{Commands:} \verb|add|, \verb|change|, \verb|replace|,
\verb|delete|, \verb|flush| and \verb|show| (or \verb|list|).

\paragraph{See also:} Appendix~\ref{PROXY-NEIGH}, p.\pageref{PROXY-NEIGH}
describes how to manage proxy ARP/NDISC with the \verb|ip| utility.


\subsection{{\tt ip neighbour add} --- add a new neighbour entry\\
	{\tt ip neighbour change} --- change an existing entry\\
	{\tt ip neighbour replace} --- add a new entry or change an existing one}

\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
\verb|replace|, \verb|repl|.

\paragraph{Description:} These commands create new neighbour records
or update existing ones.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|to ADDRESS| (default)

--- the protocol address of the neighbour. It is either an IPv4 or IPv6 address.

\item \verb|dev NAME|

--- the interface to which this neighbour is attached.

\item \verb|lladdr LLADDRESS|

--- the link layer address of the neighbour. \verb|LLADDRESS| can also be
\verb|null|.

\item \verb|nud NUD_STATE|

--- the state of the neighbour entry. \verb|nud| is an abbreviation for ``Neighbour
Unreachability Detection''. The state can take one of the following values:

\begin{enumerate}
\item \verb|permanent| --- the neighbour entry is valid forever and can only be removed
administratively.
\item \verb|noarp| --- the neighbour entry is valid. No attempts to validate
this entry will be made, but it can be removed when its lifetime expires.
\item \verb|reachable| --- the neighbour entry is valid until the reachability
timeout expires.
\item \verb|stale| --- the neighbour entry is valid but suspicious.
This option to \verb|ip neigh| does not change the neighbour state if
it was valid and the address is not changed by this command.
\end{enumerate}

\end{itemize}

\paragraph{Examples:}
\begin{itemize}
\item \verb|ip neigh add 10.0.0.3 lladdr 0:0:0:0:0:1 dev eth0 nud perm|

--- add a permanent ARP entry for the neighbour 10.0.0.3 on the device \verb|eth0|.

\item \verb|ip neigh chg 10.0.0.3 dev eth0 nud reachable|

--- change its state to \verb|reachable|.
\end{itemize}

\subsection{{\tt ip neighbour delete} --- delete a neighbour entry}

\paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.

\paragraph{Description:} This command invalidates a neighbour entry.

\paragraph{Arguments:} The arguments are the same as with \verb|ip neigh add|,
except that \verb|lladdr| and \verb|nud| are ignored.

\paragraph{Example:}
\begin{itemize}
\item \verb|ip neigh del 10.0.0.3 dev eth0|

--- invalidate an ARP entry for the neighbour 10.0.0.3 on the device \verb|eth0|.

\end{itemize}

\begin{NB}
 The deleted neighbour entry will not disappear from the tables
 immediately. If it is in use, it cannot be deleted until the last
 client releases it. Otherwise it will be destroyed during
 the next garbage collection.
\end{NB}

\paragraph{Warning:} Attempts to delete or manually change
a \verb|noarp| entry created by the kernel may result in unpredictable behaviour.
In particular, the kernel may try to resolve this address even
on a \verb|NOARP| interface or if the address is multicast or broadcast.

\subsection{{\tt ip neighbour show} --- list neighbour entries}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|.

\paragraph{Description:} This command displays neighbour tables.

\paragraph{Arguments:}

\begin{itemize}

\item \verb|to ADDRESS| (default)

--- the prefix selecting the neighbours to list.

\item \verb|dev NAME|

--- only list the neighbours attached to this device.

\item \verb|unused|

--- only list neighbours which are not currently in use.

\item \verb|nud NUD_STATE|

--- only list neighbour entries in this state. \verb|NUD_STATE| takes
the values listed below or the special value \verb|all| which means all states.
This option may occur more than once. If this option is absent, \verb|ip|
lists all entries except for \verb|none| and \verb|noarp|.

\end{itemize}

\paragraph{Output format:}

\begin{verbatim}
kuznet@alisa:~ $ ip neigh ls
:: dev lo lladdr 00:00:00:00:00:00 nud noarp
fe80::200:cff:fe76:3f85 dev eth0 lladdr 00:00:0c:76:3f:85 router \
    nud stale
0.0.0.0 dev lo lladdr 00:00:00:00:00:00 nud noarp
193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 nud reachable
193.233.7.85 dev eth0 lladdr 00:e0:1e:63:39:00 nud stale
kuznet@alisa:~ $
\end{verbatim}

The first word of each line is the protocol address of the neighbour.
Then the device name follows. The rest of the line describes the contents of
the neighbour entry identified by the pair (device, address).

\verb|lladdr| is the link layer address of the neighbour.

\verb|nud| is the state of the ``neighbour unreachability detection'' machine
for this entry. The detailed description of the neighbour
state machine can be found in~\cite{RFC-NDISC}. Here is the full list
of the states with short descriptions:

\begin{enumerate}
\item\verb|none| --- the state of the neighbour is void.
\item\verb|incomplete| --- the neighbour is in the process of resolution.
\item\verb|reachable| --- the neighbour is valid and apparently reachable.
\item\verb|stale| --- the neighbour is valid, but is probably already
unreachable, so the kernel will try to check it at the first transmission.
\item\verb|delay| --- a packet has been sent to the stale neighbour and the kernel is waiting
for confirmation.
\item\verb|probe| --- the delay timer expired but no confirmation was received.
The kernel has started to probe the neighbour with ARP/NDISC messages.
\item\verb|failed| --- resolution has failed.
\item\verb|noarp| --- the neighbour is valid. No attempts to check the entry
will be made.
\item\verb|permanent| --- it is a \verb|noarp| entry, but only the administrator
may remove the entry from the neighbour table.
\end{enumerate}

The link layer address is valid in all states except for \verb|none|,
\verb|failed| and \verb|incomplete|.

IPv6 neighbours can be marked with the additional flag \verb|router|
which means that the neighbour introduced itself as an IPv6 router~\cite{RFC-NDISC}.
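
The layout just described can be illustrated with a small parser. The Python
sketch below is hypothetical (\verb|parse_neigh| and \verb|lladdr_is_valid| are
not part of \verb|ip|). It assumes that a record has been joined back onto one
line (the backslash in the listing above only wraps a long line on the page),
that keywords such as \verb|dev| are followed by a value, and that any other
word, such as \verb|router|, is a flag.

\begin{verbatim}
# Keywords that take a value in 'ip neigh' output in this section.
VALUE_KEYWORDS = {"dev", "lladdr", "nud", "ref", "used"}

# States in which the link layer address is not valid.
NO_LLADDR_STATES = {"none", "incomplete", "failed"}


def parse_neigh(line):
    """Split one record of 'ip neigh ls' output into a dict."""
    words = line.split()
    entry = {"address": words[0], "flags": []}
    i = 1
    while i < len(words):
        word = words[i]
        if word in VALUE_KEYWORDS and i + 1 < len(words):
            entry[word] = words[i + 1]
            i += 2
        else:
            entry["flags"].append(word)  # f.e. 'router'
            i += 1
    return entry


def lladdr_is_valid(state):
    """True if an entry in this NUD state has a usable lladdr."""
    return state not in NO_LLADDR_STATES


entry = parse_neigh("fe80::200:cff:fe76:3f85 dev eth0 "
                    "lladdr 00:00:0c:76:3f:85 router nud stale")
print(entry["nud"], lladdr_is_valid(entry["nud"]))  # prints: stale True
\end{verbatim}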

\paragraph{Statistics:} The \verb|-statistics| option displays some usage
statistics, f.e.\

\begin{verbatim}
kuznet@alisa:~ $ ip -s n ls 193.233.7.254
193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \
    nud reachable
kuznet@alisa:~ $
\end{verbatim}

Here \verb|ref| is the number of users of this entry
and \verb|used| is a triplet of time intervals in seconds
separated by slashes. In this case they show that:

\begin{enumerate}
\item the entry was used 12 seconds ago.
\item the entry was confirmed 13 seconds ago.
\item the entry was updated 20 seconds ago.
\end{enumerate}
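
The triplet is easy to decode mechanically. In the following Python sketch the
names \verb|NeighUsage|, \verb|parse_used| and \verb|parse_stats| are
hypothetical; it assumes the record has been joined onto one line and contains
both \verb|ref| and \verb|used|.

\begin{verbatim}
from typing import NamedTuple


class NeighUsage(NamedTuple):
    """Ages in seconds taken from the 'used' field of ip -s neigh."""
    used: int       # since the entry was last used
    confirmed: int  # since the entry was last confirmed
    updated: int    # since the entry was last updated


def parse_used(field):
    """Parse a 'used' triplet such as '12/13/20'."""
    parts = field.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected three values: {field!r}")
    return NeighUsage(*(int(part) for part in parts))


def parse_stats(line):
    """Return (ref, NeighUsage) from one line of ip -s neigh output."""
    words = line.split()
    ref = int(words[words.index("ref") + 1])
    usage = parse_used(words[words.index("used") + 1])
    return ref, usage


ref, usage = parse_stats("193.233.7.254 dev eth0 "
                         "lladdr 00:00:0c:76:3f:85 ref 5 "
                         "used 12/13/20 nud reachable")
print(ref, usage.confirmed)  # prints: 5 13
\end{verbatim}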

\subsection{{\tt ip neighbour flush} --- flush neighbour entries}

\paragraph{Abbreviations:} \verb|flush|, \verb|f|.

\paragraph{Description:} This command flushes neighbour tables, selecting
entries to flush by some criteria.

\paragraph{Arguments:} This command has the same arguments as \verb|show|.
The differences are that it does not run when no arguments are given,
and that the default neighbour states to be flushed do not include
\verb|permanent| and \verb|noarp|.
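
Taking this description literally, the default state sets of \verb|show| and
\verb|flush| differ only in their treatment of \verb|none| and \verb|permanent|.
The Python sketch below encodes that reading; \verb|selected_states| is a
hypothetical helper, and the literal interpretation (that \verb|flush| by
default also considers entries in state \verb|none|) is an assumption of the
sketch rather than a statement about the implementation.

\begin{verbatim}
NUD_STATES = frozenset({"none", "incomplete", "reachable", "stale",
                        "delay", "probe", "failed", "noarp",
                        "permanent"})

# Defaults when no 'nud' argument is given, read literally from
# the descriptions of 'show' and 'flush'.
SHOW_DEFAULT = NUD_STATES - {"none", "noarp"}
FLUSH_DEFAULT = NUD_STATES - {"permanent", "noarp"}


def selected_states(requested, flush=False):
    """States a show/flush command acts on, given its 'nud' values.

    requested is the list of values given after 'nud' (possibly
    empty); the special value 'all' selects every state.
    """
    if not requested:
        return FLUSH_DEFAULT if flush else SHOW_DEFAULT
    states = set()
    for name in requested:
        if name == "all":
            states |= NUD_STATES
        elif name in NUD_STATES:
            states.add(name)
        else:
            raise ValueError(f"unknown NUD state: {name!r}")
    return frozenset(states)
\end{verbatim}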

\paragraph{Statistics:} With the \verb|-statistics| option, the command
becomes verbose. It prints out the number of deleted neighbours and the number
of rounds made to flush the neighbour table. If the option is given
twice, \verb|ip neigh flush| also dumps all the deleted neighbours
in the format described in the previous subsection.

\paragraph{Example:}
\begin{verbatim}
netadm@alisa:~ # ip -s -s n f 193.233.7.254
193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \
    nud reachable

*** Round 1, deleting 1 entries ***
*** Flush is complete after 1 round ***
netadm@alisa:~ #
\end{verbatim}

\section{{\tt ip route} --- routing table management}
\label{IP-ROUTE}

\paragraph{Abbreviations:} \verb|route|, \verb|ro|, \verb|r|.

\paragraph{Object:} \verb|route| entries in the kernel routing tables keep
information about paths to other networked nodes.

Each route entry has a {\em key\/} consisting of a {\em prefix\/}
(i.e.\ a pair containing a network address and the length of its mask) and,
optionally, the TOS value. An IP packet matches the route if the most
significant bits of its destination address, up to the prefix length,
are equal to those of the route prefix, and if the TOS of the route is zero
or equal to the TOS of the packet.
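
The matching rule can be written down directly. In this Python sketch
\verb|route_matches| is a hypothetical helper built on the standard
\verb|ipaddress| module; it compares the leading bits explicitly, to mirror the
wording above, and treats TOS values as plain integers.

\begin{verbatim}
import ipaddress


def route_matches(route_prefix, route_tos, dst, packet_tos):
    """Does a packet (dst, packet_tos) match the route key?

    The destination must agree with the route prefix in its first
    prefixlen bits, and the route TOS must be 0 or equal to the
    packet TOS.
    """
    net = ipaddress.ip_network(route_prefix)
    addr = ipaddress.ip_address(dst)
    if addr.version != net.version:
        return False
    # Drop the bits beyond the prefix length and compare the rest.
    shift = net.max_prefixlen - net.prefixlen
    dst_bits = int(addr) >> shift
    net_bits = int(net.network_address) >> shift
    return dst_bits == net_bits and route_tos in (0, packet_tos)


print(route_matches("10.0.0.0/8", 0, "10.1.2.3", 0x10))  # prints: True
\end{verbatim}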

If several routes match the packet, the following pruning rules
are used to select the best one (see~\cite{RFC1812}):
\begin{enumerate}
\item The longest matching prefix is selected. All shorter ones
are dropped.

\item If the TOS of some route with the longest prefix is equal to the TOS
of the packet, the routes with different TOS are dropped.

If no exact TOS match was found and routes with TOS=0 exist,
the routes with non-zero TOS are pruned.

Otherwise, the route lookup fails.

\item If several routes remain after the previous steps, then
the routes with the best preference values are selected.

\item If we still have several routes, then the {\em first\/} of them
is selected.

\begin{NB}
 Note the ambiguity of the last step. Unfortunately, Linux
 historically allows such a bizarre situation. The meaning of the
 word ``first'' depends on the order of route additions and it is practically
 impossible to maintain a bundle of such routes in this order.
\end{NB}

For simplicity we will limit ourselves to the case where such a situation
is impossible and routes are uniquely identified by the triplet
\{prefix, tos, preference\}. Actually, it is impossible to create
non-unique routes with the \verb|ip| commands described in this section.

One useful exception to this rule is the default route on non-forwarding
hosts. It is ``officially'' allowed to have several fallback routes
when several routers are present on directly connected networks.
In this case, Linux-2.2 performs ``dead gateway detection''~\cite{RFC1122},
controlled by neighbour unreachability detection and by advice
from transport protocols, to select a working router, so the order
of the routes is not essential. However, in this case,
fiddling with default routes manually is not recommended. Use the Router Discovery
protocol (see Appendix~\ref{EXAMPLE-SETUP}, p.\pageref{EXAMPLE-SETUP})
instead. Actually, Linux-2.2 IPv6 does not give user level applications
any access to default routes.
\end{enumerate}
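
The four pruning steps can be sketched in Python as follows. The \verb|Route|
tuple and \verb|select_route| are hypothetical. The sketch assumes that the
candidates already match the packet and are listed in the order in which they
were added, and, as in Linux, that a smaller metric is the better preference.
It follows the steps literally and, as explained below, this is not how the
kernel actually organizes the lookup.

\begin{verbatim}
from typing import NamedTuple, Optional, Sequence


class Route(NamedTuple):
    prefixlen: int
    tos: int
    metric: int
    name: str  # only used to identify routes in the example


def select_route(candidates: Sequence[Route],
                 packet_tos: int) -> Optional[Route]:
    """Apply the pruning rules to routes that match the packet."""
    if not candidates:
        return None
    # Step 1: keep only the longest matching prefix.
    longest = max(r.prefixlen for r in candidates)
    routes = [r for r in candidates if r.prefixlen == longest]
    # Step 2: prefer an exact TOS match, else fall back to TOS 0.
    exact = [r for r in routes if r.tos == packet_tos]
    if exact:
        routes = exact
    else:
        routes = [r for r in routes if r.tos == 0]
        if not routes:
            return None  # the route lookup fails
    # Step 3: keep the routes with the best (smallest) metric.
    best = min(r.metric for r in routes)
    routes = [r for r in routes if r.metric == best]
    # Step 4: the first remaining route, in insertion order, wins.
    return routes[0]


a = Route(24, 0x00, 10, "a")
b = Route(24, 0x10, 20, "b")
c = Route(16, 0x10, 0, "c")
print(select_route([a, b, c], 0x10).name)  # prints: b
print(select_route([a, c], 0x10).name)     # prints: a
\end{verbatim}

The second call shows that a longer prefix wins even over a route with an exact
TOS match and a better metric.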

Of course, the steps above are not performed exactly
in this sequence. Instead, the routing table in the kernel is kept
in some data structure to achieve the final result
with minimal cost. However, independently of the particular
routing algorithm implemented in the kernel, we can summarize
the statements above as: a route is identified by the triplet
\{prefix, tos, preference\}. This {\em key\/} lets us locate
the route in the routing table.

\paragraph{Route attributes:} Each route key refers to a routing
information record containing
the data required to deliver IP packets (f.e.\ output device and
next hop router) and some optional attributes (f.e.\ the path MTU or
the preferred source address when communicating with this destination).
These attributes are described in the following subsection.

\paragraph{Route types:} \label{IP-ROUTE-TYPES}
Note that the set
of required and optional attributes depends on the route {\em type\/}.
The most important route type
is \verb|unicast|. It describes real paths to other hosts.
As a rule, common routing tables contain only such routes. However,
there are other types of routes with different semantics. The
full list of types understood by Linux-2.2 is:
\begin{itemize}
\item \verb|unicast| --- the route entry describes real paths to the
destinations covered by the route prefix.
\item \verb|unreachable| --- these destinations are unreachable. Packets
are discarded and the ICMP message {\em host unreachable\/} is generated.
The local senders get an \verb|EHOSTUNREACH| error.
\item \verb|blackhole| --- these destinations are unreachable. Packets
are discarded silently. The local senders get an \verb|EINVAL| error.
\item \verb|prohibit| --- these destinations are unreachable. Packets
are discarded and the ICMP message {\em communication administratively
prohibited\/} is generated. The local senders get an \verb|EACCES| error.
\item \verb|local| --- the destinations are assigned to this
host. The packets are looped back and delivered locally.
\item \verb|broadcast| --- the destinations are broadcast addresses.
The packets are sent as link broadcasts.
\item \verb|throw| --- a special control route used together with policy
rules (see sec.\ref{IP-RULE}, p.\pageref{IP-RULE}). If such a route is selected, lookup
in this table is terminated pretending that no route was found.
Without policy routing it is equivalent to the absence of the route in the routing
table. The packets are dropped and the ICMP message {\em net unreachable\/}
is generated. The local senders get an \verb|ENETUNREACH| error.
\item \verb|nat| --- a special NAT route. Destinations covered by the prefix
are considered to be dummy (or external) addresses which require translation
to real (or internal) ones before forwarding. The addresses to translate to
are selected with the attribute \verb|via|. More about NAT can be found
in Appendix~\ref{ROUTE-NAT}, p.\pageref{ROUTE-NAT}.
\item \verb|anycast| --- ({\em not implemented\/}) the destinations are
{\em anycast\/} addresses assigned to this host. They are essentially equivalent
to \verb|local| with one difference: such addresses are invalid when used
as the source address of any packet.
\item \verb|multicast| --- a special type used for multicast routing.
It is not present in normal routing tables.
\end{itemize}
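
For quick reference, the errors and ICMP messages listed above can be collected
into a table. The Python sketch below only restates the list using the standard
\verb|errno| module; the dictionary names and \verb|local_send_error| are
hypothetical, and \verb|throw| is shown with its behaviour without policy
routing.

\begin{verbatim}
import errno

# Error returned to local senders, per the list of route types.
LOCAL_SENDER_ERROR = {
    "unreachable": errno.EHOSTUNREACH,
    "blackhole": errno.EINVAL,
    "prohibit": errno.EACCES,
    "throw": errno.ENETUNREACH,  # without policy routing
}

# ICMP message generated for the packet; None means a silent drop.
ICMP_MESSAGE = {
    "unreachable": "host unreachable",
    "blackhole": None,
    "prohibit": "communication administratively prohibited",
    "throw": "net unreachable",
}


def local_send_error(route_type):
    """errno seen by a local sender, or 0 for the other types."""
    return LOCAL_SENDER_ERROR.get(route_type, 0)


for kind in ("unreachable", "blackhole", "prohibit", "throw"):
    print(kind, errno.errorcode[local_send_error(kind)],
          ICMP_MESSAGE[kind])
\end{verbatim}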

\paragraph{Route tables:} Linux-2.2 can pack routes into several routing
tables identified by a number in the range from 1 to 255 or by
name from the file \verb|/etc/iproute2/rt_tables|. By default all normal
routes are inserted into the \verb|main| table (ID 254) and the kernel only uses
this table when calculating routes.

Actually, one other table always exists, which is invisible but
even more important. It is the \verb|local| table (ID 255). This table
consists of routes for local and broadcast addresses. The kernel maintains
this table automatically and the administrator usually need not modify it
or even look at it.
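
Resolving a table name is a simple lookup in \verb|rt_tables|. The Python
sketch below assumes the usual layout of that file (one numeric ID and one name
per line, with \verb|#| starting a comment). The helpers \verb|parse_rt_tables|
and \verb|table_id| are hypothetical, the table name \verb|example| in the
sample text is made up, and only decimal IDs are handled.

\begin{verbatim}
SAMPLE_RT_TABLES = """\
# reserved values
255     local
254     main
# site-specific tables (hypothetical)
1       example
"""


def parse_rt_tables(text):
    """Map table names to numeric IDs, given rt_tables-style text."""
    tables = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ident, name = line.split()[:2]
        tables[name] = int(ident)
    return tables


def table_id(spec, tables):
    """Resolve a TABLEID argument given as a number or as a name."""
    ident = int(spec) if spec.isdigit() else tables[spec]
    if not 1 <= ident <= 255:
        raise ValueError(f"table ID out of range: {ident}")
    return ident


tables = parse_rt_tables(SAMPLE_RT_TABLES)
print(table_id("main", tables), table_id("local", tables))
# prints: 254 255
\end{verbatim}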

The multiple routing tables enter the game when {\em policy routing\/}
is used. See sec.\ref{IP-RULE}, p.\pageref{IP-RULE}.
In this case, the table identifier effectively becomes
one more parameter, which should be added to the triplet
\{prefix, tos, preference\} to uniquely identify the route.

\subsection{{\tt ip route add} --- add a new route\\
	{\tt ip route change} --- change a route\\
	{\tt ip route replace} --- change a route or add a new one}
\label{IP-ROUTE-ADD}

\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
\verb|replace|, \verb|repl|.
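
The difference between the three commands is whether the route key must already
exist. In the Python sketch below, \verb|RouteTable| is a hypothetical model, not
a kernel interface: routes are stored in a dictionary keyed by (table, prefix,
tos, metric), the key described in the previous section, and the prefix
\verb|10.0.0/24| from the examples below is written out in full.

\begin{verbatim}
class RouteTable:
    """Toy model: routes keyed by (table, prefix, tos, metric)."""

    def __init__(self):
        self.routes = {}

    def add(self, key, attrs):
        # add never creates a second route with the same key.
        if key in self.routes:
            raise FileExistsError(f"route {key} already exists")
        self.routes[key] = dict(attrs)

    def change(self, key, attrs):
        # change only modifies a route that is already there.
        if key not in self.routes:
            raise LookupError(f"no route {key} to change")
        self.routes[key] = dict(attrs)

    def replace(self, key, attrs):
        # replace changes the route or adds it if it is missing.
        self.routes[key] = dict(attrs)


rt = RouteTable()
key = ("main", "10.0.0.0/24", 0, 0)
rt.add(key, {"via": "193.233.7.65"})  # ip route add ... via ...
rt.change(key, {"dev": "dummy"})      # ip ro chg ... dev dummy
print(rt.routes[key])  # prints: {'dev': 'dummy'}
\end{verbatim}

The sketch simply overwrites all attributes on \verb|change|; whether attributes
that are not mentioned survive a real change is not specified here.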

\paragraph{Arguments:}
\begin{itemize}
\item \verb|to PREFIX| or \verb|to TYPE PREFIX| (default)

--- the destination prefix of the route. If \verb|TYPE| is omitted,
\verb|ip| assumes type \verb|unicast|. Other values of \verb|TYPE|
are listed above. \verb|PREFIX| is an IPv4 or IPv6 address optionally followed
by a slash and the prefix length. If the length of the prefix is missing,
\verb|ip| assumes a full-length host route. There is also a special
\verb|PREFIX| --- \verb|default| --- which is equivalent to IPv4 \verb|0/0| or
to IPv6 \verb|::/0|.

\item \verb|tos TOS| or \verb|dsfield TOS|

--- the Type Of Service (TOS) key. This key has no associated mask and
the longest match is understood as follows: first, compare the TOS
of the route and of the packet. If they are not equal, then the packet
may still match a route with a zero TOS. \verb|TOS| is either an 8-bit hexadecimal
number or an identifier from {\tt /etc/iproute2/rt\_dsfield}.

\item \verb|metric NUMBER| or \verb|preference NUMBER|

--- the preference value of the route. \verb|NUMBER| is an arbitrary 32-bit number.

\item \verb|table TABLEID|

--- the table to add this route to.
\verb|TABLEID| may be a number or a string from the file
\verb|/etc/iproute2/rt_tables|. If this parameter is omitted,
\verb|ip| assumes the \verb|main| table, with the exception of
\verb|local|, \verb|broadcast| and \verb|nat| routes, which are
put into the \verb|local| table by default.

\item \verb|dev NAME|

--- the output device name.

\item \verb|via ADDRESS|

--- the address of the nexthop router. Actually, the meaning of this field depends
on the route type. For normal \verb|unicast| routes it is either the true nexthop
router or, if it is a direct route installed in BSD compatibility mode,
it can be a local address of the interface.
For NAT routes it is the first address of the block of translated IP destinations.

\item \verb|src ADDRESS|

--- the source address to prefer when sending to the destinations
covered by the route prefix.

\item \verb|realm REALMID|

--- the realm to which this route is assigned.
\verb|REALMID| may be a number or a string from the file
\verb|/etc/iproute2/rt_realms|. Sec.\ref{RT-REALMS} (p.\pageref{RT-REALMS})
contains more information on realms.

\item \verb|mtu MTU| or \verb|mtu lock MTU|

--- the MTU along the path to the destination. If the modifier \verb|lock| is
not used, the MTU may be updated by the kernel due to Path MTU Discovery.
If the modifier \verb|lock| is used, no path MTU discovery will be tried:
all packets will be sent without the DF bit in the IPv4 case
or fragmented to the MTU for IPv6.

\item \verb|window NUMBER|

--- the maximum window for TCP to advertise to these destinations,
measured in bytes. It limits the maximum data bursts that our TCP
peers are allowed to send to us.

\item \verb|rtt NUMBER|

--- the initial RTT (``Round Trip Time'') estimate.

\item \verb|rttvar NUMBER|

--- \threeonly the initial RTT variance estimate.

\item \verb|ssthresh NUMBER|

--- \threeonly an estimate for the initial slow start threshold.

\item \verb|cwnd NUMBER|

--- \threeonly the clamp for the congestion window. It is ignored if the \verb|lock|
    flag is not used.

\item \verb|advmss NUMBER|

--- \threeonly the MSS (``Maximal Segment Size'') to advertise to these
    destinations when establishing TCP connections. If it is not given,
    Linux uses a default value calculated from the first hop device MTU.

\begin{NB}
  If the path to these destinations is asymmetric, this guess may be wrong.
\end{NB}

\item \verb|reordering NUMBER|

--- \threeonly the maximum reordering on the path to this destination.
    If it is not given, Linux uses the value selected with the \verb|sysctl|
    variable \verb|net/ipv4/tcp_reordering|.

\item \verb|hoplimit NUMBER|

--- [2.5.74+ only] Maximum number of hops on the path to this destination.
    The default is the value selected with the \verb|sysctl| variable
    \verb|net/ipv4/ip_default_ttl|.

\item \verb|initcwnd NUMBER|

--- [2.5.70+ only] Initial congestion window size for connections to
    this destination. The actual window size is this value multiplied by the
    MSS (``Maximal Segment Size'') for the same connection. The default is
    zero, meaning to use the values specified in~\cite{RFC2414}.

\item \verb|nexthop NEXTHOP|

--- the nexthop of a multipath route. \verb|NEXTHOP| is a complex value
with its own syntax similar to the top level argument lists:
\begin{itemize}
\item \verb|via ADDRESS| is the nexthop router.
\item \verb|dev NAME| is the output device.
\item \verb|weight NUMBER| is a weight for this element of a multipath
route reflecting its relative bandwidth or quality.
\end{itemize}

\item \verb|scope SCOPE_VAL|

--- the scope of the destinations covered by the route prefix.
\verb|SCOPE_VAL| may be a number or a string from the file
\verb|/etc/iproute2/rt_scopes|.
If this parameter is omitted,
\verb|ip| assumes scope \verb|global| for all gatewayed \verb|unicast|
routes, scope \verb|link| for direct \verb|unicast| and \verb|broadcast| routes
and scope \verb|host| for \verb|local| routes.

\item \verb|protocol RTPROTO|

--- the routing protocol identifier of this route.
\verb|RTPROTO| may be a number or a string from the file
\verb|/etc/iproute2/rt_protos|. If the routing protocol ID is
not given, \verb|ip| assumes protocol \verb|boot| (i.e.\
it assumes the route was added by someone who doesn't
understand what they are doing). Several protocol values have a fixed interpretation.
Namely:
\begin{itemize}
\item \verb|redirect| --- the route was installed due to an ICMP redirect.
\item \verb|kernel| --- the route was installed by the kernel during
autoconfiguration.
\item \verb|boot| --- the route was installed during the bootup sequence.
If a routing daemon starts, it will purge all of them.
\item \verb|static| --- the route was installed by the administrator
to override dynamic routing. Routing daemons will respect them
and, probably, even advertise them to their peers.
\item \verb|ra| --- the route was installed by the Router Discovery protocol.
\end{itemize}
The rest of the values are not reserved and the administrator is free
to assign (or not to assign) protocol tags. At the very least, routing
daemons should take care of setting some unique protocol values,
f.e.\ as they are assigned in \verb|rtnetlink.h| or in the \verb|rt_protos|
database.

\item \verb|onlink|

--- pretend that the nexthop is directly attached to this link,
even if it does not match any interface prefix. One application of this
option may be found in~\cite{IP-TUNNELS}.

\item \verb|equalize|

--- allow packet by packet randomization on multipath routes.
Without this modifier, the route will be frozen to one selected
nexthop, so that load splitting will only occur on a per-flow basis.
\verb|equalize| only works if the kernel is patched.

\end{itemize}
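
Several of the defaults above depend on one another. The Python sketch below
gathers the documented defaults for the route type, \verb|table|, \verb|scope|
and \verb|protocol| into one hypothetical helper, \verb|route_defaults|. It
treats a route as gatewayed exactly when \verb|via| is given, and it leaves the
scope of the other route types unresolved because the text does not state it.

\begin{verbatim}
def route_defaults(route_type="unicast", via=None, table=None,
                   scope=None, protocol=None):
    """Fill in the defaults described above for 'ip route add'."""
    if table is None:
        if route_type in ("local", "broadcast", "nat"):
            table = "local"
        else:
            table = "main"
    if scope is None:
        if route_type == "unicast":
            scope = "global" if via is not None else "link"
        elif route_type == "broadcast":
            scope = "link"
        elif route_type == "local":
            scope = "host"
        # Other types: not stated in the text, left as None.
    if protocol is None:
        protocol = "boot"
    return {"type": route_type, "table": table,
            "scope": scope, "protocol": protocol}


print(route_defaults(via="193.233.7.65"))
\end{verbatim}

Under these rules a route without \verb|via|, such as the multipath example
below, would default to scope \verb|link|; that is the value an explicit
\verb|scope global| overrides.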

\begin{NB}
  Actually there are more commands: \verb|prepend| does the same
  thing as classic \verb|route add|, i.e.\ adds a route, even if another
  route to the same destination exists. Its counterpart is \verb|append|,
  which adds the route to the end of the list. Avoid these
  features.
\end{NB}
\begin{NB}
  More sad news: IPv6 only understands the \verb|append| command correctly.
  All the others are translated into \verb|append| commands. Certainly,
  this will change in the future.
\end{NB}

\paragraph{Examples:}
\begin{itemize}
\item add a plain route to network 10.0.0/24 via gateway 193.233.7.65
\begin{verbatim}
  ip route add 10.0.0/24 via 193.233.7.65
\end{verbatim}
\item change it to a direct route via the \verb|dummy| device
\begin{verbatim}
  ip ro chg 10.0.0/24 dev dummy
\end{verbatim}
\item add a default multipath route splitting the load between \verb|ppp0|
and \verb|ppp1|
\begin{verbatim}
  ip route add default scope global nexthop dev ppp0 \
                                    nexthop dev ppp1
\end{verbatim}
Note the scope value. It is not necessary but it informs the kernel
that this route is gatewayed rather than direct. Actually, if you
know the addresses of remote endpoints it would be better to use the
\verb|via| parameter.
\item announce that the address 192.203.80.144 is not a real one, but
should be translated to 193.233.7.83 before forwarding
\begin{verbatim}
  ip route add nat 192.203.80.144 via 193.233.7.83
\end{verbatim}
Backward translation is set up with policy rules described
in the following section (sec.\ref{IP-RULE}, p.\pageref{IP-RULE}).
\end{itemize}
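
These examples, like the rest of this document, write IPv4 prefixes in an
abbreviated form (\verb|10.0.0/24|, \verb|10/8|, \verb|0/0|). The Python sketch
below expands such arguments into standard \verb|ipaddress| networks. It assumes
that missing trailing octets are zero, which agrees with the examples here, that
a missing length means a host route, and that \verb|default| means \verb|0/0| or
\verb|::/0|. The helper \verb|parse_prefix| is hypothetical, and \verb|ipaddress|
rejects prefixes with host bits set beyond the prefix length.

\begin{verbatim}
import ipaddress


def parse_prefix(text, family=4):
    """Expand a PREFIX argument into an ipaddress network."""
    if text == "default":
        return ipaddress.ip_network("::/0" if family == 6
                                    else "0.0.0.0/0")
    addr, _, length = text.partition("/")
    if ":" in addr:
        full, maxlen = addr, 128  # IPv6 needs no padding here
    else:
        octets = addr.split(".")
        if len(octets) > 4:
            raise ValueError(f"bad IPv4 prefix: {text!r}")
        # Pad missing trailing octets with zeros: 10.0.0 -> 10.0.0.0
        full = ".".join(octets + ["0"] * (4 - len(octets)))
        maxlen = 32
    return ipaddress.ip_network(f"{full}/{length or maxlen}")


for arg in ("10.0.0/24", "10/8", "193.233.7.82", "default"):
    print(arg, "->", parse_prefix(arg))
\end{verbatim}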

\subsection{{\tt ip route delete} --- delete a route}

\paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.

\paragraph{Arguments:} \verb|ip route del| has the same arguments as
\verb|ip route add|, but their semantics are a bit different.

Key values (\verb|to|, \verb|tos|, \verb|preference| and \verb|table|)
select the route to delete. If optional attributes are present, \verb|ip|
verifies that they coincide with the attributes of the route to delete.
If no route with the given key and attributes was found, \verb|ip route del|
fails.
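
The selection rule for \verb|ip route del| can be modelled with a dictionary
keyed by (table, prefix, tos, metric). In this Python sketch
\verb|delete_route| is a hypothetical helper: the key picks the route, and every
optional attribute that is given must agree with the stored one before anything
is removed.

\begin{verbatim}
def delete_route(routes, key, **attrs):
    """Delete routes[key] if every attribute in attrs matches.

    routes maps (table, prefix, tos, metric) tuples to attribute
    dicts. LookupError is raised if nothing matches.
    """
    current = routes.get(key)
    if current is None:
        raise LookupError(f"no route with key {key}")
    for name, wanted in attrs.items():
        if current.get(name) != wanted:
            raise LookupError(f"{name} of route {key} is "
                              f"{current.get(name)!r}, not {wanted!r}")
    del routes[key]


routes = {("main", "10.0.0.0/24", 0, 0): {"dev": "dummy"}}
delete_route(routes, ("main", "10.0.0.0/24", 0, 0), dev="dummy")
print(routes)  # prints: {}
\end{verbatim}
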
\begin{NB}
Linux-2.0 had the option to delete a route selected only by prefix address,
ignoring its length (i.e.\ netmask). This option no longer exists
because it was ambiguous. However, look at {\tt ip route flush}
(sec.\ref{IP-ROUTE-FLUSH}, p.\pageref{IP-ROUTE-FLUSH}) which
provides similar and even richer functionality.
\end{NB}

\paragraph{Example:}
\begin{itemize}
\item delete the multipath route created by the command in the previous subsection
\begin{verbatim}
  ip route del default scope global nexthop dev ppp0 \
                                    nexthop dev ppp1
\end{verbatim}
\end{itemize}

\subsection{{\tt ip route show} --- list routes}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.

\paragraph{Description:} The command displays the contents of the routing tables
or the route(s) selected by some criteria.

\paragraph{Arguments:}
\begin{itemize}
\item \verb|to SELECTOR| (default)

--- only select routes from the given range of destinations. \verb|SELECTOR|
consists of an optional modifier (\verb|root|, \verb|match| or \verb|exact|)
and a prefix. \verb|root PREFIX| selects routes with prefixes not shorter
than \verb|PREFIX| that lie within it. F.e.\ \verb|root 0/0| selects the entire
routing table. \verb|match PREFIX| selects routes with prefixes not longer than
\verb|PREFIX| that cover it. F.e.\ \verb|match 10.0/16| selects \verb|10.0/16|,
\verb|10/8| and \verb|0/0|, but it does not select \verb|10.1/16| and
\verb|10.0.0/24|. And \verb|exact PREFIX| (or just \verb|PREFIX|)
selects routes with this exact prefix. If the \verb|to| argument is omitted,
\verb|ip| assumes \verb|root 0/0|, i.e.\ it lists the entire table.

\item \verb|tos TOS| or \verb|dsfield TOS|

--- only select routes with the given TOS.

\item \verb|table TABLEID|

--- show the routes from this table (or tables). The default setting is to show
\verb|table| \verb|main|. \verb|TABLEID| may either be the ID of a real table
or one of the special values:
  \begin{itemize}
  \item \verb|all| --- list all of the tables.
  \item \verb|cache| --- dump the routing cache.
  \end{itemize}
\begin{NB}
  IPv6 has a single table. However, splitting it into \verb|main|, \verb|local|
  and \verb|cache| is emulated by the \verb|ip| utility.
\end{NB}

\item \verb|cloned| or \verb|cached|

--- list cloned routes, i.e.\ routes which were dynamically forked from
other routes because some route attribute (f.e.\ MTU) was updated.
Actually, it is equivalent to \verb|table cache|.

\item \verb|from SELECTOR|

--- the same syntax as for \verb|to|, but it binds the source address range
rather than destinations. Note that the \verb|from| option only works with
cloned routes.

\item \verb|protocol RTPROTO|

--- only list routes of this protocol.

\item \verb|scope SCOPE_VAL|

--- only list routes with this scope.

\item \verb|type TYPE|

--- only list routes of this type.

\item \verb|dev NAME|

--- only list routes going via this device.

\item \verb|via PREFIX|

--- only list routes going via the nexthop routers selected by \verb|PREFIX|.

\item \verb|src PREFIX|

--- only list routes with preferred source addresses selected
by \verb|PREFIX|.

\item \verb|realm REALMID| or \verb|realms FROMREALM/TOREALM|

--- only list routes with these realms.

\end{itemize}
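
The three selector modifiers of \verb|to| reduce to containment tests between
prefixes. The Python sketch below uses \verb|subnet_of| from the standard
\verb|ipaddress| module (Python 3.7 or later). The helper \verb|selects| is
hypothetical, and the prefixes are written out in full because \verb|ipaddress|
does not accept the abbreviated forms used in this document.

\begin{verbatim}
import ipaddress


def selects(modifier, selector, route_prefix):
    """Would 'to MODIFIER SELECTOR' list a route with route_prefix?

    root:  the route prefix lies inside the selector
    match: the route prefix covers the selector
    exact: the two prefixes are identical
    """
    sel = ipaddress.ip_network(selector)
    net = ipaddress.ip_network(route_prefix)
    if sel.version != net.version:
        return False
    if modifier == "root":
        return net.subnet_of(sel)
    if modifier == "match":
        return sel.subnet_of(net)
    if modifier == "exact":
        return net == sel
    raise ValueError(f"unknown modifier: {modifier!r}")


# The 'match 10.0/16' example from the text.
table = ["10.0.0.0/16", "10.0.0.0/8", "0.0.0.0/0",
         "10.1.0.0/16", "10.0.0.0/24"]
print([p for p in table if selects("match", "10.0.0.0/16", p)])
# prints: ['10.0.0.0/16', '10.0.0.0/8', '0.0.0.0/0']
\end{verbatim}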

\paragraph{Examples:} Let us count routes of protocol \verb|gated/bgp|
on a router:
\begin{verbatim}
kuznet@amber:~ $ ip ro ls proto gated/bgp | wc
   1413    9891    79010
kuznet@amber:~ $
\end{verbatim}
To count the entries in the routing cache, we have to use the \verb|-o| option
because cached attributes can take more than one line of output:
\begin{verbatim}
kuznet@amber:~ $ ip -o ro ls cloned | wc
   159    2543    18707
kuznet@amber:~ $
\end{verbatim}

\paragraph{Output format:} The output of this command consists
of per-route records separated by line feeds.
However, some records may consist
of more than one line: in particular, this is the case when the route
is cloned or you requested additional statistics. If the
\verb|-o| option was given, then line feeds separating lines inside
records are replaced with the backslash sign.
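
The effect of \verb|-o| on counting can be seen with a small record splitter.
This Python sketch assumes, as in the cache listing below, that every
continuation line of a record starts with whitespace. The helper \verb|records|
is hypothetical; it joins such lines with a backslash, roughly as \verb|-o| does,
so that one record becomes one line. The last line of the sample output is made
up.

\begin{verbatim}
def records(output):
    """Group 'ip route show' output into one string per record."""
    result = []
    for line in output.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        if line[0].isspace() and result:
            # Continuation line: glue it to the current record.
            result[-1] += "\\" + line.strip()
        else:
            result.append(line.strip())
    return result


sample = """193.233.7.82 dev eth0  src 193.233.7.65 realms inr.ac
    cache  mtu 1500 rtt 300
10.0.0.0/24 dev dummy  scope link
"""
print(len(sample.splitlines()), len(records(sample)))  # prints: 3 2
\end{verbatim}

Counting lines of the plain output, as \verb|wc| does, would report three
entries here instead of two.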
1587
1588The output has the same syntax as arguments given to {\tt ip route add},
1589so that it can be understood easily. F.e.\
1590\begin{verbatim}
1591kuznet@amber:~ $ ip ro ls 193.233.7/24
1592193.233.7.0/24 dev eth0  proto gated/conn  scope link \
1593    src 193.233.7.65 realms inr.ac 
1594kuznet@amber:~ $
1595\end{verbatim}
1596
1597If you list cloned entries, the output contains other attributes which
1598are evaluated during route calculation and updated during route
1599lifetime. An example of the output is:
1600\begin{verbatim}
1601kuznet@amber:~ $ ip ro ls 193.233.7.82 tab cache
1602193.233.7.82 from 193.233.7.82 dev eth0  src 193.233.7.65 \
1603  realms inr.ac/inr.ac 
1604    cache <src-direct,redirect>  mtu 1500 rtt 300 iif eth0
1605193.233.7.82 dev eth0  src 193.233.7.65 realms inr.ac 
1606    cache  mtu 1500 rtt 300
1607kuznet@amber:~ $
1608\end{verbatim}
\begin{NB}
  \label{NB-strange-route}
  The route looks a bit strange, doesn't it? Did you notice that
  it is a path from 193.233.7.82 back to 193.233.7.82? Well, you will
  see in the section on \verb|ip route get| (p.\pageref{NB-nature-of-strangeness})
  how it appeared.
\end{NB}
The second line, starting with the word \verb|cache|, shows
additional attributes which normal routes do not possess.
Cached flags are summarized in angle brackets:
\begin{itemize}
\item \verb|local| --- packets are delivered locally.
This flag is set on loopback unicast routes, on broadcast routes
and on multicast routes if this host is a member of the corresponding
group.

\item \verb|reject| --- the path is bad. Any attempt to use it results
in an error. See attribute \verb|error| below (p.\pageref{IP-ROUTE-GET-error}).

\item \verb|mc| --- the destination is multicast.

\item \verb|brd| --- the destination is broadcast.

\item \verb|src-direct| --- the source is on a directly connected
interface.

\item \verb|redirected| --- the route was created by an ICMP Redirect.

\item \verb|redirect| --- packets going via this route will
trigger an ICMP redirect.

\item \verb|fastroute| --- the route is eligible to be used for fastroute.

\item \verb|equalize| --- packet-by-packet randomization is performed
along this path.

\item \verb|dst-nat| --- the destination address requires translation.

\item \verb|src-nat| --- the source address requires translation.

\item \verb|masq| --- the source address requires masquerading.
This feature disappeared in linux-2.4.

\item \verb|notify| --- ({\em not implemented}) change/deletion
of this route will trigger RTNETLINK notification.
\end{itemize}
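To pick the cached flags out of the output programmatically, one may extract the comma-separated names between the angle brackets that follow the word \verb|cache|. The Python helper below is a hypothetical sketch. It assumes the single-line \verb|cache| layout shown in the examples above.

\begin{verbatim}
import re

def cache_flags(line):
    """Return the set of cached-route flags shown in angle brackets
    on a 'cache' line, or an empty set if the line has none."""
    m = re.search(r"\bcache\s*<([^>]*)>", line)
    if m is None:
        return set()
    return {flag for flag in m.group(1).split(",") if flag}

print(sorted(cache_flags("    cache <src-direct,redirect>  mtu 1500 rtt 300 iif eth0")))
# ['redirect', 'src-direct']
print(cache_flags("    cache  mtu 1500 rtt 300"))
# set()
\end{verbatim}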

Then some optional attributes follow:
\begin{itemize}
\item \verb|error| --- on \verb|reject| routes, it is the error code
returned to local senders when they try to use this route.
These error codes are translated into ICMP error codes, sent to remote
senders, according to the rules described above in the subsection
devoted to route types (p.\pageref{IP-ROUTE-TYPES}).
\label{IP-ROUTE-GET-error}

\item \verb|expires| --- this entry will expire after this timeout.

\item \verb|iif| --- the packets for this path are expected to arrive
on this interface.
\end{itemize}

\paragraph{Statistics:} With the \verb|-statistics| option, more
information about this route is shown:
\begin{itemize}
\item \verb|users| --- the number of users of this entry.
\item \verb|age| --- shows when this route was last used.
\item \verb|used| --- the number of lookups of this route since its creation.
\end{itemize}


\subsection{{\tt ip route flush} --- flush routing tables}
\label{IP-ROUTE-FLUSH}

\paragraph{Abbreviations:} \verb|flush|, \verb|f|.

\paragraph{Description:} this command flushes routes selected
by some criteria.

\paragraph{Arguments:} the arguments have the same syntax and semantics
as the arguments of \verb|ip route show|, but the selected routes are
purged rather than listed. The only difference is the default action: \verb|show|
dumps the whole IP main routing table, whereas \verb|flush| prints the help page.
The reason for this difference does not require any explanation, does it?


\paragraph{Statistics:} With the \verb|-statistics| option, the command
becomes verbose. It prints out the number of deleted routes and the number
of rounds made to flush the routing table. If the option is given
twice, \verb|ip route flush| also dumps all the deleted routes
in the format described in the previous subsection.
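The round counter can be understood with a simplified Python model of a flush. The entries matching the selector are dumped and deleted, and the dump is repeated until it comes back empty. This is a sketch, not the \verb|iproute2| implementation. The function \verb|flush|, its \verb|dump| and \verb|delete| callbacks and the limit of 10 rounds are assumptions, and the messages merely imitate those in the examples below.

\begin{verbatim}
def flush(dump, delete, max_rounds=10):
    """Repeat dump-and-delete until a dump returns nothing.

    dump() returns the entries currently matching the selector and
    delete(entry) removes one of them. Returns the number of rounds."""
    rounds = 0
    while True:
        entries = dump()
        if not entries:
            if rounds == 0:
                print("Nothing to flush.")
            else:
                print("*** Flush is complete after %d round%s ***"
                      % (rounds, "" if rounds == 1 else "s"))
            return rounds
        if rounds == max_rounds:
            raise RuntimeError("flush is not complete after %d rounds" % rounds)
        rounds += 1
        print("*** Round %d, deleting %d entries ***" % (rounds, len(entries)))
        for entry in entries:
            delete(entry)

table = {"10.0.0.0/8", "192.168.0.0/16"}
flush(lambda: sorted(table), table.discard)   # one round deletes both
flush(lambda: sorted(table), table.discard)   # prints "Nothing to flush."
\end{verbatim}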

\paragraph{Examples:} The first example flushes all the
gatewayed routes from the main table (f.e.\ after a routing daemon crash).
\begin{verbatim}
netadm@amber:~ # ip -4 ro flush scope global type unicast
\end{verbatim}
This command deserves to be put into a scriptlet \verb|routef|, emulating
the flush option of the BSD \verb|route| utility.
\begin{NB}
This option was described in the \verb|route(8)| man page borrowed
from BSD, but was never implemented in Linux.
\end{NB}

The second example flushes all IPv6 cloned routes:
\begin{verbatim}
netadm@amber:~ # ip -6 -s -s ro flush cache
3ffe:2400::220:afff:fef4:c5d1 via 3ffe:2400::220:afff:fef4:c5d1 \
  dev eth0  metric 0 
    cache  used 2 age 12sec mtu 1500 rtt 300
3ffe:2400::280:adff:feb7:8034 via 3ffe:2400::280:adff:feb7:8034 \
  dev eth0  metric 0 
    cache  used 2 age 15sec mtu 1500 rtt 300
3ffe:2400::280:c8ff:fe59:5bcc via 3ffe:2400::280:c8ff:fe59:5bcc \
  dev eth0  metric 0 
    cache  users 1 used 1 age 23sec mtu 1500 rtt 300
3ffe:2400:0:1:2a0:ccff:fe66:1878 via 3ffe:2400:0:1:2a0:ccff:fe66:1878 \
  dev eth1  metric 0 
    cache  used 2 age 20sec mtu 1500 rtt 300
3ffe:2400:0:1:a00:20ff:fe71:fb30 via 3ffe:2400:0:1:a00:20ff:fe71:fb30 \
  dev eth1  metric 0 
    cache  used 2 age 33sec mtu 1500 rtt 300
ff02::1 via ff02::1 dev eth1  metric 0 
    cache  users 1 used 1 age 45sec mtu 1500 rtt 300

*** Round 1, deleting 6 entries ***
*** Flush is complete after 1 round ***
netadm@amber:~ # ip -6 -s -s ro flush cache
Nothing to flush.
netadm@amber:~ #
\end{verbatim}

The third example flushes BGP routing tables after a \verb|gated|
death.
\begin{verbatim}
netadm@amber:~ # ip ro ls proto gated/bgp | wc
   1408    9856    78730
netadm@amber:~ # ip -s ro f proto gated/bgp

*** Round 1, deleting 1408 entries ***
*** Flush is complete after 1 round ***
netadm@amber:~ # ip ro f proto gated/bgp
Nothing to flush.
netadm@amber:~ # ip ro ls proto gated/bgp
netadm@amber:~ #
\end{verbatim}


\subsection{{\tt ip route get} --- get a single route}
\label{IP-ROUTE-GET}

\paragraph{Abbreviations:} \verb|get|, \verb|g|.

\paragraph{Description:} this command gets a single route to a destination
and prints its contents exactly as the kernel sees it.

\paragraph{Arguments:} 
\begin{itemize}
\item \verb|to ADDRESS| (default)

--- the destination address.

\item \verb|from ADDRESS|

--- the source address.

\item \verb|tos TOS| or \verb|dsfield TOS|

--- the Type Of Service.

\item \verb|iif NAME|

--- the device from which this packet is expected to arrive.

\item \verb|oif NAME|

--- force the output device on which this packet will be routed.

\item \verb|connected|

--- if no source address (option \verb|from|) was given, look up
the route again with the source set to the preferred address received from the first lookup.
If policy routing is used, it may be a different route.

\end{itemize}

Note that this operation is not equivalent to \verb|ip route show|.
\verb|show| shows existing routes. \verb|get| resolves them and
creates new clones if necessary. Essentially, \verb|get|
is equivalent to sending a packet along this path.
If the \verb|iif| argument is not given, the kernel creates a route
to output packets towards the requested destination.
This is equivalent to pinging the destination
with a subsequent {\tt ip route ls cache}; however, no packets are
actually sent. With the \verb|iif| argument, the kernel pretends
that a packet arrived from this interface and searches for
a path to forward the packet.

\paragraph{Output format:} This command outputs routes in the same
format as \verb|ip route ls|.

\paragraph{Examples:} 
\begin{itemize}
\item Find a route to output packets to 193.233.7.82:
\begin{verbatim}
kuznet@amber:~ $ ip route get 193.233.7.82
193.233.7.82 dev eth0  src 193.233.7.65 realms inr.ac
    cache  mtu 1500 rtt 300
kuznet@amber:~ $
\end{verbatim}

\item Find a route to forward packets arriving on \verb|eth0|
from 193.233.7.82 and destined for 193.233.7.82:
\begin{verbatim}
kuznet@amber:~ $ ip r g 193.233.7.82 from 193.233.7.82 iif eth0
193.233.7.82 from 193.233.7.82 dev eth0  src 193.233.7.65 \
  realms inr.ac/inr.ac 
    cache <src-direct,redirect>  mtu 1500 rtt 300 iif eth0
kuznet@amber:~ $
\end{verbatim}
\begin{NB}
  \label{NB-nature-of-strangeness}
  This is the command that created the funny route from 193.233.7.82
  looped back to 193.233.7.82 (cf.\ NB on~p.\pageref{NB-strange-route}).
  Note the \verb|redirect| flag on it.
\end{NB}

\item Find a multicast route for packets arriving on \verb|eth0|
from host 193.233.7.82 and destined for multicast group 224.2.127.254
(this assumes that a multicast routing daemon is running;
in this case, it is \verb|pimd|):
\begin{verbatim}
kuznet@amber:~ $ ip r g 224.2.127.254 from 193.233.7.82 iif eth0
multicast 224.2.127.254 from 193.233.7.82 dev lo  \
  src 193.233.7.65 realms inr.ac/cosmos 
    cache <mc> iif eth0 Oifs: eth1 pimreg
kuznet@amber:~ $
\end{verbatim}
This route differs from the ones seen before. It contains a ``normal'' part
and a ``multicast'' part. The normal part is used to deliver (or not to
deliver) the packet to local IP listeners. In this case the router
is not a member of this group, so the route has no \verb|local| flag and only
forwards packets. The output device for such entries is always loopback.
The multicast part consists of an additional \verb|Oifs:| list showing
the output interfaces.
\end{itemize}


It is time for a more complicated example. Let us add an invalid
gatewayed route for a destination which is really directly connected:
\begin{verbatim}
netadm@alisa:~ # ip route add 193.233.7.98 via 193.233.7.254
netadm@alisa:~ # ip route get 193.233.7.98
193.233.7.98 via 193.233.7.254 dev eth0  src 193.233.7.90
    cache  mtu 1500 rtt 3072
netadm@alisa:~ #
\end{verbatim}
and probe it with ping:
\begin{verbatim}
netadm@alisa:~ # ping -n 193.233.7.98
PING 193.233.7.98 (193.233.7.98) from 193.233.7.90 : 56 data bytes
From 193.233.7.254: Redirect Host(New nexthop: 193.233.7.98)
64 bytes from 193.233.7.98: icmp_seq=0 ttl=255 time=3.5 ms
From 193.233.7.254: Redirect Host(New nexthop: 193.233.7.98)
64 bytes from 193.233.7.98: icmp_seq=1 ttl=255 time=2.2 ms
64 bytes from 193.233.7.98: icmp_seq=2 ttl=255 time=0.4 ms
64 bytes from 193.233.7.98: icmp_seq=3 ttl=255 time=0.4 ms
64 bytes from 193.233.7.98: icmp_seq=4 ttl=255 time=0.4 ms
^C
--- 193.233.7.98 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.4/1.3/3.5 ms
netadm@alisa:~ #
\end{verbatim}
What happened? Router 193.233.7.254 understood that we have a much
better path to the destination and sent us an ICMP redirect message.
We may retry \verb|ip route get| to see what we have in the routing
tables now:
\begin{verbatim}
netadm@alisa:~ # ip route get 193.233.7.98
193.233.7.98 dev eth0  src 193.233.7.90 
    cache <redirected>  mtu 1500 rtt 3072
netadm@alisa:~ #
\end{verbatim}



\section{{\tt ip rule} --- routing policy database management}
\label{IP-RULE}

\paragraph{Abbreviations:} \verb|rule|, \verb|ru|.

\paragraph{Object:} \verb|rule|s in the routing policy database control
the route selection algorithm.

Classic routing algorithms used in the Internet make routing decisions
based only on the destination address of packets (and in theory,
but not in practice, on the TOS field). The seminal review of classic
routing algorithms and their modifications can be found in~\cite{RFC1812}.

In some circumstances we want to route packets differently depending not only
on destination addresses, but also on other packet fields: source address,
IP protocol, transport protocol ports or even packet payload.
This task is called ``policy routing''.

\begin{NB}
  ``policy routing'' $\neq$ ``routing policy''.

\noindent	``policy routing'' $=$ ``cunning routing''.

\noindent	``routing policy'' $=$ ``routing tactics'' or ``routing plan''.
\end{NB}

To solve this task, the conventional destination-based routing table, ordered
according to the longest match rule, is replaced with a ``routing policy
database'' (or RPDB), which selects routes
by executing some set of rules. The rules may have many keys of different
natures, so they have no natural ordering other than one imposed
by the administrator. Linux-2.2 RPDB is a linear list of rules
ordered by numeric priority value.
RPDB explicitly allows matching a few packet fields:

\begin{itemize}
\item packet source address.
\item packet destination address.
\item TOS.
\item incoming interface (which is packet metadata, rather than a packet field).
\end{itemize}

Matching IP protocols and transport ports is also possible,
indirectly, via \verb|ipchains|, by exploiting its ability
to mark some classes of packets with \verb|fwmark|. Therefore,
\verb|fwmark| is also included in the set of keys checked by rules.

Each policy routing rule consists of a {\em selector\/} and an {\em action\/}
predicate. The RPDB is scanned in the order of increasing priority. The selector
of each rule is applied to \{source address, destination address, incoming
interface, tos, fwmark\} and, if the selector matches the packet,
the action is performed. If the action predicate returns with success,
it yields either a route or a failure indication,
and the RPDB lookup is terminated. Otherwise, the RPDB program
continues with the next rule.
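The scan just described can be modelled in a few lines of Python. This is an illustrative sketch only. The classes \verb|Packet| and \verb|Rule| and the function \verb|rpdb_lookup| are hypothetical. An action is represented as a function that returns a result string on success, or \verb|None| to let the scan continue. The rule set is invented for the example, and its priorities loosely follow the \verb|ip rule| examples later in this section.

\begin{verbatim}
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Callable, List, Optional

@dataclass
class Packet:
    src: str
    dst: str
    iif: str = "lo"
    tos: int = 0
    fwmark: int = 0

@dataclass
class Rule:
    priority: int
    # The action returns a result (a route or a failure indication) on
    # success, or None to let the scan continue with the next rule.
    action: Callable[[Packet], Optional[str]]
    # Selector keys: 0.0.0.0/0 or None means "match anything".
    src: str = "0.0.0.0/0"
    dst: str = "0.0.0.0/0"
    iif: Optional[str] = None
    tos: Optional[int] = None
    fwmark: Optional[int] = None

    def matches(self, p: Packet) -> bool:
        return (ip_address(p.src) in ip_network(self.src)
                and ip_address(p.dst) in ip_network(self.dst)
                and self.iif in (None, p.iif)
                and self.tos in (None, p.tos)
                and self.fwmark in (None, p.fwmark))

def rpdb_lookup(rules: List[Rule], p: Packet) -> Optional[str]:
    for rule in sorted(rules, key=lambda r: r.priority):  # increasing priority
        if rule.matches(p):
            result = rule.action(p)
            if result is not None:
                return result        # success terminates the lookup
    return None                      # no rule produced a result

# A made-up rule set; the strings stand in for real lookup results.
rules = [
    Rule(32766, lambda p: "route from table main"),
    Rule(220, lambda p: "route from table inr.ruhep", src="192.203.80.0/24"),
    Rule(200, lambda p: "route from table main",
         src="192.203.80.0/24", dst="193.233.7.0/24"),
    Rule(100, lambda p: None, fwmark=1),   # e.g. its table has no route
    Rule(50, lambda p: "prohibit", dst="10.0.0.0/8"),
]

print(rpdb_lookup(rules, Packet("192.203.80.5", "194.67.1.1")))
# route from table inr.ruhep
\end{verbatim}

The rule with priority 100 matches marked packets. Because its action returns \verb|None|, the scan falls through to the following rules.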

What is the action, semantically? The natural action is to select the
nexthop and the output device. This is what
Cisco IOS~\cite{IOS} does. Let us call it ``match \& set''.
The Linux-2.2 approach is more flexible. The action includes
lookups in destination-based routing tables and selecting
a route from these tables according to the classic longest match algorithm.
The ``match \& set'' approach is the simplest case of the Linux one. It is realized
when a second level routing table contains a single default route.
Recall that Linux-2.2 supports multiple tables
managed with the \verb|ip route| command, described in the previous section.
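A classic longest-match lookup can be sketched in Python as follows, together with the way a table holding a single default route degenerates into ``match \& set''. The function \verb|table_lookup| and the gateway names \verb|gw-A| and \verb|gw-B| are hypothetical placeholders. A real routing table stores far more than a single result string per prefix.

\begin{verbatim}
from ipaddress import ip_address, ip_network

def table_lookup(table, dst):
    """Longest-match lookup in a destination-based table.

    table maps a prefix string to a result string; the most specific
    prefix containing dst wins. Returns None if nothing matches."""
    addr = ip_address(dst)
    best_net, best_result = None, None
    for prefix, result in table.items():
        net = ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_result = net, result
    return best_result

# Hypothetical tables; "gw-A" and "gw-B" are placeholder gateway names.
main = {"0.0.0.0/0": "via gw-A", "193.233.7.0/24": "dev eth0"}
# A table holding a single default route: every destination gets the
# same nexthop, which is exactly the "match & set" behaviour.
match_and_set = {"0.0.0.0/0": "via gw-B"}

print(table_lookup(main, "193.233.7.82"))           # dev eth0
print(table_lookup(main, "194.67.1.1"))             # via gw-A
print(table_lookup(match_and_set, "193.233.7.82"))  # via gw-B
\end{verbatim}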

At startup time the kernel configures the default RPDB consisting of three
rules:

\begin{enumerate}
\item Priority: 0, Selector: match anything, Action: lookup routing
table \verb|local| (ID 255).
The \verb|local| table is a special routing table containing
high priority control routes for local and broadcast addresses.

Rule 0 is special. It cannot be deleted or overridden.


\item Priority: 32766, Selector: match anything, Action: lookup routing
table \verb|main| (ID 254).
The \verb|main| table is the normal routing table containing all non-policy
routes. This rule may be deleted and/or overridden with other
ones by the administrator.

\item Priority: 32767, Selector: match anything, Action: lookup routing
table \verb|default| (ID 253).
The \verb|default| table is empty. It is reserved for some
post-processing if no previous default rules selected the packet.
This rule may also be deleted.

\end{enumerate}

Do not confuse routing tables with rules: rules point to routing tables,
several rules may refer to one routing table and some routing tables
may have no rules pointing to them. If the administrator deletes all the rules
referring to a table, the table is not used, but it still exists
and will disappear only after all the routes contained in it are deleted.


\paragraph{Rule attributes:} Each RPDB entry has additional
attributes. F.e.\ each rule has a pointer to some routing
table. NAT and masquerading rules have an attribute to select the new IP
address to translate/masquerade to. Besides that, rules have some
optional attributes that routes also have, namely \verb|realms|.
These values do not override those contained in the routing tables. They
are only used if the route did not select any attributes.


\paragraph{Rule types:} The RPDB may contain rules of the following
types:
\begin{itemize}
\item \verb|unicast| --- the rule prescribes returning the route found
in the routing table referenced by the rule.
\item \verb|blackhole| --- the rule prescribes silently dropping the packet.
\item \verb|unreachable| --- the rule prescribes generating a ``Network
is unreachable'' error.
\item \verb|prohibit| --- the rule prescribes generating a
``Communication is administratively prohibited'' error.
\item \verb|nat| --- the rule prescribes translating the source address
of the IP packet into some other value. More about NAT is
in Appendix~\ref{ROUTE-NAT}, p.\pageref{ROUTE-NAT}.
\end{itemize}


\paragraph{Commands:} \verb|add|, \verb|delete| and \verb|show|
(or \verb|list|).

\subsection{{\tt ip rule add} --- insert a new rule\\
	{\tt ip rule delete} --- delete a rule}
\label{IP-RULE-ADD}

\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|delete|, \verb|del|,
	\verb|d|.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|type TYPE| (default)

--- the type of this rule. The list of valid types was given in the previous
subsection.

\item \verb|from PREFIX|

--- select the source prefix to match.

\item \verb|to PREFIX|

--- select the destination prefix to match.

\item \verb|iif NAME|

--- select the incoming device to match. If the interface is loopback,
the rule only matches packets originating from this host. This means that you
may create separate routing tables for forwarded and local packets and,
hence, completely segregate them.

\item \verb|tos TOS| or \verb|dsfield TOS|

--- select the TOS value to match.

\item \verb|fwmark MARK|

--- select the \verb|fwmark| value to match.

\item \verb|priority PREFERENCE|

--- the priority of this rule. Each rule should have an explicitly
set {\em unique\/} priority value.
\begin{NB}
  Really, for historical reasons \verb|ip rule add| does not require a
  priority value and allows them to be non-unique.
  If the user does not supply a priority, it is selected by the kernel.
  If the user creates a rule with a priority value that
  already exists, the kernel does not reject the request. It adds
  the new rule before all old rules of the same priority.

  It is a mistake in design, no more. And it will be fixed one day,
  so do not rely on this feature. Use explicit priorities.
\end{NB}


\item \verb|table TABLEID|

--- the routing table identifier to look up if the rule selector matches.

\item \verb|realms FROM/TO|

--- Realms to select if the rule matched and the routing table lookup
succeeded. Realm \verb|TO| is only used if the route did not select
any realm.

\item \verb|nat ADDRESS|

--- The base of the IP address block to translate (for source addresses).
The \verb|ADDRESS| may be either the start of the block of NAT addresses
(selected by NAT routes) or, in linux-2.2, a local host address (or even zero).
In the latter case the router does not translate the packets,
but masquerades them to this address; this feature disappeared in 2.4.
More about NAT is in Appendix~\ref{ROUTE-NAT},
p.\pageref{ROUTE-NAT}.

\end{itemize}
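The ordering described in the NB on the \verb|priority| argument can be reproduced with a list kept sorted by priority. In this hypothetical Python sketch, \verb|bisect_left| returns the position in front of any existing entries with the same priority. A newly added rule therefore lands before older rules of equal priority. This models the observable order only, not the kernel's data structures.

\begin{verbatim}
from bisect import bisect_left

def insert_rule(rules, priority, name):
    """Insert (priority, name) into a list kept sorted by priority.

    bisect_left returns the position before any existing entries with
    an equal priority, so the new rule lands in front of them."""
    priorities = [p for p, _ in rules]
    rules.insert(bisect_left(priorities, priority), (priority, name))

rules = [(0, "local"), (32766, "main"), (32767, "default")]
insert_rule(rules, 220, "older")
insert_rule(rules, 220, "newer")   # duplicate priority
print(rules)
# [(0, 'local'), (220, 'newer'), (220, 'older'), (32766, 'main'), (32767, 'default')]
\end{verbatim}

With duplicate priorities, the relative order depends on the insertion history. This is one more reason to follow the advice above and use explicit, unique priorities.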

\paragraph{Warning:} Changes to the RPDB made with these commands
do not become active immediately. It is assumed that after
a script finishes a batch of updates, it flushes the routing cache
with \verb|ip route flush cache|.

\paragraph{Examples:}
\begin{itemize}
\item Route packets with source addresses from 192.203.80/24
according to routing table \verb|inr.ruhep|:
\begin{verbatim}
ip ru add from 192.203.80.0/24 table inr.ruhep prio 220
\end{verbatim}

\item Translate packet source address 193.233.7.83 into 192.203.80.144
and route it according to table \#1 (actually, it is \verb|inr.ruhep|):
\begin{verbatim}
ip ru add from 193.233.7.83 nat 192.203.80.144 table 1 prio 320
\end{verbatim}

\item Delete the unused default rule:
\begin{verbatim}
ip ru del prio 32767
\end{verbatim}

\end{itemize}



\subsection{{\tt ip rule show} --- list rules}
\label{IP-RULE-SHOW}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.


\paragraph{Arguments:} Good news: this is one command that has no arguments.

\paragraph{Output format:}

\begin{verbatim}
kuznet@amber:~ $ ip ru ls
0:	from all lookup local 
200:	from 192.203.80.0/24 to 193.233.7.0/24 lookup main
210:	from 192.203.80.0/24 to 192.203.80.0/24 lookup main
220:	from 192.203.80.0/24 lookup inr.ruhep realms inr.ruhep/radio-msu
300:	from 193.233.7.83 to 193.233.7.0/24 lookup main
310:	from 193.233.7.83 to 192.203.80.0/24 lookup main
320:	from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
32766:	from all lookup main 
kuznet@amber:~ $
\end{verbatim}

The first column contains the rule priority value followed
by a colon. Then the selectors follow. Each key is prefixed
with the same keyword that was used to create the rule.

The keyword \verb|lookup| is followed by a routing table identifier,
as it is recorded in the file \verb|/etc/iproute2/rt_tables|.

If the rule does NAT (f.e.\ rule \#320), it is shown by the keyword
\verb|map-to| followed by the start of the block of addresses to map.
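Output in this format is easy to split up in scripts. The following Python sketch is built around a hypothetical helper \verb|parse_rules|. It handles only rules that contain the \verb|lookup| keyword, as in the listing above. It returns the priority, the selector, the table and any trailing attributes such as \verb|realms| or \verb|map-to|.

\begin{verbatim}
import re

# priority ":" selector "lookup" table [trailing attributes]
RULE_LINE = re.compile(r"^(\d+):\s+(.*?)\s+lookup\s+(\S+)(.*)$")

def parse_rules(text):
    """Turn 'ip rule list' output into (priority, selector, table, rest)."""
    rules = []
    for line in text.splitlines():
        m = RULE_LINE.match(line.strip())
        if m:                       # lines without 'lookup' are skipped
            prio, selector, table, rest = m.groups()
            rules.append((int(prio), selector, table, rest.strip()))
    return rules

sample = ("0:\tfrom all lookup local \n"
          "220:\tfrom 192.203.80.0/24 lookup inr.ruhep realms inr.ruhep/radio-msu\n"
          "320:\tfrom 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144\n"
          "32766:\tfrom all lookup main \n")
for rule in parse_rules(sample):
    print(rule)
\end{verbatim}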

The idea behind this example is pretty simple. The prefixes
192.203.80.0/24 and 193.233.7.0/24 form the internal network, but
they are routed differently when the packets leave it.
Besides that, the host 193.233.7.83 is translated into
another prefix to look like 192.203.80.144 when talking
to the outer world.



\section{{\tt ip maddress} --- multicast addresses management}
\label{IP-MADDR}

\paragraph{Object:} \verb|maddress| objects are multicast addresses.

\paragraph{Commands:} \verb|add|, \verb|delete|, \verb|show| (or \verb|list|).

\subsection{{\tt ip maddress show} --- list multicast addresses}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.

\paragraph{Arguments:}

\begin{itemize}

\item \verb|dev NAME| (default)

--- the device name.

\end{itemize}

\paragraph{Output format:}

\begin{verbatim}
kuznet@alisa:~ $ ip maddr ls dummy
2:  dummy
    link  33:33:00:00:00:01
    link  01:00:5e:00:00:01
    inet  224.0.0.1 users 2
    inet6 ff02::1
kuznet@alisa:~ $ 
\end{verbatim}

The first line of the output shows the interface index and its name.
Then the multicast address list follows. Each line starts with the
protocol identifier. The word \verb|link| denotes a link layer
multicast address.

If a multicast address has more than one user, the number
of users is shown after the \verb|users| keyword.

One additional feature not present in the example above
is the \verb|static| flag, which indicates that the address was joined
with \verb|ip maddr add|. See the following subsection.


2217
2218\subsection{{\tt ip maddress add} --- add a multicast address\\
2219	    {\tt ip maddress delete} --- delete a multicast address}
2220
2221\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|delete|, \verb|del|, \verb|d|.
2222
2223\paragraph{Description:} these commands attach/detach
2224a static link layer multicast address to listen on the interface.
2225Note that it is impossible to join protocol multicast groups
2226statically. This command only manages link layer addresses.
2227
2228
2229\paragraph{Arguments:}
2230
2231\begin{itemize}
2232\item \verb|address LLADDRESS| (default)
2233
2234--- the link layer multicast address.
2235
2236\item \verb|dev NAME|
2237
2238--- the device to join/leave this multicast address.
2239
2240\end{itemize}
2241
2242
2243\paragraph{Example:} Let us continue with the example from the previous subsection.
2244
2245\begin{verbatim}
2246netadm@alisa:~ # ip maddr add 33:33:00:00:00:01 dev dummy
2247netadm@alisa:~ # ip -0 maddr ls dummy
22482:  dummy
2249    link  33:33:00:00:00:01 users 2 static
2250    link  01:00:5e:00:00:01
2251netadm@alisa:~ # ip maddr del 33:33:00:00:00:01 dev dummy
2252\end{verbatim}
2253
2254\begin{NB}
2255 Neither \verb|ip| nor the kernel check for multicast address validity.
2256 Particularly, this means that you can try to load a unicast address
2257 instead of a multicast address. Most drivers will ignore such addresses,
2258 but several (f.e.\ Tulip) will intern it to their on-board filter.
2259 The effects may be strange. Namely, the addresses become additional
2260 local link addresses and, if you loaded the address of another host
2261 to the router, wait for duplicated packets on the wire.
2262 It is not a bug, but rather a hole in the API and intra-kernel interfaces.
2263 This feature is really more useful for traffic monitoring, but using it
2264 with Linux-2.2 you {\em have to\/} be sure that the host is not
2265 a router and, especially, that it is not a transparent proxy or masquerading
2266 agent.
2267\end{NB}
2268
2269
2270
2271\section{{\tt ip mroute} --- multicast routing cache management}
2272\label{IP-MROUTE}
2273
2274\paragraph{Abbreviations:} \verb|mroute|, \verb|mr|.
2275
2276\paragraph{Object:} \verb|mroute| objects are multicast routing cache
2277entries created by a user level mrouting daemon
2278(f.e.\ \verb|pimd| or \verb|mrouted|).
2279
2280Due to the limitations of the current interface to the multicast routing
2281engine, it is impossible to change \verb|mroute| objects administratively,
2282so we may only display them. This limitation will be removed
2283in the future.
2284
2285\paragraph{Commands:} \verb|show| (or \verb|list|).
2286
2287
2288\subsection{{\tt ip mroute show} --- list mroute cache entries}
2289
2290\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2291
2292\paragraph{Arguments:}
2293
2294\begin{itemize}
2295\item \verb|to PREFIX| (default)
2296
2297--- the prefix selecting the destination multicast addresses to list.
2298
2299
2300\item \verb|iif NAME|
2301
2302--- the interface on which multicast packets are received.
2303
2304
2305\item \verb|from PREFIX|
2306
2307--- the prefix selecting the IP source addresses of the multicast route.
2308
2309
2310\end{itemize}
2311
2312\paragraph{Output format:}
2313
2314\begin{verbatim}
2315kuznet@amber:~ $ ip mroute ls
2316(193.232.127.6, 224.0.1.39)      Iif: unresolved 
2317(193.232.244.34, 224.0.1.40)     Iif: unresolved 
2318(193.233.7.65, 224.66.66.66)     Iif: eth0       Oifs: pimreg 
2319kuznet@amber:~ $ 
2320\end{verbatim}
2321
2322Each line shows one (S,G) entry in the multicast routing cache,
2323where S is the source address and G is the multicast group. \verb|Iif| is
2324the interface on which multicast packets are expected to arrive.
2325If the word \verb|unresolved| is there instead of the interface name,
2326it means that the routing daemon still hasn't resolved this entry.
2327The keyword \verb|oifs| is followed by a list of output interfaces, separated
2328by spaces. If a multicast routing entry is created with non-trivial
2329TTL scope, administrative distances are appended to the device names

\paragraph{Statistics:} The \verb|-statistics| option also prints the
number of packets and bytes forwarded along this route and
the number of packets that arrived on the wrong interface, if this number is not zero.

\begin{verbatim}
kuznet@amber:~ $ ip -s mr ls 224.66/16
(193.233.7.65, 224.66.66.66)     Iif: eth0       Oifs: pimreg 
  9383 packets, 300256 bytes
kuznet@amber:~ $
\end{verbatim}


\section{{\tt ip tunnel} --- tunnel configuration}
\label{IP-TUNNEL}

\paragraph{Abbreviations:} \verb|tunnel|, \verb|tunl|.

\paragraph{Object:} \verb|tunnel| objects are tunnels, encapsulating
packets in IPv4 packets and then sending them over the IP infrastructure.

\paragraph{Commands:} \verb|add|, \verb|delete|, \verb|change|, \verb|show|
(or \verb|list|).

\paragraph{See also:} A more informal discussion of tunneling
over IP and the \verb|ip tunnel| command can be found in~\cite{IP-TUNNELS}.

\subsection{{\tt ip tunnel add} --- add a new tunnel\\
	{\tt ip tunnel change} --- change an existing tunnel\\
	{\tt ip tunnel delete} --- destroy a tunnel}

\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
\verb|delete|, \verb|del|, \verb|d|.


\paragraph{Arguments:}

2367
2368\begin{itemize}
2369
2370\item \verb|name NAME| (default)
2371
2372--- select the tunnel device name.
2373
2374\item \verb|mode MODE|
2375
2376--- set the tunnel mode. Three modes are currently available:
2377	\verb|ipip|, \verb|sit| and \verb|gre|.
2378
2379\item \verb|remote ADDRESS|
2380
2381--- set the remote endpoint of the tunnel.
2382
2383\item \verb|local ADDRESS|
2384
2385--- set the fixed local address for tunneled packets.
2386It must be an address on another interface of this host.
2387
2388\item \verb|ttl N|
2389
2390--- set a fixed TTL \verb|N| on tunneled packets.
2391	\verb|N| is a number in the range 1--255. 0 is a special value
2392	meaning that packets inherit the TTL value. 
2393		The default value is: \verb|inherit|.
2394
2395\item \verb|tos T| or \verb|dsfield T|
2396
2397--- set a fixed TOS \verb|T| on tunneled packets.
2398		The default value is: \verb|inherit|.
2399
2400
2401
2402\item \verb|dev NAME| 
2403
2404--- bind the tunnel to the device \verb|NAME| so that
2405	tunneled packets will only be routed via this device and will
2406	not be able to escape to another device when the route to endpoint changes.
2407
2408\item \verb|nopmtudisc|
2409
2410--- disable Path MTU Discovery on this tunnel.
2411	It is enabled by default. Note that a fixed ttl is incompatible
2412	with this option: tunnelling with a fixed ttl always makes pmtu discovery.
2413
2414\item \verb|key K|, \verb|ikey K|, \verb|okey K|
2415
2416--- (only GRE tunnels) use keyed GRE with key \verb|K|. \verb|K| is
2417	either a number or an IP address-like dotted quad.
2418   The \verb|key| parameter sets the key to use in both directions.
2419   The \verb|ikey| and \verb|okey| parameters set different keys for input and output.
2420   
2421
2422\item \verb|csum|, \verb|icsum|, \verb|ocsum|
2423
2424--- (only GRE tunnels) generate/require checksums for tunneled packets.
2425   The \verb|ocsum| flag calculates checksums for outgoing packets.
2426   The \verb|icsum| flag requires that all input packets have the correct
2427   checksum. The \verb|csum| flag is equivalent to the combination
2428  ``\verb|icsum| \verb|ocsum|''.
2429
2430\item \verb|seq|, \verb|iseq|, \verb|oseq|
2431
2432--- (only GRE tunnels) serialize packets.
2433   The \verb|oseq| flag enables sequencing of outgoing packets.
2434   The \verb|iseq| flag requires that all input packets are serialized.
2435   The \verb|seq| flag is equivalent to the combination ``\verb|iseq| \verb|oseq|''.
2436
2437\begin{NB}
2438 I think this option does not
2439	work. At least, I did not test it, did not debug it and
2440	do not even understand how it is supposed to work or for what
2441	purpose Cisco planned to use it. Do not use it.
2442\end{NB}
2443
2444
2445\end{itemize}
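
The two spellings of a GRE key can be related with a little shell arithmetic.
This is only an illustration, not part of \verb|iproute2|: it assumes that a
numeric key \verb|K| and a dotted quad denote the same 32-bit value when the
quad lists the bytes of \verb|K| most significant first (network byte order).
The helper names \verb|key_to_quad| and \verb|quad_to_key| are hypothetical.

\begin{verbatim}
#!/bin/bash
# Hypothetical helpers: convert between the numeric and the dotted-quad
# spelling of a GRE key, ASSUMING the quad lists the bytes of the 32-bit
# key most significant first (network byte order).

# key_to_quad K: print K (0..4294967295) as a dotted quad.
key_to_quad () {
  local k=$1
  echo "$(( (k >> 24) & 255 )).$(( (k >> 16) & 255 )).$(( (k >> 8) & 255 )).$(( k & 255 ))"
}

# quad_to_key A.B.C.D: print the dotted quad as a single number.
quad_to_key () {
  local IFS=.
  set -- $1          # split the quad into $1..$4 on the dots
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

key_to_quad 4660        # prints 0.0.18.52
quad_to_key 0.0.18.52   # prints 4660
\end{verbatim}

Under this assumption, \verb|key 4660| and \verb|key 0.0.18.52| would
select the same key.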
2446
\paragraph{Example:} Create a pointopoint IPv6 tunnel with a fixed TTL of 32.
2448\begin{verbatim}
2449netadm@amber:~ # ip tunl add Cisco mode sit remote 192.31.7.104 \
2450    local 192.203.80.142 ttl 32 
2451\end{verbatim}
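
A keyed GRE tunnel is created in the same way. The sketch below combines
the \verb|key|, \verb|csum| and \verb|ttl| parameters described above; the
device name \verb|gre1| and the endpoint addresses are hypothetical, and the
peer is assumed to be configured with matching \verb|key| and \verb|csum|
settings.

\begin{verbatim}
# Hypothetical device name and endpoints: keyed, checksummed GRE
# with a fixed TTL (so Path MTU Discovery stays enabled).
ip tunnel add gre1 mode gre remote 193.233.7.65 local 192.203.80.142 \
    ttl 64 key 4660 csum
ip link set up dev gre1
\end{verbatim}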
2452
2453\subsection{{\tt ip tunnel show} --- list tunnels}
2454
2455\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2456
2457
2458\paragraph{Arguments:} None.
2459
2460\paragraph{Output format:}
2461\begin{verbatim}
2462kuznet@amber:~ $ ip tunl ls Cisco
2463Cisco: ipv6/ip  remote 192.31.7.104  local 192.203.80.142  ttl 32 
2464kuznet@amber:~ $ 
2465\end{verbatim}
2466The line starts with the tunnel device name followed by a colon.
2467Then the tunnel mode follows. The parameters of the tunnel are listed
2468with the same keywords that were used when creating the tunnel.
2469
2470\paragraph{Statistics:}
2471
2472\begin{verbatim}
2473kuznet@amber:~ $ ip -s tunl ls Cisco
2474Cisco: ipv6/ip  remote 192.31.7.104  local 192.203.80.142  ttl 32 
2475RX: Packets    Bytes        Errors CsumErrs OutOfSeq Mcasts
2476    12566      1707516      0      0        0        0       
2477TX: Packets    Bytes        Errors DeadLoop NoRoute  NoBufs
2478    13445      1879677      0      0        0        0     
2479kuznet@amber:~ $ 
2480\end{verbatim}
2481Essentially, these numbers are the same as the numbers
2482printed with {\tt ip -s link show}
2483(sec.\ref{IP-LINK-SHOW}, p.\pageref{IP-LINK-SHOW}) but the tags are different
2484to reflect that they are tunnel specific.
2485\begin{itemize}
2486\item \verb|CsumErrs| --- the total number of packets dropped
2487because of checksum failures for a GRE tunnel with checksumming enabled.
2488\item \verb|OutOfSeq| --- the total number of packets dropped
2489because they arrived out of sequence for a GRE tunnel with
2490serialization enabled.
2491\item \verb|Mcasts| --- the total number of multicast packets
2492received on a broadcast GRE tunnel.
2493\item \verb|DeadLoop| --- the total number of packets which were not
2494transmitted because the tunnel is looped back to itself.
2495\item \verb|NoRoute| --- the total number of packets which were not
2496transmitted because there is no IP route to the remote endpoint.
2497\item \verb|NoBufs| --- the total number of packets which were not
2498transmitted because the kernel failed to allocate a buffer.
2499\end{itemize}
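
When several tunnels are configured, the tunnel specific counters can be
totalled straight from this output. The shell function below is a sketch
that assumes exactly the layout shown above: an \verb|RX:| or \verb|TX:|
header line followed by one line of numbers in the header's column order.
The function name \verb|tunnel_drop_counters| is hypothetical.

\begin{verbatim}
#!/bin/bash
# tunnel_drop_counters (hypothetical name): read "ip -s tunnel show"
# output on stdin and total the tunnel specific counters over all
# tunnels listed.  Assumes each "RX:"/"TX:" header line is followed
# by one line of numbers in the same column order.
tunnel_drop_counters () {
  awk '
    # RX columns: Packets Bytes Errors CsumErrs OutOfSeq Mcasts
    /^RX:/ { getline; csum += $4; seq += $5 }
    # TX columns: Packets Bytes Errors DeadLoop NoRoute NoBufs
    /^TX:/ { getline; loop += $4; noroute += $5; nobufs += $6 }
    END { printf "CsumErrs=%d OutOfSeq=%d DeadLoop=%d NoRoute=%d NoBufs=%d\n",
                 csum, seq, loop, noroute, nobufs }
  '
}
\end{verbatim}

For the \verb|Cisco| tunnel above,
\verb+ip -s tunl ls Cisco | tunnel_drop_counters+ would report all five
counters as zero.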
2500
2501
2502\section{{\tt ip monitor} and {\tt rtmon} --- state monitoring}
2503\label{IP-MONITOR}
2504
2505The \verb|ip| utility can monitor the state of devices, addresses
and routes continuously. This command has a slightly different format.
2507Namely,
2508the \verb|monitor| command is the first in the command line and then
2509the object list follows:
2510\begin{verbatim}
2511  ip monitor [ file FILE ] [ all | OBJECT-LIST ]
2512\end{verbatim}
2513\verb|OBJECT-LIST| is the list of object types that we want to monitor.
2514It may contain \verb|link|, \verb|address| and \verb|route|.
2515If no \verb|file| argument is given, \verb|ip| opens RTNETLINK,
2516listens on it and dumps state changes in the format described
in the previous sections.
2518
2519If a file name is given, it does not listen on RTNETLINK,
2520but opens the file containing RTNETLINK messages saved in binary format
2521and dumps them. Such a history file can be generated with the
2522\verb|rtmon| utility. This utility has a command line syntax similar to
2523\verb|ip monitor|.
2524Ideally, \verb|rtmon| should be started before
2525the first network configuration command is issued. F.e.\ if
2526you insert:
2527\begin{verbatim}
2528  rtmon file /var/log/rtmon.log
2529\end{verbatim}
2530in a startup script, you will be able to view the full history
2531later.
2532
Of course, \verb|rtmon| may be started at any time.
In that case it prepends the history with a snapshot of the state
dumped at the moment it starts.
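
Put together, a startup script and a later inspection of the history might
look like the following sketch. The log file name is the one used above;
running \verb|rtmon| in the background with \verb|&| is an assumption made
here so that the startup script can continue with its other commands.

\begin{verbatim}
# In a startup script, before the first configuration command
# (backgrounding rtmon is an assumption of this sketch):
rtmon file /var/log/rtmon.log &

# Later: dump the saved history in the usual ip output format.
ip monitor file /var/log/rtmon.log

# Or watch live changes of links and routes only.
ip monitor link route
\end{verbatim}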
2536
2537
2538\section{Route realms and policy propagation, {\tt rtacct}}
2539\label{RT-REALMS}
2540
2541On routers using OSPF ASE or, especially, the BGP protocol, routing
2542tables may be huge. If we want to classify or to account for the packets
2543per route, we will have to keep lots of information. Even worse, if we
2544want to distinguish the packets not only by their destination, but
also by their source, the task becomes quadratic in complexity and its solution
is practically impossible.
2547
2548One approach to propagating the policy from routing protocols
2549to the forwarding engine has been proposed in~\cite{IOS-BGP-PP}.
2550Essentially, Cisco Policy Propagation via BGP is based on the fact
2551that dedicated routers all have the RIB (Routing Information Base)
2552close to the forwarding engine, so policy routing rules can
2553check all the route attributes, including ASPATH information
2554and community strings.
2555
2556The Linux architecture, splitting the RIB (maintained by a user level
2557daemon) and the kernel based FIB (Forwarding Information Base),
2558does not allow such a simple approach.
2559
This turns out to be fortunate, because there is another solution
which allows even more flexible policy and richer semantics.
2562
Namely, routes can be clustered together in user space, based on their
attributes.  F.e.\ a BGP router knows the ASPATH and community of a route;
an OSPF router knows the route tag or its area. The administrator, when adding
routes manually, also knows their nature. Provided that the number of such
aggregates (we call them {\em realms\/}) is low, the task of full
2568classification both by source and destination becomes quite manageable.
2569
2570So each route may be assigned to a realm. It is assumed that
2571this identification is made by a routing daemon, but static routes
2572can also be handled manually with \verb|ip route| (see sec.\ref{IP-ROUTE},
2573p.\pageref{IP-ROUTE}).
2574\begin{NB}
2575  There is a patch to \verb|gated|, allowing classification of routes
  to realms with the full set of policy rules implemented in \verb|gated|:
2577  by prefix, by ASPATH, by origin, by tag etc.
2578\end{NB}
2579
2580To facilitate the construction (f.e.\ in case the routing
daemon is not aware of realms), missing realms may be filled in
2582with routing policy rules, see sec.~\ref{IP-RULE}, p.\pageref{IP-RULE}.
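
For a static route, the realm is given with the \verb|realm| parameter of
\verb|ip route|. In the sketch below the prefix, the gateway and the realm
name are hypothetical; a symbolic name such as \verb|russia| is assumed to be
defined in \verb|/etc/iproute2/rt_realms|, otherwise a realm number can be
used instead.

\begin{verbatim}
# Hypothetical prefix and gateway; "russia" is assumed to be listed
# in /etc/iproute2/rt_realms (a realm number works as well).
ip route add 194.85.0.0/16 via 193.233.7.65 realm russia
\end{verbatim}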
2583
2584For each packet the kernel calculates a tuple of realms: source realm
2585and destination realm, using the following algorithm:
2586
2587\begin{enumerate}
2588\item If the route has a realm, the destination realm of the packet is set to it.
2589\item If the rule has a source realm, the source realm of the packet is set to it.
2590If the destination realm was not inherited from the route and the rule has a destination realm,
2591it is also set.
2592\item If at least one of the realms is still unknown, the kernel finds
2593the reversed route to the source of the packet.
2594\item If the source realm is still unknown, get it from the reversed route.
2595\item If one of the realms is still unknown, swap the realms of reversed
2596routes and apply step 2 again.
2597\end{enumerate}
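
The steps above can be modelled with a shell function operating on numeric
realms, with 0 standing for an unknown realm. This is a sketch of the listed
steps, not kernel code: the function name \verb|realm_tuple| and its argument
order are hypothetical, and ``apply step 2 again'' in step 5 is interpreted as
filling in only the realms that are still unknown, taking the source realm
from the destination realm of the reversed rule and the destination realm from
its source realm.

\begin{verbatim}
#!/bin/bash
# realm_tuple: a model of the five-step realm computation (not kernel code).
# Realms are numbers; 0 means "unknown".  Hypothetical argument order:
#   $1 realm of the route to the destination
#   $2 source realm of the matching rule
#   $3 destination realm of the matching rule
#   $4 realm of the reversed route (the route back to the source)
#   $5 source realm of the rule matched by the reversed lookup
#   $6 destination realm of that rule
# Prints "SRC DST".
realm_tuple () {
  local src=0 dst=$1          # Step 1: the route's realm is the destination.

  # Step 2: the rule sets the source realm, and the destination realm
  # only if the route did not supply one.
  if [ "$2" -ne 0 ]; then src=$2; fi
  if [ "$dst" -eq 0 ]; then dst=$3; fi

  # Step 3: consult the reversed route only if something is unknown.
  if [ "$src" -eq 0 ] || [ "$dst" -eq 0 ]; then
    # Step 4: the reversed route's realm is where the packet came from.
    if [ "$src" -eq 0 ]; then src=$4; fi
    # Step 5: apply the reversed rule with its realms swapped,
    # filling in only what is still unknown.
    if [ "$src" -eq 0 ]; then src=$6; fi
    if [ "$dst" -eq 0 ]; then dst=$5; fi
  fi
  echo "$src $dst"
}
\end{verbatim}

For instance, \verb|realm_tuple 0 0 7 9 0 0| prints \verb|9 7|: the rule
supplies the destination realm and the reversed route the source realm.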
2598
2599After this procedure is completed we know what realm the packet
2600arrived from and the realm where it is going to propagate to.
2601If some of the realms are unknown, they are initialized to zero
2602(or realm \verb|unknown|).
2603
2604The main application of realms is the TC \verb|route| classifier~\cite{TC-CREF},
2605where they are used to help assign packets to traffic classes,
2606to account, police and schedule them according to this
2607classification.
2608
2609A much simpler but still very useful application is incoming packet
2610accounting by realms. The kernel gathers a packet statistics summary
2611which can be viewed with the \verb|rtacct| utility.
2612\begin{verbatim}
2613kuznet@amber:~ $ rtacct russia
2614Realm      BytesTo    PktsTo     BytesFrom  PktsFrom   
2615russia     20576778   169176     47080168   153805     
2616kuznet@amber:~ $
2617\end{verbatim}
2618This shows that this router received 153805 packets from
2619the realm \verb|russia| and forwarded 169176 packets to \verb|russia|.
2620The realm \verb|russia| consists of routes with ASPATHs not leaving
2621Russia.
2622
Note that locally originated packets are not accounted here:
\verb|rtacct| shows incoming packets only. Using the \verb|route|
2625classifier (see~\cite{TC-CREF}) you can get even more detailed
2626accounting information about outgoing packets, optionally
2627summarizing traffic not only by source or destination, but
2628by any pair of source and destination realms.
2629
2630
2631\begin{thebibliography}{99}
2632\addcontentsline{toc}{section}{References}
2633\bibitem{RFC-NDISC} T.~Narten, E.~Nordmark, W.~Simpson.
2634``Neighbor Discovery for IP Version 6 (IPv6)'', RFC-2461.
2635
2636\bibitem{RFC-ADDRCONF} S.~Thomson, T.~Narten.
2637``IPv6 Stateless Address Autoconfiguration'', RFC-2462.
2638
2639\bibitem{RFC1812} F.~Baker.
2640``Requirements for IP Version 4 Routers'', RFC-1812.
2641
2642\bibitem{RFC1122} R.~T.~Braden.
2643``Requirements for Internet hosts --- communication layers'', RFC-1122.
2644
2645\bibitem{IOS} ``Cisco IOS Release 12.0 Network Protocols
2646Command Reference, Part 1'' and
2647``Cisco IOS Release 12.0 Quality of Service Solutions
2648Configuration Guide: Configuring Policy-Based Routing'',\\
2649http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.
2650
2651\bibitem{IP-TUNNELS} A.~N.~Kuznetsov.
2652``Tunnels over IP in Linux-2.2'', \\
2653In: {\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz}.
2654
2655\bibitem{TC-CREF} A.~N.~Kuznetsov. ``TC Command Reference'',\\
2656In: {\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz}.
2657
2658\bibitem{IOS-BGP-PP} ``Cisco IOS Release 12.0 Quality of Service Solutions
2659Configuration Guide: Configuring QoS Policy Propagation via
2660Border Gateway Protocol'',\\
2661http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.
2662
2663\bibitem{RFC-DHCP} R.~Droms.
``Dynamic Host Configuration Protocol'', RFC-2131.
2665
2666\bibitem{RFC2414}  M.~Allman, S.~Floyd, C.~Partridge.
2667``Increasing TCP's Initial Window'', RFC-2414.
2668
2669\end{thebibliography}
2670
2671
2672
2673
2674\appendix
2675\addcontentsline{toc}{section}{Appendix}
2676
2677\section{Source address selection}
2678\label{ADDR-SEL}
2679
2680When a host creates an IP packet, it must select some source
2681address. Correct source address selection is a critical procedure,
2682because it gives the receiver the information needed to deliver a
reply. If the source is selected incorrectly, in the best case,
the backward path may differ from the forward one, which
harms performance. In the worst case, when the addresses
2686are administratively scoped, the reply may be lost entirely.
2687
2688Linux-2.2 selects source addresses using the following algorithm:
2689
2690\begin{itemize}
2691\item
The application may select a source address explicitly with the \verb|bind(2)|
syscall or by supplying it to \verb|sendmsg(2)| via the ancillary data object
2694\verb|IP_PKTINFO|. In this case the kernel only checks the validity
2695of the address and never tries to ``improve'' an incorrect user choice,
2696generating an error instead.
2697\begin{NB}
2698 Never say ``Never''. The sysctl option \verb|ip_dynaddr| breaks
 this axiom. It was introduced deliberately to reselect the address
 automatically on hosts with dynamic dial-out interfaces.
2701 However, this hack {\em must not\/} be used on multihomed hosts
2702 and especially on routers: it would break them.
2703\end{NB}
2704
2705
2706\item Otherwise, IP routing tables can contain an explicit source
2707address hint for this destination. The hint is set with the \verb|src| parameter
2708to the \verb|ip route| command, sec.\ref{IP-ROUTE}, p.\pageref{IP-ROUTE}.
2709
2710
2711\item Otherwise, the kernel searches through the list of addresses
2712attached to the interface through which the packets will be routed.
2713The search strategies are different for IP and IPv6. Namely:
2714
2715\begin{itemize}
2716\item IPv6 searches for the first valid, not deprecated address
2717with the same scope as the destination.
2718
\item IP searches for the first valid address with a scope equal to
or wider than the scope of the destination, but it prefers addresses
which fall into the same subnet as the nexthop of the route
2722to the destination. Unlike IPv6, the scopes of IPv4 destinations
2723are not encoded in their addresses but are supplied
2724in routing tables instead (the \verb|scope| parameter to the \verb|ip route| command,
2725sec.\ref{IP-ROUTE}, p.\pageref{IP-ROUTE}).
2726
2727\end{itemize}
2728
2729
2730\item Otherwise, if the scope of the destination is \verb|link| or \verb|host|,
2731the algorithm fails and returns a zero source address.
2732
2733\item Otherwise, all interfaces are scanned to search for an address
2734with an appropriate scope. The loopback device \verb|lo| is always the first
2735in the search list, so that if an address with global scope (not 127.0.0.1!)
2736is configured on loopback, it is always preferred.
2737
2738\end{itemize}
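
The IPv4 preference for an address on the nexthop's subnet can be illustrated
with a small shell function. It is a sketch of that single step only: it
assumes that the candidate addresses are already filtered by scope and
validity, are given in the order in which they would be scanned, and are
written as \verb|ADDRESS/LENGTH|. The helper names \verb|ip2int|,
\verb|same_subnet| and \verb|pick_src| are hypothetical.

\begin{verbatim}
#!/bin/bash
# Hypothetical helpers modelling only the IPv4 "same subnet as the
# nexthop" preference; scope and validity filtering are assumed done.

# ip2int A.B.C.D: print the address as a 32-bit integer.
ip2int () {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_subnet ADDR/LEN HOST: succeed if HOST lies in the subnet ADDR/LEN.
same_subnet () {
  local addr=${1%/*} len=${1#*/} mask
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( $(ip2int "$addr") & mask )) -eq $(( $(ip2int "$2") & mask )) ]
}

# pick_src NEXTHOP ADDR/LEN...: print the first candidate on the
# nexthop's subnet, or the first candidate if none matches.
pick_src () {
  local nh=$1 cand
  shift
  for cand in "$@"; do
    if same_subnet "$cand" "$nh"; then
      echo "${cand%/*}"
      return 0
    fi
  done
  if [ $# -gt 0 ]; then echo "${1%/*}"; fi
}
\end{verbatim}

With the candidates 192.203.80.142/24 and 193.233.7.90/24 and the nexthop
193.233.7.65, the sketch picks 193.233.7.90, although it is listed second.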
2739
2740
2741\section{Proxy ARP/NDISC}
2742\label{PROXY-NEIGH}
2743
2744Routers may answer ARP/NDISC solicitations on behalf of other hosts.
2745In Linux-2.2 proxy ARP on an interface may be enabled
2746by setting the kernel \verb|sysctl| variable 
2747\verb|/proc/sys/net/ipv4/conf/<dev>/proxy_arp| to 1. After this, the router
2748starts to answer ARP requests on the interface \verb|<dev>|, provided
2749the route to the requested destination does {\em not\/} go back via the same
2750device.
2751
2752The variable \verb|/proc/sys/net/ipv4/conf/all/proxy_arp| enables proxy
2753ARP on all the IP devices.
2754
2755However, this approach fails in the case of IPv6 because the router
2756must join the solicited node multicast address to listen for the corresponding
2757NDISC queries. It means that proxy NDISC is possible only on a per destination
2758basis.
2759
2760Logically, proxy ARP/NDISC is not a kernel task. It can easily be implemented
2761in user space. However, similar functionality was present in BSD kernels
2762and in Linux-2.0, so we have to preserve it at least to the extent that
2763is standardized in BSD.
2764\begin{NB}
2765  Linux-2.0 ARP had a feature called {\em subnet\/} proxy ARP.
2766  It is replaced with the sysctl flag in Linux-2.2.
2767\end{NB}
2768
2769
2770The \verb|ip| utility provides a way to manage proxy ARP/NDISC
2771with the \verb|ip neigh| command, namely:
2772\begin{verbatim}
2773  ip neigh add proxy ADDRESS [ dev NAME ]
2774\end{verbatim}
2775adds a new proxy ARP/NDISC record and
2776\begin{verbatim}
2777  ip neigh del proxy ADDRESS [ dev NAME ]
2778\end{verbatim}
2779deletes it.
2780
2781If the name of the device is not given, the router will answer solicitations
2782for address \verb|ADDRESS| on all devices, otherwise it will only serve
2783the device \verb|NAME|. Even if the proxy entry is created with
2784\verb|ip neigh|, the router {\em will not\/} answer a query if the route
2785to the destination goes back via the interface from which the solicitation
2786was received.
2787
2788It is important to emphasize that proxy entries have {\em no\/}
2789parameters other than these (IP/IPv6 address and optional device).
2790Particularly, the entry does not store any link layer address.
2791It always advertises the station address of the interface
on which it sends advertisements (i.e.\ its own station address).
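
As an illustration, the commands below enable interface-wide proxy ARP on
\verb|eth0| and manage a single proxy NDISC entry; the interface name and
the IPv6 address are hypothetical.

\begin{verbatim}
# Interface-wide proxy ARP for IPv4 on eth0.
echo 1 > /proc/sys/net/ipv4/conf/eth0/proxy_arp

# Proxy NDISC has to be configured per destination address.
ip neigh add proxy 3ffe:2400:0:1::5 dev eth0

# Remove the proxy entry again.
ip neigh del proxy 3ffe:2400:0:1::5 dev eth0
\end{verbatim}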
2793
2794\section{Route NAT status}
2795\label{ROUTE-NAT}
2796
2797NAT (or ``Network Address Translation'') remaps some parts
2798of the IP address space into other ones. Linux-2.2 route NAT is supposed
2799to be used to facilitate policy routing by rewriting addresses
2800to other routing domains or to help while renumbering sites
2801to another prefix.
2802
2803\paragraph{What it is not:}
2804It is necessary to emphasize that {\em it is not supposed\/}
2805to be used to compress address space or to split load.
2806This is not missing functionality but a design principle.
2807Route NAT is {\em stateless\/}. It does not hold any state
2808about translated sessions. This means that it handles any number
2809of sessions flawlessly. But it also means that it is {\em static\/}.
2810It cannot detect the moment when the last TCP client stops
2811using an address. For the same reason, it will not help to split
2812load between several servers.
2813\begin{NB}
2814It is a pretty commonly held belief that it is useful to split load between
2815several servers with NAT. This is a mistake. All you get from this
2816is the requirement that the router keep the state of all the TCP connections
2817going via it. Well, if the router is so powerful, run apache on it. 8)
2818\end{NB}
2819
The second feature: it does not touch the packet payload and
does not try to ``improve'' broken protocols by looking
through the payload and mangling it. It mangles IP addresses,
only IP addresses and nothing but IP addresses.
This, too, is not missing functionality.
2825
To summarize: if you need to compress address space or keep
2827active FTP clients happy, your choice is not route NAT but masquerading,
2828port forwarding, NAPT etc. 
2829\begin{NB}
2830By the way, you may also want to look at
\verb|http://www.suse.com/~mha/HyperNews/get/linux-ip-nat.html|
2832\end{NB}
2833
2834
2835\paragraph{How it works.}
2836Some part of the address space is reserved for dummy addresses
2837which will look for all the world like some host addresses
inside your network. No other hosts may use these addresses;
however, other routers may also be configured to translate them.
2840\begin{NB}
2841A great advantage of route NAT is that it may be used not
2842only in stub networks but in environments with arbitrarily complicated
2843structure. It does not firewall, it {\em forwards.}
2844\end{NB}
2845These addresses are selected by the \verb|ip route| command
2846(sec.\ref{IP-ROUTE-ADD}, p.\pageref{IP-ROUTE-ADD}). F.e.\
2847\begin{verbatim}
2848  ip route add nat 192.203.80.144 via 193.233.7.83
2849\end{verbatim}
2850states that the single address 192.203.80.144 is a dummy NAT address.
2851For all the world it looks like a host address inside our network.
2852For neighbouring hosts and routers it looks like the local address
2853of the translating router. The router answers ARP for it, advertises
2854this address as routed via it, {\em et al\/}. When the router
2855receives a packet destined for 192.203.80.144, it replaces 
2856this address with 193.233.7.83 which is the address of some real
2857host and forwards the packet. If you need to remap
2858blocks of addresses, you may use a command like:
2859\begin{verbatim}
2860  ip route add nat 192.203.80.192/26 via 193.233.7.64
2861\end{verbatim}
This command will map a block of 64 addresses 192.203.80.192-255 to
2863193.233.7.64-127.
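
The address arithmetic behind such a block mapping is simple: the network
part is taken from the real prefix and the host part of the dummy address is
kept. The shell function below sketches this for illustration only; it is not
the kernel implementation, and the name \verb|nat_map| and its argument order
are hypothetical.

\begin{verbatim}
#!/bin/bash
# Hypothetical sketch of route NAT block arithmetic (not kernel code).

ip2int () {                     # A.B.C.D -> 32-bit integer
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

int2ip () {                     # 32-bit integer -> A.B.C.D
  local n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# nat_map ADDR NATPREFIX/LEN REALBASE: translate the dummy address ADDR
# by keeping its host part and taking the network part from REALBASE.
# Fails if ADDR is outside NATPREFIX/LEN.
nat_map () {
  local addr nat=${2%/*} len=${2#*/} mask
  addr=$(ip2int "$1")
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  if [ $(( addr & mask )) -ne $(( $(ip2int "$nat") & mask )) ]; then
    return 1
  fi
  int2ip $(( ($(ip2int "$3") & mask) | (addr & ~mask & 0xFFFFFFFF) ))
}
\end{verbatim}

For the block mapping above,
\verb|nat_map 192.203.80.200 192.203.80.192/26 193.233.7.64| prints
\verb|193.233.7.72|.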
2864
2865When an internal host (193.233.7.83 in the example above)
2866sends something to the outer world and these packets are forwarded
2867by our router, it should translate the source address 193.233.7.83
2868into 192.203.80.144. This task is solved by setting a special
2869policy rule (sec.\ref{IP-RULE-ADD}, p.\pageref{IP-RULE-ADD}):
2870\begin{verbatim}
2871  ip rule add prio 320 from 193.233.7.83 nat 192.203.80.144
2872\end{verbatim}
2873This rule says that the source address 193.233.7.83
2874should be translated into 192.203.80.144 before forwarding.
2875It is important that the address after the \verb|nat| keyword
2876is some NAT address, declared by {\tt ip route add nat}.
If it is just an arbitrary address, the router will not map to it.
2878\begin{NB}
2879The exception is when the address is a local address of this
2880router (or 0.0.0.0) and masquerading is configured in the linux-2.2
2881kernel. In this case the router will masquerade the packets as this address.
2882If 0.0.0.0 is selected, the result is equivalent to one
obtained with firewalling rules. Otherwise, you have a way
to order Linux to masquerade to this fixed address.
The NAT mechanism used in linux-2.4 is more flexible than
masquerading, so this feature has lost its meaning and has been disabled.
2887\end{NB}
2888
2889If the network has non-trivial internal structure, it is
2890useful and even necessary to add rules disabling translation
2891when a packet does not leave this network. Let us return to the
2892example from sec.\ref{IP-RULE-SHOW} (p.\pageref{IP-RULE-SHOW}).
2893\begin{verbatim}
2894300:	from 193.233.7.83 to 193.233.7.0/24 lookup main
2895310:	from 193.233.7.83 to 192.203.80.0/24 lookup main
2896320:	from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
2897\end{verbatim}
2898This block of rules causes normal forwarding when
2899packets from 193.233.7.83 do not leave networks 193.233.7/24
2900and 192.203.80/24. Also, if the \verb|inr.ruhep| table does not
2901contain a route to the destination (which means that the routing
2902domain owning addresses from 192.203.80/24 is dead), no translation
2903will occur. Otherwise, the packets are translated.
2904
2905\paragraph{How to only translate selected ports:}
2906If you only want to translate selected ports (f.e.\ http)
2907and leave the rest intact, you may use \verb|ipchains|
2908to \verb|fwmark| a class of packets.
Suppose you did so, and all the packets from 193.233.7.83
destined for port 80 are marked with the value 0x1234 in the input firewall chain.
2911In this case you may replace rule \#320 with:
2912\begin{verbatim}
2913320:	from 193.233.7.83 fwmark 1234 lookup main map-to 192.203.80.144
2914\end{verbatim}
2915and translation will only be enabled for outgoing http requests.
2916
2917\section{Example: minimal host setup}
2918\label{EXAMPLE-SETUP}
2919
The following script gives an example of a fail-safe
2921setup of IP (and IPv6, if it is compiled into the kernel)
2922in the common case of a node attached to a single broadcast
2923network. A more advanced script, which may be used both on multihomed
2924hosts and on routers, is described in the following
2925section.
2926
2927The utilities used in the script may be found in the
2928directory ftp://ftp.inr.ac.ru/ip-routing/:
2929\begin{enumerate}
2930\item \verb|ip| --- package \verb|iproute2|.
2931\item \verb|arping| --- package \verb|iputils|.
2932\item \verb|rdisc| --- package \verb|iputils|.
2933\end{enumerate}
2934\begin{NB}
2935It also refers to a DHCP client, \verb|dhcpcd|. I should refrain from
2936recommending a good DHCP client to use. All that I can
2937say is that ISC \verb|dhcp-2.0b1pl6| patched with the patch that
2938can be found in the \verb|dhcp.bootp.rarp| subdirectory of
2939the same ftp site {\em does\/} work,
2940at least on Ethernet and Token Ring.
2941\end{NB}
2942
2943\begin{verbatim}
2944#! /bin/bash
2945\end{verbatim}
2946\begin{flushleft}
2947\# {\bf Usage: \verb|ifone ADDRESS[/PREFIX-LENGTH] [DEVICE]|}\\
2948\# {\bf Parameters:}\\
2949\# \$1 --- Static IP address, optionally followed by prefix length.\\
\# \$2 --- Device name. If it is missing, \verb|eth0| is assumed.\\
2951\# F.e. \verb|ifone 193.233.7.90|
2952\end{flushleft}
2953\begin{verbatim}
2954dev=$2
2955: ${dev:=eth0}
2956ipaddr=
2957\end{verbatim}
2958\# Parse IP address, splitting prefix length.
2959\begin{verbatim}
2960if [ "$1" != "" ]; then
2961  ipaddr=${1%/*}
2962  if [ "$1" != "$ipaddr" ]; then
2963    pfxlen=${1#*/}
2964  fi
2965  : ${pfxlen:=24}
2966fi
2967pfx="${ipaddr}/${pfxlen}"
2968\end{verbatim}
2969
2970\begin{flushleft}
2971\# {\bf Step 0} --- enable loopback.\\
2972\#\\
\# This step is necessary on any networked box before attempting\\
2974\# to configure any other device.\\
2975\end{flushleft}
2976\begin{verbatim}
2977ip link set up dev lo
2978ip addr add 127.0.0.1/8 dev lo brd + scope host
2979\end{verbatim}
2980\begin{flushleft}
\# IPv6 autoconfigures itself on loopback.\\
2982\#\\
\# If the user gave loopback as the device, we add the address as an alias and exit.
2984\end{flushleft}
2985\begin{verbatim}
2986if [ "$dev" = "lo" ]; then
2987  if [ "$ipaddr" != "" -a  "$ipaddr" != "127.0.0.1" ]; then
2988    ip address add $ipaddr dev $dev
2989    exit $?
2990  fi
2991  exit 0
2992fi
2993\end{verbatim}
2994
2995\noindent\# {\bf Step 1} --- enable device \verb|$dev|
2996
2997\begin{verbatim}
2998if ! ip link set up dev $dev ; then
2999  echo "Cannot enable interface $dev. Aborting." 1>&2
3000  exit 1
3001fi
3002\end{verbatim}
3003\begin{flushleft}
3004\# The interface is \verb|UP|. IPv6 started stateless autoconfiguration itself,\\
3005\# and its configuration finishes here. However,\\
3006\# IP still needs some static preconfigured address.
3007\end{flushleft}
3008\begin{verbatim}
3009if [ "$ipaddr" = "" ]; then
3010  echo "No address for $dev is configured, trying DHCP..." 1>&2
3011  dhcpcd
3012  exit $?
3013fi
3014\end{verbatim}
3015
3016\begin{flushleft}
3017\# {\bf Step 2} --- IP Duplicate Address Detection~\cite{RFC-DHCP}.\\
3018\# Send two probes and wait for result for 3 seconds.\\
\# If the interface comes up more slowly, f.e.\ due to long media detection,\\
\# you may want to increase the timeout.\\
3021\end{flushleft}
3022\begin{verbatim}
3023if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
3024  echo "Address $ipaddr is busy, trying DHCP..." 1>&2
3025  dhcpcd
3026  exit $?
3027fi
3028\end{verbatim}
3029\begin{flushleft}
3030\# OK, the address is unique, we may add it on the interface.\\
3031\#\\
3032\# {\bf Step 3} --- Configure the address on the interface.
3033\end{flushleft}
3034
3035\begin{verbatim}
3036if ! ip address add $pfx brd + dev $dev; then
3037  echo "Failed to add $pfx on $dev, trying DHCP..." 1>&2
3038  dhcpcd
3039  exit $?
3040fi
3041\end{verbatim}
3042
3043\noindent\# {\bf Step 4} --- Announce our presence on the link.
3044\begin{verbatim}
3045arping -A -c 1 -I $dev $ipaddr
3046noarp=$?
3047( sleep 2;
3048  arping -U -c 1 -I $dev $ipaddr ) >& /dev/null </dev/null &
3049\end{verbatim}
3050
3051\begin{flushleft}
3052\# {\bf Step 5} (optional) --- Add some control routes.\\
3053\#\\
3054\# 1. Prohibit link local multicast addresses.\\
3055\# 2. Prohibit link local (alias, limited) broadcast.\\
3056\# 3. Add default multicast route.
3057\end{flushleft}
3058\begin{verbatim}
3059ip route add unreachable 224.0.0.0/24 
3060ip route add unreachable 255.255.255.255
3061if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
3062  ip route add 224.0.0.0/4 dev $dev scope global
3063fi
3064\end{verbatim}
3065
3066\begin{flushleft}
3067\# {\bf Step 6} --- Add fallback default route with huge metric.\\
3068\# If a proxy ARP server is present on the interface, we will be\\
3069\# able to talk to all the Internet without further configuration.\\
3070\# It is not so cheap though and we still hope that this route\\
3071\# will be overridden by more correct one by rdisc.\\
3072\# Do not make this step if the device is not ARPable,\\
3073\# because dead nexthop detection does not work on them.
3074\end{flushleft}
3075\begin{verbatim}
3076if [ "$noarp" = "0" ]; then
3077  ip ro add default dev $dev metric 30000 scope global
3078fi
3079\end{verbatim}
3080
3081\begin{flushleft}
3082\# {\bf Step 7} --- Restart router discovery and exit.
3083\end{flushleft}
3084\begin{verbatim}
3085killall -HUP rdisc || rdisc -fs
3086exit 0
3087\end{verbatim}
3088
3089
3090\section{Example: {\protect\tt ifcfg} --- interface address management}
3091\label{EXAMPLE-IFCFG}
3092
3093This is a simplistic script replacing one option of \verb|ifconfig|,
3094namely, IP address management. It not only adds
3095addresses, but also carries out Duplicate Address Detection~\cite{RFC-DHCP},
3096sends unsolicited ARP to update the caches of other hosts sharing
the link, adds some control routes and restarts Router Discovery
3098when it is necessary.
3099
3100I strongly recommend using it {\em instead\/} of \verb|ifconfig| both
3101on hosts and on routers.
3102
3103\begin{verbatim}
3104#! /bin/bash
3105\end{verbatim}
3106\begin{flushleft}
3107\# {\bf Usage: \verb?ifcfg DEVICE[:ALIAS] [add|del] ADDRESS[/LENGTH] [PEER]?}\\
3108\# {\bf Parameters:}\\
\# --- Device name. It may have an alias suffix, separated by a colon.\\
\# --- Command: add, del or stop.\\
\# --- IP address, optionally followed by prefix length.\\
\# --- Optional peer address for pointopoint interfaces.\\
3113\# F.e. \verb|ifcfg eth0 193.233.7.90/24|
3114
\noindent\# This function determines whether the box is a router or a host.\\
\# It returns 0 if the box is apparently not a router.
3117\end{flushleft}
3118\begin{verbatim}
3119CheckForwarding () {
3120  local sbase fwd
3121  sbase=/proc/sys/net/ipv4/conf
3122  fwd=0
3123  if [ -d $sbase ]; then
3124    for dir in $sbase/*/forwarding; do
3125      fwd=$[$fwd + `cat $dir`]
3126    done
3127  else
3128    fwd=2
3129  fi
3130  return $fwd
3131}
3132\end{verbatim}
3133\begin{flushleft}
3134\# This function restarts Router Discovery.\\
3135\end{flushleft}
3136\begin{verbatim}
3137RestartRDISC () {
3138  killall -HUP rdisc || rdisc -fs
3139}
3140\end{verbatim}
3141\begin{flushleft}
\# Calculate the classful (A, B, C) ``natural'' mask length.\\
3143\# Arg: \$1 = dotquad address
3144\end{flushleft}
3145\begin{verbatim}
3146ABCMaskLen () {
3147  local class;
3148  class=${1%%.*}
3149  if [ $class -eq 0 -o $class -ge 224 ]; then return 0
3150  elif [ $class -ge 192 ]; then return 24
3151  elif [ $class -ge 128 ]; then return 16
3152  else  return 8 ; fi
3153}
3154\end{verbatim}
3155
3156
3157\begin{flushleft}
3158\# {\bf MAIN()}\\
3159\#\\
3160\# Strip alias suffix separated by colon.
3161\end{flushleft}
3162\begin{verbatim}
3163label="label $1"
3164ldev=$1
3165dev=${1%:*}
3166if [ "$dev" = "" -o "$1" = "help" ]; then
  echo "Usage: ifcfg DEV [[add|del] [ADDR[/LEN]] [PEER] | stop]" 1>&2
3168  echo "       add - add new address" 1>&2
3169  echo "       del - delete address" 1>&2
3170  echo "       stop - completely disable IP" 1>&2
3171  exit 1
3172fi
3173shift
3174
3175CheckForwarding
3176fwd=$?
3177\end{verbatim}
3178\begin{flushleft}
3179\# Parse command. If it is ``stop'', flush and exit.
3180\end{flushleft}
3181\begin{verbatim}
3182deleting=0
3183case "$1" in
3184add) shift ;;
3185stop)
3186  if [ "$ldev" != "$dev" ]; then
3187    echo "Cannot stop alias $ldev" 1>&2
3188    exit 1;
3189  fi
3190  ip -4 addr flush dev $dev $label || exit 1
3191  if [ $fwd -eq 0 ]; then RestartRDISC; fi
3192  exit 0 ;;
3193del*)
3194  deleting=1; shift ;;
3195*)
3196esac
3197\end{verbatim}
3198\begin{flushleft}
3199\# Parse prefix, split prefix length, separated by slash.
3200\end{flushleft}
3201\begin{verbatim}
3202ipaddr=
3203pfxlen=
3204if [ "$1" != "" ]; then
3205  ipaddr=${1%/*}
3206  if [ "$1" != "$ipaddr" ]; then
3207    pfxlen=${1#*/}
3208  fi
3209  if [ "$ipaddr" = "" ]; then
3210    echo "$1 is bad IP address." 1>&2
3211    exit 1
3212  fi
3213fi
3214shift
3215\end{verbatim}
3216\begin{flushleft}
3217\# If peer address is present, prefix length is 32.\\
3218\# Otherwise, if prefix length was not given, guess it.
3219\end{flushleft}
3220\begin{verbatim}
3221peer=$1
3222if [ "$peer" != "" ]; then
3223  if [ "$pfxlen" != "" -a "$pfxlen" != "32" ]; then
3224    echo "Peer address with non-trivial netmask." 1>&2
3225    exit 1
3226  fi
3227  pfx="$ipaddr peer $peer"
3228else
3229  if [ "$pfxlen" = "" ]; then
3230    ABCMaskLen $ipaddr
3231    pfxlen=$?
3232  fi
3233  pfx="$ipaddr/$pfxlen"
3234fi
3235if [ "$ldev" = "$dev" -a "$ipaddr" != "" ]; then
3236  label=
3237fi
3238\end{verbatim}
\begin{flushleft}
\# If deletion was requested, delete the address and, unless forwarding\\
\# is enabled, restart RDISC.
\end{flushleft}
\begin{verbatim}
if [ $deleting -ne 0 ]; then
  ip addr del $pfx dev $dev $label || exit 1
  if [ $fwd -eq 0 ]; then RestartRDISC; fi
  exit 0
fi
\end{verbatim}
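To see what the deletion branch actually runs, the following dry-run sketch prints the \verb|ip| command instead of executing it. It applies the label rule from the script, assuming an address was given: the label is dropped for a plain device and kept for an alias. The function name \verb|del_cmd| is hypothetical:
\begin{verbatim}
# Hypothetical dry-run sketch: print the "ip addr del" command that
# ifcfg would run for LDEV (device or alias) and prefix PFX.
del_cmd() {
  local ldev=$1 pfx=$2
  local dev=${ldev%:*}
  local label="label $ldev"
  # As in ifcfg: no label for a plain (non-alias) device.
  if [ "$ldev" = "$dev" ]; then
    label=
  fi
  echo "ip addr del $pfx dev $dev${label:+ $label}"
}

del_cmd eth0 10.0.0.1/24      # ip addr del 10.0.0.1/24 dev eth0
del_cmd eth0:1 10.0.0.2/24    # ip addr del 10.0.0.2/24 dev eth0 label eth0:1
\end{verbatim}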
\begin{flushleft}
\# Start interface initialization.\\
\#\\
\# {\bf Step 0} --- Enable device \verb|$dev|.
\end{flushleft}
\begin{verbatim}
if ! ip link set up dev $dev ; then
  echo "Error: cannot enable interface $dev." 1>&2
  exit 1
fi
if [ "$ipaddr" = "" ]; then exit 0; fi
\end{verbatim}
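Once \verb|ip link set up| succeeds, the flag list that \verb|ip link ls| prints between angle brackets contains \verb|UP|. The sketch below, with the hypothetical helper \verb|has_flag|, tests for one exact flag in such a line. It runs on a sample line whose format is assumed to follow the usual \verb|ip link| output, so no interface is touched. Matching whole comma-separated flags avoids false hits such as \verb|UP| inside \verb|LOWER_UP|, which newer versions of \verb|ip| print:
\begin{verbatim}
# Hypothetical helper: succeed if the flag list "<...>" in an
# "ip link" output line contains exactly FLAG.
has_flag() {
  local line=$1 flag=$2 flags
  flags=${line#*<}          # drop everything up to "<"
  flags=${flags%%>*}        # drop everything from ">" on
  case ",$flags," in
    *",$flag,"*) return 0 ;;
    *) return 1 ;;
  esac
}

sample='2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500'
has_flag "$sample" UP && echo "eth0 is up"
has_flag "$sample" NOARP || echo "eth0 uses ARP"
\end{verbatim}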
\begin{flushleft}
\# {\bf Step 1} --- IP Duplicate Address Detection~\cite{RFC-DHCP}.\\
\# Send two probes and wait 3 seconds for the result.\\
\# If the interface comes up more slowly, e.g.\ due to long media\\
\# detection, you may want to increase the timeout.
\end{flushleft}
\begin{verbatim}
if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
  echo "Error: some host already uses address $ipaddr on $dev." 1>&2
  exit 1
fi
\end{verbatim}
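Because the comment above suggests raising the 3-second deadline on slow interfaces, one option is to make the deadline a parameter. A minimal sketch, assuming a hypothetical environment variable \verb|DAD_TIMEOUT| that the original script does not use. The helper \verb|dad_cmd| (also hypothetical) only prints the \verb|arping| command line, with exactly the options used above:
\begin{verbatim}
# Hypothetical sketch: print the DAD probe command ifcfg would run,
# with the deadline taken from DAD_TIMEOUT (an assumed variable)
# and defaulting to 3 seconds as in the original script.
dad_cmd() {
  local dev=$1 ipaddr=$2
  local timeout=${DAD_TIMEOUT:-3}
  echo "arping -q -c 2 -w $timeout -D -I $dev $ipaddr"
}

dad_cmd eth0 10.0.0.1                       # ... -w 3 ...
( DAD_TIMEOUT=10; dad_cmd eth0 10.0.0.1 )   # ... -w 10 ...
\end{verbatim}
Applied to the script itself, this amounts to replacing \verb|-w 3| in Step 1 with \verb|-w ${DAD_TIMEOUT:-3}|.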
\begin{flushleft}
\# OK, the address is unique. We may add it to the interface.\\
\#\\
\# {\bf Step 2} --- Configure the address on the interface.
\end{flushleft}
\begin{verbatim}
if ! ip address add $pfx brd + dev $dev $label; then
  echo "Error: failed to add $pfx on $dev." 1>&2
  exit 1
fi
\end{verbatim}
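The \verb|brd +| argument asks \verb|ip| to derive the broadcast address by setting all host bits of the prefix to one. The same arithmetic as a stand-alone bash sketch; the helper names \verb|ip2int|, \verb|int2ip| and \verb|bcast| are hypothetical:
\begin{verbatim}
# Convert a dotted quad to a 32-bit integer and back.
ip2int() {
  local IFS=.
  set -- $1                      # split ADDR into four octets
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}
int2ip() {
  local n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# bcast ADDR LEN: set all 32-LEN host bits of ADDR to one.
bcast() {
  local addr len=$2 hostmask
  addr=$(ip2int "$1")
  hostmask=$(( (1 << (32 - len)) - 1 ))
  int2ip $(( addr | hostmask ))
}

bcast 10.1.2.3 24      # 10.1.2.255
bcast 172.16.5.4 20    # 172.16.15.255
\end{verbatim}
With a /32 prefix there are no host bits, so the result equals the address itself.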
\begin{flushleft}
\# {\bf Step 3} --- Announce our presence on the link.
\end{flushleft}
\begin{verbatim}
arping -q -A -c 1 -I $dev $ipaddr
noarp=$?
( sleep 2 ;
  arping -q -U -c 1 -I $dev $ipaddr ) >/dev/null 2>&1 </dev/null &
\end{verbatim}
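Step 3 sends one unsolicited ARP reply at once (\verb|-A|). Two seconds later it repeats the announcement as an unsolicited ARP request (\verb|-U|) from a detached background subshell, so \verb|ifcfg| does not block. The delayed-and-detached pattern on its own, as a sketch with the hypothetical helper name \verb|run_later|:
\begin{verbatim}
# Hypothetical helper: run a command after DELAY seconds in a
# background subshell with all standard streams detached, so the
# caller continues immediately (as ifcfg does for "arping -U").
run_later() {
  local delay=$1
  shift
  ( sleep "$delay"; "$@" ) >/dev/null 2>&1 </dev/null &
}

marker=$(mktemp)
rm -f "$marker"
run_later 1 touch "$marker"     # returns immediately
[ -e "$marker" ] || echo "not yet created"
wait                            # wait for the background job
[ -e "$marker" ] && echo "created after the delay"
rm -f "$marker"
\end{verbatim}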
\begin{flushleft}
\# {\bf Step 4} (optional) --- Add some control routes.\\
\#\\
\# 1. Prohibit link-local multicast addresses.\\
\# 2. Prohibit the link-local (also called limited) broadcast address.\\
\# 3. Add a default multicast route.
\end{flushleft}
\begin{verbatim}
ip route add unreachable 224.0.0.0/24 >/dev/null 2>&1
ip route add unreachable 255.255.255.255 >/dev/null 2>&1
if ip link ls $dev | grep -q MULTICAST; then
  ip route add 224.0.0.0/4 dev $dev scope global >/dev/null 2>&1
fi
\end{verbatim}
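The two \verb|unreachable| routes cover the link-local multicast block \verb|224.0.0.0/24| and the limited broadcast address, while \verb|224.0.0.0/4| spans all of IPv4 multicast. Whether an address falls inside such a prefix is a masked comparison of its top bits. A stand-alone bash sketch with the hypothetical helper names \verb|ip2int| and \verb|in_prefix|:
\begin{verbatim}
# Convert a dotted quad to a 32-bit integer.
ip2int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_prefix ADDR NET LEN: succeed if the top LEN bits of ADDR
# and NET are equal, i.e. ADDR lies inside NET/LEN.
in_prefix() {
  local addr net len=$3 mask
  addr=$(ip2int "$1")
  net=$(ip2int "$2")
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( addr & mask )) -eq $(( net & mask )) ]
}

in_prefix 224.0.0.251 224.0.0.0 24 && echo "link-local multicast"
in_prefix 239.1.2.3 224.0.0.0 24 || echo "not link-local"
in_prefix 239.1.2.3 224.0.0.0 4 && echo "multicast"
\end{verbatim}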
\begin{flushleft}
\# {\bf Step 5} --- Add a fallback default route with a huge metric.\\
\# If a proxy ARP server is present on the interface, we will be\\
\# able to reach the whole Internet without further configuration.\\
\# Do not take this step on a router or if the device is not ARPable,\\
\# because dead nexthop detection does not work on them.
\end{flushleft}
\begin{verbatim}
if [ $fwd -eq 0 ]; then
  if [ $noarp -eq 0 ]; then
    ip ro append default dev $dev metric 30000 scope global
  elif [ "$peer" != "" ]; then
    if ping -q -c 2 -w 4 $peer ; then
      ip ro append default via $peer dev $dev metric 30001
    fi
  fi
  RestartRDISC
fi

exit 0
\end{verbatim}
\begin{flushleft}
\# End of {\bf MAIN()}
\end{flushleft}
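The fallback route in Step 5 relies on route metrics: among several default routes, the one with the lowest metric is preferred. A route with metric 30000 is therefore a last resort, and the peer route with metric 30001 would lose to the device route if both existed. The toy model below only illustrates that selection. The function \verb|pick_default| and its \verb|METRIC DEVICE| input format are hypothetical and are not \verb|ip| output:
\begin{verbatim}
# Toy model: read "METRIC DEVICE" lines on stdin and print the
# device of the route with the lowest metric (first one wins ties).
pick_default() {
  local metric dev best_metric= best_dev=
  while read -r metric dev; do
    if [ -z "$best_metric" ] || [ "$metric" -lt "$best_metric" ]; then
      best_metric=$metric
      best_dev=$dev
    fi
  done
  [ -n "$best_dev" ] && echo "$best_dev"
}

printf '%s\n' "30000 eth0" "0 ppp0" | pick_default       # ppp0
printf '%s\n' "30001 ppp0" "30000 eth0" | pick_default   # eth0
\end{verbatim}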

\end{document}