\documentstyle[12pt,twoside]{article}
\def\TITLE{Tunnels over IP}
\input preamble
\begin{center}
\Large\bf Tunnels over IP in Linux-2.2
\end{center}


\begin{center}
{ \large Alexey~N.~Kuznetsov } \\
\em Institute for Nuclear Research, Moscow \\
\verb|kuznet@ms2.inr.ac.ru| \\
\rm March 17, 1999
\end{center}

\vspace{5mm}

\tableofcontents


\section{Instead of introduction: micro-FAQ.}

\begin{itemize}

\item
Q: In linux-2.0.36 I used:
\begin{verbatim}
    ifconfig tunl1 10.0.0.1 pointopoint 193.233.7.65
\end{verbatim}
to create a tunnel. It does not work in 2.2.0!

A: You are right, it does not work. The command above is now split into
two commands. The first,
\begin{verbatim}
    ip tunnel add MY-TUNNEL mode ipip remote 193.233.7.65
\end{verbatim}
creates a tunnel device named \verb|MY-TUNNEL|. Now you may configure
it with:
\begin{verbatim}
    ifconfig MY-TUNNEL 10.0.0.1
\end{verbatim}
Certainly, if you prefer the name \verb|tunl1| to \verb|MY-TUNNEL|,
you may still use it.

\item
Q: In linux-2.0.36 I used:
\begin{verbatim}
    ifconfig tunl0 10.0.0.1
    route add -net 10.0.0.0 gw 193.233.7.65 dev tunl0
\end{verbatim}
to tunnel net 10.0.0.0 via router 193.233.7.65. It does not
work in 2.2.0! Moreover, \verb|route| prints a funny error, something like
``network unreachable'', and afterwards I found a strange direct route
to 10.0.0.0 via \verb|tunl0| in the routing table.

A: Yes, in 2.2 the rule that a {\em normal} gateway must reside on a directly
connected network has no exceptions. You may tell the kernel that
this particular route is {\em abnormal}:
\begin{verbatim}
    ifconfig tunl0 10.0.0.1 netmask 255.255.255.255
    ip route add 10.0.0.0/8 via 193.233.7.65 dev tunl0 onlink
\end{verbatim}
Note the keyword \verb|onlink|: it is the magic key that orders the kernel
not to check the gateway address for consistency.
Probably, after this explanation you have already guessed another method
to cheat the kernel:
\begin{verbatim}
    ifconfig tunl0 10.0.0.1 netmask 255.255.255.255
    route add -host 193.233.7.65 dev tunl0
    route add -net 10.0.0.0 netmask 255.0.0.0 gw 193.233.7.65
    route del -host 193.233.7.65 dev tunl0
\end{verbatim}
Well, if you like such tricks, nobody can forbid you to use them.
Only do not forget
that between \verb|route add| and \verb|route del| host 193.233.7.65 is
unreachable.

\item
Q: In 2.0.36 I used to load the \verb|tunnel| device module and
the \verb|ipip| module.
I cannot find any \verb|tunnel| in 2.2!

A: Linux-2.2 has a single module \verb|ipip| for both directions of tunneling
and for all IPIP tunnel devices.

\item
Q: \verb|traceroute| does not work over a tunnel! Well, stop... It works,
   only it skips some number of hops.

A: Yes. By default the tunnel driver copies the \verb|ttl| value from
the inner packet to the outer one. It means that the path traversed by tunneled
packets to the other endpoint is not hidden. If you dislike this, or if you
are going to use some routing protocol that expects packets
with ttl 1 to reach the peering host (e.g.\ RIP, OSPF or EBGP),
and you are not afraid of
tunnel loops, you may append the option \verb|ttl 64| when creating the tunnel
with \verb|ip tunnel add|.

\item
Q: ... Well, here the list of things that 2.0 was able to do ends.

\end{itemize}

\paragraph{Summary of differences between 2.2 and 2.0.}

\begin{itemize}

\item {\bf In 2.0} you could compile the tunnel device into the kernel
  and get a set of 4 devices \verb|tunl0| ... \verb|tunl3| or,
  alternatively, compile it as a module and load a new module
  for each new tunnel. Also, the module \verb|ipip| was necessary
  to receive tunneled packets.

  {\bf 2.2} has {\em one\/} module \verb|ipip|.
  Loading it you get the base
  tunnel device \verb|tunl0|, and other tunnels may be created with the command
  \verb|ip tunnel add|. These new devices may have arbitrary names.


\item {\bf In 2.0} you set the remote tunnel endpoint address with
  the command \verb|ifconfig| ... \verb|pointopoint A|.

  {\bf In 2.2} this command has the same semantics on all
  interfaces: it sets not the tunnel endpoint
  but the address of the peering host, which is directly reachable
  via this tunnel
  rather than via the Internet. The actual tunnel endpoint address \verb|A|
  should be set with \verb|ip tunnel add ... remote A|.

\item {\bf In 2.0} you created tunnel routes with the command:
\begin{verbatim}
    route add -net 10.0.0.0 gw A dev tunl0
\end{verbatim}

  {\bf 2.2} interprets this command in the same way for all device
  kinds, and the gateway is required to be directly reachable via this tunnel
  rather than via the Internet. You may still use \verb|ip route add ... onlink|
  to override this behaviour.

\end{itemize}


\section{Tunnel setup: basics}

The standard Linux-2.2 kernel supports three flavors of tunnels,
listed in the following table:
\vspace{2mm}

\begin{tabular}{lll}
\vrule depth 0.8ex width 0pt\relax
Mode & Description & Base device \\
ipip & IP over IP & tunl0 \\
sit & IPv6 over IP & sit0 \\
gre & ANY over GRE over IP & gre0
\end{tabular}

\vspace{2mm}

\noindent All kinds of tunnels are created with one command:
\begin{verbatim}
    ip tunnel add <NAME> mode <MODE> [ local <S> ] [ remote <D> ]
\end{verbatim}

This command creates a new tunnel device named \verb|<NAME>|.
\verb|<NAME>| is an arbitrary string; in particular,
it may even be \verb|eth0|. The remaining parameters set
various tunnel characteristics.

\begin{itemize}

\item
\verb|mode <MODE>| sets the tunnel mode.
 Three modes are currently available:
 \verb|ipip|, \verb|sit| and \verb|gre|.

\item
\verb|remote <D>| sets the remote endpoint of the tunnel to IP
 address \verb|<D>|.
\item
\verb|local <S>| sets a fixed local address for tunneled
 packets. It must be an address on another interface of this host.

\end{itemize}

\let\thefootnote\oldthefootnote

Both \verb|remote| and \verb|local| may be omitted. In this case we
say that they are zero or wildcard. Two tunnels of the same mode cannot
have the same \verb|remote| and \verb|local|. In particular, it means
that a base device or fallback tunnel cannot be replicated.\footnote{
This restriction is relaxed for keyed GRE tunnels.}

Tunnels are divided into two classes: {\bf point-to-point} tunnels, which
have a non-wildcard \verb|remote| address and deliver all packets
to this destination, and {\bf NBMA} (i.e.\ Non-Broadcast Multi-Access) tunnels,
which have no \verb|remote|. In particular, base devices (e.g.\ \verb|tunl0|)
are NBMA, because they have neither \verb|remote| nor
\verb|local| addresses.


After a tunnel device is created, you should configure it as you would
any other device. Certainly, the configuration of tunnels has
some peculiarities related to the fact that they work over the existing Internet
routing infrastructure and simultaneously create new virtual links,
which change this infrastructure. The danger that an insufficiently careful
tunnel setup will result in the formation of tunnel loops,
a collapse of routing, or flooding of the network with an exponentially
growing number of tunneled fragments is very real.


Protocol setup on point-to-point tunnels does not differ from the configuration
of other devices. You should set a protocol address with \verb|ifconfig|
and add routes with the \verb|route| utility.

NBMA tunnels are different. To route something via an NBMA tunnel
you have to tell the driver where it should deliver packets.
The only way to do this is to create special routes with a gateway
address pointing to the desired endpoint. For example:
\begin{verbatim}
    ip route add 10.0.0.0/24 via <A> dev tunl0 onlink
\end{verbatim}
It is important to use the option \verb|onlink|; otherwise the
kernel will refuse the request to create a route via a gateway that is
not directly
reachable over device \verb|tunl0|. With IPv6 the situation is much simpler:
when you start device \verb|sit0|, it automatically configures itself
with all IPv4 addresses mapped to IPv6 space, so that the whole IPv4
Internet is {\em really reachable} via \verb|sit0|! Excellent, the command
\begin{verbatim}
    ip route add 3FFE::/16 via ::193.233.7.65 dev sit0
\end{verbatim}
will route \verb|3FFE::/16| via \verb|sit0|, sending all packets
destined for this prefix to 193.233.7.65.

\section{Tunnel setup: options}

The command \verb|ip tunnel add| has several additional options.
\begin{itemize}

\item \verb|ttl N| --- set fixed TTL \verb|N| on tunneled packets.
    \verb|N| is a number in the range 1--255. 0 is a special value
    meaning that packets inherit the TTL value.
    The default value is \verb|inherit|.

\item \verb|tos T| --- set fixed tos \verb|T| on tunneled packets.
    The default value is \verb|inherit|.

\item \verb|dev DEV| --- bind the tunnel to device \verb|DEV|, so that
    tunneled packets will be routed only via this device and will
    not be able to escape to another device when the route to the
    endpoint changes.

\item \verb|nopmtudisc| --- disable Path MTU Discovery on this tunnel.
    It is enabled by default. Note that a fixed ttl is incompatible
    with this option: tunnels with a fixed ttl always perform pmtu discovery.

\end{itemize}

\verb|ipip| and \verb|sit| tunnels have no more options. \verb|gre|
tunnels are more complicated:

\begin{itemize}

\item \verb|key K| --- use keyed GRE with key \verb|K|.
    \verb|K| is
    either a number or an IP address-like dotted quad.

\item \verb|csum| --- checksum tunneled packets.

\item \verb|seq| --- serialize packets.
\begin{NB}
    I think this option does not
    work. At least, I did not test it, did not debug it and
    do not even understand how it is supposed to work or for what
    purpose Cisco planned to use it.
\end{NB}

\end{itemize}


Actually, these GRE options can be set separately for the input and
output directions by prefixing the corresponding keywords with the letter
\verb|i| or \verb|o|. For example, \verb|icsum| orders the kernel to accept only
packets with a correct checksum, and \verb|ocsum| means that
our host will calculate and send the checksum.

The command \verb|ip tunnel add| is not the only operation
that can be performed on tunnels. Certainly, you may get a short help page
with:
\begin{verbatim}
    ip tunnel help
\end{verbatim}

Besides that, you may view the list of installed tunnels with the command:
\begin{verbatim}
    ip tunnel ls
\end{verbatim}
You may also look at statistics:
\begin{verbatim}
    ip -s tunnel ls Cisco
\end{verbatim}
where \verb|Cisco| is the name of a tunnel device. The command
\begin{verbatim}
    ip tunnel del Cisco
\end{verbatim}
destroys tunnel \verb|Cisco|. And, finally,
\begin{verbatim}
    ip tunnel change Cisco mode sit local ME remote HE ttl 32
\end{verbatim}
changes its parameters.

\section{Differences between 2.2 and 2.0 tunnels revisited.}

Now we can discuss more subtle differences between tunneling in 2.0
and 2.2.

\begin{itemize}

\item In 2.0 all tunneled packets were received promiscuously
as soon as you loaded module \verb|ipip|. 2.2 tries to select the best
tunnel device, and the packet looks as if it were received on that device.
For example, if the host
received an \verb|ipip| packet from host \verb|D| destined for our
local address \verb|S|, the kernel searches for matching tunnels
in this order:

\begin{tabular}{ll}
1 & \verb|remote| is \verb|D| and \verb|local| is \verb|S| \\
2 & \verb|remote| is \verb|D| and \verb|local| is wildcard \\
3 & \verb|remote| is wildcard and \verb|local| is \verb|S| \\
4 & \verb|tunl0|
\end{tabular}

If a tunnel exists but is not in the \verb|UP| state, it is ignored.
Note that if \verb|tunl0| is \verb|UP|, it receives all IPIP packets
not claimed by more specific tunnels.
Be careful: it means that without carefully installed firewall rules
anyone on the Internet may inject into your network any packets with
source addresses indistinguishable from local ones. It is not a bad idea
to design tunnels in a way that enforces maximal route symmetry
and to enable the reverse path filter (\verb|rp_filter| sysctl option) on
tunnel devices.

\item In 2.2 you can monitor and debug tunnels with \verb|tcpdump|.
For example, \verb|tcpdump| \verb|-i Cisco| \verb|-nvv| will dump the packets
that the kernel sends via tunnel \verb|Cisco| and the packets received on it,
as seen from the kernel's viewpoint.

\end{itemize}


\section{Linux and Cisco IOS tunnels.}

Among other tunnels, Cisco IOS supports IPIP and GRE.
Essentially, the Cisco setup is a subset of the options available for Linux.
Let us consider the simplest example:

\begin{verbatim}
interface Tunnel0
 tunnel mode gre ip
 tunnel source 10.10.14.1
 tunnel destination 10.10.13.2
\end{verbatim}


This command set translates to:

\begin{verbatim}
    ip tunnel add Tunnel0 \
        mode gre \
        local 10.10.14.1 \
        remote 10.10.13.2
\end{verbatim}

Any questions? No questions.

\section{Interaction of IPIP tunnels and DVMRP.}

DVMRP exploits IPIP tunnels to route multicasts via the Internet.
\verb|mrouted| creates
IPIP tunnels listed in its configuration file automatically.
From the kernel and user viewpoints there are no differences between
tunnels created in this way and tunnels created by \verb|ip tunnel|.
I.e.\ if \verb|mrouted| has created some tunnel, it may be used to
route unicast packets, provided appropriate routes are added.
And vice versa, if the administrator has already created a tunnel,
it will be reused by \verb|mrouted| if it requests a DVMRP
tunnel with the same local and remote addresses.

Do not be surprised if your manually configured tunnel is
destroyed when \verb|mrouted| exits.


\section{Broadcast GRE ``tunnels''.}

It is possible to set \verb|remote| for a GRE tunnel to a multicast
address. Such a tunnel becomes a {\bf broadcast} tunnel (though the word
``tunnel'' is not quite appropriate in this case; it is rather a
virtual network).
\begin{verbatim}
    ip tunnel add Universe mode gre local 193.233.7.65 \
                           remote 224.66.66.66 ttl 16
    ip addr add 10.0.0.1/16 dev Universe
    ip link set Universe up
\end{verbatim}
This tunnel is a true broadcast network, and broadcast packets are
sent to multicast group 224.66.66.66. By default such a tunnel starts
to resolve both IP and IPv6 addresses via ARP/NDISC, so that
if multicast routing is supported in the surrounding network, all GRE nodes
will find one another automatically and will form a virtual Ethernet-like
broadcast network. If multicast routing does not work, it is an unpleasant
but not fatal flaw: the tunnel becomes an NBMA network rather than
a broadcast one.
You may disable dynamic ARPing with:
\begin{verbatim}
    echo 0 > /proc/sys/net/ipv4/neigh/Universe/mcast_solicit
\end{verbatim}
and add the required information to the ARP tables manually:
\begin{verbatim}
    ip neigh add 10.0.0.2 lladdr 128.6.190.2 dev Universe nud permanent
\end{verbatim}
In this case packets sent to 10.0.0.2 will be encapsulated in GRE
and sent to 128.6.190.2.
It is possible to facilitate address resolution
using methods typical for other NBMA networks, e.g.\ to start a user-level
\verb|arpd| daemon, which will maintain a database of hosts attached
to the GRE virtual network or ask a
dedicated ARP or NHRP server for information.


Actually, such a setup is the most natural for tunneling:
it is really flexible, scalable and easily manageable, so
it is strongly recommended for use with GRE tunnels instead of the ugly
hack with NBMA mode and the \verb|onlink| modifier. Unfortunately,
for historical reasons broadcast mode is not supported by IPIP tunnels,
but this will probably change in the future.



\section{Traffic control issues.}

Tunnels are devices, hence all the power of Linux traffic control
applies to them. The simplest (and the most useful in practice)
example is limiting tunnel bandwidth. The following command:
\begin{verbatim}
    tc qdisc add dev tunl0 root tbf \
        rate 128Kbit burst 4K limit 10K
\end{verbatim}
will limit tunneled traffic to 128 Kbit/s with a maximal burst size of 4 Kbytes
and no more than 10 Kbytes queued.

However, you should remember that tunnels are {\em virtual} devices
implemented in software, and true queue management is impossible for them
simply because they have no queues. Instead, it is better to create classes
on real physical interfaces and to map tunneled packets to them.
In the general case of dynamic routing you should create such classes
on all outgoing interfaces or, alternatively,
use the option \verb|dev DEV| to bind the tunnel to a fixed physical device.
In the latter case packets will be routed only via the specified device
and you need to set up the corresponding classes only on it.
You have to pay for this convenience, though:
if routing changes, your tunnel will fail.
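
A minimal sketch of such a binding, assuming that the tunnel is called
\verb|Cisco| and the physical device is \verb|eth0| (both names, like the
placeholders \verb|<S>| and \verb|<D>| for the endpoint addresses, are
chosen here only for illustration):
\begin{verbatim}
    ip tunnel add Cisco mode ipip local <S> remote <D> dev eth0
\end{verbatim}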

Suppose that CBQ class \verb|1:ABC| has been created on device \verb|eth0|
specifically for tunnel \verb|Cisco| with endpoints \verb|S| and \verb|D|.
Now you can select IPIP packets with addresses \verb|S| and \verb|D|
with some classifier and map them to class \verb|1:ABC|. For example,
this is easy to do with the \verb|rsvp| classifier:
\begin{verbatim}
    tc filter add dev eth0 pref 100 proto ip rsvp \
        session D ipproto ipip filter S \
        classid 1:ABC
\end{verbatim}

If you want a more detailed classification of the sub-flows
transmitted via the tunnel, you can build a CBQ subtree
rooted at \verb|1:ABC| and attach to the subroot a set of rules that parse
IPIP packets more deeply.

\end{document}