\documentstyle[12pt,twoside]{article}
\def\TITLE{IPv6 Flow Labels}
\input preamble
\begin{center}
\Large\bf IPv6 Flow Labels in Linux-2.2.
\end{center}


\begin{center}
{ \large Alexey~N.~Kuznetsov } \\
\em Institute for Nuclear Research, Moscow \\
\verb|kuznet@ms2.inr.ac.ru| \\
\rm April 11, 1999
\end{center}

\vspace{5mm}

\tableofcontents

\section{Introduction.}

Every IPv6 packet carries 28 bits of flow information. RFC2460 splits
these bits into two fields: 8 bits of traffic class (or DS field, if you
prefer this term) and 20 bits of flow label. Currently there exists
no well-defined API to manage IPv6 flow information. In this document
I describe an attempt to design such an API for the Linux-2.2 IPv6 stack.

\vskip 1mm

The API must solve the following tasks:

\begin{enumerate}

\item To allow the user to set traffic class bits.

\item To allow the user to read traffic class bits of received packets.
This feature is not as useful as the first one; however, it will be
necessary e.g.\ to implement ECN [RFC2481] for datagram oriented services
or to implement the receiver side of SRP or another end-to-end protocol
using traffic class bits.

\item To assign flow labels to packets sent by the user.

\item To get flow labels of received packets. I do not know
any applications of this feature, but it is possible that the receiver will
want to use flow labels to distinguish sub-flows.

\item To allocate flow labels in a way compliant with RFC2460. Namely:

\begin{itemize}
\item
Flow labels must be uniformly distributed (pseudo-)random numbers,
so that any subset of the 20 bits can be used as a hash key.

\item
Flows with coinciding source address and flow label must have identical
destination address and non-fragmentable extension headers (i.e.\
hop-by-hop options and all the headers up to and including the routing header,
if it is present.)

\begin{NB}
There is a hole in the specs: some hop-by-hop options can be
defined only on a per-packet basis (e.g.\ the jumbo payload option).
Essentially, it means that such options cannot be present in packets
with flow labels.
\end{NB}
\begin{NB}
NB notes here and below reflect only my personal opinion,
they should be read with a smile or should not be read at all :-).
\end{NB}


\item
Flow labels have a finite lifetime and the source is not allowed to reuse
a flow label for another flow until the maximal lifetime has expired,
so that intermediate nodes will be able to invalidate flow state before
the label is taken over by another flow.
Flow state, including lifetime, is propagated along the datagram path
by some application specific methods
(e.g.\ in RSVP PATH messages or in some hop-by-hop option).


\end{itemize}

\end{enumerate}

\section{Sending/receiving flow information.}

\paragraph{Discussion.}
\addcontentsline{toc}{subsection}{Discussion}
It was proposed (Where? I do not remember any explicit statement)
to solve the first four tasks using the
\verb|sin6_flowinfo| field added to \verb|struct| \verb|sockaddr_in6|
(see RFC2553).

\begin{NB}
 This method is difficult to consider as reasonable, because it
 puts additional overhead on all the services, although only a
 very small subset of them (none, to be more exact) really uses it.
 It contradicts both the IETF spirit and the letter. Before RFC2553
 one justification existed: IPv6 address alignment left a 4 byte
 hole in \verb|sockaddr_in6| in any case. Now it has no justification.
\end{NB}

We have two problems with this method.
The first one is common for all OSes:
if \verb|recvmsg()| initializes \verb|sin6_flowinfo| to the flow info
of the received packet, we lose one very important property of the BSD socket API,
namely, we are not allowed to use the received address for a reply directly
and have to mangle it, even if we are not interested in flowinfo subtleties.

\begin{NB}
 RFC2553 adds a new requirement: to clear \verb|sin6_flowinfo|.
 Certainly, it is not a solution but rather an attempt to force applications
 to do unnecessary work. Well, as usual, one mistake in design
 is followed by attempts to patch the hole and more mistakes...
\end{NB}

The other problem is Linux specific. Historically Linux IPv6 did not
initialize \verb|sin6_flowinfo| at all, so that, if the kernel does not
support flow labels, this field is not zero, but a random number.
Some applications also did not take care of it.

\begin{NB}
Following RFC2553 such applications can be considered broken,
but I still think that they are right: clearing the whole address
before filling the known fields is a robust but stupid solution.
Uselessly wasting CPU cycles and
memory bandwidth is not a good idea. Such patches are acceptable
as temporary hacks, but not as the standard of the future.
\end{NB}


\paragraph{Implementation.}
\addcontentsline{toc}{subsection}{Implementation}
By default Linux IPv6 does not read the \verb|sin6_flowinfo| field,
assuming that common applications are not obliged to initialize it
and are permitted to consider it as pure alignment padding.
In order to tell the kernel that the application
is aware of this field, it is necessary to set the socket option
\verb|IPV6_FLOWINFO_SEND|.

\begin{verbatim}
  int on = 1;
  setsockopt(sock, SOL_IPV6, IPV6_FLOWINFO_SEND,
             (void*)&on, sizeof(on));
\end{verbatim}

The Linux kernel never fills the \verb|sin6_flowinfo| field when passing
a message to user space, though the kernels which support flow labels
initialize it to zero. If the user wants to get received flowinfo, he
sets the option \verb|IPV6_FLOWINFO| and after this he will receive
flowinfo as an ancillary data object of type \verb|IPV6_FLOWINFO|
(cf.\ RFC2292).

\begin{verbatim}
  int on = 1;
  setsockopt(sock, SOL_IPV6, IPV6_FLOWINFO, (void*)&on, sizeof(on));
\end{verbatim}

Flowinfo received and latched by a connected TCP socket may also be fetched
with \verb|getsockopt()| \verb|IPV6_PKTOPTIONS| together with
other optional information.

Besides that, in the spirit of RFC2292 the option \verb|IPV6_FLOWINFO|
may be used as an alternative way to send flowinfo with \verb|sendmsg()| or
to latch it with \verb|IPV6_PKTOPTIONS|.

\paragraph{Note about IPv6 options and destination address.}
\addcontentsline{toc}{subsection}{IPv6 options and destination address}
If \verb|sin6_flowinfo| contains a non-zero flow label,
the destination address in \verb|sin6_addr| and non-fragmentable
extension headers are ignored. Instead, the kernel uses the values
cached at flow setup (see below). However, for connected sockets
the kernel prefers the values set at connection time.

\paragraph{Example.}
\addcontentsline{toc}{subsection}{Example}
After setting the socket option \verb|IPV6_FLOWINFO|,
flowlabel and DS field are received as an ancillary data object
of type \verb|IPV6_FLOWINFO| and level \verb|SOL_IPV6|.
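
The 32-bit value delivered in this object can be split into its two parts
with a couple of masks. Below is a minimal sketch, assuming that the value
is in network byte order and is laid out as the first word of the IPv6 header
with the version bits cleared (8 bits of traffic class followed by 20 bits
of flow label). The helper names \verb|flowinfo_tclass| and
\verb|flowinfo_label| are hypothetical and are not part of any API.

\begin{verbatim}
#include <stdint.h>
#include <arpa/inet.h>   /* ntohl(), htonl() */

/* Assumed layout of flowinfo (network byte order):
 *   4 bits  version (cleared, not part of flow information)
 *   8 bits  traffic class (DS field)
 *   20 bits flow label
 */

/* Hypothetical helper: traffic class as a host integer. */
static uint8_t flowinfo_tclass(uint32_t flowinfo)
{
        return (uint8_t)((ntohl(flowinfo) >> 20) & 0xFF);
}

/* Hypothetical helper: 20-bit flow label as a host integer. */
static uint32_t flowinfo_label(uint32_t flowinfo)
{
        return ntohl(flowinfo) & 0xFFFFF;
}
\end{verbatim}

For instance, for the value \verb|htonl(0x0B8A1234)| the first helper
returns \verb|0xB8| and the second one returns \verb|0xA1234|.
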
In the cases when it is convenient to use \verb|recvfrom(2)|,
it is possible to replace the library variant with your own one,
sort of:

\begin{verbatim}
#include <sys/socket.h>
#include <netinet/in6.h>

/* Returns ssize_t: recvmsg() reports errors as -1 */
ssize_t recvfrom(int fd, void *buf, size_t len, int flags,
                 struct sockaddr *addr, socklen_t *addrlen)
{
  ssize_t cc;
  char cbuf[128];
  struct cmsghdr *c;
  struct iovec iov = { buf, len };
  struct msghdr msg = { addr, *addrlen,
                        &iov,  1,
                        cbuf, sizeof(cbuf),
                        0 };

  cc = recvmsg(fd, &msg, flags);
  if (cc < 0)
    return cc;
  /* Default to zero; overwritten below if flowinfo was delivered */
  ((struct sockaddr_in6*)addr)->sin6_flowinfo = 0;
  *addrlen = msg.msg_namelen;
  for (c=CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
    if (c->cmsg_level != SOL_IPV6 ||
        c->cmsg_type != IPV6_FLOWINFO)
      continue;
    ((struct sockaddr_in6*)addr)->sin6_flowinfo = *(__u32*)CMSG_DATA(c);
  }
  return cc;
}
\end{verbatim}



\section{Flow label management.}

\paragraph{Discussion.}
\addcontentsline{toc}{subsection}{Discussion}
Requirements of RFC2460 are pretty tough. Particularly, lifetimes
that extend across a reboot require storing allocated labels in stable
storage, so that the full implementation necessarily includes a user space flow
label manager. There are at least three different approaches:

\begin{enumerate}
\item {\bf ``Cooperative''. } We could leave flow label allocation wholly
to user space. When a user needs a label he requests it from the manager directly.
The approach is valid, but as any ``cooperative'' approach it suffers from
security problems.

\begin{NB}
One idea is to disallow unprivileged users to allocate flow
labels, but instead to pass the socket to the manager via an \verb|SCM_RIGHTS|
control message, so that it will allocate the label and assign it to the socket
itself. Hmm... the idea is interesting.
\end{NB}

\item {\bf ``Indirect''.} The kernel redirects requests to a user level daemon
and does not install the label until the daemon has acknowledged the request.
The approach is the most promising; it is especially pleasant to recognize
the parallel with the IPsec API [RFC2367,Craig]. Actually, it may share the API with
IPsec.

\item {\bf ``Stupid''.} To allocate labels in kernel space. It is the simplest
method, but it suffers from two serious flaws: first,
we cannot lease labels with lifetimes that extend across a reboot; second,
it is sensitive to DoS attacks. The kernel has to remember all the obsolete
labels until their expiration and a malicious user may quickly exhaust the
flow label space.

\end{enumerate}

Certainly, I choose the most ``stupid'' method. It is the cheapest one
for the implementor (i.e.\ me), and taking into account that flow labels
still have no serious applications it is not useful to work on a more
advanced API, especially since eventually we
will get it for free together with IPsec.


\paragraph{Implementation.}
\addcontentsline{toc}{subsection}{Implementation}
The socket option \verb|IPV6_FLOWLABEL_MGR| allows one to
request the flow label manager to allocate a new flow label, to reuse
an already allocated one or to delete an old flow label.
Its argument is \verb|struct| \verb|in6_flowlabel_req|:

\begin{verbatim}
struct in6_flowlabel_req
{
        struct in6_addr flr_dst;
        __u32           flr_label;
        __u8            flr_action;
        __u8            flr_share;
        __u16           flr_flags;
        __u16           flr_expires;
        __u16           flr_linger;
        __u32           __flr_reserved;
        /* Options in format of IPV6_PKTOPTIONS */
};
\end{verbatim}

\begin{itemize}

\item \verb|dst| is the IPv6 destination address associated with the label.

\item \verb|label| is the flow label value in network byte order. If it is zero,
the kernel will allocate a new pseudo-random number.
Otherwise, the kernel will try
to lease the flow label requested by the user. In this case, it is the user's
task to provide the necessary flow label randomness.

\item \verb|action| is the requested operation. Currently, only three operations
are defined:

\begin{verbatim}
#define IPV6_FL_A_GET   0   /* Get flow label */
#define IPV6_FL_A_PUT   1   /* Release flow label */
#define IPV6_FL_A_RENEW 2   /* Update expire time */
\end{verbatim}

\item \verb|flags| are optional modifiers. Currently
only \verb|IPV6_FL_A_GET| has modifiers:

\begin{verbatim}
#define IPV6_FL_F_CREATE 1   /* Allowed to create new label */
#define IPV6_FL_F_EXCL   2   /* Do not create new label */
\end{verbatim}


\item \verb|share| defines who is allowed to reuse the same flow label.

\begin{verbatim}
#define IPV6_FL_S_NONE    0   /* Not defined */
#define IPV6_FL_S_EXCL    1   /* Label is private */
#define IPV6_FL_S_PROCESS 2   /* May be reused by this process */
#define IPV6_FL_S_USER    3   /* May be reused by this user */
#define IPV6_FL_S_ANY     255 /* Anyone may reuse it */
\end{verbatim}

\item \verb|linger| is a time in seconds. After the last user releases the flow
label, it will not be reused with a different destination and options for at least
this time. If \verb|share| is not \verb|IPV6_FL_S_EXCL| the label
can still be shared by other sockets. The current implementation does not allow
an unprivileged user to set linger longer than 60 sec.

\item \verb|expires| is a time in seconds. The flow label will be kept at least
for this time, but it will not be destroyed before the user has released it explicitly
or closed all the sockets using it. The current implementation does not allow
an unprivileged user to set a timeout longer than 60 sec. Privileged applications
MAY set longer lifetimes, but in this case they MUST save allocated
labels in stable storage and restore them after reboot before the first
application allocates a new flow.

\end{itemize}

This structure is followed by optional extension headers associated
with this flow label in the format of \verb|IPV6_PKTOPTIONS|. Only
\verb|IPV6_HOPOPTS|, \verb|IPV6_RTHDR| and, if \verb|IPV6_RTHDR| is present,
\verb|IPV6_DSTOPTS| are allowed.

\paragraph{Example.}
\addcontentsline{toc}{subsection}{Example}
 The function \verb|get_flow_label| allocates
a private flow label.

\begin{verbatim}
int get_flow_label(int fd, struct sockaddr_in6 *dst, __u32 fl)
{
        int on = 1;
        struct in6_flowlabel_req freq;

        /* Request a new private label for this destination */
        memset(&freq, 0, sizeof(freq));
        freq.flr_label = htonl(fl);
        freq.flr_action = IPV6_FL_A_GET;
        freq.flr_flags = IPV6_FL_F_CREATE | IPV6_FL_F_EXCL;
        freq.flr_share = IPV6_FL_S_EXCL;
        memcpy(&freq.flr_dst, &dst->sin6_addr, 16);
        if (setsockopt(fd, SOL_IPV6, IPV6_FLOWLABEL_MGR,
                       &freq, sizeof(freq)) == -1) {
                perror ("can't lease flowlabel");
                return -1;
        }
        dst->sin6_flowinfo |= freq.flr_label;

        /* Tell the kernel to honour sin6_flowinfo on send */
        if (setsockopt(fd, SOL_IPV6, IPV6_FLOWINFO_SEND,
                       &on, sizeof(on)) == -1) {
                perror ("can't send flowinfo");

                /* Release the label leased above */
                freq.flr_action = IPV6_FL_A_PUT;
                setsockopt(fd, SOL_IPV6, IPV6_FLOWLABEL_MGR,
                           &freq, sizeof(freq));
                return -1;
        }
        return 0;
}
\end{verbatim}

A slightly more complicated example using a routing header can be found
in the \verb|ping6| utility (\verb|iputils| package). The Linux rsvpd backend
contains an example of using the operation \verb|IPV6_FL_A_RENEW|.

\paragraph{Listing flow labels.}
\addcontentsline{toc}{subsection}{Listing flow labels}
The list of currently allocated
flow labels may be read from \verb|/proc/net/ip6_flowlabel|.

\begin{verbatim}
Label S Owner  Users  Linger Expires  Dst                              Opt
A1BE5 1 0      0      6      3        3ffe2400000000010a0020fffe71fb30 0
\end{verbatim}

\begin{itemize}
\item \verb|Label| is the hexadecimal flow label value.
\item \verb|S| is the sharing style.
\item \verb|Owner| is the ID of the creator; it is zero, pid or uid, depending on
      the sharing style.
\item \verb|Users| is the number of applications using the label now.
\item \verb|Linger| is the \verb|linger| of this label in seconds.
\item \verb|Expires| is the time until expiration of the label in seconds. It may
      be negative, if the label is in use.
\item \verb|Dst| is the IPv6 destination address.
\item \verb|Opt| is the length of the options associated with the label. Option
      data are not accessible.
\end{itemize}


\paragraph{Flow labels and RSVP.}
\addcontentsline{toc}{subsection}{Flow labels and RSVP}
The RSVP daemon supports IPv6 flow labels
without any modifications to the standard ISI RAPI. The sender must allocate
a flow label, fill in the corresponding sender template and submit it to the local rsvp
daemon. rsvpd will check the label and start to announce it in PATH
messages. Rsvpd on the sender node will renew the flow label, so that it will not
be reused before path state expires and all the intermediate
routers and the receiver purge flow state.

The \verb|rtap| utility is modified to parse flow labels. E.g.\ if the user allocated
flow label \verb|0xA1234|, he may write:

\begin{verbatim}
RTAP> sender 3ffe:2400::1/FL0xA1234 <Tspec>
\end{verbatim}

The receiver makes a reservation with the command:
\begin{verbatim}
RTAP> reserve ff 3ffe:2400::1/FL0xA1234 <Flowspec>
\end{verbatim}

\end{document}