History log of /net/dccp/minisocks.c
Revision Date Author Comments
9fe516ba3fb29b6f6a752ffd93342fdee500ec01 27-Jun-2014 Eric Dumazet <edumazet@google.com> inet: move ipv6only in sock_common

When an UDP application switches from AF_INET to AF_INET6 sockets, we
have a small performance degradation for IPv4 communications because of
extra cache line misses to access ipv6only information.

This can also be noticed for TCP listeners, as ipv6_only_sock() is also
used from __inet_lookup_listener()->compute_score()

This is magnified when SO_REUSEPORT is used.

Move ipv6only into struct sock_common so that it is available at
no extra cost in lookups.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
676d23690fb62b5d51ba5d659935e9f7d9da9f8e 11-Apr-2014 David S. Miller <davem@davemloft.net> net: Fix use after free by removing length arg from sk_data_ready callbacks.

Several spots in the kernel perform a sequence like:

skb_queue_tail(&sk->s_receive_queue, skb);
sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up. So this skb->len access is potentially
to freed up memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument. And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.

Signed-off-by: David S. Miller <davem@davemloft.net>
b44084c2c822f99dd3f2334b288b7e463d222662 10-Oct-2013 Eric Dumazet <edumazet@google.com> inet: rename ir_loc_port to ir_num

In commit 634fb979e8f ("inet: includes a sock_common in request_sock")
I forgot that the two ports in sock_common do not have same byte order :

skc_dport is __be16 (network order), but skc_num is __u16 (host order)

So sparse complains because ir_loc_port (mapped into skc_num) is
considered as __u16 while it should be __be16

Let rename ir_loc_port to ireq->ir_num (analogy with inet->inet_num),
and perform appropriate htons/ntohs conversions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
634fb979e8f3a70f04c1f2f519d0cd1142eb5c1a 10-Oct-2013 Eric Dumazet <edumazet@google.com> inet: includes a sock_common in request_sock

TCP listener refactoring, part 5 :

We want to be able to insert request sockets (SYN_RECV) into main
ehash table instead of the per listener hash table to allow RCU
lookups and remove listener lock contention.

This patch includes the needed struct sock_common in front
of struct request_sock

This means there is no more inet6_request_sock IPv6 specific
structure.

Following inet_request_sock fields were renamed as they became
macros to reference fields from struct sock_common.
Prefix ir_ was chosen to avoid name collisions.

loc_port -> ir_loc_port
loc_addr -> ir_loc_addr
rmt_addr -> ir_rmt_addr
rmt_port -> ir_rmt_port
iif -> ir_iif

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
efe4208f47f907b86f528788da711e8ab9dea44d 04-Oct-2013 Eric Dumazet <edumazet@google.com> ipv6: make lookups simpler and faster

TCP listener refactoring, part 4 :

To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common

Now is time to do the same for IPv6, because it permits us to have fast
lookups for all kind of sockets, including upcoming SYN_RECV.

Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).

inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6

This patch is way bigger than its IPv4 counter part, because for IPv4,
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
it's not doable easily.

inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr

And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.

We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e6c022a4fa2d2d9ca9d0a7ac3b05ad988f39fc30 28-Oct-2012 Eric Dumazet <edumazet@google.com> tcp: better retrans tracking for defer-accept

For passive TCP connections using TCP_DEFER_ACCEPT facility,
we incorrectly increment req->retrans each time timeout triggers
while no SYNACK is sent.

SYNACK are not sent for TCP_DEFER_ACCEPT that were established (for
which we received the ACK from client). Only the last SYNACK is sent
so that we can receive again an ACK from client, to move the req into
accept queue. We plan to change this later to avoid the useless
retransmit (and potential problem as this SYNACK could be lost)

TCP_INFO later gives wrong information to user, claiming imaginary
retransmits.

Decouple req->retrans field into two independent fields :

num_retrans : number of retransmit
num_timeout : number of timeouts

num_timeout is the counter that is incremented at each timeout,
regardless of actual SYNACK being sent or not, and used to
compute the exponential timeout.

Introduce inet_rtx_syn_ack() helper to increment num_retrans
only if ->rtx_syn_ack() succeeded.

Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans
when we re-send a SYNACK in answer to a (retransmitted) SYN.
Prior to this patch, we were not counting these retransmits.

Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS
only if a synack packet was successfully queued.

Reported-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Elliott Hughes <enh@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f541fb7e20c848f947ca65fbf169efe69400c942 27-Feb-2012 Samuel Jero <sj323707@ohio.edu> dccp: fix bug in sequence number validation during connection setup

This fixes a bug in the sequence number validation during the initial handshake.

The code did not treat the initial sequence numbers ISS and ISR as read-only and
did not keep state for GSR and GSS as required by the specification. This causes
problems with retransmissions during the initial handshake, causing the
budding connection to be reset.

This patch now treats ISS/ISR as read-only and tracks GSS/GSR as required.

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
dfd56b8b38fff3586f36232db58e1e9f7885a605 10-Dec-2011 Eric Dumazet <eric.dumazet@gmail.com> net: use IS_ENABLED(CONFIG_IPV6)

Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4e3fd7a06dc20b2d8ec6892233ad2012968fe7b6 21-Nov-2011 Alexey Dobriyan <adobriyan@gmail.com> net: remove ipv6_addr_copy()

C assignment can handle struct in6_addr copying.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e56c57d0d3fdbbdf583d3af96bfb803b8dfa713e 08-Nov-2011 Eric Dumazet <eric.dumazet@gmail.com> net: rename sk_clone to sk_clone_lock

Make clear that sk_clone() and inet_csk_clone() return a locked socket.

Add _lock() prefix and kerneldoc.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0b53d4604ac2b4f2faa9a62a04ea9b383ad2efe0 11-Oct-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: fix the adjustments to AWL and SWL

This fixes a problem and a potential loophole with regard to seqno/ackno
validity: currently the initial adjustments to AWL/SWL are only performed
once at the begin of the connection, during the handshake.

Since the Sequence Window feature is always greater than Wmin=32 (7.5.2),
it is however necessary to perform these adjustments at least for the first
W/W' (variables as per 7.5.1) packets in the lifetime of a connection.

This requirement is complicated by the fact that W/W' can change at any time
during the lifetime of a connection.

Therefore it is better to perform that safety check each time SWL/AWL are
updated, as implemented by the patch.

A second problem solved by this patch is that the remote/local Sequence Window
feature values (which set the bounds for AWL/SWL/SWH) are undefined until the
feature negotiation has completed.

During the initial handshake we have more stringent sequence number protection;
the changes added by this patch effect that {A,S}W{L,H} are within the correct
bounds at the instant that feature negotiation completes (since the SeqWin
feature activation handlers call dccp_update_gsr/gss()).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
5a0e3ad6af8660be21ca98a971cd00f331318c05 24-Mar-2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
a3a858ff18a72a8d388e31ab0d98f7e944841a62 04-Mar-2010 Zhu Yi <yi.zhu@intel.com> net: backlog functions rename

sk_add_backlog -> __sk_add_backlog
sk_add_backlog_limited -> sk_add_backlog

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e6b4d11367519bc71729c09d05a126b133c755be 02-Dec-2009 William Allen Simpson <william.allen.simpson@gmail.com> TCPCT part 1a: add request_values parameter for sending SYNACK

Add optional function parameters associated with sending SYNACK.
These parameters are not needed after sending SYNACK, and are not
used for retransmission. Avoids extending struct tcp_request_sock,
and avoids allocating kernel memory.

Also affects DCCP as it uses common struct request_sock_ops,
but this parameter is currently reserved for future use.

Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
792b48780e8b6435d017cef4b5c304876a48653e 17-Jan-2009 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Implement both feature-local and feature-remote Sequence Window feature

This adds full support for local/remote Sequence Window feature, from which the
* sequence-number-validity (W) and
* acknowledgment-number-validity (W') windows
derive as specified in RFC 4340, 7.5.3.

Specifically, the following is contained in this patch:
* integrated new socket fields into dccp_sk;
* updated the update_gsr/gss routines with regard to these fields;
* updated handler code: the Sequence Window feature is located at the TX side,
so the local feature is meant if the handler-rx flag is false;
* the initialisation of `rcv_wnd' in reqsk is removed, since
- rcv_wnd is not used by the code anywhere;
- sequence number checks are not done in the LISTEN state (cf. 7.5.3);
- dccp_check_req checks the Ack number validity more rigorously;
* the `struct dccp_minisock' became empty and is now removed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
6fdd34d43bff8be9bb925b49d87a0ee144d2ab07 08-Dec-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Phase out the use of boolean Ack Vector sysctl

This removes the use of the sysctl and the minisock variable for the Send Ack
Vector feature, as it now is handled fully dynamically via feature negotiation
(i.e. when CCID-2 is enabled, Ack Vectors are automatically enabled as per
RFC 4341, 4.).

Using a sysctl in parallel to this implementation would open the door to
crashes, since much of the code relies on tests of the boolean minisock /
sysctl variable. Thus, this patch replaces all tests of type

if (dccp_msk(sk)->dccpms_send_ack_vector)
/* ... */
with
if (dp->dccps_hc_rx_ackvec != NULL)
/* ... */

The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
negotiation concluded that Ack Vectors are to be used on the half-connection.
Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
so that the test is a valid one.

The activation handler for Ack Vectors is called as soon as the feature
negotiation has concluded at the
* server when the Ack marking the transition RESPOND => OPEN arrives;
* client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.

Adding the sequence number of the Response packet to the Ack Vector has been
removed, since
(a) connection establishment implies that the Response has been received;
(b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
this entry will always be ignored;
(c) it can not be used for anything useful - to detect loss for instance, only
packets received after the loss can serve as pseudo-dupacks.

There was a FIXME to change the error code when dccp_ackvec_add() fails.
I removed this after finding out that:
* the check whether ackno < ISN is already made earlier,
* this Response is likely the 1st packet with an Ackno that the client gets,
* so when dccp_ackvec_add() fails, the reason is likely not a packet error.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
4098dce5be537a157eed4a326efd464109825b8b 08-Dec-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Remove manual influence on NDP Count feature

Updating the NDP count feature is handled automatically now:
* for CCID-2 it is disabled, since the code does not use NDP counts;
* for CCID-3 it is enabled, as NDP counts are used to determine loss lengths.

Allowing the user to change NDP values leads to unpredictable and failing
behaviour, since it is then possible to disable NDP counts even when they
are needed (e.g. in CCID-3).

This means that only those user settings are sensible that agree with the
values for Send NDP Count implied by the choice of CCID. But those settings
are already activated by the feature negotiation (CCID dependency tracking),
hence this form of support is redundant.

At startup the initialisation of the NDP count feature uses the default
value of 0, which is done implicitly by the zeroing-out of the socket when
it is allocated. If the choice of CCID or feature negotiation enables NDP
count, this will then be updated via the NDP activation handler.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
0049bab5e765aa74cf767a834fa336e19453fc5e 08-Dec-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Remove obsolete parts of the old CCID interface

The TX/RX CCIDs of the minisock are now redundant: similar to the Ack Vector
case, their value equals initially that of the sysctl, but at the end of
feature negotiation may be something different.

The old interface removed by this patch thus has been replaced by the newer
interface to dynamically query the currently loaded CCIDs.

Also removed are the constructors for the TX CCID and the RX CCID, since the
switch "rx <-> non-rx" is done by the handler in minisocks.c (and the handler
is the only place in the code where CCIDs are loaded).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
192b27ff35bad4cf76cc4239419e9f805935e4f8 08-Dec-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Integration of dynamic feature activation - part 2 (server side)

This patch integrates the activation of features at the end of negotiation
into the server-side code.

Note regarding the removal of 'const':
--------------------------------------
The 'const' attribute has been removed from 'dreq' since dccp_activate_values()
needs to operate on dreq's feature list. Part of the activation is to remove
those options from the list that have already been confirmed, hence it is not
purely read-only.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
dd9c0e363cef32b7d6f23d4c87e8dfe4f91fd1c5 17-Nov-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Deprecate Ack Ratio sysctl

This patch deprecates the Ack Ratio sysctl, since
* Ack Ratio is entirely ignored by CCID-3 and CCID-4,
* Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1);
* even if it would work in CCID-2, there is no point for a user to change it:
- Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2),
- if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts
(since waiting for Acks which will never arrive in this window),
- cwnd is not a user-configurable value.

The only reasonable place for Ack Ratio is to print it for debugging. It is
planned to do this later on, as part of e.g. dccp_probe.

With this patch Ack Ratio is now under full control of feature negotiation:
* Ack Ratio is resolved as a dependency of the selected CCID;
* if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to
the default of 2, following RFC 4340, 11.3 - "New connections start with Ack
Ratio 2 for both endpoints";
* what happens then is part of another patch set, since it concerns the
dynamic update of Ack Ratio while the connection is in full flight.

Thanks to Tomasz Grobelny for discussion leading up to this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ac75773c2742d82cbcb078708df406e9017224b7 05-Nov-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Per-socket initialisation of feature negotiation

This provides feature-negotiation initialisation for both DCCP sockets
and DCCP request_sockets, to support feature negotiation during
connection setup.

It also resolves a FIXME regarding the congestion control
initialisation.

Thanks to Wei Yongjun for help with the IPv6 side of this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
944f750227fa0beb2b440709687415621e2533a4 20-Oct-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Port redirection support for DCCP

Commit a3116ac5c216fc3c145906a46df9ce542ff7dcf2 from 1st October ("tcp: Port
redirection support for TCP") broke DCCP skb lookup by changing inet_csk_clone,
which is used by DCCP to generate the child socket after the handshake.

This patch updates DCCP to use 'loc_port' instead of 'sport', which fixes the
problem, and thus inheriting port redirection support via the new interface.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
410e27a49bb98bc7fa3ff5fc05cc313817b9f253 09-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> This reverts "Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/dccp_exp"
as it accentally contained the wrong set of patches. These will be
submitted separately.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
bfbddd085a5bced6efb9e1bc4d029438f9639784 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Fix the adjustments to AWL and SWL

This fixes a problem and a potential loophole with regard to seqno/ackno
validity: the problem is that the initial adjustments to AWL/SWL were
only performed at the begin of the connection, during the handshake.

Since the Sequence Window feature is always greater than Wmin=32 (7.5.2),
it is however necessary to perform these adjustments at least for the first
W/W' (variables as per 7.5.1) packets in the lifetime of a connection.

This requirement is complicated by the fact that W/W' can change at any time
during the lifetime of a connection.

Therefore the consequence is to perform this safety check each time SWL/AWL
are updated.

A second problem solved by this patch is that the remote/local Sequence Window
feature values (which set the bounds for AWL/SWL/SWH) are undefined until the
feature negotiation has completed.

During the initial handshake we have more stringent sequence number protection,
the changes added by this patch effect that {A,S}W{L,H} are within the correct
bounds at the instant that feature negotiation completes (since the SeqWin
feature activation handlers call dccp_update_gsr/gss()).

A detailed rationale is below -- can be removed from the commit message.


1. Server sequence number checks during initial handshake
---------------------------------------------------------
The server can not use the fields of the listening socket for seqno/ackno checks
and thus needs to store all relevant information on a per-connection basis on
the dccp_request socket. This is a size-constrained structure and has currently
only ISS (dreq_iss) and ISR (dreq_isr) defined.
Adding further fields (SW{L,H}, AW{L,H}) would increase the size of the struct
and it is questionable whether this will have any practical gain. The currently
implemented solution is as follows.
* receiving first Request: dccp_v{4,6}_conn_request sets
ISR := P.seqno, ISS := dccp_v{4,6}_init_sequence()

* sending first Response: dccp_v{4,6}_send_response via dccp_make_response()
sets P.seqno := ISS, sets P.ackno := ISR

* receiving retransmitted Request: dccp_check_req() overrides ISR := P.seqno

* answering retransmitted Request: dccp_make_response() sets ISS += 1,
otherwise as per first Response

* completing the handshake: succeeds in dccp_check_req() for the first Ack
where P.ackno == ISS (P.seqno is not tested)

* creating child socket: ISS, ISR are copied from the request_sock

This solution will succeed whenever the server can receive the Request and the
subsequent Ack in succession, without retransmissions. If there is packet loss,
the client needs to retransmit until this condition succeeds; it will otherwise
eventually give up. Adding further fields to the request_sock could increase
the robustness a bit, in that it would make possible to let a reordered Ack
(from a retransmitted Response) pass. The argument against such a solution is
that if the packet loss is not persistent and an Ack gets through, why not
wait for the one answering the original response: if the loss is persistent, it
is probably better to not start the connection in the first place.

Long story short: the present design (by Arnaldo) is simple and will likely work
just as well as a more complicated solution. As a consequence, {A,S}W{L,H} are
not needed until the moment the request_sock is cloned into the accept queue.

At that stage feature negotiation has completed, so that the values for the local
and remote Sequence Window feature (7.5.2) are known, i.e. we are now in a better
position to compute {A,S}W{L,H}.


2. Client sequence number checks during initial handshake
---------------------------------------------------------
Until entering PARTOPEN the client does not need the adjustments, since it
constrains the Ack window to the packet it sent.

* sending first Request: dccp_v{4,6}_connect() choose ISS,
dccp_connect() then sets GAR := ISS (as per 8.5),
dccp_transmit_skb() (with the previous bug fix) sets
GSS := ISS, AWL := ISS, AWH := GSS
* n-th retransmitted Request (with previous patch):
dccp_retransmit_skb() via timer calls
dccp_transmit_skb(), which sets GSS := ISS+n
and then AWL := ISS, AWH := ISS+n

* receiving any Response: dccp_rcv_request_sent_state_process()
-- accepts packet if AWL <= P.ackno <= AWH;
-- sets GSR = ISR = P.seqno

* sending the Ack completing the handshake: dccp_send_ack() calls
dccp_transmit_skb(), which sets GSS += 1
and AWL := ISS, AWH := GSS


Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
51c7d4fa2675c106a980ddcdbe308b54b5151945 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Implement both feature-local and feature-remote Sequence Window feature

This adds full support for local/remote Sequence Window feature, from which the
* sequence-number-validity (W) and
* acknowledgment-number-validity (W') windows
derive as specified in RFC 4340, 7.5.3.

Specifically, the following changes are introduced:
* integrated new socket fields into dccp_sk;
* updated the update_gsr/gss routines with regard to these fields;
* updated handler code: the Sequence Window feature is located at the TX side,
so the local feature is meant if the handler-rx flag is false;
* the initialisation of `rcv_wnd' in reqsk is removed, since
- rcv_wnd is not used by the code anywhere;
- sequence number checks are not done in the LISTEN state (cf. 7.5.3);
- dccp_check_req checks the Ack number validity more rigorously;
* the `struct dccp_minisock' became empty and is now removed.

Until the handshake completes with activating negotiated values, the local/remote
Sequence-Window values are undefined and thus can not reliably be estimated.
This issue is addressed in a separate patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
b235dc4abbc1356284bd0dc730efa711f394e0e2 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Phase out the use of boolean Ack Vector sysctl

This removes the use of the sysctl and the minisock variable for the Send Ack
Vector feature, which is now handled fully dynamically via feature negotiation;
i.e. when CCID2 is enabled, Ack Vectors are automatically enabled (as per
RFC 4341, 4.).

Using a sysctl in parallel to this implementation would open the door to
crashes, since much of the code relies on tests of the boolean minisock /
sysctl variable. Thus, this patch replaces all tests of type

if (dccp_msk(sk)->dccpms_send_ack_vector)
/* ... */
with
if (dp->dccps_hc_rx_ackvec != NULL)
/* ... */

The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
negotiation concluded that Ack Vectors are to be used on the half-connection.
Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
so that the test is a valid one.

The activation handler for Ack Vectors is called as soon as the feature
negotiation has concluded at the
* server when the Ack marking the transition RESPOND => OPEN arrives;
* client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.

Adding the sequence number of the Response packet to the Ack Vector has been
removed, since
(a) connection establishment implies that the Response has been received;
(b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
this entry will always be ignored;
(c) it can not be used for anything useful - to detect loss for instance, only
packets received after the loss can serve as pseudo-dupacks.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
68e074bfcef269bc61006c2740d7f89ccbbd93d7 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Remove manual influence on NDP Count feature

Updating the NDP count feature is handled automatically now:
* for CCID-2 it is disabled, since the code does not use NDP counts;
* for CCID-3 it is enabled, as NDP counts are used to determine loss lengths.

Allowing the user to change NDP values leads to unpredictable and failing
behaviour, since it is then possible to disable NDP counts even when they
are needed (e.g. in CCID-3).

This means that only those user settings are sensible that agree with the
values for Send NDP Count implied by the choice of CCID. But those settings
are already activated by the feature negotiation (CCID dependency tracking),
hence this form of support is redundant.

At startup the initialisation of the NDP count feature is with the default
value of 0, which is done implicitly by the zeroing-out of the socket when
it is allocated. If the choice of CCID or feature negotiation enables NDP
count, this will then be updated via the NDP activation handler.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
78673e24df27c76ec75565f4024d45c2c74ef148 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Remove obsolete parts of the old CCID interface

The TX/RX CCIDs of the minisock are now redundant: similar to the Ack Vector
case, their value equals initially that of the sysctl, but at the end of
feature negotiation may be something different.

The old interface removed by this patch thus has been replaced by the newer
interface to dynamically query the currently loaded CCIDs earlier in this
patch set.

Also removed the constructors for the TX CCID and the RX CCID, since the
switch rx/non-rx is done by the handler in minisocks.c (and the handler is
the only place in the code where CCIDs are loaded).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
e70cacb90d76f0632f7bba69c87a62e709e84619 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Integration of dynamic feature activation - part 2 (server side)

This patch integrates the activation of features at the end of negotiation
into the server-side code.

Note:
In dccp_create_openreq_child the request_sock argument is no longer constant,
since dccp_activate_values() uses the feature-negotiation list on dreq to sort
out the initialisation values for the different features of the child socket;
and purges this queue after use (but the `req' argument to openreq_child
can and does still remain constant).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
17c30b40ed79e9f3955e884632c8f01e577b204a 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Deprecate Ack Ratio sysctl

This patch deprecates the Ack Ratio sysctl, since
* Ack Ratio is entirely ignored by CCID-3 and CCID-4,
* Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1);
* even if it would work in CCID-2, there is no point for a user to change it:
- Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2),
- if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts
(since waiting for Acks which will never arrive in this window),
- cwnd is not a user-configurable value.

The only reasonable place for Ack Ratio is to print it for debugging. It is
planned to do this later on, as part of e.g. dccp_probe.

With this patch Ack Ratio is now under full control of feature negotiation:
* Ack Ratio is resolved as a dependency of the selected CCID;
* if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to
the default of 2, following RFC 4340, 11.3 - "New connections start with Ack
Ratio 2 for both endpoints";
* what happens then is part of another patch set, since it concerns the
dynamic update of Ack Ratio while the connection is in full flight.

Thanks to Tomasz Grobelny for discussion leading up to this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
828755cee087e4a34f45d6c9db661ccd0631cc6d 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Per-socket initialisation of feature negotiation

This provides feature-negotiation initialisation for both DCCP sockets and
DCCP request_sockets, to support feature negotiation during connection setup.

It also resolves a FIXME regarding the congestion control initialisation.

Thanks to Wei Yongjun for help with the IPv6 side of this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
6edafaaf6f5e70ef1e620ff01bd6bacebe1e0718 07-Aug-2008 Gui Jianfeng <guijianfeng@cn.fujitsu.com> tcp: Fix kernel panic when calling tcp_v(4/6)_md5_do_lookup

If the following packet flow happen, kernel will panic.
MathineA MathineB
SYN
---------------------->
SYN+ACK
<----------------------
ACK(bad seq)
---------------------->
When a bad seq ACK is received, tcp_v4_md5_do_lookup(skb->sk, ip_hdr(skb)->daddr))
is finally called by tcp_v4_reqsk_send_ack(), but the first parameter(skb->sk) is
NULL at that moment, so kernel panic happens.
This patch fixes this bug.

OOPS output is as following:
[ 302.812793] IP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42
[ 302.817075] Oops: 0000 [#1] SMP
[ 302.819815] Modules linked in: ipv6 loop dm_multipath rtc_cmos rtc_core rtc_lib pcspkr pcnet32 mii i2c_piix4 parport_pc i2c_core parport ac button ata_piix libata dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
[ 302.849946]
[ 302.851198] Pid: 0, comm: swapper Not tainted (2.6.27-rc1-guijf #5)
[ 302.855184] EIP: 0060:[<c05cfaa6>] EFLAGS: 00010296 CPU: 0
[ 302.858296] EIP is at tcp_v4_md5_do_lookup+0x12/0x42
[ 302.861027] EAX: 0000001e EBX: 00000000 ECX: 00000046 EDX: 00000046
[ 302.864867] ESI: ceb69e00 EDI: 1467a8c0 EBP: cf75f180 ESP: c0792e54
[ 302.868333] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 302.871287] Process swapper (pid: 0, ti=c0792000 task=c0712340 task.ti=c0746000)
[ 302.875592] Stack: c06f413a 00000000 cf75f180 ceb69e00 00000000 c05d0d86 000016d0 ceac5400
[ 302.883275] c05d28f8 000016d0 ceb69e00 ceb69e20 681bf6e3 00001000 00000000 0a67a8c0
[ 302.890971] ceac5400 c04250a3 c06f413a c0792eb0 c0792edc cf59a620 cf59a620 cf59a634
[ 302.900140] Call Trace:
[ 302.902392] [<c05d0d86>] tcp_v4_reqsk_send_ack+0x17/0x35
[ 302.907060] [<c05d28f8>] tcp_check_req+0x156/0x372
[ 302.910082] [<c04250a3>] printk+0x14/0x18
[ 302.912868] [<c05d0aa1>] tcp_v4_do_rcv+0x1d3/0x2bf
[ 302.917423] [<c05d26be>] tcp_v4_rcv+0x563/0x5b9
[ 302.920453] [<c05bb20f>] ip_local_deliver_finish+0xe8/0x183
[ 302.923865] [<c05bb10a>] ip_rcv_finish+0x286/0x2a3
[ 302.928569] [<c059e438>] dev_alloc_skb+0x11/0x25
[ 302.931563] [<c05a211f>] netif_receive_skb+0x2d6/0x33a
[ 302.934914] [<d0917941>] pcnet32_poll+0x333/0x680 [pcnet32]
[ 302.938735] [<c05a3b48>] net_rx_action+0x5c/0xfe
[ 302.941792] [<c042856b>] __do_softirq+0x5d/0xc1
[ 302.944788] [<c042850e>] __do_softirq+0x0/0xc1
[ 302.948999] [<c040564b>] do_softirq+0x55/0x88
[ 302.951870] [<c04501b1>] handle_fasteoi_irq+0x0/0xa4
[ 302.954986] [<c04284da>] irq_exit+0x35/0x69
[ 302.959081] [<c0405717>] do_IRQ+0x99/0xae
[ 302.961896] [<c040422b>] common_interrupt+0x23/0x28
[ 302.966279] [<c040819d>] default_idle+0x2a/0x3d
[ 302.969212] [<c0402552>] cpu_idle+0xb2/0xd2
[ 302.972169] =======================
[ 302.974274] Code: fc ff 84 d2 0f 84 df fd ff ff e9 34 fe ff ff 83 c4 0c 5b 5e 5f 5d c3 90 90 57 89 d7 56 53 89 c3 50 68 3a 41 6f c0 e8 e9 55 e5 ff <8b> 93 9c 04 00 00 58 85 d2 59 74 1e 8b 72 10 31 db 31 c9 85 f6
[ 303.011610] EIP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42 SS:ESP 0068:c0792e54
[ 303.018360] Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
be4c798a41bf626cdaacf96c382f116ed2f7dbe9 11-Jun-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Bug in initial acknowledgment number assignment

Step 8.5 in RFC 4340 says for the newly cloned socket

Initialize S.GAR := S.ISS,

but what in fact the code (minisocks.c) does is

Initialize S.GAR := S.ISR,

which is wrong (typo?) -- fixed by the patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
fd80eb942ad9761f241c9b287b3b9a342b20690d 29-Feb-2008 Denis V. Lunev <den@openvz.org> [INET]: Remove struct dst_entry *dst from request_sock_ops.rtx_syn_ack.

It looks like dst parameter is used in this API due to historical
reasons. Actually, it is really used in the direct call to
tcp_v4_send_synack only. So, create a wrapper for tcp_v4_send_synack
and remove dst from rtx_syn_ack.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
b4d4f7c70fd3361c6c889752e08ea9be304cf5f4 13-Dec-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [DCCP]: Handle timestamps on Request/Response exchange separately

In DCCP, timestamps can occur on packets anytime, CCID3 uses a timestamp(/echo) on the Request/Response
exchange. This patch addresses the following situation:
* timestamps are recorded on the listening socket;
* Responses are sent from dccp_request_sockets;
* suppose two connections reach the listening socket with very small time in between:
* the first timestamp value gets overwritten by the second connection request.

This is not really good, so this patch separates timestamps into
* those which are received by the server during the initial handshake (on dccp_request_sock);
* those which are received by the client or the client after connection establishment.

As before, a timestamp of 0 is regarded as indicating that no (meaningful) timestamp has been
received (in addition, a warning message is printed if hosts send 0-valued timestamps).

The timestamp-echoing now works as follows:
* when a timestamp is present on the initial Request, it is placed into dreq, due to the
call to dccp_parse_options in dccp_v{4,6}_conn_request;
* when a timestamp is present on the Ack leading from RESPOND => OPEN, it is copied over
from the request_sock into the child cocket in dccp_create_openreq_child;
* timestamps received on an (established) dccp_sock are treated as before.

Since Elapsed Time is measured in hundredths of milliseconds (13.2), the new dccp_timestamp()
function is used, as it is expected that the time between receiving the timestamp and
sending the timestamp echo will be very small against the wrap-around time. As a byproduct,
this allows smaller timestamping-time fields.

Furthermore, inserting the Timestamp Echo option has been taken out of the block starting with
'!dccp_packet_without_ack()', since Timestamp Echo can be carried on any packet (5.8 and 13.3).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8109616e2ef978d142ea45850efd4f102b9bdce4 13-Dec-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [DCCP]: Add (missing) option parsing to request_sock processing

This adds option-parsing code to processing of Acks in the listening state
on request_socks on the server, serving two purposes
(i) resolves a FIXME (removed);
(ii) paves the way for feature-negotiation during connection-setup.

There is an intended subtlety here with regard to dccp_check_req:

Parsing options happens only after testing whether the received packet is
a retransmitted Request. Otherwise, if the Request contained (a possibly
large number of) feature-negotiation options, recomputing state would have to
happen each time a retransmitted Request arrives, which opens the door to an
easy DoS attack. Since in a genuine retransmission the options should not be
different from the original, reusing the already computed state seems better.

The other point is - if there are timestamp options on the Request, they will
not be answered; which means that in the presence of retransmission (likely
due to loss and/or other problems), the use of Request/Response RTT sampling
is suspended, so that startup problems here do not propagate.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
451bc0473f010babeadd888ae8ec1015959fd1b2 04-Oct-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [DCCP]: Tidy-up -- minisock initialisation

This

* removes a declaration of a non-existent function
__dccp_minisock_init;

* shifts the initialisation function dccp_minisock_init() from
options.c to minisocks.c, where it is more naturally expected to
be.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8fb8354af9b92ce3bd41083995f1fe26024d0959 20-Aug-2007 Arnaldo Carvalho de Melo <acme@ghostprotocols.net> [DCCP]: Nuke dccp_timestamp and dccps_epoch, not used anymore

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4ef8d0aeafda8388dd51f2671b7059192b1e5a5f 26-Apr-2007 Milind Arun Choudhary <milindchoudhary@gmail.com> [NET]: SPIN_LOCK_UNLOCKED cleanup in drivers/atm, net

SPIN_LOCK_UNLOCKED cleanup,use __SPIN_LOCK_UNLOCKED instead

Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
99c72ce091ec85868a0847e598eb7562dc0d8205 06-Mar-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [DCCP]: Set RTO for newly created child socket

This mirrors a recent change in tcp_open_req_child, whereby the icsk_rto of the
newly created child socket was not set (but rather on the parent socket). Same
fix for DCCP.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8109b02b5397ed52a32c116163a62a34f4768b26 10-Dec-2006 Arnaldo Carvalho de Melo <acme@mandriva.com> [DCCP]: Whitespace cleanups

That accumulated over the last months hackaton, shame on me for not
using git-apply whitespace helping hand, will do that from now on.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
59348b19efebfd6a8d0791ff81d207b16594c94b 20-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk> [DCCP]: Simplified conditions due to use of enum:8 states

This reaps the benefit of the earlier patch, which changed the type of
CCID 3 states to use enums, in that many conditions are now simplified
and the number of possible (unexpected) values is greatly reduced.

In a few instances, this also allowed to simplify pre-conditions; where
care has been taken to retain logical equivalence.

[DCCP]: Introduce a consistent BUG/WARN message scheme

This refines the existing set of DCCP messages so that
* BUG(), BUG_ON(), WARN_ON() have meaningful DCCP-specific counterparts
* DCCP_CRIT (for severe warnings) is not rate-limited
* DCCP_WARN() is introduced as rate-limited wrapper

Using these allows a faster and cleaner transition to their original
counterparts once the code has matured into a full DCCP implementation.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
cfb6eeb4c860592edd123fdea908d23c6ad1c7dc 15-Nov-2006 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [TCP]: MD5 Signature Option (RFC2385) support.

Based on implementation by Rick Payne.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
afb0a34dd3e20b3f534de19993271b8664cf10bb 13-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk> [DCCP]: Introduce a consistent naming scheme for sysctls

In order to make their function clearer and obtain a consistent naming
scheme to identify sysctls, all existing DCCP sysctls have been prefixed
with `sysctl_dccp', following the same convention as used by TCP.

Feature-specific sysctls retain the `feat' in the middle, although the
`default' has been dropped, since it is obvious from use.

Also removed a duplicate `dccp_feat_default_sequence_window' in ipv4.c.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
e11d9d30802278af22e78d8c10f348b683670cd9 13-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk> [DCCP]: Increment sequence numbers on retransmitted Response packets

Problem:
d83ca5accb256de1b44835cd222bfdc3207bd7dc 10-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk> [DCCP]: Update code comments for Step 2/3

Sorts out the comments for processing steps 2,3 in section 8.5 of RFC 4340.
All comments have been updated against this document, and the reference to step
2 has been made consistent throughout the files.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
cf557926f6955b4c3fa55e81fdb3675e752e8eed 10-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk> [DCCP]: tidy up dccp_v{4,6}_conn_request

This is a code simplification to remove reduplicated code
by concentrating and abstracting shared code.

Detailed Changes:
8a73cd09d96aa01743316657fc4e6864fe79b703 10-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk> [DCCP]: calling dccp_v{4,6}_reqsk_send_ack is a BUG

This patch removes two functions, the send_ack functions of request_sock,
which are not called/used by the DCCP code. It is correct that these
functions are not called, below is a justification why calling these
functions (on a passive socket in the LISTEN/RESPOND state) would mean
a DCCP protocol violation.

A) Background: using request_sock in TCP:
c4028958b6ecad064b1a6303a6a5906d4fe48d73 22-Nov-2006 David Howells <dhowells@redhat.com> WorkStruct: make allyesconfig

Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
6ab3d5624e172c553004ecc862bfeac16d9d68b7 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de> Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
a4bf3902427a128455b8de299ff0918072b2e974 21-Mar-2006 Arnaldo Carvalho de Melo <acme@mandriva.com> [DCCP] minisock: Rename struct dccp_options to struct dccp_minisock

This will later be included in struct dccp_request_sock so that we can
have per connection feature negotiation state while in the 3way
handshake, when we clone the DCCP_ROLE_LISTEN socket (in
dccp_create_openreq_child) we'll just copy this state from
dreq_minisock to dccps_minisock.

Also the feature negotiation and option parsing code will mostly touch
dccps_minisock, which will simplify some stuff.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
91f0ebf7b6d5cb2b6e818d48587566144821babe 21-Mar-2006 Arnaldo Carvalho de Melo <acme@mandriva.com> [DCCP] CCID: Improve CCID infrastructure

1. No need for ->ccid_init nor ->ccid_exit, this is what module_{init,exit}
does and anynways neither ccid2 nor ccid3 were using it.

2. Rename struct ccid to struct ccid_operations and introduce struct ccid
with a pointer to ccid_operations and rigth after it the rx or tx
private state.

3. Remove the pointer to the state of the half connections from struct
dccp_sock, now its derived thru ccid_priv() from the ccid pointer.

Now we also can implement the setsockopt for changing the CCID easily as
no ccid init routines can affect struct dccp_sock in any way that prevents
other CCIDs from working if a CCID switch operation is asked by apps.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
afe00251dd9b53d51de91ff0099961f42bbf3754 21-Mar-2006 Andrea Bittau <a.bittau@cs.ucl.ac.uk> [DCCP]: Initial feature negotiation implementation

Still needs more work, but boots and doesn't crashes, even
does some negotiation!

18:38:52.174934 127.0.0.1.43458 > 127.0.0.1.5001: request <change_l ack_ratio 2, change_r ccid 2, change_l ccid 2>
18:38:52.218526 127.0.0.1.5001 > 127.0.0.1.43458: response <nop, nop, change_l ack_ratio 2, confirm_r ccid 2 2, confirm_l ccid 2 2, confirm_r ack_ratio 2>
18:38:52.185398 127.0.0.1.43458 > 127.0.0.1.5001: <nop, confirm_r ack_ratio 2, ack_vector0 0x00, elapsed_time 212>

:-)

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7400d781105d18bf5bba89f8b986a413f14144a8 21-Mar-2006 Arnaldo Carvalho de Melo <acme@mandriva.com> [DCCP] ackvec: Ditch dccpav_buf_len

Simplifying the code a bit as we're always using DCCP_MAX_ACKVEC_LEN.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3df80d9320bcaea72b1b4761a319c79cb3fdaf5f 14-Dec-2005 Arnaldo Carvalho de Melo <acme@mandriva.com> [DCCP]: Introduce DCCPv6

Still needs mucho polishing, specially in the checksum code, but works
just fine, inet_diag/iproute2 and all 8)

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f21e68caa0ddffddf98a1e729e734a470957b6ec 14-Dec-2005 Arnaldo Carvalho de Melo <acme@mandriva.com> [DCCP]: Prepare the AF agnostic core for the introduction of DCCPv6

Basically exports a similar set of functions as the one exported by
the non-AF specific TCP code.

In the process moved some non-AF specific code from dccp_v4_connect to
dccp_connect_init and moved the checksum verification from
dccp_invalid_packet to dccp_v4_rcv, so as to use it in dccp_v6_rcv
too.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
57cca05af1e20fdc65b55be52c042c234f86c866 14-Dec-2005 Arnaldo Carvalho de Melo <acme@mandriva.com> [DCCP]: Introduce dccp_ipv4_af_ops

And make the core DCCP code AF agnostic, just like TCP, now its time
to work on net/dccp/ipv6.c, we are close to the end!

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ae31c3399d17b1f7bc1742724f70476b5417744f 18-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com> [DCCP]: Move the ack vector code to net/dccp/ackvec.[ch]

Isolating it, that will be used when we introduce a CCID2 (TCP-Like)
implementation.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
67e6b629212fa9ffb7420e8a88a41806af637e28 17-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com> [DCCP]: Introduce DCCP_SOCKOPT_SERVICE

As discussed in the dccp@vger mailing list:

Now applications have to use setsockopt(DCCP_SOCKOPT_SERVICE, service[s]),
prior to calling listen() and connect().

An array of unsigned ints can be passed meaning that the listening sock accepts
connection requests for several services.

With this we can ditch struct sockaddr_dccp and use only sockaddr_in (and
sockaddr_in6 in the future).

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b0e567806d16586629468c824dfb2e71155df7da 09-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com> [DCCP] Introduce dccp_timestamp

To start the timestamps with 0.0ms, easing the integer maths in the CCIDs, this
probably will be reworked to use the to be introduced struct timeval_offset
infrastructure out of skb_get_timestamp, etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
03ace394ac9bcad38043a381ae5f4860b9c9fa1c 21-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com> [DCCP]: Fix the ACK and SEQ window variables settings

This is from a first audit, more eyeballs are more than welcome.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7690af3fff7633e40b1b9950eb8489129251d074 14-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com> [DCCP]: Just reflow the source code to fit in 80 columns

Andrew Morton should be happy now 8)

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
64cf1e5d8b5f88d56509260e08fa0d8314277350 10-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com> [DCCP]: Finish the TIMEWAIT minisock support

Using most of the infrastructure TCP uses, with a dccp_death_row,
etc. As per my current interpretation of the draft what we have with
this changeset seems to be all we need (or very close to it 8)).

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f6ccf55419c4f0021e7382f000f2fd14a29f3d3c 10-Aug-2005 David S. Miller <davem@davemloft.net> [DCCP]: Fix u64 printf format warnings.

Signed-off-by: David S. Miller <davem@davemloft.net>
7c657876b63cb1d8a2ec06f8fc6c37bb8412e66c 10-Aug-2005 Arnaldo Carvalho de Melo <acme@ghostprotocols.net> [DCCP]: Initial implementation

Development to this point was done on a subversion repository at:

http://oops.ghostprotocols.net:81/cgi-bin/viewcvs.cgi/dccp-2.6/

This repository will be kept at this site for the foreseable future,
so that interested parties can see the history of this code,
attributions, etc.

If I ever decide to take this offline I'll provide the full history at
some other suitable place.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>