History log of /net/dccp/ccids/ccid2.c
Revision Date Author Comments
eb93992207dadb946a3b5cf4544957dc924a6f58 19-Dec-2011 Rusty Russell <rusty@rustcorp.com.au> module_param: make bool parameters really bool (net & drivers/net)

module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.

(Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
d96a9e8dd04cf5ab2782ca6192e395c5ca373f7d 25-Jul-2011 Samuel Jero <sj323707@ohio.edu> dccp ccid-2: check Ack Ratio when reducing cwnd

This patch causes CCID-2 to check the Ack Ratio after reducing the congestion
window. If the Ack Ratio is greater than the congestion window, it is
reduced. This prevents timeouts caused by an Ack Ratio larger than the
congestion window.

In this situation, we choose to set the Ack Ratio to half the congestion window
(or one if that's zero) so that if we loose one ack we don't trigger a timeout.

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
0ce95dc792549e0cf704e74aa8acb15a401f8cca 25-Jul-2011 Samuel Jero <sj323707@ohio.edu> dccp ccid-2: increment cwnd correctly

This patch fixes an issue where CCID-2 will not increase the congestion
window for numerous RTTs after an idle period, application-limited period,
or a loss once the algorithm is in Congestion Avoidance.

What happens is that, when CCID-2 is in Congestion Avoidance mode, it will
increase hc->tx_packets_acked by one for every packet and will increment cwnd
every cwnd packets. However, if there is now an idle period in the connection,
cwnd will be reduced, possibly below the slow start threshold. This will
cause the connection to go into Slow Start. However, in Slow Start CCID-2
performs this test to increment cwnd every second ack:

++hc->tx_packets_acked == 2

Unfortunately, this will be incorrect, if cwnd previous to the idle period
was larger than 2 and if tx_packets_acked was close to cwnd. For example:
cwnd=50 and tx_packets_acked=45.

In this case, the current code, will increment tx_packets_acked until it
equals two, which will only be once tx_packets_acked (an unsigned 32-bit
integer) overflows.

My fix is simply to change that test for tx_packets_acked greater than or
equal to two in slow start.

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
d346d886a4c7f771c184e73833133f23a18de884 25-Jul-2011 Samuel Jero <sj323707@ohio.edu> dccp ccid-2: prevent cwnd > Sequence Window

Add a check to prevent CCID-2 from increasing the cwnd greater than the
Sequence Window.

When the congestion window becomes bigger than the Sequence Window, CCID-2
will attempt to keep more data in the network than the DCCP Sequence Window
code considers possible. This results in the Sequence Window code issuing
a Sync, thereby inducing needless overhead. Further, if this occurs at the
sender, CCID-2 will never detect the problem because the Acks it receives
will indicate no losses. I have seen this cause a drop of 1/3rd in throughput
for a connection.

Also add code to adjust the Sequence Window to be about 5 times the number of
packets in the network (RFC 4340, 7.5.2) and to adjust the Ack Ratio so that
the remote Sequence Window will hold about 5 times the number of packets in
the network. This allows the congestion window to increase correctly without
being limited by the Sequence Window.

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
31daf0393fbb17cf6efe613fb538a3ea4b5202e4 25-Jul-2011 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: use feature-negotiation to report Ack Ratio changes

This uses the new feature-negotiation framework to signal Ack Ratio changes,
as required by RFC 4341, sec. 6.1.2.

That raises some problems with CCID-2, which at the moment can not cope
gracefully with Ack Ratios > 1. Since these issues are not directly related
to feature negotiation, they are marked by a FIXME.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
113ced1f52e5ed2dfedc0771a1b11b536cde8168 03-Jul-2011 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Perform congestion-window validation

CCID-2's cwnd increases like TCP during slow-start, which has implications for
* the local Sequence Window value (should be > cwnd),
* the Ack Ratio value.
Hence an exponential growth, if it does not reflect the actual network
conditions, can quickly lead to instability.

This patch adds congestion-window validation (RFC2861) to CCID-2:
* cwnd is constrained if the sender is application limited;
* cwnd is reduced after a long idle period, as suggested in the '90 paper
by Van Jacobson, in RFC 2581 (sec. 4.1);
* cwnd is never reduced below the RFC 3390 initial window.

As marked in the comments, the code is actually almost a direct copy of the
TCP congestion-window-validation algorithms. By continuing this work, it may
in future be possible to use the TCP code (not possible at the moment).

The mechanism can be turned off using a module parameter. Sampling of the
currently-used window (moving-maximum) is however done constantly; this is
used to determine the expected window, which can be exploited to regulate
DCCP's Sequence Window value.

This patch also sets slow-start-after-idle (RFC 4341, 5.1), i.e. it behaves like
TCP when net.ipv4.tcp_slow_start_after_idle = 1.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
58fdea0f3170c13a3b875ef904d5b67cf73814be 03-Jul-2011 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Use existing function to test for data packets

This replaces a switch statement with a test, using the equivalent
function dccp_data_packet(skb). It also doubles the range of the field
`rx_num_data_pkts' by changing the type from `int' to `u32', avoiding
signed/unsigned comparison with the u16 field `dccps_r_ack_ratio'.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
b4d5f4b2884625d13c7ef5b9fd085ec93bbf545c 03-Jul-2011 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: move rfc 3390 function into header file

This moves CCID-2's initial window function into the header file, since several
parts throughout the CCID-2 code need to call it (CCID-2 still uses RFC 3390).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Leandro Melo de Sales <leandro@ic.ufal.br>
442b9635c569fef038d5367a7acd906db4677ae1 03-Feb-2011 David S. Miller <davem@davemloft.net> tcp: Increase the initial congestion window to 10.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Nandita Dukkipati <nanditad@google.com>
7e87fe84303cc54ecf3c7b688cb08ca24322a41d 14-Nov-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Separate option parsing from CCID processing

This patch replaces an almost identical replication of code: large parts
of dccp_parse_options() re-appeared as ccid2_ackvector() in ccid2.c.

Apart from the duplication, this caused two more problems:
1. CCIDs should not need to be concerned with parsing header options;
2. one can not assume that Ack Vectors appear as a contiguous area within an
skb, it is legal to insert other options and/or padding in between. The
current code would throw an error and stop reading in such a case.

Since Ack Vectors provide CCID-specific information, they are now processed
by the CCID directly, separating this functionality from the main DCCP code.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
f17a37c9b8c4b32c01e501a84fa6f30e344c6110 10-Nov-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Ack Vector interface clean-up

This patch brings the Ack Vector interface up to date. Its main purpose is
to lay the basis for the subsequent patches of this set, which will use the
new data structure fields and routines.

There are no real algorithmic changes, rather an adaptation:

(1) Replaced the static Ack Vector size (2) with a #define so that it can
be adapted (with low loss / Ack Ratio, a value of 1 works, so 2 seems
to be sufficient for the moment) and added a solution so that computing
the ECN nonce will continue to work - even with larger Ack Vectors.

(2) Replaced the #defines for Ack Vector states with a complete enum.

(3) Replaced #defines to compute Ack Vector length and state with general
purpose routines (inlines), and updated code to use these.

(4) Added a `tail' field (conversion to circular buffer in subsequent patch).

(5) Updated the (outdated) documentation for Ack Vector struct.

(6) All sequence number containers now trimmed to 48 bits.

(7) Removal of unused bits:
* removed dccpav_ack_nonce from struct dccp_ackvec, since this is already
redundantly stored in the `dccpavr_ack_nonce' (of Ack Vector record);
* removed Elapsed Time for Ack Vectors (it was nowhere used);
* replaced semantics of dccpavr_sent_len with dccpavr_ack_runlen, since
the code needs to be able to remember the old run length;
* reduced the de-/allocation routines (redundant / duplicate tests).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
1c0e0a0569e925220c2948ea9b92fc013895917f 27-Oct-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Stop polling

This updates CCID-2 to use the CCID dequeuing mechanism, converting from
previous continuous-polling to a now event-driven mechanism.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
baf9e782e1dc4991edecfa3b8700cf8739c40259 11-Oct-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: remove unused argument in CCID tx function

This removes the argument `more' from ccid_hc_tx_packet_sent, since it was
nowhere used in the entire code.

(Btw, this argument was not even used in the original KAME code where the
function initially came from; compare the variable moreToSend in the
freebsd61-dccp-kame-28.08.2006.patch kept by Emmanuel Lochin.)

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
4886fcad6e12572afbd230dfab1b268eace20d6d 29-Aug-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Share TCP's minimum RTO code

Using a fixed RTO_MIN of 0.2 seconds was found to cause problems for CCID-2
over 802.11g: at least once per session there was a spurious timeout. It
helped to then increase the the value of RTO_MIN over this link.

Since the problem is the same as in TCP, this patch makes the solution from
commit "05bb1fad1cde025a864a90cfeb98dcbefe78a44a"
"[TCP]: Allow minimum RTO to be configurable via routing metrics."
available to DCCP.

This avoids reinventing the wheel, so that e.g. the following works in the
expected way now also for CCID-2:

> ip route change 10.0.0.2 rto_min 800 dev ath0

Luckily this useful rto_min function was recently moved to net/tcp.h,
which simplifies sharing code originating from TCP.

Documentation also updated (plus minor whitespace fixes).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
22b71c8f4f3db8df92f5e7b081c265bc56c0bd2f 29-Aug-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk> tcp/dccp: Consolidate common code for RFC 3390 conversion

This patch consolidates initial-window code common to TCP and CCID-2:
* TCP uses RFC 3390 in a packet-oriented manner (tcp_input.c) and
* CCID-2 uses RFC 3390 in packet-oriented manner (RFC 4341).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
d26eeb07fd02de31848b59d19687daff0e93532f 29-Aug-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Remove wrappers around sk_{reset,stop}_timer()

This removes the wrappers around the sk timer functions, since not much is
gained from using them: the BUG_ON in start_rto_timer will never trigger
since that function is called only if:

* the RTO timer expires (rto_expire, and then timer_pending() is false);
* in tx_packet_sent only if !timer_pending() (BUG_ON is redundant here);
* previously in new_ack, after stopping the timer (timer_pending() false).

Removing the wrappers also clears the way for eventually replacing the
RTO timer with the icsk-retransmission-timer, as it is already part of the
DCCP socket.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
d82b6f85c1d73340ef4a26bd0b247ac14610cd83 29-Aug-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Use u32 timestamps uniformly

Since CCID-2 is de facto a mini implementation of TCP, it makes sense to share
as much code as possible.

Hence this patch aligns CCID-2 timestamping with TCP timestamping.
This also halves the space consumption (on 64-bit systems).

The necessary include file <net/tcp.h> is already included by way of
net/dccp.h. Redundant includes have been removed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
231cc2aaf14bad3b2325be0b19b8385ff5e75485 22-Aug-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Replace broken RTT estimator with better algorithm

The current CCID-2 RTT estimator code is in parts broken and lags behind the
suggestions in RFC2988 of using scaled variants for SRTT/RTTVAR.

That code is replaced by the present patch, which reuses the Linux TCP RTT
estimator code.

Further details:
----------------
1. The minimum RTO of previously one second has been replaced with TCP's, since
RFC4341, sec. 5 says that the minimum of 1 sec. (suggested in RFC2988, 2.4)
is not necessary. Instead, the TCP_RTO_MIN is used, which agrees with DCCP's
concept of a default RTT (RFC 4340, 3.4).
2. The maximum RTO has been set to DCCP_RTO_MAX (64 sec), which agrees with
RFC2988, (2.5).
3. De-inlined the function ccid2_new_ack().
4. Added a FIXME: the RTT is sampled several times per Ack Vector, which will
give the wrong estimate. It should be replaced with one sample per Ack.
However, at the moment this can not be resolved easily, since
- it depends on TX history code (which also needs some work),
- the cleanest solution is not to use the `sent' time at all (saves 4 bytes
per entry) and use DCCP timestamps / elapsed time to estimated the RTT,
which however is non-trivial to get right (but needs to be done).

Reasons for reusing the Linux TCP estimator algorithm:
------------------------------------------------------
Some time was spent to find a better alternative, using basic RFC2988 as a first
step. Further analysis and experimentation showed that the Linux TCP RTO
estimator is superior to a basic RFC2988 implementation. A summary is on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid2/rto_estimator/

In addition, this estimator fared well in a recent empirical evaluation:

Rewaskar, Sushant, Jasleen Kaur and F. Donelson Smith.
A Performance Study of Loss Detection/Recovery in Real-world TCP
Implementations. Proceedings of 15th IEEE International
Conference on Network Protocols (ICNP-07), 2007.

Thus there is significant benefit in reusing the existing TCP code.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
c38c92a84a9291a3d0eaf6a13650a11961ae964f 22-Aug-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Simplify dec_pipe and rearming of RTO timer

This removes the dec_pipe function and improves the way the RTO timer is rearmed
when a new acknowledgment comes in.

Details and justification for removal:
--------------------------------------
1) The BUG_ON in dec_pipe is never triggered: pipe is only decremented for TX
history entries between tail and head, for which it had previously been
incremented in tx_packet_sent; and it is not decremented twice for the same
entry, since it is
- either decremented when a corresponding Ack Vector cell in state 0 or 1
was received (and then ccid2s_acked==1),
- or it is decremented when ccid2s_acked==0, as part of the loss detection
in tx_packet_recv (and hence it can not have been decremented earlier).

2) Restarting the RTO timer happens for every single entry in each Ack Vector
parsed by tx_packet_recv (according to RFC 4340, 11.4 this can happen up to
16192 times per Ack Vector).

3) The RTO timer should not be restarted when all outstanding data has been
acknowledged. This is currently done similar to (2), in dec_pipe, when
pipe has reached 0.

The patch onsolidates the code which rearms the RTO timer, combining the
segments from new_ack and dec_pipe. As a result, the code becomes clearer
(compare with tcp_rearm_rto()).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
30564e355511b434613aa42375317b5a07fc9f23 22-Aug-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Remove redundant sanity tests

This removes the ccid2_hc_tx_check_sanity function: it is redundant.

Details:

The tx_check_sanity function performs three tests:
1) it checks that the circular TX list is sorted
- in ascending order of sequence number (ccid2s_seq)
- and time (ccid2s_sent),
- in the direction from `tail' (hctx_seqt) to `head' (hctx_seqh);
2) it ensures that the entire list has the length seqbufc * CCID2_SEQBUF_LEN;
3) it ensures that pipe equals the number of packets that were not
marked `acked' (ccid2s_acked) between `tail' and `head'.

The following argues that each of these tests is redundant, this can be verified
by going through the code.

(1) is not necessary, since both time and GSS increase from one packet to the
next, so that subsequent insertions in tx_packet_sent (which advance the `head'
pointer) will be in ascending order of time and sequence number.

In (2), the length of the list is always equal to seqbufc times CCID2_SEQBUF_LEN
(set to 1024) unless allocation caused an earlier failure, because:
* at initialisation (tx_init), there is one chunk of size 1024 and seqbufc=1;
* subsequent calls to tx_alloc_seq take place whenever head->next == tail in
tx_packet_sent; then a new chunk of size 1024 is inserted between head and
tail, and seqbufc is incremented by one.

To show that (3) is redundant requires looking at two cases.

The `pipe' variable of the TX socket is incremented only in tx_packet_sent, and
decremented in tx_packet_recv. When head == tail (TX history empty) then pipe
should be 0, which is the case directly after initialisation and after a
retransmission timeout has occurred (ccid2_hc_tx_rto_expire).

The first case involves parsing Ack Vectors for packets recorded in the live
portion of the buffer, between tail and head. For each packet marked by the
receiver as received (state 0) or ECN-marked (state 1), pipe is decremented by
one, so for all such packets the BUG_ON in tx_check_sanity will not trigger.

The second case is the loss detection in the second half of tx_packet_recv,
below the comment "Check for NUMDUPACK".

The first while-loop here ensures that the sequence number of `seqp' is either
above or equal to `high_ack', or otherwise equal to the highest sequence number
sent so far (of the entry head->prev, as head points to the next unsent entry).
The next while-loop ("while (1)") counts the number of acked packets starting
from that position of seqp, going backwards in the direction from head->prev to
tail. If NUMDUPACK=3 such packets were counted within this loop, `seqp' points
to the last acknowledged packet of these, and the "if (done == NUMDUPACK)" block
is entered next.
The while-loop contained within that block in turn traverses the list backwards,
from head to tail; the position of `seqp' is saved in the variable `last_acked'.
For each packet not marked as `acked', a congestion event is triggered within
the loop, and pipe is decremented. The loop terminates when `seqp' has reached
`tail', whereupon tail is set to the position previously stored in `last_acked'.
Thus, between `last_acked' and the previous position of `tail',
- pipe has been decremented earlier if the packet was marked as state 0 or 1;
- pipe was decremented if the packet was not marked as acked.
That is, pipe has been decremented by the number of packets between `last_acked'
and the previous position of `tail'. As a consequence, pipe now again reflects
the number of packets which have not (yet) been acked between the new position
of tail (at `last_acked') and head->prev, or 0 if head==tail. The result is that
the BUG_ON condition in check_sanity will also not be triggered, hence the test
(3) is also redundant.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
67b67e365f07d6dc70f3bb266af3268bac0a4836 22-Aug-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk> ccid: ccid-2/3 code cosmetics

This patch collects cosmetics-only changes to separate these from
code changes:
* update with regard to CodingStyle and whitespace changes,
* documentation:
- adding/revising comments,
- remove CCID-3 RX socket documentation which is either
duplicate or refers to fields that no longer exist,
* expand embedded tfrc_tx_info struct inline for consistency,
removing indirections via #define.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
5a0e3ad6af8660be21ca98a971cd00f331318c05 24-Mar-2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
77d2dd93742222973d253443d98ab8402d641038 05-Oct-2009 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Remove CCID naming redundancy 1/2

This removes a redundancy in the CCID half-connection (hc) naming scheme:
* instead of 'hctx->tx_...', write 'hc->tx_...';
* instead of 'hcrx->rx_...', write 'hc->rx_...';

which works because the 'type' of the half-connection is encoded in the
'rx_' / 'tx_' prefixes.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b1c00fe3cf8f54d97d20cdf196145a106f04bd63 05-Oct-2009 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Overhaul CCID naming convention 1/2

This patch starts a less problematic naming convention for CCID structs.

The old naming convention used 'hc{tx,rx}->ccid?hc{tx,rx}->...' as
recurring prefixes, which made the code
* hard to write (not easy to fit into 80 characters);
* hard to read (most of the space is occupied by prefixes).

The new naming scheme:
* struct entries for the TX socket are prefixed by 'tx_';
* and those for the RX socket are prefixed by 'rx_'.

The identifiers then remain distinguishable when grep-ing through the tree:
(a) RX/TX sockets are distinguished by the naming scheme,
(b) individual CCIDs are distinguished by filename (ccid{2,3,4}.{c,h}).

This first patch implements the scheme for CCID-2.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
aa1b1ff0991b469eca6fde4456190df6ed59ff40 12-Sep-2009 Gerrit Renker <gerrit@erg.abdn.ac.uk> net-next-2.6 [PATCH 1/1] dccp: ccids whitespace-cleanup / CodingStyle

No code change, cosmetical changes only:

* whitespace cleanup via scripts/cleanfile,
* remove self-references to filename at top of files,
* fix coding style (extraneous brackets),
* fix documentation style (kernel-doc-nano-HOWTO).

Thanks are due to Ivo Augusto Calado who raised these issues by
submitting good-quality patches.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
ddebc973c56b51b4e5d84d606f0430d81b895d67 05-Jan-2009 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Lockless integration of CCID congestion-control plugins

Based on Arnaldo's earlier patch, this patch integrates the standardised
CCID congestion control plugins (CCID-2 and CCID-3) of DCCP with dccp.ko:

* enables a faster connection path by eliminating the need to always go
through the CCID registration lock;

* updates the implementation to use only a single array whose size equals
the number of configured CCIDs instead of the maximum (256);

* since the CCIDs are now fixed array elements, synchronization is no
longer needed, simplifying use and implementation.

CCID-2 is suggested as minimum for a basic DCCP implementation (RFC 4340, 10);
CCID-3 is a standards-track CCID supported by RFC 4342 and RFC 5348.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
e8ef967a54f401ac5e8637b7f7f8bddb006144c4 12-Nov-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Registration routines for changing feature values

Two registration routines, for SP and NN features, are provided by this patch,
replacing a previous routine which was used for both feature types.

These are internal-only routines and therefore start with `__feat_register'.

It further exports the known limits of Sequence Window and Ack Ratio as symbolic
constants.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
410e27a49bb98bc7fa3ff5fc05cc313817b9f253 09-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> This reverts "Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/dccp_exp"
as it accentally contained the wrong set of patches. These will be
submitted separately.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
6224877b2ca4be5de96270a8ae490fe2ba11b0e0 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> tcp/dccp: Consolidate common code for RFC 3390 conversion

This patch consolidates the code common to TCP and CCID-2:
* TCP uses RFC 3390 in a packet-oriented manner (tcp_input.c) and
* CCID-2 uses RFC 3390 in packet-oriented manner (RFC 4341).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
20bbd0f75ee4b72c1dafc8e5fb6ad39ba506a75c 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Remove wrappers around sk_{reset,stop}_timer()

This removes the wrappers around the sk timer functions as it makes the code
clearer and not much is gained from using wrappers: the BUG_ON in
start_rto_timer will never trigger since that function was called only when
* the RTO timer expired (rto_expire, and then timer_pending() is false);
* in tx_packet_sent only if !timer_pending() (BUG_ON is redundant here);
* previously in new_ack, after stopping the timer (timer_pending() false).

One further motive behind this patch is to replace the RTO timer with the
icsk retransmission timer, as it is already part of the DCCP socket.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
1435562d7e0412e4885b661843f69859013f9d25 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Replace broken RTT estimator with better algorithm

The current CCID-2 RTT estimator code is in parts broken and lags behind the
suggestions in RFC2988 of using scaled variants for SRTT/RTTVAR.
That code is replaced by the present patch, which reuses the Linux TCP RTT
estimator code - reasons for this code duplication are given below.

Further details:
----------------
1. The minimum RTO of previously one second has been replaced with TCP's, since
RFC4341, sec. 5 says that the minimum of 1 sec. (suggested in RFC2988, 2.4)
is not necessary. Instead, the TCP_RTO_MIN is used, which agrees with DCCP's
concept of a default RTT (RFC 4340, 3.4).
2. The maximum RTO has been set to DCCP_RTO_MAX (64 sec), which agrees with
RFC2988, (2.5).
3. De-inlined the function ccid2_new_ack().
4. Added a FIXME: the RTT is sampled several times per Ack Vector, which will
give the wrong estimate. It should be replaced with one sample per Ack.
However, at the moment this can not be resolved easily, since
- it depends on TX history code (which also needs some work),
- the cleanest solution is not to use the `sent' time at all (saves 4 bytes
per entry) and use DCCP timestamps / elapsed time to estimated the RTT,
which however is non-trivial to get right (but needs to be done).

Reasons for reusing the Linux TCP estimator algorithm:
------------------------------------------------------
Some time was spent to find a better alternative, using basic RFC2988 as a first
step. Further analysis and experimentation showed that the Linux TCP RTO
estimator is superior to a basic RFC2988 implementation. A summary is on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid2/rto_estimator/

In addition, this estimator fared well in a recent empirical evaluation:

Rewaskar, Sushant, Jasleen Kaur and F. Donelson Smith.
A Performance Study of Loss Detection/Recovery in Real-world TCP
Implementations. Proceedings of 15th IEEE International
Conference on Network Protocols (ICNP-07). 2007.

Thus there is significant benefit in reusing the existing TCP code.


Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
e9803c0104564698d3b8e84ccdb0b8b0e65427e2 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Simplify dec_pipe and rearming of RTO timer

This removes the dec_pipe function and improves the way the RTO timer is rearmed
when a new acknowledgment comes in.

Details and justification for removal:
--------------------------------------
1) The BUG_ON in dec_pipe is never triggered: pipe is only decremented for TX
history entries between tail and head, for which it had previously been
incremented in tx_packet_sent; and it is not decremented twice for the same
entry, since it is
- either decremented when a corresponding Ack Vector cell in state 0 or 1
was received (and then ccid2s_acked==1),
- or it is decremented when ccid2s_acked==0, as part of the loss detection
in tx_packet_recv (and hence it can not have been decremented earlier).

2) Restarting the RTO timer happens for every single entry in each Ack Vector
parsed by tx_packet_recv (according to RFC 4340, 11.4 this can happen up to
16192 times per Ack Vector).

3) The RTO timer should not be restarted when all outstanding data has been
acknowledged. This is currently done similar to (2), in dec_pipe, when
pipe has reached 0.

The patch onsolidates the code which rearms the RTO timer, combining the
segments from new_ack and dec_pipe. As a result, the code becomes clearer
(compare with tcp_rearm_rto()).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
c6f0f2e71f3088a0f05502d6adb0f667b84028c3 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Remove redundant sanity tests

This removes the ccid2_hc_tx_check_sanity function: it is redundant.

Details:
========
The tx_check_sanity function performs three tests:
1) it checks that the circular TX list is sorted
- in ascending order of sequence number (ccid2s_seq)
- and time (ccid2s_sent),
- in the direction from `tail' (hctx_seqt) to `head' (hctx_seqh);
2) it ensures that the entire list has the length seqbufc * CCID2_SEQBUF_LEN;
3) it ensures that pipe equals the number of packets that were not
marked `acked' (ccid2s_acked) between `tail' and `head'.

The following argues that each of these tests is redundant, this can be verified
by going through the code.

(1) is not necessary, since both time and GSS increase from one packet to the
next, so that subsequent insertions in tx_packet_sent (which advance the `head'
pointer) will be in ascending order of time and sequence number.

In (2), the length of the list is always equal to seqbufc times CCID2_SEQBUF_LEN
(set to 1024) unless allocation caused an earlier failure, because:
* at initialisation (tx_init), there is one chunk of size 1024 and seqbufc=1;
* subsequent calls to tx_alloc_seq take place whenever head->next == tail in
tx_packet_sent; then a new chunk of size 1024 is inserted between head and
tail, and seqbufc is incremented by one.

To show that (3) is redundant requires looking at two cases.

The `pipe' variable of the TX socket is incremented only in tx_packet_sent, and
decremented in tx_packet_recv. When head == tail (TX history empty) then pipe
should be 0, which is the case directly after initialisation and after a
retransmission timeout has occurred (ccid2_hc_tx_rto_expire).

The first case involves parsing Ack Vectors for packets recorded in the live
portion of the buffer, between tail and head. For each packet marked by the
receiver as received (state 0) or ECN-marked (state 1), pipe is decremented by
one, so for all such packets the BUG_ON in tx_check_sanity will not trigger.

The second case is the loss detection in the second half of tx_packet_recv,
below the comment "Check for NUMDUPACK".

The first while-loop here ensures that the sequence number of `seqp' is either
above or equal to `high_ack', or otherwise equal to the highest sequence number
sent so far (of the entry head->prev, as head points to the next unsent entry).
The next while-loop ("while (1)") counts the number of acked packets starting
from that position of seqp, going backwards in the direction from head->prev to
tail. If NUMDUPACK=3 such packets were counted within this loop, `seqp' points
to the last acknowledged packet of these, and the "if (done == NUMDUPACK)" block
is entered next.
The while-loop contained within that block in turn traverses the list backwards,
from head to tail; the position of `seqp' is saved in the variable `last_acked'.
For each packet not marked as `acked', a congestion event is triggered within
the loop, and pipe is decremented. The loop terminates when `seqp' has reached
`tail', whereupon tail is set to the position previously stored in `last_acked'.
Thus, between `last_acked' and the previous position of `tail',
- pipe has been decremented earlier if the packet was marked as state 0 or 1;
- pipe was decremented if the packet was not marked as acked.
That is, pipe has been decremented by the number of packets between `last_acked'
and the previous position of `tail'. As a consequence, pipe now again reflects
the number of packets which have not (yet) been acked between the new position
of tail (at `last_acked') and head->prev, or 0 if head==tail. The result is that
the BUG_ON condition in check_sanity will also not be triggered, hence the test
(3) is also redundant.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
83337dae6ca94d801b6700600244865cd694205b 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Stop polling

This updates CCID2 to use the CCID dequeuing mechanism, converting from
previous constant-polling to a now event-driven mechanism.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
c8bf462bc567c3dcb083ff95cc13060dd06f138c 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Separate option parsing from CCID processing

This patch replaces an almost identical replication of code: large parts
of dccp_parse_options() re-appeared as ccid2_ackvector() in ccid2.c.

Apart from the duplication, this caused two more problems:
1. CCIDs should not need to be concerned with parsing header options;
2. one can not assume that Ack Vectors appear as a contiguous area within an
skb, it is legal to insert other options and/or padding in between. The
current code would throw an error and stop reading in such a case.

The patch provides a new data structure and associated list housekeeping.

Only small changes were necessary to integrate with CCID-2: data structure
initialisation, adapt list traversal routine, and add call to the provided
cleanup routine.

The latter also lead to fixing the following BUG: CCID-2 so far ignored
Ack Vectors on all packets other than Ack/DataAck, which is incorrect,
since Ack Vectors can be present on any packet that has an Ack field.

Details:
--------
* received Ack Vectors are parsed by dccp_parse_options() alone, which passes
the result on to the CCID-specific routine ccid_hc_tx_parse_options();
* CCIDs interested in using/decoding Ack Vector information will add code
to fetch parsed Ack Vectors via this interface;
* a data structure, `struct dccp_ackvec_parsed' is provided as interface;
* this structure arranges Ack Vectors of the same skb into a FIFO order;
* a doubly-linked list is used to keep the required FIFO code small.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
ff49e27089ec363b7fc3849504e0435d447ab18a 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Ack Vector interface clean-up

This patch brings the Ack Vector interface up to date. Its main purpose is
to lay the basis for the subsequent patches of this set, which will use the
new data structure fields and routines.

There are no real algorithmic changes, rather an adaptation:

(1) Replaced the static Ack Vector size (2) with a #define so that it can
be adapted (with low loss / Ack Ratio, a value of 1 works, so 2 seems
to be sufficient for the moment) and added a solution so that computing
the ECN nonce will continue to work - even with larger Ack Vectors.

(2) Replaced the #defines for Ack Vector states with a complete enum.

(3) Replaced #defines to compute Ack Vector length and state with general
purpose routines (inlines), and updated code to use these.

(4) Added a `tail' field (conversion to circular buffer in subsequent patch).

(5) Updated the (outdated) documentation for Ack Vector struct.

(6) All sequence number containers now trimmed to 48 bits.

(7) Removal of unused bits:
* removed dccpav_ack_nonce from struct dccp_ackvec, since this is already
redundantly stored in the `dccpavr_ack_nonce' (of Ack Vector record);
* removed Elapsed Time for Ack Vectors (it was nowhere used);
* replaced semantics of dccpavr_sent_len with dccpavr_ack_runlen, since
the code needs to be able to remember the old run length;
* reduced the de-/allocation routines (redundant / duplicate tests).


Justification for removing Elapsed Time information [can be removed]:
---------------------------------------------------------------------
1. The Elapsed Time information for Ack Vectors was nowhere used in the code.
2. DCCP does not implement rate-based pacing of acknowledgments. The only
recommendation for always including Elapsed Time is in section 11.3 of
RFC 4340: "Receivers that rate-pace acknowledgements SHOULD [...]
include Elapsed Time options". But such is not the case here.
3. It does not really improve estimation accuracy. The Elapsed Time field only
records the time between the arrival of the last acknowledgeable packet and
the time the Ack Vector is sent out. Since Linux does not (yet) implement
delayed Acks, the time difference will typically be small, since often the
arrival of a data packet triggers sending feedback at the HC-receiver.


Justification for changes in de-/allocation routines [can be removed]:
----------------------------------------------------------------------
* INIT_LIST_HEAD in dccp_ackvec_record_new was redundant, since the list
pointers were later overwritten when the node was added via list_add();
* dccp_ackvec_record_new() was called in a single place only;
* calls to list_del_init() before calling dccp_ackvec_record_delete() were
redundant, since subsequently the entire element was k-freed;
* since all calls to dccp_ackvec_record_delete() were preceded to a call to
list_del_init(), the WARN_ON test would never evaluate to true;
* since all calls to dccp_ackvec_record_delete() were made from within
list_for_each_entry_safe(), the test for avr == NULL was redundant;
* list_empty() in ackvec_free was redundant, since the same condition is
embedded in the loop condition of the subsequent list_for_each_entry_safe().

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
c506d91d9ab7681e058afcd750e9118c6cdaabc1 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Unused argument in CCID tx function

This removes the argument `more' from ccid_hc_tx_packet_sent, since it was
nowhere used in the entire code.

(Anecdotally, this argument was not even used in the original KAME code where
the function originally came from; compare the variable moreToSend in the
freebsd61-dccp-kame-28.08.2006.patch now maintained by Emmanuel Lochin.)

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
1fb87509606cb19f5f603e54c28af7da149049f3 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp ccid-2: Remove ccid2hc{tx,rx}_ prefixes

This patch fixes two problems caused by the ubiquitous long "hctx->ccid2htx_"
and "hcrx->ccid2hcrx_" prefixes:
* code becomes hard to read;
* multiple-line statements are almost inevitable even for simple expressions;
The prefixes are not really necessary (compare with "struct tcp_sock").

There had been previous discussion of this on dccp@vger, but so far this was
not followed up (most people agreed that the prefixes are too long).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Leandro Melo de Sales <leandroal@gmail.com>
86349c8d9c6892b57aff4549256ab1aa65aed0f0 04-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Registration routines for changing feature values

Two registration routines, for SP and NN features, are provided by this patch,
replacing a previous routine which was used for both feature types.

These are internal-only routines and therefore start with `__feat_register'.

It further exports the known limits of Sequence Window and Ack Ratio as symbolic
constants.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
432649916b0435b608fb3e1fcb97347ac294d38d 23-Aug-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Toggle debug output without module unloading

This sets the sysfs permissions so that root can toggle the `debug'
parameter available for nearly every DCCP module. This is useful
since there are various module inter-dependencies. The debug flag
can now be toggled at runtime using

echo 1 > /sys/module/dccp/parameters/dccp_debug
echo 1 > /sys/module/dccp_ccid2/parameters/ccid2_debug
echo 1 > /sys/module/dccp_ccid3/parameters/ccid3_debug
echo 1 > /sys/module/dccp_tfrc_lib/parameters/tfrc_debug

The last is not very useful yet, since no code at the moment calls
the tfrc_debug() macro.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
157439fa4a9b38ac4ce41e2fc379fc5031affec8 23-Aug-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk> dccp: Toggle debug output without module unloading

This sets the sysfs permissions so that root can toggle the `debug'
parameter available for nearly every DCCP module. This is useful
since there are various module inter-dependencies. The debug flag
can now be toggled at runtime using

echo 1 > /sys/module/dccp/parameters/dccp_debug
echo 1 > /sys/module/dccp_ccid2/parameters/ccid2_debug
echo 1 > /sys/module/dccp_ccid3/parameters/ccid3_debug
echo 1 > /sys/module/dccp_tfrc_lib/parameters/tfrc_debug

The last is not very useful yet, since no code at the moment calls
the tfrc_debug() macro.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
84994e16f25dabe234be4fc2d323ec9db95b87cb 03-May-2008 Harvey Harrison <harvey.harrison@gmail.com> dccp: ccid2.c, ccid3.c use clamp(), clamp_t()

Makes the intention of the nested min/max clear.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
84a97b0af8c29aa5a47cc5271968a9c6004fb91e 14-Dec-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID]: More informative registration

The patch makes the registration messages of CCID 2/3 a bit more
informative: instead of repeating the CCID number as currently done,

"CCID: Registered CCID 2 (ccid2)" or
"CCID: Registered CCID 3 (ccid3)",

the descriptive names of the CCID's (from RFCs) are now used:

"CCID: Registered CCID 2 (TCP-like)" and
"CCID: Registered CCID 3 (TCP-Friendly Rate Control)".

To allow spaces in the name, the slab name string has been changed to
refer to the numeric CCID identifier, using the same format as before.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dcfbc7e97a2e3a0d73a2e41e1bddb988dcca701e 25-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Remove misleading comment

This removes a comment which identifies an `issue' with dccp_write_xmit() where there is none.
The comment assumes it is possible that a packet is sent between the calls to

ccid_hc_tx_send_packet(),
dccp_transmit_skb(),
ccid_hc_tx_packet_sent()

(in the above order) in dccp_write_xmit().

I think that this is impossible, since dccp_write_xmit() is always called under lock:

* when called as dccp_write_xmit(sk, 1) from dccp_send_close(), the socket is locked
(see code comment above dccp_send_close());
* when called as dccp_write_xmit(sk, 0) from dccp_send_msg(), it is after lock_sock() has been called;
* when called as dccp_write_xmit(sk, 0) from dccp_write_xmit_timer(), bh_lock_sock() has been called
and the if/else statement has made sure that sk_lock.owner is not set;
* there are no other places where dccp_write_xmit() is called.

Furthermore, the debug statement for printing the sequence number of the packet just sent has been
removed, since the entire list is being printed anyway and so the entry of that number appears last.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a302002516a094015e5d004b8d939a8a34559c82 25-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Remove redundant ack-counting variable

The code used two different variables to count Acks, one of them redundant.
This patch reduces the number of Ack counters to one.

The type of the Ack counter has also been changed to u32 (twice the range of int);
and the variable has been renamed into `packets_acked' - for consistency with
RFC 3465 (and similarly named variables are used by TCP and SCTP).

Lastly, a slightly less aggressive `maxincr' increment is used (for even Ack Ratios,
maxincr was Ack Ratio/2 + 1 instead of Ack Ratio/2).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
83399361c30f2ffae20ee348ba9ada9a856d499a 25-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Remove redundant synchronisation variable

This removes the synchronisation variable `ccid2hctx_sendwait', which is set to 1
when the CCID2 sender may send a new packet, and which is set to 0 otherwise

The variable is redundant, since it is only used in combination with the hc_tx_send_packet/
hc_tx_packet_sent function pair. Both functions are called under socket lock, so the
following happens when the CCID2 may send a new packet:

* it sets sendwait = 1 in tx_send_packet and returns 0;
* the subsequent call to tx_packet_sent clears the sendwait flag;
* since tx_send_packet returns 0 if and only if sendwait == 1, the BUG_ON condition
in tx_packet_sent is never satisfied, since that function is never called when
tx_send_packet returns a value different from 0 (cf. dccp_write_xmit);
* the call to tx_packet_sent clears the flag so that the condition "!sendwait" is
true the next time tx_packet_sent is called.

In other words, it is sufficient to just return 0 / not-0 to synchronise tx_send_packet
and tx_packet_sent -- which is what the patch does.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
da98e0b5d4c1f88b7c9e63e8918783cd4905be2b 25-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Redundant debugging output

This reduces the amount of redundant debugging messages:

* pipe/cwnd are printed in both tx_send_packet() and tx_packet_sent().
Both functions are called immediately after one another, so one occurrence is sufficient.

* Since tx_packet_sent() prints pipe/cwnd already, the second printk for pipe is redundant.

* In tx_packet_sent() the check_sanity function is called twice (at the begin and at the end).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
95b21d7e9d099f1cffca08e40f292d6658a88b3c 25-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Replace pipe assignment-function with assignment

The function ccid2_change_pipe only does an assignment. This patch simplifies the code by
replacing the function with the assignment it performs.

Furthermore, the type of pipe is promoted from `signed' to unsigned (increasing the range).
As a result, a BUG_ON test for negative values now becomes obsolete (for safety not removed,
but replaced with a less annoying `DCCP_BUG').

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3deeadd74bbf916b502d307222833ffcf68db557 25-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Replace cwnd assignment-function with assignment

The current function ccid2_change_cwnd in effect makes only an assignment, as
the test whether cwnd has reached 0 is only required when cwnd is halved.

This patch simplifies the code by replacing the function with the assignment
it performs.

Furthermore, since ssthresh derives from cwnd and appears in many assignments and
comparisons, the type of ssthresh has also been changed to match that of cwnd.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
63df18ad7fb91c65dafc89d3cf94a58a486ad416 25-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Replace read-only variable with constant

This replaces the field member `numdupack', which was used as a read-only
constant in the code, with a #define.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7792cd8885954eb7ac38e781a7a9faae5a80a3d8 25-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Remove unused variable

This removes a variable `ccid2hctx_sent' which is incremented but
never referenced/read (i.e., dead code).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
900bfed4718126e6c32244903b6f43e0990d04ad 25-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Disable broken Ack Ratio adaptation algorithm

This comments out a problematic section comprising a half-finished algorithm:

- The variable `ccid2hctx_ackloss' is never initialised to a value different from 0 and
hence in fact is a read-only constant.
- The `arsent' variable counts packets other than Acks (it is incremented for every packet),
and there is no test for Ack Loss.
- The concept of counting Acks as such leads to a complex calculation, and the calculation
at the moment is inconsistent with this concept.
The problem is that the number of Acks - rather than the number of windows - is counted,
which leads to a complex (cubic/quadratic) expression - this is not even implemented.

In its current state, the commented-out algorithm interfers with normal processing by
changing Ack Ratio incorrectly, and at the wrong times.

A new algorithm is necessary, which will not necessarily use the same variables as used by
the unfinished one; hence the old variables have been removed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b00d2bbc45a287c9a72374582ce42205f3412419 25-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Larger initial windows also for CCID2

RFC 4341, sec. 5 states that "The cwnd parameter is initialized to at most
four packets for new connections, following the rules from [RFC3390]", which
is implemented by this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d50ad163e6db2dcc365b8d02b30350220f86df04 25-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Deadlock and spurious timeouts when Ack Ratio > cwnd

This patch removes a bug in the current code. I agree with Andrea's comment
that there is a problem here but the way it is treated does not fix it.

The problem is that whenever Ack Ratio > cwnd, starvation/deadlock occurs:
* the receiver will not send an Ack until (Ack Ratio - cwnd) data packets
have arrived;
* the sender will not send any data packet before the receipt of an Ack
advances the send window.
The only way that the connection then progresses was via RTO timeout. In one
extreme case (bulk transfer), it was observed that this happened for every single
packet; i.e. hundreds of packets, each a RTO timeout of 1..3 seconds apart:
a transfer which normally would take a fraction of a second thus grew to
several minutes.

The solution taken by this approach is to observe the relation

"Ack Ratio <= cwnd"

by using the constraint (1) from RFC 4341, 6.1.2; i.e. set

Ack Ratio = ceil(cwnd / 2)

and update it whenever either Ack Ratio or cwnd change. This ensures that
the deadlock problem can not arise.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
df054e1d00fdafa2e2920319df326ddb3f0d0413 25-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Don't assign negative values to Ack Ratio

Since it makes not sense to assign negative values to Ack Ratio, this
patch disallows this possibility.

As a consequence, a Bug test for negative Ack Ratio values becomes obsolete.

Furthermore, a check against overflow (as Ack Ratio may not exceed 2 bytes,
due to RFC 4340, 11.3) has been added.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cfbbeabc8864902c4af1c0cadf0972b352930a26 24-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Fix sequence number arithmetic/comparisons

This replaces use of normal subtraction with modulo-48 subtraction.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3de5489f47febe0333b142e0eb6389b9924b2634 24-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Bug in reading Ack Vectors

In CCID2 the receiver-history is sorted in ascending order of sequence number,
but the processing of received Ack Vectors requires the list traversal in the
opposite direction.

The current code has a bug in this regard: the list traversal is upwards. As a
consequence, only Ack Vectors with a run length of 1 will pass, in all other
Ack Vectors the remaining (acked) sequence numbers are missed, and may later
falsely be identified as lost.

Note: This bug is only visible when Ack Ratio > 1, since otherwise the run
lengths of Ack Vectors are 0.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b24b8a247ff65c01b252025926fe564209fae4fc 24-Jan-2008 Pavel Emelyanov <xemul@openvz.org> [NET]: Convert init_timer into setup_timer

Many-many code in the kernel initialized the timer->function
and timer->data together with calling init_timer(timer). There
is already a helper for this. Use it for networking code.

The patch is HUGE, but makes the code 130 lines shorter
(98 insertions(+), 228 deletions(-)).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
24c667db59a9cc4caaafe4f77f6f4ef85899a454 24-Oct-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2/3]: Initialisation assignments of 0 are redundant

Assigning initial values of `0' is redundant when loading a new CCID structure,
since in net/dccp/ccid.c the entire CCID structure is zeroed out prior to
initialisation in ccid_new():

struct ccid {
struct ccid_operations *ccid_ops;
char ccid_priv[0];
};

// ...
if (rx) {
memset(ccid + 1, 0, ccid_ops->ccid_hc_rx_obj_size);
if (ccid->ccid_ops->ccid_hc_rx_init != NULL &&
ccid->ccid_ops->ccid_hc_rx_init(ccid, sk) != 0)
goto out_free_ccid;
} else {
memset(ccid + 1, 0, ccid_ops->ccid_hc_tx_obj_size);
/* analogous to the rx case */
}

This patch therefore removes the redundant assignments. Thanks to Arnaldo for
the inspiration.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
5e28599a6e45eb8ce7e50510b06c3a34ebf1a8fa 04-Oct-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Sequence number wraparound issues

This replaces several uses of standard arithmetic with the DCCP
sequence number arithmetic functions. The problem here is that the
sequence number wrap-around was not taken into consideration.

* Condition "seqp->ccid2s_seq <= prev->ccid2s_seq" has been replaced
by

dccp_delta_seqno(seqp->ccid2s_seq, prev->ccid2s_seq) >= 0

since if seqp is `before' prev, then the delta_seqno() is positive.

* The test whether sequence numbers `a' and `b' are consecutive has
the form

dccp_delta_seqno(a, b) == 1

* Increment of ccid2hctx_rpseq could be done using dccp_inc_seqno(),
but since here the incremented ccid2hctx_rpseq == seqno, used
assignment instead.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6c583248083c30c5305ec561e79f666ca465b376 04-Oct-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Remove redundant case block

skb's passed to ccid2_hc_tx_send_packet() are headerless, the packet
type is decided later, in dccp_write_xmit(). Therefore the first test
of the switch/case block is always true, the others are never reached.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ee196c2186d24d82088c94962598470e5abc081f 04-Oct-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Remove redundant BUG_ON

This removes a test for `val < 1' which would only have been triggered
when val < 0, due to a preceding test for 0. Fixed by using an
unsigned type for cwnd (as in TCP) instead.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7d9e8931f93683e575679e41f188d3b465269f08 04-Oct-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Remove ugly BUG_ON

This removes an ugly BUG_ON which has been pointed out by Arnaldo.

Instead of freezing up the machine, a `critical' message is now issued
to the system log.

There is potential of doing this more gracefully (eg. there are a few
internal variables which could be updated despite the lack of memory),
but that requires more complicated changes to the algorithm; thus a
`FIXME' has been added.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cd1f7d347c9e51f348119811bd41b74346ec57b8 04-Oct-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [CCID2]: Simplify interface

This patch simplifies the interface of ccid2_hc_tx_alloc_seq():

* ccid2_hc_tx_alloc_seq() is always called with an argument of
CCID2_SEQBUF_LEN;

* other code - ccid2_hc_tx_check_sanity() - even depends on the
assumption that ccid2_hc_tx_alloc_seq() has been called with this
particular size;

* passing the `gfp_t' argument to ccid2_hc_tx_alloc_seq() is
redundant with gfp_any().

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
042d18f9f39a51716683b4e156fbee689314bb22 04-Oct-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [DCCP]: Make all `debug' parameters bool

This just sets the parameter to bool, since debugging messages are
either on or off.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
39dad26c37fdb1382e4173172a2704fa278f7fd6 20-Aug-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk> [DCCP]: Allocation in atomic context

This fixes the following bug reported in syslog:

[ 4039.051658] BUG: sleeping function called from invalid context at /usr/src/davem-2.6/mm/slab.c:3032
[ 4039.051668] in_atomic():1, irqs_disabled():0
[ 4039.051670] INFO: lockdep is turned off.
[ 4039.051674] [<c0104c0f>] show_trace_log_lvl+0x1a/0x30
[ 4039.051687] [<c0104d4d>] show_trace+0x12/0x14
[ 4039.051691] [<c0104d65>] dump_stack+0x16/0x18
[ 4039.051695] [<c011371e>] __might_sleep+0xaf/0xbe
[ 4039.051700] [<c0157b66>] __kmalloc+0xb1/0xd0
[ 4039.051706] [<f090416f>] ccid2_hc_tx_alloc_seq+0x35/0xc3 [dccp_ccid2]
[ 4039.051717] [<f09048d6>] ccid2_hc_tx_packet_sent+0x27f/0x2d9 [dccp_ccid2]
[ 4039.051723] [<f085486b>] dccp_write_xmit+0x1eb/0x338 [dccp]
[ 4039.051741] [<f085603d>] dccp_sendmsg+0x113/0x18f [dccp]
[ 4039.051750] [<c03907fc>] inet_sendmsg+0x2e/0x4c
[ 4039.051758] [<c033a47d>] sock_aio_write+0xd5/0x107
[ 4039.051766] [<c015abc1>] do_sync_write+0xcd/0x11c
[ 4039.051772] [<c015b296>] vfs_write+0x118/0x11f
[ 4039.051840] [<c015b932>] sys_write+0x3d/0x64
[ 4039.051845] [<c0103e7c>] syscall_call+0x7/0xb
[ 4039.051848] =======================

The problem was that GFP_KERNEL was used; fixed by using gfp_any().

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
c9eaf17341834de00351bf79f16b2d879c8aea96 09-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET] DCCP: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8109b02b5397ed52a32c116163a62a34f4768b26 10-Dec-2006 Arnaldo Carvalho de Melo <acme@mandriva.com> [DCCP]: Whitespace cleanups

That accumulated over the last months hackaton, shame on me for not
using git-apply whitespace helping hand, will do that from now on.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
6b57c93dc3aa0115b589cb89ef862d46ab9bd95e 28-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk> [DCCP]: Use `unsigned' for packet lengths

This patch implements a suggestion by Ian McDonald and

1) Avoids tests against negative packet lengths by using unsigned int
for packet payload lengths in the CCID send_packet()/packet_sent() routines

2) As a consequence, it removes an now unnecessary test with regard to `len > 0'
in ccid3_hc_tx_packet_sent: that condition is always true, since
* negative packet lengths are avoided
* ccid3_hc_tx_send_packet flags an error whenever the payload length is 0.
As a consequence, ccid3_hc_tx_packet_sent is never called as all errors
returned by ccid_hc_tx_send_packet are caught in dccp_write_xmit

3) Removes the third argument of ccid_hc_tx_send_packet (the `len' parameter),
since it is currently always set to skb->len. The code is updated with regard
to this parameter change.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
59348b19efebfd6a8d0791ff81d207b16594c94b 20-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk> [DCCP]: Simplified conditions due to use of enum:8 states

This reaps the benefit of the earlier patch, which changed the type of
CCID 3 states to use enums, in that many conditions are now simplified
and the number of possible (unexpected) values is greatly reduced.

In a few instances, this also allowed to simplify pre-conditions; where
care has been taken to retain logical equivalence.

[DCCP]: Introduce a consistent BUG/WARN message scheme

This refines the existing set of DCCP messages so that
* BUG(), BUG_ON(), WARN_ON() have meaningful DCCP-specific counterparts
* DCCP_CRIT (for severe warnings) is not rate-limited
* DCCP_WARN() is introduced as rate-limited wrapper

Using these allows a faster and cleaner transition to their original
counterparts once the code has matured into a full DCCP implementation.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
84116716cc9404356f775443b460f76766f08f65 20-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk> [DCCP]: enable debug messages also for static builds

This patch
* makes debugging (when configured) work both for static / module build
* provides generic debugging macros for use in other DCCP / CCID modules
* adds missing information about debug parameters to Kconfig
* performs some code tidy-up

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
32aac18dfa0963fde40cc074ba97ebbae8b755f2 16-Nov-2006 Andrea Bittau <a.bittau@cs.ucl.ac.uk> [DCCP] CCID2: Code optimizations

These are code optimizations which are relevant when dealing with large
windows. They are not coded the way I would like to, but they do the job for
the short-term. This patch should be more neat.

Commiter note: Changed the seqno comparisions to use {after,before}48 to handle
wrapping.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
234af4840135342ab295b4e1219fd35c27fdd439 30-Oct-2006 Randy Dunlap <randy.dunlap@oracle.com> [DCCP]: fix printk format warnings

Fix printk format warnings:
build2.out:net/dccp/ccids/ccid2.c:355: warning: long long unsigned int format, u64 arg (arg 3)
build2.out:net/dccp/ccids/ccid2.c:360: warning: long long unsigned int format, u64 arg (arg 3)
build2.out:net/dccp/ccids/ccid2.c:482: warning: long long unsigned int format, u64 arg (arg 5)
build2.out:net/dccp/ccids/ccid2.c:639: warning: long long unsigned int format, u64 arg (arg 3)
build2.out:net/dccp/ccids/ccid2.c:639: warning: long long unsigned int format, u64 arg (arg 4)
build2.out:net/dccp/ccids/ccid2.c:674: warning: long long unsigned int format, u64 arg (arg 3)
build2.out:net/dccp/ccids/ccid2.c:720: warning: long long unsigned int format, u64 arg (arg 3)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0e64e94e477f8ed04e9295b11a5898d443c28a47 25-Oct-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk> [DCCP]: Update documentation references.

Updates the references to spec documents throughout the code, taking into
account that

* the DCCP, CCID 2, and CCID 3 drafts all became RFCs in March this year

* RFC 1063 was obsoleted by RFC 1191

* draft-ietf-tcpimpl-pmtud-0x.txt was published as an Informational
RFC, RFC 2923 on 2000-09-22.

All references verified.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3dd9a7c3a155ee96160876cba92439fdc96d7e0b 22-Sep-2006 Ian McDonald <ian.mcdonald@jandi.co.nz> [DCCP]: Use constants for CCIDs

With constants for CCID numbers this now uses them in some places.

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
593f16aa627d61da447c76ee5a159450174627f6 19-Sep-2006 Andrea Bittau <a.bittau@cs.ucl.ac.uk> [DCCP] CCID2: Add helper functions for changing important CCID2 state

Introduce methods which manipulate interesting congestion control
state such as pipe and rtt estimate. This is useful for people
wishing to monitor the variables of CCID and instrument the code
[perhaps using Kprobes]. Personally, I am a fan of
encapsulation---that justifies this change =D.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
374bcf32c86e1b56eab832bbb6b21e636707eab6 19-Sep-2006 Andrea Bittau <a.bittau@cs.ucl.ac.uk> [DCCP] CCID2: Halve cwnd once upon multiple losses in a single RTT

When multiple losses occur in one RTT, the window should be halved
only once [a single "congestion event"]. This is now implemented,
although not perfectly. Slightly changed the interface for changing
the cwnd: pass hctx instead of dp. This is required in order to allow
for change_cwnd to be called from _init().

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
07978aabd52ce67f59971872c80f76d6e3ca18ae 19-Sep-2006 Andrea Bittau <a.bittau@cs.ucl.ac.uk> [DCCP] CCID2: Allocate seq records on demand

Allocate more sequence state on demand. Each time a packet is sent
out by CCID2, a record of it needs to be kept. This list of records
grows proportionally to cwnd. Previously, the length of this list was
hardcored and therefore the cwnd could only grow to this value (of
128). Now, records are allocated on demand as necessary---cwnd may
grow as it wishes. The exceptional case of when memory is not
available is not handled gracefully. Perhaps, cwnd should be capped
at that point.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8d424f6ca2d02026dadff409770639d720375afb 19-Sep-2006 Andrea Bittau <a.bittau@cs.ucl.ac.uk> [DCCP] CCID2: Add Kconfig option for CCID2 debug

Allow the user to choose whether or not to enable CCID2 debugging via
Kconfig.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
446dec30c7f305ed1bb0092b0a8d9367d842a33f 19-Sep-2006 Andrea Bittau <a.bittau@cs.ucl.ac.uk> [DCCP] CCID2: Tell DCCP to quickly check whether cwnd is available

If not enough cwnd is available, tell the sender to check again as
soon as possible. This will increase CPU utilization (polling
frequently for cwnd) but will improve network performance. That is,
the sender will need to wait less before detecting the increase of
cwnd. A better architecture would be for the CCID to call-back (or
dequeue) from DCCP when it is able to transmit traffic -- not the
other way around as it currently occurs.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d458c25ce24ce00ea547e9976e293e7835416253 19-Sep-2006 Andrea Bittau <a.bittau@cs.ucl.ac.uk> [DCCP] CCID2: Initialize ssthresh to infinity

Initialize the slow-start threshold to infinity. This way, upon connection
initiation, slow-start will be exited only upon a packet loss. This patch will
allow connections to quickly gain speed.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
29651cda97b0a9e4ac0fbeb5ea731a9909f0f128 19-Sep-2006 Andrea Bittau <a.bittau@cs.ucl.ac.uk> [DCCP] CCID2: Fix jiffie wrap issues

Jiffies are now handled correctly (I hope) in CCID2. If they wrap, no
problem.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8e27e4650cb7e73aa4dd97d860539e7605ac7e39 19-Sep-2006 Andrea Bittau <a.bittau@cs.ucl.ac.uk> [DCCP] ackvec: Fix how DCCP_ACKVEC_STATE_NOT_RECEIVED is used

Fix the way state is masked out. DCCP_ACKVEC_STATE_NOT_RECEIVED is
defined as appears in the packet, therefore bit shifting is not
required. This fix allows CCID2 to correctly detect losses.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6ab3d5624e172c553004ecc862bfeac16d9d68b7 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de> Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
c0c736db7ef4a7bdc1a28f3de751cc7e9f720313 21-Mar-2006 Arnaldo Carvalho de Melo <acme@mandriva.com> [DCCP] ccid2: coding style cleanups

No changes in the logic where made.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
91f0ebf7b6d5cb2b6e818d48587566144821babe 21-Mar-2006 Arnaldo Carvalho de Melo <acme@mandriva.com> [DCCP] CCID: Improve CCID infrastructure

1. No need for ->ccid_init nor ->ccid_exit, this is what module_{init,exit}
does and anynways neither ccid2 nor ccid3 were using it.

2. Rename struct ccid to struct ccid_operations and introduce struct ccid
with a pointer to ccid_operations and rigth after it the rx or tx
private state.

3. Remove the pointer to the state of the half connections from struct
dccp_sock, now its derived thru ccid_priv() from the ccid pointer.

Now we also can implement the setsockopt for changing the CCID easily as
no ccid init routines can affect struct dccp_sock in any way that prevents
other CCIDs from working if a CCID switch operation is asked by apps.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
77ff72d528d5b9d30a47f42f364ba34d931f9da3 21-Mar-2006 Andrea Bittau <a.bittau@cs.ucl.ac.uk> [DCCP] CCID2: Drop sock reference count on timer expiration and reset.

There was a hybrid use of standard timers and sk_timers. This caused
the reference count of the sock to be incorrect when resetting the RTO
timer. The sock reference count should now be correct, enabling its
destruction, and allowing the DCCP module to be unloaded.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2a91aa3967398fb94eccc8da67c82bce9f67afdf 21-Mar-2006 Andrea Bittau <a.bittau@cs.ucl.ac.uk> [DCCP] CCID2: Initial CCID2 (TCP-Like) implementation

Original work by Andrea Bittau, Arnaldo Melo cleaned up and fixed several
issues on the merge process.

For now CCID2 was turned the default for all SOCK_DCCP connections, but this
will be remedied soon with the merge of the feature negotiation code.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>