History log of /net/netfilter/ipvs/ip_vs_proto_tcp.c
Revision Date Author Comments
f18ae7206eaebfecc2dd8b017b0d6a0950eabf8b 10-Sep-2014 Julian Anastasov <ja@ssi.bg> ipvs: use the new dest addr family field

Use the new address family field cp->daf when printing
cp->daddr in logs or connection listing.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
c6c96c188336b2b95d5f14facd101f1e4165a9d3 13-Jun-2013 Alexander Frolkin <avf@eldamar.org.uk> ipvs: sloppy TCP and SCTP

This adds support for sloppy TCP and SCTP modes to IPVS.

When enabled (sysctls net.ipv4.vs.sloppy_tcp and
net.ipv4.vs.sloppy_sctp), allows IPVS to create connection state on any
packet, not just a TCP SYN (or SCTP INIT).

This allows connections to fail over from one IPVS director to another
mid-flight.

Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
ac69269a45e84c1772dcb9e77db976a932f4af22 22-Mar-2013 Julian Anastasov <ja@ssi.bg> ipvs: do not disable bh for long time

We used a global BH disable in LOCAL_OUT hook.
Add _bh suffix to all places that need it and remove
the disabling from LOCAL_OUT and sync code.

Functions like ip_defrag need protection from
BH, so add it. As for nf_nat_mangle_tcp_packet, it needs
RCU lock.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
ceec4c3816818459d90c92152e61371ff5b1d5a1 22-Mar-2013 Julian Anastasov <ja@ssi.bg> ipvs: convert services to rcu

This is the final step in RCU conversion.

Things that are removed:

- svc->usecnt: now svc is accessed under RCU read lock
- svc->inc: and some unused code
- ip_vs_bind_pe and ip_vs_unbind_pe: no ability to replace PE
- __ip_vs_svc_lock: replaced with RCU
- IP_VS_WAIT_WHILE: now readers lookup svcs and dests under
RCU and work in parallel with configuration

Other changes:

- before now, a RCU read-side critical section included the
calling of the schedule method, now it is extended to include
service lookup
- ip_vs_svc_table and ip_vs_svc_fwm_table are now using hlist
- svc->pe and svc->scheduler remain to the end (of grace period),
the schedulers are prepared for such RCU readers
even after done_service is called but they need
to use synchronize_rcu because last ip_vs_scheduler_put
can happen while RCU read-side critical sections
use an outdated svc->scheduler pointer
- as planned, update_service is removed
- empty services can be freed immediately after grace period.
If dests were present, the services are freed from
the dest trash code

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
363c97d7435ebba8a040f86e29bdec79ee182f0c 21-Mar-2013 Julian Anastasov <ja@ssi.bg> ipvs: convert app locks

We use locks like tcp_app_lock, udp_app_lock,
sctp_app_lock to protect access to the protocol hash tables
from readers in packet context while the application
instances (inc) are [un]registered under global mutex.

As the hash tables are mostly read when conns are
created and bound to app, use RCU for readers and reclaim
app instance after grace period.

Simplify ip_vs_app_inc_get because we use usecnt
only for statistics and rely on module refcounting.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
d4383f04d145cce8b855c463f40020639ef83ea0 26-Sep-2012 Jesper Dangaard Brouer <brouer@redhat.com> ipvs: API change to avoid rescan of IPv6 exthdr

Reduce the number of times we scan/skip the IPv6 exthdrs.

This patch contains a lot of API changes. This is done, to avoid
repeating the scan of finding the IPv6 headers, via ipv6_find_hdr(),
which is called by ip_vs_fill_iph_skb().

Finding the IPv6 headers is done as early as possible, and passed on
as a pointer "struct ip_vs_iphdr *" to the affected functions.

This patch reduce/removes 19 calls to ip_vs_fill_iph_skb().

Notice, I have choosen, not to change the API of function
pointer "(*schedule)" (in struct ip_vs_scheduler) as it can be
used by external schedulers, via {un,}register_ip_vs_scheduler.
Only 4 out of 10 schedulers use info from ip_vs_iphdr*, and when
they do, they are only interested in iph->{s,d}addr.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
63dca2c0b0e7a92cb39d1b1ecefa32ffda201975 26-Sep-2012 Jesper Dangaard Brouer <brouer@redhat.com> ipvs: Fix faulty IPv6 extension header handling in IPVS

IPv6 packets can contain extension headers, thus its wrong to assume
that the transport/upper-layer header, starts right after (struct
ipv6hdr) the IPv6 header. IPVS uses this false assumption, and will
write SNAT & DNAT modifications at a fixed pos which will corrupt the
message.

To fix this, proper header position must be found before modifying
packets. Introducing ip_vs_fill_iph_skb(), which uses ipv6_find_hdr()
to skip the exthdrs. It finds (1) the transport header offset, (2) the
protocol, and (3) detects if the packet is a fragment.

Note, that fragments in IPv6 is represented via an exthdr. Thus, this
is detected while skipping through the exthdrs.

This patch depends on commit 84018f55a:
"netfilter: ip6_tables: add flags parameter to ipv6_find_hdr()"
This also adds a dependency to ip6_tables.

Originally based on patch from: Hans Schillstrom

kABI notes:
Changing struct ip_vs_iphdr is a potential minor kABI breaker,
because external modules can be compiled with another version of
this struct. This should not matter, as they would most-likely
be using a compiled-in version of ip_vs_fill_iphdr(). When
recompiled, they will notice ip_vs_fill_iphdr() no longer exists,
and they have to used ip_vs_fill_iph_skb() instead.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
582b8e3eadaec77788c1aa188081a8d5059c42a6 26-Apr-2012 Hans Schillstrom <hans.schillstrom@ericsson.com> ipvs: take care of return value from protocol init_netns

ip_vs_create_timeout_table() can return NULL
All functions protocol init_netns is affected of this patch.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
4a516f1108070db94dbfc88c80b8b6942915f1f2 16-Sep-2011 Simon Horman <horms@verge.net.au> ipvs: Remove unused return value of protocol state transitions

Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
a0840e2e165a370ca24a59545e564e9881a55891 03-Jan-2011 Hans Schillstrom <hans.schillstrom@ericsson.com> IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.

Moving global vars to ipvs struct, except for svc table lock.
Next patch for ctl will be drop-rate handling.

*v3
__ip_vs_mutex remains global
ip_vs_conntrack_enabled(struct netns_ipvs *ipvs)

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
6e67e586e7289c144d5a189d6e0fa7141d025746 03-Jan-2011 Hans Schillstrom <hans.schillstrom@ericsson.com> IPVS: netns, connection hash got net as param.

Connection hash table is now name space aware.
i.e. net ptr >> 8 is xor:ed to the hash,
and this is the first param to be compared.
The net struct is 0xa40 in size ( a little bit smaller for 32 bit arch:s)
and cache-line aligned, so a ptr >> 5 might be a more clever solution ?

All lookups where net is compared uses net_eq() which returns 1 when netns
is disabled, and the compiler seems to do something clever in that case.

ip_vs_conn_fill_param() have *net as first param now.

Three new inlines added to keep conn struct smaller
when names space is disabled.
- ip_vs_conn_net()
- ip_vs_conn_net_set()
- ip_vs_conn_net_eq()

*v3
moved net compare to the end in "fast path"

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
ab8a5e8408c3df2d654611bffc3aaf04f418b266 03-Jan-2011 Hans Schillstrom <hans.schillstrom@ericsson.com> IPVS: netns awareness to ip_vs_app

All variables moved to struct ipvs,
most external changes fixed (i.e. init_net removed)

in ip_vs_protocol param struct net *net added to:
- register_app()
- unregister_app()
This affected almost all proto_xxx.c files

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
9bbac6a904d0816dae58b454692c54d6773cc20d 03-Jan-2011 Hans Schillstrom <hans.schillstrom@ericsson.com> IPVS: netns, common protocol changes and use of appcnt.

appcnt and timeout_table moved from struct ip_vs_protocol to
ip_vs proto_data.

struct net *net added as first param to
- register_app()
- unregister_app()
- app_conn_bind()
- ip_vs_conn_new()

[horms@verge.net.au: removed cosmetic-change-only hunk]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
9330419d9aa4f97df412ac9be9fc0388c67dd315 03-Jan-2011 Hans Schillstrom <hans.schillstrom@ericsson.com> IPVS: netns, use ip_vs_proto_data as param.

ip_vs_protocol *pp is replaced by ip_vs_proto_data *pd in
function call in ip_vs_protocol struct i.e. :,
- timeout_change()
- state_transition()

ip_vs_protocol_timeout_change() got ipvs as param, due to above
and a upcoming patch - defence work

Most of this changes are triggered by Julians comment:
"tcp_timeout_change should work with the new struct ip_vs_proto_data
so that tcp_state_table will go to pd->state_table
and set_tcp_state will get pd instead of pp"

*v3
Mostly comments from Julian
The pp -> pd conversion should start from functions like
ip_vs_out() that use pp = ip_vs_proto_get(iph.protocol),
now they should use ip_vs_proto_data_get(net, iph.protocol).
conn_in_get() and conn_out_get() unused param *pp, removed.

*v4
ip_vs_protocol_timeout_change() walk the proto_data path.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
4a85b96c08ef84076f84e87280223a4301988ed9 03-Jan-2011 Hans Schillstrom <hans.schillstrom@ericsson.com> IPVS: netns preparation for proto_tcp

In this phase (one), all local vars will be moved to ipvs struct.

Remaining work, add param struct net *net to a couple of
functions that is common for all protos and use all
ip_vs_proto_data

*v3
Removed unused function as sugested by Simon

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
fc723250c9cb046cc19833a2b1c4309bbf59ac36 03-Jan-2011 Hans Schillstrom <hans.schillstrom@ericsson.com> IPVS: netns to services part 1

Services hash tables got netns ptr a hash arg,
While Real Servers (rs) has been moved to ipvs struct.
Two new inline functions added to get net ptr from skb.

Since ip_vs is called from different contexts there is two
places to dig for the net ptr skb->dev or skb->sk
this is handled in skb_net() and skb_sknet()

Global functions, ip_vs_service_get() ip_vs_lookup_real_service()
etc have got struct net *net as first param.
If possible get net ptr skb etc,
- if not &init_net is used at this early stage of patching.

ip_vs_ctl.c procfs not ready for netns yet.

*v3
Comments by Julian
- __ip_vs_service_find and __ip_vs_svc_fwm_find are fast path,
net_eq(svc->net, net) so the check is at the end now.
- net = skb_net(skb) in ip_vs_out moved after check for skb_dst.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
a5959d53d6048a56103ee0ade1eb6f2c0c733b1d 19-Nov-2010 Hans Schillstrom <hans.schillstrom@ericsson.com> IPVS: Handle Scheduling errors.

If ip_vs_conn_fill_param_persist return an error to ip_vs_sched_persist,
this error must propagate as ignored=-1 to ip_vs_schedule().
Errors from ip_vs_conn_new() in ip_vs_sched_persist() and ip_vs_schedule()
should also return *ignored=-1;

This patch just relies on the fact that ignored is 1 before calling
ip_vs_sched_persist().

Sent from Julian:
"The new case when ip_vs_conn_fill_param_persist fails
should set *ignored = -1, so that we can use NF_DROP,
see below. *ignored = -1 should be also used for ip_vs_conn_new
failure in ip_vs_sched_persist() and ip_vs_schedule().
The new negative value should be handled in tcp,udp,sctp"

"To summarize:

- *ignored = 1:
protocol tried to schedule (eg. on SYN), found svc but the
svc/scheduler decides that this packet should be accepted with
NF_ACCEPT because it must not be scheduled.

- *ignored = 0:
scheduler can not find destination, so try bypass or
return ICMP and then NF_DROP (ip_vs_leave).

- *ignored = -1:
scheduler tried to schedule but fatal error occurred, eg.
ip_vs_conn_new failure (ENOMEM) or ip_vs_sip_fill_param
failure such as missing Call-ID, ENOMEM on skb_linearize
or pe_data. In this case we should return NF_DROP without
any attempts to send ICMP with ip_vs_leave."

More or less all ideas and input to this patch is work from
Julian Anastasov

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
0d79641a96d612aaa6d57a4d4f521d7ed9c9ccdd 17-Oct-2010 Julian Anastasov <ja@ssi.bg> ipvs: provide address family for debugging

As skb->protocol is not valid in LOCAL_OUT add
parameter for address family in packet debugging functions.
Even if ports are not present in AH and ESP change them to
use ip_vs_tcpudp_debug_packet to show at least valid addresses
as before. This patch removes the last user of skb->protocol
in IPVS.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
190ecd27cd7294105e3b26ca71663c7d940acbbb 17-Oct-2010 Julian Anastasov <ja@ssi.bg> ipvs: do not schedule conns from real servers

This patch is needed to avoid scheduling of
packets from local real server when we add ip_vs_in
in LOCAL_OUT hook to support local client.

Currently, when ip_vs_in can not find existing
connection it tries to create new one by calling ip_vs_schedule.

The default indication from ip_vs_schedule was if
connection was scheduled to real server. If real server is
not available we try to use the bypass forwarding method
or to send ICMP error. But in some cases we do not want to use
the bypass feature. So, add flag 'ignored' to indicate if
the scheduler ignores this packet.

Make sure we do not create new connections from replies.
We can hit this problem for persistent services and local real
server when ip_vs_in is added to LOCAL_OUT hook to handle
local clients.

Also, make sure ip_vs_schedule ignores SYN packets
for Active FTP DATA from local real server. The FTP DATA
connection should be created on SYN+ACK from client to assign
correct connection daddr.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
8b27b10f5863a5b63e46304a71aa01463d1efac4 17-Oct-2010 Julian Anastasov <ja@ssi.bg> ipvs: optimize checksums for apps

Avoid full checksum calculation for apps that can provide
info whether csum was broken after payload mangling. For now only
ip_vs_ftp mangles payload and it updates the csum, so the full
recalculation is avoided for all packets.

Add CHECKSUM_UNNECESSARY for snat_handler (TCP and UDP).
It is needed to support SNAT from local address for the case
when csum is fully recalculated.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
5bc9068e9d962ca6b8bec3f0eb6f60ab4dee1d04 17-Oct-2010 Julian Anastasov <ja@ssi.bg> ipvs: fix CHECKSUM_PARTIAL for TCP, UDP

Fix CHECKSUM_PARTIAL handling. Tested for IPv4 TCP,
UDP not tested because it needs network card with HW CSUM support.
May be fixes problem where IPVS can not be used in virtual boxes.
Problem appears with DNAT to local address when the local stack
sends reply in CHECKSUM_PARTIAL mode.

Fix tcp_dnat_handler and udp_dnat_handler to provide
vaddr and daddr in right order (old and new IP) when calling
tcp_partial_csum_update/udp_partial_csum_update (CHECKSUM_PARTIAL).

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
5c0d2374a16fcb52096df914ee57720987677be5 02-Aug-2010 Simon Horman <horms@verge.net.au> ipvs: provide default ip_vs_conn_{in,out}_get_proto

This removes duplicate code by providing a default implementation
which is used by 3 of the 4 modules that provide these call.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
36cbd3dcc10384f813ec0814255f576c84f2bcd4 05-Aug-2009 Jan Engelhardt <jengelh@medozas.de> net: mark read-only arrays as const

String literals are constant, and usually, we can also tag the array
of pointers const too, moving it to the .rodata section.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
1e3e238e9c4bf9987b19185235cd0cdc21ea038c 02-Aug-2009 Hannes Eder <heder@google.com> IPVS: use pr_err and friends instead of IP_VS_ERR and friends

Since pr_err and friends are used instead of printk there is no point
in keeping IP_VS_ERR and friends. Furthermore make use of '__func__'
instead of hard coded function names.

Signed-off-by: Hannes Eder <heder@google.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
9aada7ac047f789ffb27540cc1695989897b2dfe 30-Jul-2009 Hannes Eder <heder@google.com> IPVS: use pr_fmt

While being at it cleanup whitespace.

Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ca62059b7ec7267d1d2cab0791d3ed6d033e0583 07-Nov-2008 Harvey Harrison <harvey.harrison@gmail.com> ipvs: oldlen, newlen should be be16, not be32

Noticed by sparse:
net/netfilter/ipvs/ip_vs_proto_tcp.c:195:6: warning: incorrect type in argument 5 (different base types)
net/netfilter/ipvs/ip_vs_proto_tcp.c:195:6: expected restricted __be16 [usertype] oldlen
net/netfilter/ipvs/ip_vs_proto_tcp.c:195:6: got restricted __be32 [usertype] <noident>
net/netfilter/ipvs/ip_vs_proto_tcp.c:196:6: warning: incorrect type in argument 6 (different base types)
net/netfilter/ipvs/ip_vs_proto_tcp.c:196:6: expected restricted __be16 [usertype] newlen
net/netfilter/ipvs/ip_vs_proto_tcp.c:196:6: got restricted __be32 [usertype] <noident>
net/netfilter/ipvs/ip_vs_proto_tcp.c:270:6: warning: incorrect type in argument 5 (different base types)
net/netfilter/ipvs/ip_vs_proto_tcp.c:270:6: expected restricted __be16 [usertype] oldlen
net/netfilter/ipvs/ip_vs_proto_tcp.c:270:6: got restricted __be32 [usertype] <noident>
net/netfilter/ipvs/ip_vs_proto_tcp.c:271:6: warning: incorrect type in argument 6 (different base types)
net/netfilter/ipvs/ip_vs_proto_tcp.c:271:6: expected restricted __be16 [usertype] newlen
net/netfilter/ipvs/ip_vs_proto_tcp.c:271:6: got restricted __be32 [usertype] <noident>
net/netfilter/ipvs/ip_vs_proto_udp.c:206:6: warning: incorrect type in argument 5 (different base types)
net/netfilter/ipvs/ip_vs_proto_udp.c:206:6: expected restricted __be16 [usertype] oldlen
net/netfilter/ipvs/ip_vs_proto_udp.c:206:6: got restricted __be32 [usertype] <noident>
net/netfilter/ipvs/ip_vs_proto_udp.c:207:6: warning: incorrect type in argument 6 (different base types)
net/netfilter/ipvs/ip_vs_proto_udp.c:207:6: expected restricted __be16 [usertype] newlen
net/netfilter/ipvs/ip_vs_proto_udp.c:207:6: got restricted __be32 [usertype] <noident>
net/netfilter/ipvs/ip_vs_proto_udp.c:282:6: warning: incorrect type in argument 5 (different base types)
net/netfilter/ipvs/ip_vs_proto_udp.c:282:6: expected restricted __be16 [usertype] oldlen
net/netfilter/ipvs/ip_vs_proto_udp.c:282:6: got restricted __be32 [usertype] <noident>
net/netfilter/ipvs/ip_vs_proto_udp.c:283:6: warning: incorrect type in argument 6 (different base types)
net/netfilter/ipvs/ip_vs_proto_udp.c:283:6: expected restricted __be16 [usertype] newlen
net/netfilter/ipvs/ip_vs_proto_udp.c:283:6: got restricted __be32 [usertype] <noident>

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
cb7f6a7b716e801097b564dec3ccb58d330aef56 19-Sep-2008 Julius Volz <juliusv@google.com> IPVS: Move IPVS to net/netfilter/ipvs

Since IPVS now has partial IPv6 support, this patch moves IPVS from
net/ipv4/ipvs to net/netfilter/ipvs. It's a result of:

$ git mv net/ipv4/ipvs net/netfilter

and adapting the relevant Kconfigs/Makefiles to the new path.

Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>