History log of /libcore/luni/src/test/java/libcore/java/net/ServerSocketConcurrentCloseTest.java
Revision Date Author Comments
9315d4ae2857cc7f1292d906d9352d8df448970f 05-Sep-2017 Tobias Thierer <tobiast@google.com> ServerSocketConcurrentCloseTest: license header

This file was originally added by me in commit
bcb10462e854ba7172bab840329b6d3126a2fdd4 (May 2016).

Test: Treehugger
Change-Id: I2dee98752d91e6f6df466eac97fcaf57609c9590
f1866bd83f2c05e3e0bd52ed40296edf2599f28e 25-Aug-2017 Tobias Thierer <tobiast@google.com> Attempt to de-flake ServerSocketConcurrentCloseTest

Relax non-critical assertion that sometimes fails on
old devices and for which investigation has not revealed
any serious underlying cause.

The test runs 100 iterations; during each iteration,
separate server and client threads repeatedly connect to
each other until they are shut down after 50msec.

On some systems, the test runs out of sockets/connections
since they're still stuck in TIME_WAIT state, leading
to later iterations failing to achieve any connections.
The test had an assertion that connections were achieved
in *each* iteration, and that assertion would fail on such
systems.

Since the assertion was only ever intended to guard against a
bug in the test, rather than a bug in the code under test,
this CL relaxes that assertion to only assert that *some*
(currently >= 5) out of the 100 iterations achieve any
connections.
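
A minimal sketch of the relaxed check, assuming hypothetical
names (runIteration()) and JUnit's assertTrue; the test's actual
fields and structure may differ:

    // Sketch only: runIteration() is a hypothetical stand-in for one
    // 50msec client/server iteration; it returns the number of
    // connections that iteration achieved.
    int iterationsWithConnections = 0;
    for (int i = 0; i < 100; i++) {
        if (runIteration() > 0) {
            iterationsWithConnections++;
        }
        // Before this CL, the test asserted > 0 connections for every
        // single iteration at this point.
    }
    // After this CL: only require that some (currently >= 5) of the
    // 100 iterations achieved any connections at all.
    assertTrue("Too few iterations made connections: "
            + iterationsWithConnections, iterationsWithConnections >= 5);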

The main purpose of the test, ensuring that the server
thread does not block indefinitely when the ServerSocket
is closed, is unaffected.

Bug: 65024353
Test: ran ServerSocketConcurrentCloseTest once (on Pixel XL).

(cherry picked from commit 1cbfa9d8393e3ce68706a3d221462afe69db7f97)

Change-Id: I8c11954bc6374850fcb2962a352fa7683de48308
1cbfa9d8393e3ce68706a3d221462afe69db7f97 25-Aug-2017 Tobias Thierer <tobiast@google.com> Attempt to de-flake ServerSocketConcurrentCloseTest

Relax non-critical assertion that sometimes fails on
old devices and for which investigation has not revealed
any serious underlying cause.

The test runs 100 iterations; during each iteration,
separate server and client threads repeatedly connect to
each other until they are shut down after 50msec.

On some systems, the test runs out of sockets/connections
since they're still stuck in TIME_WAIT state, leading
to later iterations failing to achieve any connections.
The test had an assertion that connections were achieved
in *each* iteration, and that assertion would fail on such
systems.

Since the assertion was only ever intended to guard against a
bug in the test, rather than a bug in the code under test,
this CL relaxes that assertion to only assert that *some*
(currently >= 5) out of the 100 iterations achieve any
connections.

The main purpose of the test, ensuring that the server
thread does not block indefinitely when the ServerSocket
is closed, is unaffected.

Bug: 65024353
Test: ran ServerSocketConcurrentCloseTest once (on Pixel XL).
Change-Id: I247509ff671a0e0dd86c79805346e76a00761230
7177fa7f9d1f256019fabb85c56d192237d2fe34 05-Jul-2017 Tobias Thierer <tobiast@google.com> Fix early termination in ServerSocketConcurrentCloseTest

Commit 7956ee1980b5da2ede5d0555cabe5755fcaef813 added a fallback
to give the server/client threads an extra couple of 50msec blocks
per iteration.

It was intended that those extra blocks would only be given when
the threads didn't manage to make any connections in the first
50msec; but the connection count was read before it was written
(which only happened after the socket was closed), so the extra
50msec blocks were always given by accident. This exhausted
sockets on some systems.

This CL fixes this by incrementing the connection counter during
the test run.
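
A minimal sketch of the intended per-iteration logic after the
fix, assuming hypothetical names (connectionsMade, serverSocket),
java.util.concurrent.atomic.AtomicInteger, and an enclosing
method that declares the checked exceptions:

    // Sketch only: connectionsMade is shared with the client/server
    // threads, which increment it for every successful connection
    // *during* the run, not only after the socket is closed.
    final AtomicInteger connectionsMade = new AtomicInteger();
    // ... server and client threads are started with a reference to
    // connectionsMade ...
    for (int block = 0; block < 3; block++) {  // at most 3 x 50msec = 150msec
        Thread.sleep(50);
        if (connectionsMade.get() > 0) {
            break;  // extra 50msec blocks only while nothing has connected yet
        }
    }
    serverSocket.close();  // ends the iteration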

Bug: 63311387
Test: ServerSocketConcurrentCloseTest

Change-Id: I706d5db09998a7fae0ceadd3dce8aedbaa6a48f4
cecbeeb99fe09060d719af26c1caa4e35117e46d 05-Jul-2017 Kazuhiro Inaba <kinaba@google.com> DO NOT MERGE: Stabilize testConcurrentServerSocketCloseReliablyThrows.

As commented inside the test code, iterating this test too many times
makes it flaky: sockets pile up in TIME_WAIT state and the test
eventually fails. Reducing the iteration count mitigates the issue.

The iteration count was already reduced in the master branch to
speed up the test: http://r.android.com/258101/. The current CL
extracts from that master CL only the part relevant to the flaky test.

Bug: 63265871
Test: CtsLibcoreTestCases (on affected device)
Change-Id: I9b121e180835e58a6ed419f78d0619418b1030fa
bbea315f534b92d98566afd18ef61ccd198bd5b1 27-Jun-2017 Tobias Thierer <tobiast@google.com> Work around ServerSocketConcurrentCloseTest flakiness

The test involves 100 iterations of a client and server thread
connecting to each other and disconnecting over and over, until
the ServerSocket involved is closed at the end of the iteration.

Before this CL, each iteration took 50msec (counted from when
the threads signaled, by counting down a latch, that they had
started running). On recent builds on some devices, the two
threads didn't manage to establish any connection within those
50msec, but only when other tests had run previously. The exact
cause for this was not determined, but threads left over from
previous tests appeared not to be involved.

This CL increases the time allowed for each iteration to a maximum
of 150msec (in steps of 50msec, until at least one connection is
established), which makes the test pass (5/5 tries) on a device
where it previously consistently failed (15/16 tries failed).

Test: CtsLibcoreTestCases (on affected device)
Bug: 62859398

(cherry picked from commit 7956ee1980b5da2ede5d0555cabe5755fcaef813)

Change-Id: I19368fa2cac3b43f4c361b785c9ff3202c87652f
7956ee1980b5da2ede5d0555cabe5755fcaef813 27-Jun-2017 Tobias Thierer <tobiast@google.com> Work around ServerSocketConcurrentCloseTest flakiness

The test involves 100 iterations of a client and server thread
connecting to each other and disconnecting over and over, until
the ServerSocket involved is closed at the end of the iteration.

Before this CL, each iteration took 50msec (counted from when
the threads signaled, by counting down a latch, that they had
started running). On recent builds on some devices, the two
threads didn't manage to establish any connection within those
50msec, but only when other tests had run previously. The exact
cause for this was not determined, but threads left over from
previous tests appeared not to be involved.

This CL increases the time allowed for each iteration to a maximum
of 150msec (in steps of 50msec, until at least one connection is
established), which makes the test pass (5/5 tries) on a device
where it previously consistently failed (15/16 tries failed).

Test: CtsLibcoreTestCases (on affected device)
Bug: 62859398

Change-Id: I60eb30e9aa96ab1c0d2b70a74d38757c2bb84825
5381c95c802b9ddbfd9b795ca5823c332898c73d 17-Feb-2017 Tobias Thierer <tobiast@google.com> Attempt to de-flake ServerSocketConcurrentCloseTest

The test runs 100 iterations, each asserting that connections
between a server and client thread are made within 50msec.
Rarely, this assertion fails in one of those iterations.

Before this CL, the test was counting the time for the server
and client Threads to start up towards those 50msec, which is
unreasonable since no connections can be made while the threads
are still starting.

This CL modifies the test to allow the threads 1 second to start,
which is not counted towards the 50msec.
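
A minimal sketch of the intended timing, assuming hypothetical
names (serverThread, clientThread, serverSocket), that both
threads call threadsStarted.countDown() as soon as they start
running, and JUnit's assertTrue (imports:
java.util.concurrent.CountDownLatch and TimeUnit):

    // Sketch only: thread start-up gets its own 1-second budget and
    // is no longer counted towards the 50msec connection window.
    CountDownLatch threadsStarted = new CountDownLatch(2);
    serverThread.start();
    clientThread.start();
    assertTrue("Threads did not start within 1 second",
            threadsStarted.await(1, TimeUnit.SECONDS));
    Thread.sleep(50);  // the 50msec connection window only starts here
    serverSocket.close();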

Test: Ran 10,000 iterations (the equivalent of running
ServerSocketConcurrentCloseTest 100 times in a row)
Bug: 35463422

Change-Id: I04ff834661ecc96b5328d82926ef766bb30e5433
5e5169a394fd431609c3419bb4052e71dc73fed3 17-Aug-2016 Tobias Thierer <tobiast@google.com> Speed up needlessly slow libcore CTS tests

On a Nexus 6P,
run cts -p android.core.tests.libcore.package.libcore
currently runs for about 20 minutes. This consists of
two runs (arm32 and arm64 runtimes) that each spent 587sec
running the set of test methods.

This CL speeds up the test methods that ran the longest
(11-21sec per method) as measured by comparing wallclock
timestamps in a CTS run. The speedup is about 10% (59sec
faster per run, to now 528sec per run).

The speedup is achieved by Thread.sleep()ing for shorter
in tests that slept for what appeared like excessive
amounts of time, and by decreasing data sizes / loop counts
for probabilistic tests for race conditions. There was no
apparent evidence that these had found any bugs in years,
so needing to run them more often to find a potential bug
seems like a fair trade-off for a 10% speedup.

Test: run cts -p android.core.tests.libcore.package.libcore
(ran this three times)

Bug: 30948560

Change-Id: I9b28ce60268746b3a5edfc5b94dcc2ec29c8feca
79a03395aa123f547cc4507a73e57fd4afa625b9 22-Jun-2016 Tobias Thierer <tobiast@google.com> Let testImplAccept_detectsClosedState() fail quickly when it fails.

Previously this test relied on the test runner enforcing a reasonable
timeout, which sometimes was not the case (e.g. bug 28998288).

Note that this test should only fail on devices that exhibited
bug 27763633, which was fixed in May.
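
The CL text does not spell out the mechanism; one common way to
fail fast, sketched here with a hypothetical acceptThread, JUnit's
assertFalse, and an enclosing method that declares
InterruptedException, is a bounded join instead of an unbounded
one:

    // Sketch only: bound the wait ourselves, so a device exhibiting
    // bug 27763633 produces a quick assertion failure instead of a run
    // that only dies when the runner's global timeout (if any) kicks in.
    acceptThread.join(2000);  // illustrative 2-second bound
    assertFalse("accept() still blocked after the socket was closed",
            acceptThread.isAlive());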

Tested:
- Checked that the test still passes.
- Didn't verify that the test still fails on a device with the bug.
I tried against NRD49C but it seemed like the current AOSP adb
doesn't want to talk to that build's adb.

Bug: 29030362
Change-Id: I0e25c1be52afaae207e7701bbccdfc681cb0a458
e765ba60627643781097c8b5f5cb39785c8bd61d 22-Jun-2016 Tobias Thierer <tobiast@google.com> Fix ServerSocketConcurrentCloseTest's package

The test was in package java.net while everything else in the
same directory was in libcore.java.net. I'm surprised this
even compiled.

Tested:
cts-tradefed run cts -c libcore.java.net.ServerSocketConcurrentCloseTest

Change-Id: Ie2c96bb3aa1326fe2b9257822ddb6bfe69dc1f46
46b99406f674c5b4fda9820b6372e24b496c529d 16-May-2016 Tobias Thierer <tobiast@google.com> Fix race condition in accept() on socket that is concurrently closed.

This CL is the result of long digging into the rabbit hole after
observing a flaky test. I've added a unit test for behavior at
the ServerSocket level, although the root cause of the bug sits
lower.

ServerSocket.accept() checks for a closed connection at the Java
level before descending into deeper (including native) levels.
The problem was that if the socket was closed() concurrently
*after* this initial check at the Java level, later native code
didn't deal with the closed state correctly and could block
indefinitely.

Specifically, a closed socket internally sets the value of its
fd field to -1. Native code reads that field via

(*env)->GetIntField(env, fdObj, IO_fd_fdID);

and passes it to NET_Timeout, which then blocks. Other places
in the code read the same field and pass it to NET_Accept and
NET_Connect, which do not block when they get an fd value of -1.

I found 7 places that call NET_Timeout, and they all get their
fd value from the Java field using the code snippet
above. NET_Timeout is documented to be a "Wrapper for
poll(s, timeout)". However, it was never a particularly thin
wrapper: unlike NET_Poll, NET_Timeout already has a bunch of
extra logic on top of the poll syscall, namely ignoring other
signals and adjusting the timeout. All of this suggests that
NET_Timeout is the right place to handle the case of fd AKA s
being < 0.

As discussed on the bug, observability in the sense of
ServerSocketConcurrentCloseTest failing was caused by:

https://android.googlesource.com/platform/libcore/+/eeb3b74

It's therefore probably a good idea to investigate whether our
current mechanics for closing sockets are the right ones.
Establishing correctness of such an alternative change would
require very careful analysis, though. Therefore I went for
this fix instead, which is much easier to verify as safe and
corrects all observable issues that we currently know about.
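
A minimal sketch of the ServerSocket-level behavior the new unit
test guards, assuming a JUnit-style test method, JUnit's fail,
and java.net.ServerSocket/SocketException; names and the 20msec
delay are illustrative, not the test's actual code:

    // Sketch only: a concurrent close() must make a blocked accept()
    // throw (typically SocketException) rather than block indefinitely.
    public void testAcceptThrowsOnConcurrentClose() throws Exception {
        final ServerSocket serverSocket = new ServerSocket(0 /* any free port */);
        Thread closer = new Thread(() -> {
            try {
                Thread.sleep(20);  // let the main thread block in accept() first
                serverSocket.close();
            } catch (Exception ignored) {
            }
        });
        closer.start();
        try {
            serverSocket.accept();  // must not hang once the socket is closed
            fail("accept() returned despite the concurrent close()");
        } catch (SocketException expected) {
            // expected: accept() noticed that the socket was closed
        }
        closer.join();
    }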

Tested:
- Ran the test many times from Android Studio (adjusted
test's packageName). This is the only test that I also did
on devices without the fix.
- on build with the fix, the test passes 25/25 times.
- on build without the fix, the test fails 12/15 times.
- Also ran a couple of times from CTS (25181x pass, 19x skipped):
make cts && cts-tradefed run cts -m CtsLibcoreTestCases \
--test java.net.ServerSocketConcurrentCloseTest
- Ran all of the libcore tests once through CTS
(25181 tests pass, 19 tests not run):
make cts && cts-tradefed run cts -m CtsLibcoreTestCases
- make cts && cts-tradefed run cts -m CtsNetTestCases

Tests were run on a Nexus 6P. I previously verified that the
bug also occurs on Nexus 5X and Nexus 6 running N.

(cherry picked from commit bcb10462e854ba7172bab840329b6d3126a2fdd4)

Bug: 27763633
Change-Id: I074acebe3d3c8a252bf5107bf45be9589d098c6a
bcb10462e854ba7172bab840329b6d3126a2fdd4 16-May-2016 Tobias Thierer <tobiast@google.com> Fix race condition in accept() on socket that is concurrently closed.

This CL is the result of long digging into the rabbit hole after
observing a flaky test. I've added a unit test for behavior at
the ServerSocket level, although the root cause of the bug sits
lower.

ServerSocket.accept() checks for a closed connection at the Java
level before descending into deeper (including native) levels.
The problem was that if the socket was closed() concurrently
*after* this initial check at the Java level, later native code
didn't deal with the closed state correctly and could block
indefinitely.

Specifically, a closed socket internally sets the value of its
fd field to -1. Native code reads that field via

(*env)->GetIntField(env, fdObj, IO_fd_fdID);

and passes it to NET_Timeout, which then blocks. Other places
in the code read the same field and pass it to NET_Accept and
NET_Connect, which do not block when they get an fd value of -1.

I found 7 places that call NET_Timeout, and they all get their
fd value from the Java field using the code snippet
above. NET_Timeout is documented to be a "Wrapper for
poll(s, timeout)". However, it was never a particularly thin
wrapper: unlike NET_Poll, NET_Timeout already has a bunch of
extra logic on top of the poll syscall, namely ignoring other
signals and adjusting the timeout. All of this suggests that
NET_Timeout is the right place to handle the case of fd AKA s
being < 0.

As discussed on the bug, observability in the sense of
ServerSocketConcurrentCloseTest failing was caused by:

https://android.googlesource.com/platform/libcore/+/eeb3b74

It's therefore probably a good idea to investigate whether our
current mechanics for closing sockets are the right ones.
Establishing correctness of such an alternative change would
require very careful analysis, though. Therefore I went for
this fix instead, which is much easier to verify as safe and
corrects all observable issues that we currently know about.

Tested:
- Ran the test many times from Android Studio (adjusted
test's packageName). This is the only test that I also did
on devices without the fix.
- on build with the fix, the test passes 25/25 times.
- on build without the fix, the test fails 12/15 times.
- Also ran a couple of times from CTS (25181x pass, 19x skipped):
make cts && cts-tradefed run cts -m CtsLibcoreTestCases \
--test java.net.ServerSocketConcurrentCloseTest
- Ran all of the libcore tests once through CTS
(25181 tests pass, 19 tests not run):
make cts && cts-tradefed run cts -m CtsLibcoreTestCases
- make cts && cts-tradefed run cts -m CtsNetTestCases

Tests were run on a Nexus 6P. I previously verified that the
bug also occurs on Nexus 5X and Nexus 6 running N.

Bug: 27763633
Change-Id: I074acebe3d3c8a252bf5107bf45be9589d098c6a