9315d4ae2857cc7f1292d906d9352d8df448970f |
|
05-Sep-2017 |
Tobias Thierer <tobiast@google.com> |
ServerSocketConcurrentCloseTest: license header

This file was originally added by myself in commit
bcb10462e854ba7172bab840329b6d3126a2fdd4 (May 2016).

Test: Treehugger
Change-Id: I2dee98752d91e6f6df466eac97fcaf57609c9590
|
f1866bd83f2c05e3e0bd52ed40296edf2599f28e |
|
25-Aug-2017 |
Tobias Thierer <tobiast@google.com> |
Attempt to de-flake ServerSocketConcurrentCloseTest

Relax a non-critical assertion that sometimes fails on old devices and for
which investigation has not revealed any serious underlying cause.

The test runs 100 iterations; during each iteration, separate server and
client threads repeatedly connect to each other, until they're shut down
after 50msec. On some systems, the test runs out of sockets/connections
since they're still stuck in TIME_WAIT state, leading to later iterations
failing to achieve any connections. The test had an assertion that
connections were achieved in *each* iteration, and that assertion would
fail on such systems.

Since the assertion was only ever intended to guard against a bug in the
test, rather than a bug in the code under test, this CL relaxes that
assertion to only assert that *some* (currently >= 5) out of the 100
iterations achieve any connections. The main purpose of the test, ensuring
that the server thread does not block indefinitely when the ServerSocket
is closed, is unaffected.

Bug: 65024353
Test: ran ServerSocketConcurrentCloseTest once (on Pixel XL).
(cherry picked from commit 1cbfa9d8393e3ce68706a3d221462afe69db7f97)
Change-Id: I8c11954bc6374850fcb2962a352fa7683de48308
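The relaxed assertion described above can be sketched as follows. This is a minimal, hypothetical illustration of the pattern (class and method names are invented here, not the actual libcore test code); the key idea is requiring a minimum number of successful iterations rather than success in every iteration.

```java
import java.util.List;

public class RelaxedAssertionSketch {
    // Threshold described in the CL: at least 5 of the 100 iterations
    // must have achieved at least one connection.
    static final int MIN_SUCCESSFUL_ITERATIONS = 5;

    /** Returns true iff enough iterations achieved at least one connection. */
    static boolean enoughIterationsSucceeded(List<Integer> connectionsPerIteration) {
        long successful = connectionsPerIteration.stream()
                .filter(n -> n > 0)
                .count();
        return successful >= MIN_SUCCESSFUL_ITERATIONS;
    }

    public static void main(String[] args) {
        // 100 iterations where only the first 10 made connections (e.g. the
        // rest ran out of sockets stuck in TIME_WAIT): the test still passes.
        Integer[] counts = new Integer[100];
        for (int i = 0; i < 100; i++) {
            counts[i] = (i < 10) ? 3 : 0;
        }
        System.out.println(enoughIterationsSucceeded(List.of(counts))); // true
    }
}
```

Under the old per-iteration assertion, the same scenario would have failed at iteration 11.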
|
1cbfa9d8393e3ce68706a3d221462afe69db7f97 |
|
25-Aug-2017 |
Tobias Thierer <tobiast@google.com> |
Attempt to de-flake ServerSocketConcurrentCloseTest

Relax a non-critical assertion that sometimes fails on old devices and for
which investigation has not revealed any serious underlying cause.

The test runs 100 iterations; during each iteration, separate server and
client threads repeatedly connect to each other, until they're shut down
after 50msec. On some systems, the test runs out of sockets/connections
since they're still stuck in TIME_WAIT state, leading to later iterations
failing to achieve any connections. The test had an assertion that
connections were achieved in *each* iteration, and that assertion would
fail on such systems.

Since the assertion was only ever intended to guard against a bug in the
test, rather than a bug in the code under test, this CL relaxes that
assertion to only assert that *some* (currently >= 5) out of the 100
iterations achieve any connections. The main purpose of the test, ensuring
that the server thread does not block indefinitely when the ServerSocket
is closed, is unaffected.

Bug: 65024353
Test: ran ServerSocketConcurrentCloseTest once (on Pixel XL).
Change-Id: I247509ff671a0e0dd86c79805346e76a00761230
|
7177fa7f9d1f256019fabb85c56d192237d2fe34 |
|
05-Jul-2017 |
Tobias Thierer <tobiast@google.com> |
Fix early termination in ServerSocketConcurrentCloseTest

Commit 7956ee1980b5da2ede5d0555cabe5755fcaef813 added a fallback to give
the server/client threads an extra couple of 50msec blocks per iteration.
It was intended that those extra blocks would only be given when the
threads didn't manage to make any connections in the first 50msec; but the
value was read before it was written (after the socket is closed), so the
extra 50msec blocks were always given, by accident. This exhausted sockets
on some systems. This CL fixes this by incrementing the connection counter
during the test run.

Bug: 63311387
Test: ServerSocketConcurrentCloseTest
Change-Id: I706d5db09998a7fae0ceadd3dce8aedbaa6a48f4
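The read-before-write bug described in this CL can be distilled into a small sketch. This is an illustrative reconstruction, not the actual libcore code: the buggy variant samples the connection counter before the iteration has had a chance to update it, so the "no connections yet" fallback always triggers and the extra 50msec blocks are always granted.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReadBeforeWriteSketch {
    // Buggy ordering: the counter is read before the iteration runs, so the
    // snapshot is always 0 and extra time is always granted.
    static boolean needsExtraTimeBuggy(AtomicInteger connections, Runnable iteration) {
        int snapshot = connections.get(); // read too early
        iteration.run();                  // connections are counted in here
        return snapshot == 0;             // always true -> sockets exhausted
    }

    // Fixed ordering: consult the counter only after the iteration has run,
    // so extra time is granted only when no connections were actually made.
    static boolean needsExtraTimeFixed(AtomicInteger connections, Runnable iteration) {
        iteration.run();
        return connections.get() == 0;
    }

    public static void main(String[] args) {
        AtomicInteger c = new AtomicInteger();
        Runnable iter = () -> c.addAndGet(7); // pretend 7 connections were made
        System.out.println(needsExtraTimeBuggy(c, iter));  // true (the bug)
        c.set(0);
        System.out.println(needsExtraTimeFixed(c, iter));  // false (fixed)
    }
}
```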
|
cecbeeb99fe09060d719af26c1caa4e35117e46d |
|
05-Jul-2017 |
Kazuhiro Inaba <kinaba@google.com> |
DO NOT MERGE: Stabilize testConcurrentServerSocketCloseReliablyThrows.

As commented inside the test code, iterating this test too many times will
make it flaky by piling up sockets in TIME_WAIT state and eventually
failing. Reducing the iteration number mitigates the issue.

The iteration number is already reduced in the master branch for speeding
up the test: http://r.android.com/258101/. The current CL extracts from
the master CL only the part relevant to the flaky test.

Bug: 63265871
Test: CtsLibcoreTestCases (on affected device)
Change-Id: I9b121e180835e58a6ed419f78d0619418b1030fa
|
bbea315f534b92d98566afd18ef61ccd198bd5b1 |
|
27-Jun-2017 |
Tobias Thierer <tobiast@google.com> |
Work around ServerSocketConcurrentCloseTest flakiness

The test involves 100 iterations of a client and server thread connecting
to each other and disconnecting over and over, until the ServerSocket
involved is closed at the end of the iteration. Before this CL, each
iteration took 50msec (counted from when the threads signaled, by counting
down a latch, that they had started running).

On recent builds on some devices, the two threads didn't manage to
establish any connection within those 50msec, but only when other tests
had run previously. The exact cause for this was not determined, but
threads left over from previous tests appeared not to be involved.

This CL increases the time allowed for each iteration to a maximum of
150msec (in steps of 50msec, until at least one connection was
established), which makes the test pass (5/5 tries) on a device where it
previously consistently failed (15/16 tries failed).

Test: CtsLibcoreTestCases (on affected device)
Bug: 62859398
(cherry picked from commit 7956ee1980b5da2ede5d0555cabe5755fcaef813)
Change-Id: I19368fa2cac3b43f4c361b785c9ff3202c87652f
|
7956ee1980b5da2ede5d0555cabe5755fcaef813 |
|
27-Jun-2017 |
Tobias Thierer <tobiast@google.com> |
Work around ServerSocketConcurrentCloseTest flakiness

The test involves 100 iterations of a client and server thread connecting
to each other and disconnecting over and over, until the ServerSocket
involved is closed at the end of the iteration. Before this CL, each
iteration took 50msec (counted from when the threads signaled, by counting
down a latch, that they had started running).

On recent builds on some devices, the two threads didn't manage to
establish any connection within those 50msec, but only when other tests
had run previously. The exact cause for this was not determined, but
threads left over from previous tests appeared not to be involved.

This CL increases the time allowed for each iteration to a maximum of
150msec (in steps of 50msec, until at least one connection was
established), which makes the test pass (5/5 tries) on a device where it
previously consistently failed (15/16 tries failed).

Test: CtsLibcoreTestCases (on affected device)
Bug: 62859398
Change-Id: I60eb30e9aa96ab1c0d2b70a74d38757c2bb84825
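The stepped-extension scheme this CL describes, waiting in 50msec blocks up to a 150msec ceiling and stopping early once a connection has been observed, can be sketched like this. The names and the supplier-based counter are illustrative stand-ins, not the actual test code:

```java
import java.util.function.LongSupplier;

public class SteppedWaitSketch {
    static final long BLOCK_MSEC = 50;   // original per-iteration budget
    static final long MAX_MSEC = 150;    // new ceiling (3 blocks of 50msec)

    /**
     * Sleeps in 50msec blocks until at least one connection has been
     * observed, or until the 150msec ceiling is reached. Returns the total
     * time waited. connectionsSoFar is polled between blocks.
     */
    static long waitForConnections(LongSupplier connectionsSoFar)
            throws InterruptedException {
        long waited = 0;
        do {
            Thread.sleep(BLOCK_MSEC);
            waited += BLOCK_MSEC;
        } while (connectionsSoFar.getAsLong() == 0 && waited < MAX_MSEC);
        return waited;
    }

    public static void main(String[] args) throws InterruptedException {
        // Connections appear immediately: only the first 50msec block is used.
        System.out.println(waitForConnections(() -> 1)); // 50
        // No connections ever: gives up after the 150msec ceiling.
        System.out.println(waitForConnections(() -> 0)); // 150
    }
}
```

Iterations that make progress quickly still cost only 50msec, so the common case is no slower than before.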
|
5381c95c802b9ddbfd9b795ca5823c332898c73d |
|
17-Feb-2017 |
Tobias Thierer <tobiast@google.com> |
Attempt to de-flake ServerSocketConcurrentCloseTest

The test runs 100 iterations, each asserting that connections between a
server and client thread are made within 50msec. Rarely, this assertion
fails in one of those iterations.

Before this CL, the test was counting the time for the server and client
threads to start up towards those 50msec, which is unreasonable since no
connections can be made while the threads are still starting. This CL
modifies the test to allow the threads 1 second to start, which is not
counted towards the 50msec.

Test: Ran 10,000 iterations (the equivalent of running
ServerSocketConcurrentCloseTest 100 times in a row)
Bug: 35463422
Change-Id: I04ff834661ecc96b5328d82926ef766bb30e5433
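Separating thread startup from the measured window is the standard CountDownLatch idiom. A minimal sketch of the approach this CL describes (names are illustrative; the real test's structure may differ):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchStartupSketch {
    /**
     * Starts a worker thread and returns only after the worker has signaled
     * that it is running, so that timing measured afterwards does not
     * include thread startup.
     */
    static Thread startAndAwait(Runnable work) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            started.countDown(); // signal "I am running" before doing work
            work.run();
        });
        t.start();
        // Allow up to 1 second for startup; this wait is NOT part of the
        // 50msec measurement window that follows.
        if (!started.await(1, TimeUnit.SECONDS)) {
            throw new AssertionError("worker failed to start within 1 second");
        }
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = startAndAwait(() -> { /* connect/accept loop goes here */ });
        t.join();
        System.out.println("worker started and finished");
    }
}
```

Only after `startAndAwait` returns would the test start its 50msec clock.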
|
5e5169a394fd431609c3419bb4052e71dc73fed3 |
|
17-Aug-2016 |
Tobias Thierer <tobiast@google.com> |
Speed up needlessly slow libcore CTS tests

On a Nexus 6P, "run cts -p android.core.tests.libcore.package.libcore"
currently takes about 20 minutes. This consists of two runs (arm32 and
arm64 runtimes) that each spent 587sec running the set of test methods.

This CL speeds up the test methods that ran the longest (11-21sec per
method) as measured by comparing wallclock timestamps in a CTS run. The
speedup is about 10% (59sec faster per run, to now 528sec per run).

The speedup is achieved by Thread.sleep()ing for shorter periods in tests
that slept for what appeared to be excessive amounts of time, and by
decreasing data sizes / loop counts for probabilistic tests for race
conditions. There was no evidence that these had found any bugs in years,
so needing to run them more often to find a potential bug appears to be a
fair trade-off for a 10% speedup.

Test: run cts -p android.core.tests.libcore.package.libcore (ran this
three times)
Bug: 30948560
Change-Id: I9b28ce60268746b3a5edfc5b94dcc2ec29c8feca
|
79a03395aa123f547cc4507a73e57fd4afa625b9 |
|
22-Jun-2016 |
Tobias Thierer <tobiast@google.com> |
Let testImplAccept_detectsClosedState() fail quickly when it fails.

Previously this test relied on the test runner enforcing a reasonable
timeout, which sometimes was not the case (e.g. bug 28998288). Note that
this test should only fail on devices that exhibited bug 27763633, which
was fixed in May.

Tested:
 - Checked that the test still passes.
 - Didn't verify that the test still fails on a device with the bug. I
   tried against NRD49C but it seemed like the current AOSP adb doesn't
   want to talk to that build's adb.

Bug: 29030362
Change-Id: I0e25c1be52afaae207e7701bbccdfc681cb0a458
|
e765ba60627643781097c8b5f5cb39785c8bd61d |
|
22-Jun-2016 |
Tobias Thierer <tobiast@google.com> |
Fix ServerSocketConcurrentCloseTest's package

The test was in package java.net while everything else in the same
directory was in libcore.java.net. I'm surprised this even compiled.

Tested: cts-tradefed run cts -c
libcore.java.net.ServerSocketConcurrentCloseTest
Change-Id: Ie2c96bb3aa1326fe2b9257822ddb6bfe69dc1f46
|
46b99406f674c5b4fda9820b6372e24b496c529d |
|
16-May-2016 |
Tobias Thierer <tobiast@google.com> |
Fix race condition in accept() on socket that is concurrently closed.

This CL is the result of long digging into the rabbit hole after
observing a flaky test. I've added a unit test for behavior at the
ServerSocket level, although the root cause of the bug sits lower.

ServerSocket.accept() checks for a closed connection at the Java level
before descending into deeper (including native) levels. The problem was
that if the socket was closed concurrently *after* this initial check at
the Java level, later native code didn't deal with the closed state
correctly and could block indefinitely.

Specifically, a closed socket internally sets the value of its fd field
to -1. Native code reads that field via

    (*env)->GetIntField(env, fdObj, IO_fd_fdID);

and passes it to NET_Timeout, which then blocks. Other places in the code
read the same field and pass it to NET_Accept and NET_Connect, which do
not block when they get an fd value of -1. There are 7 places that I
found which call NET_Timeout, and they all get their fd value from the
Java field using the code snippet above.

NET_Timeout is documented to be a "Wrapper for poll(s, timeout)".
However, it was never a particularly thin wrapper: unlike NET_Poll,
NET_Timeout already has a bunch of extra logic on top of the poll
syscall, namely ignoring other signals and adjusting the timeout. All of
this suggests that NET_Timeout is the right place to handle the case of
fd AKA s being < 0.

As discussed on the bug, observability in the sense of
ServerSocketConcurrentCloseTest failing was caused by:
https://android.googlesource.com/platform/libcore/+/eeb3b74

It's therefore probably a good idea to investigate whether our current
mechanics for closing sockets are the right ones. Establishing
correctness of such an alternative change would require very careful
analysis, though. Therefore I went for this fix instead, which is much
easier to verify as safe and corrects all observable issues that we
currently know about.

Tested:
 - Ran the test many times from Android Studio (adjusted test's
   packageName). This is the only test that I also did on devices
   without the fix.
   - On a build with the fix, the test passes 25/25 times.
   - On a build without the fix, the test fails 12/15 times.
 - Also ran a couple of times from CTS (25181x pass, 19x skipped):
   make cts && cts-tradefed run cts -m CtsLibcoreTestCases \
       --test java.net.ServerSocketConcurrentCloseTest
 - Ran all of the libcore tests once through CTS (25181 tests pass, 19
   tests not run):
   make cts && cts-tradefed run cts -m CtsLibcoreTestCases
 - make cts && cts-tradefed run cts -m CtsNetTestCases

Tests were run on a Nexus 6P. I previously verified that the bug also
occurs on Nexus 5X and Nexus 6 running N.

(cherry picked from commit bcb10462e854ba7172bab840329b6d3126a2fdd4)
Bug: 27763633
Change-Id: I074acebe3d3c8a252bf5107bf45be9589d098c6a
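The property this fix establishes, that accept() unblocks promptly when the ServerSocket is closed from another thread, can be demonstrated with a small standalone check. This is a distilled, hypothetical sketch of what the concurrent-close test verifies, not the actual CTS test; on a correct runtime, the blocked accept() throws a SocketException instead of hanging:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.SocketException;

public class AcceptCloseRaceSketch {
    /**
     * Blocks one thread in accept(), closes the socket from another thread,
     * and checks that accept() unblocks with a SocketException rather than
     * blocking indefinitely.
     */
    static boolean acceptUnblocksOnClose() throws Exception {
        ServerSocket serverSocket = new ServerSocket(0); // bind any free port
        final Throwable[] thrown = new Throwable[1];
        Thread acceptor = new Thread(() -> {
            try {
                serverSocket.accept(); // should not return normally
            } catch (IOException e) {
                thrown[0] = e; // expected: SocketException on close
            }
        });
        acceptor.start();
        Thread.sleep(100);     // crude: give accept() time to block
        serverSocket.close();  // the concurrent close under test
        acceptor.join(5_000);  // must unblock well within 5 seconds
        return !acceptor.isAlive() && thrown[0] instanceof SocketException;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(acceptUnblocksOnClose());
    }
}
```

On a runtime with the buggy NET_Timeout, the acceptor thread could remain blocked in poll() on the already-invalidated fd and the join would time out.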
|
bcb10462e854ba7172bab840329b6d3126a2fdd4 |
|
16-May-2016 |
Tobias Thierer <tobiast@google.com> |
Fix race condition in accept() on socket that is concurrently closed.

This CL is the result of long digging into the rabbit hole after
observing a flaky test. I've added a unit test for behavior at the
ServerSocket level, although the root cause of the bug sits lower.

ServerSocket.accept() checks for a closed connection at the Java level
before descending into deeper (including native) levels. The problem was
that if the socket was closed concurrently *after* this initial check at
the Java level, later native code didn't deal with the closed state
correctly and could block indefinitely.

Specifically, a closed socket internally sets the value of its fd field
to -1. Native code reads that field via

    (*env)->GetIntField(env, fdObj, IO_fd_fdID);

and passes it to NET_Timeout, which then blocks. Other places in the code
read the same field and pass it to NET_Accept and NET_Connect, which do
not block when they get an fd value of -1. There are 7 places that I
found which call NET_Timeout, and they all get their fd value from the
Java field using the code snippet above.

NET_Timeout is documented to be a "Wrapper for poll(s, timeout)".
However, it was never a particularly thin wrapper: unlike NET_Poll,
NET_Timeout already has a bunch of extra logic on top of the poll
syscall, namely ignoring other signals and adjusting the timeout. All of
this suggests that NET_Timeout is the right place to handle the case of
fd AKA s being < 0.

As discussed on the bug, observability in the sense of
ServerSocketConcurrentCloseTest failing was caused by:
https://android.googlesource.com/platform/libcore/+/eeb3b74

It's therefore probably a good idea to investigate whether our current
mechanics for closing sockets are the right ones. Establishing
correctness of such an alternative change would require very careful
analysis, though. Therefore I went for this fix instead, which is much
easier to verify as safe and corrects all observable issues that we
currently know about.

Tested:
 - Ran the test many times from Android Studio (adjusted test's
   packageName). This is the only test that I also did on devices
   without the fix.
   - On a build with the fix, the test passes 25/25 times.
   - On a build without the fix, the test fails 12/15 times.
 - Also ran a couple of times from CTS (25181x pass, 19x skipped):
   make cts && cts-tradefed run cts -m CtsLibcoreTestCases \
       --test java.net.ServerSocketConcurrentCloseTest
 - Ran all of the libcore tests once through CTS (25181 tests pass, 19
   tests not run):
   make cts && cts-tradefed run cts -m CtsLibcoreTestCases
 - make cts && cts-tradefed run cts -m CtsNetTestCases

Tests were run on a Nexus 6P. I previously verified that the bug also
occurs on Nexus 5X and Nexus 6 running N.

Bug: 27763633
Change-Id: I074acebe3d3c8a252bf5107bf45be9589d098c6a
|