9315d4ae2857cc7f1292d906d9352d8df448970f |
|
05-Sep-2017 |
Tobias Thierer <tobiast@google.com> |
ServerSocketConcurrentCloseTest: license header

This file was originally added by myself in commit
bcb10462e854ba7172bab840329b6d3126a2fdd4 (May 2016).

Test: Treehugger
Change-Id: I2dee98752d91e6f6df466eac97fcaf57609c9590
|
f1866bd83f2c05e3e0bd52ed40296edf2599f28e |
|
25-Aug-2017 |
Tobias Thierer <tobiast@google.com> |
Attempt to de-flake ServerSocketConcurrentCloseTest

Relax a non-critical assertion that sometimes fails on old devices and for
which investigation has not revealed any serious underlying cause.

The test runs 100 iterations; during each iteration, separate server and
client threads repeatedly connect to each other, until they're shut down
after 50msec. On some systems, the test runs out of sockets/connections
since they're still stuck in TIME_WAIT state, leading to later iterations
failing to achieve any connections. The test had an assertion that
connections were achieved in *each* iteration, and that assertion would
fail on such systems.

Since the assertion was only ever intended to guard against a bug in the
test, rather than a bug in the code under test, this CL relaxes that
assertion to only assert that *some* (currently >= 5) out of the 100
iterations achieve any connections. The main purpose of the test, ensuring
that the server thread does not block indefinitely when the ServerSocket
is closed, is unaffected.

Bug: 65024353
Test: ran ServerSocketConcurrentCloseTest once (on Pixel XL).
(cherry picked from commit 1cbfa9d8393e3ce68706a3d221462afe69db7f97)
Change-Id: I8c11954bc6374850fcb2962a352fa7683de48308
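The relaxed assertion described above can be sketched as follows. This is a minimal, hypothetical illustration of the pattern (class and method names are invented here, not the actual libcore test code); the key idea is requiring a minimum number of successful iterations rather than success in every iteration.

```java
import java.util.List;

public class RelaxedAssertionSketch {
    // Threshold described in the CL: at least 5 of the 100 iterations
    // must have achieved at least one connection.
    static final int MIN_SUCCESSFUL_ITERATIONS = 5;

    /** Returns true iff enough iterations achieved at least one connection. */
    static boolean enoughIterationsSucceeded(List<Integer> connectionsPerIteration) {
        long successful = connectionsPerIteration.stream()
                .filter(n -> n > 0)
                .count();
        return successful >= MIN_SUCCESSFUL_ITERATIONS;
    }

    public static void main(String[] args) {
        // 100 iterations where only the first 10 made connections (e.g. the
        // rest ran out of sockets stuck in TIME_WAIT): the test still passes.
        Integer[] counts = new Integer[100];
        for (int i = 0; i < 100; i++) {
            counts[i] = (i < 10) ? 3 : 0;
        }
        System.out.println(enoughIterationsSucceeded(List.of(counts))); // true
    }
}
```

Under the old per-iteration assertion, the same scenario would have failed at iteration 11.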
|
1cbfa9d8393e3ce68706a3d221462afe69db7f97 |
|
25-Aug-2017 |
Tobias Thierer <tobiast@google.com> |
Attempt to de-flake ServerSocketConcurrentCloseTest

Relax a non-critical assertion that sometimes fails on old devices and for
which investigation has not revealed any serious underlying cause.

The test runs 100 iterations; during each iteration, separate server and
client threads repeatedly connect to each other, until they're shut down
after 50msec. On some systems, the test runs out of sockets/connections
since they're still stuck in TIME_WAIT state, leading to later iterations
failing to achieve any connections. The test had an assertion that
connections were achieved in *each* iteration, and that assertion would
fail on such systems.

Since the assertion was only ever intended to guard against a bug in the
test, rather than a bug in the code under test, this CL relaxes that
assertion to only assert that *some* (currently >= 5) out of the 100
iterations achieve any connections. The main purpose of the test, ensuring
that the server thread does not block indefinitely when the ServerSocket
is closed, is unaffected.

Bug: 65024353
Test: ran ServerSocketConcurrentCloseTest once (on Pixel XL).
Change-Id: I247509ff671a0e0dd86c79805346e76a00761230
|
7177fa7f9d1f256019fabb85c56d192237d2fe34 |
|
05-Jul-2017 |
Tobias Thierer <tobiast@google.com> |
Fix early termination in ServerSocketConcurrentCloseTest

Commit 7956ee1980b5da2ede5d0555cabe5755fcaef813 added a fallback to give
the server/client threads an extra couple of 50msec blocks per iteration.
It was intended that those extra blocks would only be given when the
threads didn't manage to make any connections in the first 50msec; but the
value was read before it was written (after the socket is closed), so the
extra 50msec blocks were always given, by accident. This exhausted sockets
on some systems. This CL fixes this by incrementing the connection counter
during the test run.

Bug: 63311387
Test: ServerSocketConcurrentCloseTest
Change-Id: I706d5db09998a7fae0ceadd3dce8aedbaa6a48f4
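The read-before-write bug described in this CL can be distilled into a small sketch. This is an illustrative reconstruction, not the actual libcore code: the buggy variant samples the connection counter before the iteration has had a chance to update it, so the "no connections yet" fallback always triggers and the extra 50msec blocks are always granted.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReadBeforeWriteSketch {
    // Buggy ordering: the counter is read before the iteration runs, so the
    // snapshot is always 0 and extra time is always granted.
    static boolean needsExtraTimeBuggy(AtomicInteger connections, Runnable iteration) {
        int snapshot = connections.get(); // read too early
        iteration.run();                  // connections are counted in here
        return snapshot == 0;             // always true -> sockets exhausted
    }

    // Fixed ordering: consult the counter only after the iteration has run,
    // so extra time is granted only when no connections were actually made.
    static boolean needsExtraTimeFixed(AtomicInteger connections, Runnable iteration) {
        iteration.run();
        return connections.get() == 0;
    }

    public static void main(String[] args) {
        AtomicInteger c = new AtomicInteger();
        Runnable iter = () -> c.addAndGet(7); // pretend 7 connections were made
        System.out.println(needsExtraTimeBuggy(c, iter));  // true (the bug)
        c.set(0);
        System.out.println(needsExtraTimeFixed(c, iter));  // false (fixed)
    }
}
```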
|
cecbeeb99fe09060d719af26c1caa4e35117e46d |
|
05-Jul-2017 |
Kazuhiro Inaba <kinaba@google.com> |
DO NOT MERGE: Stabilize testConcurrentServerSocketCloseReliablyThrows.

As commented inside the test code, iterating this test too many times will
make it flaky by piling up sockets in TIME_WAIT state and eventually
failing. Reducing the iteration number mitigates the issue.

The iteration number is already reduced in the master branch for speeding
up the test: http://r.android.com/258101/. The current CL extracts from
the master CL only the part relevant to the flaky test.

Bug: 63265871
Test: CtsLibcoreTestCases (on affected device)
Change-Id: I9b121e180835e58a6ed419f78d0619418b1030fa
|
bbea315f534b92d98566afd18ef61ccd198bd5b1 |
|
27-Jun-2017 |
Tobias Thierer <tobiast@google.com> |
Work around ServerSocketConcurrentCloseTest flakiness

The test involves 100 iterations of a client and server thread connecting
to each other and disconnecting over and over, until the ServerSocket
involved is closed at the end of the iteration. Before this CL, each
iteration took 50msec (counted from when the threads signaled, by counting
down a latch, that they had started running).

On recent builds on some devices, the two threads didn't manage to
establish any connection within those 50msec, but only when other tests
had run previously. The exact cause for this was not determined, but
threads left over from previous tests appeared not to be involved.

This CL increases the time allowed for each iteration to a maximum of
150msec (in steps of 50msec, until at least one connection was
established), which makes the test pass (5/5 tries) on a device where it
previously consistently failed (15/16 tries failed).

Test: CtsLibcoreTestCases (on affected device)
Bug: 62859398
(cherry picked from commit 7956ee1980b5da2ede5d0555cabe5755fcaef813)
Change-Id: I19368fa2cac3b43f4c361b785c9ff3202c87652f
|
7956ee1980b5da2ede5d0555cabe5755fcaef813 |
|
27-Jun-2017 |
Tobias Thierer <tobiast@google.com> |
Work around ServerSocketConcurrentCloseTest flakiness

The test involves 100 iterations of a client and server thread connecting
to each other and disconnecting over and over, until the ServerSocket
involved is closed at the end of the iteration. Before this CL, each
iteration took 50msec (counted from when the threads signaled, by counting
down a latch, that they had started running).

On recent builds on some devices, the two threads didn't manage to
establish any connection within those 50msec, but only when other tests
had run previously. The exact cause for this was not determined, but
threads left over from previous tests appeared not to be involved.

This CL increases the time allowed for each iteration to a maximum of
150msec (in steps of 50msec, until at least one connection was
established), which makes the test pass (5/5 tries) on a device where it
previously consistently failed (15/16 tries failed).

Test: CtsLibcoreTestCases (on affected device)
Bug: 62859398
Change-Id: I60eb30e9aa96ab1c0d2b70a74d38757c2bb84825
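The stepped-extension scheme this CL describes, waiting in 50msec blocks up to a 150msec ceiling and stopping early once a connection has been observed, can be sketched like this. The names and the supplier-based counter are illustrative stand-ins, not the actual test code:

```java
import java.util.function.LongSupplier;

public class SteppedWaitSketch {
    static final long BLOCK_MSEC = 50;   // original per-iteration budget
    static final long MAX_MSEC = 150;    // new ceiling (3 blocks of 50msec)

    /**
     * Sleeps in 50msec blocks until at least one connection has been
     * observed, or until the 150msec ceiling is reached. Returns the total
     * time waited. connectionsSoFar is polled between blocks.
     */
    static long waitForConnections(LongSupplier connectionsSoFar)
            throws InterruptedException {
        long waited = 0;
        do {
            Thread.sleep(BLOCK_MSEC);
            waited += BLOCK_MSEC;
        } while (connectionsSoFar.getAsLong() == 0 && waited < MAX_MSEC);
        return waited;
    }

    public static void main(String[] args) throws InterruptedException {
        // Connections appear immediately: only the first 50msec block is used.
        System.out.println(waitForConnections(() -> 1)); // 50
        // No connections ever: gives up after the 150msec ceiling.
        System.out.println(waitForConnections(() -> 0)); // 150
    }
}
```

Iterations that make progress quickly still cost only 50msec, so the common case is no slower than before.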
|
5381c95c802b9ddbfd9b795ca5823c332898c73d |
|
17-Feb-2017 |
Tobias Thierer <tobiast@google.com> |
Attempt to de-flake ServerSocketConcurrentCloseTest

The test runs 100 iterations, each asserting that connections between a
server and client thread are made within 50msec. Rarely, this assertion
fails in one of those iterations.

Before this CL, the test was counting the time for the server and client
threads to start up towards those 50msec, which is unreasonable since no
connections can be made while the threads are still starting. This CL
modifies the test to allow the threads 1 second to start, which is not
counted towards the 50msec.

Test: Ran 10,000 iterations (the equivalent of running
ServerSocketConcurrentCloseTest 100 times in a row)
Bug: 35463422
Change-Id: I04ff834661ecc96b5328d82926ef766bb30e5433
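Separating thread startup from the measured window is the standard CountDownLatch idiom. A minimal sketch of the approach this CL describes (names are illustrative; the real test's structure may differ):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchStartupSketch {
    /**
     * Starts a worker thread and returns only after the worker has signaled
     * that it is running, so that timing measured afterwards does not
     * include thread startup.
     */
    static Thread startAndAwait(Runnable work) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            started.countDown(); // signal "I am running" before doing work
            work.run();
        });
        t.start();
        // Allow up to 1 second for startup; this wait is NOT part of the
        // 50msec measurement window that follows.
        if (!started.await(1, TimeUnit.SECONDS)) {
            throw new AssertionError("worker failed to start within 1 second");
        }
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = startAndAwait(() -> { /* connect/accept loop goes here */ });
        t.join();
        System.out.println("worker started and finished");
    }
}
```

Only after `startAndAwait` returns would the test start its 50msec clock.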
|
5e5169a394fd431609c3419bb4052e71dc73fed3 |
|
17-Aug-2016 |
Tobias Thierer <tobiast@google.com> |
Speed up needlessly slow libcore CTS tests

On a Nexus 6P, "run cts -p android.core.tests.libcore.package.libcore"
currently takes about 20 minutes. This consists of two runs (arm32 and
arm64 runtimes) that each spent 587sec running the set of test methods.

This CL speeds up the test methods that ran the longest (11-21sec per
method) as measured by comparing wallclock timestamps in a CTS run. The
speedup is about 10% (59sec faster per run, to now 528sec per run).

The speedup is achieved by Thread.sleep()ing for shorter periods in tests
that slept for what appeared to be excessive amounts of time, and by
decreasing data sizes / loop counts for probabilistic tests for race
conditions. There was no evidence that these had found any bugs in years,
so needing to run them more often to find a potential bug appears to be a
fair trade-off for a 10% speedup.

Test: run cts -p android.core.tests.libcore.package.libcore (ran this
three times)
Bug: 30948560
Change-Id: I9b28ce60268746b3a5edfc5b94dcc2ec29c8feca
|
79a03395aa123f547cc4507a73e57fd4afa625b9 |
|
22-Jun-2016 |
Tobias Thierer <tobiast@google.com> |
Let testImplAccept_detectsClosedState() fail quickly when it fails.

Previously this test relied on the test runner enforcing a reasonable
timeout, which sometimes was not the case (e.g. bug 28998288). Note that
this test should only fail on devices that exhibited bug 27763633, which
was fixed in May.

Tested:
 - Checked that the test still passes.
 - Didn't verify that the test still fails on a device with the bug. I
   tried against NRD49C but it seemed like the current AOSP adb doesn't
   want to talk to that build's adb.

Bug: 29030362
Change-Id: I0e25c1be52afaae207e7701bbccdfc681cb0a458
|
e765ba60627643781097c8b5f5cb39785c8bd61d |
|
22-Jun-2016 |
Tobias Thierer <tobiast@google.com> |
Fix ServerSocketConcurrentCloseTest's package

The test was in package java.net while everything else in the same
directory was in libcore.java.net. I'm surprised this even compiled.

Tested: cts-tradefed run cts -c
libcore.java.net.ServerSocketConcurrentCloseTest
Change-Id: Ie2c96bb3aa1326fe2b9257822ddb6bfe69dc1f46
|
46b99406f674c5b4fda9820b6372e24b496c529d |
|
16-May-2016 |
Tobias Thierer <tobiast@google.com> |
Fix race condition in accept() on socket that is concurrently closed.

This CL is the result of long digging into the rabbit hole after
observing a flaky test. I've added a unit test for behavior at the
ServerSocket level, although the root cause of the bug sits lower.

ServerSocket.accept() checks for a closed connection at the Java level
before descending into deeper (including native) levels. The problem was
that if the socket was closed concurrently *after* this initial check at
the Java level, later native code didn't deal with the closed state
correctly and could block indefinitely.

Specifically, a closed socket internally sets the value of its fd field
to -1. Native code reads that field via

    (*env)->GetIntField(env, fdObj, IO_fd_fdID);

and passes it to NET_Timeout, which then blocks. Other places in the code
read the same field and pass it to NET_Accept and NET_Connect, which do
not block when they get an fd value of -1. There are 7 places that I
found which call NET_Timeout, and they all get their fd value from the
Java field using the code snippet above.

NET_Timeout is documented to be a "Wrapper for poll(s, timeout)".
However, it was never a particularly thin wrapper: unlike NET_Poll,
NET_Timeout already has a bunch of extra logic on top of the poll
syscall, namely ignoring other signals and adjusting the timeout. All of
this suggests that NET_Timeout is the right place to handle the case of
fd AKA s being < 0.

As discussed on the bug, observability in the sense of
ServerSocketConcurrentCloseTest failing was caused by:
https://android.googlesource.com/platform/libcore/+/eeb3b74

It's therefore probably a good idea to investigate whether our current
mechanics for closing sockets are the right ones. Establishing
correctness of such an alternative change would require very careful
analysis, though. Therefore I went for this fix instead, which is much
easier to verify as safe and corrects all observable issues that we
currently know about.

Tested:
 - Ran the test many times from Android Studio (adjusted test's
   packageName). This is the only test that I also did on devices
   without the fix.
   - On a build with the fix, the test passes 25/25 times.
   - On a build without the fix, the test fails 12/15 times.
 - Also ran a couple of times from CTS (25181x pass, 19x skipped):
   make cts && cts-tradefed run cts -m CtsLibcoreTestCases \
       --test java.net.ServerSocketConcurrentCloseTest
 - Ran all of the libcore tests once through CTS (25181 tests pass, 19
   tests not run):
   make cts && cts-tradefed run cts -m CtsLibcoreTestCases
 - make cts && cts-tradefed run cts -m CtsNetTestCases

Tests were run on a Nexus 6P. I previously verified that the bug also
occurs on Nexus 5X and Nexus 6 running N.

(cherry picked from commit bcb10462e854ba7172bab840329b6d3126a2fdd4)
Bug: 27763633
Change-Id: I074acebe3d3c8a252bf5107bf45be9589d098c6a
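The property this fix establishes, that accept() unblocks promptly when the ServerSocket is closed from another thread, can be demonstrated with a small standalone check. This is a distilled, hypothetical sketch of what the concurrent-close test verifies, not the actual CTS test; on a correct runtime, the blocked accept() throws a SocketException instead of hanging:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.SocketException;

public class AcceptCloseRaceSketch {
    /**
     * Blocks one thread in accept(), closes the socket from another thread,
     * and checks that accept() unblocks with a SocketException rather than
     * blocking indefinitely.
     */
    static boolean acceptUnblocksOnClose() throws Exception {
        ServerSocket serverSocket = new ServerSocket(0); // bind any free port
        final Throwable[] thrown = new Throwable[1];
        Thread acceptor = new Thread(() -> {
            try {
                serverSocket.accept(); // should not return normally
            } catch (IOException e) {
                thrown[0] = e; // expected: SocketException on close
            }
        });
        acceptor.start();
        Thread.sleep(100);     // crude: give accept() time to block
        serverSocket.close();  // the concurrent close under test
        acceptor.join(5_000);  // must unblock well within 5 seconds
        return !acceptor.isAlive() && thrown[0] instanceof SocketException;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(acceptUnblocksOnClose());
    }
}
```

On a runtime with the buggy NET_Timeout, the acceptor thread could remain blocked in poll() on the already-invalidated fd and the join would time out.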
|
bcb10462e854ba7172bab840329b6d3126a2fdd4 |
|
16-May-2016 |
Tobias Thierer <tobiast@google.com> |
Fix race condition in accept() on socket that is concurrently closed.

This CL is the result of long digging into the rabbit hole after
observing a flaky test. I've added a unit test for behavior at the
ServerSocket level, although the root cause of the bug sits lower.

ServerSocket.accept() checks for a closed connection at the Java level
before descending into deeper (including native) levels. The problem was
that if the socket was closed concurrently *after* this initial check at
the Java level, later native code didn't deal with the closed state
correctly and could block indefinitely.

Specifically, a closed socket internally sets the value of its fd field
to -1. Native code reads that field via

    (*env)->GetIntField(env, fdObj, IO_fd_fdID);

and passes it to NET_Timeout, which then blocks. Other places in the code
read the same field and pass it to NET_Accept and NET_Connect, which do
not block when they get an fd value of -1. There are 7 places that I
found which call NET_Timeout, and they all get their fd value from the
Java field using the code snippet above.

NET_Timeout is documented to be a "Wrapper for poll(s, timeout)".
However, it was never a particularly thin wrapper: unlike NET_Poll,
NET_Timeout already has a bunch of extra logic on top of the poll
syscall, namely ignoring other signals and adjusting the timeout. All of
this suggests that NET_Timeout is the right place to handle the case of
fd AKA s being < 0.

As discussed on the bug, observability in the sense of
ServerSocketConcurrentCloseTest failing was caused by:
https://android.googlesource.com/platform/libcore/+/eeb3b74

It's therefore probably a good idea to investigate whether our current
mechanics for closing sockets are the right ones. Establishing
correctness of such an alternative change would require very careful
analysis, though. Therefore I went for this fix instead, which is much
easier to verify as safe and corrects all observable issues that we
currently know about.

Tested:
 - Ran the test many times from Android Studio (adjusted test's
   packageName). This is the only test that I also did on devices
   without the fix.
   - On a build with the fix, the test passes 25/25 times.
   - On a build without the fix, the test fails 12/15 times.
 - Also ran a couple of times from CTS (25181x pass, 19x skipped):
   make cts && cts-tradefed run cts -m CtsLibcoreTestCases \
       --test java.net.ServerSocketConcurrentCloseTest
 - Ran all of the libcore tests once through CTS (25181 tests pass, 19
   tests not run):
   make cts && cts-tradefed run cts -m CtsLibcoreTestCases
 - make cts && cts-tradefed run cts -m CtsNetTestCases

Tests were run on a Nexus 6P. I previously verified that the bug also
occurs on Nexus 5X and Nexus 6 running N.

Bug: 27763633
Change-Id: I074acebe3d3c8a252bf5107bf45be9589d098c6a
|