History log of /libcore/ojluni/src/main/native/linux_close.cpp
Revision Date Author Comments
fde01e06d8aacfc18c2d4bf1930e271ae5c812d1 02-Aug-2016 Yi Kong <yikong@google.com> Remove unnecessary +x flag

Source code files and serialized resource files should not have x flag set.

This is a follow up to commit 49965c1d, where native codes and some Java
source files are left out.

Bug: 29977629
Test: Build, CTS tests
Change-Id: I475491284cf5784ed499daa434c2845cdadea3a0
46b99406f674c5b4fda9820b6372e24b496c529d 16-May-2016 Tobias Thierer <tobiast@google.com> Fix race condition in accept() on socket that is concurrently closed.

This CL is the result of long digging into the rabbit hole after
observing a flaky test. I've added a unit test for behavior at
the ServerSocket level, although the root cause of the bug sits
lower.

ServerSocket.accept() checks for a closed connection at the Java
level before descending into deeper (including native) levels.
The problem was that if the socket was closed() concurrently
*after* this initial check at the Java level, later native code
didn't deal with the closed state correctly and could block
indefinitely.

Specifically, a closed socket internally sets the value of its
fd field to -1. Native code reads that field via

(*env)->GetIntField(env, fdObj, IO_fd_fdID);

and passes it to NET_Timeout, which then blocks. Other places
in the code read the same field and pass it to NET_Accept and
NET_Connect, which do not block when they get an fd value of -1.

There are 7 places that I found which call NET_Timeout, and they
all get their fd value from the Java field using the code snippet
above. NET_Timeout is documented to be a "Wrapper for
poll(s, timeout)". However, it was never a particularly thin
wrapper: unlike NET_Poll, NET_Timeout already has a bunch of
extra logic on top of the poll syscall, namely ignoring other
signals and adjusting the timeout. All of this suggests that
NET_Timeout is the right place to handle the case of fd AKA s
being < 0.

As discussed on the bug, observability in the sense of
ServerSocketConcurrentCloseTest failing was caused by:

https://android.googlesource.com/platform/libcore/+/eeb3b74

It's therefore probably a good idea to investigate whether our
current mechanics for closing sockets are the right ones.
Establishing correctness of such an alternative change would
require very careful analysis, though. Therefore I went for
this fix instead, which is much easier to verify as safe and
corrects all observable issues that we currently know about.

Tested:
- Ran the test many times from Android Studio (adjusted
test's packageName). This is the only test that I also did
on devices without the fix.
- on build with the fix, the test passes 25/25 times.
- on build without the fix, the test fails 12/15 times.
- Also ran a couple of times from CTS (25181x pass, 19x skipped):
make cts && cts-tradefed run cts -m CtsLibcoreTestCases \
--test java.net.ServerSocketConcurrentCloseTest
- Ran all of the libcore tests once through CTS
(25181 tests pass, 19 tests not run):
make cts && cts-tradefed run cts -m CtsLibcoreTestCases
- make cts && cts-tradefed run cts -m CtsNetTestCases

Tests were run on a Nexus 6P. I previously verified that the
bug also occurs on Nexus 5X and Nexus 6 running N.

(cherry picked from commit bcb10462e854ba7172bab840329b6d3126a2fdd4)

Bug: 27763633
Change-Id: I074acebe3d3c8a252bf5107bf45be9589d098c6a
bcb10462e854ba7172bab840329b6d3126a2fdd4 16-May-2016 Tobias Thierer <tobiast@google.com> Fix race condition in accept() on socket that is concurrently closed.

This CL is the result of long digging into the rabbit hole after
observing a flaky test. I've added a unit test for behavior at
the ServerSocket level, although the root cause of the bug sits
lower.

ServerSocket.accept() checks for a closed connection at the Java
level before descending into deeper (including native) levels.
The problem was that if the socket was closed() concurrently
*after* this initial check at the Java level, later native code
didn't deal with the closed state correctly and could block
indefinitely.

Specifically, a closed socket internally sets the value of its
fd field to -1. Native code reads that field via

(*env)->GetIntField(env, fdObj, IO_fd_fdID);

and passes it to NET_Timeout, which then blocks. Other places
in the code read the same field and pass it to NET_Accept and
NET_Connect, which do not block when they get an fd value of -1.

There are 7 places that I found which call NET_Timeout, and they
all get their fd value from the Java field using the code snippet
above. NET_Timeout is documented to be a "Wrapper for
poll(s, timeout)". However, it was never a particularly thin
wrapper: unlike NET_Poll, NET_Timeout already has a bunch of
extra logic on top of the poll syscall, namely ignoring other
signals and adjusting the timeout. All of this suggests that
NET_Timeout is the right place to handle the case of fd AKA s
being < 0.

As discussed on the bug, observability in the sense of
ServerSocketConcurrentCloseTest failing was caused by:

https://android.googlesource.com/platform/libcore/+/eeb3b74

It's therefore probably a good idea to investigate whether our
current mechanics for closing sockets are the right ones.
Establishing correctness of such an alternative change would
require very careful analysis, though. Therefore I went for
this fix instead, which is much easier to verify as safe and
corrects all observable issues that we currently know about.

Tested:
- Ran the test many times from Android Studio (adjusted
test's packageName). This is the only test that I also did
on devices without the fix.
- on build with the fix, the test passes 25/25 times.
- on build without the fix, the test fails 12/15 times.
- Also ran a couple of times from CTS (25181x pass, 19x skipped):
make cts && cts-tradefed run cts -m CtsLibcoreTestCases \
--test java.net.ServerSocketConcurrentCloseTest
- Ran all of the libcore tests once through CTS
(25181 tests pass, 19 tests not run):
make cts && cts-tradefed run cts -m CtsLibcoreTestCases
- make cts && cts-tradefed run cts -m CtsNetTestCases

Tests were run on a Nexus 6P. I previously verified that the
bug also occurs on Nexus 5X and Nexus 6 running N.

Bug: 27763633
Change-Id: I074acebe3d3c8a252bf5107bf45be9589d098c6a
583eb0e4738456f0547014a4857a14456be267ee 16-Dec-2015 Przemyslaw Szczepaniak <pszczepaniak@google.com> Move enso net code to use AsynchronousCloseMonitor

- Moved AsynchronousCloseMonitor to the libnativehelper,
it manages shared linked list of IO blocked threads shared
by both libopenjdk and libjavacore
- linux_close.c no longer allocates a table of RLIMIT_NOFILE
size.
- linux_close.c became linux_close.cpp so it can use
AsynchronousCloseMonitor for close signal detection.

Bug: 26127752
Change-Id: If8e71d3d3a04b0a723c7a8bd0398a36542ae7864