History log of /dalvik/dx/tests/130-numthread-multidex-deterministic/run
Revision Date Author Comments
a9ce640c36c040b9d2d6df879c71fa3ba38f0ceb 18-Dec-2017 Orion Hodson <oth@google.com> dx: Test script fixes for OpenJDK 9

Add build scripts to each test directory, along with class files generated
with the Java 8 compiler. The tests use the latter as golden inputs, which
removes the compiler and jasmin as test-running dependencies. The build
scripts are only present for updating tests.

There are a few tests where the class files are not added because
there are too many generated during execution. The tests still using
the compiler are 095, 124, 129, 130, 131 and 143.

Bug: 70525148
Test: export EXPERIMENTAL_USE_OPENJDK9=1.8; . build/envsetup.sh; lunch; art/tools/buildbot-build --host -j32; dalvik/dx/tests/run-all-tests --seq
Test: export EXPERIMENTAL_USE_OPENJDK9=false; . build/envsetup.sh; lunch; art/tools/buildbot-build --host -j32; dalvik/dx/tests/run-all-tests --seq

Change-Id: Ifd49e3e81bccb3a0317e9f5677f73d4c5445965e
9feebf2627346eee3a9c6a117d7f279784a51720 01-Mar-2017 Orion Hodson <oth@google.com> Update dx tests and test runner scripts

Update the expected output for a range of dx tests that have bitrotted.

Add support for known failures to run-all-tests until all tests are
working.

Test: dalvik/dx/run-all-tests
Change-Id: I2f774e428acab67f7c1e12bcb349352b8068ef0b
bd3b381a74023a63b3713749e4be02429467f789 13-Nov-2014 Peter Jensen <jensenp@google.com> Support --num-threads with --multi-dex (take 2)

With fix for regression introduced in original commit.

The current dx implementation supports the options --multi-dex, for applications
not fitting within the dex format limitations, and --num-threads=N, which triggers
concurrent processing of multiple input files. However, the implementation
has the following limitations:

- The --num-threads option is disabled when used together with --multi-dex.
- The --num-threads option implements concurrency at the level of classpath
  entries, and does nothing when the classes to be translated are specified
  with a single classpath element (e.g. single jar output from Proguard).
- The existing --num-threads implementation may produce nondeterministic output.
- The heuristic used by the --multi-dex option to determine when to rotate the
  dex output file is overly conservative.

The primary objectives of this change are:
- Concurrent translation of classes, independently of input specification format.
- Support for --num-threads=N in both mono- and multi-dex mode.
- Deterministic class output order.
- Near-optimal use of dex file format capacity.

This is accomplished by reorganizing the dx workflow into a pipeline of
concurrent phases:

read-class | parse-class | translate-class | add-to-dex | convert-dex-to-byte[];
output-dex-files-or-jar

To manage dex file rotation (i.e. --multi-dex support), the parse-class and
add-to-dex phases are synchronized to prevent forwarding classes to the
translate-class phase if it could potentially result in breaking the dex
format limitations. The heuristic currently used to estimate the number of
indices needed for a class is improved, to minimize the amount of serialization
imposed by this feedback mechanism, and to improve the use of dex file capacity.
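
As a rough illustration of the rotation decision described above, the
following single-class sketch assigns each class to a dex file based on an
estimated index count. It is not the dx implementation: the class name
DexRotationSketch, the assign method, and the estimates passed to it are
hypothetical, and the 65,536 cap is used only as an illustrative per-file
index limit.

```java
public class DexRotationSketch {
    private final int maxIds;   // illustrative per-dex index capacity
    private int usedIds = 0;    // indices already reserved in the current dex
    private int dexIndex = 0;   // which output dex file is currently being filled

    DexRotationSketch(int maxIds) {
        this.maxIds = maxIds;
    }

    // Reserve room for a class before it is forwarded for translation.
    // Rotates to a new dex file when the estimate would not fit.
    // Returns the index of the dex file the class is assigned to.
    synchronized int assign(int estimatedIds) {
        if (usedIds > 0 && usedIds + estimatedIds > maxIds) {
            dexIndex++;
            usedIds = 0;
        }
        usedIds += estimatedIds;
        return dexIndex;
    }

    public static void main(String[] args) {
        DexRotationSketch rotation = new DexRotationSketch(65536);
        System.out.println(rotation.assign(40000)); // fits in the first file
        System.out.println(rotation.assign(30000)); // would overflow, so rotate
        System.out.println(rotation.assign(20000)); // fits in the second file
    }
}
```

A tighter estimate rotates less often, which is why the quality of the
heuristic affects how fully each dex file is used.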

The translate-class and convert-dex-to-byte[] phases are further parallelized
with configurable (--num-threads=N option) thread pools. This allows translating
classes concurrently, while also performing output conversion in parallel.
Separate collector threads are used to collect results from the thread pools
in deterministic order.
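
One common way to get deterministic output from a thread pool is to collect
results in submission order rather than completion order. The sketch below
shows that idea only; it is not the dx code, and the names OrderedCollect,
translate and translateAll are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OrderedCollect {
    // Hypothetical stand-in for translating one class.
    static String translate(String className) {
        return className.toUpperCase();
    }

    // Runs translate() on a pool of numThreads workers, but returns the
    // results in input order regardless of which task finishes first.
    static List<String> translateAll(List<String> classNames, int numThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String name : classNames) {
                Callable<String> task = () -> translate(name);
                pending.add(pool.submit(task));
            }
            // Collector step: wait on the futures in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Prints [A, B, C] on every run, independent of thread scheduling.
        System.out.println(translateAll(Arrays.asList("a", "b", "c"), 4));
    }
}
```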

Testing was performed on an Ubuntu system, with 6 cores and 12 hardware threads.
The taskset command was used to experimentally establish that running with more
than 8 hardware threads does not provide any additional benefit.

Experiments show that the argument to --num-threads should not exceed the
lesser of the number of available hardware threads and 5. Setting it to a
higher value results in no additional benefit.
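
A build wrapper could apply this guidance when choosing the value to pass to
--num-threads. The helper below is a minimal sketch under that assumption;
ThreadCap, MAX_USEFUL_THREADS and cappedNumThreads are hypothetical names and
not part of dx.

```java
public class ThreadCap {
    // Upper bound taken from the measurements described above.
    static final int MAX_USEFUL_THREADS = 5;

    // Caps the requested thread count at the lesser of the available
    // hardware threads and MAX_USEFUL_THREADS, never going below 1.
    static int cappedNumThreads(int requested, int hardwareThreads) {
        int cap = Math.min(hardwareThreads, MAX_USEFUL_THREADS);
        return Math.max(1, Math.min(requested, cap));
    }

    public static void main(String[] args) {
        int hardwareThreads = Runtime.getRuntime().availableProcessors();
        System.out.println("--num-threads=" + cappedNumThreads(8, hardwareThreads));
    }
}
```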

The gain is generally larger for larger applications, and not significant for
small applications with fewer than a few thousand classes. Experiments with
generated classes show that for large applications gains as high as 50% may
be possible.

For an existing real-life application with more than 11k classes, and requiring
2 dex files, a speed-up of 37% was achieved (--num-threads=5, 8 hardware
threads, 4g Java heap). A speedup of 31% was observed for another application
with ~7 classes.

For small applications, use of --num-threads=N>1 doesn’t provide significant
benefit. Running with --num-threads=1, the modified dx is slightly faster,
but no significant gain is observed unless the application requires multiple
dex files.

The one case where a significant regression may be observed is when using
--num-threads=N>1, with a single hardware thread. This is an inappropriate
configuration, even with the current implementation. However, because of
the limitations of the current implementation, such configurations may exist.
For instance, a configuration using both --multi-dex and --num-threads=5 will
currently generate a warning about using the two options together. With the
new implementation, the options can legitimately be used together, and could
result in an ~20% regression running on a single hardware thread.
Note: the current dx implementation, without the --num-threads option, is already
approximately 50% slower with 1 hardware thread, compared to running with 2
or more. With 2 hardware threads the implementations are practically on par
(a little better, or a little worse, depending on the application).

Testing:
Tested with 6 existing applications ranging in size from 1K - 12K classes.
Updated and tested with relevant existing unit tests (one test changed to
account for better dex rotation heuristic).
Added unit test to test deterministic output.
Added unit performance test. By default, the run script merely validates that
--multi-dex and --num-threads can be used together (fast). However, the test
can be configured to run a performance test over sets of generated classes.

Signed-off-by: Peter Jensen <jensenp@google.com>

(cherry picked from commit 845d9d0eed0f6556e11ee7f7204fda9c8dd41154)

(cherry picked from commit dd140a22d90495045024334a91770acaad8e065e)

Change-Id: I33a8ea0451efc0af7eb1d72e80cb926d6583d569
dd140a22d90495045024334a91770acaad8e065e 13-Nov-2014 Peter Jensen <jensenp@google.com> Support --num-threads with --multi-dex (take 2)

With fix for regression introduced in original commit.

The current dx implementation supports the options --multi-dex, for applications
not fitting within the dex format limitations, and --num-threads=N, which triggers
concurrent processing of multiple input files. However, the implementation
has the following limitations:

- The --num-threads option is disabled when used together with --multi-dex.
- The --num-threads option implements concurrency at the level of classpath
  entries, and does nothing when the classes to be translated are specified
  with a single classpath element (e.g. single jar output from Proguard).
- The existing --num-threads implementation may produce nondeterministic output.
- The heuristic used by the --multi-dex option to determine when to rotate the
  dex output file is overly conservative.

The primary objectives of this change are:
- Concurrent translation of classes, independently of input specification format.
- Support for --num-threads=N in both mono- and multi-dex mode.
- Deterministic class output order.
- Near-optimal use of dex file format capacity.

This is accomplished by reorganizing the dx workflow into a pipeline of
concurrent phases:

read-class | parse-class | translate-class | add-to-dex | convert-dex-to-byte[];
output-dex-files-or-jar

To manage dex file rotation (i.e. --multi-dex support), the parse-class and
add-to-dex phases are synchronized to prevent forwarding classes to the
translate-class phase if it could potentially result in breaking the dex
format limitations. The heuristic currently used to estimate the number of
indices needed for a class is improved, to minimize the amount of serialization
imposed by this feedback mechanism, and to improve the use of dex file capacity.

The translate-class and convert-dex-to-byte[] phases are further parallelized
with configurable (--num-threads=N option) thread pools. This allows translating
classes concurrently, while also performing output conversion in parallel.
Separate collector threads are used to collect results from the thread pools
in deterministic order.

Testing was performed on an Ubuntu system, with 6 cores and 12 hardware threads.
The taskset command was used to experimentally establish that running with more
than 8 hardware threads does not provide any additional benefit.

Experiments show that the argument to --num-threads should not exceed the
lesser of the number of available hardware threads and 5. Setting it to a
higher value results in no additional benefit.

The gain is generally larger for larger applications, and not significant for
small applications with fewer than a few thousand classes. Experiments with
generated classes show that for large applications gains as high as 50% may
be possible.

For an existing real-life application with more than 11k classes, and requiring
2 dex files, a speed-up of 37% was achieved (--num-threads=5, 8 hardware
threads, 4g Java heap). A speedup of 31% was observed for another application
with ~7 classes.

For small applications, use of --num-threads=N>1 doesn’t provide significant
benefit. Running with --num-threads=1, the modified dx is slightly faster,
but no significant gain is observed unless the application requires multiple
dex files.

The one case where a significant regression may be observed is when using
--num-threads=N>1, with a single hardware thread. This is an inappropriate
configuration, even with the current implementation. However, because of
the limitations of the current implementation, such configurations may exist.
For instance, a configuration using both --multi-dex and --num-threads=5 will
currently generate a warning about using the two options together. With the
new implementation, the options can legitimately be used together, and could
result in an ~20% regression running on a single hardware thread.
Note: the current dx implementation, without the --num-threads option, is already
approximately 50% slower with 1 hardware thread, compared to running with 2
or more. With 2 hardware threads the implementations are practically on par
(a little better, or a little worse, depending on the application).

Testing:
Tested with 6 existing applications ranging in size from 1K - 12K classes.
Updated and tested with relevant existing unit tests (one test changed to
account for better dex rotation heuristic).
Added unit test to test deterministic output.
Added unit performance test. By default, the run script merely validates that
--multi-dex and --num-threads can be used together (fast). However, the test
can be configured to run a performance test over sets of generated classes.

Signed-off-by: Peter Jensen <jensenp@google.com>

(cherry picked from commit 845d9d0eed0f6556e11ee7f7204fda9c8dd41154)

Change-Id: I721effa31c3b1a8b427d3a18ec554a19c5e9765b
c8b036e3fb5e88eb501e953a8a8838b547f2dab4 09-Feb-2015 Benoit Lamarche <benoitlamarche@google.com> Revert "Support --num-threads with --multi-dex"

This reverts commit 845d9d0eed0f6556e11ee7f7204fda9c8dd41154.

Bug: 19313927
Change-Id: Ia6582a3914cc33762aef74da1f5a6a153c8c0ab2
845d9d0eed0f6556e11ee7f7204fda9c8dd41154 13-Nov-2014 Peter Jensen <jensenp@google.com> Support --num-threads with --multi-dex

The current dx implementation supports the options --multi-dex, for applications
not fitting within the dex format limitations, and --num-threads=N, which triggers
concurrent processing of multiple input files. However, the implementation
has the following limitations:

- The --num-threads option is disabled when used together with --multi-dex.
- The --num-threads option implements concurrency at the level of classpath
  entries, and does nothing when the classes to be translated are specified
  with a single classpath element (e.g. single jar output from Proguard).
- The existing --num-threads implementation may produce nondeterministic output.
- The heuristic used by the --multi-dex option to determine when to rotate the
  dex output file is overly conservative.

The primary objectives of this change are:
- Concurrent translation of classes, independently of input specification format.
- Support for --num-threads=N in both mono- and multi-dex mode.
- Deterministic class output order.
- Near-optimal use of dex file format capacity.

This is accomplished by reorganizing the dx workflow into a pipeline of
concurrent phases:

read-class | parse-class | translate-class | add-to-dex | convert-dex-to-byte[];
output-dex-files-or-jar

To manage dex file rotation (i.e. --multi-dex support), the parse-class and
add-to-dex phases are synchronized to prevent forwarding classes to the
translate-class phase if it could potentially result in breaking the dex
format limitations. The heuristic currently used to estimate the number of
indices needed for a class is improved, to minimize the amount of serialization
imposed by this feedback mechanism, and to improve the use of dex file capacity.

The translate-class and convert-dex-to-byte[] phases are further parallelized
with configurable (--num-threads=N option) thread pools. This allows translating
classes concurrently, while also performing output conversion in parallel.
Separate collector threads are used to collect results from the thread pools
in deterministic order.

Testing was performed on an Ubuntu system, with 6 cores and 12 hardware threads.
The taskset command was used to experimentally establish that running with more
than 8 hardware threads does not provide any additional benefit.

Experiments show that the argument to --num-threads should not exceed the
lesser of the number of available hardware threads and 5. Setting it to a
higher value results in no additional benefit.

The gain is generally larger for larger applications, and not significant for
small applications with fewer than a few thousand classes. Experiments with
generated classes show that for large applications gains as high as 50% may
be possible.

For an existing real-life application with more than 11k classes, and requiring
2 dex files, a speed-up of 37% was achieved (--num-threads=5, 8 hardware
threads, 4g Java heap). A speedup of 31% was observed for another application
with ~7 classes.

For small applications, use of --num-threads=N>1 doesn’t provide significant
benefit. Running with --num-threads=1, the modified dx is slightly faster,
but no significant gain is observed unless the application requires multiple
dex files.

The one case where a significant regression may be observed is when using
--num-threads=N>1, with a single hardware thread. This is an inappropriate
configuration, even with the current implementation. However, because of
the limitations of the current implementation, such configurations may exist.
For instance, a configuration using both --multi-dex and --num-threads=5 will
currently generate a warning about using the two options together. With the
new implementation, the options can legitimately be used together, and could
result in an ~20% regression running on a single hardware thread.
Note: the current dx implementation, without the --num-threads option, is already
approximately 50% slower with 1 hardware thread, compared to running with 2
or more. With 2 hardware threads the implementations are practically on par
(a little better, or a little worse, depending on the application).

Testing:
Tested with 6 existing applications ranging in size from 1K - 12K classes.
Updated and tested with relevant existing unit tests (one test changed to
account for better dex rotation heuristic).
Added unit test to test deterministic output.
Added unit performance test. By default, the run script merely validates that
--multi-dex and --num-threads can be used together (fast). However, the test
can be configured to run a performance test over sets of generated classes.

Change-Id: Ic2d11c422396e97171c2e6ceae9477113e261b8e
Signed-off-by: Peter Jensen <jensenp@google.com>