1ee8d33dde23e083b3798429de7fc3b41a9f491f |
|
06-May-2014 |
Phillip Lougher <phillip@squashfs.org.uk> |
caches-queues-lists: protect against multiple inclusion Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
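A minimal sketch of the include guard this commit adds (the guard macro name is illustrative and may not match the one actually used in caches-queues-lists.h):

    /* caches-queues-lists.h -- illustrative include guard only */
    #ifndef CACHES_QUEUES_LISTS_H
    #define CACHES_QUEUES_LISTS_H

    /* ... cache, queue and list declarations ... */

    #endif /* CACHES_QUEUES_LISTS_H */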
|
8bb17b0275fa35318ad35c8fd477023004f940aa |
|
31-Mar-2014 |
Phillip Lougher <phillip@squashfs.org.uk> |
Mksquashfs: significantly optimise fragment duplicate checking

Remove the last remaining parallelisation bottleneck in Mksquashfs - fragment duplicate checking, which was previously done on the single main thread.

Back in 2006 when I first parallelised Mksquashfs, doing this on the main thread was initially not considered to be too much of an issue. If you don't have (m)any duplicates then you avoid the issue full stop. But even when you do have fragments which need to be checked, the necessary work of memcmp (memory compare) is not too arduous and is much faster than the upstream file reader thread, and much much faster than the downstream fragment compressor thread(s). So if Mksquashfs is running slow then this is "not the bottleneck you're looking for"; that's going to be either fragment compression or file reading. This is on the basis that most duplicates are local and the fragment referenced can be found in the fragment cache.

But often when I ran Mksquashfs and had a lot of duplicates the performance of Mksquashfs would be disappointing. Normally, without duplicates, I expected to get full processor utilisation, but with duplicates you might get roughly 200% or even 100% (i.e. one processor core), at least for the time it was hitting a run of duplicates in the source filesystem. Increasing the size of the fragment cache would reduce the performance hit, which gave a substantial hint that the problem was fragment cache misses, which caused fragment blocks to be read back off disk and decompressed on the single main thread. But it was evident that wasn't the whole story.

The culprit has always self-evidently been the single-threaded duplicate checking on the main thread. This has been apparent almost since the initial parallelisation of Mksquashfs in 2006, but although I've had my suspicions as to why (the hint above), with the demands/prioritisation of extra functionality, this has remained on my TODO list until now. Analysis now has shown the problem to be a triple whammy:

1. With duplicates (and even without), there are substantial fragment cache misses, which make the single main thread spend a lot of the time duplicate checking reading in fragment blocks off disk, decompressing them, and then memcmp'ing them for a match. This is because with a large filesystem, many fragments match at the checksum level even though they're not actually a match at the byte level - the checksums eliminate most files, but if you've got a large filesystem that still leaves multiple files which match, and this match is random, and does not follow locality of reference. So invariably these fragment blocks are no longer in the fragment cache (if you're compressing 1Gbyte of files and have a 64Mbyte (default) fragment cache, most checksum matches will invariably not be in the cache, because they do not follow the "locality of reference rules"; the checksum matches can literally be anywhere in the part of the filesystem already compressed and written to disk). The checksum matches in theory could be reduced by improving the discriminating power of the checksums, but this is a zero sum game - the extra processing overhead of computing a more sophisticated checksum for *all* blocks would easily outweigh the benefits of fewer checksum matches.

2. Even with the knowledge that the main thread spends a lot of the time reading in and decompressing fragment blocks, we're left with the fact that the main thread has enough "bandwidth" to do this without becoming a bottleneck, so there's more to the story.
The "more to the story" is that the main thread spends most of its time asleep! As fragment compression is the bottleneck in any Mksquashfs run, we run out of "empty fragment blocks" because all of the fragment blocks become filled, and get queued on the fragment compression threads waiting for them to be compressed. So the main thread sleeps waiting for an "empty fragment block" even though it has a queue of files which it could be duplicate checking.

3. When the main thread does wake up, having got an "empty fragment block", and it starts to do duplicate checking, if that duplicate checking takes a long time (because it has to read in fragment blocks and decompress them), then it 1. stops passing fragments to the fragment compressor threads, and 2. stops taking fragments from the reader thread... So both the fragment compressor threads and the reader thread starve. Now, because both the reader thread and the fragment compressor threads have deep queues, this doesn't happen instantaneously, but only if the main thread hits a run of files which need multiple fragment blocks to be read off disk and decompressed. Unfortunately, that *does* happen.

So, we end up with the situation where the main thread doesn't duplicate check files ahead of time because it is blocked on the fragment compressor threads. When it does wake up and do duplicate checking (because it didn't do it ahead of time), it ends up starving the fragment compressor threads and reader thread for that duration - hence we get a CPU utilisation of 100% *or less* because only that main thread is running.

The solution is to move duplicate checking to multiple per-core front-end processing threads ahead of the main thread (interposed between the reader thread and the main thread). So the front-end threads do duplicate checking on behalf of the main thread. This eliminates the main thread bottleneck at a stroke, because the front-end threads can duplicate check ahead of time, even though the main thread is blocked on the fragment compressors.

In theory simple, in practice extremely difficult. Two issues have to be dealt with:

1. It introduces a level of fragment cache synchronisation hitherto avoided due to clever design in Mksquashfs. Mksquashfs parallelisation is coded on the producer-consumer principle. The producer thread creates buffers in the cache, fills them in, and then passes them to the consumer thread via a queue; the consumer thread only "sees" the buffers when they're read from the queue, at which time the consumer and producer have inherently synchronised, because the consumer only gets them once the producer thread has explicitly finished with the buffer. This technique AFAIK was introduced in CSP (communicating sequential processes) and was adopted in the largely forgotten descendant OCCAM. This technique eliminates explicit buffer locking. The front-end threads break this model because we get multiple threads opportunistically looking up fragments in the fragment cache, and then creating them if they're not available. So we get the problem that threads can look up buffers and get them whilst they're still being filled in, and we get races where two threads can simultaneously create the same buffers. This can, obviously, be dealt with by introducing the concept of "locked" buffers etc., but it means adding an additional set of cache APIs only for the fragment processing threads.

2. The front-end threads have to synchronise with the main thread to do duplicate checking.
At the time the front-end threads do duplicate checking, there may exist no duplicates, but the duplicate may itself exist and still be in the process of being duplicate checked. Think of the case where we have two files alphabetically one after another, say "a" and "b": "a" goes to front-end thread 1, and "b" goes to front-end thread 2. At this time neither file "exists" because it's being duplicate checked, so thread 2 cannot determine that file "b" is a duplicate of file "a" because it doesn't "exist" at this time. This has to be done without introducing an inherent synchronisation point on the main thread, which will only reintroduce the main thread bottleneck "by the back door".

But it is actually more complex than that. There are two additional points where synchronisation with the main thread "by the back door" has to be avoided to get optimum performance. But, you'll have to look at the code because this commit entry is too long as it is.

But the upshot of this improvement is that Mksquashfs speeds up by 10% - 60% depending on the ratio of duplicate files to non-duplicate files in the source filesystem, which is a significant improvement.

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
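A minimal sketch of the CSP-style producer-consumer hand-off described above, assuming a simple pthread bounded queue. This is illustrative only, not the Mksquashfs queue code; the point is that a buffer is only pushed once the producer has finished with it, so the consumer never needs a per-buffer lock:

    #include <pthread.h>
    #include <stdlib.h>

    struct queue {
        int size, readp, writep, count;
        void **data;
        pthread_mutex_t mutex;
        pthread_cond_t empty, full;
    };

    struct queue *queue_init(int size)
    {
        struct queue *queue = malloc(sizeof(*queue));

        if (queue == NULL)
            return NULL;

        queue->data = malloc(sizeof(void *) * size);
        if (queue->data == NULL) {
            free(queue);
            return NULL;
        }

        queue->size = size;
        queue->readp = queue->writep = queue->count = 0;
        pthread_mutex_init(&queue->mutex, NULL);
        pthread_cond_init(&queue->empty, NULL);
        pthread_cond_init(&queue->full, NULL);
        return queue;
    }

    /* producer: fill the buffer completely, then hand it over */
    void queue_put(struct queue *queue, void *buffer)
    {
        pthread_mutex_lock(&queue->mutex);
        while (queue->count == queue->size)
            pthread_cond_wait(&queue->full, &queue->mutex);
        queue->data[queue->writep] = buffer;
        queue->writep = (queue->writep + 1) % queue->size;
        queue->count++;
        pthread_cond_signal(&queue->empty);
        pthread_mutex_unlock(&queue->mutex);
    }

    /* consumer: only sees the buffer once the producer has pushed it */
    void *queue_get(struct queue *queue)
    {
        void *buffer;

        pthread_mutex_lock(&queue->mutex);
        while (queue->count == 0)
            pthread_cond_wait(&queue->empty, &queue->mutex);
        buffer = queue->data[queue->readp];
        queue->readp = (queue->readp + 1) % queue->size;
        queue->count--;
        pthread_cond_signal(&queue->full);
        pthread_mutex_unlock(&queue->mutex);
        return buffer;
    }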
|
c6424dc4d55a711ff862409fa07b9c143049aba7 |
|
13-Mar-2014 |
Phillip Lougher <phillip@squashfs.org.uk> |
process_fragments: move fragment checksumming to the process fragment threads

Move fragment checksumming to the process fragment threads, and away from the main thread. This has a few minor side effects:

1. Tail end fragments (fragments belonging to files larger than the block size) are now checksummed up front.

2. This means add_non_dup() gains an extra checksum_frag_flag, because we now have a combination of True/False statuses for the block checksum and the fragment checksum. Previously, we either had both the block checksum and fragment checksum, or neither. (Fragments pre-existing on disk on append are not checksummed up front.)

3. duplicate() no longer needs the fragment checksum to be passed, because it is contained within the file_buffer structure, which is always passed.

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
|
8bcdeb66b77869500bacaa3cd0c1dc9c9b4c2676 |
|
27-Feb-2014 |
Phillip Lougher <phillip@squashfs.org.uk> |
caches-queues-lists: Merge sequence and index in struct file_buffer Now that the deflator thread(s) do not hash the writer block, the index and sequence fields are never in use at the same time. The sequence field is needed to correctly order the blocks output by the seq_queue, but at this time, none of the blocks are hashed and index is unused. Once the blocks are obtained from the seq_queue they are either discarded (uncompressed fragment blocks and empty file blocks) and never hashed, or they are hashed by the main thread later. But when this happens the sequence field is no longer in use. So we can merge the two fields which saves 8 bytes. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
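An illustrative sketch of the field merge (layout and field names assumed, not the actual struct file_buffer definition): two fields that are never live at the same time can share storage in a union.

    struct file_buffer_sketch {
        union {
            long long sequence; /* used while the block travels through the seq_queue     */
            long long index;    /* used later, once the block is hashed by the main thread */
        };
        /* ... remaining fields ... */
    };

With an 8-byte long long this saves 8 bytes per buffer, as the commit states.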
|
c9352f0dc22fe7fceb4bfa28719cac784f492694 |
|
27-Feb-2014 |
Phillip Lougher <phillip@squashfs.org.uk> |
caches-queues-lists: Rearrange struct file_buffer Move long longs to the start of the structure; on 32-bit machines these are, obviously, longer than pointers. Making them first avoids the need for the compiler to insert 4 bytes of padding between the pointers and the long longs to get 8 byte alignment. The reduction of the structure size from 68 bytes to 64 bytes also means the compiler doesn't have to pad the structure out to 72 bytes to get an 8 byte multiple (again to ensure 8 byte alignment of long longs). This makes a saving of 8 bytes on 32-bit machines. On 64-bit machines there is no difference. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
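A small self-contained illustration of the padding argument (member names and counts are made up, not the real struct file_buffer members). On a typical 32-bit ABI pointers are 4 bytes and long long needs 8-byte alignment:

    #include <stdio.h>

    /* long long after pointers: 4 bytes of padding are inserted before it,
     * and 4 bytes of tail padding round the struct up to an 8-byte multiple */
    struct pointers_first {
        void *a;        /* offset 0, 4 bytes    */
                        /* 4 bytes padding      */
        long long x;    /* offset 8, 8 bytes    */
        void *b;        /* offset 16, 4 bytes   */
                        /* 4 bytes tail padding */
    };                  /* sizeof == 24         */

    /* long long first: no interior padding and no tail padding needed */
    struct long_longs_first {
        long long x;    /* offset 0, 8 bytes    */
        void *a;        /* offset 8, 4 bytes    */
        void *b;        /* offset 12, 4 bytes   */
    };                  /* sizeof == 16         */

    int main(void)
    {
        /* prints "24 16" on a typical 32-bit ABI, "24 24" on 64-bit */
        printf("%zu %zu\n", sizeof(struct pointers_first),
            sizeof(struct long_longs_first));
        return 0;
    }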
|
23d01af409e5a98559e471d20b76b73299f6a7a9 |
|
23-Feb-2014 |
Phillip Lougher <phillip@squashfs.org.uk> |
caches-queues-lists: collapse struct file_buffer unions into one with structs Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
|
84e20d70e906c24483d0b9dc6a38f310ee78728d |
|
23-Feb-2014 |
Phillip Lougher <phillip@squashfs.org.uk> |
Mksquashfs: change cache_rehash() to cache_hash() Originally, cache blocks in Mksquashfs were always hashed to the index supplied on calling cache_get(). For writer blocks obtained in the deflator() threads this created a problem, because at the time of getting the cache block the ultimate index was not known (the index is the disk location where the block is finally stored, which is not known until the writer block is processed by the main thread). The solution adopted then was to cache_get() the writer block with a dummy index, and later, when the real index was known, to rehash the writer block. This was good and worked well. However, since then a new cache_get_nohash() call has been implemented. This was introduced for reader buffers, which are not looked up and therefore never need to be hashed. We can use this new call here by changing cache_rehash() to cache_hash() - get an initial unhashed buffer via cache_get_nohash() and later create a hash mapping via cache_hash(). The semantic change from cache_rehash() to cache_hash() is because we are no longer rehashing (removing an existing hash and adding a new hash) but adding a hash to a cache entry that has never been hashed. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
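A sketch of the change in call sequence described above. The prototypes are assumed for illustration; the real declarations are in caches-queues-lists.h and may differ:

    struct cache;
    struct file_buffer;

    /* assumed prototypes */
    extern struct file_buffer *cache_get(struct cache *cache, long long index);
    extern void cache_rehash(struct file_buffer *entry, long long index);
    extern struct file_buffer *cache_get_nohash(struct cache *cache);
    extern void cache_hash(struct file_buffer *entry, long long index);

    /* old scheme: hash the writer block against a dummy index up front,
     * then rehash it once the real disk location is known */
    struct file_buffer *old_way(struct cache *writer_buffer, long long location)
    {
        struct file_buffer *block = cache_get(writer_buffer, 0 /* dummy */);

        /* ... block is filled in and compressed ... */
        cache_rehash(block, location);  /* drop dummy hash, add real one */
        return block;
    }

    /* new scheme: obtain the block unhashed, and hash it exactly once
     * when the real index is known */
    struct file_buffer *new_way(struct cache *writer_buffer, long long location)
    {
        struct file_buffer *block = cache_get_nohash(writer_buffer);

        /* ... block is filled in and compressed ... */
        cache_hash(block, location);    /* first and only hash insertion */
        return block;
    }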
|
b9929819ee40be51923a4bee41aafb279313f4a3 |
|
24-Jan-2014 |
Phillip Lougher <phillip@squashfs.org.uk> |
caches-queues-lists: implement seq_queue_flush() Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
|
52c51f6f57536c72f2af0a77b4e67d378391327c |
|
24-Jan-2014 |
Phillip Lougher <phillip@squashfs.org.uk> |
caches-queues-lists: abstract hash table size and don't hard code it everywhere Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
|
2511927d1c835f6b3a3a8a91bfb4229c665a3554 |
|
24-Jan-2014 |
Phillip Lougher <phillip@squashfs.org.uk> |
caches-queues-lists: implement queue_flush() Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
|
6164b5f840437a54d70b1155af0e01fc4bef48d2 |
|
23-May-2013 |
Phillip Lougher <phillip@squashfs.org.uk> |
mksquashfs: replace fragment_locked list with a queue Replace the one-off, implementation-specific fragment locked list with a generic queue. This has the advantage of reusing generic infrastructure, reducing code overhead, and it means the queue and its size can be reported by the dump_status() function. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
|
e57c3b5840076959cffd493ba4053007c86161f5 |
|
23-May-2013 |
Phillip Lougher <phillip@squashfs.org.uk> |
caches-queues-lists: add queue_empty() Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
|
f54d70166a6a8de3ea56c5598a3171281836a906 |
|
22-May-2013 |
Phillip Lougher <phillip@squashfs.org.uk> |
caches-queues-lists: Get rid of now unused field in struct file_buffer The next field was only used in the mksquashfs internal queue when reordering buffers received from the deflate thread(s). This is now handled by the new seq_queue implementation. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
|
137dcfcc5368ae9c78b4a8a7fb780dda21583036 |
|
15-May-2013 |
Phillip Lougher <phillip@squashfs.org.uk> |
caches-queues-lists: fix bug caused by new seq_queue implementation When adding the seq_queue implementation I did not want to increase the existing size of struct file_buffer. This meant I needed to reuse fields that are not used at the time the file_buffer is being enqueued to the seq_queue. The seq_queue implementation was intended to use the free_prev and free_next pointers, as they're guaranteed not to be in use (they keep track of free buffers in the caches, and because the buffer being queued is by definition in use, these fields are not in use). Unfortunately, when turning the hash functions into macros I forgot to do this! The consequence is that the seq_queue has been re-using the hash_next and hash_prev pointers, which are already in use if the buffer being queued is compressed (and by definition stored hashed in the write cache). Because the effect of this is to mutually corrupt the hash lists, this causes subtle failures in lookup which largely do not cause mksquashfs to fail. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
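An illustrative sketch of why borrowing the free list pointers is safe while borrowing the hash pointers is not (field names are taken from the commit text; the layout is assumed):

    /* A buffer carries two independent pairs of doubly-linked list pointers.
     * hash_next/hash_prev link it into a cache hash chain while it is in use;
     * free_next/free_prev link it into the cache free list when it is not.
     * A buffer sitting on the seq_queue is by definition in use, so the free
     * list pointers are idle and can be borrowed for the seq_queue links.
     * The hash pointers, however, may already link a compressed block into
     * the write cache hash table - borrowing them overwrites live list
     * pointers and corrupts the hash chains. */
    struct file_buffer_sketch {
        struct file_buffer_sketch *hash_next, *hash_prev;  /* live while hashed   */
        struct file_buffer_sketch *free_next, *free_prev;  /* live only when free */
        /* ... */
    };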
|
40e1876a205a36d4b725a1b6db8a71757190d17b |
|
12-May-2013 |
Phillip Lougher <phillip@squashfs.org.uk> |
caches-queues-lists: implement dump_seq_queue Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
|
fcd941550c13b9c7158b3164f29c4e00dbfda99c |
|
11-May-2013 |
Phillip Lougher <phillip@squashfs.org.uk> |
caches-queues-lists: add a specialised "sequential queue" implementation

Add a specialised "sequential queue" implementation. Rather than outputting the data in the order it was pushed to the queue, this outputs it in the order specified by the "sequence" field in the pushed entries. Data is also only output if the queue holds the next sequential number after the previously output buffer. This ensures that buffers pushed to the queue out of sequence are delivered in the correct sequential order.

This queue is necessary because the queue collects data from the deflate threads, and also (in the future, and originally a couple of releases ago) directly from the reader thread. Depending on the time necessary to compress the buffers, the deflate threads can deliver buffers out of sequence: for instance, if deflate_thread_1 takes buffer_1 and deflate_thread_2 takes buffer_2, and buffer_2 takes less time to compress, buffer_2 will be delivered before buffer_1 by the deflate threads. Previously(*) (and in the future) uncompressed fragment buffers were sent directly to the main thread, rather than going via the deflate threads. Due to this a fragment block (representing block n) could arrive before buffers n-1, n-2 etc., because they have gone via the deflate threads and taken time to compress.

(*) this was changed to queue the fragment buffers via the deflate threads. This was done for two reasons:

1. It was noticed queueing fragment blocks directly to the main thread resulted in a significant number of out of order blocks, and a lot of unnecessary wake-ups of the main thread. Queueing via the deflate threads eliminated that source of out of order blocks, but with the consequence that the deflate threads got more wake-ups. But queueing via the deflate threads at least enabled the next reason to take place.

2. Queueing via the deflate threads enabled the deflate threads to do sparse checking (all zero) on the fragment blocks, and to "promote" them to sparse blocks if they were all zero.

But for a long time a better solution was required, and this is it ...

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
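A minimal sketch of the reordering idea (not the Mksquashfs seq_queue itself, which blocks on a condition variable and reuses file_buffer link pointers): entries may arrive in any order, but are only released when they carry the next expected sequence number.

    #include <stddef.h>

    struct entry {
        long long sequence;
        struct entry *next;
        /* ... payload ... */
    };

    struct seq_queue_sketch {
        long long expected;     /* next sequence number to release */
        struct entry *held;     /* entries that arrived early      */
    };

    /* push: park the entry until its turn comes */
    void seq_put(struct seq_queue_sketch *queue, struct entry *entry)
    {
        entry->next = queue->held;
        queue->held = entry;
    }

    /* pop: return the next in-sequence entry, or NULL if it has not arrived yet */
    struct entry *seq_get(struct seq_queue_sketch *queue)
    {
        struct entry **prev, *entry;

        for (prev = &queue->held; (entry = *prev) != NULL; prev = &entry->next)
            if (entry->sequence == queue->expected) {
                *prev = entry->next;    /* unlink */
                queue->expected++;
                return entry;
            }

        return NULL;
    }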
|
9ddeae253410ec344ad8345f722f952749f01419 |
|
09-May-2013 |
Phillip Lougher <phillip@squashfs.org.uk> |
caches-queues-lists: add a hash table macro implementation and implement insert/remove_cache_hash_table using these macros. This allows the hash table insert/remove code to be reused, without ending up with lots of different cut and pasted copies. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
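A hedged sketch of the kind of macros the commit describes (names, the table size and the hash function are assumed; the real macros are in caches-queues-lists.h and may differ):

    #include <stddef.h>

    #define HASH_SIZE 65536
    #define CALCULATE_HASH(index) ((index) & (HASH_SIZE - 1))

    /* minimal stand-in structures so the sketch is self-contained */
    struct file_buffer {
        long long index;
        struct file_buffer *hash_next, *hash_prev;
    };

    struct cache {
        struct file_buffer *hash_table[HASH_SIZE];
    };

    /* generate insert_NAME_hash_table(): push the entry onto the head of
     * its hash bucket's doubly-linked chain */
    #define INSERT_HASH_TABLE(NAME, TYPE) \
    void insert_##NAME##_hash_table(TYPE *container, struct file_buffer *entry) \
    { \
        int hash = CALCULATE_HASH(entry->index); \
    \
        entry->hash_next = container->hash_table[hash]; \
        container->hash_table[hash] = entry; \
        entry->hash_prev = NULL; \
        if (entry->hash_next) \
            entry->hash_next->hash_prev = entry; \
    }

    /* generate remove_NAME_hash_table(): unlink the entry from its chain */
    #define REMOVE_HASH_TABLE(NAME, TYPE) \
    void remove_##NAME##_hash_table(TYPE *container, struct file_buffer *entry) \
    { \
        if (entry->hash_prev) \
            entry->hash_prev->hash_next = entry->hash_next; \
        else \
            container->hash_table[CALCULATE_HASH(entry->index)] = entry->hash_next; \
        if (entry->hash_next) \
            entry->hash_next->hash_prev = entry->hash_prev; \
    \
        entry->hash_prev = entry->hash_next = NULL; \
    }

    /* expanded once for the cache, giving the two functions named above */
    INSERT_HASH_TABLE(cache, struct cache)
    REMOVE_HASH_TABLE(cache, struct cache)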
|
b4edf5dd9588f281a9d45b893f6be42b282d4ca1 |
|
01-May-2013 |
Phillip Lougher <phillip@squashfs.org.uk> |
caches-queues-lists: add code tracking max historical size of shrinking cache Plus add missing code to update the cache->used variable in cache_lookup() for non-shrinking lookup caches if the lookup moves a buffer from the freelist. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
|
316ab63e3361639621b90a02bbb31d30990dbb63 |
|
29-Apr-2013 |
Phillip Lougher <phillip@squashfs.org.uk> |
caches-queues-lists: get rid of "keep" blocks and make behaviour more explicit

Originally the caches in mksquashfs were "buffer pools" that expanded as necessary when readahead/number of buffers in flight increased, and shrank as the number of buffers in flight decreased. It was realised that the "buffer pools" could also be used to look up blocks in duplicate checking. Firstly, when the fragment/blocks of the file being duplicate checked were "still in flight", it was faster to look them up in the "buffer pool" rather than have to wait for the blocks to be written to disk (as was originally necessary because blocks in flight were inaccessible). Secondly, when blocks were given back to the "buffer pool", if rather than shrinking the cache the blocks were put onto a freelist, duplicate checking might find the blocks there, which is faster than having to read them back from the output filesystem.

From this evolved the concept of "keep" blocks: when a block was obtained from the cache via cache_get(), if the keep flag was set this signified that the block was to be kept when released via cache_put(). This was perfectly adequate until now, when code is being added to monitor/record cache behaviour. As the keep flag is a per block flag, there is no cache specific information that indicates whether a cache is being used as a non-shrinking lookup cache or as a shrinking non-lookup based "cache" (in this case it is really acting as a buffer pool). This is important because it turns out the statistics to be gathered differ depending on the cache usage. A non-shrinking lookup cache has count, which represents the maximum size the cache has grown to, and used, which represents the size of the cache currently in use. A shrinking non-lookup cache on the other hand shrinks as blocks become unused, and so count and used are one and the same. Given this, there is no way of determining what maximum size the shrinking cache has historically grown to using just count and used; used is completely redundant, and the maximum historical size is not recorded.

So as a first step, rearchitect the "keep" code, moving the flag from being a per block flag to being a per cache flag, making it explicit whether the cache is being used as a non-shrinking lookup cache or as a shrinking non-lookup based cache.

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
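A sketch of the change (names assumed; illustrative only): the behaviour becomes a property of the cache, set once at creation, instead of a "keep" flag passed on every cache_get().

    struct cache_sketch {
        int noshrink_lookup;    /* 1: non-shrinking lookup cache, 0: shrinking buffer pool */
        int count;              /* buffers currently allocated */
        int used;               /* buffers currently in use    */
        /* ... free list, hash table, mutex ... */
    };

    /* assumed helpers */
    extern void add_to_free_list(struct cache_sketch *cache, void *buffer);
    extern void release_buffer(struct cache_sketch *cache, void *buffer);

    /* on release: a lookup cache keeps the buffer on its free list so later
     * duplicate checks may still find it; a plain buffer pool shrinks by
     * freeing the buffer */
    void cache_put_sketch(struct cache_sketch *cache, void *buffer)
    {
        cache->used--;
        if (cache->noshrink_lookup)
            add_to_free_list(cache, buffer);
        else {
            cache->count--;
            release_buffer(cache, buffer);
        }
    }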
|
7a241a44cb40422c2f59e47e50c866ff78356b6c |
|
29-Apr-2013 |
Phillip Lougher <phillip@squashfs.org.uk> |
info: add a used parameter to the cache dump information Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
|
e3ef7b81ce21d8da7007daeac4c351d2c2c11f9d |
|
27-Apr-2013 |
Phillip Lougher <phillip@squashfs.org.uk> |
info: add code to dump cache state Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
|
7538d74f6fbefc11f20fe33ab75d5f197b2dee5c |
|
22-Apr-2013 |
Phillip Lougher <phillip@squashfs.org.uk> |
info: add initial code to dump queue state when sent SIGHUP Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
|
c97dac5d763d3947b2370b8186611fa655864fcc |
|
14-Apr-2013 |
Phillip Lougher <phillip@squashfs.org.uk> |
queues-caches-lists: move cache freelist allocate logic into here, and make it a per cache option. On appending we want to change the allocate logic for the fragment and writer caches to be grow first, rather than allocate from the free list first. This is because, on appending, with the use-free-list-first logic the caches never grow very large, leading to reduced cache effectiveness. By making it a per cache option we fix the issue that on appending we also made the reader cache grow first. This was *not* a major problem, except that because we never do lookups on the reader cache we don't need to do it! This commit is mainly because it should always have been a cache private variable rather than a global. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
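A sketch of the per-cache allocation policy described above (names assumed; illustrative only, not the actual cache code):

    #include <stdlib.h>

    struct buffer_sketch {
        struct buffer_sketch *free_next;
        /* ... */
    };

    struct cache_sketch {
        int first_freelist;     /* 1: reuse free list first, 0: grow the cache first */
        int count, max_buffers;
        struct buffer_sketch *free_list;
    };

    static struct buffer_sketch *from_free_list(struct cache_sketch *cache)
    {
        struct buffer_sketch *buffer = cache->free_list;

        cache->free_list = buffer->free_next;
        return buffer;
    }

    /* On append the fragment and writer caches want grow-first
     * (first_freelist == 0), otherwise they never grow very large and
     * cache effectiveness suffers. */
    struct buffer_sketch *cache_alloc_sketch(struct cache_sketch *cache)
    {
        if (cache->first_freelist && cache->free_list)
            return from_free_list(cache);

        if (cache->count < cache->max_buffers) {
            cache->count++;
            return calloc(1, sizeof(struct buffer_sketch));
        }

        if (cache->free_list)
            return from_free_list(cache);

        return NULL;    /* caller would sleep until a buffer is released */
    }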
|
7ffdf2a24b1871d251b7e97b1c8492375fc2b19d |
|
14-Apr-2013 |
Phillip Lougher <phillip@squashfs.org.uk> |
caches-queues-lists: move definitions of {insert|remove}_fragment_list from mksquashfs.c to here. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
|
71f3964ce67bf6c9eb4816872cd5db1d69d8cf28 |
|
09-Apr-2013 |
Phillip Lougher <phillip@squashfs.org.uk> |
mksquashfs: move the caches, queues and lists implementations to a separate file, and out of mksquashfs.c. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
/external/squashfs-tools/squashfs-tools/caches-queues-lists.h
|