/*
 * Copyright (c) 2012, 2013, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.  Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

/**
 * Classes to support functional-style operations on streams of elements, such
 * as map-reduce transformations on collections.  For example:
 *
 * <pre>{@code
 *     int sum = widgets.stream()
 *                      .filter(b -> b.getColor() == RED)
 *                      .mapToInt(b -> b.getWeight())
 *                      .sum();
 * }</pre>
 *
 * <p>Here we use {@code widgets}, a {@code Collection<Widget>},
 * as a source for a stream, and then perform a filter-map-reduce on the stream
 * to obtain the sum of the weights of the red widgets.  (Summation is an
 * example of a <a href="package-summary.html#Reduction">reduction</a>
 * operation.)
 *
 * <p>The key abstraction introduced in this package is <em>stream</em>.  The
 * classes {@link java.util.stream.Stream}, {@link java.util.stream.IntStream},
 * {@link java.util.stream.LongStream}, and {@link java.util.stream.DoubleStream}
 * are streams over objects and the primitive {@code int}, {@code long} and
 * {@code double} types.  Streams differ from collections in several ways:
 *
 * <ul>
 *     <li>No storage.  A stream is not a data structure that stores elements;
 *     instead, it conveys elements from a source such as a data structure,
 *     an array, a generator function, or an I/O channel, through a pipeline of
 *     computational operations.</li>
 *     <li>Functional in nature.  An operation on a stream produces a result,
 *     but does not modify its source.  For example, filtering a {@code Stream}
 *     obtained from a collection produces a new {@code Stream} without the
 *     filtered elements, rather than removing elements from the source
 *     collection.</li>
 *     <li>Laziness-seeking.  Many stream operations, such as filtering, mapping,
 *     or duplicate removal, can be implemented lazily, exposing opportunities
 *     for optimization.  For example, "find the first {@code String} with
 *     three consecutive vowels" need not examine all the input strings.
 *     Stream operations are divided into intermediate ({@code Stream}-producing)
 *     operations and terminal (value- or side-effect-producing) operations.
 *     Intermediate operations are always lazy.</li>
 *     <li>Possibly unbounded.  While collections have a finite size, streams
 *     need not.  Short-circuiting operations such as {@code limit(n)} or
 *     {@code findFirst()} can allow computations on infinite streams to
 *     complete in finite time.</li>
 *     <li>Consumable.  The elements of a stream are only visited once during
 *     the life of a stream.  Like an {@link java.util.Iterator}, a new stream
 *     must be generated to revisit the same elements of the source.</li>
 * </ul>
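 *
 * <p>The "possibly unbounded" and "consumable" properties can be observed
 * directly.  The following self-contained sketch (illustrative only; the class
 * name is hypothetical) truncates an infinite stream with the short-circuiting
 * {@code limit(n)}, then shows that a stream cannot be traversed twice:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamTraits {
    public static void main(String[] args) {
        // Possibly unbounded: an infinite source, made finite by limit(n)
        List<Integer> squares = Stream.iterate(1, n -> n + 1)
                                      .map(n -> n * n)
                                      .limit(3)
                                      .collect(Collectors.toList());
        System.out.println(squares);  // prints [1, 4, 9]

        // Consumable: a stream may be traversed only once
        Stream<String> s = Stream.of("a", "b");
        s.count();
        try {
            s.count();  // second traversal
        } catch (IllegalStateException e) {
            System.out.println("stream already consumed");
        }
    }
}
```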
 *
 * Streams can be obtained in a number of ways. Some examples include:
 * <ul>
 *     <li>From a {@link java.util.Collection} via the {@code stream()} and
 *     {@code parallelStream()} methods;</li>
 *     <li>From an array via {@link java.util.Arrays#stream(Object[])};</li>
 *     <li>From static factory methods on the stream classes, such as
 *     {@link java.util.stream.Stream#of(Object[])},
 *     {@link java.util.stream.IntStream#range(int, int)}
 *     or {@link java.util.stream.Stream#iterate(Object, UnaryOperator)}.</li>
 * </ul>
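 *
 * <p>The sources listed above can be exercised in a few lines.  This sketch
 * (illustrative only; the class name is hypothetical) obtains streams from a
 * {@code Collection}, an array, and the static factories:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class StreamSources {
    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "c");
        long fromCollection = letters.stream().count();           // from a Collection
        long fromArray = Arrays.stream(new int[] {1, 2}).count(); // from an array
        long fromFactory = Stream.of("x", "y", "z").count();      // Stream.of
        int rangeSum = IntStream.range(0, 5).sum();               // 0+1+2+3+4
        System.out.println(fromCollection + " " + fromArray + " "
                           + fromFactory + " " + rangeSum);       // prints 3 2 3 10
    }
}
```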
 *
 * <p>Additional stream sources can be provided by third-party libraries using
 * <a href="package-summary.html#StreamSources">these techniques</a>.
 *
 * <h2><a name="StreamOps">Stream operations and pipelines</a></h2>
 *
 * <p>Stream operations are divided into <em>intermediate</em> and
 * <em>terminal</em> operations, and are combined to form <em>stream
 * pipelines</em>.  A stream pipeline consists of a source (such as a
 * {@code Collection}, an array, a generator function, or an I/O channel);
 * followed by zero or more intermediate operations such as
 * {@code Stream.filter} or {@code Stream.map}; and a terminal operation such
 * as {@code Stream.forEach} or {@code Stream.reduce}.
 *
 * <p>Intermediate operations return a new stream.  They are always
 * <em>lazy</em>; executing an intermediate operation such as
 * {@code filter()} does not actually perform any filtering, but instead
 * creates a new stream that, when traversed, contains the elements of
 * the initial stream that match the given predicate.  Traversal
 * of the pipeline source does not begin until the terminal operation of the
 * pipeline is executed.
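 *
 * <p>Laziness can be made visible by counting how often a behavioral parameter
 * runs.  In this sketch (illustrative only; names are hypothetical), the filter
 * predicate does not execute at all until the terminal {@code count()} is invoked:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazyPipeline {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Stream<String> pipeline = Stream.of("one", "two", "three")
                                        .filter(w -> {
                                            calls.incrementAndGet();
                                            return w.length() == 3;
                                        });
        // No terminal operation yet: the predicate has never run
        System.out.println(calls.get());   // prints 0
        long matched = pipeline.count();   // terminal operation triggers traversal
        System.out.println(calls.get());   // prints 3
        System.out.println(matched);       // prints 2 ("one" and "two")
    }
}
```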
 *
 * <p>Terminal operations, such as {@code Stream.forEach} or
 * {@code IntStream.sum}, may traverse the stream to produce a result or a
 * side-effect.  After the terminal operation is performed, the stream pipeline
 * is considered consumed, and can no longer be used; if you need to traverse
 * the same data source again, you must return to the data source to get a new
 * stream.  In almost all cases, terminal operations are <em>eager</em>,
 * completing their traversal of the data source and processing of the pipeline
 * before returning.  Only the terminal operations {@code iterator()} and
 * {@code spliterator()} are not; these are provided as an "escape hatch" to enable
 * arbitrary client-controlled pipeline traversals in the event that the
 * existing operations are not sufficient to the task.
 *
 * <p>Processing streams lazily allows for significant efficiencies; in a
 * pipeline such as the filter-map-sum example above, filtering, mapping, and
 * summing can be fused into a single pass on the data, with minimal
 * intermediate state.  Laziness also allows avoiding examining all the data
 * when it is not necessary; for operations such as "find the first string
 * longer than 1000 characters", it suffices to examine just enough strings to
 * find one that has the desired characteristic, without examining all of the
 * strings available from the source.  (This behavior becomes even more
 * important when the input stream is infinite and not merely large.)
 *
 * <p>Intermediate operations are further divided into <em>stateless</em>
 * and <em>stateful</em> operations.  Stateless operations, such as {@code filter}
 * and {@code map}, retain no state from previously seen elements when processing
 * a new element -- each element can be processed
 * independently of operations on other elements.  Stateful operations, such as
 * {@code distinct} and {@code sorted}, may incorporate state from previously
 * seen elements when processing new elements.
 *
 * <p>Stateful operations may need to process the entire input
 * before producing a result.  For example, one cannot produce any results from
 * sorting a stream until one has seen all elements of the stream.  As a result,
 * under parallel computation, some pipelines containing stateful intermediate
 * operations may require multiple passes on the data or may need to buffer
 * significant data.  Pipelines containing exclusively stateless intermediate
 * operations can be processed in a single pass, whether sequential or parallel,
 * with minimal data buffering.
 *
 * <p>Further, some operations are deemed <em>short-circuiting</em> operations.
 * An intermediate operation is short-circuiting if, when presented with
 * infinite input, it may produce a finite stream as a result.  A terminal
 * operation is short-circuiting if, when presented with infinite input, it may
 * terminate in finite time.  Having a short-circuiting operation in the pipeline
 * is a necessary, but not sufficient, condition for the processing of an infinite
 * stream to terminate normally in finite time.
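 *
 * <p>For example (a sketch, not part of this specification; the class name is
 * hypothetical), the source below is infinite, but the short-circuiting
 * terminal operation {@code findFirst()} lets the pipeline complete in finite
 * time:

```java
import java.util.stream.Stream;

public class ShortCircuitDemo {
    public static void main(String[] args) {
        // The source is infinite, but findFirst() is short-circuiting,
        // so traversal stops once a match is found.
        int firstMultipleOfSeven = Stream.iterate(1, n -> n + 1)
                                         .filter(n -> n % 7 == 0)
                                         .findFirst()
                                         .get();
        System.out.println(firstMultipleOfSeven);  // prints 7
    }
}
```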
 *
 * <h3>Parallelism</h3>
 *
 * <p>Processing elements with an explicit {@code for} loop is inherently serial.
 * Streams facilitate parallel execution by reframing the computation as a pipeline of
 * aggregate operations, rather than as imperative operations on each individual
 * element.  All stream operations can execute either in serial or in parallel.
 * The stream implementations in the JDK create serial streams unless parallelism is
 * explicitly requested.  For example, {@code Collection} has methods
 * {@link java.util.Collection#stream} and {@link java.util.Collection#parallelStream},
 * which produce sequential and parallel streams respectively; other
 * stream-bearing methods such as {@link java.util.stream.IntStream#range(int, int)}
 * produce sequential streams but these streams can be efficiently parallelized by
 * invoking their {@link java.util.stream.BaseStream#parallel()} method.
 * To execute the prior "sum of weights of widgets" query in parallel, we would
 * do:
 *
 * <pre>{@code
 *     int sumOfWeights = widgets.}<code><b>parallelStream()</b></code>{@code
 *                               .filter(b -> b.getColor() == RED)
 *                               .mapToInt(b -> b.getWeight())
 *                               .sum();
 * }</pre>
 *
 * <p>The only difference between the serial and parallel versions of this
 * example is the creation of the initial stream, using "{@code parallelStream()}"
 * instead of "{@code stream()}".  Whether a stream will execute in serial or
 * parallel can be determined with the {@code isParallel()} method, and the
 * orientation of a stream can be modified with the
 * {@link java.util.stream.BaseStream#sequential()} and
 * {@link java.util.stream.BaseStream#parallel()} operations.  When the terminal
 * operation is initiated, the stream pipeline is executed sequentially or in
 * parallel depending on the mode of the stream on which it is invoked.
 *
 * <p>Except for operations identified as explicitly nondeterministic, such
 * as {@code findAny()}, whether a stream executes sequentially or in parallel
 * should not change the result of the computation.
 *
 * <p>Most stream operations accept parameters that describe user-specified
 * behavior, which are often lambda expressions.  To preserve correct behavior,
 * these <em>behavioral parameters</em> must be <em>non-interfering</em>, and in
 * most cases must be <em>stateless</em>.  Such parameters are always instances
 * of a <a href="../function/package-summary.html">functional interface</a> such
 * as {@link java.util.function.Function}, and are often lambda expressions or
 * method references.
 *
 * <h3><a name="NonInterference">Non-interference</a></h3>
 *
 * Streams enable you to execute possibly-parallel aggregate operations over a
 * variety of data sources, including even non-thread-safe collections such as
 * {@code ArrayList}.  This is possible only if we can prevent
 * <em>interference</em> with the data source during the execution of a stream
 * pipeline.  Except for the escape-hatch operations {@code iterator()} and
 * {@code spliterator()}, execution begins when the terminal operation is
 * invoked, and ends when the terminal operation completes.  For most data
 * sources, preventing interference means ensuring that the data source is
 * <em>not modified at all</em> during the execution of the stream pipeline.
 * The notable exception to this is streams whose sources are concurrent
 * collections, which are specifically designed to handle concurrent modification.
 * Concurrent stream sources are those whose {@code Spliterator} reports the
 * {@code CONCURRENT} characteristic.
 *
 * <p>Accordingly, behavioral parameters in stream pipelines whose source might
 * not be concurrent should never modify the stream's data source.
 * A behavioral parameter is said to <em>interfere</em> with a non-concurrent
 * data source if it modifies, or causes to be
 * modified, the stream's data source.  The need for non-interference applies
 * to all pipelines, not just parallel ones.  Unless the stream source is
 * concurrent, modifying a stream's data source during execution of a stream
 * pipeline can cause exceptions, incorrect answers, or nonconformant behavior.
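 *
 * <p>With an {@code ArrayList} source, interference typically surfaces as a
 * {@code ConcurrentModificationException}, although the exact failure mode is
 * not guaranteed.  A sketch (illustrative only; names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

public class InterferenceDemo {
    public static void main(String[] args) {
        List<String> source = new ArrayList<>(Arrays.asList("one", "two"));
        try {
            // The behavioral parameter modifies the non-concurrent source
            // during traversal -- this is interference.
            source.stream().forEach(s -> source.add("three"));
        } catch (ConcurrentModificationException e) {
            System.out.println("interference detected");
        }
    }
}
```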
 *
 * For well-behaved stream sources, the source can be modified before the
 * terminal operation commences and those modifications will be reflected in
 * the covered elements.  For example, consider the following code:
 *
 * <pre>{@code
 *     List<String> l = new ArrayList<>(Arrays.asList("one", "two"));
 *     Stream<String> sl = l.stream();
 *     l.add("three");
 *     String s = sl.collect(joining(" "));
 * }</pre>
 *
 * First a list is created consisting of two strings: "one" and "two".  Then a
 * stream is created from that list.  Next the list is modified by adding a third
 * string: "three".  Finally the elements of the stream are collected and joined
 * together.  Since the list was modified before the terminal {@code collect}
 * operation commenced, the result will be a string of "one two three".  All the
 * streams returned from JDK collections, and most other JDK classes,
 * are well-behaved in this manner; for streams generated by other libraries, see
 * <a href="package-summary.html#StreamSources">Low-level stream
 * construction</a> for requirements for building well-behaved streams.
 *
 * <h3><a name="Statelessness">Stateless behaviors</a></h3>
 *
 * Stream pipeline results may be nondeterministic or incorrect if the behavioral
 * parameters to the stream operations are <em>stateful</em>.  A stateful lambda
 * (or other object implementing the appropriate functional interface) is one
 * whose result depends on any state which might change during the execution
 * of the stream pipeline.  An example of a stateful lambda is the parameter
 * to {@code map()} in:
 *
 * <pre>{@code
 *     Set<Integer> seen = Collections.synchronizedSet(new HashSet<>());
 *     stream.parallel().map(e -> { if (seen.add(e)) return 0; else return e; })...
 * }</pre>
 *
 * Here, if the mapping operation is performed in parallel, the results for the
 * same input could vary from run to run, due to thread scheduling differences,
 * whereas, with a stateless lambda expression the results would always be the
 * same.
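 *
 * <p>Often the statefulness can be eliminated by letting the library manage the
 * state.  In this sketch (illustrative only; the class name is hypothetical),
 * the hand-rolled "seen" set is replaced with the built-in {@code distinct()}
 * operation, which remains deterministic on an ordered stream even in parallel:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StatelessAlternative {
    public static void main(String[] args) {
        // Rather than tracking "seen" elements in a stateful lambda, use the
        // stateful *operation* distinct(), whose state the library manages.
        List<Integer> unique = Stream.of(1, 2, 1, 3, 2)
                                     .parallel()
                                     .distinct()
                                     .collect(Collectors.toList());
        System.out.println(unique);  // prints [1, 2, 3]
    }
}
```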
 *
 * <p>Note also that attempting to access mutable state from behavioral parameters
 * presents you with a bad choice with respect to safety and performance; if
 * you do not synchronize access to that state, you have a data race and
 * therefore your code is broken, but if you do synchronize access to that
 * state, you risk having contention undermine the parallelism you are seeking
 * to benefit from.  The best approach is to avoid stateful behavioral
 * parameters to stream operations entirely; there is usually a way to
 * restructure the stream pipeline to avoid statefulness.
 *
 * <h3>Side-effects</h3>
 *
 * Side-effects in behavioral parameters to stream operations are, in general,
 * discouraged, as they can often lead to unwitting violations of the
 * statelessness requirement, as well as other thread-safety hazards.
 *
 * <p>If the behavioral parameters do have side-effects, unless explicitly
 * stated, there are no guarantees as to the
 * <a href="../concurrent/package-summary.html#MemoryVisibility"><i>visibility</i></a>
 * of those side-effects to other threads, nor are there any guarantees that
 * different operations on the "same" element within the same stream pipeline
 * are executed in the same thread.  Further, the ordering of those effects
 * may be surprising.  Even when a pipeline is constrained to produce a
 * <em>result</em> that is consistent with the encounter order of the stream
 * source (for example, {@code IntStream.range(0,5).parallel().map(x -> x*2).toArray()}
 * must produce {@code [0, 2, 4, 6, 8]}), no guarantees are made as to the order
 * in which the mapper function is applied to individual elements, or in what
 * thread any behavioral parameter is executed for a given element.
 *
 * <p>Many computations where one might be tempted to use side-effects can be more
 * safely and efficiently expressed without side-effects, such as using
 * <a href="package-summary.html#Reduction">reduction</a> instead of mutable
 * accumulators.  However, side-effects such as using {@code println()} for debugging
 * purposes are usually harmless.  A small number of stream operations, such as
 * {@code forEach()} and {@code peek()}, can operate only via side-effects;
 * these should be used with care.
 *
 * <p>As an example of how to transform a stream pipeline that inappropriately
 * uses side-effects to one that does not, the following code searches a stream
 * of strings for those matching a given regular expression, and puts the
 * matches in a list.
 *
 * <pre>{@code
 *     ArrayList<String> results = new ArrayList<>();
 *     stream.filter(s -> pattern.matcher(s).matches())
 *           .forEach(s -> results.add(s));  // Unnecessary use of side-effects!
 * }</pre>
 *
 * This code unnecessarily uses side-effects.  If executed in parallel, the
 * non-thread-safety of {@code ArrayList} would cause incorrect results, and
 * adding needed synchronization would cause contention, undermining the
 * benefit of parallelism.  Furthermore, using side-effects here is completely
 * unnecessary; the {@code forEach()} can simply be replaced with a reduction
 * operation that is safer, more efficient, and more amenable to
 * parallelization:
 *
 * <pre>{@code
 *     List<String> results =
 *         stream.filter(s -> pattern.matcher(s).matches())
 *               .collect(Collectors.toList());  // No side-effects!
 * }</pre>
 *
 * <h3><a name="Ordering">Ordering</a></h3>
 *
 * <p>Streams may or may not have a defined <em>encounter order</em>.  Whether
 * or not a stream has an encounter order depends on the source and the
 * intermediate operations.  Certain stream sources (such as {@code List} or
 * arrays) are intrinsically ordered, whereas others (such as {@code HashSet})
 * are not.  Some intermediate operations, such as {@code sorted()}, may impose
 * an encounter order on an otherwise unordered stream, and others may render an
 * ordered stream unordered, such as {@link java.util.stream.BaseStream#unordered()}.
 * Further, some terminal operations may ignore encounter order, such as
 * {@code forEach()}.
 *
 * <p>If a stream is ordered, most operations are constrained to operate on the
 * elements in their encounter order; if the source of a stream is a {@code List}
 * containing {@code [1, 2, 3]}, then the result of executing {@code map(x -> x*2)}
 * must be {@code [2, 4, 6]}.  However, if the source has no defined encounter
 * order, then any permutation of the values {@code [2, 4, 6]} would be a valid
 * result.
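 *
 * <p>This constraint holds even for parallel pipelines.  A sketch (illustrative
 * only; the class name is hypothetical) of the {@code [2, 4, 6]} case:

```java
import java.util.Arrays;

public class OrderingDemo {
    public static void main(String[] args) {
        // A List source has a defined encounter order, so map() must
        // preserve it -- even when the pipeline runs in parallel.
        int[] doubled = Arrays.asList(1, 2, 3).parallelStream()
                              .mapToInt(x -> x * 2)
                              .toArray();
        System.out.println(Arrays.toString(doubled));  // prints [2, 4, 6]
    }
}
```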
 *
 * <p>For sequential streams, the presence or absence of an encounter order does
 * not affect performance, only determinism.  If a stream is ordered, repeated
 * execution of identical stream pipelines on an identical source will produce
 * an identical result; if it is not ordered, repeated execution might produce
 * different results.
 *
 * <p>For parallel streams, relaxing the ordering constraint can sometimes enable
 * more efficient execution.  Certain aggregate operations,
 * such as filtering duplicates ({@code distinct()}) or grouped reductions
 * ({@code Collectors.groupingBy()}), can be implemented more efficiently if ordering
 * of elements is not relevant.  Similarly, operations that are intrinsically tied
 * to encounter order, such as {@code limit()}, may require
 * buffering to ensure proper ordering, undermining the benefit of parallelism.
 * In cases where the stream has an encounter order, but the user does not
 * particularly <em>care</em> about that encounter order, explicitly de-ordering
 * the stream with {@link java.util.stream.BaseStream#unordered() unordered()} may
 * improve parallel performance for some stateful or terminal operations.
 * However, most stream pipelines, such as the "sum of weights of widgets" example
 * above, still parallelize efficiently even under ordering constraints.
 *
 * <h2><a name="Reduction">Reduction operations</a></h2>
 *
 * A <em>reduction</em> operation (also called a <em>fold</em>) takes a sequence
 * of input elements and combines them into a single summary result by repeated
 * application of a combining operation, such as finding the sum or maximum of
 * a set of numbers, or accumulating elements into a list.  The streams classes have
 * multiple forms of general reduction operations, called
 * {@link java.util.stream.Stream#reduce(java.util.function.BinaryOperator) reduce()}
 * and {@link java.util.stream.Stream#collect(java.util.stream.Collector) collect()},
 * as well as multiple specialized reduction forms such as
 * {@link java.util.stream.IntStream#sum() sum()}, {@link java.util.stream.IntStream#max() max()},
 * or {@link java.util.stream.IntStream#count() count()}.
 *
 * <p>Of course, such operations can be readily implemented as simple sequential
 * loops, as in:
 * <pre>{@code
 *    int sum = 0;
 *    for (int x : numbers) {
 *       sum += x;
 *    }
 * }</pre>
 * However, there are good reasons to prefer a reduce operation
 * over a mutative accumulation such as the above.  Not only is a reduction
 * "more abstract" -- it operates on the stream as a whole rather than individual
 * elements -- but a properly constructed reduce operation is inherently
 * parallelizable, so long as the function(s) used to process the elements
 * are <a href="package-summary.html#Associativity">associative</a> and
 * <a href="package-summary.html#NonInterfering">stateless</a>.
 * For example, given a stream of numbers for which we want to find the sum, we
 * can write:
 * <pre>{@code
 *    int sum = numbers.stream().reduce(0, (x,y) -> x+y);
 * }</pre>
 * or:
 * <pre>{@code
 *    int sum = numbers.stream().reduce(0, Integer::sum);
 * }</pre>
 *
 * <p>These reduction operations can run safely in parallel with almost no
 * modification:
 * <pre>{@code
 *    int sum = numbers.parallelStream().reduce(0, Integer::sum);
 * }</pre>
 *
 * <p>Reduction parallelizes well because the implementation
 * can operate on subsets of the data in parallel, and then combine the
 * intermediate results to get the final correct answer.  (Even if the language
 * had a "parallel for-each" construct, the mutative accumulation approach would
 * still require the developer to provide
 * thread-safe updates to the shared accumulating variable {@code sum}, and
 * the required synchronization would then likely eliminate any performance gain from
 * parallelism.)  Using {@code reduce()} instead removes all of the
 * burden of parallelizing the reduction operation, and the library can provide
 * an efficient parallel implementation with no additional synchronization
 * required.
 *
 * <p>The "widgets" example shown earlier shows how reduction combines with
 * other operations to replace for-loops with bulk operations.  If {@code widgets}
 * is a collection of {@code Widget} objects, which have a {@code getWeight} method,
 * we can find the heaviest widget with:
 * <pre>{@code
 *     OptionalInt heaviest = widgets.parallelStream()
 *                                   .mapToInt(Widget::getWeight)
 *                                   .max();
 * }</pre>
 *
 * <p>In its more general form, a {@code reduce} operation on elements of type
 * {@code <T>} yielding a result of type {@code <U>} requires three parameters:
 * <pre>{@code
 * <U> U reduce(U identity,
 *              BiFunction<U, ? super T, U> accumulator,
 *              BinaryOperator<U> combiner);
 * }</pre>
 * Here, the <em>identity</em> element is both an initial seed value for the reduction
 * and a default result if there are no input elements.  The <em>accumulator</em>
 * function takes a partial result and the next element, and produces a new
 * partial result.  The <em>combiner</em> function combines two partial results
 * to produce a new partial result.  (The combiner is necessary in parallel
 * reductions, where the input is partitioned, a partial accumulation computed
 * for each partition, and then the partial results are combined to produce a
 * final result.)
 *
 * <p>More formally, the {@code identity} value must be an <em>identity</em> for
 * the combiner function.  This means that for all {@code u},
 * {@code combiner.apply(identity, u)} is equal to {@code u}.  Additionally, the
 * {@code combiner} function must be <a href="package-summary.html#Associativity">associative</a> and
 * must be compatible with the {@code accumulator} function: for all {@code u}
 * and {@code t}, {@code combiner.apply(u, accumulator.apply(identity, t))} must
 * be {@code equals()} to {@code accumulator.apply(u, t)}.
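 *
 * <p>These laws can be checked concretely.  The sketch below (illustrative
 * only; the functions are hypothetical, not drawn from the API) uses
 * {@code T = String} and {@code U = Integer}, accumulating the sum of string
 * lengths:

```java
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

public class ReduceLaws {
    public static void main(String[] args) {
        Integer identity = 0;
        BiFunction<Integer, String, Integer> accumulator = (sum, s) -> sum + s.length();
        BinaryOperator<Integer> combiner = Integer::sum;

        int u = 17;
        String t = "abc";
        // Identity law: combiner(identity, u) equals u
        System.out.println(combiner.apply(identity, u));                        // prints 17
        // Compatibility: combiner(u, accumulator(identity, t))
        // equals accumulator(u, t)
        System.out.println(combiner.apply(u, accumulator.apply(identity, t)));  // prints 20
        System.out.println(accumulator.apply(u, t));                            // prints 20
    }
}
```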
 *
 * <p>The three-argument form is a generalization of the two-argument form,
 * incorporating a mapping step into the accumulation step.  We could
 * re-cast the simple sum-of-weights example using the more general form as
 * follows:
 * <pre>{@code
 *     int sumOfWeights = widgets.stream()
 *                               .reduce(0,
 *                                       (sum, b) -> sum + b.getWeight(),
 *                                       Integer::sum);
 * }</pre>
 * though the explicit map-reduce form is more readable and therefore should
 * usually be preferred.  The generalized form is provided for cases where
 * significant work can be optimized away by combining mapping and reducing
 * into a single function.
 *
 * <h3><a name="MutableReduction">Mutable reduction</a></h3>
 *
 * A <em>mutable reduction operation</em> accumulates input elements into a
 * mutable result container, such as a {@code Collection} or {@code StringBuilder},
 * as it processes the elements in the stream.
 *
 * <p>If we wanted to take a stream of strings and concatenate them into a
 * single long string, we <em>could</em> achieve this with ordinary reduction:
 * <pre>{@code
 *     String concatenated = strings.reduce("", String::concat);
 * }</pre>
 *
 * <p>We would get the desired result, and it would even work in parallel.  However,
 * we might not be happy about the performance!  Such an implementation would do
 * a great deal of string copying, and the run time would be <em>O(n^2)</em> in
 * the number of characters.  A more performant approach would be to accumulate
 * the results into a {@link java.lang.StringBuilder}, which is a mutable
 * container for accumulating strings.  We can use the same technique to
 * parallelize mutable reduction as we do with ordinary reduction.
 *
 * <p>The mutable reduction operation is called
 * {@link java.util.stream.Stream#collect(Collector) collect()},
 * as it collects together the desired results into a result container such
 * as a {@code Collection}.
 * A {@code collect} operation requires three functions:
 * a supplier function to construct new instances of the result container, an
 * accumulator function to incorporate an input element into a result
 * container, and a combining function to merge the contents of one result
 * container into another.  The form of this is very similar to the general
 * form of ordinary reduction:
 * <pre>{@code
 * <R> R collect(Supplier<R> supplier,
 *               BiConsumer<R, ? super T> accumulator,
 *               BiConsumer<R, R> combiner);
 * }</pre>
 * <p>As with {@code reduce()}, a benefit of expressing {@code collect} in this
 * abstract way is that it is directly amenable to parallelization: we can
 * accumulate partial results in parallel and then combine them, so long as the
 * accumulation and combining functions satisfy the appropriate requirements.
 * For example, to collect the String representations of the elements in a
 * stream into an {@code ArrayList}, we could write the obvious sequential
 * for-each form:
 * <pre>{@code
 *     ArrayList<String> strings = new ArrayList<>();
 *     for (T element : stream) {
 *         strings.add(element.toString());
 *     }
 * }</pre>
 * Or we could use a parallelizable collect form:
 * <pre>{@code
 *     ArrayList<String> strings = stream.collect(() -> new ArrayList<>(),
 *                                                (c, e) -> c.add(e.toString()),
 *                                                (c1, c2) -> c1.addAll(c2));
 * }</pre>
 * or, pulling the mapping operation out of the accumulator function, we could
 * express it more succinctly as:
 * <pre>{@code
 *     List<String> strings = stream.map(Object::toString)
 *                                  .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
 * }</pre>
 * Here, our supplier is just the {@link java.util.ArrayList#ArrayList()
 * ArrayList constructor}, the accumulator adds the stringified element to an
 * {@code ArrayList}, and the combiner simply uses {@link java.util.ArrayList#addAll addAll}
 * to copy the strings from one container into the other.
537 *
538 * <p>The three aspects of {@code collect} -- supplier, accumulator, and
539 * combiner -- are tightly coupled.  We can use the abstraction of a
540 * {@link java.util.stream.Collector} to capture all three aspects.  The
541 * above example for collecting strings into a {@code List} can be rewritten
542 * using a standard {@code Collector} as:
543 * <pre>{@code
544 *     List<String> strings = stream.map(Object::toString)
545 *                                  .collect(Collectors.toList());
546 * }</pre>
547 *
548 * <p>Packaging mutable reductions into a Collector has another advantage:
549 * composability.  The class {@link java.util.stream.Collectors} contains a
550 * number of predefined factories for collectors, including combinators
551 * that transform one collector into another.  For example, suppose we have a
552 * collector that computes the sum of the salaries of a stream of
553 * employees, as follows:
554 *
555 * <pre>{@code
556 *     Collector<Employee, ?, Integer> summingSalaries
557 *         = Collectors.summingInt(Employee::getSalary);
558 * }</pre>
559 *
560 * (The {@code ?} for the second type parameter merely indicates that we don't
561 * care about the intermediate representation used by this collector.)
562 * If we wanted to create a collector to tabulate the sum of salaries by
563 * department, we could reuse {@code summingSalaries} using
564 * {@link java.util.stream.Collectors#groupingBy(java.util.function.Function, java.util.stream.Collector) groupingBy}:
565 *
566 * <pre>{@code
567 *     Map<Department, Integer> salariesByDept
568 *         = employees.stream().collect(Collectors.groupingBy(Employee::getDepartment,
569 *                                                            summingSalaries));
570 * }</pre>
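 *
 * <p>Other downstream collectors compose the same way; for instance, a head
 * count per department could reuse the same classifier with
 * {@link java.util.stream.Collectors#counting()} (a sketch; {@code
 * headcountByDept} is a hypothetical name):
 * <pre>{@code
 *     Map<Department, Long> headcountByDept
 *         = employees.stream().collect(Collectors.groupingBy(Employee::getDepartment,
 *                                                            Collectors.counting()));
 * }</pre>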
 *
 * <p>As with the regular reduction operation, {@code collect()} operations can
 * only be parallelized if appropriate conditions are met.  For any partially
 * accumulated result, combining it with an empty result container must
 * produce an equivalent result.  That is, for a partially accumulated result
 * {@code p} that is the result of any series of accumulator and combiner
 * invocations, {@code p} must be equivalent to
 * {@code combiner.apply(p, supplier.get())}.
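 *
 * <p>To see why this holds for the {@code ArrayList}-based collect shown
 * earlier, note that combining any partial result with a freshly created,
 * empty {@code ArrayList} leaves its contents unchanged (a sketch; {@code
 * partial} is a hypothetical partially accumulated result):
 * <pre>{@code
 *     Supplier<ArrayList<String>> supplier = ArrayList::new;
 *     BiConsumer<ArrayList<String>, ArrayList<String>> combiner = ArrayList::addAll;
 *     ArrayList<String> partial = supplier.get();
 *     partial.add("a");
 *     combiner.accept(partial, supplier.get());
 *     // partial still contains just "a" -- an equivalent result
 * }</pre>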
 *
 * <p>Further, however the computation is split, it must produce an equivalent
 * result.  For any input elements {@code t1} and {@code t2}, the results
 * {@code r1} and {@code r2} in the computation below must be equivalent:
 * <pre>{@code
 *     A a1 = supplier.get();
 *     accumulator.accept(a1, t1);
 *     accumulator.accept(a1, t2);
 *     R r1 = finisher.apply(a1);  // result without splitting
 *
 *     A a2 = supplier.get();
 *     accumulator.accept(a2, t1);
 *     A a3 = supplier.get();
 *     accumulator.accept(a3, t2);
 *     R r2 = finisher.apply(combiner.apply(a2, a3));  // result with splitting
 * }</pre>
 *
 * <p>Here, equivalence generally means according to {@link java.lang.Object#equals(Object)},
 * but in some cases equivalence may be relaxed to account for differences in
 * order.
 *
 * <h3><a name="ConcurrentReduction">Reduction, concurrency, and ordering</a></h3>
 *
 * With some complex reduction operations, for example a {@code collect()} that
 * produces a {@code Map}, such as:
 * <pre>{@code
 *     Map<Buyer, List<Transaction>> salesByBuyer
 *         = txns.parallelStream()
 *               .collect(Collectors.groupingBy(Transaction::getBuyer));
 * }</pre>
 * it may actually be counterproductive to perform the operation in parallel.
 * This is because the combining step (merging one {@code Map} into another by
 * key) can be expensive for some {@code Map} implementations.
 *
 * <p>Suppose, however, that the result container used in this reduction
 * was a concurrently modifiable collection -- such as a
 * {@link java.util.concurrent.ConcurrentHashMap}. In that case, the parallel
 * invocations of the accumulator could actually deposit their results
 * concurrently into the same shared result container, eliminating the need for
 * the combiner to merge distinct result containers. This potentially provides
 * a boost to the parallel execution performance. We call this a
 * <em>concurrent</em> reduction.
 *
 * <p>A {@link java.util.stream.Collector} that supports concurrent reduction is
 * marked with the {@link java.util.stream.Collector.Characteristics#CONCURRENT}
 * characteristic.  However, a concurrent collection also has a downside.  If
 * multiple threads are depositing results concurrently into a shared container,
 * the order in which results are deposited is non-deterministic. Consequently,
 * a concurrent reduction is only possible if ordering is not important for the
 * stream being processed. The {@link java.util.stream.Stream#collect(Collector)}
 * implementation will only perform a concurrent reduction if
 * <ul>
 * <li>The stream is parallel;</li>
 * <li>The collector has the
 * {@link java.util.stream.Collector.Characteristics#CONCURRENT} characteristic,
 * and</li>
 * <li>Either the stream is unordered, or the collector has the
 * {@link java.util.stream.Collector.Characteristics#UNORDERED} characteristic.
 * </ul>
 * You can ensure the stream is unordered by using the
 * {@link java.util.stream.BaseStream#unordered()} method.  For example:
 * <pre>{@code
 *     Map<Buyer, List<Transaction>> salesByBuyer
 *         = txns.parallelStream()
 *               .unordered()
 *               .collect(groupingByConcurrent(Transaction::getBuyer));
 * }</pre>
 * (where {@link java.util.stream.Collectors#groupingByConcurrent} is the
 * concurrent equivalent of {@code groupingBy}).
 *
 * <p>Note that if it is important that the elements for a given key appear in
 * the order they appear in the source, then we cannot use a concurrent
 * reduction, as ordering is one of the casualties of concurrent insertion.
 * We would then be constrained to implement either a sequential reduction or
 * a merge-based parallel reduction.
 *
 * <h3><a name="Associativity">Associativity</a></h3>
 *
 * An operator or function {@code op} is <em>associative</em> if the following
 * holds:
 * <pre>{@code
 *     (a op b) op c == a op (b op c)
 * }</pre>
 * The importance of this to parallel evaluation can be seen if we expand this
 * to four terms:
 * <pre>{@code
 *     a op b op c op d == (a op b) op (c op d)
 * }</pre>
 * So we can evaluate {@code (a op b)} in parallel with {@code (c op d)}, and
 * then invoke {@code op} on the results.
 *
 * <p>Examples of associative operations include numeric addition, min, and
 * max, and string concatenation.
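 *
 * <p>For instance, because integer addition is associative, the following
 * parallel reduction produces the same total no matter how the runtime splits
 * the range (a sketch):
 * <pre>{@code
 *     int total = IntStream.rangeClosed(1, 100)
 *                          .parallel()
 *                          .reduce(0, Integer::sum);  // always 5050
 * }</pre>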
 *
 * <h2><a name="StreamSources">Low-level stream construction</a></h2>
 *
 * So far, all the stream examples have used methods like
 * {@link java.util.Collection#stream()} or {@link java.util.Arrays#stream(Object[])}
 * to obtain a stream.  How are those stream-bearing methods implemented?
 *
 * <p>The class {@link java.util.stream.StreamSupport} has a number of
 * low-level methods for creating a stream, all using some form of a
 * {@link java.util.Spliterator}. A spliterator is the parallel analogue of an
 * {@link java.util.Iterator}; it describes a (possibly infinite) collection of
 * elements, with support for sequentially advancing, bulk traversal, and
 * splitting off some portion of the input into another spliterator which can
 * be processed in parallel.  At the lowest level, all streams are driven by a
 * spliterator.
 *
 * <p>There are a number of choices to make in implementing a spliterator,
 * nearly all of which are tradeoffs between simplicity of implementation and
 * the runtime performance of streams using that spliterator.  The simplest,
 * but least performant, way to create a spliterator is to create one from an
 * iterator using
 * {@link java.util.Spliterators#spliteratorUnknownSize(java.util.Iterator, int)}.
 * While such a spliterator will work, it will likely offer poor parallel
 * performance, since we have lost sizing information (how big is the
 * underlying data set) and are constrained to a simplistic splitting
 * algorithm.
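 *
 * <p>For example, a stream could be built from a plain iterator as follows
 * (a sketch; {@code iterator} stands for any {@code Iterator<String>}):
 * <pre>{@code
 *     Stream<String> s = StreamSupport.stream(
 *         Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED),
 *         false);
 * }</pre>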
 *
 * <p>A higher-quality spliterator will provide balanced and known-size
 * splits, accurate sizing information, and a number of other
 * {@link java.util.Spliterator#characteristics() characteristics} of the
 * spliterator or data that can be used by implementations to optimize
 * execution.
 *
 * <p>Spliterators for mutable data sources have an additional challenge:
 * timing of binding to the data, since the data could change between the time
 * the spliterator is created and the time the stream pipeline is executed.
 * Ideally, a spliterator for a stream would report a characteristic of
 * {@code IMMUTABLE} or {@code CONCURRENT}; if not it should be
 * <a href="../Spliterator.html#binding"><em>late-binding</em></a>. If a source
 * cannot directly supply a recommended spliterator, it may indirectly supply
 * a spliterator using a {@code Supplier}, and construct a stream via the
 * {@code Supplier}-accepting versions of
 * {@link java.util.stream.StreamSupport#stream(java.util.function.Supplier, int, boolean) stream()}.
 * The spliterator is obtained from the supplier only after the terminal
 * operation of the stream pipeline commences.
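 *
 * <p>Such a supplier-based construction might look like the following
 * (a sketch; {@code list} is a hypothetical mutable source that may be
 * modified up until the terminal operation begins):
 * <pre>{@code
 *     List<String> list = new ArrayList<>();
 *     Stream<String> s = StreamSupport.stream(
 *         () -> Spliterators.spliterator(list, Spliterator.ORDERED),
 *         Spliterator.ORDERED | Spliterator.SIZED,
 *         false);
 * }</pre>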
 *
 * <p>These requirements significantly reduce the scope of potential
 * interference between mutations of the stream source and execution of stream
 * pipelines. Streams based on spliterators with the desired characteristics,
 * or those using the Supplier-based factory forms, are immune to
 * modifications of the data source prior to commencement of the terminal
 * operation (provided the behavioral parameters to the stream operations meet
 * the required criteria for non-interference and statelessness).  See
 * <a href="package-summary.html#NonInterference">Non-Interference</a>
 * for more details.
 *
 * @since 1.8
 */
package java.util.stream;

import java.util.function.BinaryOperator;
import java.util.function.UnaryOperator;
