diff -r 000000000000 -r 212417b74b72 rt/emul/compact/src/main/java/java/util/concurrent/Exchanger.java
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/rt/emul/compact/src/main/java/java/util/concurrent/Exchanger.java Sat Mar 19 10:46:31 2016 +0100
@@ -0,0 +1,687 @@
+/*
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This code is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 only, as
+ * published by the Free Software Foundation. Oracle designates this
+ * particular file as subject to the "Classpath" exception as provided
+ * by Oracle in the LICENSE file that accompanied this code.
+ *
+ * This code is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * version 2 for more details (a copy is included in the LICENSE file that
+ * accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License version
+ * 2 along with this work; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
+ * or visit www.oracle.com if you need additional information or have any
+ * questions.
+ */
+
+/*
+ * This file is available under and governed by the GNU General Public
+ * License version 2 only, as published by the Free Software Foundation.
+ * However, the following notice accompanied the original version of this
+ * file:
+ *
+ * Written by Doug Lea, Bill Scherer, and Michael Scott with
+ * assistance from members of JCP JSR-166 Expert Group and released to
+ * the public domain, as explained at
+ * http://creativecommons.org/publicdomain/zero/1.0/
+ */
+
+package java.util.concurrent;
+import java.util.concurrent.atomic.*;
+import java.util.concurrent.locks.LockSupport;
+
+/**
+ * A synchronization point at which threads can pair and swap elements
+ * within pairs. Each thread presents some object on entry to the
+ * {@link #exchange exchange} method, matches with a partner thread,
+ * and receives its partner's object on return. An Exchanger may be
+ * viewed as a bidirectional form of a {@link SynchronousQueue}.
+ * Exchangers may be useful in applications such as genetic algorithms
+ * and pipeline designs.
+ *
+ *
Sample Usage:
+ * Here are the highlights of a class that uses an {@code Exchanger}
+ * to swap buffers between threads so that the thread filling the
+ * buffer gets a freshly emptied one when it needs it, handing off the
+ * filled one to the thread emptying the buffer.
+ *
Memory consistency effects: For each pair of threads that
+ * successfully exchange objects via an {@code Exchanger}, actions
+ * prior to the {@code exchange()} in each thread
+ * happen-before
+ * those subsequent to a return from the corresponding {@code exchange()}
+ * in the other thread.
+ *
+ * @since 1.5
+ * @author Doug Lea and Bill Scherer and Michael Scott
+ * @param The type of objects that may be exchanged
+ */
+public class Exchanger {
+ /*
+ * Algorithm Description:
+ *
+ * The basic idea is to maintain a "slot", which is a reference to
+ * a Node containing both an Item to offer and a "hole" waiting to
+ * get filled in. If an incoming "occupying" thread sees that the
+ * slot is null, it CAS'es (compareAndSets) a Node there and waits
+ * for another to invoke exchange. That second "fulfilling" thread
+ * sees that the slot is non-null, and so CASes it back to null,
+ * also exchanging items by CASing the hole, plus waking up the
+ * occupying thread if it is blocked. In each case CAS'es may
+ * fail because a slot at first appears non-null but is null upon
+ * CAS, or vice-versa. So threads may need to retry these
+ * actions.
+ *
+ * This simple approach works great when there are only a few
+ * threads using an Exchanger, but performance rapidly
+ * deteriorates due to CAS contention on the single slot when
+ * there are lots of threads using an exchanger. So instead we use
+ * an "arena"; basically a kind of hash table with a dynamically
+ * varying number of slots, any one of which can be used by
+ * threads performing an exchange. Incoming threads pick slots
+ * based on a hash of their Thread ids. If an incoming thread
+ * fails to CAS in its chosen slot, it picks an alternative slot
+ * instead. And similarly from there. If a thread successfully
+ * CASes into a slot but no other thread arrives, it tries
+ * another, heading toward the zero slot, which always exists even
+ * if the table shrinks. The particular mechanics controlling this
+ * are as follows:
+ *
+ * Waiting: Slot zero is special in that it is the only slot that
+ * exists when there is no contention. A thread occupying slot
+ * zero will block if no thread fulfills it after a short spin.
+ * In other cases, occupying threads eventually give up and try
+ * another slot. Waiting threads spin for a while (a period that
+ * should be a little less than a typical context-switch time)
+ * before either blocking (if slot zero) or giving up (if other
+ * slots) and restarting. There is no reason for threads to block
+ * unless there are unlikely to be any other threads present.
+ * Occupants are mainly avoiding memory contention so sit there
+ * quietly polling for a shorter period than it would take to
+ * block and then unblock them. Non-slot-zero waits that elapse
+ * because of lack of other threads waste around one extra
+ * context-switch time per try, which is still on average much
+ * faster than alternative approaches.
+ *
+ * Sizing: Usually, using only a few slots suffices to reduce
+ * contention. Especially with small numbers of threads, using
+ * too many slots can lead to just as poor performance as using
+ * too few of them, and there's not much room for error. The
+ * variable "max" maintains the number of slots actually in
+ * use. It is increased when a thread sees too many CAS
+ * failures. (This is analogous to resizing a regular hash table
+ * based on a target load factor, except here, growth steps are
+ * just one-by-one rather than proportional.) Growth requires
+ * contention failures in each of three tried slots. Requiring
+ * multiple failures for expansion copes with the fact that some
+ * failed CASes are not due to contention but instead to simple
+ * races between two threads or thread pre-emptions occurring
+ * between reading and CASing. Also, very transient peak
+ * contention can be much higher than the average sustainable
+ * levels. An attempt to decrease the max limit is usually made
+ * when a non-slot-zero wait elapses without being fulfilled.
+ * Threads experiencing elapsed waits move closer to zero, so
+ * eventually find existing (or future) threads even if the table
+ * has been shrunk due to inactivity. The chosen mechanics and
+ * thresholds for growing and shrinking are intrinsically
+ * entangled with indexing and hashing inside the exchange code,
+ * and can't be nicely abstracted out.
+ *
+ * Hashing: Each thread picks its initial slot to use in accord
+ * with a simple hashcode. The sequence is the same on each
+ * encounter by any given thread, but effectively random across
+ * threads. Using arenas encounters the classic cost vs quality
+ * tradeoffs of all hash tables. Here, we use a one-step FNV-1a
+ * hash code based on the current thread's Thread.getId(), along
+ * with a cheap approximation to a mod operation to select an
+ * index. The downside of optimizing index selection in this way
+ * is that the code is hardwired to use a maximum table size of
+ * 32. But this value more than suffices for known platforms and
+ * applications.
+ *
+ * Probing: On sensed contention of a selected slot, we probe
+ * sequentially through the table, analogously to linear probing
+ * after collision in a hash table. (We move circularly, in
+ * reverse order, to mesh best with table growth and shrinkage
+ * rules.) Except that to minimize the effects of false-alarms
+ * and cache thrashing, we try the first selected slot twice
+ * before moving.
+ *
+ * Padding: Even with contention management, slots are heavily
+ * contended, so use cache-padding to avoid poor memory
+ * performance. Because of this, slots are lazily constructed
+ * only when used, to avoid wasting this space unnecessarily.
+ * While isolation of locations is not much of an issue at first
+ * in an application, as time goes on and garbage-collectors
+ * perform compaction, slots are very likely to be moved adjacent
+ * to each other, which can cause much thrashing of cache lines on
+ * MPs unless padding is employed.
+ *
+ * This is an improvement of the algorithm described in the paper
+ * "A Scalable Elimination-based Exchange Channel" by William
+ * Scherer, Doug Lea, and Michael Scott in Proceedings of SCOOL05
+ * workshop. Available at: http://hdl.handle.net/1802/2104
+ */
+
+ /** The number of CPUs, for sizing and spin control */
+ private static final int NCPU = Runtime.getRuntime().availableProcessors();
+
+ /**
+ * The capacity of the arena. Set to a value that provides more
+ * than enough space to handle contention. On small machines
+ * most slots won't be used, but it is still not wasted because
+ * the extra space provides some machine-level address padding
+ * to minimize interference with heavily CAS'ed Slot locations.
+ * And on very large machines, performance eventually becomes
+ * bounded by memory bandwidth, not numbers of threads/CPUs.
+ * This constant cannot be changed without also modifying
+ * indexing and hashing algorithms.
+ */
+ private static final int CAPACITY = 32;
+
+ /**
+ * The value of "max" that will hold all threads without
+ * contention. When this value is less than CAPACITY, some
+ * otherwise wasted expansion can be avoided.
+ */
+ private static final int FULL =
+ Math.max(0, Math.min(CAPACITY, NCPU / 2) - 1);
+
+ /**
+ * The number of times to spin (doing nothing except polling a
+ * memory location) before blocking or giving up while waiting to
+ * be fulfilled. Should be zero on uniprocessors. On
+ * multiprocessors, this value should be large enough so that two
+ * threads exchanging items as fast as possible block only when
+ * one of them is stalled (due to GC or preemption), but not much
+ * longer, to avoid wasting CPU resources. Seen differently, this
+ * value is a little over half the number of cycles of an average
+ * context switch time on most systems. The value here is
+ * approximately the average of those across a range of tested
+ * systems.
+ */
+ private static final int SPINS = (NCPU == 1) ? 0 : 2000;
+
+ /**
+ * The number of times to spin before blocking in timed waits.
+ * Timed waits spin more slowly because checking the time takes
+ * time. The best value relies mainly on the relative rate of
+ * System.nanoTime vs memory accesses. The value is empirically
+ * derived to work well across a variety of systems.
+ */
+ private static final int TIMED_SPINS = SPINS / 20;
+
+ /**
+ * Sentinel item representing cancellation of a wait due to
+ * interruption, timeout, or elapsed spin-waits. This value is
+ * placed in holes on cancellation, and used as a return value
+ * from waiting methods to indicate failure to set or get hole.
+ */
+ private static final Object CANCEL = new Object();
+
+ /**
+ * Value representing null arguments/returns from public
+ * methods. This disambiguates from internal requirement that
+ * holes start out as null to mean they are not yet set.
+ */
+ private static final Object NULL_ITEM = new Object();
+
+ /**
+ * Nodes hold partially exchanged data. This class
+ * opportunistically subclasses AtomicReference to represent the
+ * hole. So get() returns hole, and compareAndSet CAS'es value
+ * into hole. This class cannot be parameterized as "V" because
+ * of the use of non-V CANCEL sentinels.
+ */
+ private static final class Node extends AtomicReference