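To compute the average of the n largest values read from the source, you need to store at least those values. Since at any given point before the end you don't know whether some of the n largest values overall are still to come, you need to keep track of the n largest values seen so far.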
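A simple way to do that is to store the largest values in a heap or priority queue, since that makes it easy to add new values and to find (and remove) the smallest of the stored values. The default PriorityQueue is well suited to this task, since it uses the natural ordering of the elements, so polling removes the smallest stored element. If you wanted to compute the average of the n smallest elements instead, you would need a PriorityQueue with a custom Comparator (or, in this special case, simply negating all values and using the natural ordering would work too).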
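For the n smallest values, the custom Comparator mentioned above can simply be the reversed natural ordering, so that polling removes the largest stored element. The following is a minimal sketch under that assumption; the class SmallestAverage and the method averageOfSmallest are hypothetical names used only for illustration, and the values come from an array rather than a stream to keep the example short.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical helper, not part of the original answer.
class SmallestAverage {

    /** Returns the average of the n smallest values (or of all values if there are fewer than n). */
    static double averageOfSmallest(double[] values, int n) {
        if (n < 1 || values.length == 0) {
            throw new IllegalArgumentException("Need n >= 1 and at least one value.");
        }
        // Reversed natural ordering: the head of the queue is the LARGEST stored value.
        PriorityQueue<Double> pq = new PriorityQueue<>(n + 1, Comparator.reverseOrder());
        for (double v : values) {
            pq.add(v);
            if (pq.size() > n) {
                pq.poll(); // drops the largest, keeping the n smallest seen so far
            }
        }
        double sum = 0;
        for (double d : pq) {
            sum += d;
        }
        return sum / pq.size();
    }

    public static void main(String[] args) {
        double[] data = {5.0, 1.0, 4.0, 2.0, 3.0};
        // The two smallest values are 1.0 and 2.0.
        System.out.println(averageOfSmallest(data, 2)); // prints 1.5
    }
}
```

Negating the values would avoid the Comparator, but the reversed ordering keeps the stored values readable when debugging.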
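The lazy way (less code) to achieve this is to simply add each incoming value to the queue and, if the queue's size then exceeds n [in which case it must be n+1], remove the smallest element from the queue: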
// vp is the value provider
while (vp.hasNext()) {
    // read the next value and add it to the queue
    pq.add(vp.nextValue());
    // if the queue now holds topSize + 1 values, drop the smallest
    if (pq.size() > topSize) {
        pq.poll();
    }
}
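A slightly more involved way is to first check whether the new value needs to be added at all, and to modify the queue only when it does: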
double newValue = vp.nextValue();
// Check if we have to put the new value in the queue.
// That is the case when the queue is not yet full, or the smallest
// stored value is smaller than the new one.
if (pq.size() < topSize || pq.peek() < newValue) {
    // remove the smallest value from the queue only if it is full
    if (pq.size() == topSize) {
        pq.poll();
    }
    pq.add(newValue);
}
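This way is potentially more efficient, since adding a value to the queue and removing the smallest are both O(log size) operations, while comparing with the smallest stored value is O(1). So if many of the incoming values are smaller than the n largest seen before, the second way saves some work.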
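To make the difference concrete, the sketch below runs both variants over the same input and counts how many add and poll calls each one makes. The class TopNComparison and its method names are hypothetical, and the descending input is chosen deliberately as the best case for the second variant; on ascending input both variants would do the same amount of work. It assumes topSize is at least 1.

```java
import java.util.PriorityQueue;

// Hypothetical comparison harness, not part of the original answer.
class TopNComparison {

    /** Lazy variant: always add, then trim. Returns the number of add/poll calls. */
    static long lazy(double[] values, int topSize, PriorityQueue<Double> pq) {
        long ops = 0;
        for (double v : values) {
            pq.add(v);
            ++ops;
            if (pq.size() > topSize) {
                pq.poll();
                ++ops;
            }
        }
        return ops;
    }

    /** Checked variant: peek first, modify the queue only if needed. Assumes topSize >= 1. */
    static long checked(double[] values, int topSize, PriorityQueue<Double> pq) {
        long ops = 0;
        for (double v : values) {
            if (pq.size() < topSize || pq.peek() < v) {
                if (pq.size() == topSize) {
                    pq.poll();
                    ++ops;
                }
                pq.add(v);
                ++ops;
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = data.length - i; // 1000, 999, ..., 1
        }
        PriorityQueue<Double> a = new PriorityQueue<>();
        PriorityQueue<Double> b = new PriorityQueue<>();
        System.out.println("lazy ops:    " + lazy(data, 10, a));    // 1000 adds + 990 polls = 1990
        System.out.println("checked ops: " + checked(data, 10, b)); // 10 adds, no polls = 10
    }
}
```

Both queues end up holding the same ten values; only the number of O(log size) operations differs.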
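If performance is critical, be aware that a PriorityQueue cannot store primitive types such as double, so storing (and retrieving for the average computation) involves boxing (wrapping a double value in a Double object) and unboxing (extracting the double value from a Double object), respectively, and with that an indirection from the queue's underlying array to the actual values. These costs can be avoided by implementing a heap-based priority queue yourself on top of a raw double[]. (But that should rarely be necessary; usually, the cost of boxing and indirection is only a small part of the overall processing.)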
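For illustration, a heap over a raw double[] might look roughly like the following. This is only a sketch, not a tuned implementation: the class DoubleMinHeap and its methods are hypothetical names, the capacity is fixed at construction, and the offer method already includes the "add only if needed" check, so the heap always holds the largest values offered so far.

```java
// Hypothetical class: a minimal fixed-capacity min-heap of primitive doubles.
class DoubleMinHeap {
    private final double[] heap;
    private int size = 0;

    DoubleMinHeap(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("Capacity must be at least 1.");
        }
        heap = new double[capacity];
    }

    int size() {
        return size;
    }

    /** Returns the smallest stored value. */
    double peek() {
        if (size == 0) {
            throw new IllegalStateException("Heap is empty.");
        }
        return heap[0];
    }

    /** Offers a value; only the `capacity` largest values offered so far are kept. */
    void offer(double value) {
        if (size < heap.length) {
            // Not yet full: place at the end and sift up.
            int i = size++;
            while (i > 0) {
                int parent = (i - 1) / 2;
                if (heap[parent] <= value) break;
                heap[i] = heap[parent];
                i = parent;
            }
            heap[i] = value;
        } else if (value > heap[0]) {
            // Full: overwrite the root (the smallest value) and sift down.
            int i = 0;
            int half = size / 2; // indices >= half are leaves
            while (i < half) {
                int child = 2 * i + 1;
                int right = child + 1;
                if (right < size && heap[right] < heap[child]) child = right;
                if (value <= heap[child]) break;
                heap[i] = heap[child];
                i = child;
            }
            heap[i] = value;
        }
    }

    /** Returns the average of the stored values. */
    double average() {
        if (size == 0) {
            throw new IllegalStateException("Heap is empty.");
        }
        double sum = 0;
        for (int i = 0; i < size; i++) {
            sum += heap[i];
        }
        return sum / size;
    }
}
```

With such a class, the reading loop reduces to calling offer for each value from the provider; no Double objects are allocated along the way.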
A simple complete working example:
import java.util.PriorityQueue;

/**
 * Example class to collect the largest values from a stream and compute their
 * average.
 */
public class Average {
    // number of values we want to save
    private int topSize;
    // number of values read so far
    private long count = 0;
    // priority queue to save the largest topSize values
    private PriorityQueue<Double> pq;
    // source of read values, could be a file reader, a device reader, or whatever
    private ValueProvider vp;

    /**
     * Construct an <code>Average</code> to sample the largest <code>n</code>
     * values from the source.
     *
     * @param tops Number of values to save for averaging.
     * @param v Source of the values to sample.
     *
     * @throws IllegalArgumentException when the specified number of values is less than one.
     */
    public Average(int tops, ValueProvider v) throws IllegalArgumentException {
        if (tops < 1) {
            throw new IllegalArgumentException("Can't get average of fewer than one value.");
        }
        topSize = tops;
        vp = v;
        // Initialise queue to needed capacity; topSize + 1, since we first add
        // and then poll. Thus no resizing should ever be necessary.
        pq = new PriorityQueue<Double>(topSize + 1);
    }

    /**
     * Compute the average of the values stored in the {@code PriorityQueue<Double>}.
     *
     * @param prio The queue to average.
     * @return the average of the values stored in the queue.
     * @throws IllegalArgumentException when the queue is null or empty.
     */
    public static double average(PriorityQueue<Double> prio) throws IllegalArgumentException {
        if (prio == null || prio.size() == 0) {
            throw new IllegalArgumentException("Priority queue argument is null or empty.");
        }
        double sum = 0;
        for (Double d : prio) {
            sum += d;
        }
        return sum / prio.size();
    }

    /**
     * Reads values from the provider until exhausted, reporting the average
     * of the largest <code>topSize</code> values read so far from time to time
     * and when the source is exhausted.
     */
    public void collectAverage() {
        while (vp.hasNext()) {
            // read the next value and add it to the queue
            pq.add(vp.nextValue());
            ++count;
            // If the queue was already full, we now have
            // topSize + 1 values in it, so we remove the smallest.
            // That is, conveniently, what the default PriorityQueue<Double>
            // gives us. If we wanted for example the smallest, we'd need
            // to use a PriorityQueue with a custom Comparator (or negate
            // the values).
            if (pq.size() > topSize) {
                pq.poll();
            }
            // Occasionally report the running average of the largest topSize
            // values read so far. This may not be desired.
            if (count % (topSize * 25) == 0 || count < 11) {
                System.out.printf("Average of top %d values after collecting %d is %f\n",
                                  pq.size(), count, average(pq));
            }
        }
        // Report final average. Returning the average would be a natural choice too.
        System.out.printf("Average of top %d values of %d total is %f\n",
                          pq.size(), count, average(pq));
    }

    public static void main(String[] args) {
        Average a = new Average(100, new SimpleProvider(123456));
        a.collectAverage();
    }
}
using the interface
/**
 * Interface for a source of <code>double</code>s.
 */
public interface ValueProvider {
    /**
     * Gets the next value from the source.
     *
     * @return The next value if there is one.
     * @throws RuntimeException if the source is exhausted.
     */
    public double nextValue() throws RuntimeException;

    /**
     * Checks whether the source has more values to deliver.
     *
     * @return whether there is at least one more value to be obtained from the source.
     */
    public boolean hasNext();
}
and the implementing class
/**
 * Simple provider of a stream of <code>double</code>s.
 */
public class SimpleProvider implements ValueProvider {
    // State determining which value to return next.
    private long state = 0;
    // Last allowed state.
    private final long end;

    /**
     * Construct a provider of <code>e</code> values.
     *
     * @param e the number of values to yield.
     */
    public SimpleProvider(long e) {
        end = e > 0 ? e : 0;
    }

    /**
     * Default constructor to provide 10000 values.
     */
    public SimpleProvider() {
        this(10000);
    }

    public double nextValue() {
        ++state;
        return Math.log(state) * Math.sin(state) + Math.cos(state / 2.0);
    }

    public boolean hasNext() {
        return state < end;
    }
}
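As the comment in collectAverage notes, returning the average instead of printing it would be a natural choice too. The sketch below shows one way that could look, together with an array-backed provider that is convenient for testing because its output is easy to predict. The names ReturningAverage, ArrayProvider and averageOfTop are hypothetical, and the nested ValueProvider interface merely repeats the two methods of the interface above so that the sketch compiles on its own.

```java
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical variant, not part of the original answer.
class ReturningAverage {

    // Same two methods as the ValueProvider interface above, repeated here
    // only so that this sketch is self-contained.
    interface ValueProvider {
        double nextValue();
        boolean hasNext();
    }

    /** Hypothetical provider that delivers the values of an array in order. */
    static final class ArrayProvider implements ValueProvider {
        private final double[] values;
        private int next = 0;

        ArrayProvider(double... values) {
            this.values = values.clone();
        }

        @Override
        public boolean hasNext() {
            return next < values.length;
        }

        @Override
        public double nextValue() {
            if (!hasNext()) {
                throw new NoSuchElementException("Source is exhausted.");
            }
            return values[next++];
        }
    }

    /** Drains the provider and returns the average of its topSize largest values. */
    static double averageOfTop(int topSize, ValueProvider vp) {
        if (topSize < 1) {
            throw new IllegalArgumentException("Can't get average of fewer than one value.");
        }
        PriorityQueue<Double> pq = new PriorityQueue<>(topSize + 1);
        while (vp.hasNext()) {
            pq.add(vp.nextValue());
            if (pq.size() > topSize) {
                pq.poll(); // drop the smallest of the topSize + 1 stored values
            }
        }
        if (pq.isEmpty()) {
            throw new IllegalStateException("Source delivered no values.");
        }
        double sum = 0;
        for (double d : pq) {
            sum += d;
        }
        return sum / pq.size();
    }

    public static void main(String[] args) {
        // The two largest values are 5.0 and 4.0.
        System.out.println(averageOfTop(2, new ArrayProvider(5.0, 1.0, 4.0, 2.0, 3.0))); // prints 4.5
    }
}
```

Returning the value leaves the decision about reporting to the caller, which also makes the computation straightforward to unit test.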