
Suppose you want to implement an algorithm that works in the following way:

You read in from a file that contains values of the form:

Mark Buy  20 1 Google 100
Bob  Sell 20 2 Google 90

Where the input takes the form:

<name> <buy or sell> <quantity> <time> <company> <buy maximum or sell minimum>

What's the fastest way to match buyers and sellers? For a given company, a buyer and a seller are matched only if the highest buy price for that company exceeds the lowest sell price for that company. The bid that appears top-most (i.e., earliest) determines the execution price.

So in the example given we'd have "Mark, at time 1, bought 20 of Google for $100 from Bob, at time 2."

How can we optimize this algorithm for speed? Would reading in the entire file first be an optimal solution?


1 Answer


What you need is two priority queues per commodity: one for active buy bids (prioritized on max-price), and one for active sell bids (prioritized on min-price), plus an overall queue for bid creation/expiry events (prioritized on time). (If your bids are in a batch file as described, rather than a causal/online sequence, you can just sort the creation/expiry events, but you still need the buy and sell queues)

Using priority queues is the crux; everything after that is plumbing:

foreach bid creation/expiry event, in chronological order:
  if the event is an expiry:
    delete the bid from the appropriate queue
  else, the event is a creation:
    add the bid to the appropriate queue
    repeat until no further transaction can be performed:
      find max-active-buy and min-active-sell bids for the given commodity
      if they match:
        execute (and record) the transaction
        update partially fulfilled bids, and remove completely fulfilled ones

When running as a batch operation, you could simplify things a bit by splitting the input by commodity and processing each commodity separately. However, this will not work if the markets interact in any way (such as checking for sufficient account balance).
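As a rough illustration of the pseudocode above, here is a Python sketch using `heapq`. It simplifies the full design: expiry events are omitted, the input is assumed to be already parsed and sorted by time, and the function name `match_orders` and the trade tuple format are made up for this example.

```python
import heapq
from collections import defaultdict

def match_orders(orders):
    """orders: iterable of (name, side, qty, time, company, price),
    pre-sorted by time. Returns executed trades as
    (buyer, seller, company, quantity, price) tuples."""
    buys = defaultdict(list)   # company -> max-heap via negated prices
    sells = defaultdict(list)  # company -> min-heap
    trades = []
    for name, side, qty, time, company, price in orders:
        # Mutable list so partially filled bids can be updated in place;
        # the heap key (price, time) never changes, so the heap stays valid.
        if side == 'Buy':
            heapq.heappush(buys[company], [-price, time, qty, name])
        else:
            heapq.heappush(sells[company], [price, time, qty, name])
        # Repeat until the best buy no longer crosses the best sell.
        while buys[company] and sells[company]:
            buy, sell = buys[company][0], sells[company][0]
            buy_price, sell_price = -buy[0], sell[0]
            if buy_price < sell_price:
                break
            # The earlier (top-most) bid determines the execution price.
            exec_price = buy_price if buy[1] < sell[1] else sell_price
            traded = min(buy[2], sell[2])
            trades.append((buy[3], sell[3], company, traded, exec_price))
            buy[2] -= traded
            sell[2] -= traded
            if buy[2] == 0:
                heapq.heappop(buys[company])
            if sell[2] == 0:
                heapq.heappop(sells[company])
    return trades
```

Running it on the question's two-line example produces one trade: Mark buys 20 from Bob at Mark's (earlier) price of 100.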


Priority queue operations run in O(log N) time in the number of items, and there are many fast, practical priority queue data structures that achieve this asymptotic bound.
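One such structure is the binary heap in Python's standard-library `heapq` module, whose push and pop are each O(log N). It is a min-heap only, so the max-priced buy queue is usually simulated by negating prices:

```python
import heapq

sells = []                         # min-heap: lowest ask on top
for price in (95, 90, 100):
    heapq.heappush(sells, price)   # O(log n) per push
best_ask = heapq.heappop(sells)    # O(log n); yields the lowest ask, 90

buys = []                          # max-heap simulated with negated prices
for price in (95, 100, 90):
    heapq.heappush(buys, -price)
best_bid = -heapq.heappop(buys)    # yields the highest bid, 100
```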

Since you are evaluating an entire file as a batch, you may want to look into priority queues with amortized performance guarantees -- but if you expect to use your code in a real-time setting, you should probably stick to priority queues with strict per-query guarantees.

answered 2013-03-29T21:22:30.673