I receive orders from FAST and process them. I receive about 2-3 thousand orders per second.
Really? You work at an exchange? Because seriously, I get data from 5 exchanges, but those are not orders ;) I suggest you get your terms in line: you get 2-3 thousand EVENTS, but I really doubt you get ORDERS.
Have you ever thought of doing a multi-stage processing setup? I.e. you get data in 2 threads, hand it over to another thread to find the instrument (IDs instead of strings), hand it over to another thread to update the order book, hand it over to another thread to do indicators, hand it over to X threads to do strategies?
No need to schedule tasks all the time, just synced queues with one task processing messages on each of them. Can be super efficient with a no-lock approach.
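To make the staged idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the stage functions (`resolve_instrument`, `make_book_stage`), the `SYMBOL_IDS` table and the `(symbol, price)` event shape are made up, and `queue.Queue` is a locking queue, so this shows the structure only, not the no-lock variant.

```python
import queue
import threading

_STOP = object()  # sentinel that tells each stage to shut down


def run_stage(work, inbox, outbox):
    """Consume inbox in FIFO order, apply work, forward the result."""
    while True:
        msg = inbox.get()
        if msg is _STOP:
            outbox.put(_STOP)  # pass the shutdown signal downstream
            return
        outbox.put(work(msg))


def build_pipeline(stages):
    """Chain the stages with queues, one dedicated thread per stage."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(
            target=run_stage, args=(fn, queues[i], queues[i + 1]), daemon=True
        )
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    return queues[0], queues[-1], threads


# Hypothetical lookup table: symbol string -> integer instrument id.
SYMBOL_IDS = {"AAA": 1, "BBB": 2}


def resolve_instrument(event):
    """Stage 1: replace the symbol string with its integer id."""
    symbol, price = event
    return SYMBOL_IDS[symbol], price


def make_book_stage():
    """Stage 2: keep the last price per instrument (stand-in for an order book)."""
    last_price = {}

    def update_book(event):
        instrument_id, price = event
        last_price[instrument_id] = price
        return instrument_id, price

    return update_book


def run_events(events):
    """Push events through the pipeline and collect the output in order."""
    inbox, outbox, threads = build_pipeline([resolve_instrument, make_book_stage()])
    for event in events:
        inbox.put(event)
    inbox.put(_STOP)
    results = []
    while True:
        item = outbox.get()
        if item is _STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Because every queue has exactly one producer and one consumer, messages leave the pipeline in the order they entered it, which keeps a replay of the same input repeatable.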
Brutally speaking: I am all for multithreading, but all in-core processing must maintain cardinality, so classical multithreading is out. Why? I need fully repeatable processing, so that unit tests get deterministic output.
So far I process them asynchronously and I have a dedicated thread per instrument.
You do not trade a LOT, right? I mean, I track about 200.000 instruments (5 complete exchanges). Allocating 200.000 threads would be - ah - prohibitive ;)
Go for a staged pipeline. That means the core loops can be small and you can distribute them across enough cores to be a lot more scalable. Then optimize properly: for example, it is quite common for an update of one instrument to be followed by another update for the SAME instrument (for example, multiple executions while a large order executes). Take advantage of that.
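One possible way to exploit such runs, sketched in Python under assumptions: updates are `(instrument_id, payload)` tuples and the "order book" is just a list per instrument, both made up for illustration. The idea is to group consecutive updates for the same instrument so the book is looked up once per run instead of once per update.

```python
from itertools import groupby
from operator import itemgetter


def group_consecutive(updates):
    """Yield (instrument_id, payloads) for each run of adjacent same-id updates."""
    for instrument_id, run in groupby(updates, key=itemgetter(0)):
        yield instrument_id, [payload for _, payload in run]


def apply_updates(updates, books):
    """Apply updates to books, doing one book lookup per run.

    Returns the number of lookups so the saving is visible.
    """
    lookups = 0
    for instrument_id, payloads in group_consecutive(updates):
        book = books.setdefault(instrument_id, [])  # single lookup for the run
        lookups += 1
        book.extend(payloads)  # order within the run is preserved
    return lookups
```

`groupby` only merges adjacent items, so the overall processing order of the stream is unchanged. How much this saves depends entirely on how bursty the feed is.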