Summary
You are essentially talking about running a task host for long-running tasks and being able to cancel those tasks. Your specific question seems to be about the best way to implement this in .NET. Your architecture is good, although you are brave to roll your own rather than using an existing framework, and you haven't mentioned how you would scale your architecture later.
My preference is for using the TPL Task object. It supports cancellation, and is easy to poll for progress, etc. You can only use this in .NET 4 onwards.
It is hard to provide code without basically designing a whole job hosting engine for you and knowing your .NET version. I have described the steps in detail below, with references to example code.
Your approach of using the Windows Service OnCustomCommand is fine. You could also use a messaging service (see below) if you have that option for client-service comms. This would be more appropriate for a scenario where you have many clients talking to a central job service, and the job service is not on the same machine as the client.
Running and cancelling tasks on threads
Before we look at your exact context, it would be good to review MSDN - Asynchronous Programming Patterns. There are three main .NET patterns to run and cancel jobs on threads, and I list them in order of preference for use:
- TAP: Task-based Asynchronous Pattern
  - Based on Task, which has been available only since .NET 4
  - The preferred way to run and control any thread-based activity from .NET 4 onwards
  - Much simpler to implement than EAP
- EAP: Event-based Asynchronous Pattern
  - Your only option if you don't have .NET 4 or later
  - Hard to implement, but once you have understood it you can roll it out and it is very reliable to use
- APM: Asynchronous Programming Model
  - No longer relevant unless you maintain legacy code or use old APIs
  - Even with .NET 1.1 you can implement a version of EAP, so I will not cover APM, as you say you are implementing your own solution
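As a minimal sketch of the TAP approach, the following starts a piece of work on a Task and then cancels it through a CancellationTokenSource. The class and method names (TapCancellationSketch, DoWork, RunAndCancel) are made up for illustration, and the Thread.Sleep calls only stand in for real work. It uses Task.Factory.StartNew so that it stays within the .NET 4 API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative names only; none of these come from the question.
public static class TapCancellationSketch
{
    // Hypothetical unit of work: loops until cancellation is requested.
    public static void DoWork(CancellationToken token)
    {
        while (true)
        {
            token.ThrowIfCancellationRequested(); // cooperative cancellation point
            Thread.Sleep(10);                     // stands in for a slice of real work
        }
    }

    // Starts the work, requests cancellation, and returns the final task status.
    public static TaskStatus RunAndCancel()
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;

            // Passing the token to StartNew as well lets the Task finish
            // in the Canceled state instead of Faulted.
            Task task = Task.Factory.StartNew(() => DoWork(token), token);

            Thread.Sleep(50); // let the work run briefly
            cts.Cancel();     // ask the work to stop

            try
            {
                task.Wait();
            }
            catch (AggregateException)
            {
                // Wait() reports the cancellation wrapped in an AggregateException.
            }

            return task.Status;
        }
    }
}
```

The work only stops because DoWork checks the token itself; Cancel() is a request, not a forced termination.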
The architecture
Imagine this like a REST based service.
- The client submits a job, and gets returned an identifier for the job
- A job engine then picks up the job when it is ready, and starts running it
- If the client doesn't want the job any more, then they delete the job, using its identifier
This way the client is completely isolated from the workings of the job engine, and the job engine can be improved over time.
The job engine
The approach is as follows:
- For a submitted task, generate a unique identifier (UID) so that you can:
  - Identify a running task
  - Poll for results
  - Cancel the task if required
- Return that UID to the client
- Queue the job using that identifier
- When you have resources:
  - Run the job by creating a Task
  - Store the Task in a dictionary against the UID as a key
When the client wants results, they send the request with the UID and you return progress by checking against the Task that you retrieve from the dictionary. If the task is complete they can then send a request for the completed data, or in your case just go and read the completed files.
When they want to cancel they send the request with the UID, and you cancel the Task by finding it in the dictionary and telling it to cancel.
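The submit, poll and cancel steps above could be sketched roughly as follows. This is not a full engine: JobEngine, JobEntry and the method names are hypothetical, a Guid is assumed as the UID, and the queueing step ("when you have resources") is left out, so each job starts as soon as it is submitted.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical job engine; the names are illustrative only.
public sealed class JobEngine
{
    // Keeps the Task together with the source used to cancel it.
    private sealed class JobEntry
    {
        public readonly Task Task;
        public readonly CancellationTokenSource Cancellation;

        public JobEntry(Task task, CancellationTokenSource cancellation)
        {
            Task = task;
            Cancellation = cancellation;
        }
    }

    private readonly ConcurrentDictionary<Guid, JobEntry> _jobs =
        new ConcurrentDictionary<Guid, JobEntry>();

    // Starts the job and returns the identifier the client uses from then on.
    public Guid Submit(Action<CancellationToken> work)
    {
        Guid id = Guid.NewGuid();
        var cts = new CancellationTokenSource();
        CancellationToken token = cts.Token;

        Task task = Task.Factory.StartNew(
            () => work(token), token,
            TaskCreationOptions.LongRunning, TaskScheduler.Default);

        _jobs[id] = new JobEntry(task, cts);
        return id;
    }

    // Returns the Task status, or null if the identifier is unknown.
    public TaskStatus? GetStatus(Guid id)
    {
        JobEntry entry;
        return _jobs.TryGetValue(id, out entry) ? entry.Task.Status : (TaskStatus?)null;
    }

    // Requests cancellation; returns false if the identifier is unknown.
    public bool Cancel(Guid id)
    {
        JobEntry entry;
        if (!_jobs.TryGetValue(id, out entry)) return false;
        entry.Cancellation.Cancel();
        return true;
    }
}
```

A real engine would also remove finished entries from the dictionary and dispose their CancellationTokenSource; that housekeeping is omitted here to keep the sketch short.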
Cancelling inside a job
Inside your code you will need to regularly check your cancellation token to see if you should stop running code (see How do I abort/cancel TPL Tasks? if you are using the TAP pattern, or Albahari if you are using EAP). At that point you will exit your job processing, and your code, if designed well, should dispose of IDisposables where required, remove big strings from memory etc.
The basic premise of cancellation is that you check your cancellation token:
- After a block of work that takes a long time (e.g. a call to an external API)
- Inside a loop (for, foreach, do or while) that you control, you check on each iteration
- Within a long block of sequential code, that might take "some time", you insert points to check on a regular basis
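The three kinds of checkpoint listed above might look like this inside a job body. CancellableJobSketch, Run and CallSlowExternalApi are hypothetical names, and the MemoryStream is only there to show that a using block still disposes its resource when a checkpoint throws.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading;

// Illustrative job body; the names are assumptions, not part of the question.
public static class CancellableJobSketch
{
    // Stand-in for a slow call such as an external API request.
    private static void CallSlowExternalApi()
    {
        Thread.Sleep(20);
    }

    // Returns the number of items processed, or throws
    // OperationCanceledException if cancellation was requested.
    public static int Run(IList<string> items, CancellationToken token)
    {
        // 'using' disposes the stream even when a checkpoint throws part-way through.
        using (var buffer = new MemoryStream())
        {
            CallSlowExternalApi();
            token.ThrowIfCancellationRequested();     // checkpoint after a long block of work

            int processed = 0;
            foreach (string item in items)
            {
                token.ThrowIfCancellationRequested(); // checkpoint on each iteration
                byte[] bytes = Encoding.UTF8.GetBytes(item);
                buffer.Write(bytes, 0, bytes.Length);
                processed++;
            }

            token.ThrowIfCancellationRequested();     // checkpoint between sequential steps
            return processed;
        }
    }
}
```

How often you place checkpoints decides how quickly the job reacts: the longest stretch of code between two checks is the worst-case delay before the job stops.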
You need to define how quickly you need to react to a cancellation. For a Windows service it should preferably be within milliseconds, to make sure that Windows doesn't have problems restarting or stopping the service.
Some people do this whole process with threads, and by terminating the thread - this is ugly and not recommended any more.
Reliability
You need to ask: what happens if your server restarts, the windows service crashes, or any other exception happens causing you to lose incomplete jobs? In this case you may want a queue architecture that is reliable in order to be able to restart jobs, or rebuild the queue of jobs you haven't started yet.
If you don't want to scale, this is simple: use a local database that the Windows service stores job information in.
- On submission of a job, record its details in the database
- When you start a job, record that against the job record in the database
- When the client collects the job, mark it for delayed garbage collection in the database, and then delete it after a set amount of time (1 hour, 1 day ...)
- If your service restarts and there are "in progress jobs" then requeue them and then start your job engine again.
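The bookkeeping in the list above can be sketched as a small set of state transitions. To stay self-contained, this version keeps the records in an in-memory dictionary as a stand-in for the database table; a real service would persist them. JobStore, JobRecord, JobState and the method names are all hypothetical, and the Completed state is an added assumption so that finished but uncollected jobs are not requeued after a restart.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Completed is an assumption; the other states follow the steps in the text.
public enum JobState { Queued, InProgress, Completed, Collected }

public sealed class JobRecord
{
    public Guid Id { get; set; }
    public JobState State { get; set; }
    public DateTime? CollectedAtUtc { get; set; }
}

// In-memory stand-in for the job table (hypothetical; use a real database in practice).
public sealed class JobStore
{
    private readonly Dictionary<Guid, JobRecord> _records = new Dictionary<Guid, JobRecord>();

    public Guid RecordSubmission()
    {
        var record = new JobRecord { Id = Guid.NewGuid(), State = JobState.Queued };
        _records.Add(record.Id, record);
        return record.Id;
    }

    public void RecordStart(Guid id) { _records[id].State = JobState.InProgress; }

    public void RecordCompleted(Guid id) { _records[id].State = JobState.Completed; }

    // Marks the job for delayed clean-up once the client has collected it.
    public void RecordCollected(Guid id, DateTime nowUtc)
    {
        JobRecord record = _records[id];
        record.State = JobState.Collected;
        record.CollectedAtUtc = nowUtc;
    }

    // On service restart: jobs that were in progress go back to the queue.
    public List<Guid> RequeueInterrupted()
    {
        var requeued = new List<Guid>();
        foreach (JobRecord record in _records.Values)
        {
            if (record.State == JobState.InProgress)
            {
                record.State = JobState.Queued;
                requeued.Add(record.Id);
            }
        }
        return requeued;
    }

    // Deletes collected jobs older than the retention period; returns how many were removed.
    public int PurgeCollected(DateTime nowUtc, TimeSpan retention)
    {
        List<Guid> expired = _records.Values
            .Where(r => r.State == JobState.Collected
                        && nowUtc - r.CollectedAtUtc.Value >= retention)
            .Select(r => r.Id)
            .ToList();
        foreach (Guid id in expired) _records.Remove(id);
        return expired.Count;
    }

    // Returns null for an unknown (or already purged) job.
    public JobState? GetState(Guid id)
    {
        JobRecord record;
        return _records.TryGetValue(id, out record) ? record.State : (JobState?)null;
    }
}
```

On startup the service would call RequeueInterrupted() before starting the job engine, and PurgeCollected() would run on a timer with whatever retention period you choose.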
If you do want to scale, or your clients are on many computers, and you have a job engine "farm" of 1 or more servers, then look at using a message queue instead of directly communicating using OnCustomCommand.
Message queues have multiple benefits. They allow you to reliably submit jobs to a central queue that many workers can then pick up and process, and they decouple your clients and servers so you can scale out your job running services. This works locally or globally, but always reliably. You can even combine it with running your Windows service on cloud workers, which you can scale dynamically.
Examples of technologies are MSMQ (if you want to maintain your own, or must stay inside your own firewall), or Windows Azure Service Bus (WASB), which is cheap and already done for you. In either case you will want to use Patterns and Best Practices for Enterprise Integration. In the case of WASB there are many (MSDN), many (MSDN samples for BrokeredMessaging etc.), many (new Task-based API) developer resources, and NuGet packages for you to use.