I'm writing a stateful server in Clojure backed by Neo4j that can serve socket requests, like HTTP. Which means, of course, that I need to be able to start and stop socket servers from within this server. Design-wise, I would want to be able to declare a "service" within this server and start and stop it.
What I'm trying to wrap my mind around in Clojure is how to ensure that starting and stopping these services is thread-safe. This server I'm writing will have NREPL embedded inside it and process incoming requests in a parallel way. Some of these requests will be administrative: start service X, stop service Y. Which opens up the possibility that two start requests come in at the same time.
- Starting should synchronously check a "running" flag and a "starting" flag and fail if either are set. In the same transaction, the "starting" flag should be set.
- After the "starting" flag is set, the transaction closes. That makes the "starting" flag visible to other transactions.
- Then the (start) function actually starts the service.
- If (start) succeeds, the "running" and "starting" flags are synchronously set.
- If (start) fails, the "starting" flag is set and the exception is returned.
Stopping needs the same thing, checking a "running" flag and checking and setting it's own "stopping" flag.
I'm trying to reason through all possible combinations of (start) and (stop).
Have I missed anything?
Is there a library for this already? If not, what should a library like this look like? I'll open source it and put it on Github.
Edit:
This is what I have so far. There's a hole I can see though. What am I missing?
(ns extenium.db
(:require [clojure.tools.logging :as log])
(:import org.neo4j.graphdb.factory.GraphDatabaseFactory))
(def ^:private
db- (ref {:ref nil
:running false
:starting false
:stopping false}))
(defn stop []
(dosync
(if (or (not (:running (ensure db-)))
(:stopping (ensure db-)))
(throw (IllegalStateException. "Database already stopped or stopping."))
(alter db- assoc :stopping true)))
(try
(log/info "Stopping database")
(.shutdown (:ref db-))
(dosync
(alter db- assoc :ref nil))
(log/info "Stopped database")
(finally
(dosync
(alter db- assoc :stopping false)))))
In the try block, I log, then call .shutdown, then log again. If the first log fails (I/O exceptions can happen), then (:stopping db-) is set to false, which unblocks it and is fine. .shutdown is a void function from Neo4j, so I don't have to evaluate a return value. If it fails, (:stopping db-) is set to false, so that's fine too. Then I set the (:ref db-) to nil. What if that fails? (:stopping db-) is set to false, but the (:ref db-) is left hanging. So that's a hole. Same case with the second log call. What am I missing?
Would this be better if I just used Clojure's locking primitives instead of a ref dance?