I am working on an application dealing mainly with public health indicators. The related concepts and knowledge are kept in an OWL ontology. There will eventually also be a potentially large number of numerical facts (e.g. indicator X has value Y), which will grow over time as more data gets crunched and added to the application. Given that querying this system will involve manipulating both concepts (from the ontology) and (numerical) facts, I am wondering what, in broad terms, an ideal data model/storage architecture could be.
For instance, I've been contemplating a hybrid architecture in which the facts would be stored in a separate SQL database (i.e. using a pure relational model, not RDF-over-relational) and querying would proceed in two phases: the second (SQL) phase being derived from, or guided by, the concepts retrieved in the first (ontology) phase.
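To make the two-phase idea concrete, here is a minimal sketch of what I have in mind. It is plain Java: an in-memory map stands in for the ontology lookup I would actually do with Jena, and the fact(indicator, region, year, value) table name and columns are hypothetical.

```java
import java.util.List;
import java.util.Map;

/**
 * Sketch of the two-phase hybrid query. Phase 1 (normally a Jena
 * ontology query) resolves a concept into its sub-concepts; phase 2
 * derives a SQL query over the facts table from those concepts.
 * All names (classes, table, columns) are hypothetical.
 */
public class HybridQuery {

    // Stand-in for phase 1: ontology lookup of an indicator class's subclasses.
    static final Map<String, List<String>> SUBCLASSES = Map.of(
        "MortalityIndicator", List.of("InfantMortality", "MaternalMortality"));

    /** Phase 2: build the SQL aggregation query from the resolved concepts. */
    static String buildFactQuery(String conceptClass) {
        List<String> concepts =
            SUBCLASSES.getOrDefault(conceptClass, List.of(conceptClass));
        String inList = String.join(", ",
            concepts.stream().map(c -> "'" + c + "'").toList());
        // Hypothetical facts table: fact(indicator, region, year, value)
        return "SELECT indicator, AVG(value) FROM fact WHERE indicator IN ("
            + inList + ") GROUP BY indicator";
    }

    public static void main(String[] args) {
        System.out.println(buildFactQuery("MortalityIndicator"));
    }
}
```

The point is that the SQL never has to know about the ontology: the concept expansion happens entirely in phase 1, and phase 2 only sees a flat list of indicator identifiers.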
However, reading about robust triple stores that can handle massive amounts of data (a billion or more triples) suggests that I could also keep my facts in an RDF store (perhaps one implemented on top of a relational DB). I suppose this would have the benefit of a more unified query interface: I could query the schema and the facts together, using the same API or query engine, instead of mixing SQL into the process as in my hybrid approach. On the other hand, I'd presumably lose the data-crunching capabilities of a relational DB (assuming a triple store is not optimized for operations like aggregation, reduction, etc.), which might be useful in my context. As a final piece of information, I have already invested some energy in learning the Jena framework, so I'd appreciate it if suggestions could take that into account.
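On the aggregation point: I am aware that SPARQL 1.1 does define aggregates (GROUP BY, SUM, AVG, etc.), and that Jena's ARQ engine supports them, so the kind of query I'd need is at least expressible in the unified approach. A sketch, with a hypothetical vocabulary:

```sparql
# Hypothetical vocabulary: ex:indicator and ex:value on each fact node
PREFIX ex: <http://example.org/phi#>

SELECT ?indicator (AVG(?v) AS ?meanValue)
WHERE {
  ?fact ex:indicator ?indicator ;
        ex:value ?v .
}
GROUP BY ?indicator
```

What I don't know is how well a triple store would perform on such queries over a large fact set, compared to the equivalent GROUP BY in a relational DB.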
(I already asked this question on answers.semanticweb.com, to no avail.)