I am doing some analytics in Solr, specifically using its faceting and pivot functionality on a large set of log files. I have indexed a large log file in Solr along these lines:
   Keyword    Visits   log_date_ISO
1  red        1,938    2013-01-01
2  blue       435      2013-02-01
3  green      318      2013-04-01
4  red blue   279      2013-01-01
I then run a query and facet by 'log_date_ISO', which gives me, for each date, a count of the keywords that contain the query term. Two questions:
(1) Is there a way to sum the visits per keyword for each date - because what I really want is to sum visits across keywords that contain the query:
-> e.g. if I ran query 'red' for the above - I would want date 2013-01-01 to have a count of 1938 + 279 = 2217 (i.e. the sum of the visits associated with the keywords that contain the query 'red') rather than '2' (i.e. the count of the keywords containing the query).
(2) Is there a way to normalise by monthly query volume?
-> e.g. if the query volume for '2013-01-01' was 10,000 then the normalised volume for the query 'red' would be 2217/10000 = 0.2217
LAST RESORT: If these are not possible, I will pre-process the log file using pandas/python to group by date, then by keyword then normalise - but was wondering if it was possible in Solr.
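For reference, the pandas fallback I have in mind would look roughly like the sketch below. It uses only the four sample rows above, and it assumes "contains the query" means a whole-word match on the keyword. The function names and the `totals` series are hypothetical, and the 10,000 total is the made-up figure from the example in (2), not real data.

```python
import pandas as pd

# Sample rows copied from the table above; a real run would load the full log instead.
log = pd.DataFrame({
    "Keyword": ["red", "blue", "green", "red blue"],
    "Visits": [1938, 435, 318, 279],
    "log_date_ISO": ["2013-01-01", "2013-02-01", "2013-04-01", "2013-01-01"],
})


def visits_by_date(df, query):
    """Sum Visits per date over keywords that contain `query` as a whole word."""
    # Assumption: whole-word matching, roughly like a tokenised text field.
    matches = df["Keyword"].str.split().apply(lambda words: query in words)
    return df[matches].groupby("log_date_ISO")["Visits"].sum()


def normalised_visits(df, query, total_by_date):
    """Divide the summed visits by the total query volume for each date."""
    summed = visits_by_date(df, query)
    return summed / total_by_date.reindex(summed.index)


red = visits_by_date(log, "red")

# Hypothetical total query volume per date (the 10,000 from the example in (2)).
totals = pd.Series({"2013-01-01": 10000})
share = normalised_visits(log, "red", totals)
```

Here `red` holds the summed visits per date for the query 'red', and `share` holds the same values divided by the total volume for that date.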
Thanks in advance.