I have a bunch of work that needs to be performed at specific time intervals. However, we have limited resources to do that work, each day. Therefore, I am trying to optimize the start time dates (start time dates can only be moved forward not backward) so that resources used everyday are more less similar to what we have budgeted for.
These functions are used in example below::
# Function to shift/rotate a vector
shifter <- function(x, n = 1) {
if (n == 0) x else c(tail(x, -n), head(x, n))
}
# Getting a range of dates
get_date_range <- function(current_date = Sys.Date(), next_planned_date = Sys.Date() + 5)
{
seq.Date(as.Date(current_date), as.Date(next_planned_date), "days")
}
Assume a toy example dataset :: Here task P1 starts on 14th while P2 starts on 15th. Value of zero means that no work is done for that task on that day.
# EXAMPLE TOY DATASET
datain = data.frame(dated = c("2018-12-14", "2018-12-15", "2018-12-16", "2018-12-17"),
P1 = c(1,2,0,3), P2 = c(0,4,0,6)) %>%
mutate(dated = as.character(dated))
#The amount of resources that can be used in a day
max_work = 4
# We will use all the possible combination of start dates to
# search for the best one
possible_start_dates <- do.call(expand.grid, date_range_of_all)
# Utilisation stores the capacity used during each
# combination of start dates
# We will use the minimum of thse utilisation
utilisation <- NULL # utilisation difference; absolute value
utilisation_act <- NULL # actual utilisation including negative utilisation
# copy of data for making changes
ndatain <- datain
# Move data across possible start dates and
# calculate the possible utilisation in each movements
for(i in 1:nrow(possible_start_dates)) # for every combination
{
for(j in 1:ncol(possible_start_dates)) # for every plan
{
# Number of days that are different
days_diff = difftime(oriz_start_date[["Plan_Start_Date"]][j],
possible_start_dates[i,j], tz = "UTC", units = "days" ) %>% as.numeric()
# Move the start dates
ndatain[, (j+1)] <- shifter(datain[, (j+1)], days_diff)
}
if(is.null(utilisation)) # first iteration
{
# calculate the utilisation
utilisation = c(i, abs(max_work - rowSums(ndatain %>% select(-dated))))
utilisation_act <- c(i, max_work - rowSums(ndatain %>% select(-dated)))
}else{ # everything except first iteration
utilisation = rbind(utilisation, c(i,abs(max_work - rowSums(ndatain %>% select(-dated)))))
utilisation_act <- rbind(utilisation_act, c(i, max_work - rowSums(ndatain %>% select(-dated))))
}
}
# convert matrix to dataframe
row.names(utilisation) <- paste0("Row", 1:nrow(utilisation))
utilisation <- as.data.frame(utilisation)
row.names(utilisation_act) <- paste0("Row", 1:nrow(utilisation_act))
utilisation_act <- as.data.frame(utilisation_act)
# Total utilisation
tot_util = rowSums(utilisation[-1])
# replace negative utilisation with zero
utilisation_act[utilisation_act < 0] <- 0
tot_util_act = rowSums(utilisation_act[-1])
# Index of all possible start dates producing minimum utilization changes
indx_min_all = which(min(tot_util) == tot_util)
indx_min_all_act = which(min(tot_util_act) == tot_util_act)
# The minimum possible dates that are minimum of actual utilisation
candidate_dates <- possible_start_dates[intersect(indx_min_all, indx_min_all_act), ]
# Now check which of them are closest to the current starting dates; so that the movement is not much
time_diff <- c()
for(i in 1:nrow(candidate_dates))
{
# we will add this value in inner loop so here we
timediff_indv <- 0
for(j in 1:ncol(candidate_dates))
{
diff_days <- difftime(oriz_start_date[["Plan_Start_Date"]][j],
candidate_dates[i,j], tz = "UTC", units = "days" ) %>% as.numeric()
# print(oriz_start_date[["Plan_Start_Date"]][j])
# print(candidate_dates[i,j])
#
# print(diff_days)
timediff_indv <- timediff_indv + diff_days
}
time_diff <- c(time_diff, timediff_indv)
}
# Alternatives
fin_dates <- candidate_dates[min(time_diff) == time_diff, ]
The above code runs well and produces the expected output; however it does not scale well. I have very large dataset (Two years worth of work and for more than thousand different tasks repeating in intervals) and searching through every possible combination is not a viable option. Are there ways I can formulate this problem as a standard optimization problem and use Rglpk
or Rcplex
or some even better solution. Thanks for inputs.