In this post, we will explore learning-augmented algorithms for the job scheduling problem, as proposed in [1]. We will start by examining how an oracle's predictions can be used to schedule jobs whose service times follow simple distributions, and then derive a more general model.

A Basic Example

Suppose we have two types of jobs, short jobs and long jobs, which take time $s$ and $l$ respectively, where $s < l$. Our goal is to minimize the average waiting time of the jobs by finding the optimal schedule without preemption. The optimal solution is to process the shortest jobs first, which can be proved by a simple proof by contradiction.

Now, suppose that we only have predictions of whether jobs are short or long rather than the true value. In this setting, the natural strategy would be to put the shortest predicted jobs first, which means that the performance of the schedule as determined by the average waiting time now depends on the performance of the classifier.

To evaluate the performance of this predicted information algorithm, we can compare it to either the no information case where we simply choose a random order or the perfect information case where we put the shortest jobs first. Using the latter comparison, we can define the price of misprediction as the ratio of performance between predicted information and full information. This metric is a variation on competitive ratios and is similar to multiplicative regret in machine learning.

Suppose we have $n_s$ short jobs and $n_l$ long jobs, with $n = n_s + n_l$ jobs in total. Then, the expected average waiting time in the perfect information case, where we process the shortest jobs first, is given by
\begin{equation}
\frac{1}{n} \left( \frac{n_s (n_s-1)}{2} s + \frac{n_l (n_l-1)}{2} l + n_l n_s s \right)
\end{equation}
where the terms come from the fact that, in expectation, each short job has to wait for half of the other short jobs to finish, while each long job has to wait for all of the short jobs as well as half of the other long jobs to finish. If we ignore the lower order terms, then this is asymptotically equal to
\begin{equation}
\frac{n_s^2 s + n_l^2 l + 2 n_s n_l s}{2n}
\end{equation}
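
To make this concrete, here is a minimal Python sketch (the job counts and lengths are made-up values for illustration) that computes the average waiting time of the shortest-jobs-first schedule directly and checks it against the exact expression above.

```python
def avg_waiting_time(times):
    """Average waiting time when jobs run in the given (fixed) order."""
    waited, total = 0.0, 0.0
    for t in times:
        total += waited          # this job waits for everything scheduled before it
        waited += t
    return total / len(times)

# Made-up parameters for illustration.
n_s, n_l, s, l = 100, 50, 1.0, 5.0
n = n_s + n_l
jobs = [s] * n_s + [l] * n_l

perfect = avg_waiting_time(sorted(jobs))   # shortest jobs first
exact = (n_s * (n_s - 1) / 2 * s + n_l * (n_l - 1) / 2 * l + n_l * n_s * s) / n
print(perfect, exact)                      # the two values agree
```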

On the other hand, the expected average waiting time in the predicted information case is given by
\begin{equation}
\frac{n_s^2 s + n_l^2 l + n_s n_l ((2 - (p+q))s + (p+q)l)}{2n}
\end{equation}
where $p$ is the probability that short jobs are misclassified while $q$ is the probability that long jobs are misclassified (proof in [1]).

Therefore, the asymptotic price of misprediction is the ratio
\begin{equation}
R = \frac{n_s^2 s + n_l^2 l + n_s n_l ((2 - (p+q))s + (p+q)l)}{n_s^2 s + n_l^2 l + 2 n_s n_l s}
\end{equation}
which is upper bounded by
\begin{equation}
R \leq 1 + \frac{(p+q) (\sqrt{l/s}-1)}{2}
\end{equation}
Note that in the no information case we have $p = q = 1/2$, so $p + q = 1$. We conclude from the above equation that any classifier whose error probabilities satisfy $p + q < 1$ results in a lower expected average waiting time than the no information case.
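
As a sanity check on this bound, the following sketch simulates the predicted information schedule for assumed misprediction probabilities $p$ and $q$ (again with made-up parameters) and compares the resulting ratio against the upper bound; it is an illustration, not the experimental setup of [1].

```python
import random

def avg_waiting_time(times):
    """Average waiting time when jobs run in the given (fixed) order."""
    waited, total = 0.0, 0.0
    for t in times:
        total += waited
        waited += t
    return total / len(times)

def predicted_info_wait(n_s, n_l, s, l, p, q, trials=2000):
    """Monte Carlo estimate of the expected average waiting time when jobs are
    ordered by predicted class. p = P(short job predicted long),
    q = P(long job predicted short)."""
    jobs = [s] * n_s + [l] * n_l
    total = 0.0
    for _ in range(trials):
        labelled = []
        for t in jobs:
            mispredicted = random.random() < (p if t == s else q)
            predicted_long = (t == s) == mispredicted  # a misprediction flips the true class
            labelled.append((predicted_long, random.random(), t))
        order = [t for _, _, t in sorted(labelled)]    # predicted-short jobs first, random within class
        total += avg_waiting_time(order)
    return total / trials

# Made-up parameters for illustration.
n_s, n_l, s, l, p, q = 200, 50, 1.0, 5.0, 0.1, 0.2
perfect = avg_waiting_time(sorted([s] * n_s + [l] * n_l))
R_sim = predicted_info_wait(n_s, n_l, s, l, p, q) / perfect
R_bound = 1 + (p + q) * ((l / s) ** 0.5 - 1) / 2
print(R_sim, R_bound)   # simulated ratio vs. the upper bound
```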

A General Model

Let $g(x, y)$ be the joint probability density function for a job that requires time $x$ and is predicted to require time $y$. Using this, we can define the following marginal density functions
\begin{align}
f_s(x) &= \int_{y=0}^\infty g(x, y) dy & \text{Service time density} \\
f_p(y) &= \int_{x=0}^\infty g(x, y) dx & \text{Predicted service time density}
\end{align}

Analysis

In this setting, the expected waiting time for a job in the full information case is given by
\begin{equation}
(n-1) \int_{x=0}^{\infty} f_s(x) \int_{z=0}^x z f_s(z) \,dz\,dx
\end{equation}
where $n$ is the total number of jobs. Intuitively, the inner integral is the expected time a job with service time $x$ spends waiting for one other job (which only delays it if its own service time is smaller), the outer integral averages this over all possible service times $x$, and the factor $(n-1)$ accounts for all of the other jobs.

Similarly, the expected waiting time for a job in the predicted information case is given by
\begin{equation}
(n-1) \int_{y=0}^\infty f_p(y) \int_{x=0}^\infty \int_{z=0}^y x g(x, z) \,dz\,dx\,dy
\end{equation}
where the extra integral comes from the fact that a job scheduled ahead of us because its predicted time $z$ is smaller than $y$ may have any actual service time $x$, so we must integrate over the joint density.

Example: Exponential Jobs

Suppose that our service times are independently and exponentially distributed with mean $1$ and that the predicted time for a job with actual time $x$ is exponentially distributed with mean $x$. In this setup, the joint distribution is equal to
\begin{equation}
g(x, y) = \frac{e^{-x - y/x}}{x}
\end{equation}
and if we plug this into the above formulas, we find that the price of misprediction is exactly $4/3$: the expected waiting time is $(n-1)/3$ with predictions versus $(n-1)/4$ with full information. Since a random order gives an expected waiting time of $(n-1)/2$, even this fairly noisy predictor recovers most of the benefit of full information.
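
A quick way to see where this number comes from, without evaluating the integrals, is a small Monte Carlo sketch of the two expressions above: draw exponential service times, draw each prediction from an exponential with mean equal to the job's true time, and compare the average waiting times of the full information, predicted information, and random-order schedules (the sample sizes below are arbitrary).

```python
import random

def avg_waiting_time(times):
    """Average waiting time when jobs run in the given (fixed) order."""
    waited, total = 0.0, 0.0
    for t in times:
        total += waited
        waited += t
    return total / len(times)

def simulate(n=500, trials=400):
    """Exp(1) service times; each prediction is exponential with mean equal to the true time."""
    full = pred = none = 0.0
    for _ in range(trials):
        jobs = [random.expovariate(1.0) for _ in range(n)]
        preds = [random.expovariate(1.0 / x) for x in jobs]                 # mean x
        full += avg_waiting_time(sorted(jobs))                              # shortest actual time first
        pred += avg_waiting_time([x for _, x in sorted(zip(preds, jobs))])  # shortest predicted first
        none += avg_waiting_time(jobs)                                      # arbitrary order
    return full / trials, pred / trials, none / trials

w_full, w_pred, w_none = simulate()
print(w_pred / w_full)            # approaches 4/3
print(w_full, w_pred, w_none)     # roughly (n-1)/4, (n-1)/3, and (n-1)/2
```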

The Queuing Setting

Next, we can consider a more standard setting where instead of having a fixed number of jobs, we assume that the jobs come in over time following a Poisson process with rate $\lambda<1$.

Suppose we have a standard FIFO queue where jobs arrive at rate $\lambda$ and have a service time distribution $S$. This is the "no information" baseline we will try to improve upon.

Now, suppose we know the job times exactly; in this case, the FIFO strategy is clearly sub-optimal. If we cannot preempt jobs, we should process the shortest jobs first (SJF). If we can preempt jobs, then we should process the job with the shortest remaining time first (SRTF). There are well-known formulas describing the expected waiting time for both of these strategies.

Finally, suppose we have the predicted job times governed by $g(x, y)$, as before. Then the natural strategies would be to process the shortest predicted jobs first without preemption (SPJF) or to process the shortest predicted remaining time first with preemption (SPRTF).

Analysis

In the no information case, the expected waiting time $W$ is given by the Pollaczek-Khinchine formula:
\begin{align}
\rho &= \lambda E[S] & \text{Load at the queue.} \\
W &= \frac{\rho E[S^2]}{2 E[S] (1 - \rho)}
\end{align}
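
The sketch below simulates such a FIFO queue with exponential service times at an assumed arrival rate of $\lambda = 0.8$ and compares the observed mean wait with the formula above.

```python
import random

def mg1_fifo_wait(lam, n_jobs=200_000):
    """Mean waiting time in a FIFO queue with Poisson(lam) arrivals and Exp(1) service."""
    arrival = server_free = total_wait = 0.0
    for _ in range(n_jobs):
        arrival += random.expovariate(lam)
        start = max(arrival, server_free)     # wait until the server is free
        total_wait += start - arrival
        server_free = start + random.expovariate(1.0)
    return total_wait / n_jobs

lam = 0.8                                     # assumed arrival rate
rho, ES, ES2 = lam * 1.0, 1.0, 2.0            # Exp(1) service: E[S] = 1, E[S^2] = 2
print(mg1_fifo_wait(lam), rho * ES2 / (2 * ES * (1 - rho)))   # both close to 4
```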

In the full information case, the SJF strategy produces waiting time $W$ as shown below:
\begin{align}
\rho_x &= \lambda \int_{t=0}^x t f_s(t) dt & \text{Load of jobs with service time up to $x$.} \\
W_x &= \frac{\rho E[S^2]}{2 E[S] (1 - \rho_x)^2} & \text{Waiting time for a job with service time $x$.} \\
W &= \int_{x=0}^\infty f_s(x) W_x dx
\end{align}
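
As a check on the SJF expressions, here is a rough event-driven simulation of a non-preemptive shortest-job-first queue (my own sketch, not the paper's code) compared against the integral above for the exponential example, again at an assumed arrival rate of $\lambda = 0.8$.

```python
import heapq
import random

import numpy as np
from scipy.integrate import quad

def mg1_sjf_wait(lam, n_jobs=200_000):
    """Mean wait under non-preemptive SJF with Poisson(lam) arrivals and Exp(1) service."""
    arrivals, t = [], 0.0
    for _ in range(n_jobs):
        t += random.expovariate(lam)
        arrivals.append((t, random.expovariate(1.0)))   # (arrival time, service time)
    queue, i, clock, total_wait, served = [], 0, 0.0, 0.0, 0
    while served < n_jobs:
        while i < n_jobs and arrivals[i][0] <= clock:   # admit everything that has arrived
            heapq.heappush(queue, (arrivals[i][1], arrivals[i][0]))
            i += 1
        if not queue:                                   # idle: jump to the next arrival
            clock = arrivals[i][0]
            continue
        size, arr = heapq.heappop(queue)                # run the shortest waiting job
        total_wait += clock - arr
        clock += size
        served += 1
    return total_wait / n_jobs

lam = 0.8                                               # assumed arrival rate
rho, ES, ES2 = lam, 1.0, 2.0
rho_x = lambda x: lam * (1 - np.exp(-x) * (1 + x))      # lam * int_0^x t e^{-t} dt
W_x = lambda x: rho * ES2 / (2 * ES * (1 - rho_x(x)) ** 2)
W_formula = quad(lambda x: np.exp(-x) * W_x(x), 0, np.inf)[0]
print(mg1_sjf_wait(lam), W_formula)                     # the two values should be close
```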

In the predicted information case, the SPJF strategy produces waiting time $W$ as shown below:
\begin{align}
\rho_y &= \lambda \int_{t=0}^y \int_{x=0}^\infty x g(x, t) \,dx\,dt & \text{Load of jobs with predicted service time up to $y$.} \\
W_y &= \frac{\rho E[S^2]}{2 E[S] (1 - \rho_y)^2} & \text{Waiting time for a job with predicted service time $y$.} \\
W &= \int_{y=0}^\infty f_p(y) W_y dy
\end{align}

Example

Suppose we have Poisson arrivals at rate $\lambda$, exponential service times with mean $1$, and, as before, predicted service times that are exponentially distributed with mean $x$ for a job with actual service time $x$. If we apply the above analysis, then the price of misprediction is given by
\begin{equation}
\cfrac{
\int_{y=0}^\infty \cfrac{f_p(y)}{(1 - \rho_y)^2} dy
}{
\int_{x=0}^\infty \cfrac{f_s(x)}{(1 - \rho_x)^2} dx
}
\end{equation}
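
The densities appearing in this ratio can be evaluated by numerical quadrature. Below is a rough sketch that does so for the exponential example at an assumed arrival rate of $\lambda = 0.8$; the integration grids and cut-offs are arbitrary, so treat the output as a sanity check rather than an exact value.

```python
import numpy as np
from scipy.integrate import quad, cumulative_trapezoid, trapezoid

lam = 0.8                                            # assumed arrival rate
g = lambda x, y: np.exp(-x - y / x) / x              # joint density from the exponential example

# Grid over predicted service times (denser near 0, where f_p grows logarithmically).
ys = np.unique(np.concatenate([np.geomspace(1e-4, 1.0, 500), np.linspace(1.0, 40.0, 1500)]))
f_p = np.array([quad(lambda x: g(x, y), 0, np.inf)[0] for y in ys])     # predicted time density
m = np.array([quad(lambda x: x * g(x, y), 0, np.inf)[0] for y in ys])   # int_0^inf x g(x, y) dx
rho_y = lam * cumulative_trapezoid(m, ys, initial=0.0)                  # load up to predicted time y

# Actual service times are Exp(1), so f_s and rho_x have closed forms.
xs = np.linspace(0.0, 40.0, 2000)
f_s = np.exp(-xs)
rho_x = lam * (1 - np.exp(-xs) * (1 + xs))

num = trapezoid(f_p / (1 - rho_y) ** 2, ys)
den = trapezoid(f_s / (1 - rho_x) ** 2, xs)
print(num / den)                                     # price of misprediction at this lambda
```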

Mitzenmacher simulates this setup in [1] and provides experimental evidence that the above equations are accurate. Furthermore, Mitzenmacher experiments with a slight modification in which the predictions are partly drawn from a Weibull distribution with shape parameter $1/2$, which has a heavier tail (and therefore results in less accurate predictions), and examines the impact on queue performance.

The experimental results indicate that preemptive scheduling yields better performance than non-preemptive scheduling in all cases. Furthermore, they show that the predicted information case achieves close-to-optimal performance even when the predictions themselves are not particularly accurate.

References

[1] Michael Mitzenmacher. Scheduling with Predictions and the Price of Misprediction. 2019.