Thursday, August 4, 2016

Queueing Theory Without Math

Queueing theory, as you might imagine, answers questions about queues. It expresses relationships between:
- arrival rate of items (where items might be customers, orders, calls, etc)
- processing rate of items (where processing might be checkout, fulfilling an order, answering the phone, etc)
- number of processors (where processors might be clerks, warehouse workers, customer service reps, etc)
- processing time
- length of the queue
- time spent in the queue
- noise in all of the above
The last item, noise, is the most important. Arrivals usually aren't perfectly even. Processing time isn't perfectly even. Sometimes you'll get a bunch of arrivals at once, the processors will be swamped, and the items will spend a long time in the queue. Other times, arrivals will be slow, items will spend very little time in the queue, but the processors will have free time on their hands in between arrivals.
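
If you'd like to see the effect of noise concretely without any formulas, here's a toy simulation sketch (all numbers are invented for illustration): one employee who takes exactly one minute per customer, fed first by perfectly even arrivals and then by random arrivals at the same average rate. The even arrivals produce no waiting at all; the noisy arrivals, with the same average load, produce real waits.

```python
import random

def average_wait(interarrival_times, service_time=1.0):
    """Simulate one employee serving customers in order; return the mean wait."""
    clock = 0.0             # time the current customer arrives
    employee_free_at = 0.0  # when the employee finishes their current customer
    waits = []
    for gap in interarrival_times:
        clock += gap                          # the next customer arrives
        start = max(clock, employee_free_at)  # they wait if the employee is busy
        waits.append(start - clock)
        employee_free_at = start + service_time
    return sum(waits) / len(waits)

random.seed(0)
n = 100_000
even  = [1.25] * n                                        # one customer every 1.25 minutes
noisy = [random.expovariate(1 / 1.25) for _ in range(n)]  # same average rate, random gaps

print("even arrivals, average wait: ", round(average_wait(even), 2))   # 0.0
print("noisy arrivals, average wait:", round(average_wait(noisy), 2))  # roughly 2 minutes
```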

To help reason about this without math, we're going to imagine two queues, rather than one (see diagram below). One queue contains customers waiting for an employee. The other queue contains employees waiting for a customer. Whenever both a customer and employee are available, they pair up and do something (processing). Once they're done, the customer proceeds on their merry way, and the employee gets back into the employee queue to process another customer.

The two-queue setup: customers queue up to wait for employees, and employees queue up to wait for customers. When a customer and an employee are available, they pair up and process. When processing is complete, the customer leaves, and the employee queues up for another customer.

You might be thinking "but employees don't usually queue up, only customers queue up". The reason queueing theory is useful is that lots of things that we don't usually think of as queues still act like queues. Any time you have a bunch of customers, or calls, or employees, or items waiting for the same thing, it's a queue. It doesn't need to be first-come, first-served. Maybe service is random, like if incoming calls get routed randomly to customer service reps. Then the available reps are "queued", because they're all waiting for the same thing (an incoming call). When you go to the supermarket and there's a bunch of open cashiers, you can think of those cashiers as queued up awaiting customers.

Anyway, back to the two-queue diagram. Notice that at least one queue is always empty: either there are no waiting customers, or there are no waiting employees. If there were a waiting customer and a waiting employee, then they'd immediately pair up and start processing. Now sometimes, if you're lucky, a new customer will show up exactly when an employee finishes processing a previous customer, and the two will pair up without either of them needing to wait at all. That's the perfect scenario: both employee and customer have zero wait time. In reality, that never happens.
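
If it helps to see that spelled out, here's a toy sketch of the two-queue picture in code (the staffing level and coin-flip arrivals are arbitrary): customers and employees each wait in their own queue, and whenever both queues have someone in them, a pair forms immediately. The assertion at the bottom checks that the two queues are never both non-empty at the same time.

```python
import random
from collections import deque

random.seed(1)
TOTAL_EMPLOYEES = 2                                 # an arbitrary staffing level
waiting_customers = deque()
waiting_employees = deque(range(TOTAL_EMPLOYEES))   # everyone starts out free
busy_pairs = []                                     # employees currently with a customer

def pair_up():
    # whenever both a customer and an employee are waiting, they pair immediately
    while waiting_customers and waiting_employees:
        waiting_customers.popleft()
        busy_pairs.append(waiting_employees.popleft())

for _ in range(10_000):
    if random.random() < 0.5:
        waiting_customers.append("customer")        # a new customer shows up
    elif busy_pairs:
        waiting_employees.append(busy_pairs.pop())  # a pair finishes; employee re-queues
    pair_up()
    # the point made above: the two queues are never both non-empty at once
    assert not (waiting_customers and waiting_employees)
```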

Both employees and customers will inevitably have to wait sometimes, because the real world is noisy. Sometimes a bunch of customers show up at once, or a few customers take a long time to process, and customers end up having to wait. The queue gets long, customer wait times go up, customers get unhappy. Other times processing goes quickly, or only a few customers show up, and employees have to wait. Customer wait times go down, but lots of employees are just queued up doing nothing, and management complains about costs. (see diagrams)

At least one of the two queues is always empty. In practice, there is a tradeoff between customers waiting (above) and employees waiting (below).

One of the key ideas in queueing theory is the tradeoff between how much time customers spend waiting and how much time employees spend waiting. By either adding more employees or making processing more efficient, we can cut customer wait time. But in either case, employees will spend more time waiting. Maybe we improve efficiency and customer wait times go down, but then management notices that employees are spending a lot of time sitting around. They stop hiring for a while, the company grows, customers come into the system faster, and employees spend less time sitting around... but at the cost of longer customer wait times.

Let's call this the Golden Rule of Queueing Theory: The less time customers are waiting, the more time employees have nothing to do, and vice versa.

Corollary 1: If your employees are always busy, then your customers are waiting forever!

Corollary 2: In order to achieve low customer wait times, management must actively ensure that employees often have nothing to do.

Corollary 3: Efficiency improvements in processing will only improve customer wait times to the extent that employees have more free time.
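Here's a toy simulation of the Golden Rule (one employee, random arrivals, random processing times; the rates are invented for illustration). As the arrival rate creeps up toward the employee's capacity, customer waits climb sharply while the employee's idle time shrinks, and vice versa.

```python
import random

def simulate(arrival_rate, service_rate=1.0, n=200_000, seed=0):
    """One employee, random arrivals and service times; return (avg wait, idle fraction)."""
    rng = random.Random(seed)
    clock = 0.0
    employee_free_at = 0.0
    total_wait = 0.0
    busy_time = 0.0
    for _ in range(n):
        clock += rng.expovariate(arrival_rate)   # the next customer arrives
        service = rng.expovariate(service_rate)  # how long this customer takes
        start = max(clock, employee_free_at)     # they wait if the employee is busy
        total_wait += start - clock
        busy_time += service
        employee_free_at = start + service
    avg_wait = total_wait / n
    idle_fraction = 1 - busy_time / employee_free_at
    return avg_wait, idle_fraction

for rate in (0.5, 0.7, 0.9, 0.95):
    wait, idle = simulate(rate)
    print(f"arrival rate {rate:>4}: avg customer wait {wait:5.1f}, "
          f"employee idle {idle:4.0%}")
```

In this particular toy setup, keeping the employee busy about 95% of the time pushes average waits to roughly twenty times the average processing time, while giving the employee half their time free keeps waits down around a single processing time.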

Can we avoid the tradeoff? Sometimes, a bit. The reason for the tradeoff is noise: sometimes a bunch of customers show up at once and customers have to wait, other times few customers show up and employees have to wait. But if we can predict when a bunch of customers will show up, then maybe we can have extra employees available. If we know when few customers will show up, then maybe we can have fewer employees on hand at that time. Note that this requires both accurately predicting the arrival rate and the flexibility to match employees to that rate. Alternatively, if we're very flexible, then we can wait for a bunch of customers to show up and then add more employees right away. Walgreens, for example, does this: when the line exceeds three people, they call store associates to the front, setting aside lower-priority tasks for later.

In practice, the flexibility part tends to be harder than the prediction part. Employees tend to have pretty set schedules. Ideally, employees will have alternative things to do, like at Walgreens. Then we can have plenty of employees without people sitting around. But if the employee is a specialist, then that can be tricky.
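
As a rough illustration of why that kind of flexibility pays off, here's a toy sketch (the burst sizes and rates are invented; the three-person threshold is borrowed from the Walgreens example): one cashier is always at the register, customers arrive in occasional bursts, and under the flexible policy a backup associate is pulled to the front whenever the line passes three people.

```python
import random

def simulate(flexible, minutes=200_000, seed=3):
    """Time-stepped toy: bursty arrivals, one permanent cashier, optional backup."""
    rng = random.Random(seed)
    line = 0                       # customers currently waiting
    line_total = 0
    backup_minutes = 0
    for _ in range(minutes):
        if rng.random() < 0.2:     # bursty arrivals: 4 customers at once,
            line += 4              # averaging 0.8 customers per minute
        cashiers = 1
        if flexible and line > 3:  # the three-person rule from the text
            cashiers = 2
            backup_minutes += 1
        line = max(0, line - cashiers)  # each cashier rings up one customer per minute
        line_total += line
    return line_total / minutes, backup_minutes / minutes

for flexible in (False, True):
    avg_line, backup_share = simulate(flexible)
    print(f"flexible={flexible}: average line {avg_line:.1f}, "
          f"backup at the register {backup_share:.0%} of the time")
```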

One trick that can help is combining individual queues into bigger queues. Imagine that two employees, Alice and Bob, each have their own separate queue of customers. Sometimes Alice is busy when Bob is bored, and sometimes it's the other way around. If they can share the same queue, then it's the same as Bob helping Alice when he's bored, and Alice helping Bob when she's bored. The easier it is to hand off a customer between people, the better this works. On the other hand, if Alice and Bob are two different kinds of specialist, then this doesn't work at all - Bob can't help Alice's customers, and Alice can't help Bob's customers.
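
Here's a toy comparison of the two setups (Alice and Bob from above; the rates are invented): the same stream of customers is either split at random between Alice's and Bob's separate lines, or fed into one shared line that whoever frees up first takes from. The shared line should come out with a noticeably shorter average wait.

```python
import random

rng = random.Random(4)
n = 200_000
jobs = []
clock = 0.0
for _ in range(n):
    clock += rng.expovariate(1.6)               # customers arrive about 1.6 per minute
    jobs.append((clock, rng.expovariate(1.0)))  # (arrival time, random service time)

def separate_queues(jobs):
    free_at = [0.0, 0.0]                        # Alice and Bob, each with their own line
    total_wait = 0.0
    for arrival, service in jobs:
        k = rng.randrange(2)                    # each customer picks a line at random
        start = max(arrival, free_at[k])
        total_wait += start - arrival
        free_at[k] = start + service
    return total_wait / len(jobs)

def shared_queue(jobs):
    free_at = [0.0, 0.0]                        # Alice and Bob drawing from one line
    total_wait = 0.0
    for arrival, service in jobs:
        k = 0 if free_at[0] <= free_at[1] else 1   # whoever frees up first takes the next customer
        start = max(arrival, free_at[k])
        total_wait += start - arrival
        free_at[k] = start + service
    return total_wait / len(jobs)

print("average wait, separate lines: ", round(separate_queues(jobs), 2))  # noticeably higher
print("average wait, one shared line:", round(shared_queue(jobs), 2))     # noticeably lower
```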

Don't get too distracted by avoiding the tradeoff. Opportunities to better balance customers and employees are valuable and should definitely be pursued, but at the end of the day, the tradeoff will need to be addressed. A target needs to be set for customer wait times, with the understanding that sometimes capacity will exceed demand, and employees will have spare time on their hands. In fact, if customer wait times are to be kept small, then employees must have spare time. There need to be employees sitting around when a customer arrives, in order for the customer to be processed quickly. That's the crucial takeaway.