Kubernetes offers a wide range of features to manage containerized applications and build complex distributed systems. One of them is the ability to automatically scale the number of replicas in a deployment when the workload increases. In this post we’ll see a simple example of how to use Kubernetes’ Horizontal Pod Autoscaler to dynamically adjust the number of workers consuming a task queue in a Flask + Celery app.

Write the Celery app

Celery is a framework for building software based on task queues. We’ll deploy a system composed of a Flask frontend web server that schedules tasks when it receives requests, a Redis server that acts as the queue broker, and a set of workers that consume the tasks asynchronously. The number of workers will be decided by Kubernetes based on the workload at any given time.

The following app.py file implements both the frontend web server and the worker; they just need to be started with the flask and celery CLIs respectively. The count task scheduled by Flask busy-waits in the worker to force CPU usage.

from celery import Celery
from flask import Flask
import os

# Address of the Redis broker, e.g. "celery-redis:6379/0"
redis_server = os.environ['REDIS_SERVER']

flask_app = Flask(__name__)
celery_app = Celery('app', broker=f'redis://{redis_server}')

@celery_app.task
def count(num):
    # Busy loop to generate CPU load in the worker
    print('Counting')
    for i in range(num):
        pass
    print('Done')

@flask_app.route('/')
def do_count():
    # Enqueue the task and return immediately
    count.delay(1000000)
    return 'Task scheduled'
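
For a quick local check (assuming a Redis instance is reachable, for example on localhost), the two processes can be started in separate terminals like this:

$ export REDIS_SERVER=localhost:6379/0
$ FLASK_APP=app.py flask run --host=0.0.0.0
$ celery -A app worker --loglevel=info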

We build the celery-app image with this Dockerfile and push it to an image registry that is accessible from the cluster.

FROM python:alpine

WORKDIR /app
RUN pip install "celery[redis]" flask
COPY app.py .
EXPOSE 5000
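
For example, assuming a hypothetical registry reachable at registry.example.com, the image could be built and pushed with:

$ docker build -t registry.example.com/celery-app .
$ docker push registry.example.com/celery-app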

Deploy the basic system

First, we deploy the Redis component that will serve as the task queue broker.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-redis
  labels:
    k8s-app: celery-redis
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: celery-redis
  template:
    metadata:
      labels:
        k8s-app: celery-redis
    spec:
      containers:
      - name: redis
        image: redis:alpine
        command: ["redis-server"]
        ports:
        - containerPort: 6379
          name: server
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: celery-redis
  labels:
    k8s-app: celery-redis
spec:
  selector:
    k8s-app: celery-redis
  ports:
  - name: server
    port: 6379
    protocol: TCP

Then, we create and expose the web server that will act as the frontend.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-web
  labels:
    k8s-app: celery-web
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: celery-web
  template:
    metadata:
      labels:
        k8s-app: celery-web
    spec:
      containers:
      - name: web
        image: celery-app
        env:
        - name: FLASK_APP
          value: "app.py"
        - name: REDIS_SERVER
          value: "celery-redis:6379/0"
        command: ["flask"]
        args: ["run", "--host=0.0.0.0"]
        ports:
        - containerPort: 5000
          name: web
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: celery-web
  labels:
    k8s-app: celery-web
spec:
  selector:
    k8s-app: celery-web
  ports:
  - name: web
    port: 5000
    protocol: TCP

Finally, we add the deployment for the workers, with one replica for now. Note that we need to set the container’s resources.requests.cpu field to be able to autoscale the number of replicas: the CPU utilization percentage that we configure later in the HPA object is relative to this requested value.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-worker
  labels:
    k8s-app: celery-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: celery-worker
  template:
    metadata:
      labels:
        k8s-app: celery-worker
    spec:
      containers:
      - name: worker
        image: celery-app
        imagePullPolicy: Always
        env:
        - name: REDIS_SERVER
          value: "celery-redis:6379/0"
        command: ["celery"]
        args: ["-A", "app", "worker", "--loglevel=info"]
        resources:
          limits:
            cpu: 100m
          requests:
            cpu: 60m
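
Assuming the manifests above are saved as redis.yaml, web.yaml and worker.yaml (the file names are arbitrary), all the objects can be created with kubectl apply:

$ kubectl apply -f redis.yaml -f web.yaml -f worker.yaml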

Add pod autoscaling to the workers

At this point the system is fully functional, and the single worker handles the tasks added to the queue. However, we want to leverage the available resources of the cluster when the workload demands it, so we are going to add pod autoscaling to the worker deployment.

The Kubernetes object in charge of this is the Horizontal Pod Autoscaler, or HPA, which targets a Deployment (or another scalable resource such as a ReplicaSet or StatefulSet) and adjusts its number of replicas. The parameters of the HPA define a minimum and maximum number of replicas, and the CPU utilization threshold that triggers scaling.

In this case we want to add new replicas of the worker when the existing ones reach an average usage of 50% of the requested CPU, up to a total of 3 replicas. Since we set resources.requests.cpu to 60m above, the actual threshold is an average of 30m of CPU per pod.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: celery-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: celery-worker
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 50
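
The same object can also be created imperatively with kubectl autoscale, which is handy for quick experiments:

$ kubectl autoscale deployment celery-worker --cpu-percent=50 --min=1 --max=3

Under the hood, the HPA controller periodically computes desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization) and resizes the deployment accordingly.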

Test the system

We can now create a pod in the cluster and generate HTTP requests to the web server at http://celery-web.default.svc.cluster.local:5000; one simple way to do this is shown below. If we then monitor the HPA with kubectl, we can see the CPU usage increase and the consequent creation of new replicas.
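
For example, a temporary busybox pod (the load-generator name is arbitrary) can poll the service in a loop:

$ kubectl run load-generator --rm -it --image=busybox -- /bin/sh -c "while true; do wget -q -O- http://celery-web.default.svc.cluster.local:5000; done"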

$ kubectl get hpa  # Before generating traffic
NAMESPACE   NAME            REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
default     celery-worker   Deployment/celery-worker   5%/50%    1         3         1          20m
$ kubectl get hpa  # The first pod begins handling tasks
NAMESPACE   NAME            REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
default     celery-worker   Deployment/celery-worker   120%/50%  1         3         1          29m
$ kubectl get hpa  # New replicas are created
NAMESPACE   NAME            REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
default     celery-worker   Deployment/celery-worker   120%/50%  1         3         3          29m

Once the workers finish processing the tasks, we see that the CPU usage drops again. After a few minutes (the HPA waits for a downscale stabilization window, 5 minutes by default, before removing pods) the HPA deletes two of the replicas, because they are no longer needed.

$ kubectl get hpa  # No more tasks being processed
NAMESPACE   NAME            REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
default     celery-worker   Deployment/celery-worker   7%/50%    1         3         3          30m
$ kubectl get hpa  # The number of replicas is downscaled
NAMESPACE   NAME            REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
default     celery-worker   Deployment/celery-worker   6%/50%    1         3         1          35m

Next steps in autoscaling

Depending on the kind of system, usage spikes may be quite abrupt and require far more cluster resources than it would be practical to provision in advance. For these situations, the Cluster Autoscaler, also known as CA, can be a game changer (for now, in public cloud environments only).

This component monitors the cluster and spins up new nodes when it detects that pods cannot be scheduled due to a lack of resources. Likewise, it detects when the average node usage is low and removes some nodes to achieve a higher pod density.

When used in combination with the HPA, the CA helps keep the cloud resources used by the Kubernetes cluster to the strict minimum needed at each moment to handle the workload, hence optimizing costs.