Kubernetes offers a wide range of functionality to manage containerized applications and build complex distributed systems. One such feature is automatically scaling the number of replicas in a deployment when the workload increases. In this post we'll see a simple example of how to use Kubernetes' Horizontal Pod Autoscaler to dynamically adjust the number of workers consuming a task queue in a Flask + Celery app.

# Write the Celery app

Celery is a framework for building software based on task queues. We'll deploy a system composed of a Flask frontend web server that schedules tasks when it receives connections, a Redis server that acts as the queue broker, and a set of workers that consume the tasks asynchronously. The number of workers will be decided by Kubernetes based on the workload at any given time.

The following `app.py` Python file implements both the frontend web server and the worker; they just need to be started with the `flask` and `celery` CLIs, respectively. The `count` task scheduled by Flask busy-waits in the workers to force CPU usage.

```python
from celery import Celery
from flask import Flask
import os

redis_server = os.environ['REDIS_SERVER']

flask_app = Flask(__name__)
celery_app = Celery('app', broker=f'redis://{redis_server}')

@celery_app.task
def count(num):
    print('Counting')
    for i in range(num):
        pass
    print('Done')

@flask_app.route('/')
def do_count():
    count.delay(1000000)
    return 'Task scheduled'
```
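To try both processes outside the cluster, they can be started from the same directory with the `flask` and `celery` CLIs; the Redis address below is a placeholder for any reachable instance:

```shell
# Placeholder Redis address; point it at any reachable Redis instance
export REDIS_SERVER=localhost:6379/0

# Start the Flask frontend (in one terminal)
FLASK_APP=app.py flask run --host=0.0.0.0

# Start a worker (in another terminal)
celery -A app worker --loglevel=info
```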


We build the `celery-app` image with this Dockerfile and push it to an image registry that is accessible from the cluster.

```dockerfile
FROM python:alpine

WORKDIR /app
RUN pip install "celery[redis]" flask
COPY app.py .
EXPOSE 5000
```
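A typical build-and-push sequence might look like the following; `registry.example.com` is a placeholder for whatever registry your cluster can pull from:

```shell
# Build the image and push it to a registry reachable by the cluster
# (registry.example.com is a placeholder)
docker build -t registry.example.com/celery-app .
docker push registry.example.com/celery-app
```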


# Deploy the basic system

First, we deploy the Redis component that will serve as task queue.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-redis
  labels:
    k8s-app: celery-redis
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: celery-redis
  template:
    metadata:
      labels:
        k8s-app: celery-redis
    spec:
      containers:
      - name: redis
        image: redis:alpine
        command: ["redis-server"]
        ports:
        - containerPort: 6379
          name: server
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: celery-redis
  labels:
    k8s-app: celery-redis
spec:
  selector:
    k8s-app: celery-redis
  ports:
  - name: server
    port: 6379
    protocol: TCP
```


Then, we create and expose the web server that will act as frontend.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-web
  labels:
    k8s-app: celery-web
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: celery-web
  template:
    metadata:
      labels:
        k8s-app: celery-web
    spec:
      containers:
      - name: web
        image: celery-app
        env:
        - name: FLASK_APP
          value: "app.py"
        - name: REDIS_SERVER
          value: "celery-redis:6379/0"
        command: ["flask"]
        args: ["run", "--host=0.0.0.0"]
        ports:
        - containerPort: 5000
          name: web
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: celery-web
  labels:
    k8s-app: celery-web
spec:
  selector:
    k8s-app: celery-web
  ports:
  - name: web
    port: 5000
    protocol: TCP
```


Finally, we add the deployment for the workers with one replica, for now. Note that we need to set the container's `resources.requests.cpu` field to be able to autoscale the number of replicas: the percentage we set later in the HPA object refers to this value.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-worker
  labels:
    k8s-app: celery-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: celery-worker
  template:
    metadata:
      labels:
        k8s-app: celery-worker
    spec:
      containers:
      - name: worker
        image: celery-app
        imagePullPolicy: Always
        env:
        - name: REDIS_SERVER
          value: "celery-redis:6379/0"
        command: ["celery"]
        args: ["-A", "app", "worker", "--loglevel=info"]
        resources:
          limits:
            cpu: 100m
          requests:
            cpu: 60m
```
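With the three manifests saved to disk (the filenames below are hypothetical), the whole stack can be created in one `kubectl` invocation:

```shell
# Hypothetical filenames for the Redis, web, and worker manifests above
kubectl apply -f redis.yaml -f web.yaml -f worker.yaml
```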


# Add pod autoscaling to the workers

At this point the system is fully functional, and the single worker handles the tasks added to the queue. However, we want to leverage the cluster's available resources when the workload demands it, so we are going to add pod autoscaling to the worker deployment.

The Kubernetes object in charge of this is the Horizontal Pod Autoscaler, or HPA, which targets a Deployment (or another scalable resource, such as a ReplicaSet) and modifies its replica count. The parameters of the HPA define a minimum and maximum number of replicas, and the CPU usage threshold that triggers the scaling.

In this case we want to add new replicas of the worker when the existing ones reach an average usage of 50% of the requested CPU, up to a total of 3 replicas. Since we set `resources.requests.cpu` to 60m earlier, the actual threshold is an average of 30m of CPU per pod.
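The scaling decision follows the rule from the Kubernetes HPA documentation, `desiredReplicas = ceil(currentReplicas × currentUsage / targetUsage)`. A minimal sketch of that arithmetic, with hypothetical usage figures:

```python
import math

def desired_replicas(current_replicas, usage_m, target_m):
    # HPA rule: desiredReplicas = ceil(currentReplicas * currentUsage / targetUsage)
    # (the real controller also clamps to minReplicas/maxReplicas and applies
    # tolerances and stabilization windows, which are omitted here)
    return math.ceil(current_replicas * usage_m / target_m)

# requests.cpu=60m with a 50% target gives a per-pod threshold of 30m
target_m = 60 * 0.50

# One worker averaging 72m (120% of its request) asks for 3 replicas
print(desired_replicas(1, 72, target_m))   # 3

# Three nearly idle workers averaging ~4m (7% of request) fall back to 1
print(desired_replicas(3, 4.2, target_m))  # 1
```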

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: celery-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: celery-worker
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 50
```
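Instead of writing the manifest by hand, an equivalent HPA can be created imperatively with `kubectl autoscale`:

```shell
# Imperative equivalent of the HPA manifest above
kubectl autoscale deployment celery-worker --cpu-percent=50 --min=1 --max=3
```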


# Test the system

We can now create a pod in the cluster and generate HTTP requests to the web server at `http://celery-web.default.svc.cluster.local:5000`. If we monitor the HPA with `kubectl`, we can see the CPU usage increase and the consequent creation of new replicas.
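One way to generate that traffic is a throwaway busybox pod that requests the page in a loop; the pod name and loop count below are arbitrary:

```shell
# Temporary pod that hits the frontend repeatedly; tune the loop count as needed
kubectl run load-generator --rm -it --image=busybox --restart=Never -- \
  /bin/sh -c 'for i in $(seq 1 200); do wget -q -O- http://celery-web.default.svc.cluster.local:5000/; done'
```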

```
$ kubectl get hpa  # Before generating traffic
NAMESPACE   NAME            REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
default     celery-worker   Deployment/celery-worker   5%/50%    1         3         1          20m

$ kubectl get hpa  # The first pod begins handling tasks
NAMESPACE   NAME            REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
default     celery-worker   Deployment/celery-worker   120%/50%  1         3         1          29m

$ kubectl get hpa  # New replicas are created
NAMESPACE   NAME            REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
default     celery-worker   Deployment/celery-worker   120%/50%  1         3         3          29m
```

Once the workers finish processing the tasks, the CPU usage falls again. After a couple of minutes the HPA deletes two of the replicas, because they are no longer needed.

```
$ kubectl get hpa  # No more tasks being processed
NAMESPACE   NAME            REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
default     celery-worker   Deployment/celery-worker   7%/50%    1         3         3          30m

$ kubectl get hpa  # The number of replicas is downscaled
NAMESPACE   NAME            REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
default     celery-worker   Deployment/celery-worker   6%/50%    1         3         1          35m
```



# Next steps in autoscaling

Depending on the kind of system, the usage spikes may be quite abrupt and require far more cluster resources than it would be feasible to provision in advance. For these situations, the Cluster Autoscaler, or CA, can be a game changer (for now, in public cloud environments only).

This component monitors the cluster and spins up new nodes when it detects that pods are not being created due to lack of resources. Likewise, it detects when the average node usage is low and deletes some of them to obtain higher pod density.

When used in combination with the HPA, the CA helps keep the cloud resources used by the Kubernetes cluster to the strict minimum needed at each moment to handle the workload, thus optimizing costs.