Kubernetes offers a wide range of features for managing containerized applications and building complex distributed systems. One of them is the ability to automatically scale the number of replicas in a deployment when the workload increases. In this post we’ll see a simple example of how to use Kubernetes’ Horizontal Pod Autoscaler to dynamically adjust the number of workers consuming a task queue in a Flask + Celery app.
Write the Celery app
Celery is a framework for building software based on task queues. We’ll deploy a system composed of a Flask frontend web server that schedules tasks when it receives connections, a Redis server that acts as the queue broker, and a set of workers that consume the tasks asynchronously. The number of workers will be decided by Kubernetes based on the workload at any given time.
The following app.py Python file implements both the frontend web server and the worker; they just need to be started with the flask and celery CLIs respectively. The count task that is scheduled by Flask produces an active wait in the workers to force CPU usage.
from celery import Celery
from flask import Flask
import os

# Address of the Redis broker, injected through the environment
redis_server = os.environ['REDIS_SERVER']

flask_app = Flask(__name__)
celery_app = Celery('app', broker=f'redis://{redis_server}')


@celery_app.task
def count(num):
    # Busy loop to force CPU usage in the worker
    print('Counting')
    for i in range(num):
        pass
    print('Done')


@flask_app.route('/')
def do_count():
    # Schedule the task asynchronously and return immediately
    count.delay(1000000)
    return 'Task scheduled'
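For reference, the two components could also be started locally outside Kubernetes. The invocations below mirror the commands used in the deployments later on; the broker address localhost:6379/0 is an assumption for a locally running Redis.

$ export REDIS_SERVER=localhost:6379/0   # assumed local Redis instance
$ export FLASK_APP=app.py
$ flask run --host=0.0.0.0               # frontend web server
$ celery -A app worker --loglevel=info   # worker, in a second shell with the same environment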
We create the image celery-app with this Dockerfile and add it to an image registry that is accessible from the cluster.
FROM python:alpine
WORKDIR /app
RUN pip install "celery[redis]" flask
COPY app.py .
EXPOSE 5000
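As a sketch, with a hypothetical registry at registry.example.com (substitute whatever registry the cluster can pull from), building and publishing the image could look like this:

$ docker build -t celery-app .
$ docker tag celery-app registry.example.com/celery-app
$ docker push registry.example.com/celery-app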
Deploy the basic system
First, we deploy the Redis component that will serve as task queue.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-redis
  labels:
    k8s-app: celery-redis
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: celery-redis
  template:
    metadata:
      labels:
        k8s-app: celery-redis
    spec:
      containers:
      - name: redis
        image: redis:alpine
        command: ["redis-server"]
        ports:
        - containerPort: 6379
          name: server
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: celery-redis
  labels:
    k8s-app: celery-redis
spec:
  selector:
    k8s-app: celery-redis
  ports:
  - name: server
    port: 6379
    protocol: TCP
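Assuming both objects are saved in a file named, say, redis.yaml (the filename is arbitrary), we apply them and check that the pod is running:

$ kubectl apply -f redis.yaml
$ kubectl get pods -l k8s-app=celery-redis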
Then, we create and expose the web server that will act as frontend.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-web
  labels:
    k8s-app: celery-web
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: celery-web
  template:
    metadata:
      labels:
        k8s-app: celery-web
    spec:
      containers:
      - name: web
        image: celery-app
        env:
        - name: FLASK_APP
          value: "app.py"
        - name: REDIS_SERVER
          value: "celery-redis:6379/0"
        command: ["flask"]
        args: ["run", "--host=0.0.0.0"]
        ports:
        - containerPort: 5000
          name: web
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: celery-web
  labels:
    k8s-app: celery-web
spec:
  selector:
    k8s-app: celery-web
  ports:
  - name: web
    port: 5000
    protocol: TCP
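Likewise, assuming these manifests live in a hypothetical web.yaml, we apply them and verify that the service was created:

$ kubectl apply -f web.yaml
$ kubectl get service celery-web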
Finally, we add the deployment for the workers, with a single replica for now. Note that we need to specify the container's resources.requests.cpu parameter to be able to autoscale the number of replicas: the CPU utilization percentage that we will set in the autoscaler below is relative to this field.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-worker
  labels:
    k8s-app: celery-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: celery-worker
  template:
    metadata:
      labels:
        k8s-app: celery-worker
    spec:
      containers:
      - name: worker
        image: celery-app
        imagePullPolicy: Always
        env:
        - name: REDIS_SERVER
          value: "celery-redis:6379/0"
        command: ["celery"]
        args: ["-A", "app", "worker", "--loglevel=info"]
        resources:
          limits:
            cpu: 100m
          requests:
            cpu: 60m
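With the worker manifest in a hypothetical worker.yaml, we apply it and confirm that the single worker pod starts and connects to the broker:

$ kubectl apply -f worker.yaml
$ kubectl logs -l k8s-app=celery-worker   # the Celery startup banner should list the Redis broker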
Add pod autoscaling to the workers
At this point the system is fully functional, and the single worker handles the tasks added to the queue. However, we want to take advantage of the cluster’s available resources when the workload demands it, so we are going to add pod autoscaling to the worker deployment.
The Kubernetes object in charge of this is the Horizontal Pod Autoscaler, or HPA, which targets a Deployment (or a ReplicaSet, StatefulSet, or ReplicationController) and modifies its replica count. The parameters of the HPA define a minimum and maximum number of replicas, and the CPU utilization threshold that triggers the scaling.
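As a reference for what follows, the HPA derives the desired replica count from the ratio between the observed and the target metric values; roughly, the algorithm documented upstream is:

desiredReplicas = ceil(currentReplicas * currentCPUUtilization / targetCPUUtilization)

For example, one replica running at 120% of its CPU request against a 50% target yields ceil(1 * 120 / 50) = 3 replicas, which matches the behaviour we will observe below.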
In this case we want to add new replicas of the worker when the existing ones have an average usage of 50% of the requested CPU, up to a total of 3 replicas. Since we set the resources.requests.cpu value to 60m before, the actual threshold is an average of 30m of CPU.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: celery-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: celery-worker
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 50
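Equivalently, the same autoscaler could have been created imperatively with kubectl, which is handy for quick experiments:

$ kubectl autoscale deployment celery-worker --cpu-percent=50 --min=1 --max=3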
Test the system
We can now create a pod in the cluster and generate HTTP requests to the web server at http://celery-web.default.svc.cluster.local:5000. If we monitor the HPA with kubectl, we can see the increase of the CPU usage and the consequent creation of new replicas.
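One simple way to generate that traffic (a sketch; the pod name load-generator and the busybox image are arbitrary choices) is a throwaway pod that requests the page in a loop:

$ kubectl run load-generator --rm -it --image=busybox -- /bin/sh -c "while true; do wget -q -O- http://celery-web.default.svc.cluster.local:5000; done"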
$ kubectl get hpa # Before generating traffic
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
default celery-worker Deployment/celery-worker 5%/50% 1 3 1 20m
$ kubectl get hpa # The first pod begins handling tasks
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
default celery-worker Deployment/celery-worker 120%/50% 1 3 1 29m
$ kubectl get hpa # New replicas are created
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
default celery-worker Deployment/celery-worker 120%/50% 1 3 3 29m
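We can also watch the worker pods themselves appear as the deployment scales:

$ kubectl get pods -l k8s-app=celery-worker --watch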
Once the workers finish processing the tasks, the CPU usage drops again. After a few minutes the HPA removes two of the replicas, since they are no longer needed.
$ kubectl get hpa # No more tasks being processed
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
default celery-worker Deployment/celery-worker 7%/50% 1 3 3 30m
$ kubectl get hpa # The number of replicas is downscaled
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
default celery-worker Deployment/celery-worker 6%/50% 1 3 1 35m
Next steps in autoscaling
Depending on the kind of system, usage spikes may be quite abrupt and require far more cluster resources than it would be reasonable to provision in advance. For these situations, the Cluster Autoscaler, also known as CA, can be a game changer (for now, only in public cloud environments).
This component monitors the cluster and spins up new nodes when it detects that pods cannot be scheduled due to lack of resources. Likewise, it detects when the average node usage is low and removes some of the nodes to achieve higher pod density.
When used in combination with the HPA, the CA helps keep the cloud resources used by the Kubernetes cluster to the minimum needed at each moment to handle the workload, thus optimizing costs.