By Nirajkanth Ravichandran
In the realm of data-intensive applications and large-scale data processing, the need for efficient and scalable computing resources is paramount. Kubernetes, an open-source container orchestration platform, has emerged as a powerful tool for managing and scaling containerized applications. When combined with Dask, a flexible parallel computing library in Python, you have the potential to create a highly resilient and dynamic data processing environment. In this article, we’ll explore the process of deploying a Dask cluster on top of a Kubernetes cluster, utilizing Minikube within the Windows Subsystem for Linux (WSL) environment.
Setting Up the Kubernetes Cluster
Before diving into deploying a Dask cluster, we need a Kubernetes cluster up and running on our local machine. Minikube is a popular choice, and setting it up within the WSL environment provides a convenient local development setup.
Image source: https://codecrux.com/blog/minikube-tips.html
Selecting Minikube for Local Kubernetes in WSL
Minikube is an ideal choice for local Kubernetes development due to its simplicity and rapid setup. When combined with WSL, it enables a seamless Linux environment within a Windows operating system.
To begin, we need to meet the minimum requirements set by Minikube:
- At least 2 CPU cores
- 2 GB of available memory
- 20 GB of available disk space
- An active internet connection
- A container or virtual machine manager (For our guide, we’ll use Docker.)
Installing and Configuring Minikube
To install Minikube within the WSL environment, follow these steps:
1. Enable WSL: Ensure that you have Windows Subsystem for Linux (WSL) enabled on your Windows machine. You can follow the official Microsoft documentation to set this up.
2. Install Docker Desktop: Install Docker Desktop for Windows, which will be used to manage containers within the WSL environment. Then, in the Docker Desktop settings, enable ‘Use the WSL 2 based engine’ and your WSL distribution (in my case, Ubuntu), as shown in the following figures.
3. Install kubectl: Kubectl is the command-line tool used to interact with Kubernetes clusters. Install it within your WSL environment according to the instructions for Linux.
curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
4. Install Minikube: Open a WSL terminal and install Minikube using a package manager or by downloading the binary from the official GitHub repository. You can simply use the following commands:
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
5. Verify the Minikube installation by checking its version, then start the Minikube cluster:
minikube version
minikube start --driver=docker
6. Verify Cluster Status: After a successful startup, you can verify the status of your cluster:
minikube status
Deploying a Dask Cluster on Kubernetes within WSL
With our Minikube Kubernetes cluster up and running within the WSL environment, we can proceed to deploy a Dask cluster onto this infrastructure.
Understanding Dask
Dask is a Python library that allows parallel and distributed computing. It enables us to scale our computations from a single machine to a cluster of machines, making it suitable for processing large datasets and complex computations.
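To get a feel for the programming model before deploying anything, here is a minimal local sketch (assuming dask is installed, e.g. via pip install "dask[array]"). Dask builds a lazy task graph and only executes it, in parallel, when compute() is called:
import dask.array as da

# A 10,000 x 10,000 array split into 1,000 x 1,000 chunks;
# each chunk can be processed in parallel.
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
result = (x + x.T).mean()  # lazy: builds a task graph, computes nothing yet
print(result.compute())    # executes the graph in parallel
The same code runs unchanged on a distributed cluster, which is exactly what we will set up next.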
Deploying Dask on Kubernetes within WSL
There are two primary methods for running Dask on a Kubernetes cluster: the classic approach and the operator method. However, the classic approach has been deprecated, so for the purposes of this guide I recommend the operator-based approach.
To employ the Dask operator, the first step is to install the operator together with its custom resource definitions (CRDs). This can be done with Helm, the Kubernetes package manager, in the following steps:
- Install Helm: Helm is a package manager for Kubernetes applications. We’ll use it to deploy Dask using pre-configured Helm charts. Install Helm within your WSL environment according to the official documentation for Linux.
- Install the operator chart: Install the dask-kubernetes-operator chart from the Dask Helm repository by running the following command in your WSL terminal:
helm install --repo https://helm.dask.org --create-namespace -n dask-operator --generate-name dask-kubernetes-operator
Verifying the Kubernetes Cluster
Before proceeding further, let’s perform a quick check to ensure that our Kubernetes cluster is set up successfully. You can run the following Python code snippet within your preferred Python/Jupyter environment:
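For instance, here is a minimal check using the official kubernetes Python client (an assumption for illustration; install it with pip install kubernetes) that lists the cluster's nodes:
from kubernetes import client, config

# Load the kubeconfig that minikube wrote for us (~/.kube/config).
config.load_kube_config()
v1 = client.CoreV1Api()
for node in v1.list_node().items:
    print(node.metadata.name, node.status.node_info.kubelet_version)
If this prints the minikube node and its kubelet version, the cluster is reachable from Python as well.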
Next, verify that the dask-operator pod is running:
kubectl get pods -A
Then execute the following code in a Jupyter environment where the dask and dask-kubernetes packages are installed:
from dask_kubernetes.operator import KubeCluster

# Create a Dask cluster on Kubernetes: one scheduler pod and two worker pods.
cluster = KubeCluster(
    name="daskmlcluster",
    image="ghcr.io/dask/dask:latest",  # official Dask container image
    n_workers=2,  # number of worker pods
    resources={"requests": {"memory": "0.5Gi"}, "limits": {"memory": "1.5Gi"}},
    env={"FOO": "barr"},  # example environment variable passed to the pods
)
cluster
Let’s check the details of the pods again. As shown in the figure, we can observe the daskmlcluster-scheduler pod and 2 worker pods under the default namespace, which confirms that the Dask Kubernetes cluster is running successfully.
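With the cluster up, we can connect a client and submit work to it. Below is a minimal usage sketch reusing the cluster object created above; Client, cluster.scale(), and cluster.close() are the standard Dask and KubeCluster APIs for this:
from dask.distributed import Client
import dask.array as da

client = Client(cluster)  # connect to the KubeCluster created above

# This computation is distributed across the worker pods.
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
print(x.mean().compute())

cluster.scale(3)  # the operator reconciles the worker pods to the new count
client.close()
cluster.close()   # remove the scheduler and worker pods when finished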
Great! We have now completed deploying a Dask cluster on Kubernetes using Minikube on WSL.
Conclusion
In this article, we’ve explored the process of deploying a Dask cluster on top of a Kubernetes cluster using the operator method, utilizing Minikube within the Windows Subsystem for Linux (WSL) environment. By combining the strengths of Kubernetes’ container orchestration capabilities with Dask’s parallel computing functionalities, we’ve created a powerful setup for processing large datasets and complex computations. This deployment allows us to harness the benefits of distributed computing within a seamless Linux environment on a Windows machine, without the complexities of managing the underlying infrastructure. As you delve deeper into the world of Kubernetes and Dask, you’ll find endless possibilities for optimizing your data processing workflows.
Visit Axiata Digital Labs to find out more about our products and services.