Machine Learning in Predictive Maintenance

Last week, we looked at the deployment of applications on a Kubernetes cluster. This was one step in preparation for a bigger project: using machine learning (ML) for predictive maintenance of industrial processes.

Two engineers talking while standing next to some machines in a factory.

What is Predictive Maintenance?

Predictive maintenance is the process of using historical data to find patterns that help predict future failures of industrial equipment (that is, machine failure). Traditionally, predictive maintenance has relied on hard-coded rules such as “when x rotations exceed y threshold, this machine will fail.” The problem with these rules is that humans write them, so they can only capture patterns that are readily apparent in the data.

How Machine Learning can help

Machine learning algorithms can pick up on patterns that would not be visible to the human eye. An ML model can take in a lot of data at a time and use it to “train” itself to find patterns in data that otherwise seems unrelated.

Without ML, facilities are prone to performing unnecessary maintenance. By implementing machine learning, you can not only know when things are going to fail, but also reduce the money wasted on over-maintenance. This reduces cost without risking downtime or the safety of workers.

There are multiple ways that ML models can be implemented for predictive maintenance purposes. We will explore two common types.

Regression Approach:

This approach predicts the Remaining Useful Life (RUL) of an asset: how many days or cycles are left before a system fails. To implement it, you need static and historical data, with every event labeled. A single model only works when there is one type of failure; if there are multiple, you will need multiple models. The best-known example is linear regression, which finds a line of best fit through the data provided, although other regression models can be used as well.

Classification Approach:

This approach predicts whether a machine will fail within the next N days or cycles. It can be more useful and accurate than a regression, because you may only need to know if the system will fail soon. Implementation also requires static and historical labeled data.
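
To make the link between the two approaches concrete, here is a minimal sketch of how both training labels could be derived from the same run-to-failure history. The input layout (CSV rows of "unit,cycle" with no header) and the function name label_failures are assumptions made for this illustration, not the format of any particular dataset. It also assumes each unit's last recorded cycle is the cycle at which it failed.

```shell
# Derive an RUL value (regression target) and a "fails within N cycles"
# flag (classification target) for every row of a run-to-failure log.
# $1    = window N in cycles
# stdin = CSV rows "unit,cycle", no header (hypothetical layout)
label_failures() {
  awk -F, -v n="$1" '
    {
      unit[NR] = $1
      cycle[NR] = $2 + 0
      # Remember the last (highest) cycle seen for each unit
      if (!($1 in last) || cycle[NR] > last[$1]) last[$1] = cycle[NR]
    }
    END {
      for (i = 1; i <= NR; i++) {
        rul = last[unit[i]] - cycle[i]          # cycles left before failure
        print unit[i] "," cycle[i] "," rul "," (rul <= n ? 1 : 0)
      }
    }'
}

# Example: one machine observed for three cycles, window of 1 cycle
printf 'A,1\nA,2\nA,3\n' | label_failures 1
```

The third column is what a regression model would learn to predict, and the fourth is what a classifier would learn to predict.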

What are we trying to do?

Our approach involves time-series analysis to forecast when machines are due for maintenance. Alongside that, a Remaining Useful Life model will most likely also be implemented to give us more information. For now, we are using free datasets from the UCI Machine Learning Repository that are made specifically for predictive maintenance. Once we have a working model, we can collect our own data and use that instead.

The model will be deployed on the Kubernetes cluster for purposes of redundancy.

Next week we will further explore the creation of our machine learning model. We will discuss cleaning and preparing data, and designing neural networks.

Roadblocks to the first Deployment (K3s)

Last week, we looked at setting up a Kubernetes cluster on three Jetson Nanos, to prepare them for application deployment. That’s what we tackled this week, and in today’s blog post we will look at the obstacles we’ve encountered.

For these examples, we have been trying to deploy a test application using the instructions from the Rancher docs. This uses a pre-built container provided by Rancher. To recreate my deployment, do the following:

  1. Create a file called testdeploy.yaml and paste the following inside:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysite
  labels:
    app: mysite
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysite
  template:
    metadata:
      labels:
        app: mysite
    spec:
      containers:
        - name: mysite
          image: kellygriffin/hello:v1
          ports:
            - containerPort: 80
  2. Once you’ve saved the file, use the following command to apply it:
kubectl apply -f testdeploy.yaml
  3. View your deployment using:
kubectl get pods

The CrashLoopBackOff error:

If you are running the above steps on a Jetson Nano cluster as well, you may see the common CrashLoopBackOff error.

NAME                      READY   STATUS             RESTARTS      AGE
mysite-57b5b46f97-rfcgx   0/1     CrashLoopBackOff   5 (39s ago)   3m30s

This error is well documented for Kubernetes in general, and is often caused by insufficient resources, trying to access a locked file, or a locked database. However, in the case of the Jetson Nano cluster, I believe the reason for the CrashLoopBackOff error is the ARM64 architecture the Nano runs on, which it shares with the new M1 MacBook Pro. A lot of pre-made containers are built specifically for x64 (or even x86) architectures, making them incompatible with ARM64.
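
One way to check this theory is to compare the architecture of the machine with the architectures an image was published for. The sketch below is only an illustration: image_arch is a helper name I made up, and it covers just the machine names you are most likely to see.

```shell
# Map the machine name reported by `uname -m` to the architecture name
# used in container image manifests (for example arm64 or amd64).
image_arch() {
  case "$1" in
    aarch64|arm64) echo "arm64" ;;
    x86_64|amd64)  echo "amd64" ;;
    armv7l)        echo "arm" ;;
    *)             echo "unknown" ;;
  esac
}

# Architecture of the machine this runs on (a Jetson Nano reports aarch64)
image_arch "$(uname -m)"

# With access to the cluster, the first command shows the architecture of
# every node. The second needs the Docker CLI and may list the platforms
# an image was built for:
#   kubectl get nodes -L kubernetes.io/arch
#   docker manifest inspect kellygriffin/hello:v1
```

If the image does not list the architecture your nodes report, the container is likely to crash as soon as it starts.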

To test this theory, I tried to deploy the Nginx service on the cluster. Nginx should work, as its official image is published for ARM64 as well. I ran the following to create a deployment of Nginx using the Nginx image.

kubectl create deployment nginx --image=nginx

After running kubectl get pods, we can see that the Nginx service is running fine on the cluster, while the mysite pod from testdeploy.yaml keeps crashing and restarting.

NAME                      READY   STATUS             RESTARTS        AGE
mysite-57b5b46f97-rfcgx   0/1     CrashLoopBackOff   7 (2m31s ago)   13m
nginx-85b98978db-gc6ms    1/1     Running            0               20s

Many premade containers available for testing may be built for architectures other than ARM64, which causes the frustrating CrashLoopBackOff error. As long as your Nginx deployment works, your cluster should be working. It will just take some learning and a lot of trial and error to deploy other images properly.
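
Rather than guessing, you can also ask Kubernetes why a pod keeps crashing. The sketch below lists the pods that are not healthy, so you know which ones to inspect. The helper name unhealthy_pods is made up for this example. It assumes the default table printed by kubectl get pods, where STATUS is the third column.

```shell
# Print the names of pods whose STATUS is neither Running nor Completed.
# Reads the table printed by `kubectl get pods` on stdin.
unhealthy_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Usage on the cluster:
#   kubectl get pods | unhealthy_pods
#
# Then look at the events and at the logs of the last crashed container:
#   kubectl describe pod <pod name>
#   kubectl logs <pod name> --previous
```

In my experience an architecture mismatch usually shows up in those logs as an "exec format error", but treat that as a hint rather than a guarantee.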

The ImagePullBackOff error:

The ImagePullBackOff error is a very finicky error. I encountered it the first few times trying to deploy the Nginx pod using the steps outlined above. 

NAME                      READY   STATUS             RESTARTS      AGE
mysite-57b5b46f97-rfcgx   0/1     ImagePullBackOff   0 (9s ago)   4m30s

At first I had no idea how to get around it, and I assumed it was a problem with the place it was pulling the image from. Maybe the image was locked or corrupted, I thought. But after retrying multiple times, I let it run for 8 or so minutes. It retried pulling multiple times, and then one time it just worked. If you’re encountering this error, I recommend letting it run for at least 10 or so minutes before trying something else.
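
While you wait, it can help to look at the events Kubernetes records for the pod, since they usually include the reason the pull failed. This is a small sketch rather than a definitive recipe. The function name pod_events and the KUBECTL variable are my own additions, so that you can swap in "k3s kubectl" if that is how you call it.

```shell
# Show the events recorded for one pod, sorted by time.
# KUBECTL is an assumption of this sketch; it defaults to plain kubectl.
pod_events() {
  ${KUBECTL:-kubectl} get events \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}

# Usage on the cluster:
#   pod_events mysite-57b5b46f97-rfcgx
```

Running kubectl describe pod with the pod name shows the same events at the bottom of its output.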

Those are a few of the more problematic errors I encountered when trying to get to the first deployment. The next step will be to deploy a machine learning model on the cluster. This will require a lot more in terms of containerization, and we will explore that containerization and the creation of the model itself next week.

K3S Cluster on Jetson Nano

Jetson Nano Development Board

Setting up a Kubernetes Cluster on Jetson Nano (with k3s)

The Jetson Nano is an easily accessible yet powerful single-board computer built to deploy machine learning applications and more. Kubernetes is the most popular orchestration system for managing and automating application deployment across a cluster of machines. K3s is a lightweight distribution of Kubernetes.

This week we look at setting up a Kubernetes cluster on two Jetson Nanos, although you can do it with as many worker nanos as you’d like. It can be tricky to do, especially with no guide that outlines how to do it specifically for the Jetson Nano’s unique architecture. Although there are many other guides out there, this one is specifically for the Nano and will address any specific issues that come with that.

What We Will Use:

  • 2 fresh Jetson Nanos running Ubuntu 18.04, with JetPack SDK 4.5 installed.

Preliminary Steps:

The first thing we need to decide is which Jetson will be our master node, and which one(s) will be our worker nodes. The master node is the Nano that you will deploy the cluster from, and the worker node(s) will join the cluster. Name them accordingly. I have named mine master and node1.

Then, use SSH to work on all the Nanos easily. Use this:

ssh user@<target ip address>

Then log in as you normally would on the Nano you are SSHing into.

We will need curl for this, so use this to install curl on all Nanos.

sudo apt-get install curl

To make things easier, I recommend running “sudo su” to avoid having to type sudo before everything.

1. Installing Master Node:

We will now configure the Master Node. On your master nano only, run:

curl -sfL https://get.k3s.io | sh -s - --no-deploy traefik --write-kubeconfig-mode 644 --node-name k3s-master-01

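This installs and starts k3s, creates a cluster, and sets this node as the master. The --no-deploy traefik flag skips the bundled Traefik ingress controller (newer k3s releases use --disable traefik instead), --write-kubeconfig-mode 644 makes the kubeconfig file readable without root, and --node-name sets the name that kubectl shows for this node.

You can verify that your master node is online by running: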

kubectl get nodes

You should see your “k3s-master-01” node is the only one in the cluster.

For the next step, which is installing the worker nodes, we will need the master node’s token. To get it, run this:

cat /var/lib/rancher/k3s/server/node-token

And copy the token for the next step.

2. Installing Worker Node(s):

We will now configure the Worker Nodes. Do this on all the worker nodes you have.

curl -sfL https://get.k3s.io | K3S_NODE_NAME=k3s-worker-01 K3S_URL=https://<IP>:6443 K3S_TOKEN=<TOKEN> sh -

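Replace <IP> and <TOKEN> with the master node’s IP address (you can get this by running ifconfig on the master) and the token you previously saved. If you have more than one worker, give each one a unique K3S_NODE_NAME (k3s-worker-02, and so on).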
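
If you have several workers, filling in the command by hand gets error-prone. Here is a small sketch that prints the join command for a given worker number. The function name join_command is hypothetical, as are the IP address and token in the example. It only prints the command, so you can review it before pasting it on the worker.

```shell
# Print the k3s join command for one worker.
# $1 = master IP address, $2 = node token, $3 = worker number (1, 2, ...)
join_command() {
  printf 'curl -sfL https://get.k3s.io | K3S_NODE_NAME=k3s-worker-%02d K3S_URL=https://%s:6443 K3S_TOKEN=%s sh -\n' \
    "$3" "$1" "$2"
}

# Example with placeholder values (not a real address or token)
join_command 192.168.1.10 EXAMPLETOKEN 2
```

Each worker gets its own number, which keeps the node names unique in the cluster.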

Now, when you run “kubectl get nodes” on the master node, you can see that the worker has joined.
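
A worker can take a little while to show up as Ready. If you want a quick number instead of reading the table, the sketch below counts the Ready nodes. The helper name count_ready is made up for this example, and it relies on STATUS being the second column of the kubectl get nodes output.

```shell
# Count the nodes whose STATUS column is exactly "Ready".
# Reads the output of `kubectl get nodes --no-headers` on stdin.
count_ready() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage on the master node:
#   kubectl get nodes --no-headers | count_ready
```

With one master and one worker, you should get 2 once both nodes are up.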

3. Bringing up the dashboard

At this point you’re pretty much done: your cluster is up, and you can begin deploying containers. I will now show you how to bring up the dashboard to view all your containers once they are deployed.

First, run this on the master node. This will deploy the Kubernetes Dashboard.

GITHUB_URL=https://github.com/kubernetes/dashboard/releases
VERSION_KUBE_DASHBOARD=$(curl -w '%{url_effective}' -I -L -s -S ${GITHUB_URL}/latest -o /dev/null | sed -e 's|.*/||')
sudo k3s kubectl create -f https://raw.githubusercontent.com/kubernetes/dashboard/${VERSION_KUBE_DASHBOARD}/aio/deploy/recommended.yaml
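
The second line above is the least obvious one. curl follows the redirect from the "latest" release page and prints the final URL, and sed then deletes everything up to and including the last slash, which leaves only the release tag. The sketch below shows just the sed part on a sample URL. The tag v2.4.0 is only an illustration, not necessarily what you will get.

```shell
# Example redirect target; the tag v2.4.0 is only an illustration
url="https://github.com/kubernetes/dashboard/releases/tag/v2.4.0"

# Delete everything up to and including the last slash, leaving the tag
version=$(printf '%s\n' "$url" | sed -e 's|.*/||')

echo "$version"
```

If the create command fails because the latest release does not include aio/deploy/recommended.yaml, one workaround is to set VERSION_KUBE_DASHBOARD by hand to a release that does.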

Now we have to create a few files:

  • dashboard.admin-user.yml (run vim dashboard.admin-user.yml, press i to enter insert mode, and paste the following):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard

Press esc then :x to exit the vim editor and save.

  • dashboard.admin-user-role.yml (run vim dashboard.admin-user-role.yml, press i to enter insert mode, and paste the following):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard

Save the file the same way as the previous one.

Now we will deploy the admin-user configuration. Run:

k3s kubectl create -f dashboard.admin-user.yml -f dashboard.admin-user-role.yml

Now we will retrieve the token needed to log in to the dashboard from a web browser. Run:

k3s kubectl -n kubernetes-dashboard describe secret admin-user-token | grep '^token'

And keep note of the very long token.
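
If you only want the token itself, without the "token:" label in front, something like the sketch below works. The helper name token_value is made up for this example. Also note that since Kubernetes 1.24, token secrets are no longer created automatically for service accounts, so on newer clusters the describe command may find nothing. In that case kubectl create token is the way to get one.

```shell
# Print only the token value from `kubectl describe secret` output on stdin
token_value() {
  awk '$1 == "token:" { print $2 }'
}

# Usage on the master node:
#   k3s kubectl -n kubernetes-dashboard describe secret admin-user-token | token_value
#
# On Kubernetes 1.24 and newer, request a token directly instead:
#   k3s kubectl -n kubernetes-dashboard create token admin-user
```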

Now we will create a secure channel to the cluster. To do this, run:

k3s kubectl proxy

You should see:

Starting to serve on 127.0.0.1:8001

This means that the dashboard is being served at 127.0.0.1, on port 8001. At this link you will find your dashboard:

http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/

It will prompt you for the token we copied in the previous step. Paste it here, and you will have access to the kubernetes dashboard.

And you’re done! Once you deploy containerized apps, you will be able to see and manage them in the dashboard.

Other useful commands

To shut off your cluster, run:

k3s-killall.sh

To delete your dashboard, run:

sudo k3s kubectl delete ns kubernetes-dashboard
sudo k3s kubectl delete clusterrolebinding kubernetes-dashboard
sudo k3s kubectl delete clusterrole kubernetes-dashboard

To restart the cluster later, run:

sudo systemctl restart k3s

In next week’s blog post, we will look at containerizing apps and deploying them. We will also look at managing them between nodes, and using the dashboard more.