The Roadmap to Model Development

Last week, we looked at how machine learning can be useful in industry. Today we will explore the path to machine learning model development and the individual steps that go into it. The roadmap above outlines the four major steps in yellow on the left, and the smaller substeps on the right.

What is Data Preprocessing?

Data preprocessing is one of the most important steps in tackling a machine learning problem. Depending on your dataset, you may face problems like missing values, useless features, and other kinds of noise. In the data preprocessing step, you focus on removing features that hold no useful information and addressing the missing values. We will use the Pandas library to work with our data. This is what our raw, unprocessed data looks like.
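
If you want to follow along, here is a minimal sketch of loading and inspecting the data with Pandas (the file name predictive_maintenance.csv is a placeholder; point it at your own copy):

import pandas as pd

# Placeholder file name; replace with the path to your dataset.
df = pd.read_csv("predictive_maintenance.csv")

# A first look at the raw, unprocessed data.
print(df.head())
print(df.shape)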

We can see that some features would provide nothing of value to the machine learning model, such as UDI and Product ID. They are just labels used to identify each entry, so they carry no predictive information. After removing them, our data is much more model-ready.
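
Dropping these columns is a one-liner. The column names below match our dataset and may differ in yours:

# Drop identifier columns that carry no predictive signal.
df = df.drop(columns=["UDI", "Product ID"])
print(df.columns.tolist())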

In our specific dataset, there were no missing or null values. In the real world this is usually not the case. A common approach is to drop a column entirely when most of its values are null, and otherwise to drop the affected rows or impute (fill in) the missing values.
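
As a rough sketch of what that handling could look like (the 50% threshold below is illustrative, not a rule):

# Count missing values per column.
print(df.isnull().sum())

# Drop any column where more than half the values are null.
df = df.dropna(axis=1, thresh=int(0.5 * len(df)))

# Fill remaining numeric gaps with each column's median.
df = df.fillna(df.median(numeric_only=True))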

What is Exploratory Data Analysis?

Exploratory Data Analysis (EDA) is a key step that involves the initial investigation of your dataset to find anything unusual or helpful. For example, when performing EDA, you may see that Feature X correlates directly with the output. This can help when selecting a model and the inputs that go into it. EDA involves a lot of charting and graphing to build a summary-level understanding of your dataset.

Using the .describe() function from the Pandas library is a quick way to view your data's summary statistics. You can see the mean, standard deviation, min, max, and percentiles for each numerical feature.
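
For example:

# Summary statistics (count, mean, std, min, quartiles, max) for each numeric column.
print(df.describe())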

Another important part of EDA is analyzing the skewness of your data. Using .skew() from the Pandas library, we can see how skewed each feature is. If the skewness is between -0.5 and 0.5, the data is roughly symmetrical. If it is between -1 and -0.5 (negatively skewed) or between 0.5 and 1 (positively skewed), the data is moderately skewed. If it is lower than -1 or greater than 1, the data is highly skewed.
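
In code:

# Skewness of each numeric feature; values outside [-1, 1] are highly skewed.
print(df.skew(numeric_only=True))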

Using Plotly, you can make many different graphs to view your data.

You can mix and match features to see whether they correlate. Here we compare air temperature and failure type.
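
A sketch with Plotly Express, assuming columns named "Failure Type" and "Air temperature [K]" (rename these to match your data):

import plotly.express as px

# Box plot of air temperature grouped by failure type.
fig = px.box(df, x="Failure Type", y="Air temperature [K]")
fig.show()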

Keep in mind that Exploratory Data Analysis is meant to be just that: exploratory. There is no need to compare every single feature; this is only your initial exploration. EDA is not only useful to you, it is also good practice, so that other engineers can look at your notebooks and easily follow along.

In the next blog post we will return for the third step in our Roadmap to Model Development, which involves further data processing. Then we will finally train and test our model(s).

Machine Learning in Predictive Maintenance

Last week, we looked at the deployment of applications on a Kubernetes cluster. That was one step in preparation for a bigger project, one that involves using machine learning (ML) for predictive maintenance of industrial processes.

What is Predictive Maintenance?

Predictive maintenance is the process of using historical data to find patterns that help predict future failures of industrial processes, i.e., machine failure. Traditionally, predictive maintenance has been done with hard-coded rules that say things like, “when x rotations exceed y threshold, this machine will fail.” The problem with these hard-coded rules is that they can only capture patterns that are readily apparent in the data, since they are written by humans.
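
To make that concrete, a hard-coded rule usually amounts to a fixed threshold chosen by a person; the numbers below are purely hypothetical:

# A traditional rule: flag failure when a reading crosses a fixed limit.
ROTATION_LIMIT_RPM = 2800  # hypothetical, engineer-chosen threshold

def rule_based_alarm(rotational_speed_rpm: float) -> bool:
    return rotational_speed_rpm > ROTATION_LIMIT_RPM

print(rule_based_alarm(2950))  # True: the fixed limit is exceeded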

How Machine Learning can help

Machine learning algorithms can pick up on patterns that would not be visible to the human eye. An ML model can take in large amounts of data at a time and use it to “train” itself, uncovering patterns in data that would otherwise seem unrelated.

Without ML, facilities are prone to performing unnecessary maintenance. By implementing machine learning, you not only know when things are going to fail, but you can also reduce the money wasted on over-maintenance. This reduces cost without risking downtime or the safety of workers.

There are multiple ways that ML models can be implemented for predictive maintenance purposes. We will explore two common types.

Regression Approach:

This approach is used to predict the Remaining Useful Life (RUL) of an asset: how many days or cycles are left before a system fails. To implement it, you need static and historical data, with every event labeled. It only works when there is a single failure type; if there are multiple, you will need multiple models. This is the well-known linear regression approach, which finds a line of best fit for the data provided.
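
A minimal sketch with scikit-learn, assuming a DataFrame df of numeric sensor readings with a labeled rul column (both names are hypothetical):

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Sensor readings as features, labeled remaining useful life as the target.
X = df.drop(columns=["rul"])
y = df["rul"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = LinearRegression().fit(X_train, y_train)

# Predicted remaining useful life (in days or cycles) for unseen machines.
print(model.predict(X_test[:5]))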

Classification Approach:

This approach is used to predict whether a machine will fail in the next N days or cycles. This can be more useful and accurate than regression, because you may only need to know whether the system will fail soon. Implementation likewise requires static, labeled historical data.
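
A comparable classification sketch, assuming a binary label fails_within_30_cycles (a hypothetical name) derived from the historical data:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Binary label: did the machine fail within the next N cycles?
X = df.drop(columns=["fails_within_30_cycles"])
y = df["fails_within_30_cycles"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

# 1 = predicted to fail within the window, 0 = predicted healthy.
print(clf.predict(X_test[:5]))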

What are we trying to do?

Our approach involves time-series analysis to forecast when machines are due for maintenance. Alongside that, a Remaining Useful Life model will most likely also be implemented to provide additional information. For now, we are using free datasets from the UCI Machine Learning Repository that are made specifically for predictive maintenance. Once we have a working model, we can collect our own data and use that instead.

The model will be deployed on the Kubernetes cluster for redundancy.

Next week we will further explore the creation of our machine learning model. We will discuss cleaning and preparing data, and designing neural networks.

K3S Cluster on Jetson Nano

Jetson Nano Development Board

Setting up a Kubernetes Cluster on Jetson Nano (with k3s)

The Jetson Nano is an easily accessible yet powerful single-board computer built to deploy machine learning applications and more. Kubernetes is the most popular orchestration system for managing and automating application deployment across a cluster. K3s is a lightweight distribution of Kubernetes.

This week we look at setting up a Kubernetes cluster on two Jetson Nanos, although you can add as many worker Nanos as you’d like. It can be tricky, especially with no guide that covers the Jetson Nano’s unique architecture. Although there are many other guides out there, this one is written for the Nano and addresses the issues specific to it.

What We Will Use:

  • 2 fresh Jetson Nanos running Ubuntu 18.04, with JetPack SDK 4.5 installed.

Preliminary Steps:

The first thing we need to decide is which Jetson will be our master node and which one(s) will be our worker nodes. The master node is the Nano you will deploy the cluster from, and the worker node(s) will join the cluster. Name them accordingly; I have named mine master and node1.

Then, use SSH to work on all the Nanos easily:

ssh <user>@<target-ip-address>

Log in as you normally would on the Nano you are SSHing into.

We will need curl for this, so run the following to install it on all Nanos.

sudo apt-get install curl

To make things easier, I recommend running “sudo su” to avoid having to type sudo before every command.

1. Installing Master Node:

We will now configure the Master Node. On your master nano only, run:

curl -sfL https://get.k3s.io | sh -s - --no-deploy traefik --write-kubeconfig-mode 644 --node-name k3s-master-01

This installs and starts k3s, deploys the cluster, and sets this node as the master. The extra flags skip the default Traefik ingress and make the kubeconfig world-readable so you can run kubectl without sudo.

You can view that your master node is online by running:

kubectl get nodes

You should see that your “k3s-master-01” node is the only one in the cluster.

For the next step, which is installing the worker nodes, we will need the master node’s token. To get it, run this:

cat /var/lib/rancher/k3s/server/node-token

And copy the token for the next step.

2. Installing Worker Node(s):

We will now configure the worker nodes. Run this on every worker node you have, giving each one a unique K3S_NODE_NAME (e.g., k3s-worker-01, k3s-worker-02).

curl -sfL https://get.k3s.io | K3S_NODE_NAME=k3s-worker-01 K3S_URL=https://<IP>:6443 K3S_TOKEN=<TOKEN> sh -

Replace <IP> and <TOKEN> with the master node’s IP address (you can get it by running ifconfig on the master) and the token you saved in the previous step.

Now, when you run “kubectl get nodes” on the master node, you will see that the worker has joined.

3. Bringing up the dashboard

At this step, you’re pretty much done, your cluster is up, and you can begin deploying containers. I will show you now how to bring up the dashboard to view all your containers once they are deployed.

First, run this on the master node to deploy the Kubernetes dashboard.

GITHUB_URL=https://github.com/kubernetes/dashboard/releases
# Follow the /latest redirect and strip everything up to the final slash to get the newest release tag.
VERSION_KUBE_DASHBOARD=$(curl -w '%{url_effective}' -I -L -s -S ${GITHUB_URL}/latest -o /dev/null | sed -e 's|.*/||')
# Deploy the dashboard manifest for that release.
sudo k3s kubectl create -f https://raw.githubusercontent.com/kubernetes/dashboard/${VERSION_KUBE_DASHBOARD}/aio/deploy/recommended.yaml

Now we have to create a few files:

  • dashboard.admin-user.yml: run vim dashboard.admin-user.yml, press i to enter insert mode, and paste the following.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard

Press Esc, then type :x to save the file and exit vim.

  • dashboard.admin-user-role.yml: run vim dashboard.admin-user-role.yml, press i to enter insert mode, and paste the following.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard

Save the file the same way as the previous one.

Now we will deploy the admin-user configuration. Run:

k3s kubectl create -f dashboard.admin-user.yml -f dashboard.admin-user-role.yml

Now we will retrieve the token needed to access the dashboard from a web browser. Run:

k3s kubectl -n kubernetes-dashboard describe secret admin-user-token | grep '^token'

And keep note of the very long token.

Now we will create a secure channel to the cluster. To do this, run:

k3s kubectl proxy

You should see:

Starting to serve on 127.0.0.1:8001

This means the Kubernetes API proxy is listening at 127.0.0.1 on port 8001. You will find your dashboard at this link:

http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/

It will prompt you for the token we copied in the previous step. Paste it here, and you will have access to the kubernetes dashboard.

And you’re done! Once you deploy containerized apps, you will be able to see and manage them in the dashboard.

Other useful commands

To shut off your cluster, run:

k3s-killall.sh

To delete your dashboard, run:

sudo k3s kubectl delete ns kubernetes-dashboard
sudo k3s kubectl delete clusterrolebinding kubernetes-dashboard
sudo k3s kubectl delete clusterrole kubernetes-dashboard

To restart the cluster later, run:

sudo systemctl restart k3s

In next week’s blog post, we will look at containerizing apps and deploying them. We will also look at managing them between nodes, and using the dashboard more.

What is Edge AI?

What Is Edge AI, and How Does It Create a New Paradigm?

IEEE edge computing diagram

Artificial intelligence and machine learning (AI/ML) have been around for a while now. Without knowing the intricacies, people still understand that they have powerful uses, such as analyzing data, understanding pictures, or chatting with Siri. The edge is a much newer concept, which I first heard of with the development of 5G.

One definition of the edge is data processing with a round-trip service time under 20 milliseconds. What does this actually mean? A more practical definition is nearly self-explanatory: the edge is literally the closest point to where you need the result that you can put your processing. This makes the edge a broader concept. If your processing must be done in the cloud, then your edge is simply the closest/fastest servers. If your processing can be done locally, then your edge should be your local processor (your smartphone, perhaps?).

But what do you get when you marry the AI and edge concepts? At one point in time, AI algorithms became practical to implement in the cloud, running on powerful server racks. Since AI was too complex for local devices, the cloud AI was also the edge AI. This is no longer true. Powerhouses such as Arm and NVIDIA have been making massive breakthroughs in optimizing AI performance. The startup Hailo has developed its own AI processor. STMicroelectronics has even put ML cores on individual sensors. AI can now run on embedded processors, sensors, internet gateways, security cameras, and more. The new edge AI is essentially anywhere.

Believe it or not, this new reality is a great thing. Since data is generated and processed locally, an edge AI device does not need to be connected to the internet. This means you can have smart devices that protect your privacy, and security systems that cannot be taken down by cutting the internet. AI analytics in remote areas? Got it. The edge enables AI to be anywhere it is useful, whenever it is useful.

Millspaw Electronics is launching an edge AI brand, Normal AI. Be sure to connect to learn what we can do for you.