In last week’s blog post, we covered getting our accelerometer data and verifying it. In this post we will wrap up the project by collecting the rest of our dataset and then training and testing our model.
This project is one that we have been working on consistently over the past few months. The main goal is to create a machine learning algorithm that can learn to identify anomalies in sets of data that the human eye would otherwise not see. Furthermore, we want to make it easily applicable to different sets of data. As long as you have “normal” data, you should be able to implement the algorithm.
- Jetson Nano
- BMA220 Accelerometer
- AC motor (or any motor with significant vibration)
Collecting anomaly data:
Last week we covered how to get x, y, and z output. Many industrial plants that need this kind of predictive maintenance collect data continuously. They keep logs of both normal operation and failures, which we will be emulating. The data we collected from the normally running motor will be used as our “normal” data.
However, we still need our “anomaly” data. To artificially inject an anomaly, I recorded about 100 data points from the accelerometer where I was shaking it. This data looks subtly different from our “normal” data. I then marked each data point as either 0 (normal) or 1 (anomaly), and shuffled the whole dataset. This is what it should look like:
We now have a full dataset of about 1000 “normal” labeled data points and 100 “anomaly” labeled data points.
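The labeling and shuffling step described above can be sketched roughly as follows. This is only an illustration: the readings are synthetic stand-ins, and the column names x, y, z, and label are assumptions, since the actual file layout is not shown here.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the recorded readings (assumed columns: x, y, z).
rng = np.random.default_rng(0)
normal = pd.DataFrame(rng.normal(0.0, 0.1, size=(1000, 3)), columns=["x", "y", "z"])
anomaly = pd.DataFrame(rng.normal(0.0, 0.5, size=(100, 3)), columns=["x", "y", "z"])

# Label each reading: 0 = normal, 1 = anomaly.
normal["label"] = 0
anomaly["label"] = 1

# Combine both sets, then shuffle the rows.
# random_state only makes the shuffle repeatable.
dataset = pd.concat([normal, anomaly], ignore_index=True)
dataset = dataset.sample(frac=1, random_state=42).reset_index(drop=True)
```

From here, something like dataset.to_csv('vibrationdata.csv', index=False) would write the file that gets read back in during training.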
Model creation and training:
First, we need to import some libraries. Use these import statements to bring in what we need for data handling, measuring accuracy, and creating our model.
import pandas as pd
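Only the pandas import is shown above. Judging by the RandomForestClassifier used later in this post, the remaining imports presumably come from scikit-learn. A sketch of what the full set might look like, under that assumption:

```python
import pandas as pd                                    # data handling
from sklearn.ensemble import RandomForestClassifier    # creating the model
from sklearn.metrics import accuracy_score             # obtaining accuracy
from sklearn.model_selection import train_test_split   # train/test split
```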
We can then read in the .csv file that contains our data using this line. Note that your .csv might be named differently.
df = pd.read_csv('vibrationdata.csv')
We can use df.head() to make sure our data is properly imported. You should see an output like this:
Next, we need to split our data into training and testing sets, and separate our features from our labels. Use these lines to split the features and labels into x and y arrays, and then into a training set with 70% of the data and a testing set with 30%.
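A minimal sketch of the split, assuming the features live in columns named x, y, and z and the 0/1 marker in a column named label (these names are assumptions). The stratify argument is an addition of this sketch, not part of the description above; it keeps the share of anomalies similar in both sets.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the DataFrame read from the .csv (assumed columns).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1100, 3)), columns=["x", "y", "z"])
df["label"] = [0] * 1000 + [1] * 100

# Features are the three axis readings; the label is the 0/1 column.
x = df[["x", "y", "z"]].values
y = df["label"].values

# 70% training, 30% testing.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42, stratify=y
)
```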
We can use the following code to create an instance of a Random Forest Classifier model, which was deemed the most accurate in our past testing. This instance uses 100 decision trees, as declared by the n_estimators parameter.
model = RandomForestClassifier(n_estimators=100)
Finally, we can fit our model with our training x and y datasets.
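Fitting comes down to a single call to fit. The sketch below uses a small synthetic training set in place of the real x_train and y_train, and sets random_state (an addition here) only so that runs are repeatable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Small synthetic training set standing in for x_train and y_train.
rng = np.random.default_rng(0)
x_train = np.vstack([
    rng.normal(0.0, 0.1, size=(70, 3)),  # stand-in "normal" readings
    rng.normal(0.0, 1.0, size=(7, 3)),   # stand-in "anomaly" readings
])
y_train = np.array([0] * 70 + [1] * 7)

# Build the forest of 100 trees and train it on the features and labels.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(x_train, y_train)
```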
Testing the model’s accuracy:
We can use the following line to get our prediction for our testing data.
y_pred = model.predict(x_test)
Then use the following to compare the accuracy between our prediction and the actual output.
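One way to do this, assuming scikit-learn's accuracy_score is the metric in use, is sketched below with short made-up label lists in place of the real y_test and y_pred.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels standing in for y_test and y_pred.
y_test = [0, 0, 0, 0, 1, 1, 0, 0, 1, 0]
y_pred = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]

# Fraction of predictions that match the true labels.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2%}")
```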
For my dataset, the model yielded an accuracy of 95%.
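Because the dataset has roughly ten “normal” points for every “anomaly” point, accuracy alone can flatter a model: one that never flags anything would already score about 91%. The sketch below illustrates this with hypothetical labels in the same 10:1 ratio, and shows two per-class checks (recall and a confusion matrix) that may be worth looking at alongside accuracy.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical test labels with the same 10:1 imbalance as the dataset.
y_test = [0] * 300 + [1] * 30

# A "model" that never flags an anomaly.
always_normal = [0] * len(y_test)

# Accuracy looks high even though no anomaly is ever caught.
baseline_accuracy = accuracy_score(y_test, always_normal)

# Recall for the anomaly class: share of true anomalies that were detected.
anomaly_recall = recall_score(y_test, always_normal)

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_test, always_normal)
```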
Future usefulness of the model:
This model setup is useful because it can be applied to many different types of data, as long as you can obtain some “normal” data and a few anomalies.
With the implementation of live data processing, you will have a model that can alert you as soon as anything different occurs. And as long as you keep updating the model with new training data, it will be able to detect new problems before they become costly.
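As a rough idea of what live processing could look like, the sketch below checks readings one at a time and collects the ones the model flags. Everything here is illustrative: the training data is synthetic and deliberately well separated, is_anomaly is a hypothetical helper, and the stream is a hard-coded list standing in for readings from the accelerometer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Train on clearly separated synthetic data so the sketch is self-contained.
rng = np.random.default_rng(0)
x_train = np.vstack([
    rng.normal(0.0, 0.1, size=(200, 3)),  # stand-in "normal" readings
    rng.normal(2.0, 0.1, size=(20, 3)),   # stand-in "anomaly" readings
])
y_train = np.array([0] * 200 + [1] * 20)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(x_train, y_train)


def is_anomaly(model, reading):
    """Return True if the model labels a single (x, y, z) reading as 1."""
    sample = np.asarray(reading, dtype=float).reshape(1, -1)
    return bool(model.predict(sample)[0] == 1)


# Simulated stream; on real hardware these would arrive from the sensor.
stream = [(0.02, -0.05, 0.01), (2.1, 1.9, 2.0), (-0.03, 0.04, 0.0)]
alerts = [reading for reading in stream if is_anomaly(model, reading)]
```

In a real setup the alert step would be replaced with whatever notification fits the plant, such as a log entry or a message to an operator.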
This concludes the main goal of our project, although there is still room for improvement. With more extensive data collection, we could get an even more accurate model.