Anomaly detection with accelerometer data

In last week’s blog post, we covered getting our accelerometer data and verifying it. In this blog post we will wrap up the project by collecting the rest of our dataset and then training and testing our model.

Project goals:

This project is one that we have been working on consistently over the past few months. The main goal is to create a machine learning algorithm that can learn to identify anomalies in sets of data that the human eye would otherwise not see. Furthermore, we want to make it easily applicable to different sets of data. As long as you have “normal” data, you should be able to implement the algorithm.

Equipment used:

  • Jetson Nano
  • BMA220 Accelerometer
  • AC motor (or any motor with significant vibration)
  • JupyterLab

Collecting anomaly data:

Last week we covered how to get x, y, and z output. Many industrial plants that need this kind of predictive maintenance collect data continuously, so they have logs of both normal operation and failures. We will emulate those logs. The data we collected from the normally running motor will be used as our “normal” data.

However, we still need our “anomaly” data. To artificially inject an anomaly, I recorded about 100 data points from the accelerometer while shaking it. This data looks subtly different from our “normal” data. I then marked each data point as either 0 (normal) or 1 (anomaly), and shuffled the whole dataset. This is what it should look like:

We now have a full dataset of about 1000 “normal” labeled data points and 100 “anomaly” labeled data points. 
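The labeling and shuffling step described above could be sketched with pandas as follows. This is only an illustration: the function name build_labeled_dataset, the seed parameter, and the tiny made-up frames are assumptions, while the status column name matches the one used later in this post.

```python
import pandas as pd


def build_labeled_dataset(normal, anomaly, seed=0):
    """Label normal rows 0 and anomaly rows 1, then shuffle them together."""
    normal = normal.assign(status=0)
    anomaly = anomaly.assign(status=1)
    combined = pd.concat([normal, anomaly], ignore_index=True)
    # sample(frac=1) returns every row exactly once, in random order
    return combined.sample(frac=1, random_state=seed).reset_index(drop=True)


# Made-up stand-in readings; the real frames would come from the recorded data.
normal_df = pd.DataFrame({"x": [8, 8, 12], "y": [12, 8, 8], "z": [56, 60, 60]})
anomaly_df = pd.DataFrame({"x": [40, 220], "y": [236, 28], "z": [12, 100]})

dataset = build_labeled_dataset(normal_df, anomaly_df)
print(dataset["status"].value_counts())
```

Saving the result with dataset.to_csv('vibrationdata.csv', index=False) would give a file in the shape the training code below reads.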

Model creation and training:

First, we need to import some libraries. These import statements bring in what we need for data handling, measuring accuracy, and creating our model.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier

We can then read in our .csv file that contains our data, using this line. Note that your .csv might be named differently. 

df = pd.read_csv('vibrationdata.csv')

We can use df.head() to make sure our data is properly imported. You should see an output like this:

Next, we need to separate our features from our labels, and our training data from our test data. These lines put the features and labels into x and y, and then split them into a training set with 70% of the data and a testing set with 30%.

x=df[['x','y','z']]
y=df['status']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3)
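Because anomalies are only about a tenth of the dataset, a purely random split can leave the test set with more or fewer anomalies than expected. One optional variation, not part of the original code, is to pass stratify=y so both sets keep the same class ratio, and random_state so the split is repeatable. The sketch below uses synthetic stand-in data with roughly the same class balance; the seed values are arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the post's rough class balance (not the real data).
rng = np.random.default_rng(0)
x = rng.integers(0, 64, size=(1100, 3))
y = np.array([0] * 1000 + [1] * 100)

# stratify=y keeps the normal/anomaly ratio the same in both splits.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, stratify=y, random_state=42
)
print("anomalies in train:", int(y_train.sum()))
print("anomalies in test:", int(y_test.sum()))
```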

We can use the following code to create an instance of a Random Forest Classifier model, which was deemed the most accurate in our past testing. This instance uses 100 decision trees, as declared by the n_estimators parameter.

model = RandomForestClassifier(n_estimators=100)

Finally, we can fit our model with our training x and y datasets. 

model.fit(x_train, y_train)

Testing the model’s accuracy:

We can use the following line to get our prediction for our testing data. 

y_pred = model.predict(x_test)

Then use the following to compare the accuracy between our prediction and the actual output. 

print("Accuracy:",accuracy_score(y_test, y_pred))

On my dataset, the model reached an accuracy of 95%.
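It helps to put that number in context. With about 1000 normal points and 100 anomalies, a model that never flags anything would already score roughly 91% accuracy, so accuracy alone says little about how many anomalies are caught. The sketch below uses hypothetical labels (not this project's results) to show that baseline, along with recall and a confusion matrix from sklearn.metrics.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical labels mimicking the 1000:100 class balance (not real results).
y_true = [0] * 1000 + [1] * 100
# A "model" that never flags an anomaly at all.
y_always_normal = [0] * 1100

baseline_accuracy = accuracy_score(y_true, y_always_normal)
baseline_recall = recall_score(y_true, y_always_normal)

print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"baseline anomaly recall: {baseline_recall:.3f}")
# Rows are the true class, columns are the predicted class.
print(confusion_matrix(y_true, y_always_normal))
```

Running recall_score(y_test, y_pred) and confusion_matrix(y_test, y_pred) on the real predictions would show how many of the anomalies the trained model actually catches.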

Future usefulness of the model:

This model setup is useful because it can be applied to many different types of data, as long as you can obtain some “normal” data and a few anomalies.

With the implementation of live data processing, you will have a model that can alert you as soon as anything different occurs. And as long as you keep updating the model with new training data, it will be able to detect new problems before they become costly. 

This concludes the main goal of our project, although there is still room for improvement. With more extensive data collection, we could get an even more accurate model. 

Getting input for predictive maintenance 2

In last week’s blog post, we went over our plan for collecting data for predictive maintenance. We also looked at the hardware configuration and began setting that up. This week, we will finalize getting our output data and do some final steps before we can train our model.

Handling x, y, and z output:

In an older blog post, we created a Python program that reads the x, y, and z data from our BMA220 accelerometer and prints it out. To use this data in our machine learning model, we need it in .csv format, so we will update our Python code to write one.

Our main function should look something like this: 

while True:
    xdata = i2cbus.read_byte_data(i2caddress, 0x4)  # read the value of x data
    ydata = i2cbus.read_byte_data(i2caddress, 0x6)  # read the value of y data
    zdata = i2cbus.read_byte_data(i2caddress, 0x8)  # read the value of z data
    data = [xdata, ydata, zdata]
    print(data)  # print the value of x, y, and z data

We can add this code inside our while loop (with import csv at the top of the file) to create a .csv file named outputdata and append each loop iteration’s data to it:

    with open('outputdata.csv', 'a', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(data)

To keep track of how many rows of data are added, we can create a variable called datalog outside our while loop:

    datalog = 0

We can add the following code at the end of our while loop (with import time at the top of the file). It increases the datalog variable by one each time the loop adds an entry to the outputdata csv, waits one second, and then prints the number of entries recorded so far:

    datalog += 1
    time.sleep(1)
    print(datalog, "entries recorded")

After implementing all these changes to our Python code and running it, a file called outputdata.csv should be created. Depending on how long you let the program run, the file should look like this, with more or fewer entries.

8,12,56
8,12,56
12,8,60
8,8,60
8,8,60
4,8,56
12,8,60
4,8,56
8,12,56
8,12,60
8,8,60
8,8,60

The values may differ depending on how your accelerometer is positioned. 
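Pulling the logging steps above together, one possible arrangement is sketched below. It differs from the snippets in a few ways that are my own choices: it opens the file once instead of on every iteration, stops after a fixed number of readings, and writes an x,y,z header row first, so the file would start with that line ahead of rows like the ones shown. The names log_readings, read_xyz, and fake_read_xyz are hypothetical; on the Jetson Nano, read_xyz would wrap the three i2cbus.read_byte_data calls.

```python
import csv
import os
import tempfile
import time


def log_readings(read_xyz, path, count, delay=1.0):
    """Write a header plus `count` [x, y, z] readings from read_xyz() to a CSV."""
    datalog = 0
    with open(path, "w", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(["x", "y", "z"])  # header row used by the plotting step
        while datalog < count:
            writer.writerow(read_xyz())
            file.flush()  # keep the file current if the script is interrupted
            datalog += 1
            print(datalog, "entries recorded")
            time.sleep(delay)
    return datalog


# Stand-in reader so the sketch runs without the accelerometer attached.
def fake_read_xyz():
    return [8, 12, 56]


demo_path = os.path.join(tempfile.gettempdir(), "outputdata_demo.csv")
recorded = log_readings(fake_read_xyz, demo_path, count=3, delay=0)
```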

Analyzing our data:

When I first looked at the x, y, z data in the outputdata file, I was surprised by the large changes I saw while the accelerometer was seemingly not moving. I wasn’t sure if this data was normal and just due to noise, or if it was a defective accelerometer. To get a better understanding of it, I plotted the data to see if it was moving within a certain range, or if it was just all over the place.

Doing this is good practice in any case to make sure our training data isn’t full of strange outliers.
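One possible contributor to the apparent size of the jumps: every value in the sample output is a multiple of 4. That would be consistent with the BMA220 reporting a 6-bit two's-complement reading in the upper six bits of each data byte. This is an assumption based on the sensor's datasheet, not something confirmed by this project's code. If it holds, a change of 4 in the raw byte is a single step of the sensor. The helper name decode_axis below is hypothetical.

```python
def decode_axis(raw_byte):
    """Convert a raw 0-255 byte to a signed 6-bit count (assumed BMA220 layout)."""
    value = raw_byte >> 2  # drop the two unused low bits
    if value >= 32:        # 32..63 represent negative values in two's complement
        value -= 64
    return value


# Raw values taken from the sample output above.
print([decode_axis(b) for b in (8, 12, 56, 60)])  # prints [2, 3, 14, 15]
```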

  • Plotting the data:

To plot our x, y, and z data, we will use the matplotlib library, specifically its mplot3d and pyplot modules. Include this code at the top to use it:

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import pandas

Then, we will use pandas to read our csv file and put it into a pandas dataframe:

points = pandas.read_csv('outputdata.csv')

Use the following piece of code to create a figure and separate our data into x, y, and z.

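Note that for this to work, the first line of the outputdata csv must be the header row x,y,z. pandas treats the first line as column names, so without it the first reading would be used as the header.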

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

x = points['x'].values
y = points['y'].values
z = points['z'].values

Finally, we can plot it as a 3D scatterplot. 

ax.scatter(x, y, z, c='r', marker='o')

plt.show()

We should get an output like this: 

This shows us that our data is mostly the same few points repeated over and over, which suggests the variation is mainly due to noise and not a faulty accelerometer.
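The same observation can be checked numerically. As a small sketch, the snippet below takes the twelve sample rows listed earlier in this post, counts how often each point repeats, and prints the range of each axis. Only the standard library is used; the variable names are my own.

```python
from collections import Counter

# The twelve sample rows from outputdata.csv shown earlier in this post.
rows = [
    (8, 12, 56), (8, 12, 56), (12, 8, 60), (8, 8, 60),
    (8, 8, 60), (4, 8, 56), (12, 8, 60), (4, 8, 56),
    (8, 12, 56), (8, 12, 60), (8, 8, 60), (8, 8, 60),
]

counts = Counter(rows)
print(len(counts), "distinct points out of", len(rows))
print("most common:", counts.most_common(1))

# zip(*rows) regroups the rows into one tuple of values per axis.
for axis, values in zip("xyz", zip(*rows)):
    print(axis, "range:", min(values), "to", max(values))
```

For this sample there are only 5 distinct points among 12 readings, and each axis stays within a band of at most 8 raw units.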

In the next blog post, we will make sure we properly collect test and training data, and begin to feed it to our model. 

Getting input for predictive maintenance 1

In last week’s blog post, we worked on setting up our accelerometer to input data to our Jetson Nano. We can now access the x, y, and z data of the BMA220 accelerometer. This week, we will cover our plan to implement the BMA220 for use with predictive maintenance.

The configuration:

The main components in this project are the BMA220 accelerometer, the Jetson Nano, an Arduino, and an AC/DC motor. They will be configured as shown below.

The Jetson Nano will process the input from the BMA220, which will be fixed onto the AC/DC motor. The motor’s speed will be controlled by the Arduino. In this project, it is important that all the components are fixed to a board. Any slight play in their positions may throw off the data collection.

Collecting data:

To do any kind of machine learning, we first need a large set of data. In this case, when we are trying to predict failure, we will need ‘normal’ data, and ‘failure’ data. This is data taken when the machine (in this case a motor) is running fine, and when it is in a failed state or abnormal state. 

We will run the motor 24/7 for a week or so to collect a lot of normal data. Then, we will inject problems that cause the data to become abnormal. This can include things like messing with the bearing, bumps and shakes, etc. We will separate the datasets and train the model to recognize any abnormal data and notify us before something costly happens.

Why is this useful:

Having a model like this is useful for many reasons. The main one is interchangeability. A model like the one we are making, that is trained to spot any abnormal data, can be implemented on various pieces of machinery. All that is needed is a large set of data, and since most machinery records its data anyway, this is easy to get. 

While you could theoretically hard-code all the rules to recognize failure for a certain machine, this is a lengthy process that will require a lot of human eyes looking over huge datasets. It quite literally is like searching for a needle in a haystack to find the triggers that signal the machine’s failure. With machine learning, you can easily be notified of any common failures, and even new failures that have never occurred before.

In next week’s blog post, we will have our motor and will be able to create our project for data collection. 

Other useful resources:

  • Machine learning techniques

There are various ways to implement machine learning in predictive maintenance. The one we are using here is to flag anomalous behavior. This article goes over the other ways ML can be used and how the steps differ.