Using Python to organize the Case Western Reserve University (CWRU) bearing data into a dataset

  • 1 Introduction
  • 2 Building the dataset
    • 2.1 Downloading and pre-processing the data
    • 2.2 The code

1 Introduction

Most papers build datasets from the CWRU data in much the same way, so I will describe the most common approach: slide a window along the signal to cut out samples, for example 2048 sampling points per sample. (A gripe: I had assumed this experiment was run on many bearings, but it turns out there is only one bearing per condition, and the samples are cut with partial overlap. Can a model trained on such samples really be trusted?)
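The sliding-window idea can be sketched as follows. This is a minimal illustration, not code from the post: the `sliding_window` name and the stand-in signal are my own choices, though the window length (2048) and stride (1000) match what the post's code uses later.

```python
import numpy as np

def sliding_window(signal, window=2048, stride=1000):
    '''Cut a long 1-D signal into fixed-length, partially overlapping samples.'''
    n = (len(signal) - window) // stride + 1
    return np.stack([signal[i*stride : i*stride + window] for i in range(n)])

sig = np.arange(121048, dtype=float)   # stand-in for one recorded channel
samples = sliding_window(sig)
print(samples.shape)                   # (120, 2048)
```

With a stride of 1000 and a window of 2048, consecutive samples share 1048 points, which is exactly the partial overlap complained about above.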

2 Building the dataset

Since I have recently been experimenting with LSTMs, the final dataset is additionally split into time steps; readers who don't need this can simply skip that step.

2.1 Downloading and pre-processing the data

Official link: https://csegroups.case.edu/bearingdatacenter/pages/download-data-file
Or my Baidu upload: https://pan.baidu.com/s/1Faygebmjw3kEPli6ikM0eA
Extraction code: gdsk

My goal was ten conditions, 120 samples per condition, and 2048 sampling points per sample, so I chose the 12 kHz drive-end data and did some simple pre-processing in MATLAB to obtain a 10×121048 .mat file (please don't ask me how 121048 was computed...).
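For anyone who does want to know where 121048 comes from: with a 2048-point window sliding in steps of 1000, the 120th window starts at point 119×1000 and ends at 119×1000 + 2048 = 121048, so each row needs exactly that many points. A quick check:

```python
window, stride, n_samples = 2048, 1000, 120
# last window starts at (n_samples - 1) * stride and spans `window` points
signal_length = (n_samples - 1) * stride + window
print(signal_length)  # 121048
```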

2.2 The code

import numpy as np
import scipy.io as scio
from random import shuffle


def normalize(data):
    '''(0,1) normalization
    :param data: a 1*2048 vector to be normalized
    '''
    s = (data - min(data)) / (max(data) - min(data))
    return s


def cut_samples(org_signals):
    '''Cut the original signals into 10*120*2048 samples and normalize them.
    :param org_signals: a 10*121048 matrix of the ten original signals
    '''
    results = np.zeros(shape=(10, 120, 2048))
    temporary_s = np.zeros(shape=(120, 2048))
    for i in range(10):
        s = org_signals[i]
        for x in range(120):
            temporary_s[x] = s[1000*x:2048 + 1000*x]
            temporary_s[x] = normalize(temporary_s[x])  # normalize each sample as we go
        results[i] = temporary_s
    return results


def make_datasets(org_samples):
    '''Take the 10*120*2048 original samples and return a labeled
    training set (75%) and test set (25%).'''
    train_x = np.zeros(shape=(10, 90, 2048))
    train_y = np.zeros(shape=(10, 90, 10))
    test_x = np.zeros(shape=(10, 30, 2048))
    test_y = np.zeros(shape=(10, 30, 10))
    for i in range(10):
        s = org_samples[i]
        # shuffle the samples of this class
        index_s = [a for a in range(len(s))]
        shuffle(index_s)
        s = s[index_s]
        # split each class into training and test sets
        train_x[i] = s[:90]
        test_x[i] = s[90:120]
        # one-hot labels
        label = np.zeros(shape=(10,))
        label[i] = 1
        train_y[i, :] = label
        test_y[i, :] = label
    # merge the ten classes' training and test sets, then shuffle each
    x1 = train_x[0]
    y1 = train_y[0]
    x2 = test_x[0]
    y2 = test_y[0]
    for i in range(9):
        x1 = np.vstack((x1, train_x[i + 1]))
        x2 = np.vstack((x2, test_x[i + 1]))
        y1 = np.vstack((y1, train_y[i + 1]))
        y2 = np.vstack((y2, test_y[i + 1]))
    index_x1 = [i for i in range(len(x1))]
    index_x2 = [i for i in range(len(x2))]
    shuffle(index_x1)
    shuffle(index_x2)
    x1 = x1[index_x1]
    y1 = y1[index_x1]
    x2 = x2[index_x2]
    y2 = y2[index_x2]
    # training samples, training labels, test samples, test labels
    return x1, y1, x2, y2


def get_timesteps(samples):
    '''Cut each 2048-point sample into 31 overlapping time steps of 128 points.
    :param samples: a matrix whose rows are each cut to 31*128
    '''
    s1 = np.zeros(shape=(31, 128))
    s2 = np.zeros(shape=(len(samples), 31, 128))
    for i in range(len(samples)):
        sample = samples[i]
        for a in range(31):
            s1[a] = sample[64*a:128 + 64*a]
        s2[i] = s1
    return s2


# read the raw data, process it, and save the result
dataFile = 'G://study of machine learing//deep learning//LSTM//datasets//十个原始信号.mat'
data = scio.loadmat(dataFile)
org_signals = data['signals']
org_samples = cut_samples(org_signals)
train_x, train_y, test_x, test_y = make_datasets(org_samples)
train_x = get_timesteps(train_x)
test_x = get_timesteps(test_x)
saveFile = 'G://study of machine learing//deep learning//LSTM//datasets//datasets.mat'
scio.savemat(saveFile, {'train_x': train_x, 'train_y': train_y, 'test_x': test_x, 'test_y': test_y})

If you don't need the time-step split, skip get_timesteps; the end result is a shuffled training set (900×2048) and test set (300×2048).
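As a sanity check of those shapes without the real .mat file, the pipeline can be replayed on random data (this sketch is self-contained and only mirrors the cutting logic defined above; the random signals are stand-ins, not CWRU data):

```python
import numpy as np

rng = np.random.default_rng(0)
org_signals = rng.standard_normal((10, 121048))  # stand-in for the ten signals

# cut_samples logic: 10 classes * 120 samples * 2048 points
samples = np.stack([
    np.stack([sig[1000*x : 1000*x + 2048] for x in range(120)])
    for sig in org_signals
])
print(samples.shape)          # (10, 120, 2048)

# after the 75/25 split and merging across classes
train_rows = 10 * 90
test_rows = 10 * 30
print(train_rows, test_rows)  # 900 300
```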

Writing this up took effort; if you repost it, please include the source: https://blog.csdn.net/weixin_44620044/article/details/106877805
I'm still a beginner, so corrections from the experts are very welcome.