CNN Training Notes for a Low-Quality Danmaku Classifier

2017/2/6 14:48 posted in Deep Learning

I started with this architecture and trained it for 10 epochs.

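# Keras 1.x API (Convolution1D, border_mode)
from keras.models import Sequential
from keras.layers import Activation, Convolution1D, Dense, Flatten

# word_model is the trained word2vec model (defined elsewhere);
# each input is a sequence of 100 word vectors of size word_model.vector_size.
model = Sequential()
# First conv layer: 100 filters, kernel width 4, no padding
model.add(Convolution1D(100, 4, border_mode='valid', input_shape=(100, word_model.vector_size)))
model.add(Activation('relu'))
# Second conv layer: only 5 filters
model.add(Convolution1D(5, 4, border_mode='valid'))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
# Two-way softmax output
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy']
             )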
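The `input_shape=(100, word_model.vector_size)` above implies that every danmaku is fed in as a fixed-length sequence of 100 word vectors. The preprocessing itself is not shown in these notes, so the following is only a minimal sketch of one plausible way to build such an input. It assumes the text is already tokenized, that word vectors can be looked up in a dict-like table, that out-of-vocabulary words are skipped, and that short sequences are zero-padded. `to_fixed_length` and `toy_vectors` are hypothetical names.

```python
def to_fixed_length(tokens, vectors, seq_len=100, vector_size=4):
    """Map tokens to vectors, then truncate or zero-pad to seq_len rows."""
    # Assumption: words missing from the embedding table are skipped.
    rows = [list(vectors[t]) for t in tokens if t in vectors]
    # Truncate sequences that are too long.
    rows = rows[:seq_len]
    # Zero-pad sequences that are too short.
    rows += [[0.0] * vector_size for _ in range(seq_len - len(rows))]
    return rows


# Hypothetical 4-dimensional embedding table standing in for word_model.
toy_vectors = {
    "hello": [0.1, 0.2, 0.3, 0.4],
    "world": [0.5, 0.6, 0.7, 0.8],
}

matrix = to_fixed_length(["hello", "unknown", "world"], toy_vectors)
print(len(matrix), len(matrix[0]))  # 100 4
```

With this kind of padding, a very short danmaku becomes a matrix that is almost entirely zeros.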

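After training, the accuracy looked high at first glance, but printing the predictions showed that none of the low-quality danmaku were filtered out. The high accuracy (0.98) came entirely from assigning every sample to the positive class, which is indeed a meaningless classification.
The loss makes this obvious as well: it stayed above 15 the whole time.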
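To see why accuracy alone is misleading here, it helps to score a "classifier" that always answers positive. The sketch below is illustrative only: the class sizes are borrowed from a later test run in these notes (2001 samples, 26 of them low-quality), since the split used for this first experiment is not recorded, and the label encoding (1 = normal, 0 = low-quality) is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of samples with the given true label that were predicted as it."""
    total = sum(1 for t in y_true if t == label)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    return hits / total if total else 0.0


# Assumed encoding: 1 = normal danmaku, 0 = low-quality danmaku.
y_true = [1] * 1975 + [0] * 26
# A degenerate model that assigns every sample to the positive class.
y_pred = [1] * len(y_true)

print(round(accuracy(y_true, y_pred), 3))  # 0.987
print(recall(y_true, y_pred, label=0))     # 0.0
```

Overall accuracy is close to 0.99 even though not a single low-quality danmaku is caught, which is why a per-class number is the more useful thing to watch.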

I tried increasing the number of filters in the second convolutional layer, but training did not noticeably improve.

model = Sequential()
model.add(Convolution1D(100, 4, border_mode='valid', input_shape=(100, word_model.vector_size)))
model.add(Activation('relu'))
model.add(Convolution1D(100, 4, border_mode='valid'))
model.add(Activation('relu'))
model.add(Flatten()) 
model.add(Dense(32, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy']
             )

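Then I increased the number of hidden units in the FC layer, and the loss immediately started dropping sharply, ending at around 0.3. Printing the predictions showed that the results were indeed good, although some of the shorter low-quality danmaku were still not detected.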

model = Sequential()
model.add(Convolution1D(100, 4, border_mode='valid', input_shape=(100, word_model.vector_size)))
model.add(Activation('relu'))
model.add(Convolution1D(100, 4, border_mode='valid'))
model.add(Activation('relu'))
model.add(Flatten()) 
model.add(Dense(100, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy']
             )
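As a side calculation, the size of the vector that `Flatten()` hands to the first `Dense` layer can be worked out from the layer settings above. A 1D convolution with `border_mode='valid'`, kernel width 4 and stride 1 shortens a sequence of length n to n - 4 + 1. The helper names below are hypothetical; the sketch only does the arithmetic and makes no claim about why the wider FC layer trained better.

```python
def valid_conv1d_length(input_length, kernel_size, stride=1):
    """Output length of a 1D convolution without padding ('valid')."""
    return (input_length - kernel_size) // stride + 1


def flatten_size(seq_len, kernel_size, filters_last, num_conv_layers=2):
    """Number of features after stacked valid convolutions and Flatten()."""
    length = seq_len
    for _ in range(num_conv_layers):
        length = valid_conv1d_length(length, kernel_size)
    # Flatten() yields (remaining time steps) x (filters of the last conv layer).
    return length * filters_last


print(valid_conv1d_length(100, 4))  # 97
print(flatten_size(100, 4, 5))      # 470  (first model: 5 filters in conv 2)
print(flatten_size(100, 4, 100))    # 9400 (later models: 100 filters in conv 2)
```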

Since the last epoch did not seem to have fully converged, I increased the number of epochs to see where this model's limit is.

With the number of epochs set to 100:

Epoch 100/100
2999/2999 [==============================] - 9s - loss: 3.7457e-04 - acc: 1.0000

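Although acc and loss both reached absurd levels, I then realized that I had forgotten to shuffle the training and test sets.
I still took a look at the test results: the detection rate for abusive danmaku was 0, as expected, because everything was classified as normal danmaku. I will shuffle and retrain, first with 10 epochs to see the results, then with 100 epochs to test how badly it overfits.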
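Shuffling before splitting can be sketched as follows. The actual data-loading code is not part of these notes, so `shuffled_split`, the default `train_ratio=0.6` and the fixed seed are all assumptions for illustration. The important detail is that samples and labels are shuffled together as pairs, so they stay aligned.

```python
import random


def shuffled_split(samples, labels, train_ratio=0.6, seed=42):
    """Shuffle (sample, label) pairs together, then cut into train and test."""
    pairs = list(zip(samples, labels))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * train_ratio)
    return pairs[:cut], pairs[cut:]


# Toy data sorted by class, which is the situation that breaks an unshuffled split.
samples = ["normal_%d" % i for i in range(90)] + ["abusive_%d" % i for i in range(10)]
labels = [1] * 90 + [0] * 10

train, test = shuffled_split(samples, labels)
print(len(train), len(test))  # 60 40
print(sum(1 for _, y in train if y == 0), "abusive samples in train")
print(sum(1 for _, y in test if y == 0), "abusive samples in test")
```

Without the shuffle, the same cut would put all 10 abusive samples of this toy set into the test split and none into training.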

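With 10 epochs the accuracy was about 92.45%. Manual inspection looked acceptable: some of the abusive danmaku were detected, but not enough. Adjusting to 20 epochs to see what happens.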

Epoch 20/20
2999/2999 [==============================] - 9s - loss: 0.0689 - acc: 0.9810
Correct: 1918
Incorrect: 83
Accuracy: 95.852

Manual inspection showed some improvement, but the result is still not good enough. Increasing to 50 epochs.

Epoch 50/50
2999/2999 [==============================] - 9s - loss: 6.4179e-04 - acc: 1.0000
Correct: 1963
Incorrect: 38
Accuracy: 98.101

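Although the training numbers look great, checking how the danmaku were actually classified shows that the model has overfit: it classifies almost all low-quality danmaku as normal danmaku.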

Negative damku accuracy: 7.692
True negative: 2
False negative: 24
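The numbers in this report can be reproduced from label lists. The evaluation script is not shown in these notes, so this is a reconstruction under assumptions: "True negative" appears to count low-quality danmaku that were caught, "False negative" appears to count low-quality danmaku that were missed, and "Negative damku accuracy" is the first divided by their sum (in standard terms, the recall of the low-quality class). The 14 misclassified normal danmaku are derived as 38 - 24 from the log above; `report` is a hypothetical name.

```python
def report(y_true, y_pred, negative_label=0):
    """Recompute the metrics printed in the logs from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    # Low-quality samples that were caught / missed.
    caught = sum(1 for t, p in pairs if t == negative_label and p == negative_label)
    missed = sum(1 for t, p in pairs if t == negative_label and p != negative_label)
    return {
        "correct": correct,
        "incorrect": len(pairs) - correct,
        "overall_accuracy": round(100.0 * correct / len(pairs), 3),
        "negative_accuracy": round(100.0 * caught / (caught + missed), 3),
        "true_negative": caught,
        "false_negative": missed,
    }


# Assumed encoding: 0 = low-quality, 1 = normal.
# 26 low-quality samples (2 caught, 24 missed), 1975 normal samples (14 misclassified).
y_true = [0] * 26 + [1] * 1975
y_pred = [0] * 2 + [1] * 24 + [0] * 14 + [1] * 1961

result = report(y_true, y_pred)
print(result["overall_accuracy"])   # 98.101
print(result["negative_accuracy"])  # 7.692
```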

Going back to 10 epochs and trying to plot the ROC curve.

Epoch 10/10
2999/2999 [==============================] - 9s - loss: 0.7608 - acc: 0.8943

Correct: 1886
Incorrect: 115
Overall accuracy: 94.253
Negative damku accuracy: 30.769
True negative: 8
False negative: 18
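No plotting code is recorded for the ROC curve mentioned above. For reference, a ROC curve is obtained by sweeping the decision threshold over the scores of one trained model, not by varying the number of epochs. The sketch below computes the curve points and the area under it in plain Python. It assumes each sample has a score for the class of interest (for example the softmax probability of "low-quality"); `roc_points`, `auc` and the toy data are hypothetical.

```python
def roc_points(y_true, scores, target_label=1):
    """Return (false positive rate, true positive rate) points for a threshold sweep."""
    pos = sum(1 for t in y_true if t == target_label)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    # Visit samples from the highest score to the lowest.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == target_label:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example: 1 marks the class of interest, scores are model confidences.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(toy_labels, toy_scores)
print(curve[-1])             # (1.0, 1.0)
print(round(auc(curve), 4))  # 0.8889
```

The points in `curve` can be passed to any plotting library as x and y values.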

Then 20 epochs:

Epoch 20/20
2999/2999 [==============================] - 10s - loss: 0.1805 - acc: 0.9710

Correct: 1908
Incorrect: 93
Overall accuracy: 95.352
Negative damku accuracy: 19.231
True negative: 5
False negative: 21

15 epochs:

Epoch 15/15
2999/2999 [==============================] - 9s - loss: 0.3569 - acc: 0.9650

Correct: 1782
Incorrect: 219
Overall accuracy: 89.055
Negative damku accuracy: 46.154
True negative: 12
False negative: 14

17 epochs:

Epoch 17/17
2999/2999 [==============================] - 9s - loss: 0.3631 - acc: 0.9847

Correct: 1893
Incorrect: 108
Overall accuracy: 94.603
Negative damku accuracy: 26.923
True negative: 7
False negative: 19

2999/2999 [==============================] - 10s - loss: 0.2556 - acc: 0.9760

Correct: 1816
Incorrect: 185
Overall accuracy: 90.755
Negative damku accuracy: 30.769
True negative: 8
False negative: 18

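It suddenly occurred to me that the training data does not actually need to follow the real-world class distribution, so I can directly reuse the danmaku data from the earlier Bayesian classifier. (When I trained the Bayesian classifier I did not pay attention to the class distribution of the training samples, which was a mistake.) After importing the new samples, I trained and tested epoch by epoch.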
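In these notes the more balanced data came from reusing an existing dataset. When no such dataset is available, one common alternative is random undersampling of the majority class. The sketch below is not the method used here; `undersample` and the toy class sizes are assumptions for illustration.

```python
import random


def undersample(samples, labels, seed=0):
    """Randomly drop samples so that every class keeps as many as the smallest one."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    # Size of the smallest class decides how many samples each class keeps.
    keep = min(len(group) for group in by_label.values())
    balanced = []
    for label, group in by_label.items():
        for sample in rng.sample(group, keep):
            balanced.append((sample, label))
    # Mix the classes again so they are not grouped together.
    rng.shuffle(balanced)
    return balanced


# Toy imbalance: 1975 normal (1) versus 26 low-quality (0) samples.
toy_samples = ["danmaku_%d" % i for i in range(2001)]
toy_labels = [1] * 1975 + [0] * 26

balanced = undersample(toy_samples, toy_labels)
print(len(balanced))                            # 52
print(sum(1 for _, y in balanced if y == 0))    # 26
```

The test set should still reflect the distribution the classifier will see in practice, otherwise the reported accuracy is not comparable to real usage.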

Epoch 1/1
5405/5405 [==============================] - 19s - loss: 0.6693 - acc: 0.6699     
Train epoch: 1
Correct: 2912
Incorrect: 692
Overall accuracy: 80.799
Negative damku accuracy: 81.522
True negative: 1328
False negative: 301
==========
Epoch 1/1
5405/5405 [==============================] - 19s - loss: 0.4421 - acc: 0.8344     
Train epoch: 2
Correct: 3133
Incorrect: 471
Overall accuracy: 86.931
Negative damku accuracy: 82.627
True negative: 1346
False negative: 283
==========
Epoch 1/1
5405/5405 [==============================] - 19s - loss: 0.2734 - acc: 0.9075     
Train epoch: 3
Correct: 3294
Incorrect: 310
Overall accuracy: 91.398
Negative damku accuracy: 86.618
True negative: 1411
False negative: 218
==========
Epoch 1/1
5405/5405 [==============================] - 20s - loss: 0.1724 - acc: 0.9404     
Train epoch: 4
Correct: 3365
Incorrect: 239
Overall accuracy: 93.368
Negative damku accuracy: 90.117
True negative: 1468
False negative: 161
==========
Epoch 1/1
5405/5405 [==============================] - 22s - loss: 0.1117 - acc: 0.9641     
Train epoch: 5
Correct: 3390
Incorrect: 214
Overall accuracy: 94.062
Negative damku accuracy: 92.879
True negative: 1513
False negative: 116
==========
Epoch 1/1
5405/5405 [==============================] - 18s - loss: 0.0770 - acc: 0.9771     
Train epoch: 6
Correct: 3416
Incorrect: 188
Overall accuracy: 94.784
Negative damku accuracy: 94.598
True negative: 1541
False negative: 88
==========
Epoch 1/1
5405/5405 [==============================] - 18s - loss: 0.0505 - acc: 0.9858     
Train epoch: 7
Correct: 3403
Incorrect: 201
Overall accuracy: 94.423
Negative damku accuracy: 91.590
True negative: 1492
False negative: 137
==========
Epoch 1/1
5405/5405 [==============================] - 18s - loss: 0.0414 - acc: 0.9900     
Train epoch: 8
Correct: 3426
Incorrect: 178
Overall accuracy: 95.061
Negative damku accuracy: 94.905
True negative: 1546
False negative: 83
==========
Epoch 1/1
5405/5405 [==============================] - 18s - loss: 0.0365 - acc: 0.9902     
Train epoch: 9
Correct: 3417
Incorrect: 187
Overall accuracy: 94.811
Negative damku accuracy: 92.756
True negative: 1511
False negative: 118
==========
Epoch 1/1
5405/5405 [==============================] - 18s - loss: 0.0231 - acc: 0.9943     
Train epoch: 10
Correct: 3415
Incorrect: 189
Overall accuracy: 94.756
Negative damku accuracy: 92.449
True negative: 1506
False negative: 123
==========
Epoch 1/1
5405/5405 [==============================] - 18s - loss: 0.0320 - acc: 0.9906     
Train epoch: 11
Correct: 3396
Incorrect: 208
Overall accuracy: 94.229
Negative damku accuracy: 94.905
True negative: 1546
False negative: 83
==========
Epoch 1/1
5405/5405 [==============================] - 18s - loss: 0.0158 - acc: 0.9959     
Train epoch: 12
Correct: 3416
Incorrect: 188
Overall accuracy: 94.784
Negative damku accuracy: 93.738
True negative: 1527
False negative: 102
==========
Epoch 1/1
5405/5405 [==============================] - 18s - loss: 0.0093 - acc: 0.9983     
Train epoch: 13
Correct: 3415
Incorrect: 189
Overall accuracy: 94.756
Negative damku accuracy: 95.212
True negative: 1551
False negative: 78
==========
Epoch 1/1
5405/5405 [==============================] - 18s - loss: 0.0048 - acc: 0.9991     
Train epoch: 14
Correct: 3421
Incorrect: 183
Overall accuracy: 94.922
Negative damku accuracy: 94.843
True negative: 1545
False negative: 84
==========
Epoch 1/1
5405/5405 [==============================] - 18s - loss: 0.0052 - acc: 0.9989     
Train epoch: 15
Correct: 3421
Incorrect: 183
Overall accuracy: 94.922
Negative damku accuracy: 93.923
True negative: 1530
False negative: 99
==========
Epoch 1/1
5405/5405 [==============================] - 20s - loss: 0.0024 - acc: 0.9998         
Train epoch: 16
Correct: 3413
Incorrect: 191
Overall accuracy: 94.700
Negative damku accuracy: 94.291
True negative: 1536
False negative: 93
==========
Epoch 1/1
5405/5405 [==============================] - 18s - loss: 0.0028 - acc: 0.9998     
Train epoch: 17
Correct: 3418
Incorrect: 186
Overall accuracy: 94.839
Negative damku accuracy: 93.186
True negative: 1518
False negative: 111
==========
Epoch 1/1
5405/5405 [==============================] - 21s - loss: 0.0024 - acc: 0.9996         
Train epoch: 18
Correct: 3415
Incorrect: 189
Overall accuracy: 94.756
Negative damku accuracy: 94.352
True negative: 1537
False negative: 92
==========
Epoch 1/1
5405/5405 [==============================] - 19s - loss: 0.0013 - acc: 0.9996         
Train epoch: 19
Correct: 3425
Incorrect: 179
Overall accuracy: 95.033
Negative damku accuracy: 94.475
True negative: 1539
False negative: 90
==========
Epoch 1/1
5405/5405 [==============================] - 18s - loss: 9.6297e-04 - acc: 0.9998     
Train epoch: 20
Correct: 3417
Incorrect: 187
Overall accuracy: 94.811
Negative damku accuracy: 93.493
True negative: 1523
False negative: 106
==========

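I remembered that the word2vec model was originally trained on danmaku from the entertainment section, which does not fully match the target environment. I exported danmaku from the gaming section, retrained the word2vec model on them, and ran the training again.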

Epoch 1/1
5405/5405 [==============================] - 17s - loss: 0.5780 - acc: 0.7441     
Train epoch: 1
Correct: 3140
Incorrect: 464
Overall accuracy: 87.125
Negative damku accuracy: 89.134
True negative: 1452
False negative: 177
==========
Epoch 1/1
5405/5405 [==============================] - 17s - loss: 0.2168 - acc: 0.9258     
Train epoch: 2
Correct: 3444
Incorrect: 160
Overall accuracy: 95.560
Negative damku accuracy: 93.738
True negative: 1527
False negative: 102
==========
Epoch 1/1
5405/5405 [==============================] - 18s - loss: 0.0978 - acc: 0.9697     
Train epoch: 3
Correct: 3459
Incorrect: 145
Overall accuracy: 95.977
Negative damku accuracy: 95.887
True negative: 1562
False negative: 67
==========
Epoch 1/1
5405/5405 [==============================] - 22s - loss: 0.0606 - acc: 0.9824     
Train epoch: 4
Correct: 3426
Incorrect: 178
Overall accuracy: 95.061
Negative damku accuracy: 96.746
True negative: 1576
False negative: 53
==========
Epoch 1/1
5405/5405 [==============================] - 23s - loss: 0.1076 - acc: 0.9678     
Train epoch: 5
Correct: 3468
Incorrect: 136
Overall accuracy: 96.226
Negative damku accuracy: 94.537
True negative: 1540
False negative: 89
==========
Epoch 1/1
5405/5405 [==============================] - 20s - loss: 0.0476 - acc: 0.9856     
Train epoch: 6
Correct: 3465
Incorrect: 139
Overall accuracy: 96.143
Negative damku accuracy: 95.028
True negative: 1548
False negative: 81
==========
Epoch 1/1
5405/5405 [==============================] - 19s - loss: 0.0285 - acc: 0.9911     
Train epoch: 7
Correct: 3472
Incorrect: 132
Overall accuracy: 96.337
Negative damku accuracy: 95.150
True negative: 1550
False negative: 79
==========
Epoch 1/1
5405/5405 [==============================] - 18s - loss: 0.0192 - acc: 0.9943     
Train epoch: 8
Correct: 3473
Incorrect: 131
Overall accuracy: 96.365
Negative damku accuracy: 96.010
True negative: 1564
False negative: 65
==========
Epoch 1/1
5405/5405 [==============================] - 18s - loss: 0.0128 - acc: 0.9956     
Train epoch: 9
Correct: 3472
Incorrect: 132
Overall accuracy: 96.337
Negative damku accuracy: 95.580
True negative: 1557
False negative: 72
==========
Epoch 1/1
5405/5405 [==============================] - 17s - loss: 0.0079 - acc: 0.9972     
Train epoch: 10
Correct: 3474
Incorrect: 130
Overall accuracy: 96.393
Negative damku accuracy: 95.580
True negative: 1557
False negative: 72
==========
Epoch 1/1
5405/5405 [==============================] - 20s - loss: 0.0060 - acc: 0.9981     
Train epoch: 11
Correct: 3476
Incorrect: 128
Overall accuracy: 96.448
Negative damku accuracy: 95.396
True negative: 1554
False negative: 75
==========
Epoch 1/1
5405/5405 [==============================] - 27s - loss: 0.0045 - acc: 0.9989     
Train epoch: 12
Correct: 3478
Incorrect: 126
Overall accuracy: 96.504
Negative damku accuracy: 95.089
True negative: 1549
False negative: 80
==========
Epoch 1/1
5405/5405 [==============================] - 22s - loss: 0.0031 - acc: 0.9994     
Train epoch: 13
Correct: 3476
Incorrect: 128
Overall accuracy: 96.448
Negative damku accuracy: 95.150
True negative: 1550
False negative: 79
==========
Epoch 1/1
5405/5405 [==============================] - 19s - loss: 0.0024 - acc: 0.9994         
Train epoch: 14
Correct: 3479
Incorrect: 125
Overall accuracy: 96.532
Negative damku accuracy: 95.089
True negative: 1549
False negative: 80
==========
Epoch 1/1
5405/5405 [==============================] - 19s - loss: 0.0020 - acc: 0.9994         
Train epoch: 15
Correct: 3476
Incorrect: 128
Overall accuracy: 96.448
Negative damku accuracy: 94.966
True negative: 1547
False negative: 82
==========
Epoch 1/1
5405/5405 [==============================] - 22s - loss: 0.0018 - acc: 0.9994         
Train epoch: 16
Correct: 3474
Incorrect: 130
Overall accuracy: 96.393
Negative damku accuracy: 95.150
True negative: 1550
False negative: 79
==========
Epoch 1/1
5405/5405 [==============================] - 19s - loss: 0.0016 - acc: 0.9994         
Train epoch: 17
Correct: 3475
Incorrect: 129
Overall accuracy: 96.421
Negative damku accuracy: 95.457
True negative: 1555
False negative: 74
==========
Epoch 1/1
5405/5405 [==============================] - 21s - loss: 0.0014 - acc: 0.9994         
Train epoch: 18
Correct: 3474
Incorrect: 130
Overall accuracy: 96.393
Negative damku accuracy: 95.150
True negative: 1550
False negative: 79
==========
Epoch 1/1
5405/5405 [==============================] - 24s - loss: 0.0013 - acc: 0.9996     
Train epoch: 19
Correct: 3474
Incorrect: 130
Overall accuracy: 96.393
Negative damku accuracy: 95.089
True negative: 1549
False negative: 80
==========
Epoch 1/1
5405/5405 [==============================] - 21s - loss: 0.0037 - acc: 0.9991         
Train epoch: 20
Correct: 3469
Incorrect: 135
Overall accuracy: 96.254
Negative damku accuracy: 96.624
True negative: 1574
False negative: 55
==========

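The improvement is clear: overall accuracy settles at roughly 96%, compared with roughly 94 to 95% before.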

Trying a higher training ratio of 0.8:

Epoch 1/1
7206/7206 [==============================] - 24s - loss: 0.5097 - acc: 0.7778     
Train epoch: 1
Correct: 1673
Incorrect: 130
Overall accuracy: 92.790
Negative damku accuracy: 91.779
True negative: 748
False negative: 67
==========
Epoch 1/1
7206/7206 [==============================] - 23s - loss: 0.1654 - acc: 0.9455     
Train epoch: 2
Correct: 1745
Incorrect: 58
Overall accuracy: 96.783
Negative damku accuracy: 95.092
True negative: 775
False negative: 40
==========
Epoch 1/1
7206/7206 [==============================] - 24s - loss: 0.0891 - acc: 0.9732     
Train epoch: 3
Correct: 1750
Incorrect: 53
Overall accuracy: 97.060
Negative damku accuracy: 97.055
True negative: 791
False negative: 24
==========
Epoch 1/1
7206/7206 [==============================] - 23s - loss: 0.0570 - acc: 0.9829     
Train epoch: 4
Correct: 1739
Incorrect: 64
Overall accuracy: 96.450
Negative damku accuracy: 96.933
True negative: 790
False negative: 25
==========
Epoch 1/1
7206/7206 [==============================] - 22s - loss: 0.0394 - acc: 0.9878     
Train epoch: 5
Correct: 1754
Incorrect: 49
Overall accuracy: 97.282
Negative damku accuracy: 96.074
True negative: 783
False negative: 32
==========
Epoch 1/1
7206/7206 [==============================] - 22s - loss: 0.0471 - acc: 0.9872     
Train epoch: 6
Correct: 1747
Incorrect: 56
Overall accuracy: 96.894
Negative damku accuracy: 95.706
True negative: 780
False negative: 35
==========
Epoch 1/1
7206/7206 [==============================] - 22s - loss: 0.0266 - acc: 0.9926     
Train epoch: 7
Correct: 1735
Incorrect: 68
Overall accuracy: 96.229
Negative damku accuracy: 95.706
True negative: 780
False negative: 35
==========
Epoch 1/1
7206/7206 [==============================] - 26s - loss: 0.0235 - acc: 0.9921     
Train epoch: 8
Correct: 1742
Incorrect: 61
Overall accuracy: 96.617
Negative damku accuracy: 95.706
True negative: 780
False negative: 35
==========
Epoch 1/1
7206/7206 [==============================] - 22s - loss: 0.0211 - acc: 0.9928     
Train epoch: 9
Correct: 1753
Incorrect: 50
Overall accuracy: 97.227
Negative damku accuracy: 96.074
True negative: 783
False negative: 32
==========
Epoch 1/1
7206/7206 [==============================] - 22s - loss: 0.0207 - acc: 0.9929     
Train epoch: 10
Correct: 1750
Incorrect: 53
Overall accuracy: 97.060
Negative damku accuracy: 95.951
True negative: 782
False negative: 33
==========
Epoch 1/1
7206/7206 [==============================] - 22s - loss: 0.0282 - acc: 0.9913     
Train epoch: 11
Correct: 1743
Incorrect: 60
Overall accuracy: 96.672
Negative damku accuracy: 96.442
True negative: 786
False negative: 29
==========
Epoch 1/1
7206/7206 [==============================] - 22s - loss: 0.0174 - acc: 0.9947     
Train epoch: 12
Correct: 1737
Incorrect: 66
Overall accuracy: 96.339
Negative damku accuracy: 96.564
True negative: 787
False negative: 28
==========
Epoch 1/1
7206/7206 [==============================] - 22s - loss: 0.0135 - acc: 0.9965     
Train epoch: 13
Correct: 1741
Incorrect: 62
Overall accuracy: 96.561
Negative damku accuracy: 96.933
True negative: 790
False negative: 25
==========
Epoch 1/1
7206/7206 [==============================] - 22s - loss: 0.0106 - acc: 0.9965     
Train epoch: 14
Correct: 1743
Incorrect: 60
Overall accuracy: 96.672
Negative damku accuracy: 96.687
True negative: 788
False negative: 27
==========
Epoch 1/1
7206/7206 [==============================] - 22s - loss: 0.0068 - acc: 0.9975     
Train epoch: 15
Correct: 1751
Incorrect: 52
Overall accuracy: 97.116
Negative damku accuracy: 95.460
True negative: 778
False negative: 37
==========
Epoch 1/1
7206/7206 [==============================] - 22s - loss: 0.0053 - acc: 0.9982     
Train epoch: 16
Correct: 1748
Incorrect: 55
Overall accuracy: 96.950
Negative damku accuracy: 96.564
True negative: 787
False negative: 28
==========
Epoch 1/1
7206/7206 [==============================] - 23s - loss: 0.0051 - acc: 0.9986     
Train epoch: 17
Correct: 1751
Incorrect: 52
Overall accuracy: 97.116
Negative damku accuracy: 95.460
True negative: 778
False negative: 37
==========
Epoch 1/1
7206/7206 [==============================] - 24s - loss: 0.0038 - acc: 0.9989     
Train epoch: 18
Correct: 1749
Incorrect: 54
Overall accuracy: 97.005
Negative damku accuracy: 96.319
True negative: 785
False negative: 30
==========
Epoch 1/1
7206/7206 [==============================] - 22s - loss: 0.0036 - acc: 0.9990     
Train epoch: 19
Correct: 1747
Incorrect: 56
Overall accuracy: 96.894
Negative damku accuracy: 95.583
True negative: 779
False negative: 36
==========
Epoch 1/1
7206/7206 [==============================] - 23s - loss: 0.0035 - acc: 0.9989         
Train epoch: 20
Correct: 1746
Incorrect: 57
Overall accuracy: 96.839
Negative damku accuracy: 95.215
True negative: 776
False negative: 39
==========

Test performance improved by about 1 to 2 percentage points.

I cannot think of anything else to optimize for now, so I am choosing the model from epoch 3 as the final model.
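The notes do not say which rule led to epoch 3. One rule that happens to agree with that choice is to pick the epoch whose weaker metric is highest, so that neither overall accuracy nor low-quality detection is sacrificed. The sketch below applies this assumed rule to the numbers from the 0.8-ratio run above; `pick_epoch` is a hypothetical name.

```python
# (epoch, overall accuracy, negative danmaku accuracy) from the 0.8-ratio run.
runs = [
    (1, 92.790, 91.779), (2, 96.783, 95.092), (3, 97.060, 97.055),
    (4, 96.450, 96.933), (5, 97.282, 96.074), (6, 96.894, 95.706),
    (7, 96.229, 95.706), (8, 96.617, 95.706), (9, 97.227, 96.074),
    (10, 97.060, 95.951), (11, 96.672, 96.442), (12, 96.339, 96.564),
    (13, 96.561, 96.933), (14, 96.672, 96.687), (15, 97.116, 95.460),
    (16, 96.950, 96.564), (17, 97.116, 95.460), (18, 97.005, 96.319),
    (19, 96.894, 95.583), (20, 96.839, 95.215),
]


def pick_epoch(results):
    """Assumed rule: maximize the smaller of the two accuracies."""
    best = max(results, key=lambda r: min(r[1], r[2]))
    return best[0]


print(pick_epoch(runs))  # 3
```

Selecting on overall accuracy alone would pick epoch 5 instead, at the cost of about one point of low-quality detection. Because the same test set is used both to pick the epoch and to report the result, the reported numbers are slightly optimistic.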