r - RWeka M5P model: how do I reproduce the results manually?

Question

I would like to clarify my understanding of the results of a trained M5P model. I trained an M5P model, and it gave me a tree followed by 4 linear models.

M5 unpruned model tree:
(using smoothed linear models)

Value12 <= 2.266 : 
|   Value2 <= 1111.5 : LM1 (2/0.01%)
|   Value2 >  1111.5 : LM2 (4/2.268%)
Value12 >  2.266 : 
|   Value3 <= 1544650 : LM3 (2/1.652%)
|   Value3 >  1544650 : LM4 (2/92.017%)

LM num: 1
Value15 = 
    -0.0001 * Value2 
    + 1.8377

LM num: 2
Value15 = 
    -0.0001 * Value2 
    + 1.8181

LM num: 3
Value15 = 
    -0 * Value3 
    + 1.7212

LM num: 4
Value15 = 
    -0 * Value3 
    + 1.7093

Number of Rules : 4

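To make sure I understand how it works, I tried to replicate the results by hand using the decision tree and the LM models it refers to, but the results are not what I expected.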

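I use the tree to determine which LM model applies, then carry out the calculation exactly as that LM model states, and the result is not the same as the model's prediction. Is this normal?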

The data set I used:

Data_train<-structure(list(Value2 = c(610L, 1245L, 978L, 610L, 978L, 610L, 
1727L, 1810L, 1805L, 1805L), Value3 = c(1544673L, 2206981L, 2512821L, 
1544627L, 2512792L, 1524144L, 3415598L, 9205162L, 9182166L, 9182089L
), Value4 = c(12.1260004043579, 17.3250007629395, 19.7259998321533, 
12.125, 19.7250003814697, 11.9650001525879, 26.8120002746582, 
72.2610015869141, 72.0800018310547, 72.0790023803711), Value5 = 
c(0.0817999988794327, 
0.0856000036001205, 0.0828000009059906, 0.0817999988794327, 
0.0828000009059906, 
0.09009999781847, 0.145199999213219, 0.200299993157387, 0.200299993157387, 
0.200200006365776), Value6 = c(2L, 1L, 2L, 2L, 2L, 2L, 4L, 4L, 
4L, 4L), Value7 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
    Value8 = c(4L, 4L, 4L, 4L, 4L, 4L, 22L, 36L, 36L, 36L), Value9 = c(1L, 
    1L, 2L, 1L, 2L, 1L, 8L, 6L, 6L, 6L), Value10 = c(0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Value11 = c(0.958189010620117, 
    1, 0.925986051559448, 0.958268105983734, 0.926032960414886, 
    0.971082329750061, 0.471057742834091, 0.476771682500839, 
    0.47670641541481, 0.47671303153038), Value12 = c(3.27869, 
    0.80321, 2.04499, 3.27869, 2.04499, 3.27869, 2.31616, 2.20994, 
    2.21607, 2.21607), Value13 = c(1L, 0L, 1L, 1L, 1L, 1L, 2L, 
    3L, 3L, 3L), Value15 = c(1.33398258686066, 1.90592515468597, 
    2.17005920410156, 1.33387243747711, 2.1699492931366, 1.31627094745636, 
    0.353617042303085, 1.93668437004089, 1.93183350563049, 1.93180668354034
    )), .Names = c("Value2", "Value3", "Value4", "Value5", "Value6", 
"Value7", "Value8", "Value9", "Value10", "Value11", "Value12", 
"Value13", "Value15"), row.names = c(NA, 10L), class = "data.frame")

This is the formula I used to train the model:

library(RWeka)
Data_modelUnPruned <- M5P(Value15 ~ Value6 + Value3 + Value4 + Value2 + 
Value7 + Value8 + Value9 + Value10 + Value11 + Value12 + Value13, data = 
Data_train, control = Weka_control(N = TRUE))

This is the resulting data set after adding the prediction column:

Data_train_Results<-structure(list(Value2 = c(610L, 1245L, 978L, 610L, 978L, 
610L, 
1727L, 1810L, 1805L, 1805L), Value3 = c(1544673L, 2206981L, 2512821L, 
1544627L, 2512792L, 1524144L, 3415598L, 9205162L, 9182166L, 9182089L
), Value4 = c(12.1260004043579, 17.3250007629395, 19.7259998321533, 
12.125, 19.7250003814697, 11.9650001525879, 26.8120002746582, 
72.2610015869141, 72.0800018310547, 72.0790023803711), Value5 = 
c(0.0817999988794327, 
0.0856000036001205, 0.0828000009059906, 0.0817999988794327, 
0.0828000009059906, 
0.09009999781847, 0.145199999213219, 0.200299993157387, 0.200299993157387, 
0.200200006365776), Value6 = c(2L, 1L, 2L, 2L, 2L, 2L, 4L, 4L, 
4L, 4L), Value7 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
    Value8 = c(4L, 4L, 4L, 4L, 4L, 4L, 22L, 36L, 36L, 36L), Value9 = c(1L, 
    1L, 2L, 1L, 2L, 1L, 8L, 6L, 6L, 6L), Value10 = c(0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Value11 = c(0.958189010620117, 
    1, 0.925986051559448, 0.958268105983734, 0.926032960414886, 
    0.971082329750061, 0.471057742834091, 0.476771682500839, 
    0.47670641541481, 0.47671303153038), Value12 = c(3.27869, 
    0.80321, 2.04499, 3.27869, 2.04499, 3.27869, 2.31616, 2.20994, 
    2.21607, 2.21607), Value13 = c(1L, 0L, 1L, 1L, 1L, 1L, 2L, 
    3L, 3L, 3L), Value15 = c(1.33398258686066, 1.90592515468597, 
    2.17005920410156, 1.33387243747711, 2.1699492931366, 1.31627094745636, 
    0.353617042303085, 1.93668437004089, 1.93183350563049, 1.93180668354034
    ), Model_Prediction = c(1.56039428073199, 1.74959163286097, 
    1.77758972532522, 1.57231876013397, 1.77758972532522, 1.57429264935954, 
    1.38009848913172, 1.71850280973615, 1.71877793206469, 1.71877793206469
    )), .Names = c("Value2", "Value3", "Value4", "Value5", "Value6", 
"Value7", "Value8", "Value9", "Value10", "Value11", "Value12", 
"Value13", "Value15", "Model_Prediction"), row.names = c(NA, 
10L), class = "data.frame")

This is the code I used to try to replicate the model's results. It is basically a hard-coded version of the M5P model in Visual Basic.

Public Function GetLM(Value2 As Long, Value3 As Long, Value4 As Double, _
    Value6 As Long, Value7 As Long, Value8 As Long, Value9 As Long, _
    Value10 As Long, Value11 As Double, Value12 As Double, _
    Value13 As Long) As Double
Dim lm As String

If Value12 <= 2.266 Then
    If Value2 <= 1111.5 Then
        lm = "LM1" '(2/0.019%)
    Else
        lm = "LM2" '(4/2.269%)
    End If
Else
    If Value3 <= 1544650 Then
        lm = "LM3" '(2/1.652%)
    Else
        lm = "LM4" '(2/92.021%)
    End If
End If

Select Case lm
        Case "LM1"
            GetLM = -0.0001 * Value2 _
                    + 1.8377
        Case "LM2"
            GetLM = -0.0001 * Value2 _
                    + 1.8181
        Case "LM3"
            GetLM = -0 * Value3 _
                    + 1.7212
        Case "LM4"
            GetLM = -0 * Value3 _
                    + 1.7093
        Case Else
            GetLM = 0
End Select
End Function
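
For anyone who wants to repeat the check in R, here is a minimal sketch of the same manual calculation, compared against the stored `Model_Prediction` column. It repeats only the columns that the printed tree and linear models use. The helper name `manual_m5p` is hypothetical and not part of RWeka. The sketch takes the printed coefficients literally. One assumption to flag: the printout appears to round coefficients to four decimal places, so a term shown as `-0 * Value3` may hide a small non-zero coefficient. The last line estimates how much such hidden digits could matter under that assumption.

```r
# Minimal sketch: apply the printed tree and linear models by hand and
# compare with the predictions stored in Data_train_Results.
check <- data.frame(
  Value2  = c(610, 1245, 978, 610, 978, 610, 1727, 1810, 1805, 1805),
  Value3  = c(1544673, 2206981, 2512821, 1544627, 2512792, 1524144,
              3415598, 9205162, 9182166, 9182089),
  Value12 = c(3.27869, 0.80321, 2.04499, 3.27869, 2.04499, 3.27869,
              2.31616, 2.20994, 2.21607, 2.21607),
  Model_Prediction = c(1.56039428073199, 1.74959163286097, 1.77758972532522,
                       1.57231876013397, 1.77758972532522, 1.57429264935954,
                       1.38009848913172, 1.71850280973615, 1.71877793206469,
                       1.71877793206469)
)

# Hypothetical helper (not an RWeka function): pick the leaf from the
# printed splits, then evaluate that leaf's printed linear model.
manual_m5p <- function(d) {
  leaf <- ifelse(d$Value12 <= 2.266,
                 ifelse(d$Value2 <= 1111.5, "LM1", "LM2"),
                 ifelse(d$Value3 <= 1544650, "LM3", "LM4"))
  pred <- ifelse(leaf == "LM1", -0.0001 * d$Value2 + 1.8377,
          ifelse(leaf == "LM2", -0.0001 * d$Value2 + 1.8181,
          # "-0 * Value3" is taken literally as zero here
          ifelse(leaf == "LM3", 1.7212, 1.7093)))
  data.frame(leaf = leaf, manual = pred, stringsAsFactors = FALSE)
}

res <- cbind(check, manual_m5p(check))
res$diff <- res$manual - res$Model_Prediction
print(res[, c("leaf", "manual", "Model_Prediction", "diff")])

# Assumption: if coefficients are displayed rounded to 4 decimals, a term
# printed as "-0" could hide up to 0.00005 per unit of Value3.
max_hidden <- 0.00005 * max(check$Value3)
```

The `diff` column makes the mismatch visible row by row. If the rounding assumption holds, `max_hidden` suggests the printed equations alone may not carry enough precision to reproduce predictions when `Value3` is in the millions.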

Can someone explain to me how this is supposed to work?

Thank you very much.
