python - Cox regression with lifelines and categorical variables

Question

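Hi, I'm using the lifelines package to run a Cox regression. I want to check the effect of a non-binary categorical variable. Is there a built-in way to do this, or should I convert each category into a number? Alternatively, using the KaplanMeierFitter (kmf) in lifelines, is it possible to do this for each level and then get p-values? I can make separate plots, but I can't find how to compute the p-values.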

Thanks!

Update: OK, so suppose that after using pd.get_dummies I have a DataFrame df of this form:

            event     time       categorical_1 categorical_2  categorical_3
0              0      11.54             0             0             1
1              0       6.95             0             0             1
2              1       0.24             0             1             0
3              0       3.00             0             0             1
4              1      10.26             1             0             1
...          ...        ...           ...           ...           ...
1215           1       6.80             1             0             0

I now need to drop one of the dummy variables, and then do:

cph.fit(df, duration_col='time', event_col='event')

If I now want to plot how the categorical variable affects the survival curve, how would I go about it? I tried:

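    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    summary = cph.summary
    # one plot call per fitted covariate (each row of the summary table)
    for index, row in summary.iterrows():
        print(index)
        # `a` is the asker's data frame; it is not defined in the post
        cph.plot_covariate_groups(index, [a[index].mean()], ax=ax)
    plt.show()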

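But it plots all the different levels of the variable on the same curve, whereas I expected the curves to differ. Well, I'm actually not sure whether it plots all the curves or only the last one, but it does draw a legend entry for every possible level of the categorical variable.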

Thanks

score 3 · Accepted Answer

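As with other regressions, you need to convert the categorical variable into dummy variables. You can use pandas.get_dummies for this. Once that's done, the Cox regression model will give you an estimate for each category (except the dropped dummy variable, see the note here).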
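Why one dummy has to go: if every level keeps its own dummy, each row's dummies sum to 1, so adding the same constant to every dummy coefficient just shifts every subject's linear predictor by that constant, which the baseline hazard absorbs. The Cox partial likelihood therefore cannot tell those coefficient vectors apart. The numpy sketch below checks this numerically; the data are synthetic, `cox_partial_loglik` is a hypothetical helper, and it assumes no tied event times.

```python
import numpy as np


def cox_partial_loglik(X, beta, time, event):
    """Cox partial log-likelihood, assuming no tied event times."""
    eta = X @ beta                      # linear predictor per subject
    ll = 0.0
    for i in np.flatnonzero(event):     # sum only over observed events
        at_risk = time >= time[i]       # risk set at subject i's event time
        ll += eta[i] - np.log(np.exp(eta[at_risk]).sum())
    return ll


rng = np.random.default_rng(0)
n = 50
levels = rng.integers(0, 3, size=n)        # 3-level categorical variable
X_full = np.eye(3)[levels]                 # one dummy per level, none dropped
time = rng.exponential(scale=5.0, size=n)  # continuous times, so ties are practically impossible
event = rng.integers(0, 2, size=n)

beta = np.array([0.2, -0.5, 0.8])
shift = np.full(3, 1.7)  # the same constant added to every dummy's coefficient

ll_a = cox_partial_loglik(X_full, beta, time, event)
ll_b = cox_partial_loglik(X_full, beta + shift, time, event)
print(np.isclose(ll_a, ll_b))  # True: the fit cannot distinguish these two betas
```

Dropping any one column removes this ambiguity, and the remaining coefficients are then read relative to the dropped (reference) level.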

For your second question, you need something like lifelines.statistics.multivariate_logrank_test to test whether any of the categories differs from the others. (See also lifelines.statistics.pairwise_logrank_test.)

For your plotting question, there is a better way:

import numpy as np
cph.plot_covariate_groups(['categorical_1', 'categorical_2', ...], np.eye(n))

where n is the number of category (dummy) columns in the new DataFrame.

See the documentation for more: https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html#plotting-the-effect-of-varying-a-covariate
