python - Log-odds decision boundary for multivariate distributions

Question

I have some 2D data for two classes, and I am trying to compute the log-odds:

ln(P(class=a|x)/P(class=b|x))

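I then want to plot the decision boundary, i.e. all points where the log-odds equal 0. I have already done this for 1D data, but for 2D data my intuition is that I need 2D histograms to estimate P(x), P(x | class = a), and P(x | class = b). Am I doing this correctly? One question is where P(class = a) comes from: is it simply 0.5 because there are two classes with the same number of samples? I also suspect that the way I plot the decision boundary is wrong, because the result is not what I expected.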
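For reference, here is a sketch of how Bayes' rule expands the log-odds above. The symbols N_a and N_b (the number of samples per class) are notation assumed here, not taken from the question. The evidence p(x) cancels in the ratio, so it does not need to be estimated, and the prior can be approximated by the class frequencies:

```latex
\begin{align}
\ln\frac{P(a \mid x)}{P(b \mid x)}
  &= \ln\frac{p(x \mid a)\,P(a)/p(x)}{p(x \mid b)\,P(b)/p(x)} \\
  &= \ln\frac{p(x \mid a)}{p(x \mid b)} + \ln\frac{P(a)}{P(b)},
\qquad
P(a) \approx \frac{N_a}{N_a + N_b}.
\end{align}
```

With N_a = N_b, as in the data generated below, the prior term is ln(1) = 0, so the boundary is where the two class-conditional densities are equal.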

import numpy as np
import matplotlib.pyplot as plt

N = 1000

mean_a = [0, 0]
cov_a = [[2, 0], [0, 2]]  # diagonal covariance

mean_b = [1, 2]
cov_b = [[1, 0], [0, 1]]  # diagonal covariance

#generate data
Xa = np.random.multivariate_normal(mean_a, cov_a, N)
Xb = np.random.multivariate_normal(mean_b, cov_b, N)
Xall = np.vstack((Xa,Xb))

def logratio(a, b, eps=1e-14):
    # log of the ratio of two probabilities (class a vs. class b)
    a = a + eps  # to prevent taking logs of 0 or infinity
    b = b + eps  # to prevent taking logs of 0 or infinity
    return np.log(a / b)

P_a = 0.5 # since each class has equal number of samples
P_b = 0.5

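# Use the same bin edges for all three histograms so the cells line up;
# density=True normalises the counts into density estimates.
_, x_bins, y_bins = np.histogram2d(Xall[:, 0], Xall[:, 1])
bins = [x_bins, y_bins]
P_xn_if_a, _, _ = np.histogram2d(Xa[:, 0], Xa[:, 1], bins=bins, density=True)
P_xn, _, _ = np.histogram2d(Xall[:, 0], Xall[:, 1], bins=bins, density=True)
P_xn_if_b, _, _ = np.histogram2d(Xb[:, 0], Xb[:, 1], bins=bins, density=True)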

P_b_if_xn = P_xn_if_b * P_b / (P_xn + 1e-16)  # Bayes' rule with the prior of class b
P_a_if_xn = P_xn_if_a * P_a / (P_xn + 1e-16)  # Bayes' rule with the prior of class a
log_odds = logratio(P_a_if_xn, P_b_if_xn)

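# Plot only the boundary: contour() extracts the log_odds = 0 level itself,
# so the array must not be overwritten with 0/1 flags first.
# Mask cells with no samples, where the log-odds are undefined, and
# transpose because histogram2d puts x along axis 0.
boundary = np.ma.masked_where(P_xn.T == 0, log_odds.T)
x_centers = 0.5 * (x_bins[:-1] + x_bins[1:])
y_centers = 0.5 * (y_bins[:-1] + y_bins[1:])

fig, ax6 = plt.subplots(nrows=1, ncols=1, figsize=(15, 8))
ax6.contour(x_centers, y_centers, boundary, levels=[0], cmap="Greys_r")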
ax6.scatter(Xa[:,0],Xa[:,1],color='r')
ax6.scatter(Xb[:,0],Xb[:,1],color='b')
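Because the data above is drawn from known Gaussians, one way to sanity-check a histogram-based boundary is to evaluate the log-odds analytically on a grid. The following is a minimal sketch of that alternative, not the histogram method from the question. The helper names gaussian_logpdf and analytic_log_odds, the grid limits, and the grid resolution are all assumptions made for illustration.

```python
import numpy as np

def gaussian_logpdf(X, mean, cov):
    """Log-density of a multivariate normal at each row of X (hypothetical helper)."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.size
    diff = X - mean
    # Squared Mahalanobis distance of each row: diff^T cov^{-1} diff
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def analytic_log_odds(X, mean_a, cov_a, mean_b, cov_b, prior_a=0.5):
    """ln P(a|x) - ln P(b|x); the evidence P(x) cancels in the ratio."""
    prior_b = 1.0 - prior_a
    return (gaussian_logpdf(X, mean_a, cov_a)
            - gaussian_logpdf(X, mean_b, cov_b)
            + np.log(prior_a / prior_b))

# Same class parameters as in the question
mean_a, cov_a = [0, 0], [[2, 0], [0, 2]]
mean_b, cov_b = [1, 2], [[1, 0], [0, 1]]

# Assumed grid limits, chosen to cover most of both clouds
xs = np.linspace(-5, 5, 200)
ys = np.linspace(-5, 6, 200)
XX, YY = np.meshgrid(xs, ys)
grid = np.column_stack([XX.ravel(), YY.ravel()])
Z = analytic_log_odds(grid, mean_a, cov_a, mean_b, cov_b).reshape(XX.shape)
```

Passing these arrays to ax6.contour(XX, YY, Z, levels=[0]) would overlay the analytic boundary on the scatter plot. Since meshgrid already returns arrays indexed as (y, x), no transpose is needed here, unlike with the output of np.histogram2d.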
