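I want to port our algorithm to the Hexagon DSP using HVX intrinsics, but I can't figure out how to use them. I also have a second problem: I rewrote a loop with the 64-bit vector intrinsics, but when I profile both versions, the plain C code takes fewer cycles than the intrinsics version. I am using the Hexagon timer API to count the cycles. Here is the code.

C code (5452 cycles consumed):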
for (i = 0; i <= 128; i++) {
    value[i] = (hs_int32)((hs_int32)(hs_int16)32767 * (hs_int32)(hs_int16)(window[i] >> 15))
             + (hs_int32)(((hs_int32)(hs_int16)32767 * (hs_int32)(hs_int16)(window[i] & 0x00007fff)) >> 15);
}
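
For readers trying to follow the expression, here is a minimal sketch of what the loop body appears to compute: a 32-bit by 16-bit multiply done in two 16x16 steps (high part and low 15 bits), followed by a 15-bit rescale of the low product. The typedefs, the helper name `mul32x16_split`, the wrapper `scale_window`, and the assumption that `window` holds 32-bit signed values are mine and not from the original code.

```c
#include <stdint.h>

/* Assumption: hs_int32 and hs_int16 are plain 32- and 16-bit signed integers. */
typedef int32_t hs_int32;
typedef int16_t hs_int16;

/* Hypothetical helper: multiply a 32-bit value w by a 16-bit coefficient c
 * in two 16x16 steps, as the loop body does with c = 32767. */
static hs_int32 mul32x16_split(hs_int32 w, hs_int16 c)
{
    hs_int16 hi = (hs_int16)(w >> 15);        /* upper part of w, truncated to 16 bits */
    hs_int16 lo = (hs_int16)(w & 0x00007fff); /* lower 15 bits, always non-negative */

    hs_int32 hi_prod = (hs_int32)c * (hs_int32)hi;         /* kept at full scale */
    hs_int32 lo_prod = ((hs_int32)c * (hs_int32)lo) >> 15; /* rescaled by 2^15 */
    return hi_prod + lo_prod;
}

/* The same loop written with the helper; n = 129 matches "i <= 128". */
static void scale_window(hs_int32 *value, const hs_int32 *window, int n)
{
    for (int i = 0; i < n; i++) {
        value[i] = mul32x16_split(window[i], (hs_int16)32767);
    }
}
```

As long as `w >> 15` fits in 16 bits, this should give the same result as computing `(w * c) >> 15` in a 64-bit intermediate, which makes it a convenient scalar reference for checking the output of an intrinsics version.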

Hexagon intrinsics (8766 cycles consumed):
for (i = 0, j = 0; i <= 128/2; i++, j++)
{
    Word64 and_op = Q6_P_and_PP(R_E_VECTOR_1[i], dummy);
    shift_1[i+j] = Q6_R_asr_RI(shift_1[i+j], 15);
    shift_1[i+1+j] = Q6_R_asr_RI(shift_1[i+1+j], 15);
    Word64 first_op = Q6_P_vmpyweh_PP_sat(leak2_64, R_E_VECTOR_1[i]);
    out[i] = Q6_P_vmpyweh_PP_sat(leak2_64, and_op);
    shift_2[i+j] = Q6_R_asr_RI(shift_2[i+j], 15);
    shift_2[i+1+j] = Q6_R_asr_RI(shift_2[i+1+j], 15);
    out[i] = Q6_P_vaddw_PP(first_op, out[i]);
}

The C code takes fewer cycles than the version written with Hexagon intrinsics. Can anyone help me understand why and how to fix it?

@Brain cain, here is the disassembly of the intrinsics version:
r1:0 = memd(r30+#-48)
000000000000c400: r2 = memw(r30+#-52)
000000000000c404: r3 = memw(r30+#-20)
000000000000c408: r5:4 = memd(r2+r3<<#3)
000000000000c40c: r1:0 = vmpyweh(r1:0,r5:4):sat
000000000000c410: memd(r30+#-192) = r1:0
194 out[i]=Q6_P_vmpyweh_PP_sat(leak2_64,and_op);
000000000000c414: r1:0 = memd(r30+#-48)
000000000000c418: r5:4 = memd(r30+#-184)
000000000000c41c: r1:0 = vmpyweh(r1:0,r5:4):sat
000000000000c420: r2 = memw(r30+#-84)
000000000000c424: r3 = memw(r30+#-20)
000000000000c428: memd(r2+r3<<#3) = r1:0
195 shift_2[i+j]=Q6_R_asr_RI(shift_2[i+j],15);
000000000000c42c: r2 = memw(r30+#-148)
000000000000c430: r3 = memw(r30+#-20)
000000000000c434: r6 = memw(r30+#-24)
000000000000c438: r3 = add(r3,r6)
000000000000c43c: r6 = memw(r2+r3<<#2)
000000000000c440: r6 = asr(r6,#15)
000000000000c444: memw(r2+r3<<#2) = r6
196 shift_2[i+1+j]=Q6_R_asr_RI(shift_2[i+1+j],15);
000000000000c448: r2 = memw(r30+#-148)
000000000000c44c: r3 = memw(r30+#-20)
000000000000c450: r6 = memw(r30+#-24)
000000000000c454: r3 = add(r3,r6)
mt_cv_mec_power_spectrum_fixed_hexagon:
000000000000c458: r2 = addasl(r2,r3,#2)
000000000000c45c: r3 = memw(r2+#4)
000000000000c460: r3 = asr(r3,#15)
000000000000c464: memw(r2+#4) = r3
197 out[i]=Q6_P_vaddw_PP(first_op,out[i]);
000000000000c468: r1:0 = memd(r30+#-192)
000000000000c46c: r2 = memw(r30+#-84)
000000000000c470: r3 = memw(r30+#-20)
000000000000c474: r5:4 = memd(r2+r3<<#3)
000000000000c478: r1:0 = vaddw(r1:0,r5:4)
000000000000c47c: memd(r2+r3<<#3) = r1:0
}

I am new to DSP programming and am having a lot of trouble understanding the Hexagon DSP. Any help would be much appreciated.