Questions tagged [half-precision-float]

0 votes

1 answer

577 views

intrinsics - Gathering half-float values using AVX

Using AVX/AVX2 intrinsics, I can gather sets of 8 values, either 1-, 2- or 4-byte integers or 4-byte floats, using:

_mm256_i32gather_epi32()

_mm256_i32gather_ps()

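But currently I have a case where I'm loading data that was generated on an NVIDIA GPU and stored as FP16 values. How can I do vectorized loads of these values?

So far, I have found the _mm256_cvtph_ps() intrinsic.

However, the input to that intrinsic is a __m128i value, not a __m256i value.

Looking at the Intel Intrinsics Guide, I see no gather operation that stores 8 values into a __m128i register.

How can I gather FP16 values into the 8 lanes of a __m256 register? Is it possible to load them as a vector of 2-byte shorts into a __m256i and then somehow reduce that to a __m128i value to pass to the conversion intrinsic? If so, I haven't found the intrinsics to do that.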

Update

I tried the cast as suggested by @peter-cordes, but I'm getting bogus results. Also, I don't understand how that could work.

My 2-byte int values are stored in a __m256i as:

0000XXXX 0000XXXX 0000XXXX 0000XXXX 0000XXXX 0000XXXX 0000XXXX 0000XXXX

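So how can I simply cast to a __m128i, where the values need to be tightly packed as

XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX

Will the cast do that?

My current code:

But the result does not seem to be 8 correctly formed values. I think every second one is currently bogus?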

2020-06-16T19:58:20.970
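The cast asked about in the update cannot pack anything. `_mm256_castsi256_si128` only reinterprets the register and keeps its low 128 bits. If the halves were gathered as 32-bit elements with scale 2, each lane holds one wanted FP16 value in its low 16 bits plus 16 bits of whatever follows it in memory. Converting the low 128 bits would then give a wanted value and a junk value alternately, and would drop lanes 4-7. That would match the "every second one is bogus" symptom, although the original code is not shown, so this is an inference.

Below is a minimal sketch that masks, packs and permutes before converting. It is not taken from any particular answer. The function name `gather_fp16_to_float` is hypothetical. The sketch assumes an x86-64 CPU with AVX2 and F16C and a GCC- or Clang-compatible compiler.

```c
#include <stdint.h>
#include <immintrin.h>

/* Hypothetical helper: gather src[idx[0]] .. src[idx[7]] (FP16 bit patterns)
   and widen them to 8 floats in out[].
   Assumptions: x86-64 with AVX2 and F16C, GCC or Clang (the target
   attribute enables those extensions for this function only).
   Caveat: every lane loads 4 bytes starting at its 2-byte element, so src
   must have at least 2 readable bytes after the highest indexed element. */
__attribute__((target("avx2,f16c")))
void gather_fp16_to_float(const uint16_t *src, const int32_t idx[8],
                          float out[8])
{
    __m256i vidx = _mm256_loadu_si256((const __m256i *)idx);

    /* scale = 2: idx counts 16-bit elements. On little-endian x86 the
       wanted half ends up in the low 16 bits of each 32-bit lane; the
       high 16 bits hold whatever follows it in memory. */
    __m256i words = _mm256_i32gather_epi32((const int *)src, vidx, 2);

    /* Clear the neighbouring bytes so the signed-to-unsigned saturating
       pack below leaves every value unchanged. */
    words = _mm256_and_si256(words, _mm256_set1_epi32(0xFFFF));

    /* packus works per 128-bit lane, producing the 64-bit chunks
       [h0-h3, h0-h3, h4-h7, h4-h7]; the permute reorders the chunks to
       [0, 2, 1, 3], which puts h4-h7 right after h0-h3. */
    __m256i packed = _mm256_packus_epi32(words, words);
    packed = _mm256_permute4x64_epi64(packed, _MM_SHUFFLE(3, 1, 2, 0));

    /* The cast only reinterprets: it keeps the low 128 bits (h0..h7)
       and moves no data by itself, hence the packing above. */
    __m128i halves = _mm256_castsi256_si128(packed);

    _mm256_storeu_ps(out, _mm256_cvtph_ps(halves));
}
```

A `_mm256_shuffle_epi8` that moves the low word of each dword into the low 8 bytes of its lane, followed by the same permute, would work too. The mask-and-pack pair was chosen here only because its effect is easy to state.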

0 votes

0 answers

59 views

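deep-learning - Is there an implementation of the Keras Adam optimizer that supports Float16?

I am currently deploying tiny-yolov3 on the OpenVINO toolkit, and for that I need to convert my model to float16. But to do that I need an optimizer that supports FP16. I tried modifying SGD to support fp16, but its accuracy is too low. So I was hoping to use Adam, but it doesn't support FP16, at least not the Keras implementation. I know this can be achieved with tf.keras.mixed_precision, but that requires TF 2.0, and OpenVINO doesn't fully support TF 2.0 yet. So if anyone has run into this problem and can help me solve it, that would be very helpful.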

deep-learning yolo adam half-precision-float

2020-07-15T20:25:11.703

0 votes

1 answer

392 views

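c++ - Converting __fp16 to float fails to link on Clang 9

I need to read a file containing floats stored in binary16 format and convert them to float. Based on https://releases.llvm.org/9.0.0/tools/clang/docs/LanguageExtensions.html#half-precision-floating-point, I read the data into __fp16* fp16_weights_buf and then simply did

This compiles, but fails to link:

Do I need to pass some extra option to make this work?

c++ clang half-precision-float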

2020-09-09T12:55:50.133
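The link error itself is not reproduced above. A common cause, stated here as an assumption, is this: on targets without native half-precision conversion support, the compiler lowers `__fp16`-to-`float` conversions into calls to a runtime helper that is then not linked. One way to sidestep the question entirely is to read the file as raw `uint16_t` values and decode binary16 by hand. The minimal sketch below does that; the name `half_bits_to_float` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode IEEE 754 binary16 bits to float using only
   integer operations and one float multiply, without __fp16. Every binary16
   value is exactly representable as a float, so nothing is rounded. */
float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;  /* sign to bit 31 */
    uint32_t biased = (h >> 10) & 0x1Fu;            /* 5-bit exponent */
    uint32_t mant = h & 0x3FFu;                     /* 10-bit fraction */
    uint32_t bits;

    if (biased == 0x1Fu) {
        /* Inf (mant == 0) or NaN: all-ones float exponent, payload widened. */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (biased != 0) {
        /* Normal: rebias the exponent from 15 to 127, i.e. add 112. */
        bits = sign | ((biased + 112u) << 23) | (mant << 13);
    } else {
        /* Zero or subnormal: value = mant * 2^-24, exact in float. */
        float f = (float)mant * (1.0f / 16777216.0f);
        return sign ? -f : f;
    }

    float f;
    memcpy(&f, &bits, sizeof f);  /* reinterpret the bits without aliasing UB */
    return f;
}
```

Usage would be along the lines of reading the file into a `uint16_t` buffer instead of `__fp16 *` and calling `half_bits_to_float` on each element. This assumes the file's byte order matches the host's.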

0 votes

0 answers

344 views

go - converting Golang float32 to half-precision float (GLSL float16) as uint16

I need to pass some data from Go to a GLSL '300 es' shader. The data consists of two uint16s packed into a uint32, and each uint16 represents a half-precision float (float16). I found some PD Java code that looks like it will do the job, but I am struggling to port the last statement, which uses a couple of zero-extending right shifts (I think the other shifts are fine, i.e. their operands are non-negative). Since Go is a bit clever about sign extension, the correct port is eluding me. I did think the first one could perhaps be changed into a left shift, since it just seems to position a single bit for an addition, but the final shift blows my mind out of the water :)

BTW, I hope I got the bracketing right, since the precedence of '-' and '>>' seems to differ between Go and Java...

I need to go the other way around next, but that is hopefully easier without right shifts... famous last words!

Java code:

https://stackoverflow.com/a/6162687/345165

My partial port:

I found what looked like even more efficient code, with no branching, but understanding how it works well enough to port it is a bit ;) beyond my rusty (not the language) skills.

https://stackoverflow.com/a/5587983

Cheers

[Edit] The code appears to work with the values I am currently using (it's hard to be precise, since I have no experience debugging a shader). So I guess my question is about the correctness of my port, especially the final two shifts.

[Edit2] In the light of day I can see I had already got the precedence wrong in one place, and I have fixed the example above.

changed:

to:

go half-precision-float

2020-09-17T14:24:04.970
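For the port itself, the key point is the shift semantics. Java's `>>>` is a zero-filling shift. In Go, `>>` is zero-filling exactly when the left operand is unsigned, so keeping the bits in a `uint32` (for example from `math.Float32bits`) removes the sign-extension problem.

The sketch below is written in C using only unsigned shifts and full parenthesisation, so it maps to Go almost line by line and Go's different shift precedence does not matter. It is a plain round-to-nearest-even conversion, not a port of either linked Java answer. The names `float_to_half_bits` and `pack_half2` are hypothetical. It also assumes the shader reads the low 16 bits as the first value, as GLSL ES 3.00's `unpackHalf2x16` does.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: float32 -> IEEE 754 binary16 bits, round to nearest
   even. All shifts act on unsigned values, so every right shift is
   zero-filling, like Java's >>> and like Go's >> on a uint32. */
uint16_t float_to_half_bits(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);                 /* raw float bits */
    uint32_t sign = (x >> 16) & 0x8000u;      /* sign moved to bit 15 */
    uint32_t biased = (x >> 23) & 0xFFu;      /* 8-bit biased exponent */
    uint32_t mant = x & 0x7FFFFFu;            /* 23-bit fraction */

    if (biased == 0xFFu)                      /* Inf stays Inf, NaN stays NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x200u : 0u));

    int32_t e = (int32_t)biased - 127 + 15;   /* rebias 127 -> 15 */
    if (e >= 31)                              /* too large: +-Inf */
        return (uint16_t)(sign | 0x7C00u);

    if (e <= 0) {                             /* result is subnormal or zero */
        if (e < -10)                          /* below 2^-25: rounds to zero */
            return (uint16_t)sign;
        mant |= 0x800000u;                    /* restore the implicit 1 */
        uint32_t shift = (uint32_t)(14 - e);  /* 14..24 */
        uint32_t m = mant >> shift;
        uint32_t rem = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (m & 1u)))
            m++;                              /* may carry into the smallest normal */
        return (uint16_t)(sign | m);
    }

    uint32_t h = ((uint32_t)e << 10) | (mant >> 13);  /* top 10 fraction bits */
    uint32_t rem = mant & 0x1FFFu;                    /* 13 dropped bits */
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
        h++;                                  /* a carry may reach Inf, which is correct */
    return (uint16_t)(sign | h);
}

/* Pack two halves into one uint32: lo in bits 0-15, hi in bits 16-31. */
uint32_t pack_half2(float lo, float hi)
{
    return (uint32_t)float_to_half_bits(lo)
         | ((uint32_t)float_to_half_bits(hi) << 16);
}
```

In Go, `math.Float32bits(f)` would take the place of the `memcpy`, and the rest becomes `uint32` arithmetic.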

0 votes

1 answer

269 views

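cuda - FLT_MAX for half floats

I am using CUDA with half floats, or __half as they are called in CUDA.

What is the half-float equivalent of FLT_MAX?

The cuda_fp16.h header does not seem to have a similar macro.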

cuda math.h half-precision-float

2020-11-16T21:03:27.443
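For reference, the half-precision counterpart of FLT_MAX follows directly from the binary16 layout. It has 1 sign bit, a 5-bit exponent with bias 15 (exponent 31 is reserved for Inf/NaN, so 30 is the largest usable one) and a 10-bit fraction, which is all ones at the maximum:

$$
x_{\max} = \left(2 - 2^{-10}\right) \cdot 2^{30-15} = 2^{16} - 2^{5} = 65504
$$

Its bit pattern is 0x7BFF. Assuming the CUDA version at hand offers no ready-made macro, a constant can be built from that value, for example with __float2half(65504.0f) or __ushort_as_half(0x7BFF).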

0 votes

0 answers

72 views

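assembly - How to initialize a 16-bit half float (GAS for ARM32)?

When writing ARM assembly, data directives can be used to initialize values. For example, the following initializes a float:

label: .single 0.0

However, when storage space matters, ARM platforms offer the option of using half-size floats. Yet there does not seem to be a data directive for initializing a half-size float from assembly code.

What is the easiest way to initialize a half-precision float in ARM assembly?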

assembly floating-point arm gnu-assembler half-precision-float

2020-12-18T12:54:42.270
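One workaround, assuming the assembler version at hand has no directive for half-precision literals, is to compute the IEEE 754 binary16 bit pattern by hand and emit it as a plain 16-bit integer, for example with GAS's `.hword` directive. For a normal number (1 ≤ e ≤ 30), the pattern follows from the sign s, the biased exponent e (bias 15) and the 10-bit fraction m:

$$
x = (-1)^{s}\, 2^{e-15}\left(1 + \frac{m}{1024}\right), \qquad \mathrm{bits}(x) = s \cdot 2^{15} + e \cdot 2^{10} + m
$$

A worked example for 1.5:

$$
1.5 = 2^{15-15}\left(1 + \frac{512}{1024}\right) \;\Rightarrow\; \mathrm{bits} = 0 + 15 \cdot 1024 + 512 = 15872 = \mathtt{0x3E00}
$$

So `label: .hword 0x3E00` would reserve the half-precision value 1.5.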

0 votes

1 answer

144 views

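cuda - CUDA half-float arithmetic without explicit intrinsics

I am using CUDA 11.2, and I use the __half type to operate on 16-bit floating-point values.

I'm surprised that the nvcc compiler does not emit a fused multiply-add instruction when I do:

Instead of emitting a fused multiply-add, it emits separate mul and add instructions.

Note that this happens despite the --fmad=true compiler option.

Whereas an explicit __hfma( a,b,c ) emits:

Is using the explicit intrinsic the only way to get a 16-bit floating-point multiply-add?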

cuda intrinsics nvcc fma half-precision-float

2021-01-07T19:46:39.333

0 votes

0 answers

251 views

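c++ - Detecting support for __fp16

Since version 6, clang has supported an __fp16 type. I would like to use it, but I need to support other compilers (both clang-based and non-clang-based) as well as older versions of clang, so I need a reliable way to detect support. Unfortunately, I don't see anything in clang's documentation about how to detect it (with __has_feature, __has_extension, etc.).

Since clang's version-number macros are unreliable, my best solution right now is to use __has_warning("-Wpragma-pack") (the -Wpragma-pack warning was also added in clang 6). I'm hoping there is an fp16 feature, extension or whatever I can check that just isn't documented where I'm looking, but I'm obviously open to other ideas.

So, does anyone know a better way to detect __fp16 support?

c++ c clang half-precision-float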

2021-02-01T04:27:43.120

0 votes

0 answers

104 views

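tensorflow - TensorFlow mixed-precision training: Conv2DBackpropFilter not using TensorCore

I am using the Keras mixed-precision API in order to fit my network on the GPU. In my code this typically looks like the following; an MWE would be:

This seems to have the desired effect. When I train my model and profile it with the TensorBoard callback, most of my ops run in half precision, and some of them use TensorCore (I have a GPU with compute capability 7.0 or above).

However, Conv2DBackpropFilter does not use TensorCore, even though, according to the TensorBoard info, it is eligible to use it.

I don't have a minimal reproducible example of the whole thing yet, and I can work on one if needed. But first I would like to know whether this is expected behavior or whether there is some known pitfall, since I couldn't find anything online.

Edit

I have an MRE that behaves differently but shows the same problem: why is TensorCore not used (all the relevant dimensions are multiples of 8)?

In this MRE, 64.2% of op time is spent in half precision, which means half precision is indeed being used. In my logs, I also checked the compute capability:

However, none of the ops (this time not just Conv2DBackpropFilter) run using TensorCore. I don't understand why.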

tensorflow keras gpu half-precision-float automatic-mixed-precision

2021-02-05T07:47:33.950

0 votes

1 answer

440 views

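python - Why does converting from np.float16 to np.float32 modify the value?

When converting a number from half- to single-precision floating-point representation, I see a change in its numerical value.

Here I have 65500 stored as a half-precision float, but upcasting to single precision changes the underlying value to 65504, which is many floating-point increments away from the target.

Why does this happen in this particular case?

As a side note, I also observed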

python numpy floating-point precision half-precision-float

2021-07-08T00:08:43.083
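The stored value was most likely never 65500 to begin with. Half precision has only 10 fraction bits, so on $[2^{15}, 2^{16})$ consecutive values are $2^{15-10} = 32$ apart:

$$
65472 = 2046 \cdot 32 \;<\; 65500 \;<\; 2047 \cdot 32 = 65504, \qquad 65500 - 65472 = 28 > 16
$$

Storing 65500 as float16 therefore already rounds it to 65504, the largest finite half (0x7BFF), which float32 represents exactly. NumPy prints floating-point scalars with the shortest decimal string that round-trips to the same value of that type. For float16, "65500" is such a string for 65504. Float32 has enough precision that it needs, and prints, 65504. In float16 terms the two numbers are less than one increment apart.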
