unit-testing - How to test scientific software?

Question

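I firmly believe that software testing is very important, especially in science. However, over the last 6 years I have never come across a scientific software project that was tested regularly (and most of them were not even under version control).

Now I'm wondering how you deal with software tests for scientific code (numerical computations).

From my point of view, standard unit tests often miss the point, because there is no exact result, so using assert(a == b) can be a bit difficult due to "normal" numerical errors.

So I'm looking forward to reading your thoughts on this.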

score 12 · Accepted Answer

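I have just been looking at a similar issue (google: "testing scientific software") and came up with a few papers that may be of interest. They cover both mundane coding errors and the bigger question of knowing whether the result is even right (depth of the Earth's mantle?)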

http://http.icsi.berkeley.edu/ftp/pub/speech/papers/wikipapers/cox_harris_testing_numerical_software.pdf

http://www.cs.ua.edu/~SECSE09/Presentations/09_Hook.pdf (broken link; the new link is http://www.se4science.org/workshops/secse09/Presentations/09_Hook.pdf)

http://www.associationforsoftwaretesting.org/?dl_name=DianeKellyRebeccaSanders_TheChallengeOfTestingScientificSoftware_paper.pdf

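I thought the idea of mutation testing described in 09_Hook.pdf (see also matmute.sourceforge.net) was particularly interesting, as it mimics the simple mistakes we all make. The hardest part is learning to use statistical analysis to obtain confidence levels, rather than relying on a single-pass code review (by man or machine).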
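To make the idea of mutation testing concrete, here is a minimal hand-rolled sketch. It is not the tool or procedure from the linked presentation; the function names (trapezoid, trapezoid_mutant, weak_suite, strong_suite) and the chosen mutation are assumptions made up for illustration. A "mutant" is a copy of the code with one small, typical mistake injected, and a test suite is judged by whether it fails on (kills) the mutant.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

using Func = std::function<double(double)>;

// Composite trapezoidal rule on [a, b] with n equal intervals.
double trapezoid(const Func& f, double a, double b, int n) {
    assert(n > 0);
    const double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));
    for (int i = 1; i < n; ++i) sum += f(a + i * h);
    return sum * h;
}

// Mutant: identical except for one injected off-by-one error
// in the loop condition (i <= n instead of i < n).
double trapezoid_mutant(const Func& f, double a, double b, int n) {
    assert(n > 0);
    const double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));
    for (int i = 1; i <= n; ++i) sum += f(a + i * h);
    return sum * h;
}

using Integrator = double (*)(const Func&, double, double, int);

// Weak suite: integrating the zero function gives 0 for both versions,
// so this suite cannot tell the mutant from the original.
bool weak_suite(Integrator integrate) {
    const double r = integrate([](double) { return 0.0; }, 0.0, 1.0, 10);
    return std::fabs(r) < 1e-12;
}

// Stronger suite: the integral of x over [0, 1] is 0.5, and the
// trapezoidal rule is exact for linear functions up to rounding.
bool strong_suite(Integrator integrate) {
    const double r = integrate([](double x) { return x; }, 0.0, 1.0, 10);
    return std::fabs(r - 0.5) < 1e-12;
}
```

If a mutant passes every test, as it does with weak_suite here, that is a hint the suite is too weak to notice that kind of mistake; strong_suite fails on the mutant and therefore kills it.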

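The problem is not new. I'm sure I have an original copy of "How accurate is scientific software?" by Hatton et al., October 1994, which even then showed how different implementations of the same theories (as algorithms) diverged rather rapidly (it is also reference 8 in the Kelly & Sanders paper).

--- (October 2019) More recently: Testing Scientific Software: A Systematic Literature Review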

score 11 · Accepted Answer

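I am also in academia, and I have written quantum mechanical simulation programs to be executed on our cluster. I made the same observation regarding testing and even version control. My case was even worse: I use a C++ library for my simulations, and the code I got from others was pure spaghetti code, with no inheritance and not even functions.

I rewrote it and also implemented some unit tests. You are correct that you have to deal with numerical precision, which can differ depending on the architecture you are running on. Nevertheless, unit testing is possible, as long as you take these numerical rounding errors into account. Your result should not depend on the rounding of the numerical values; otherwise you have a different problem, namely the robustness of your algorithm.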
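As a rough illustration of taking rounding errors into account, here is a minimal sketch of a tolerance-based comparison that could stand in for assert(a == b). The helper name approx_equal and its default tolerances are assumptions for illustration only; sensible tolerances depend on the algorithm and on the magnitude of the data.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Returns true if a and b agree within a relative OR an absolute tolerance.
// The relative part handles large magnitudes, the absolute part handles
// values close to zero. The defaults are illustrative, not recommendations.
bool approx_equal(double a, double b,
                  double rel_tol = 1e-9, double abs_tol = 1e-12) {
    assert(rel_tol >= 0.0 && abs_tol >= 0.0);
    if (std::isnan(a) || std::isnan(b)) return false;  // NaN never matches
    if (a == b) return true;                           // covers equal infinities
    if (std::isinf(a) || std::isinf(b)) return false;  // unequal infinities
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= std::max(abs_tol, rel_tol * scale);
}
```

With these defaults, approx_equal(0.1 + 0.2, 0.3) holds, whereas a plain == comparison of the same two values typically does not in IEEE 754 double precision.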

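So, to conclude, I use unit testing for my scientific programs, and it really makes one more confident about the results, especially when it comes to publishing the data in the end.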

score 8 · Accepted Answer

I'm also using cpptest for its TEST_ASSERT_DELTA. I'm writing high-performance numerical programs in computational electromagnetics and I've been happily using it in my C++ programs.

I typically go about testing scientific code the same way as I do with any other kind of code, with only a few retouches, namely:

I always test my numerical codes for cases that make no physical sense and make sure the computation actually stops before producing a result. I learned this the hard way: I had a function that computed some frequency responses, then passed a matrix built from them to another function, which eventually returned its answer as a single vector. The matrix could have been any size depending on how many terminals the signal was applied to, but my function was not checking whether the matrix size was consistent with the number of terminals (2 terminals should have meant a 2 x 2 x n matrix). The code itself was written so as not to depend on that; it didn't care what size the matrices were, since it just had to do some basic matrix operations on them. In the end, the results were perfectly plausible, well within the expected range and, in fact, partially correct: only half of the solution vector was garbled. It took me a while to figure out. If your data looks correct, is assembled in a valid data structure, and the numerical values are good (e.g. no NaNs or negative number of particles), but it doesn't make physical sense, the function has to fail gracefully.

I always test the I/O routines, even if they are just reading a bunch of comma-separated numbers from a test file. When you're writing code that does twisted math, it's always tempting to jump into debugging the part of the code that is so math-heavy that you need a caffeine jolt just to understand the symbols. Days later, you realize you are also adding the ASCII value of \n to your list of points.

When testing for a mathematical relation, I always test it "by the book", and I also learned this by example. I've seen code that was supposed to compare two vectors but only checked for equality of elements and did not check for equality of length.
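The three habits above can be sketched roughly as follows. This is not the code from the story: the ResponseTensor struct and the functions require_consistent, parse_csv_line and vectors_close are hypothetical names invented for illustration, and the storage layout is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container: data holds rows * cols * n_freq values.
struct ResponseTensor {
    std::size_t rows, cols, n_freq;
    std::vector<double> data;
};

// 1. Fail early on input that is well-formed but physically inconsistent:
//    n terminals must come with an n x n x n_freq tensor.
void require_consistent(const ResponseTensor& t, std::size_t n_terminals) {
    if (t.rows != n_terminals || t.cols != n_terminals)
        throw std::invalid_argument("matrix size does not match terminal count");
    if (t.data.size() != t.rows * t.cols * t.n_freq)
        throw std::invalid_argument("data length does not match declared shape");
}

// 2. Parse one line of comma-separated numbers. Whitespace around a number
//    (including a trailing newline) is accepted; anything else is rejected
//    instead of being silently turned into a data point.
std::vector<double> parse_csv_line(const std::string& line) {
    std::vector<double> values;
    std::stringstream fields(line);
    std::string field;
    while (std::getline(fields, field, ',')) {
        std::size_t used = 0;
        const double v = std::stod(field, &used);  // throws if no number found
        while (used < field.size() &&
               std::isspace(static_cast<unsigned char>(field[used])))
            ++used;
        if (used != field.size())
            throw std::invalid_argument("unexpected characters in field: " + field);
        values.push_back(v);
    }
    return values;
}

// 3. Compare vectors "by the book": lengths first, then every element.
bool vectors_close(const std::vector<double>& a,
                   const std::vector<double>& b, double tol) {
    assert(tol >= 0.0);
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (!(std::fabs(a[i] - b[i]) <= tol)) return false;  // NaN fails too
    return true;
}
```

Throwing an exception is only one way to fail gracefully; returning an error code or an optional result would serve the same purpose, as long as the computation stops before it produces a plausible-looking answer.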

score 2 · Accepted Answer

Please take a look at the answers to the SO question "How to use TDD correctly to implement a numerical method?"

Answered 2010-08-06T06:37:27.063
