I need to know whether a number, compared against a set of numbers, falls outside 1 stddev of the mean, etc.
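Once the mean and standard deviation are known, the check the question describes is a single comparison. Below is a minimal Java sketch. It assumes the population standard deviation and a multiple `k` chosen by the caller. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class DeviationCheck {
    // Mean of the values.
    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Population standard deviation (divides by n): mean first, then squared deviations.
    static double populationStdDev(double[] values) {
        double m = mean(values);
        double sumSq = 0.0;
        for (double v : values) {
            double d = v - m;
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / values.length);
    }

    // True if x lies strictly more than k standard deviations from the mean.
    static boolean isOutside(double x, double[] values, double k) {
        return Math.abs(x - mean(values)) > k * populationStdDev(values);
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9}; // mean 5, population SD 2
        System.out.println(isOutside(8, data, 1.0)); // true:  |8 - 5| = 3 > 2
        System.out.println(isOutside(6, data, 1.0)); // false: |6 - 5| = 1 <= 2
    }
}
```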
12 Answers
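While the sum-of-squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You can basically end up with a negative variance...
Also, never, ever compute a^2 as pow(a,2); a * a is almost certainly faster.
By far the best way to compute a standard deviation is Welford's method. My C is very rusty, but it could look something like: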
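The negative-variance problem is easy to reproduce. Below is a Java sketch of the one-pass sum-of-squares formula. It squares values as x * x, not Math.pow(x, 2). The class name and the data set are illustrative choices and do not come from the answer. The data is four values, tested once as-is and once shifted by 1e9. Shifting does not change the spread. However, x * x and the running sums are rounded to double precision, and that rounding destroys the small difference the final subtraction should recover:

```java
final class NaiveVarianceDemo {
    // One-pass "sum of squares" sample variance: (sum(x^2) - sum(x)^2 / n) / (n - 1).
    static double naiveSampleVariance(double[] xs) {
        double sum = 0.0;
        double sumSq = 0.0;
        for (double x : xs) {
            sum += x;
            sumSq += x * x; // x * x rather than Math.pow(x, 2)
        }
        int n = xs.length;
        return (sumSq - sum * sum / n) / (n - 1);
    }

    public static void main(String[] args) {
        double[] small = {4, 7, 13, 16};                           // true sample variance: 30
        double[] shifted = {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16}; // same spread, same variance
        System.out.println(naiveSampleVariance(small));   // 30.0
        System.out.println(naiveSampleVariance(shifted)); // a negative number
    }
}
```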
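public static double StandardDeviation(List<double> valueList)
{
    // Welford's method: M is the running mean, S the running sum of squared deviations.
    double M = 0.0;
    double S = 0.0;
    int k = 1;
    foreach (double value in valueList)
    {
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
        k++;
    }
    // k ends at n + 1, so k - 2 == n - 1 (sample standard deviation).
    return Math.Sqrt(S / (k-2));
}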
If you have the whole population (as opposed to a sample), then use return Math.Sqrt(S / (k-1));.
Edit: I've updated the code according to Jason's remarks...
Edit: I've also updated the code according to Alex's remarks...
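The index bookkeeping is the only subtle part. In the C# code above, k starts at 1 and is incremented after each value, so it ends at n + 1. That makes k - 2 equal to n - 1 (sample) and k - 1 equal to n (population). Below is a minimal Java sketch of the same update. It counts from 0 instead, so the divisors read directly as n - 1 and n. The class and method names are hypothetical.

```java
final class WelfordStdDev {
    // Welford's method in one pass. Returns {sampleStdDev, populationStdDev}.
    // Needs at least 2 values for the sample result to be meaningful.
    static double[] stdDevs(double[] values) {
        double mean = 0.0;
        double s = 0.0; // running sum of squared deviations from the mean
        int k = 0;      // number of values seen so far
        for (double x : values) {
            k++;
            double oldMean = mean;
            mean += (x - oldMean) / k;
            s += (x - oldMean) * (x - mean);
        }
        double sample = Math.sqrt(s / (k - 1)); // divide by n - 1
        double population = Math.sqrt(s / k);  // divide by n
        return new double[] {sample, population};
    }

    public static void main(String[] args) {
        // Large offset, sample variance 30: Welford stays accurate here.
        double[] r = stdDevs(new double[] {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16});
        System.out.println("sample = " + r[0] + ", population = " + r[1]);
    }
}
```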
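This solution is 10 times faster than Jaime's, but be aware that, as Jaime pointed out:
"While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You basically may end up with a negative variance"
If you think you are dealing with very large numbers or a very large quantity of numbers, you should calculate using both methods; if the results are equal, you know for sure that you can use "my" method for your case.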
public static double StandardDeviation(double[] data)
{
    double stdDev = 0;
    double sumAll = 0;
    double sumAllQ = 0;
    //Sum of x and sum of x²
    for (int i = 0; i < data.Length; i++)
    {
        double x = data[i];
        sumAll += x;
        sumAllQ += x * x;
    }
    //Mean (not used here)
    //double mean = 0;
    //mean = sumAll / (double)data.Length;
    //Standard deviation
    stdDev = System.Math.Sqrt(
        (sumAllQ -
        (sumAll * sumAll) / data.Length) *
        (1.0d / (data.Length - 1))
    );
    return stdDev;
}
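The advice to "calculate using both methods" can be sketched in Java as below. Two assumptions apply. First, "equal" is taken to mean equal within a relative tolerance that the caller chooses. Second, the reference result comes from a two-pass computation (mean first, then squared deviations), not from any specific answer's code. If the sum-of-squares formula produces a negative variance, its square root is NaN. NaN never compares as equal, so that case counts as a disagreement.

```java
final class CrossCheck {
    // Sum-of-squares sample standard deviation (one pass; fast but prone to cancellation).
    static double sumOfSquaresStdDev(double[] data) {
        double sumAll = 0.0;
        double sumAllQ = 0.0;
        for (double x : data) {
            sumAll += x;
            sumAllQ += x * x;
        }
        int n = data.length;
        return Math.sqrt((sumAllQ - sumAll * sumAll / n) / (n - 1));
    }

    // Two-pass sample standard deviation: compute the mean, then sum squared deviations.
    static double twoPassStdDev(double[] data) {
        double mean = 0.0;
        for (double x : data) mean += x;
        mean /= data.length;
        double ss = 0.0;
        for (double x : data) {
            double d = x - mean;
            ss += d * d;
        }
        return Math.sqrt(ss / (data.length - 1));
    }

    // True if both results agree within relTol, relative to the two-pass result.
    static boolean agree(double[] data, double relTol) {
        double fast = sumOfSquaresStdDev(data);
        double reference = twoPassStdDev(data);
        return Math.abs(fast - reference) <= relTol * Math.abs(reference);
    }

    public static void main(String[] args) {
        System.out.println(agree(new double[] {4, 7, 13, 16}, 1e-9));                   // true
        System.out.println(agree(new double[] {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16}, 1e-9)); // false
    }
}
```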
Jaime's accepted answer is great, except that you need to divide by k-2 in the last line (you need to divide by "number_of_elements-1"). Better still, start k at 0:
public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0;
    double S = 0.0;
    int k = 0;
    foreach (double value in valueList)
    {
        k++;
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
    }
    return Math.Sqrt(S / (k-1));
}
The Math.NET library provides this for you out of the box.
PM> Install-Package MathNet.Numerics
using MathNet.Numerics.Statistics;
var populationStdDev = new List<double> { 1d, 2d, 3d, 4d, 5d }.PopulationStandardDeviation();
var sampleStdDev = new List<double> { 2d, 3d, 4d }.StandardDeviation();
See PopulationStandardDeviation for more information.
You can avoid making two passes over the data by accumulating the mean and the mean-square:
cnt = 0
mean = 0
meansqr = 0
loop over array
    cnt++
    mean += value
    meansqr += value*value
mean /= cnt
meansqr /= cnt
and form
sigma = sqrt(meansqr - mean^2)
A factor of cnt/(cnt-1) applied to the variance (inside the square root) is often appropriate as well.
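By the way, the first pass through the data in Demi's and McWafflestix's answers is hidden in the calls to Average. That kind of thing is certainly trivial on a small list, but if the list exceeds the size of the cache, or even of the working set, it becomes a big deal.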
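Below is a Java sketch of the mean / mean-square accumulation. It applies the cnt/(cnt-1) factor to the variance before taking the square root. It reads the data only once. However, it is algebraically identical to the sum-of-squares formula, so it has the same cancellation problem on data with a large offset. The class and method names are hypothetical.

```java
final class MeanSquareStdDev {
    // One pass: accumulate sum and sum of squares, then divide by the count
    // to get the mean and the mean of squares; sigma^2 = E[x^2] - (E[x])^2.
    static double populationVariance(double[] values) {
        int cnt = 0;
        double mean = 0.0;
        double meanSqr = 0.0;
        for (double value : values) {
            cnt++;
            mean += value;
            meanSqr += value * value;
        }
        mean /= cnt;
        meanSqr /= cnt;
        return meanSqr - mean * mean;
    }

    static double populationStdDev(double[] values) {
        return Math.sqrt(populationVariance(values));
    }

    // Sample version: scale the variance (not the SD) by cnt / (cnt - 1).
    static double sampleStdDev(double[] values) {
        int cnt = values.length;
        return Math.sqrt(populationVariance(values) * cnt / (cnt - 1));
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(populationStdDev(data)); // 2.0
        System.out.println(sampleStdDev(data));     // sqrt(32 / 7)
    }
}
```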
Code snippet:
public static double StandardDeviation(List<double> valueList)
{
    if (valueList.Count < 2) return 0.0;
    double sumOfSquares = 0.0;
    double average = valueList.Average(); // LINQ (using System.Linq), .NET 3.5+
    foreach (double value in valueList)
    {
        double d = value - average;
        sumOfSquares += d * d; // cheaper than Math.Pow(d, 2)
    }
    return Math.Sqrt(sumOfSquares / (valueList.Count - 1));
}
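I found that Rob's helpful answer didn't quite match what I was seeing in Excel. To match Excel, I passed the average of valueList into the StandardDeviation calculation.
Here are my two cents... Obviously you could calculate the moving average (ma) from valueList inside the function, but I happen to have it already before I need the standard deviation.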
public double StandardDeviation(List<double> valueList, double ma)
{
    double xMinusMovAvg = 0.0;
    double Sigma = 0.0;
    int k = valueList.Count;
    foreach (double value in valueList)
    {
        xMinusMovAvg = value - ma;
        Sigma = Sigma + (xMinusMovAvg * xMinusMovAvg);
    }
    // Divides by n - 1: the sample standard deviation, as Excel's STDEV / STDEV.S computes.
    return Math.Sqrt(Sigma / (k - 1));
}
Using extension methods:
using System;
using System.Collections.Generic;

namespace SampleApp
{
    internal class Program
    {
        private static void Main()
        {
            List<double> data = new List<double> { 1, 2, 3, 4, 5, 6 };
            double mean = data.Mean();
            double variance = data.Variance();
            double sd = data.StandardDeviation();
            Console.WriteLine("Mean: {0}, Variance: {1}, SD: {2}", mean, variance, sd);
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }
    }

    public static class MyListExtensions
    {
        public static double Mean(this List<double> values)
        {
            return values.Count == 0 ? 0 : values.Mean(0, values.Count);
        }

        // Mean of values[start..end) (end is exclusive).
        public static double Mean(this List<double> values, int start, int end)
        {
            double s = 0;
            for (int i = start; i < end; i++)
            {
                s += values[i];
            }
            return s / (end - start);
        }

        public static double Variance(this List<double> values)
        {
            return values.Variance(values.Mean(), 0, values.Count);
        }

        public static double Variance(this List<double> values, double mean)
        {
            return values.Variance(mean, 0, values.Count);
        }

        // Population variance of values[start..end): divides by n for every range.
        // Use (n - 1) instead for the sample variance.
        public static double Variance(this List<double> values, double mean, int start, int end)
        {
            double variance = 0;
            for (int i = start; i < end; i++)
            {
                double d = values[i] - mean;
                variance += d * d;
            }
            int n = end - start;
            return variance / n;
        }

        public static double StandardDeviation(this List<double> values)
        {
            return values.Count == 0 ? 0 : values.StandardDeviation(0, values.Count);
        }

        public static double StandardDeviation(this List<double> values, int start, int end)
        {
            double mean = values.Mean(start, end);
            double variance = values.Variance(mean, start, end);
            return Math.Sqrt(variance);
        }
    }
}
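With (start, end) overloads, it is easy to divide by n for one range and by n - 1 for another. Below is a Java sketch that keeps the end index exclusive, as the C# code does. It makes the sample/population choice an explicit parameter. The `sample` flag and the names are assumptions for illustration.

```java
final class RangeStats {
    // Mean of values[start..end) (end exclusive).
    static double mean(double[] values, int start, int end) {
        double s = 0.0;
        for (int i = start; i < end; i++) s += values[i];
        return s / (end - start);
    }

    // Variance of values[start..end): divides by n - 1 if sample is true, else by n.
    static double variance(double[] values, int start, int end, boolean sample) {
        double m = mean(values, start, end);
        double acc = 0.0;
        for (int i = start; i < end; i++) {
            double d = values[i] - m;
            acc += d * d;
        }
        int n = end - start;
        return acc / (sample ? n - 1 : n);
    }

    static double stdDev(double[] values, int start, int end, boolean sample) {
        return Math.sqrt(variance(values, start, end, sample));
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4, 5, 6};
        // Range [1, 5) is {2, 3, 4, 5}: mean 3.5, squared deviations sum to 5.
        System.out.println(variance(data, 1, 5, false)); // 1.25
        System.out.println(variance(data, 1, 5, true));  // 5 / 3
    }
}
```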
/// <summary>
/// Calculates standard deviation, same as MATLAB std(X,0) function
/// <seealso cref="http://www.mathworks.co.uk/help/techdoc/ref/std.html"/>
/// </summary>
/// <param name="values">enumerable data</param>
/// <returns>Standard deviation</returns>
public static double GetStandardDeviation(this IEnumerable<double> values)
{
    //validation
    if (values == null)
        throw new ArgumentNullException("values");
    double sum = 0.0, sum2 = 0.0;
    int length = 0;
    // Enumerate once: Count() plus ElementAt(i) in a loop can be O(n^2) for non-indexed sequences.
    foreach (double item in values)
    {
        sum += item;
        sum2 += item * item;
        length++;
    }
    //avoids division by 0
    if (length == 0 || length == 1)
        return 0;
    return Math.Sqrt((sum2 - sum * sum / length) / (length - 1));
}
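The problem with all the other answers is that they assume you have your data in a big array. If your data is coming in on the fly, this would be a better approach. This class works regardless of how, or whether, you store your data. It also gives you the choice of Welford's method (called "Waldorf" in the class) or the sum-of-squares method. Both methods work in a single pass.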
public final class StatMeasure {
    private StatMeasure() {}

    public interface Stats1D {
        /** Add a value to the population */
        void addValue(double value);

        /** Get the mean of all the added values */
        double getMean();

        /** Get the standard deviation from a sample of the population. */
        double getStDevSample();

        /** Gets the standard deviation for the entire population. */
        double getStDevPopulation();
    }

    // Welford's method: running mean plus running sum of squared deviations (sSum).
    private static class WaldorfPopulation implements Stats1D {
        private double mean = 0.0;
        private double sSum = 0.0;
        private int count = 0;

        @Override
        public void addValue(double value) {
            double tmpMean = mean;
            double delta = value - tmpMean;
            mean += delta / ++count;
            sSum += delta * (value - mean);
        }

        @Override
        public double getMean() { return mean; }

        @Override
        public double getStDevSample() { return Math.sqrt(sSum / (count - 1)); }

        @Override
        public double getStDevPopulation() { return Math.sqrt(sSum / count); }
    }

    private static class StandardPopulation implements Stats1D {
        private double sum = 0.0;
        private double sumOfSquares = 0.0;
        private int count = 0;

        @Override
        public void addValue(double value) {
            sum += value;
            sumOfSquares += value * value;
            count++;
        }

        @Override
        public double getMean() { return sum / count; }

        @Override
        public double getStDevSample() {
            return Math.sqrt((sumOfSquares - ((sum * sum) / count)) / (count - 1));
        }

        @Override
        public double getStDevPopulation() {
            return Math.sqrt((sumOfSquares - ((sum * sum) / count)) / count);
        }
    }

    /**
     * Returns a way to measure a population of data using Welford's method
     * (called "Waldorf" in this class). This method is better if your values
     * are so large, or so numerous, that the sum of x-squared loses precision
     * or overflows. It's also probably faster if you need to recalculate the
     * mean and standard deviation continuously, for example, if you are
     * continually updating a graphic of the data as it flows in.
     *
     * @return A Stats1D object that uses Welford's method.
     */
    public static Stats1D getWaldorfStats() { return new WaldorfPopulation(); }

    /**
     * Return a way to measure the population of data using the sum-of-squares
     * method. This is probably faster than Welford's method, but runs the
     * risk of precision loss or overflow.
     *
     * @return A Stats1D object that uses the sum-of-squares method
     */
    public static Stats1D getSumOfSquaresStats() { return new StandardPopulation(); }
}
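We may be able to use the statistics module in Python. It has stdev() and pstdev(), which compute the sample and the population standard deviation, respectively.
Details: https://www.geeksforgeeks.org/python-statistics-stdev/
import statistics as st
print(st.pstdev(dataframe['column name']))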
This is the population standard deviation:
private double calculateStdDev(List<double> values)
{
    // Average(), Select() and Sum() come from LINQ (using System.Linq).
    double average = values.Average();
    return Math.Sqrt((values.Select(val => (val - average) * (val - average)).Sum()) / values.Count);
}
For the sample standard deviation, just change [values.Count] to [values.Count - 1] in the code above.
Make sure your collection doesn't contain only 1 data point.
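For comparison, here is a Java streams version of the same two-step calculation. Instead of dividing by zero when there is only one data point, it raises an explicit error. The `sample` flag and the choice of exception are assumptions for illustration.

```java
import java.util.List;

final class StreamStdDev {
    // Mean first, then the sum of squared deviations.
    // sample == true divides by (n - 1), otherwise by n.
    static double stdDev(List<Double> values, boolean sample) {
        int n = values.size();
        // A sample SD needs at least 2 points; a population SD needs at least 1.
        if (n < (sample ? 2 : 1)) {
            throw new IllegalArgumentException("not enough data points: " + n);
        }
        double average = values.stream().mapToDouble(Double::doubleValue).average().getAsDouble();
        double sumSq = values.stream()
                .mapToDouble(v -> (v - average) * (v - average))
                .sum();
        return Math.sqrt(sumSq / (sample ? n - 1 : n));
    }

    public static void main(String[] args) {
        List<Double> data = List.of(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0);
        System.out.println(stdDev(data, false)); // 2.0
        System.out.println(stdDev(data, true));  // sqrt(32 / 7)
    }
}
```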