Volume 2014, Issue 1 210761
Research Article
Open Access

A Unified Weight Formula for Calculating the Sample Variance from Weighted Successive Differences

Kuo-Hung Lo

Kuo-Hung Lo

Department of Industrial Engineering and Management, Yuan Ze University, Zhongli 32003, Taiwan yzu.edu.tw

Department of Marketing & Logistics Management, Yu Da University of Science and Technology, Miaoli City 36100, Taiwan ydu.edu.tw

Search for more papers by this author
Tien-Lung Sun

Tien-Lung Sun

Department of Industrial Engineering and Management, Yuan Ze University, Zhongli 32003, Taiwan yzu.edu.tw

Search for more papers by this author
Juei-Chao Chen

Corresponding Author

Juei-Chao Chen

Department of Statistics and Information Science, Fu Jen Catholic University, New Taipei City 24205, Taiwan fju.edu.tw

Search for more papers by this author
First published: 27 August 2014
Citations: 1
Academic Editor: Chin-Chia Wu

Abstract

The basic formula to calculate sample variance is based on the sum of squared differences from mean. From computational perspective, mean calculation is nondesired as it can introduce computing errors. Previous researches have proposed to use weighted formula of the successive differences to calculate sample variance to avoid mean calculation. But their weighted formula is not in a unified format in the sense that it has to be represented as two formulas. This paper proposes a unified weight formula for sample variance calculation from weighted successive differences. A proof is provided to show that sample variance calculated using the proposed unified weighted formula is mathematically equivalent to the basic definition.

1. Introduction

Sample variance calculation is a fundamental task in many data analysis applications. The basic formula for calculating a sample variance is based on the sum of squared differences from mean. Given that a set of data is x1,  x2,   … , xn, the sample variance, denoted as , is calculated as follows:
()
where is the sum of squared differences from the mean and is the sample mean. Von Neumann et al. [1] pointed out that (1) does not take into account the order of the observations. They proposed to instead use successive differences of data so that the order can be considered. Specifically, they used
()
where the subscript i refers to temporal order of the data and di = xixi−1. Define {x0, d1, …, dn} as the successive differences of the input data. From computational perspective, von Neumann’s formula is also advantageous as it avoids a mean calculation that may introduce rounding errors.

The problem with is that it is not mathematically equivalent to the basic definition. This problem was independently solved by Eilon and Chowdhury [2] and Joarder [3] where weighted successive differences were used to derive a formula that is mathematically equivalent to the basic definition.

Eilon and Chowdhury [2] considered a job scheduling problem where they wanted to minimize the variance of the job’s waiting time. Let yi be the waiting time of the ith job. By definition, y1 = 0, as the first job does not have waiting time, and , for i = 1,2, …, n, where n is the number of jobs and pj is the processing time of job j. The objective is to minimize the variance of the waiting time, or equivalently . For this purpose, there is a need to quickly update SSn when job i and j are swapped. Notice that when job i and j are swapped, most of the jobs’ waiting time will change accordingly, and and SSn have to be recalculated. To avoid recalculating when updating SSn, Eilon and Chowdhury derived a formula to calculate SSn from successive differences. By definition, the successive differences of the waiting time are the processing time; that is, pi = yi+1yi, i = 1,  2,   … ,  n − 1. So,
()
where
()
Equation (3) is not a general formula for calculating SSn as y1 is zero. Vani and Raghavachari [4] gave a more general formula by considering the job’s completion time rather than waiting time. Let be the completion time of job i. They rewrote (3) as follows:
()
where
()
for i, j = 1,  2,   … ,  n. In an independent work, Joarder [3] also derived a formula similar to (5). He then converted its double sum structure into a quadratic form wherein
()
where is a vector of the successive differences and is a weight matrix with cij as defined in (6).

One problem with the weight formula in (6) is that it is not in a unified format but has to be represented as two formulas. This deficiency prohibits a compact representation that would facilitate further derivations. To solve this problem, we derive a unified weight formula for sample variance calculations from weighted successive differences. Joarder [3] derived an updating formula to calculate a variance from weighted successive differences. But, his formula contains a dynamically increased number of updating items. Using the unified weight formula, we show [5] that we can improve Joarder’s formula by reducing the updating items to a fixed number of only two items.

2. Main Results

Theorem 1. Given that a temporally order of the observations x1,  x2,   … , xn the sum of squared differences about the mean can be represented as

()
where , {di = xixi−1}, for x0 = 0, i = 1, 2, …, n, and are the n × n symmetric matrix with
()

Proof. First write

()
for . Now, x can be presented as
()
for
()
that is, the row i column j element of Pn×n is
()
for i, j = 1, 2, …, n.

Next, the mean of x can be written as

()
In vector form this is
()
where
()
the row i column j element of Qn×n is
()
Now observe that
()
Thus we need to obtain expressions for calculating , , , and .

First

()
where
()
Then
()
where
()
That is,
()
Finally,
()
where
()
We now can see that and, hence,
()
A direct calculation produces as follows:
()
Thus, and the proof is complete.

3. Numerical Example

This section gives a numerical example to illustrate sample variance calculation using the nonunified weight formula cij given in (6) and the unified formula wij given in (9). We take a sample data set x1 = 5, x2 = 14, x3 = 9, and x4 = 6 from Ross [6, Page 145] where the data are used to illustrate the variance updating process using the one-pass algorithm proposed in van Reeken [7]. The successive differences for this data set are
()
and the successive differences vector is . Using the nonunified weight formula, the weight matrix is constructed using two formulas: one for the lower triangular matrix and one for the strictly upper triangular matrix. For the lower triangular matrix with ij, cij = (ni + 1)(j − 1). For example, c11 = (4 − 1 + 1)(1 − 1) = 0, c21 = (4 − 2 + 1)(1 − 1) = 0, c31 = (4 − 3 + 1)(1 − 1) = 0, and c32 = (4 − 3 + 1)(2 − 1) = 2. For the strictly upper triangular matrix with i < j, the weight formula is cij = (nj + 1)(i − 1). For example, c12 = (4 − 2 + 1)(1 − 1) = 0, c13 = (4 − 3 + 1)(1 − 1) = 0, and c23 = (4 − 3 + 1)(2 − 1) = 2. Combining the lower triangular matrix and the strictly upper triangular matrix we can get
()
The variance is then calculated as
()
Now with our approach, the weight matrix is constructed using the unified weight formula given in (9). For example, w12 = (4 + 1)(1 + 2 − 1) − 1 × 2 − (4/2)(1 + 2 + |1 − 2|) = 0 and w21 = (4 + 1)(2 + 1 − 1) − 2 × 1 − (4/2)(2 + 1 + |2 − 1|) = 0. Similarly, w23 and w32 are calculated as w23 = (4 + 1)(2 + 3 − 1) − 2 × 3 − (4/2)(2 + 3 + |2 − 3|) = 2 and w32 = (4 + 1)(3 + 2 − 1) − 3 × 2 − (4/2)(3 + 2 + |3 − 2|) = 2. The other weights are calculated in a similar manner to produce
()
The variance is then calculated as
()

4. Conclusions

Sample variance calculation using weighted successive differences is advantageous from a computational perspective as it avoids a mean calculation which may introduce rounding errors. However, the weight formula that has been proposed in previous research is not in a unified format. Instead, it has to be represented as two formulas. This deficiency prohibits compact representation of further derivations. This paper derives a unified weight formula for calculating a sample variance from weighted successive differences. We have employed this compute formula to improve variance updating formula in Vani and Raghavachari [4] or Joarder [3].

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

    The full text of this article hosted at iucr.org is unavailable due to technical difficulties.