Volume 2014, Issue 1 210761

Research Article

Open Access

A Unified Weight Formula for Calculating the Sample Variance from Weighted Successive Differences

Kuo-Hung Lo

Department of Industrial Engineering and Management, Yuan Ze University, Zhongli 32003, Taiwan

Department of Marketing & Logistics Management, Yu Da University of Science and Technology, Miaoli City 36100, Taiwan

Tien-Lung Sun

orcid.org/0000-0002-8408-404X

Department of Industrial Engineering and Management, Yuan Ze University, Zhongli 32003, Taiwan

Juei-Chao Chen (Corresponding Author)

orcid.org/0000-0001-5653-8035

Department of Statistics and Information Science, Fu Jen Catholic University, New Taipei City 24205, Taiwan


First published: 27 August 2014

https://doi.org/10.1155/2014/210761


Academic Editor: Chin-Chia Wu


Abstract

The basic formula for calculating the sample variance is based on the sum of squared differences from the mean. From a computational perspective, calculating the mean is undesirable because it can introduce rounding errors. Previous research has proposed a weighted formula of the successive differences that calculates the sample variance without computing the mean. However, that weighted formula is not in a unified format: it has to be represented as two formulas. This paper proposes a unified weight formula for calculating the sample variance from weighted successive differences. A proof shows that the sample variance calculated with the proposed unified weight formula is mathematically equivalent to the basic definition.

1. Introduction

Sample variance calculation is a fundamental task in many data analysis applications. The basic formula for calculating a sample variance is based on the sum of squared differences from the mean. Given data x₁, x₂, …, x_n, the sample variance, denoted s², is calculated as follows:

$$s^2=\frac{SS_n}{n-1}, \tag{1}$$

where

$$SS_n=\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2$$

is the sum of squared differences from the mean and

$$\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i$$

is the sample mean. Von Neumann et al. [1] pointed out that (1) does not take the order of the observations into account. They proposed instead to use successive differences of the data so that the order is considered. Specifically, they used

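$$\delta^2=\frac{1}{n-1}\sum_{i=2}^{n}d_i^2=\frac{1}{n-1}\sum_{i=2}^{n}\left(x_i-x_{i-1}\right)^2, \tag{2}$$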

where the subscript i refers to the temporal order of the data and d_i = x_i − x_{i−1}. Setting x₀ = 0, so that d₁ = x₁, we call {d₁, d₂, …, d_n} the successive differences of the input data. From a computational perspective, von Neumann's formula is also advantageous because it avoids a mean calculation that may introduce rounding errors.

The problem with δ² is that it is not mathematically equivalent to the basic definition (1): for independent observations with common variance σ², δ² estimates 2σ² rather than σ². This problem was solved independently by Eilon and Chowdhury [2] and by Joarder [3], who used weighted successive differences to derive a formula that is mathematically equivalent to the basic definition.
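As a quick check of this point, the short Python sketch below evaluates (1) by the usual two-pass route (mean first, then squared deviations) and evaluates the von Neumann statistic (2), both on the data set used in Section 3. The function names are our own illustrative choices and do not come from [1].

```python
from typing import Sequence


def sample_variance_two_pass(x: Sequence[float]) -> float:
    """Basic definition (1): s^2 = SS_n / (n - 1), with SS_n taken about the mean."""
    n = len(x)
    mean = sum(x) / n
    ss = sum((xi - mean) ** 2 for xi in x)
    return ss / (n - 1)


def von_neumann_delta2(x: Sequence[float]) -> float:
    """Formula (2): mean square successive difference over d_i = x_i - x_{i-1}, i = 2..n."""
    n = len(x)  # requires n >= 2
    return sum((x[i] - x[i - 1]) ** 2 for i in range(1, n)) / (n - 1)


data = [5, 14, 9, 6]
print(sample_variance_two_pass(data))  # about 16.33 (exactly 49/3)
print(von_neumann_delta2(data))        # about 38.33 (exactly 115/3)
```

The two results differ, which is the sense in which δ² is not equivalent to (1).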

Eilon and Chowdhury [2] considered a job scheduling problem in which they wanted to minimize the variance of the jobs' waiting times. Let y_i be the waiting time of the ith job. By definition, y₁ = 0, because the first job does not wait, and

$$y_i=\sum_{j=1}^{i-1}p_j, \qquad i=1,2,\ldots,n,$$

where n is the number of jobs, p_j is the processing time of job j, and the empty sum gives y₁ = 0. The objective is to minimize the variance of the waiting times or, equivalently,

$$SS_n=\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2, \qquad \bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i.$$

For this purpose, SS_n must be updated quickly when jobs i and j are swapped. When two jobs are swapped, the waiting times of many jobs change, so ȳ and SS_n have to be recalculated. To avoid recalculating ȳ when updating SS_n, Eilon and Chowdhury derived a formula that calculates SS_n from successive differences. By definition, the successive differences of the waiting times are the processing times; that is, p_i = y_{i+1} − y_i, i = 1, 2, …, n − 1. So,

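$$SS_n=\frac{1}{n}\sum_{i=1}^{n-1}\sum_{j=1}^{n-1}e_{ij}\,p_i\,p_j, \tag{3}$$

where

$$e_{ij}=\begin{cases} i\,(n-j), & i\le j,\\ j\,(n-i), & i>j. \end{cases} \tag{4}$$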
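The following Python sketch checks the Eilon and Chowdhury idea on a small case. It computes SS_n of the waiting times directly and then computes it again from the processing times alone, with no waiting-time mean. It assumes weights e_ij = min(i, j)(n − max(i, j)) for i, j = 1, …, n − 1 and an overall factor 1/n. This notation and the placement of the factor are our assumptions, not a quotation of [2]. The processing times are hypothetical values chosen for illustration.

```python
from fractions import Fraction
from typing import List


def waiting_times(p: List[int]) -> List[int]:
    """y_1 = 0 and y_i = p_1 + ... + p_{i-1}: each job waits for all earlier jobs."""
    y, total = [], 0
    for pj in p:
        y.append(total)
        total += pj
    return y


def ss_direct(y: List[int]) -> Fraction:
    """SS_n = sum (y_i - ybar)^2, computed exactly with fractions."""
    n = len(y)
    ybar = Fraction(sum(y), n)
    return sum((yi - ybar) ** 2 for yi in y)


def ss_from_processing_times(p: List[int]) -> Fraction:
    """Assumed form of (3)-(4): SS_n = (1/n) sum_{i,j=1}^{n-1} e_ij p_i p_j."""
    n = len(p)  # n jobs; only p_1..p_{n-1} affect the waiting times
    total = 0
    for i in range(1, n):
        for j in range(1, n):
            e_ij = i * (n - j) if i <= j else j * (n - i)
            total += e_ij * p[i - 1] * p[j - 1]
    return Fraction(total, n)


p = [3, 1, 4, 1, 5]  # hypothetical processing times for five jobs
print(ss_from_processing_times(p))  # 274/5
assert ss_from_processing_times(p) == ss_direct(waiting_times(p))
```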

Equation (3) is not a general formula for calculating SS_n because it assumes y₁ = 0. Vani and Raghavachari [4] gave a more general formula by considering the jobs' completion times rather than their waiting times. Let

$$t_i=\sum_{j=1}^{i}p_j$$

be the completion time of job i. They rewrote (3) as follows:

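$$SS_n=\sum_{i=1}^{n}\left(t_i-\bar{t}\right)^2=\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}c_{ij}\,p_i\,p_j, \tag{5}$$

where

$$c_{ij}=\begin{cases}(n-i+1)(j-1), & i\ge j,\\ (n-j+1)(i-1), & i<j,\end{cases} \tag{6}$$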

for i, j = 1, 2, …, n. In independent work, Joarder [3] also derived a formula similar to (5). He then converted its double-sum structure into a quadratic form:

$$SS_n=\frac{1}{n}\,\mathbf{d}^{\top}C_{n\times n}\,\mathbf{d}, \tag{7}$$

where d = (d₁, d₂, …, d_n)ᵀ is the vector of successive differences and C_{n×n} = [c_ij] is a weight matrix with c_ij as defined in (6).
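The following Python sketch evaluates the quadratic form (7) as a double sum. It takes the two-branch weights c_ij from (6), in the form spelled out in Section 3, and uses d_i = x_i − x_{i−1} with x₀ = 0. The 1/n factor is our reading of (7); with it, the integer weights recover SS_4 = 49 for the Section 3 data. The helper names are hypothetical.

```python
from fractions import Fraction
from typing import List


def successive_differences(x: List[int]) -> List[int]:
    """d_i = x_i - x_{i-1} for i = 1..n, with x_0 = 0 (so d_1 = x_1)."""
    return [xi - prev for prev, xi in zip([0] + x[:-1], x)]


def weight_c(i: int, j: int, n: int) -> int:
    """Two-branch weight c_ij of (6); i and j are 1-based."""
    if i >= j:
        return (n - i + 1) * (j - 1)
    return (n - j + 1) * (i - 1)


def ss_quadratic_form(x: List[int]) -> Fraction:
    """SS_n = (1/n) d' C d, evaluated as a double sum over the weights c_ij."""
    n = len(x)
    d = successive_differences(x)
    total = sum(
        weight_c(i, j, n) * d[i - 1] * d[j - 1]
        for i in range(1, n + 1)
        for j in range(1, n + 1)
    )
    return Fraction(total, n)


x = [5, 14, 9, 6]
print(successive_differences(x))  # [5, 9, -5, -3]
print(ss_quadratic_form(x))       # 49
```

The first row and column of C are zero, so d₁ = x₁ never contributes. This matches the fact that shifting every observation by a constant leaves SS_n unchanged.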

One problem with the weight formula in (6) is that it is not in a unified format but has to be represented as two formulas. This deficiency precludes a compact representation that would facilitate further derivations. To solve this problem, we derive a unified weight formula for calculating the sample variance from weighted successive differences. Joarder [3] also derived an updating formula that calculates a variance from weighted successive differences; however, the number of update terms in his formula increases as observations are added. In a companion manuscript [5], we use the unified weight formula to improve Joarder's formula, reducing the update to a fixed number of only two terms.

2. Main Results

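Theorem 1. Given temporally ordered observations x₁, x₂, …, x_n, the sum of squared differences about the mean can be represented as

$$SS_n=\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2=\frac{1}{n}\,\mathbf{d}^{\top}W_{n\times n}\,\mathbf{d}, \tag{8}$$

where d = (d₁, d₂, …, d_n)ᵀ with d_i = x_i − x_{i−1} for i = 1, 2, …, n and x₀ = 0, and W_{n×n} = [w_ij] is the n × n symmetric matrix with

$$w_{ij}=(n+1)(i+j-1)-ij-\frac{n}{2}\left(i+j+|i-j|\right), \qquad i,j=1,2,\ldots,n. \tag{9}$$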
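The unified weight (9) can be computed by a single expression. Because i + j + |i − j| = 2 max(i, j), its last term equals n · max(i, j), so every w_ij is an integer. The following Python sketch evaluates (9) literally and compares it entry by entry with the two-branch weight c_ij of (6) for small n. The function names are hypothetical, and the check is only a numerical spot-check. The general statement is what the proof establishes.

```python
from typing import List


def weight_w(i: int, j: int, n: int) -> int:
    """Unified weight (9): w_ij = (n+1)(i+j-1) - ij - (n/2)(i+j+|i-j|), 1-based i, j.

    i + j + |i - j| equals 2 * max(i, j), so n * (i + j + |i - j|) is even and
    the floor division by 2 is exact; the weight stays an integer.
    """
    return (n + 1) * (i + j - 1) - i * j - n * (i + j + abs(i - j)) // 2


def weight_c(i: int, j: int, n: int) -> int:
    """Two-branch weight (6), kept only for comparison."""
    return (n - i + 1) * (j - 1) if i >= j else (n - j + 1) * (i - 1)


def weight_matrix(n: int) -> List[List[int]]:
    """W_{n x n} = [w_ij], built from the single formula (9)."""
    return [[weight_w(i, j, n) for j in range(1, n + 1)] for i in range(1, n + 1)]


# Entry-by-entry agreement of (9) with (6) for n = 1, ..., 29.
for n in range(1, 30):
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            assert weight_w(i, j, n) == weight_c(i, j, n)

print(weight_matrix(4))  # [[0, 0, 0, 0], [0, 3, 2, 1], [0, 2, 4, 2], [0, 1, 2, 3]]
```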

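Proof. First write

$$x_i=x_0+\sum_{j=1}^{i}d_j=\sum_{j=1}^{i}d_j \tag{10}$$

for i = 1, 2, …, n. Now, x = (x₁, x₂, …, x_n)ᵀ can be represented as

$$\mathbf{x}=P_{n\times n}\,\mathbf{d} \tag{11}$$

for

$$P_{n\times n}=\begin{pmatrix}1&0&\cdots&0\\1&1&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\1&1&\cdots&1\end{pmatrix}; \tag{12}$$

that is, the row i column j element of P_{n×n} is

$$p_{ij}=\begin{cases}1, & i\ge j,\\ 0, & i<j,\end{cases} \tag{13}$$

for i, j = 1, 2, …, n.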
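Multiplying by the lower triangular all-ones matrix P_{n×n} of (12) and (13) is just a running sum. The following Python sketch builds P for the Section 3 differences and confirms that P d recovers the original data. It uses plain lists, and the helper names are our own.

```python
from itertools import accumulate
from typing import List


def lower_ones(n: int) -> List[List[int]]:
    """P_{n x n} of (12)-(13): p_ij = 1 if i >= j, else 0 (0-based indices here)."""
    return [[1 if i >= j else 0 for j in range(n)] for i in range(n)]


def mat_vec(m: List[List[int]], v: List[int]) -> List[int]:
    """Plain matrix-vector product, row by row."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


d = [5, 9, -5, -3]                    # successive differences of Section 3, x_0 = 0
x = mat_vec(lower_ones(len(d)), d)    # x = P d, as in (11)
print(x)                              # [5, 14, 9, 6]
assert x == list(accumulate(d))       # P d is just a running sum of d
```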

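Next, the mean of x can be written as

$$\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i=\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{i}d_j=\frac{1}{n}\sum_{j=1}^{n}(n-j+1)\,d_j. \tag{14}$$

In vector form this is

$$\bar{x}\,\mathbf{1}_n=Q_{n\times n}\,\mathbf{d}, \tag{15}$$

where 1_n is the n × 1 vector of ones and

$$Q_{n\times n}=\frac{1}{n}\begin{pmatrix}n&n-1&\cdots&1\\n&n-1&\cdots&1\\\vdots&\vdots&&\vdots\\n&n-1&\cdots&1\end{pmatrix}; \tag{16}$$

the row i column j element of Q_{n×n} is

$$q_{ij}=\frac{n-j+1}{n}. \tag{17}$$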
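Equation (14) says that d_j enters the n − j + 1 observations x_j, …, x_n, so the mean follows from the differences alone. The following Python sketch builds Q_{n×n} of (16) and (17) with exact fractions and applies it to the Section 3 differences. Every entry of Q d equals x̄, as (15) states. The helper names are hypothetical.

```python
from fractions import Fraction
from typing import List


def q_matrix(n: int) -> List[List[Fraction]]:
    """Q_{n x n} of (16)-(17): every row is ((n - j + 1) / n) for j = 1..n."""
    return [[Fraction(n - j + 1, n) for j in range(1, n + 1)] for _ in range(n)]


def mean_from_differences(d: List[int]) -> Fraction:
    """(14): xbar = (1/n) sum_j (n - j + 1) d_j."""
    n = len(d)
    return sum(Fraction(n - j + 1, n) * d[j - 1] for j in range(1, n + 1))


d = [5, 9, -5, -3]  # successive differences of x = (5, 14, 9, 6), x_0 = 0
qd = [sum(q * dj for q, dj in zip(row, d)) for row in q_matrix(len(d))]
print(mean_from_differences(d))  # 17/2
print(qd)  # four copies of Fraction(17, 2), i.e. xbar * 1_n
```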

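Now observe that, writing P and Q for P_{n×n} and Q_{n×n},

$$SS_n=\left(\mathbf{x}-\bar{x}\mathbf{1}_n\right)^{\top}\left(\mathbf{x}-\bar{x}\mathbf{1}_n\right)=\mathbf{d}^{\top}\left(P^{\top}P-P^{\top}Q-Q^{\top}P+Q^{\top}Q\right)\mathbf{d}. \tag{18}$$

Thus we need to obtain expressions for calculating PᵀP, PᵀQ (and hence QᵀP = (PᵀQ)ᵀ), and QᵀQ.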

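First,

$$P^{\top}P=A_{n\times n}=[a_{ij}], \tag{19}$$

where

$$a_{ij}=\sum_{k=1}^{n}p_{ki}\,p_{kj}=\sum_{k=\max(i,j)}^{n}1=n-\max(i,j)+1. \tag{20}$$

Then

$$P^{\top}Q=B_{n\times n}=[b_{ij}], \tag{21}$$

where

$$b_{ij}=\sum_{k=1}^{n}p_{ki}\,q_{kj}=\sum_{k=i}^{n}\frac{n-j+1}{n}=\frac{(n-i+1)(n-j+1)}{n}. \tag{22}$$

Since b_ij = b_ji, B_{n×n} is symmetric; that is,

$$Q^{\top}P=\left(P^{\top}Q\right)^{\top}=B_{n\times n}^{\top}=B_{n\times n}. \tag{23}$$

Finally,

$$Q^{\top}Q=G_{n\times n}=[g_{ij}], \tag{24}$$

where

$$g_{ij}=\sum_{k=1}^{n}q_{ki}\,q_{kj}=n\cdot\frac{n-i+1}{n}\cdot\frac{n-j+1}{n}=\frac{(n-i+1)(n-j+1)}{n}. \tag{25}$$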

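We now can see that G_{n×n} = B_{n×n} and, hence,

$$SS_n=\mathbf{d}^{\top}\left(A_{n\times n}-B_{n\times n}-B_{n\times n}+B_{n\times n}\right)\mathbf{d}=\mathbf{d}^{\top}\left(A_{n\times n}-B_{n\times n}\right)\mathbf{d}. \tag{26}$$

A direct calculation, using i + j + |i − j| = 2 max(i, j), produces the following:

$$n\left(a_{ij}-b_{ij}\right)=n\left(n-\max(i,j)+1\right)-(n-i+1)(n-j+1)=(n+1)(i+j-1)-ij-\frac{n}{2}\left(i+j+|i-j|\right)=w_{ij}. \tag{27}$$

Thus, n(A_{n×n} − B_{n×n}) = W_{n×n}, so SS_n = (1/n) dᵀW_{n×n}d, and the proof is complete.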
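The matrix identities used in the proof can be spot-checked with exact rational arithmetic. The following Python sketch forms P and Q from (13) and (17) and computes PᵀP, PᵀQ, and QᵀQ by plain matrix multiplication. It then confirms that QᵀQ = PᵀQ and that n(PᵀP − PᵀQ) reproduces w_ij of (9) for n = 1, …, 11. The helper names are our own, and a finite check of this kind illustrates the algebra but does not replace the proof.

```python
from fractions import Fraction
from typing import List

Matrix = List[List[Fraction]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def check_identities(n: int) -> bool:
    p = [[Fraction(1 if i >= j else 0) for j in range(1, n + 1)] for i in range(1, n + 1)]
    q = [[Fraction(n - j + 1, n) for j in range(1, n + 1)] for _ in range(n)]
    ptp = matmul(transpose(p), p)  # A = P'P, entries n - max(i, j) + 1
    ptq = matmul(transpose(p), q)  # B = P'Q, entries (n-i+1)(n-j+1)/n
    qtq = matmul(transpose(q), q)  # Q'Q, which should equal B
    w = [[(n + 1) * (i + j - 1) - i * j - n * (i + j + abs(i - j)) // 2
          for j in range(1, n + 1)] for i in range(1, n + 1)]
    same_b = qtq == ptq
    a_minus_b = all(n * (ptp[i][j] - ptq[i][j]) == w[i][j]
                    for i in range(n) for j in range(n))
    return same_b and a_minus_b


print(all(check_identities(n) for n in range(1, 12)))  # True
```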

3. Numerical Example

This section gives a numerical example of sample variance calculation with the nonunified weight formula c_ij given in (6) and with the unified formula w_ij given in (9). We take the sample data set x₁ = 5, x₂ = 14, x₃ = 9, and x₄ = 6 from Ross [6, Page 145], where the data illustrate the variance updating process of the one-pass algorithm proposed in van Reeken [7]. The successive differences for this data set are

$$d_1=x_1-x_0=5,\quad d_2=x_2-x_1=9,\quad d_3=x_3-x_2=-5,\quad d_4=x_4-x_3=-3, \tag{28}$$

and the successive differences vector is d = (5, 9, −5, −3)ᵀ. Using the nonunified weight formula, the weight matrix C_{4×4} is constructed with two formulas: one for the lower triangular part (including the diagonal) and one for the strictly upper triangular part. For the lower triangular part with i ≥ j, c_ij = (n − i + 1)(j − 1). For example, c₁₁ = (4 − 1 + 1)(1 − 1) = 0, c₂₁ = (4 − 2 + 1)(1 − 1) = 0, c₃₁ = (4 − 3 + 1)(1 − 1) = 0, and c₃₂ = (4 − 3 + 1)(2 − 1) = 2. For the strictly upper triangular part with i < j, the weight formula is c_ij = (n − j + 1)(i − 1). For example, c₁₂ = (4 − 2 + 1)(1 − 1) = 0, c₁₃ = (4 − 3 + 1)(1 − 1) = 0, and c₂₃ = (4 − 3 + 1)(2 − 1) = 2. Combining the lower triangular part and the strictly upper triangular part, we get

$$C_{4\times 4}=\begin{pmatrix}0&0&0&0\\0&3&2&1\\0&2&4&2\\0&1&2&3\end{pmatrix}. \tag{29}$$

The variance is then calculated as

$$s^2=\frac{\mathbf{d}^{\top}C_{4\times 4}\,\mathbf{d}}{n(n-1)}=\frac{196}{4\times 3}=\frac{49}{3}\approx 16.33. \tag{30}$$

Now with our approach, the weight matrix W_{4×4} is constructed using the unified weight formula given in (9). For example, w₁₂ = (4 + 1)(1 + 2 − 1) − 1 × 2 − (4/2)(1 + 2 + |1 − 2|) = 0 and w₂₁ = (4 + 1)(2 + 1 − 1) − 2 × 1 − (4/2)(2 + 1 + |2 − 1|) = 0. Similarly, w₂₃ and w₃₂ are calculated as w₂₃ = (4 + 1)(2 + 3 − 1) − 2 × 3 − (4/2)(2 + 3 + |2 − 3|) = 2 and w₃₂ = (4 + 1)(3 + 2 − 1) − 3 × 2 − (4/2)(3 + 2 + |3 − 2|) = 2. The other weights are calculated in a similar manner to produce

$$W_{4\times 4}=\begin{pmatrix}0&0&0&0\\0&3&2&1\\0&2&4&2\\0&1&2&3\end{pmatrix}. \tag{31}$$

The variance is then calculated as

$$s^2=\frac{\mathbf{d}^{\top}W_{4\times 4}\,\mathbf{d}}{n(n-1)}=\frac{196}{4\times 3}=\frac{49}{3}\approx 16.33. \tag{32}$$
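The following Python sketch reproduces the whole numerical example end to end. It forms the differences with x₀ = 0, evaluates dᵀWd with the unified weights (9), and divides by n(n − 1). The function name is hypothetical. The result, 49/3, agrees with the basic definition (1), since x̄ = 8.5 and SS₄ = 49. Evaluating the double sum costs O(n²) operations, compared with O(n) for the two-pass formula. The sketch is therefore meant to confirm the identity, not to serve as an efficient variance routine.

```python
from fractions import Fraction
from typing import List


def variance_unified(x: List[int]) -> Fraction:
    """s^2 = d' W d / (n (n - 1)), with d_i = x_i - x_{i-1} (x_0 = 0) and w_ij from (9).

    Requires n >= 2. No sample mean is computed anywhere.
    """
    n = len(x)
    d = [xi - prev for prev, xi in zip([0] + x[:-1], x)]
    quad = 0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            w_ij = (n + 1) * (i + j - 1) - i * j - n * (i + j + abs(i - j)) // 2
            quad += w_ij * d[i - 1] * d[j - 1]
    return Fraction(quad, n * (n - 1))


x = [5, 14, 9, 6]
print(variance_unified(x))  # 49/3
```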

4. Conclusions

Sample variance calculation using weighted successive differences is advantageous from a computational perspective because it avoids a mean calculation, which may introduce rounding errors. However, the weight formula proposed in previous research is not in a unified format; instead, it has to be represented as two formulas. This deficiency precludes a compact representation for further derivations. This paper derives a unified weight formula for calculating a sample variance from weighted successive differences. In related work [5], we have employed this unified formula to improve the variance updating formula of Vani and Raghavachari [4] and Joarder [3].

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

1 von Neumann J., Kent R. H., Bellinson H. R., and Hart B. I., The mean square successive difference, The Annals of Mathematical Statistics. (1941) 12, 153–162, https://doi.org/10.1214/aoms/1177731746, MR0004436.
2 Eilon S. and Chowdhury I. G., Minimising waiting time variance in the single machine problem, Management Science. (1977) 23, no. 6, 567–575, https://doi.org/10.1287/mnsc.23.6.567, MR0443962.
3 Joarder A. H., Sample variance and first-order differences of observations, The Mathematical Scientist. (2003) 28, no. 2, 129–133, MR2030029.
4 Vani V. and Raghavachari M., Deterministic and random single machine sequencing with variance minimization, Operations Research. (1987) 35, no. 1, 111–120, https://doi.org/10.1287/opre.35.1.111, MR908865.
5 Sun T. L., Lo K. H., and Chen J. C., An updating formula to calculate sample variance from weighted successive differences, manuscript.
6 Ross S. M., Introduction to Probability and Statistics for Engineers and Scientists, 1987, Wiley, New York, NY, USA.
7 van Reeken A. J., Dealing with Neely's algorithms, Communications of the ACM. (1968) 11, no. 3, 149–150, https://doi.org/10.1145/362929.362961.
