Sunday, October 1, 2017

Statistics Formulas

Mean or Average

The mean, or average, is the sum of all the elements of a set divided by the number of elements in the set. The mean can be treated as a collective property of the whole set of values: calculating it gives you a fairly good idea of the typical value in the data. Thus the formula for the mean is:
Mean = Sum of all the set elements / Number of elements
The importance of the mean lies in its ability to summarize the whole dataset with a single value. For example, you may want to compare the average household income of County 1 with that of County 2. You cannot compare each and every household income in one county with every household income in the other. The practical solution is to find the average household income of each county and then compare the two means. From that comparison, we can judge which county is, on average, more prosperous.
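A minimal Python sketch of the mean formula follows. The function name `mean` and the two county income lists are hypothetical, made-up numbers used only to illustrate the comparison described above; Python's standard `statistics.mean` does the same job.

```python
def mean(values):
    """Sum of all the set elements divided by the number of elements."""
    if not values:
        raise ValueError("the mean of an empty set is undefined")
    return sum(values) / len(values)

# Hypothetical household incomes (made-up numbers, for illustration only).
county_1 = [42_000, 55_000, 61_000, 48_000]
county_2 = [38_000, 44_000, 50_000, 40_000]

avg_1 = mean(county_1)  # 51500.0
avg_2 = mean(county_2)  # 43000.0
print("County 1" if avg_1 > avg_2 else "County 2", "has the higher average income")
```

Comparing two numbers is far easier than comparing every household in one list against every household in the other, which is exactly the point of summarizing with the mean.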

Median

Simply put, the median is the middle value of a set. If a set consists of an odd number of elements, the middle value is the median; if it consists of an even number of elements, the median is the average of the two middle values. The median can therefore be used to separate a set of data into a lower half and an upper half.
To find the median of a set, write the elements of the set in increasing order, count the elements, and then pick the middle value (or the average of the two middle values). The median is especially useful when the dataset contains outliers. An outlier is a value that lies very far from the rest of the values in the set. For example, in the set 1, 2, 3, 4, 10000, the value 10000 is an outlier. Outliers can badly distort the mean: the mean of this set is 10010/5 = 2002, while the median is 3. Here the median clearly summarizes the set better than the mean. You can learn more about the various statistics formulas to become well acquainted with the topic.
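The median procedure can be sketched in Python as below. The `median` function name is our own choice, not something from the post. It sorts the values, counts them, handles the odd and even cases separately, and then reproduces the outlier example from the text.

```python
def median(values):
    """Middle value of the sorted set, or the average of the two middle values."""
    if not values:
        raise ValueError("the median of an empty set is undefined")
    ordered = sorted(values)  # write the elements in increasing order
    n = len(ordered)          # count the elements
    mid = n // 2
    if n % 2 == 1:            # odd count: the single middle element
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle elements

data = [1, 2, 3, 4, 10000]      # the outlier example from the text
print(sum(data) / len(data))    # 2002.0 -> the mean is dragged up by 10000
print(median(data))             # 3      -> the median is unaffected
print(median([1, 2, 3, 4]))     # 2.5    -> even count
```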

Mode

The mode of a dataset is the value that occurs most frequently in it. Like the mean and the median, the mode summarizes a set with a single piece of information. For example, the mode of the dataset S = {1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7} is 3, since 3 occurs more times (five) than any other value in S.
An important property of the mode is that it equals the mean and the median in a normal distribution. In skewed or other non-normal distributions, the mode may differ from the other two. In a normal distribution the data are symmetric about a central value: the normal distribution curve is bell-shaped and symmetric about a vertical axis through the mean. Another important property of normal distributions is that half of the values are larger than the mean and half are smaller.
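Here is a small Python sketch for finding the mode, assuming a hypothetical helper `modes` built on `collections.Counter`. The post does not say what happens when two values tie for the highest frequency, so this sketch makes a design choice: it returns every tied value as a list, similar to the standard library's `statistics.multimode`.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency (empty list for no data)."""
    counts = Counter(values)          # value -> number of occurrences
    if not counts:
        return []
    highest = max(counts.values())    # the largest frequency in the set
    return [value for value, count in counts.items() if count == highest]

S = [1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7]
print(modes(S))             # [3]   -> 3 occurs five times
print(modes([1, 1, 2, 2]))  # [1, 2] -> a tie, so both values are returned
```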

Variance

You may want to measure how far a set of data spreads out from its mean value. For example, a very large variance in the household income data of a country may be interpreted as a sign of an economy with high inequality. Many useful interpretations can be drawn from analyzing the variance of data. The variance is obtained by:
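  1. Finding the difference between each value in the set and the mean.
  2. Squaring those differences.
  3. Adding the squared differences.
  4. Dividing that sum by the number of elements (for a sample, the sum is often divided by one less than the number of elements instead).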
Because every difference is squared, the variance of a dataset is never negative; it is zero only when all the values are equal. The most common use of variance is in calculating the standard deviation, one of the most important concepts in statistics. The calculation of variance can also be lengthy; you may want to take up a course on Vedic Mathematics, which will teach you how to do such calculations faster.
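The variance steps can be sketched in Python as follows. This is a minimal illustration, not the post's own method: the function name `variance`, the `sample` flag, and the example data are assumptions. By default it divides the sum of squared differences by the number of elements (the population variance); with `sample=True` it divides by one less than the number of elements, the usual convention when the data are only a sample.

```python
def variance(values, sample=False):
    """Population variance by default; sample=True divides by n - 1 instead."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values to compute a variance")
    m = sum(values) / n
    differences = [x - m for x in values]    # 1. difference of each value from the mean
    squared = [d * d for d in differences]   # 2. square each difference
    total = sum(squared)                     # 3. add the squared differences
    return total / (n - 1 if sample else n)  # 4. divide by the element count (or n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]     # made-up example values, mean = 5
print(variance(data))               # 4.0 (32 / 8)
print(variance(data, sample=True))  # about 4.571 (32 / 7)
```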

Standard Deviation

The standard deviation is calculated by taking the square root of the variance. Because the variance is built from squared differences, it is expressed in squared units (for income data, squared dollars), which makes it hard to interpret directly in real-world terms. Taking the square root brings the measure back to the same unit as the elements of the set, so the standard deviation gives a more interpretable account of how spread out the values in a dataset are. Hence the standard deviation is a trusted quantity for describing dispersion in statistical calculations. The standard deviation is also closely related to probability, so you may like to take a workshop on probability and statistics to explore the relation between the two topics further.
A standard use of the standard deviation is to describe how far the values of a dataset typically lie from the mean.
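A minimal Python sketch of the standard deviation, assuming a hypothetical `std_dev` function and a small made-up dataset treated as a whole population. It takes the square root of the variance, then expresses each value's distance from the mean in units of standard deviations, which is one common way to see how far individual values lie from the mean.

```python
import math

def std_dev(values, sample=False):
    """Square root of the variance; sample=True divides by n - 1 instead of n."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values to compute a standard deviation")
    m = sum(values) / n
    sum_sq = sum((x - m) ** 2 for x in values)  # sum of squared differences from the mean
    return math.sqrt(sum_sq / (n - 1 if sample else n))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example values
m = sum(data) / len(data)        # 5.0
sd = std_dev(data)               # 2.0, in the same units as the data

# Distance of each value from the mean, measured in standard deviations.
print([(x - m) / sd for x in data])
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Unlike the variance (4.0 here, in squared units), the result of 2.0 can be read directly: a typical value sits about 2 units away from the mean of 5.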
