Statistics is a branch of mathematics concerned with collecting, analyzing, modeling, and presenting data, whether a provided data set or real-life observations. It covers methods for collecting data, applying a suitable test or model to it, and extracting information from the results, often by plotting them. Commonly used measures and techniques include the mean, variance, skewness, kurtosis, and regression analysis.
Statistics is also described as a process through which a data set is characterized. When a data set is a sample drawn from a larger population, the statistical results from the sample can be used to draw inferences about that population. The process consists of collecting the data, evaluating it, and then expressing it in mathematical form.
The applications of statistics are wide-ranging. It is used in areas such as psychology, the natural and social sciences, business, manufacturing, and political studies.
Analysts use two types of methods to analyze data: descriptive and inferential.
As the term indicates, descriptive statistics is used to describe, display, or summarize data in a meaningful way. On its own, it does not allow us to draw conclusions beyond the data at hand or to reach a verdict on any hypothesis we have made.
Data, especially in large amounts, is difficult to interpret in its raw form, and descriptive statistics help us describe it appropriately. For instance, if a teacher has the exam results of 100 students and wants to know the overall performance, descriptive statistics will help. Two kinds of statistic are commonly used to describe data:
Central tendency measures:
Central tendency measures describe the central position of a frequency distribution for a data set. In this example, the frequency distribution is simply the distribution of marks gained by the hundred pupils, from lowest to highest. The central position can be described using statistics such as the mode, median, and mean.
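As a minimal sketch of these three measures, the following Python snippet uses the standard library's statistics module. The ten marks are hypothetical values chosen for illustration and do not come from any real class.

```python
import statistics

# Hypothetical exam marks (out of 100) for ten pupils; illustrative only.
marks = [52, 58, 61, 65, 65, 65, 68, 70, 72, 74]

mean_mark = statistics.mean(marks)      # arithmetic average of the marks
median_mark = statistics.median(marks)  # middle value of the sorted marks
mode_mark = statistics.mode(marks)      # most frequently occurring mark

print("mean:", mean_mark)
print("median:", median_mark)
print("mode:", mode_mark)
```

For this particular list all three measures coincide at 65, which will not generally be the case for skewed data.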
Measures of spread:
Measures of spread summarize a data set by describing how dispersed the results are. For instance, a mean score of 65 out of 100 for these hundred students does not imply that every pupil scored 65. Instead, the marks will be distributed around 65: some lower and others higher. Many statistics describe this spread, such as the range, quartiles, absolute deviation, variance, and standard deviation.
When using descriptive statistics, analysts present the data with tables, graphs, and charts, and then give a statistical summary of the results obtained.
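The measures of spread named above can be sketched in Python as follows. The marks are hypothetical, and the snippet treats them as a whole population, so it uses the population versions of the variance and standard deviation; that choice is an assumption of this example.

```python
import statistics

# Hypothetical exam marks (out of 100); illustrative only.
marks = [52, 58, 61, 65, 65, 65, 68, 70, 72, 74]
mean_mark = statistics.mean(marks)

mark_range = max(marks) - min(marks)           # highest mark minus lowest mark
variance = statistics.pvariance(marks)         # population variance
std_dev = statistics.pstdev(marks)             # population standard deviation
q1, q2, q3 = statistics.quantiles(marks, n=4)  # the three quartile cut points

# Mean absolute deviation: average distance of each mark from the mean.
mean_abs_dev = sum(abs(m - mean_mark) for m in marks) / len(marks)

print("range:", mark_range)
print("variance:", variance)
print("standard deviation:", round(std_dev, 2))
print("quartiles:", q1, q2, q3)
print("mean absolute deviation:", mean_abs_dev)
```

If the marks were instead a sample from a larger group, statistics.variance and statistics.stdev (which divide by n - 1) would be the usual choice.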
Descriptive statistics give us information about the data set at hand. For instance, we could calculate the mean and standard deviation of the exam scores of the hundred students, which would tell us something important about these 100 students. The entire group of data under consideration is known as the population, and it can be small or large. In the example, the 100 pupils form our population because we are interested in their scores. Descriptive statistics are applied to populations, and characteristics of a population, such as its mean or standard deviation, are known as parameters.
There are situations where we cannot reach the whole population. For example, if we want to analyze the scores of all students in the UK, it is not practical to examine every one of them. In such a case, we take a smaller portion of the population, known as a sample, to represent the whole. Characteristics of the sample, such as its mean or standard deviation, are known as statistics. The methods that allow us to use samples to draw conclusions about the population from which they were taken are called inferential statistics. It is therefore important for a sample to represent the population accurately, and sampling is the process by which we try to achieve this. Inferential statistics comprises two kinds of methods: (1) estimation of parameters and (2) testing of statistical hypotheses.
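The relationship between a parameter and a statistic can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population is 10,000 simulated scores (not real data), the sample is a simple random sample of 100, and the confidence interval uses the normal approximation with z = 1.96.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated exam scores (assumed, not real data).
population = [random.gauss(65, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)   # a parameter

# Draw a simple random sample of 100 scores without replacement.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)           # a statistic
sample_sd = statistics.stdev(sample)            # sample standard deviation (n - 1)

# Rough 95% confidence interval for the population mean, using the
# normal approximation (z = 1.96); an assumption that suits larger samples.
standard_error = sample_sd / len(sample) ** 0.5
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

The sample mean estimates the population mean, and the interval gives a sense of how far off that estimate might plausibly be.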
Testing of hypotheses
In hypothesis testing, the analyst tests an assumption about a population parameter. The technique used depends on the kind of data under consideration and the reason for analyzing it. Hypothesis testing uses sample data drawn from a larger population to assess how plausible the hypothesis is.
In this process, a statistical sample is tested with the objective of either rejecting or failing to reject a null hypothesis. The result tells the analyst whether the data are consistent with that hypothesis. If they are not, a new hypothesis can be formulated and tested, and the procedure repeated until a hypothesis consistent with the data is found.
A statistical hypothesis is tested by examining a random sample of the population under consideration. This random sample is used to weigh two competing hypotheses: the null hypothesis and the alternative hypothesis. The null hypothesis is the default position, usually that there is no effect or no difference, and it is assumed to be true for the purpose of the test. The alternative hypothesis is the competing claim, and it is adopted only if the data provide enough evidence against the null hypothesis.
Stages of testing the hypothesis
Hypothesis testing follows a four-step process. First, the analyst states the two hypotheses to be tested. Second, the analyst formulates an analysis plan, which describes how the data will be evaluated. Third, the plan is carried out and the sample data are analyzed. Lastly, the results are interpreted, and the null hypothesis is either rejected or not rejected.
The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis is true. It can be used in place of fixed rejection points: it is the smallest level of significance at which the null hypothesis would be rejected. A small p-value indicates strong evidence against the null hypothesis and therefore in favor of the alternative hypothesis.
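To make the idea of a p-value concrete, here is a small sketch based on a hypothetical coin-tossing experiment, which is not taken from the text above. The null hypothesis is that the coin is fair, the alternative is that it favors heads, and the one-sided p-value is the probability of seeing at least as many heads as were observed if the null hypothesis were true. The counts and the 0.05 significance level are assumptions of the example.

```python
from math import comb

# Hypothetical experiment: 60 heads observed in 100 tosses.
n_tosses = 100
heads_observed = 60
p_null = 0.5   # null hypothesis: the coin is fair


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# One-sided p-value: probability of 60 or more heads if the null is true.
p_value = sum(
    binomial_pmf(k, n_tosses, p_null)
    for k in range(heads_observed, n_tosses + 1)
)

alpha = 0.05                  # assumed significance level
reject_null = p_value < alpha

print(f"p-value: {p_value:.4f}")
print("reject the null hypothesis:", reject_null)
```

Because the p-value here falls below the assumed 0.05 level, the null hypothesis of a fair coin would be rejected in this illustration.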
Selection of appropriate statistical test
Various tests are used, depending on the nature of the data set.
The t-test determines whether the means of two groups are statistically the same or whether there is a significant difference between them.
In cases where the t-test cannot be applied, for example when the data are far from normally distributed, the Wilcoxon-Mann-Whitney test is used. It is based on the ranks of the observations and is commonly described as comparing the medians, rather than the means, of the two sets.
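As a rough sketch of how a two-sample t-test statistic is computed, the snippet below implements Welch's version, which does not assume equal variances; that variant, the function name welch_t, and the two groups of scores are choices made for this illustration. Only the test statistic and its approximate degrees of freedom are computed here. Obtaining a p-value additionally requires the t distribution, for which a statistics library such as SciPy (scipy.stats.ttest_ind) is normally used.

```python
import statistics

# Hypothetical exam scores for two independent groups; illustrative only.
group_a = [72, 68, 75, 70, 74, 69]
group_b = [65, 63, 70, 66, 62, 64]


def welch_t(x, y):
    """Return Welch's t statistic and approximate degrees of freedom."""
    n_x, n_y = len(x), len(y)
    # Squared standard error of each group mean (sample variance / n).
    se2_x = statistics.variance(x) / n_x
    se2_y = statistics.variance(y) / n_y
    t = (statistics.mean(x) - statistics.mean(y)) / (se2_x + se2_y) ** 0.5
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_x + se2_y) ** 2 / (
        se2_x**2 / (n_x - 1) + se2_y**2 / (n_y - 1)
    )
    return t, df


t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, degrees of freedom = {df:.2f}")
```

A large absolute value of t relative to the t distribution with these degrees of freedom suggests that the two group means differ.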
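The rank-based idea behind the Wilcoxon-Mann-Whitney test can be sketched by computing its U statistic directly: for every pair made of one value from each group, count which group has the larger value, with ties counting as half. The data and the function name mann_whitney_u are hypothetical, and the sketch stops at the statistic rather than a p-value.

```python
# Hypothetical exam scores for two independent groups; illustrative only.
group_a = [72, 68, 75, 70, 74, 69]
group_b = [65, 63, 70, 66, 62, 64]


def mann_whitney_u(x, y):
    """Count pairs where a value in x exceeds a value in y (ties count 0.5)."""
    u = 0.0
    for x_value in x:
        for y_value in y:
            if x_value > y_value:
                u += 1.0
            elif x_value == y_value:
                u += 0.5
    return u


u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)

# The two statistics always sum to the number of pairs.
print("U for group A:", u_a)
print("U for group B:", u_b)
print("number of pairs:", len(group_a) * len(group_b))
```

When one U value is close to zero and the other close to the number of pairs, the values in one group tend to be consistently larger than those in the other.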
Analysis of Variance (ANOVA)
It compares the means of more than two groups to see whether there is any statistically significant variation among them.
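A one-way ANOVA compares the variation between group means with the variation within the groups, and summarizes the comparison as an F statistic. The following sketch computes that statistic by hand for three small hypothetical groups; the data are invented for illustration, and turning F into a p-value would additionally require the F distribution.

```python
import statistics

# Hypothetical exam scores for three groups; illustrative only.
groups = [
    [61, 62, 63],
    [62, 63, 64],
    [65, 66, 67],
]

all_scores = [score for group in groups for score in group]
grand_mean = statistics.mean(all_scores)
k = len(groups)            # number of groups
n_total = len(all_scores)  # total number of observations

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(
    len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
)

# Within-group sum of squares: how far each score is from its own group mean.
ss_within = sum(
    (score - statistics.mean(group)) ** 2 for group in groups for score in group
)

# F is the ratio of the between-group to the within-group mean squares.
f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
print(f"F = {f_stat:.2f} with {k - 1} and {n_total - k} degrees of freedom")
```

A large F value indicates that the group means differ by more than the within-group variation alone would explain.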
For categorical variables, the chi-squared test is used. It checks whether the observed frequencies of a categorical variable differ from those expected, for example whether two categorical variables are associated with each other.
Apart from these, many other tests are applied in different situations.
After analysis and testing, appropriate conclusions are drawn from the data set. Arriving at the most suitable conclusion can be difficult, as the answers may not be the ones you hoped for. The following are some points to reflect on when interpreting the results:
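The chi-squared statistic for a test of independence can be sketched as follows. The 2x2 table of counts is hypothetical, and the critical value of 3.841 is the standard tabulated value for one degree of freedom at the 5% significance level; both the table and the choice of that level are assumptions of this example.

```python
# Hypothetical 2x2 contingency table of counts (rows: two groups,
# columns: pass / fail); illustrative only.
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell, where the expected
# count assumes the row and column variables are independent.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Tabulated critical value for df = 1 at the 5% level (assumed threshold).
critical_value = 3.841
exceeds_critical = chi2 > critical_value

print(f"chi-squared = {chi2:.2f}, degrees of freedom = {df}")
print("exceeds the 5% critical value:", exceeds_critical)
```

When the statistic exceeds the critical value, the hypothesis that the two variables are independent is rejected at that significance level.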
• The main points of the analysis
• Any other interpretations that can be drawn
• Whether these outcomes can be supported statistically
• Understanding of the results
• How the results differ from expectations