Basic statistics

Before using advanced analysis methods such as discriminant analysis or multiple regression, you must first explore the data in order to identify trends, locate anomalies, or simply obtain essential information such as the minimum, maximum, or mean of a data sample [1].

A descriptive statistic (in the count noun sense) is a summary statistic that quantitatively describes or summarizes features of a collection of information, while descriptive statistics (in the mass noun sense) is the process of using and analysing those statistics. Descriptive statistics is distinguished from inferential statistics (or inductive statistics) by its aim to summarize a sample, rather than to use the data to learn about the population that the sample is thought to represent. This generally means that descriptive statistics, unlike inferential statistics, is not developed on the basis of probability theory and is frequently nonparametric. Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented. For example, papers reporting on human subjects typically include a table giving the overall sample size, the sample sizes in important subgroups (e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average age, the proportion of subjects of each sex, the proportion of subjects with related co-morbidities, etc. [2].

Some measures that are commonly used to describe a data set are measures of central tendency and measures of variability or dispersion. Measures of central tendency include the mean, median and mode, while measures of variability include the standard deviation (or variance), the minimum and maximum values of the variables, kurtosis and skewness [2].
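As a minimal sketch of the measures named above, the following Python snippet uses only the standard library's `statistics` module. The sample values are hypothetical and chosen only for illustration; the variance and standard deviation shown here use the n - 1 divisor, which is one of several conventions.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Measures of variability (variance and stdev divide by n - 1)
minimum, maximum = min(sample), max(sample)
variance = statistics.variance(sample)
std_dev = statistics.stdev(sample)

print("mean:", mean, "median:", median, "mode:", mode)
print("min:", minimum, "max:", maximum)
print("variance:", variance, "standard deviation:", std_dev)
```

The mean is pulled toward the larger values in this sample, while the median and mode stay closer to the bulk of the data, which is why it is useful to report more than one measure of central tendency.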

In our XLSTAT training, we offer you a large number of descriptive statistics and charts which give you a useful and relevant preview of your data.

Quantitative data

No. of values used

No. of values ignored

No. of min./max. value

% of min./max. value

Minimum/Maximum

1st quartile

Median

3rd quartile

Range: difference between the maximum and the minimum

Sum of the weights (if any)

Total

Mean

Geometric mean

Harmonic mean

Kurtosis (Pearson), Skewness (Pearson)

Kurtosis, Skewness

CV (standard deviation/mean)

Sample variance

Estimated variance

Standard deviation of a sample

Estimated standard deviation

Mean absolute deviation

Standard deviation of the mean
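Several of the quantitative statistics listed above can be sketched in Python with the standard library. This is an illustration under stated assumptions, not a description of how XLSTAT computes them: the data are hypothetical, the "sample" variance and standard deviation are assumed to divide by n while the "estimated" ones divide by n - 1, the CV is assumed to use the estimated standard deviation, and the quartiles follow the default "exclusive" method of `statistics.quantiles`, which may differ from other tools.

```python
import math
import statistics

# Hypothetical sample; every value is used and none is ignored.
values = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]
n = len(values)

# Quartiles: the default "exclusive" method is one convention among several.
q1, median, q3 = statistics.quantiles(values, n=4)
value_range = max(values) - min(values)

total = sum(values)
mean = total / n
geometric_mean = statistics.geometric_mean(values)  # requires positive values
harmonic_mean = statistics.harmonic_mean(values)

# Assumption: "sample" statistics divide by n, "estimated" ones by n - 1.
sample_variance = statistics.pvariance(values)
estimated_variance = statistics.variance(values)
sample_std = statistics.pstdev(values)
estimated_std = statistics.stdev(values)

# Assumption: the CV uses the estimated (n - 1) standard deviation.
cv = estimated_std / mean
mean_abs_dev = sum(abs(x - mean) for x in values) / n
std_of_mean = estimated_std / math.sqrt(n)  # standard error of the mean

print("quartiles:", q1, median, q3, "range:", value_range)
print("means:", mean, geometric_mean, harmonic_mean)
print("variances:", sample_variance, estimated_variance)
print("CV:", cv, "MAD:", mean_abs_dev, "SE of mean:", std_of_mean)
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so comparing the three gives a quick check on the computation.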

Qualitative data

No. of categories

Mode

Mode frequency

Mode weight

% mode

Relative frequency of the mode

Frequency

Weight of the category

%: percentage of the category

Relative frequency of the category
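The qualitative statistics listed above can be sketched in the same way with `collections.Counter`. The categories and weights below are hypothetical, and this sketch assumes that frequencies and percentages are computed from unweighted counts while category weights are simple sums of the observation weights.

```python
from collections import Counter

# Hypothetical categorical sample with one weight per observation.
colors = ["red", "blue", "red", "green", "blue", "red"]
weights = [2.0, 1.0, 2.0, 1.0, 1.0, 2.0]

# Frequency of each category and number of distinct categories
frequency = Counter(colors)
n_categories = len(frequency)

# Mode: the most frequent category, with its frequency
mode, mode_frequency = frequency.most_common(1)[0]

# Relative frequency and percentage of each category (unweighted counts)
total = sum(frequency.values())
relative_frequency = {c: k / total for c, k in frequency.items()}
percentage = {c: 100 * f for c, f in relative_frequency.items()}

# Weight of each category: sum of the weights of its observations
category_weight = Counter()
for category, weight in zip(colors, weights):
    category_weight[category] += weight
mode_weight = category_weight[mode]

print("categories:", n_categories, "mode:", mode, "mode frequency:", mode_frequency)
print("relative frequencies:", relative_frequency)
print("category weights:", dict(category_weight))
```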

Some plots are also very helpful for visualizing the distribution of the data.

Charts created for quantitative variables

Box plots

Scattergrams

Strip plots

Q-Q plots

P-P plots

Stem and leaf plots
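Of the charts listed above, the stem and leaf plot is simple enough to sketch as plain text. The helper below is hypothetical (`stem_and_leaf` is not an XLSTAT function) and assumes a non-empty list of non-negative integers, using the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for a non-empty list of non-negative integers."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # tens as stem, units as leaf
        groups[stem].append(leaf)

    rows = []
    # Include empty stems so that gaps in the distribution stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

# Hypothetical data, chosen only for illustration.
for row in stem_and_leaf([12, 15, 21, 34, 36, 38]):
    print(row)
```

Each row keeps the individual values readable while the row lengths show the shape of the distribution, much like a histogram turned on its side.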

Charts created for categorical variables

Bar charts

Pie charts

Double pie charts

Doughnuts

Stacked bars

Multiple bars

References

[1] https://www.xlstat.com/en/solutions/features/descriptive-statistics-including-box-plots-and-scattergrams

[2] https://en.wikipedia.org/wiki/Descriptive_statistics