The average Human Resources (HR) team is sitting on a data gold mine. There’s recruitment data, career progression data, training data, absenteeism figures, productivity data, personal development reviews, competency profiles, and staff satisfaction data, just for starters. In addition to these traditional HR data sets, companies can now collect far more data, for instance by scanning social media or analysing the content of emails to gauge employee sentiment. Understanding *basic statistics* is therefore essential for every HR professional. In this article, we will discuss some of the *basic statistics* that every HR person should understand.

Developing strategies to improve employee experience remains a high priority for HR. That is where the need to understand *basic statistics* and numbers arises: analysing HR statistics is important for every human resources department.

To understand how to analyse and interpret *basic statistics*, and where to use them, we first have to understand the types of data and variables we will be dealing with.

**Types of Variables to understand**

**Categorical**

Qualitative data are often called categorical data, because observations can be grouped into categories according to their characteristics. There are two kinds of categorical variable: nominal and ordinal. A nominal variable (unordered list) is a variable that has two or more categories without any implied ordering. For example, the gender of your employees (Male or Female) or the marital status of team members (Unmarried, Married, Divorced, etc.).

An ordinal variable (ordered list) is a variable that has two or more categories with a clear ordering. For example, a scale used in an employee engagement survey (Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree) or a rating (Very low, Low, Medium, High, Very high).
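The difference between nominal and ordinal variables can be illustrated with a minimal Python sketch. The responses and marital statuses below are made-up illustrative values, and the names `LIKERT_ORDER` and `rank` are assumptions introduced for this example only.

```python
from collections import Counter

# Nominal: categories can be counted, but there is no meaningful order.
marital_status = ["Married", "Unmarried", "Married", "Divorced", "Married"]
status_counts = Counter(marital_status)

# Ordinal: categories have a defined order, so responses can be ranked.
LIKERT_ORDER = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
rank = {label: position for position, label in enumerate(LIKERT_ORDER)}

responses = ["Agree", "Neutral", "Strongly Agree", "Disagree", "Agree"]
sorted_responses = sorted(responses, key=rank.__getitem__)

print(status_counts["Married"])
print(sorted_responses)
```

Sorting only makes sense for the ordinal responses; for the nominal variable, counting is the natural summary.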

**Continuous or Numeric**

Continuous (numeric) data are measured as numbers rather than grouped into categories. Under continuous data, we have interval and ratio variables. An interval variable is similar to an ordinal variable, except that the intervals between its values are equally spaced. In other words, it has order and equal intervals. For example, take Annual Income in Dollars. We may have three people who make $5,000, $10,000, and $15,000. The second person makes $5,000 more than the first person and $5,000 less than the third person, and the size of these intervals is the same. (Strictly speaking, income also has a true zero, so it qualifies as a ratio variable as well; temperature in degrees Celsius is an example of a variable that is interval but not ratio.)

A ratio variable is an interval variable with a natural zero point. When the variable equals 0.0, there is none of that variable. Examples include the height or weight of employees.

**Basic Statistics to know**

**The Mean**

The mean, or average, is familiar to almost everyone. It is used to find the average value of a set of data points: it is the sum of the observations divided by the sample size. For example, you can use it to find the average age of a group of employees, the average salary for certain positions, average employee turnover, etc. One of the problems with the mean is that it is affected by extreme values. Very large or very small numbers can distort the final average value. Use the mean when your data is not skewed (for example, when it is approximately normally distributed), in other words when there are no extreme values (outliers) in the data set.
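As a minimal sketch of how one extreme value distorts the mean, the following uses Python's standard `statistics` module. The salary figures are made-up illustrative values.

```python
from statistics import mean

# Hypothetical monthly salaries (in dollars) for a small team.
salaries = [3000, 3200, 3100, 2900, 3300]
mean_without_outlier = mean(salaries)

# Add one executive salary as an extreme value (outlier).
salaries_with_outlier = salaries + [20000]
mean_with_outlier = mean(salaries_with_outlier)

print(mean_without_outlier)
print(round(mean_with_outlier, 2))
```

A single large salary pulls the mean well above what any of the other five employees earns.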

**Median (50th Percentile)**

It is the middle value. It splits the data in half: half of the data are above the median and half are below it. The big advantage of the median is that it is NOT affected by extreme values. Very large or very small numbers do not affect it. Use the median when your data is skewed or when you are dealing with ordinal (ordered categories) data, e.g. a Likert scale: 1. Strongly dislike, 2. Dislike, 3. Neutral, 4. Like, 5. Strongly like.
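The robustness of the median can be sketched in a few lines of Python using the standard `statistics` module. The salary figures are made-up illustrative values.

```python
from statistics import median

# Hypothetical monthly salaries (in dollars) for a small team.
salaries = [3000, 3200, 3100, 2900, 3300]
median_without_outlier = median(salaries)

# Add one executive salary as an extreme value (outlier).
salaries_with_outlier = salaries + [20000]
median_with_outlier = median(salaries_with_outlier)

print(median_without_outlier)
print(median_with_outlier)
```

With an even number of observations, `median` averages the two middle values, so the outlier moves the result only slightly.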

**Mode**

It is the value that occurs most frequently in a dataset. One of the advantages of the mode is that it can be used when the data is not numerical. Its shortfalls are that there may be no mode at all if no value is repeated, and that there may be more than one mode. Use the mode when dealing with nominal (unordered categories) data.

**Percentiles**

A percentile (or centile) is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations falls. For example, the 20th percentile is the value (or score) below which 20% of the observations may be found. Similarly, if you are paying an Accountant a salary that is around the 75th percentile, you are paying more than about 75% of the market participants for that position.
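The mode and percentile definitions above can be sketched with Python's standard `statistics` module. The department names and market salaries are made-up illustrative values, and the `"inclusive"` interpolation method is an assumption; different tools compute percentiles in slightly different ways.

```python
from statistics import multimode, quantiles

# Mode: works on non-numerical (nominal) data; multimode returns all ties.
departments = ["Sales", "IT", "Sales", "Finance", "IT", "Sales"]
most_common_departments = multimode(departments)

# Percentiles: hypothetical market salaries (in thousands of dollars)
# for an Accountant position.
market_salaries = [40, 42, 45, 47, 50, 52, 55, 58, 60, 65, 70]

# n=4 splits the data into quartiles: the 25th, 50th and 75th percentiles.
p25, p50, p75 = quantiles(market_salaries, n=4, method="inclusive")

print(most_common_departments)
print(p25, p50, p75)
```

Under these assumptions, a salary at `p75` sits above roughly three quarters of the salaries in the market sample.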

**Coefficient of Variation (CV)**

CV is a measure of the relative variation of data values about the mean, expressed as a percentage: CV = (Standard Deviation / Mean) × 100. The smaller the CV, the less relative variation there is in the data points about the mean. The larger the CV, the more relative variation there is.
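A minimal Python sketch of the CV calculation follows. The helper name `coefficient_of_variation` and the salary figures are assumptions for illustration, and the sample standard deviation (`stdev`) is used here; the population version (`pstdev`) would be an equally valid choice.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the CV as a percentage: (standard deviation / mean) * 100."""
    return stdev(values) / mean(values) * 100

# Hypothetical salaries (in thousands of dollars) for two teams
# with the same mean but very different spread.
team_a = [50, 52, 48, 51, 49]
team_b = [30, 70, 45, 60, 45]

cv_a = coefficient_of_variation(team_a)
cv_b = coefficient_of_variation(team_b)

print(round(cv_a, 2), round(cv_b, 2))
```

Both teams average 50, but the much larger CV for the second team shows that its salaries vary far more relative to that mean.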

**Range**

It is simply the largest observation minus the smallest observation. The range is easy to calculate. However, it is very sensitive to outliers and does not use all the observations in a data set.
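The range calculation, and its sensitivity to a single outlier, can be sketched as follows. The ages are made-up illustrative values.

```python
# Hypothetical ages of employees in a team.
ages = [23, 25, 31, 38, 42]
age_range = max(ages) - min(ages)

# One much older employee joins: only the extremes matter to the range.
ages_with_outlier = ages + [67]
age_range_with_outlier = max(ages_with_outlier) - min(ages_with_outlier)

print(age_range, age_range_with_outlier)
```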

**Standard Deviation**

It is a measure of the spread of data about the mean. It gives a better picture of your data than the mean alone. Its weaknesses are that it doesn't give a clear picture of the whole range of the data and that it can give a distorted picture if the data contain outliers.
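As a sketch of why the standard deviation adds information beyond the mean, the following compares two made-up sets of tenure figures using Python's standard `statistics` module. The sample standard deviation (`stdev`) is an assumption here; `pstdev` gives the population version.

```python
from statistics import mean, stdev

# Hypothetical years of tenure in two departments.
tenure_a = [4, 5, 5, 6, 5]
tenure_b = [1, 9, 2, 8, 5]

# Same mean, very different spread.
sd_a = stdev(tenure_a)
sd_b = stdev(tenure_b)

print(mean(tenure_a), mean(tenure_b))
print(round(sd_a, 3), round(sd_b, 3))
```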

**Skewness**

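It is a measure of symmetry or, more precisely, of the lack of symmetry. A distribution is symmetric if it looks the same to the left and right of the center point. A symmetric distribution has a skewness of zero; a long tail of high values gives positive skewness, and a long tail of low values gives negative skewness.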
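The article does not give a formula for skewness, so the sketch below assumes one common definition, the moment-based coefficient (third central moment divided by the second central moment raised to the power 1.5). The function name `skewness` and the data are illustrative assumptions.

```python
from statistics import fmean

def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (one of several definitions)."""
    mu = fmean(values)
    m2 = fmean([(x - mu) ** 2 for x in values])
    m3 = fmean([(x - mu) ** 3 for x in values])
    return m3 / m2 ** 1.5

# Hypothetical data: symmetric ages and salaries with one very high value.
symmetric_ages = [30, 35, 40, 45, 50]
right_skewed_salaries = [3000, 3100, 3200, 3300, 20000]

print(skewness(symmetric_ages))
print(skewness(right_skewed_salaries) > 0)
```

Statistical packages often apply a small-sample correction, so their results may differ slightly from this simple version.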

**Kurtosis**

It is a measure of whether the data are peaked or flat relative to a normal distribution. Higher values indicate a higher, sharper peak (with heavier tails); lower values indicate a lower, less distinct peak.
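The article does not give a formula for kurtosis, so this sketch assumes the common moment-based "excess kurtosis" (fourth central moment divided by the squared second central moment, minus 3, so that a normal distribution scores about zero). The function name and the scores are illustrative assumptions.

```python
from statistics import fmean

def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3."""
    mu = fmean(values)
    m2 = fmean([(x - mu) ** 2 for x in values])
    m4 = fmean([(x - mu) ** 4 for x in values])
    return m4 / m2 ** 2 - 3

# Hypothetical scores: evenly spread versus tightly peaked with two extremes.
flat_scores = [1, 2, 3, 4, 5]
peaked_scores = [5, 5, 5, 5, 5, 5, 5, 5, 1, 9]

print(round(excess_kurtosis(flat_scores), 6))
print(round(excess_kurtosis(peaked_scores), 6))
```

The evenly spread scores give a negative value (flatter than normal), while the peaked scores give a positive one.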

**Z scores**

Z scores are mainly used for standardization. Z-score standardization is one of the most popular methods to normalize data. In this case, we rescale an original variable to have a mean of zero and a standard deviation of one. Mathematically, the scaled value is calculated by subtracting the mean of the original variable from the raw value and then dividing the result by the standard deviation of the original variable.
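The standardization described above can be sketched as follows. The assessment scores are made-up illustrative values, and the population standard deviation (`pstdev`) is an assumption; using the sample version (`stdev`) would change the values slightly.

```python
from statistics import mean, pstdev

# Hypothetical assessment scores for five employees.
scores = [60, 70, 80, 90, 100]

mu = mean(scores)
sigma = pstdev(scores)

# z = (raw value - mean) / standard deviation
z_scores = [(x - mu) / sigma for x in scores]

print([round(z, 3) for z in z_scores])
print(round(mean(z_scores), 6), round(pstdev(z_scores), 6))
```

After rescaling, the z scores have a mean of zero and a standard deviation of one, which makes variables measured on different scales directly comparable.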

**Hypothesis Testing**

Let’s say you have a small set of employee data and you are asked to assess the credibility of a statement about the whole organisation using that small sample.

In other words, we use a random sample of data taken from a large group (the population) to describe and make inferences about that population. For example, the executive team may want to understand demographic variables such as age, gender, and education qualifications within a company that employs 10,000 people who work in different remote areas. It is not practical to reach out to every employee to collect feedback, as that may be a very expensive and time-consuming process.

**Statistical Significance**

Statistical significance evaluates the likelihood that an observed (actual) difference is due to chance. It deals with the following question: If we selected many samples from the same population, would we still find the same relationship between these two variables in every sample? Or is our finding due only to random chance?

**P-value**

The P-value is the probability of observing a result at least as extreme as the one in your sample if the null hypothesis were true. A low P-value means that your sample provides enough evidence to reject the null hypothesis for the entire population. In technical language, it is the lowest level of significance at which you can reject the null hypothesis.
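The article does not name a specific test, so the sketch below uses an exact permutation test purely as an illustration of what a P-value means. The null hypothesis is that training makes no difference to engagement scores. The function name `permutation_p_value` and the scores are made-up assumptions.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    extreme = 0
    splits = 0
    # Try every way of relabelling the pooled scores into two groups.
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # at least as extreme as observed
            extreme += 1
        splits += 1
    return extreme / splits

# Hypothetical engagement scores for trained and untrained employees.
trained = [78, 82, 85, 88, 90]
untrained = [60, 65, 70, 72, 75]

p_value = permutation_p_value(trained, untrained)
print(round(p_value, 4))
```

Only 2 of the 252 possible relabellings produce a gap as large as the observed one, so the P-value is below the conventional 0.05 threshold and the null hypothesis would be rejected at that level.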

**Correlation**

Correlation measures the strength and direction of the linear relationship between two variables. For example, the relationship between employee engagement results and employee turnover, or the relationship between internal grades and employee salaries.
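A minimal sketch of the Pearson correlation coefficient, one common way to measure a linear relationship, is shown below. The function name `pearson_r` and the engagement and turnover figures are made-up assumptions for illustration.

```python
from math import sqrt
from statistics import fmean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance_sum = sum(a * b for a, b in zip(dx, dy))
    return covariance_sum / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical department-level data: average engagement score (1-5)
# and annual employee turnover (%).
engagement = [3.0, 3.5, 4.0, 4.5, 5.0]
turnover = [20, 18, 15, 11, 8]

r = pearson_r(engagement, turnover)
print(round(r, 3))
```

A value close to -1 indicates a strong negative linear relationship: in this made-up data, departments with higher engagement have lower turnover. Correlation alone does not show that one causes the other.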

Even in this age of increasing automation, robotics, and artificial intelligence, people will continue to be a central driver of success. This means that the role of the HR team is changing and, as our ability to gather and analyze ever-increasing amounts of data grows, so too do the opportunities for HR teams to add more value to the organization through statistics.

**Benjamin Sombi is a Data Scientist, Entrepreneur, & Business Analytics Manager at Industrial Psychology Consultants (Pvt) Ltd, a management and human resources consulting firm.**