
Tests for Jobs: What a Century of Research Tells Us About Who Will Perform

By Memory Nguwi
Last Updated 4/1/2026

Most organisations assume they know which tests for jobs work. Cognitive ability tests sit at the top. Personality questionnaires are soft and unreliable. A good interview reveals everything you need. These beliefs have shaped hiring decisions for millions of people. Some of them hold up under scrutiny. Others crumble the moment you look at the research carefully.

The science of personnel selection has accumulated more than a century of evidence on which tests for jobs predict future performance. That evidence comes from dozens of meta analyses, many of them synthesising hundreds or thousands of individual studies. The conclusions are far more nuanced than any single ranking or table can capture. Some methods genuinely predict performance. Others add little beyond what a coin flip would tell you. And several findings that were treated as settled science for decades are now the subject of vigorous, unresolved debate among researchers.

This article walks through what the research actually supports, test by test, study by study. It covers structured interviews, cognitive ability tests, personality assessments, integrity tests, work samples, situational judgement tests, and assessment centres. Where the evidence is clear, we say so. Where it is contested, we explain the disagreement and let you decide what to do with it.

Why Most Organisations Get Tests for Jobs Wrong

The story of how tests for jobs have been understood begins with one enormously influential paper. A 1998 meta analysis summarising 85 years of research on selection methods established a ranking that became gospel in the field. General mental ability was declared the cornerstone predictor of job performance, with work sample tests sitting fractionally above it in the table. Everything else fell behind. Textbooks reproduced these rankings as established fact. Entire selection systems were designed around them.

The numbers became so familiar that most practitioners stopped questioning them. And that was a problem. Because no single study, no matter how wide its scope, settles the science permanently. Meta analyses depend on the statistical corrections they apply, the studies they include, and the assumptions built into their methodology. All of these choices are open to challenge, and challenges have arrived.

The bigger issue is that organisations treat validity rankings as shopping lists. They pick the test at the top of the table, administer it to every applicant, and call it evidence based hiring. That approach ignores everything the research says about context, job complexity, how tests are combined, and the critical difference between average effects and what works for a specific role in a specific organisation.

What Tests for Jobs Actually Predict: The Evidence, Test by Test

Structured Interviews as Tests for Jobs: Consistently Strong Performers

Structured interviews have one of the longest and most consistent evidence bases of any selection method. A large scale meta analysis covering 245 validity coefficients from more than 86,000 individuals found that structured interviews consistently outperformed unstructured ones. Situational interviews, where candidates describe how they would handle hypothetical scenarios, showed higher validity than interviews built around general psychological probing.

What makes structured interviews effective is their standardisation. Every candidate faces the same questions, and responses are scored against a consistent rubric. This removes the gut feel that turns most unstructured interviews into little more than a conversation where the outcome is decided in the first few minutes. Across virtually every meta analysis that has examined interviews, structure is the single most important moderator of predictive accuracy.

More recent summaries have only strengthened this conclusion. A review of current evidence confirms that well designed structured interviews, grounded in job analysis and delivered by trained interviewers, are among the strongest selection tools available. Whether they are the single strongest tool depends on which set of validity estimates you accept, but their position near the top is not seriously contested by anyone in the field.

Cognitive Ability Tests for Jobs: Powerful, but How Powerful?

Cognitive ability tests have been studied more extensively than any other selection tool. The original 1998 meta analysis placed their validity at .51. A 2016 update summarising 100 years of research largely reaffirmed that conclusion, and for a quarter century the number shaped hiring policy, legal strategy, and organisational practice worldwide.

However, this estimate has become the subject of intense debate. A 2022 reanalysis argued that statistical corrections for range restriction had been systematically over applied across decades of meta analyses, inflating validity estimates for cognitive ability and other methods. The revised estimate for cognitive ability dropped to approximately .31. But this revision did not go unchallenged. A 2023 response argued that concurrent validation studies are indeed affected by range restriction and that failing to correct for it produces underestimates of true validity. The critique contended that the assumption underlying the revision, that range restriction does not meaningfully affect concurrent studies, lacked direct empirical support.
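To see why this argument matters so much, it helps to look at the correction itself. The sketch below implements the standard Thorndike Case II formula for direct range restriction, the kind of correction at issue; the observed validity and restriction ratios are invented for illustration, not taken from any of the studies above. The same observed correlation yields very different corrected validities depending on the assumed ratio of applicant-pool to incumbent standard deviations, which is precisely the assumption under dispute.

```python
import math

def correct_range_restriction(r_observed: float, u: float) -> float:
    """Thorndike Case II correction for direct range restriction.

    r_observed: validity observed in the restricted (hired) sample.
    u: ratio of applicant-pool SD to hired-sample SD on the predictor (u >= 1).
    """
    return (u * r_observed) / math.sqrt(1 + r_observed ** 2 * (u ** 2 - 1))

# Illustrative values only: the same observed r = .25 becomes a very
# different corrected validity depending on the assumed restriction ratio.
for u in (1.0, 1.25, 1.5, 2.0):
    print(f"u = {u:.2f} -> corrected r = {correct_range_restriction(0.25, u):.3f}")
```

With no restriction (u = 1.0) the estimate stays at .25; with an assumed ratio of 2.0 it rises above .45. The whole disagreement, in effect, is over which value of u is credible for which study designs.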

The debate intensified further. A 2024 methodological critique called the revised approach "flawed and misleading," arguing that comparing validity estimates where some are corrected for range restriction and others are not introduces confounds that make rankings unreliable. A 2025 rebuttal by the reanalysis authors defended the principle of conservative estimation while clarifying that they do not oppose correction when a credible basis exists. A separate challenge in the Journal of Applied Psychology questioned the empirical basis for treating concurrent studies as free from restriction.

What does this mean in practice? Cognitive ability tests predict job performance. That much is beyond dispute. They are particularly valuable for roles involving learning new material, solving novel problems, and managing complexity. The precise magnitude of their predictive power is genuinely uncertain, likely falling somewhere between the revised lower estimates and the original higher ones, depending on the job, the study design, and the corrections applied. The safest conclusion: cognitive ability matters, but it is not the only thing that matters, and it should not be the only test used.

Personality Tests for Jobs: Conscientiousness Leads, Context Decides

For years, personality testing was dismissed as irrelevant to job performance. Then a foundational 1991 meta analysis across tens of thousands of employees established that conscientiousness predicted performance across virtually every occupation. That finding revived the entire field of personality assessment in selection.

A 2000 replication study using only measures explicitly designed to assess the Big Five confirmed that conscientiousness was the most consistent predictor, but noted that observed correlations were smaller than the field's enthusiasm might suggest. Other traits, particularly agreeableness and emotional stability, showed meaningful relationships with contextual performance: the interpersonal behaviours that keep organisations running.

The most sweeping synthesis to date, a century spanning review published in the Proceedings of the National Academy of Sciences, drew on 92 meta analyses involving more than 1.1 million participants. It found that conscientiousness showed favourable effects for 98% of occupational variables studied. But the strength of the relationship depended heavily on job complexity. In highly complex roles, conscientiousness predicted performance less well. In structured, predictable roles, it predicted more strongly.

A 2013 meta analytic test confirmed this pattern directly, finding that conscientiousness was a stronger predictor in routinised jobs and weaker in roles with high cognitive demands. A second order meta analysis further confirmed these patterns while noting that variation in effect sizes across different meta analyses could partly be explained by sampling error at the meta analytic level. The practical message: personality tests for jobs should be matched to role type, not applied as a blanket screening tool.

Integrity Tests for Jobs: Useful but Debated

Integrity tests are designed to predict dishonest or counterproductive behaviour. A landmark 1993 meta analysis based on 665 validity coefficients across more than 576,000 data points reported a mean validity of .41 for predicting supervisory ratings of job performance and showed that integrity tests predicted counterproductive behaviours across different settings and job types.

An updated 2012 analysis using stricter inclusion criteria found a dramatically lower corrected validity of .15 for job performance. Studies authored by test publishers produced substantially higher validities than independent studies, and self report criteria inflated effects for counterproductive behaviour. The disagreement sparked a heated exchange, with the original researchers defending their broader inclusion criteria and the critics arguing that many of the included studies had methodological weaknesses.

A 50 year review published in 2023, covering 150 studies and more than 67,000 participants, found that integrity tests reliably predict counterproductive behaviour, though the magnitude depends on how the criterion is measured and whether the test is overt or personality based. The balance of evidence suggests integrity tests add value, particularly for predicting rule breaking and disruptive behaviour, but their validity for overall job performance is more modest and more contested than the earliest claims suggested.

Related: 5 Reasons Why Every Organisation Needs Psychometric Testing

Work Sample Tests for Jobs: Good but Not the Best

Work sample tests, where candidates perform a sample of the actual work the role requires, were long considered the strongest tests for jobs. The original 1998 summary placed their validity at .54, the highest of any method. But a 2005 updated meta analysis incorporating newer studies produced a revised estimate of .33, a drop of roughly 40 percent. The decrease was attributed to the expansion of work sample tests into service sector roles, where the connection between a simulated task and actual job performance may be weaker than in traditional skilled trades.

Work sample tests still offer a meaningful signal, and candidates tend to view them as fair because the connection to the job is transparent. But the idea that they are the single most valid predictor no longer holds. Their value is greatest in manual and technical roles where the work can be directly observed and scored, and weakest in roles involving complex interpersonal or strategic demands that are difficult to simulate.

Situational Judgement Tests for Jobs: Versatile and Growing

Situational judgement tests present candidates with realistic work scenarios and ask them to choose or rank possible responses. A 2007 meta analysis found a validity of .26 for predicting job performance, with similar results regardless of whether the test used knowledge based or behavioural tendency instructions.

A 2010 construct level analysis classified the constructs measured by different situational judgement tests and found they most commonly assess leadership and interpersonal skills. Those measuring teamwork and leadership showed the strongest criterion related validities. The insight is that situational judgement tests are not a single thing. Their predictive power depends entirely on what constructs they measure and how the scenarios are designed.

These tests also offer practical advantages. They tend to produce smaller group differences than cognitive ability tests and are perceived as fairer by candidates. For organisations seeking a balance between prediction and fairness, well designed situational judgement tests are an increasingly attractive option.

Assessment Centres as Tests for Jobs: Thorough but Costly

Assessment centres use multiple exercises to evaluate candidates across several dimensions simultaneously. The original 1987 meta analysis covering 50 studies found a corrected mean validity of .37. A 2003 dimension level analysis extracted 258 data points from 34 studies and found true validities ranging from .25 to .39, depending on the dimension assessed. Dimensions such as problem solving, organising, and influencing others showed the strongest relationships with performance criteria.

A subsequent incremental validity study demonstrated that assessment centre dimensions explain additional variance in job performance beyond both cognitive ability and personality, supporting their use for high stakes roles. But the cost of designing, administering, and scoring a proper assessment centre is substantial. For entry level hiring at volume, the cost benefit equation rarely works. For senior positions or roles with high consequence, the investment can pay for itself many times over.

The Unresolved Debate: How Valid Are Tests for Jobs Really?

No honest account of research on job tests can avoid the fact that the field is in the midst of a significant methodological disagreement. For 25 years, one set of validity estimates was treated as the definitive word. In 2022, a major reanalysis argued that validity estimates had been systematically inflated through over correction for range restriction, and proposed revised, lower estimates. That work generated more than 16 published commentaries and multiple stand alone critiques.

Critics have raised several substantial objections. One 2023 response argued that concurrent studies are indeed affected by range restriction and that the assumption they are not was made without consulting existing empirical evidence. A 2024 critique described the revised approach as creating bias and confounds when ranking selection methods. The reanalysis authors published a 2025 rebuttal defending their position while acknowledging that their estimates represent a lower bound rather than a final answer.

The practical implication is that practitioners should treat all published validity estimates with appropriate caution. The true validity of any selection method for a specific job, in a specific organisation, is not a single number you can look up in a table. It is a range, shaped by how the test is designed, how the criterion is measured, the applicant population, and the statistical choices made in the research.

What the field does agree on is that some methods are consistently stronger than others. Structured interviews, cognitive ability tests, job knowledge tests, and conscientiousness measures sit reliably near the top. Unstructured interviews, graphology, and years of experience sit reliably near the bottom. The exact ordering of the top methods is genuinely uncertain, and any claim to a definitive ranking should be viewed with scepticism.
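One reason a validity figure is a range rather than a point is ordinary sampling uncertainty, which is easy to quantify. As a minimal illustration, the standard Fisher z method below computes a confidence interval for a validity coefficient; the correlation and sample sizes are hypothetical, not drawn from any study discussed here.

```python
import math

def validity_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)                  # Fisher z transform of r
    se = 1 / math.sqrt(n - 3)          # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical: the same observed validity of .30 at different sample sizes.
for n in (50, 200, 1000):
    lo, hi = validity_ci(0.30, n)
    print(f"n = {n:4d}: r = .30, 95% CI [{lo:.2f}, {hi:.2f}]")
```

A local study of fifty hires simply cannot pin a validity down to two decimal places, which is one more reason to distrust any table that reports selection method rankings with false precision.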

What This Means for Choosing Tests for Jobs

If you are responsible for hiring, these findings should reshape how you think about tests for jobs. The test your organisation relies on may not be as uniquely powerful as you were told. The combination of methods you use matters more than any individual tool.

The research consistently points toward one conclusion: no single test captures the full picture of future performance. A well designed structured interview, supplemented by a job knowledge test, a conscientiousness measure, or a work sample, will capture more of the variance than any method used alone. The specific combination should depend on the role, its complexity, and the behaviours that matter most for success in your context.
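The statistical logic behind combining methods is incremental validity: a second predictor earns its place only if it explains variance in performance that the first does not. The sketch below shows how that is typically checked, by comparing R-squared for the interview alone against R-squared for the interview plus a personality measure. The data are simulated and the effect sizes invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated standardised candidate scores; effect sizes are invented.
interview = rng.standard_normal(n)
conscientiousness = 0.3 * interview + rng.standard_normal(n)
performance = 0.5 * interview + 0.3 * conscientiousness + rng.standard_normal(n)

def r_squared(predictors: np.ndarray, outcome: np.ndarray) -> float:
    """R^2 from an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(outcome)), predictors])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    residuals = outcome - X @ beta
    return 1 - residuals.var() / outcome.var()

r2_one = r_squared(interview.reshape(-1, 1), performance)
r2_two = r_squared(np.column_stack([interview, conscientiousness]), performance)
print(f"Interview alone:         R^2 = {r2_one:.3f}")
print(f"Interview + personality: R^2 = {r2_two:.3f}")
print(f"Incremental variance:    {r2_two - r2_one:.3f}")
```

Because the two predictors are only mildly correlated, the second one adds variance the first misses; a second method that duplicates the first adds almost nothing, however valid it is on its own.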

Equally important: validity estimates are averages. What works across thousands of studies may not work for your specific role, organisation, or applicant pool. The best selection systems are grounded in careful job analysis, combine multiple methods, and are monitored against the outcomes that matter locally.

Key Takeaways

  1. Structured interviews have among the strongest and most consistent evidence of any selection method. Standardising the questions and scoring against a rubric is what makes them work.
  2. Cognitive ability tests predict job performance, particularly for complex and learning heavy roles. The precise magnitude of their validity is currently debated, but their usefulness is not in question.
  3. Conscientiousness is the strongest personality predictor of performance, supported by evidence spanning a century and more than one million participants. Its predictive power is stronger in structured, predictable roles and weaker in highly complex ones.
  4. Integrity tests reliably predict counterproductive behaviour such as rule breaking, theft, and absenteeism. Their validity for overall job performance is more modest and contested.
  5. Work sample tests remain useful, but updated evidence places their validity lower than originally reported, particularly outside manual and technical occupations.
  6. Situational judgement tests combine moderate predictive power with smaller group differences and greater perceived fairness, making them an increasingly popular choice.
  7. No single test should be used in isolation. The most effective selection systems combine two or three methods tailored to the specific demands of the role.

Implications for Practice: Building a Selection System With the Right Tests for Jobs

Start every selection process with a structured interview. The evidence supporting this method is strong, the practical barriers are low, and the improvement over unstructured approaches is substantial. Train your interviewers, standardise the questions around job relevant competencies, and score responses against a rubric. An unstructured conversation with a candidate is not an interview. It is a performance by both parties that predicts almost nothing.
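In practice, "score responses against a rubric" can be as simple as the following sketch. The questions, anchors, and aggregation scheme are hypothetical placeholders, not a validated instrument; the point is that every candidate is rated on the same questions, on the same scale, and the final score is a mechanical aggregate rather than an overall impression.

```python
from statistics import mean

# Hypothetical rubric: every candidate gets the same questions, each scored
# on a 1-5 behaviourally anchored scale by every interviewer.
RUBRIC_QUESTIONS = [
    "Describe a time you recovered a project that was behind schedule.",
    "How would you handle a customer who rejects your proposed solution?",
    "Walk us through how you would plan a multi-step delivery.",
]

def score_candidate(ratings: dict[str, list[int]]) -> float:
    """Average each question across interviewers, then across questions."""
    for question in RUBRIC_QUESTIONS:
        if any(not 1 <= r <= 5 for r in ratings[question]):
            raise ValueError(f"Rating outside the 1-5 anchors for: {question}")
    return mean(mean(ratings[q]) for q in RUBRIC_QUESTIONS)

candidate = {
    RUBRIC_QUESTIONS[0]: [4, 3],   # two trained interviewers per question
    RUBRIC_QUESTIONS[1]: [5, 4],
    RUBRIC_QUESTIONS[2]: [3, 3],
}
print(f"Structured interview score: {score_candidate(candidate):.2f}")
```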

Supplement the interview with at least one additional method matched to the role. For technical positions, a job knowledge test or work sample provides information the interview cannot capture. For roles where reliability, self discipline, and follow through are critical, a measure of conscientiousness adds genuine predictive power. For positions with high risk of counterproductive behaviour, an integrity test offers a meaningful signal.

Do not rely on a single validity ranking to choose your tools. The meta analytic debate is real, and the numbers that informed selection design for the past 25 years are being reexamined. Rather than anchoring on any one set of estimates, focus on the convergent finding: structured interviews, cognitive ability, conscientiousness, and job knowledge are consistently near the top regardless of which correction methodology you apply. Build your system around these methods and validate it locally.

Review the validity evidence behind every test you use. Do not accept vendor claims without scrutiny. Ask which meta analyses support the tool, whether the evidence has been challenged or updated, and what the results look like for your specific job family. If a vendor cannot point you to peer reviewed research supporting their product, that should tell you everything you need to know.

Monitor your system against real outcomes. The best evidence in the world is still an average. Whether your particular combination of tests predicts performance in your organisation, for your applicant pool, with your criteria, is an empirical question that only local data can answer. Track outcomes, evaluate results, and adjust. That is what evidence based hiring actually looks like.
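Local validation does not require sophisticated tooling. A minimal version, sketched below with invented numbers, is to correlate the selection scores of the people you actually hired with their later performance ratings.

```python
import numpy as np

# Invented data: selection test scores and later performance ratings
# for ten hired employees.
test_scores = np.array([62, 71, 55, 80, 68, 74, 59, 77, 66, 70])
ratings = np.array([3.1, 3.8, 2.9, 4.2, 3.5, 3.9, 3.0, 4.0, 3.4, 3.6])

local_r = np.corrcoef(test_scores, ratings)[0, 1]
print(f"Local validity estimate: r = {local_r:.2f}")
# Note: only hired candidates appear here, so the sample is range restricted
# and this observed r will understate validity across the full applicant pool.
```

Even this crude check, repeated annually with a growing sample, tells you more about your own system than any published table can.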

For more on the use of assessments in hiring, these resources may be helpful: Psychometric Tests: A Survival Guide provides a practical overview of what psychometric tests measure and how to prepare for them. 5 Best Practices For Implementing Pre Employment Testing offers guidance on building a testing programme that is both valid and legally defensible.


Memory Nguwi

Memory Nguwi is the Managing Consultant of Industrial Psychology Consultants (Pvt). With a wealth of experience in human resources management and consultancy, Memory focuses on assisting clients in developing sustainable remuneration models, identifying top talent, measuring productivity, and analyzing HR data to predict company performance. Memory's expertise lies in designing workforce plans that navigate economic cycles and leveraging predictive analytics to identify risks, while also building productive work teams. Join Memory Nguwi here to explore valuable insights and best practices for optimizing your workforce, fostering a positive work culture, and driving business success.
