The employment interview is the most widely used personnel selection method and, in its unstructured form, among the least valid. A substantial body of peer reviewed research associates the unstructured interview with modest criterion related validity, low interrater reliability, and pronounced susceptibility to perceptual and decisional bias. This review examines 11 documented weaknesses and the single evidence based remedy that addresses most of them.
Across most organizations the interview functions as the decisive stage of selection. Application screening determines who enters the process, and standardized tests are frequently regarded with caution, yet the interview is the point at which managers believe they form their most accurate judgment of a candidate. That belief receives limited support from the empirical literature. The attributes that lend the interview its apparent diagnostic value, namely its interpersonal and unscripted character, are precisely the attributes that render it vulnerable to influence and resistant to consistent scoring.
The caution extends to the validity estimates themselves. A recent reassessment of the selection literature revised the validity of widely used predictors, concluding that a longstanding correction for range restriction had overstated the apparent predictive power of several methods, the interview among them; this reanalysis is itself the subject of active scholarly debate, examined later in this review. The pertinent question for HR professionals and hiring managers is consequently not whether the interview exhibits weaknesses, but how consequential those weaknesses are and by what means they may be controlled.
Selection decisions shape the quality of the workforce, and selection error is costly to remediate through performance management, retraining, or turnover. A method that admits the biases documented below therefore imposes a direct organizational cost, which warrants subjecting the interview to the scrutiny applied to any instrument used in consequential decisions. The 11 weaknesses examined here are each grounded in peer reviewed evidence, and they are followed by the single intervention that addresses the majority of them.
The evidence summarized in this review derives predominantly from meta analyses, controlled experiments, and field studies published in peer reviewed journals, rather than from practitioner opinion or isolated observation. Where a conclusion rests on a particular study, its design and sample are indicated so that the reader may weigh the strength of the evidence accordingly. The intention is to distinguish findings that are well established across many studies from those that rest upon a more limited body of work, and to ground each recommendation in the former wherever possible.
A taxonomy of interview weaknesses
For the purposes of this review, a weakness of the employment interview is any characteristic that causes its judgments to diverge from a candidate's actual capacity to perform the role. The weaknesses examined here fall into three categories. Structural weaknesses derive from the absence of standardized questions and scoring. Perceptual weaknesses concern the systematic distortions in what an interviewer observes. Decisional weaknesses concern how observations are combined into a judgment. As the concluding sections establish, the introduction of structure mitigates all three categories at once.
The significance of any individual weakness is conditioned by context, including the base rate of suitable candidates and the volume of recruitment. In high volume hiring, small biases recur across many decisions and aggregate into a measurable effect on workforce quality; in senior appointments, a single distorted decision may be consequential in itself. In both circumstances, the interview is rarely the only instrument available, which renders its disproportionate influence over the final decision difficult to justify on the evidence.
1. Limited criterion related validity of the unstructured interview
The unstructured interview, in which questions are improvised and candidates are evaluated on global impression, ranks among the weakest predictors of subsequent job performance available to employers. A 1994 meta analysis of 245 validity coefficients obtained from 86,311 individuals established that structured interviews predicted performance at approximately twice the level of unstructured interviews. A 2023 reappraisal of the selection literature reached a convergent conclusion, locating the structured interview among the stronger available predictors while the unstructured form occupied a substantially lower position. The two formats are designated by a common term yet operate as distinct instruments, and the format in widest use is the less valid of the two. The distinction carries operational consequences, since the statement that an organization interviews its candidates conveys little about the validity of the process; that validity depends almost entirely on whether the questions and the scoring are fixed in advance.
In the interest of balance, the structured interview fares comparatively well even within these more conservative estimates. In the reappraisal cited above, it emerged as the predictor with the highest mean validity, which indicates that the downward revision bears more heavily upon other instruments, cognitive ability tests in particular, than upon the structured interview itself.
2. The primacy of initial impressions
Interviewers frequently arrive at a provisional evaluation before the substantive portion of the interview commences. An early study of structured interviews found that the impression formed within the opening minutes was strongly associated with the ratings recorded at the conclusion of the interview, and with the eventual decision to extend an offer. The informal exchange that opens the interview may therefore exert greater influence over the outcome than the deliberately constructed, job related questions that follow it. For the design of selection systems, this indicates that the least regulated segment of the interview can acquire disproportionate weight, and that refinement of the formal question set yields limited benefit where the initial impression is left unmanaged.
A qualification is warranted. Part of the early impression reflects genuine interpersonal skill, which constitutes a legitimate requirement in roles with substantial relational demands; in such positions, the influence of the initial impression is not wholly biased. It remains a source of error, however, wherever the role does not call for those skills, and the interviewer is rarely in a position to distinguish the two in the moment.
3. The persistence of initial impressions within structured formats
Standardization does not fully insulate the interview against this effect. A later experiment employing 163 simulated structured interviews demonstrated that impressions formed during the preliminary rapport stage carried forward into the scoring of the standardized questions, displacing ostensibly objective ratings toward the interviewer's initial reaction. Structured formats attenuate the effect without eliminating the influence of the interviewer, which establishes that standardization must extend to the conduct of the entire interaction rather than to the question set alone, and that interviewer training should explicitly address the rapport stage.
4. The undetectability of deceptive impression management
The majority of candidates engage in the deliberate management of the impression they convey, and a considerable proportion misrepresent themselves. Research distinguishing honest from deceptive impression management found that observers, including experienced interviewers, are unable to reliably distinguish the two. The prevalence of the behavior is substantial. A validation study conducted across six samples comprising more than 1,300 candidates reported that in excess of 90% engaged in some degree of faking during interviews, and that the proportion using tactics closer to outright misrepresentation ranged from approximately 28% to 75%, depending on the tactic employed. Because deception evades detection, fluent misrepresentation may be rewarded while candid disclosure is penalized, a pattern that undermines the common contention that an experienced interviewer can discern sincerity and strengthens the case for verifying material claims through independent means.
5. The influence of impression management on ratings
Impression management not only evades detection; it also affects the ratings themselves. A meta analysis of impression management tactics reported that these behaviors are strongly associated with interviewer ratings, and that across studies of student applicants between 77% and 99% employed ingratiation, the cultivation of interviewer liking. The candidate most adept at managing the interaction, rather than the most capable, tends to receive the most favorable evaluation. This constitutes a systematic advantage for socially skilled candidates that is unrelated to the requirements of most roles, and standardized scoring constrains the advantage without removing it. In positions for which interpersonal skill is itself a core requirement, such as sales or client-facing roles, the bias is of lesser concern; for the majority of roles, it introduces error rather than signal.
6. Low interrater reliability
A selection instrument must demonstrate reliability to be useful, and the unstructured interview does not do so. A meta analysis of 111 interrater reliability coefficients found that agreement between interviewers increases materially only when both the questions and the scoring are standardized. In its unstructured form, two interviewers evaluating the same candidate frequently reach divergent conclusions, which entails that the outcome depends in part upon the identity of the interviewer rather than upon the attributes of the candidate. Low reliability imposes a ceiling on validity, since an instrument on which assessors cannot agree cannot consistently track a stable criterion such as job performance; it also exposes the organization to inconsistent treatment of comparable candidates.
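The ceiling described above can be made concrete with the classical test theory bound, under which an observed validity coefficient cannot exceed the square root of the product of the reliability of the predictor and the reliability of the criterion. The following Python sketch is illustrative only: the ratings are hypothetical, the function names are introduced here for the example, and the correlation between two interviewers is used as a simple stand in for interrater reliability rather than as the estimator used in the cited research.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of at least two ratings")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)  # assumes neither rater gave identical scores to everyone


def validity_ceiling(predictor_reliability: float, criterion_reliability: float = 1.0) -> float:
    """Classical test theory bound: observed validity <= sqrt(r_xx * r_yy)."""
    return sqrt(predictor_reliability * criterion_reliability)


# Hypothetical ratings on a 1-5 scale from two interviewers who saw the same six candidates.
rater_a = [4, 3, 5, 2, 4, 3]
rater_b = [3, 4, 4, 2, 5, 2]

interrater_r = pearson(rater_a, rater_b)
ceiling = validity_ceiling(interrater_r)  # criterion treated as perfectly reliable here
print(f"interrater r = {interrater_r:.2f}, validity ceiling = {ceiling:.2f}")
```

Because the criterion is treated as perfectly reliable in this call, the printed ceiling is an optimistic upper bound; any unreliability in the performance measure lowers it further.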
7. Appearance effects on evaluation
Physical appearance influences interview judgments even where it bears no relation to the requirements of the role. A meta analysis of experimental studies found that more physically attractive candidates received more favorable evaluations across a range of job related outcomes, that experienced professionals were as susceptible to the effect as students, and that the provision of additional job relevant information did not eliminate it. The magnitude of the effect is modest but consistent, and it accumulates with the other perceptual biases described in this review. Because the effect operates largely outside conscious awareness, instructing interviewers to disregard appearance is insufficient, and structural controls such as evaluation against standardized criteria are more effective.
The effect is also not uniformly advantageous. A systematic review of attractiveness in labor markets found that, although attractiveness generally benefits men, it can disadvantage women applying for roles stereotyped as masculine, a pattern termed the "beauty is beastly" effect that has been observed for women in several studies and for men in none. Appearance therefore introduces not a single directional bias but a set of interacting biases involving gender and role type, which complicates any straightforward correction.
8. The overweighting of verbal fluency
The interview assesses conversational performance, which is not the central requirement for most positions. A mock interview study of 130 candidates found that the initial impressions governing interviewer judgments corresponded closely with extraversion and verbal fluency, even after candidates' qualifications had been controlled. Reserved but capable candidates are accordingly disadvantaged relative to fluent candidates who may possess no greater capacity to perform the work itself. Where a role does not require frequent verbal performance, the interview measures an attribute that is misaligned with the demands of the position and may systematically disadvantage otherwise qualified applicants.
9. The effect of anxiety on capable candidates
Interview anxiety depresses performance within the interview without necessarily reflecting performance in the role. The research that developed a recognized measure of anxiety in selection interviews found that higher anxiety was associated with weaker interview performance, and a 2018 meta analysis confirmed a consistent negative relationship between interview anxiety and interview performance across the available studies. Because anxiety in a high stakes evaluative setting is only loosely related to day to day effectiveness, the interview may exclude capable candidates on the basis of nervousness, with implications for both predictive accuracy and the perceived fairness of the assessment.
The magnitude of the relationship should be kept in proportion. The meta analytic association is modest rather than large, and it is more pronounced in mock interviews than in genuine high stakes selection, which counsels against overstating the effect even as it is taken seriously.
10. Demographic and accent differences in ratings
A meta analysis of interview evaluations found that Black and Hispanic applicants received lower ratings than White applicants, with the largest differences observed for lower complexity roles and for less structured formats. Related research indicates that a candidate's accent alone can affect ratings. For organizations relying upon the unstructured interview, these findings represent a material fairness concern and a source of legal exposure under equal employment legislation in many jurisdictions. The documentation of job relatedness and the standardization of the process, therefore, function as measures of legal defensibility as well as of assessment quality, particularly where adverse impact may be alleged.
These differences are attenuated, although not eliminated, when interviews are structured and tied to defined job related criteria, which constitutes a further argument for standardization on grounds of equity as well as of accuracy.
The differences should also be read in comparative terms. Subgroup differences in interview ratings are generally smaller than those produced by cognitive ability tests, so relative to certain alternatives a structured, job related interview can reduce rather than increase adverse impact, a consideration relevant to the design of the wider selection system.
11. The limitations of intuitive combination
Even when an organization assembles sound information, the interview encourages decision makers to override that information with intuition. A meta analysis of more than 150 studies found that the combination of candidate information by a fixed formula predicted performance substantially better than its combination through holistic human judgment, improving prediction by more than 50%, and that the advantage obtained even among experienced experts familiar with the role. Intuition retains value in the generation of hypotheses concerning a candidate, but as the terminal method of combining evidence it diminishes accuracy. The superiority of mechanical over holistic combination is among the more robust findings in the selection literature and has been observed across both employment and academic admissions contexts.
The advantage of mechanical combination, although consistent, is not absolute. A broader meta analysis of clinical against mechanical prediction across psychology and medicine found mechanical methods to be superior or equal in the large majority of cases, yet human judgment proved substantially more accurate in a minority, on the order of one study in ten, typically where a rare and decisive piece of information was available that no formula anticipated. Mechanical combination is therefore best treated as the default rather than as an inviolable rule, with documented provision for human override in genuinely exceptional cases.
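A minimal Python sketch of mechanical combination as the default, with a documented override, may help to fix the idea. The weights, the cutoff, the predictor names, and the `Decision` record are all hypothetical and chosen for illustration; in practice the weights would be set in advance from validation evidence, and the inputs are assumed to be standardized scores on a common scale.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical weights, fixed before any candidate is assessed.
WEIGHTS: Dict[str, float] = {
    "structured_interview": 0.5,
    "work_sample": 0.3,
    "ability_test": 0.2,
}


@dataclass
class Decision:
    candidate: str
    composite: float
    advance: bool
    override_reason: Optional[str] = None  # populated only when the formula was overridden


def combine(scores: Dict[str, float]) -> float:
    """Weighted sum of standardized assessment scores using the fixed weights."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def decide(candidate: str, scores: Dict[str, float], cutoff: float = 0.0,
           override: Optional[bool] = None,
           override_reason: Optional[str] = None) -> Decision:
    """Apply the formula by default; allow an override only with a recorded reason."""
    composite = combine(scores)
    if override is not None:
        if not override_reason:
            raise ValueError("an override requires a documented reason")
        return Decision(candidate, composite, override, override_reason)
    return Decision(candidate, composite, composite >= cutoff)


default = decide("candidate_a", {"structured_interview": 1.0,
                                 "work_sample": 0.0,
                                 "ability_test": -1.0})
print(default)
```

The design choice illustrated is that the formula result is always computed and stored, so that any override remains visible alongside the score it displaced and can be audited later.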
These weaknesses are not independent of one another. A candidate who generates a favorable initial impression through fluency and appearance benefits concurrently from several biases; the interviewer's inability to detect impression management permits that advantage to proceed unexamined; and reliance upon intuitive combination allows it to enter the final decision. Within the unstructured interview, these effects compound. Standardization interrupts them at several points at once, which accounts for the capacity of a single intervention to address a range of otherwise distinct problems.
Counterpoints and scholarly disagreement
An evidence review is incomplete without attention to the disputes within the evidence itself. Three qualifications deserve particular emphasis, the first of which bears directly upon the validity estimates relied upon throughout this article.
The reanalysis cited at the outset, which revised the apparent validity of several predictors downward, has been contested by other scholars who argue that it does not correct sufficiently for range restriction and therefore understates true validity rather than overstating it. The dispute is unresolved. Its direction is nonetheless instructive, since if the critics are correct the validity of the structured interview, in common with that of other well constructed predictors, is higher than the conservative figures imply, which would strengthen rather than weaken the case for a well designed interview. It is further notable that the structured interview ranked first among predictors in the reanalysis under challenge, so neither position in the debate supports abandoning it.
The second qualification concerns the interpretation of the perceptual biases. Several of the effects documented above are modest in magnitude, and some carry a degree of valid signal. The early impression that influences interviewer judgment reflects, in part, genuine interpersonal skill that certain roles require, and the social facility underlying favorable impression management is itself job relevant where the work is relationship intensive. The biases are real and consequential in aggregate, but they are neither uniformly large nor uniformly irrelevant, and the boundary between extraneous bias and job relevant signal depends upon the demands of the specific role.
The third qualification concerns the combination of evidence. The superiority of mechanical over holistic combination, although well established, admits exceptions, and human judgment retains an advantage in a minority of cases involving rare and decisive information. The appropriate inference is that mechanical combination should function as the default, accompanied by structured and documented provision for human override, rather than as an absolute prohibition on judgment.
None of these qualifications overturns the central conclusion; each refines it. The weight of the evidence continues to favor a structured and scored interview used in combination with other valid instruments, and the principal scholarly disputes, to the extent that they bear upon the interview at all, tend to raise its estimated value rather than to lower it.
Why the interview persists despite the evidence
The interview endures for reasons that are organizationally rational, notwithstanding its psychometric limitations. It is inexpensive and familiar, and it serves functions beyond prediction, among them the attraction of preferred candidates and the reciprocal exchange of information through which both parties assess fit. A structural feature additionally sustains interviewer overconfidence. Interviewers seldom observe the subsequent performance of rejected candidates, because such candidates are not engaged, and in the absence of this feedback, an interviewer may conduct a large number of interviews without discovering that particular judgments were mistaken. The impression of diagnostic insight consequently persists without correction.
These functions are legitimate and account for the continued presence of the interview in most selection processes. The deficiency lies not in the use of the interview but in the weight assigned to its least reliable form, and in the assumption that an unstructured conversation constitutes rigorous assessment. Whether the interview predicts performance is therefore conditional upon its format. A structured interview that administers identical job related questions to every candidate and scores them against a defined rubric constitutes a reasonable predictor; an improvised interview does not. Standardization accounts for the majority of the difference, which is the basis for treating the interview as two distinct methods rather than as one.
Remediation through structure
The majority of the weaknesses enumerated above diminish substantially when the interview is structured and scored according to defined rules rather than according to global impression. Structure operates by withdrawing interviewer discretion at the junctures at which bias enters. A fixed question set prevents the conversation from favoring particular candidates; a scoring rubric directs attention to the response rather than to the person; and independent scoring prior to discussion limits the influence of the most senior or most assertive member of a panel. Each design feature corresponds to a specific weakness, with fixed questions and rubrics raising reliability and validity, standardized criteria narrowing demographic differences, and independent scoring constraining both the contamination of ratings by initial impressions and the dominance of senior voices in the final determination.
The supporting evidence is consistent. Structuring the interview approximately doubles its predictive validity, increases agreement between interviewers, and narrows the demographic differences that widen within unstructured formats. A review of the structured interview literature that integrated narrative and quantitative analysis concluded that the addition of structure reliably improves both validity and fairness, and identified the specific design features responsible. The combination of the interview score with other assessment results through a predetermined rule, rather than through holistic discussion, preserves these gains. Structure also clarifies the contribution the interview makes alongside other instruments, which is the relevant consideration when it forms one element of a multi method selection process. The implementation sequence is well established:
- Establish the competencies required by the role through formal job analysis rather than assumption.
- Develop a standardized set of job related questions and administer them, unchanged, to every candidate.
- Construct a scoring rubric incorporating defined behavioral anchors for strong, adequate, and weak responses.
- Require each interviewer to score independently and to record those scores before any panel discussion.
- Combine interview scores with other assessment data by a predetermined rule rather than by consensus.
These measures are modest in cost relative to the consequences of selection error. The investment lies principally in design and in interviewer training, after which a structured interview requires no more time to administer than an unstructured one while yielding more accurate and more defensible decisions.
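The scoring steps in the sequence above can be sketched in Python as follows. The question identifiers, the five point anchored scale, and the `PanelScoreSheet` class are assumptions made for illustration, not a description of any particular system; the point of the sketch is only that every interviewer's ratings are recorded and locked before panel discussion begins.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical question identifiers derived from a job analysis.
QUESTIONS = ["q1_prioritization", "q2_conflict"]
# Hypothetical anchored scale: 1 = weak, 3 = adequate, 5 = strong.
VALID_RATINGS = {1, 2, 3, 4, 5}


class PanelScoreSheet:
    """Collects independent ratings and refuses changes once discussion opens."""

    def __init__(self, interviewers: List[str]) -> None:
        self.interviewers = set(interviewers)
        self.scores: Dict[str, Dict[str, int]] = {}
        self.discussion_open = False

    def submit(self, interviewer: str, ratings: Dict[str, int]) -> None:
        if self.discussion_open:
            raise RuntimeError("scores must be recorded before discussion")
        if interviewer not in self.interviewers:
            raise ValueError(f"unknown interviewer: {interviewer}")
        if set(ratings) != set(QUESTIONS):
            raise ValueError("every question must be rated, and no others")
        if not set(ratings.values()) <= VALID_RATINGS:
            raise ValueError("ratings must use the anchored scale")
        self.scores[interviewer] = dict(ratings)

    def open_discussion(self) -> None:
        missing = self.interviewers - set(self.scores)
        if missing:
            raise RuntimeError(f"awaiting scores from: {sorted(missing)}")
        self.discussion_open = True

    def question_means(self) -> Dict[str, float]:
        """Average the independent ratings for each question."""
        return {q: mean(s[q] for s in self.scores.values()) for q in QUESTIONS}


sheet = PanelScoreSheet(["ana", "ben"])
sheet.submit("ana", {"q1_prioritization": 4, "q2_conflict": 3})
sheet.submit("ben", {"q1_prioritization": 5, "q2_conflict": 3})
sheet.open_discussion()
means = sheet.question_means()
print(means)
```

Averaging is only one possible aggregation rule; whichever rule is adopted, the sketch assumes it is fixed in advance and applied identically to every candidate.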
Design considerations for the structured interview
Two question formats predominate. Situational questions present a hypothetical, job related scenario and elicit the candidate's intended response; behavioral questions request an account of the candidate's conduct in a relevant past situation. Both formats outperform unstructured questioning when responses are scored against defined criteria, and the structured interview review cited above documents their comparative performance. The selection between them depends upon the role and upon whether prospective reasoning or past behavior constitutes the more relevant indicator of the competency under assessment.
Several further choices strengthen the method. The consistent ordering of questions, the restriction of prompts and follow up enquiries to a defined set, the use of multiple trained interviewers, and the recording of judgments against anchored rating scales each contribute to reliability and validity. Systematic note taking supports accurate scoring and furnishes the documentation required should a decision subsequently be reviewed. Panel interviews improve reliability when members score independently but undermine it when a single dominant view governs the group rating, which is the reason that independent scoring prior to discussion is a requirement rather than a refinement.
Implications for HR professionals and hiring managers
The principal implication is that the interview should be conducted as a structured, scored procedure rather than as an open conversation. Organizations are advised to derive interview questions from a current job analysis, standardize those questions across candidates, attach a defined scoring rubric, and require independent scoring prior to discussion. Taken together as a single intervention, these steps address limited validity, low reliability, contamination by initial impressions, and significant demographic bias concurrently, and they demand investment in process discipline rather than additional expenditure.
Decision rules merit equivalent attention. A favorable initial impression should not be treated as evidence of competence, and interview scores should be combined with other assessment data by a predetermined rule rather than through open deliberation, in which the most senior participant tends to prevail. Where decisions carry substantial consequence, valid instruments such as work samples and structured assessments should be accorded at least equal weight rather than permitted to be overridden by the interview. A documented decision rule additionally establishes an audit trail that supports internal review and external scrutiny, and it reduces the scope for the inconsistent application of judgment across candidates.
Fairness and legal risk warrant specific governance. Because unstructured interviews generate demographic differences in ratings and admit accent and appearance effects, organizations operating across multiple jurisdictions should document the job relatedness of their interview criteria, train interviewers in structured scoring, and retain interview records sufficient to support an adverse impact review. Structured, job related interviews are at once more predictive and more defensible.
Interviewer capability requires deliberate development rather than reliance upon experience. Training that familiarizes interviewers with the rubric, affords practice in the scoring of recorded responses, and calibrates ratings across assessors improves consistency. Interviewer training also featured as a moderator of agreement within the reliability evidence cited above, which indicates that the competence of the assessor, and not only the design of the instrument, determines the quality of the result. Selection quality should likewise be monitored rather than assumed. Where practicable, organizations should track interview scores against subsequent performance and retention, examine outcomes for differences across demographic groups, and apply the findings to refine questions and rubrics over time. The treatment of the interview as a measurement instrument subject to continuing validation, rather than as a fixed managerial prerogative, distinguishes a mature selection function.
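The monitoring described above can be illustrated with a short Python sketch. All data below are hypothetical, and the function names are introduced here for the example. Two assumptions should be noted: the correlation is computed among hired candidates only, so it is subject to the range restriction discussed earlier in this review, and the ratio of selection rates is offered as a simple screening statistic, with any threshold applied to it (such as the four fifths rule of thumb used in United States guidance) depending on the jurisdiction.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between interview scores and a later criterion."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of at least two values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def selection_rates(outcomes: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of applicants selected within each group."""
    totals: Dict[str, int] = {}
    selected: Dict[str, int] = {}
    for group, was_selected in outcomes:
        totals[group] = totals.get(group, 0) + 1
        selected[group] = selected.get(group, 0) + int(was_selected)
    return {group: selected[group] / totals[group] for group in totals}


def impact_ratio(rates: Dict[str, float]) -> float:
    """Lowest group selection rate divided by the highest."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no candidates were selected in any group")
    return min(rates.values()) / highest


# Hypothetical interview scores and later performance ratings for five hires.
interview_scores = [3.0, 4.5, 2.5, 4.0, 3.5]
performance = [2.8, 4.1, 3.0, 3.9, 3.2]
observed_validity = pearson(interview_scores, performance)

# Hypothetical applicant outcomes: group A, 4 of 10 selected; group B, 3 of 10.
outcomes = ([("A", True)] * 4 + [("A", False)] * 6
            + [("B", True)] * 3 + [("B", False)] * 7)
rates = selection_rates(outcomes)
ratio = impact_ratio(rates)
print(f"observed validity = {observed_validity:.2f}, impact ratio = {ratio:.2f}")
```

With samples as small as these the figures are unstable and serve only to show the calculation; an operational review would require far larger samples and appropriate statistical tests.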
The interview will continue to occupy a central position in hiring. The evidence indicates that its value is determined almost entirely by how it is structured and how its results are combined with other information, both of which lie within the organization's control rather than the individual interviewer's.
Key takeaways
- The weaknesses of the employment interview are concentrated in the unstructured format, which predicts job performance weakly and yields low agreement between interviewers.
- Interviewers commonly form a provisional judgment within the opening minutes, and that initial impression influences the scoring of the structured questions that follow.
- Most candidates engage in impression management and a considerable proportion misrepresent themselves, and interviewers cannot reliably distinguish honest from deceptive presentation.
- Physical appearance, verbal fluency, anxiety, and demographic and accent differences each affect interview ratings independently of job related ability.
- Holistic combination of candidate information is less accurate than a fixed scoring formula, which improved prediction of performance by more than 50% in a large meta analysis.
- Structuring the interview through standardized questions, defined rubrics, and independent scoring addresses the majority of these weaknesses concurrently.
- Structured, job related interviews are also more defensible against fairness and legal challenge than unstructured conversations.