Validity and Reliability of The Strong Interest Inventory®

Like any psychological assessment or survey instrument, the Strong Interest Inventory® Assessment and its four subcategories have been carefully constructed. Each component has been normed so that job-seekers can compare their own results to those of the general population. Each has also been rigorously tested to ensure that it is both valid and reliable, even when individuals retake the assessment years later. Understanding how each component is constructed gives job-seekers and career coaches a foundation for interpreting results accurately and ultimately applying them to achieve better vocational outcomes. In the following discussion, we review the norming, validity, and reliability of each of the four subcategories of the Strong Interest Inventory® Assessment, as well as some strategies and best practices for overcoming challenges in interpretation.


General Occupational Themes: Norming, Reliability, and Validity

In order to accurately interpret one’s results, it is helpful to have a basic understanding of how the General Occupational Themes (GOTs) are constructed, including how they are normed as well as how experts can ensure their reliability and validity.

The construction of the GOTs is simple in principle. Each item is assigned to a theme, and the person’s response determines how much that item adds to or subtracts from the theme’s raw score. For example, if a person responds “Strongly Dislike” to the item Carpenter, which is weighted on the Realistic theme, then this person’s raw score for Realistic decreases by 2. The other responses carry their own weights (“Dislike” = -1; “Indifferent” = 0; “Like” = +1; “Strongly Like” = +2).
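
The weighting described above can be sketched in a few lines of Python. This is only an illustration: the function name and the item list are hypothetical (the text names only the Carpenter item), and the actual item-to-theme assignments belong to the published instrument.

```python
# Response weights as stated in the text above.
RESPONSE_WEIGHTS = {
    "Strongly Dislike": -2,
    "Dislike": -1,
    "Indifferent": 0,
    "Like": 1,
    "Strongly Like": 2,
}

def raw_theme_score(responses, theme_items):
    """Sum the response weights for the items assigned to one theme.

    responses: dict mapping item name -> response label
    theme_items: item names assigned to the theme (hypothetical list here)
    Items the person skipped are simply left out of the sum.
    """
    return sum(
        RESPONSE_WEIGHTS[responses[item]]
        for item in theme_items
        if item in responses
    )

# Hypothetical Realistic items; only "Carpenter" appears in the text.
realistic_items = ["Carpenter", "Auto Mechanic", "Forest Ranger"]
answers = {
    "Carpenter": "Strongly Dislike",
    "Auto Mechanic": "Like",
    "Forest Ranger": "Indifferent",
}
print(raw_theme_score(answers, realistic_items))  # -2 + 1 + 0 = -1
```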

These raw scores are normed based on a General Representative Sample (GRS) of 2,250 people (50% men, 50% women). Norming is a statistical process for determining the typical performance of a group. This “norm” is then used to convert raw scores to standard scores (Mean = 50; Standard Deviation = 10) to better understand how the performance of a given individual compares to that of the group. This way, individuals can determine how strong their preferences are, compared to others in the population.
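
As a rough sketch of the conversion described above, a raw score can be rescaled so that the norm group has a mean of 50 and a standard deviation of 10. The norm-group mean and standard deviation used below are made-up numbers for illustration, not values from the General Representative Sample.

```python
def to_standard_score(raw, norm_mean, norm_sd):
    """Rescale a raw score to the Mean = 50, SD = 10 metric.

    norm_mean and norm_sd are the raw-score mean and standard deviation
    of the norm group (hypothetical values in the example below).
    """
    return 50 + 10 * (raw - norm_mean) / norm_sd

# A raw score one standard deviation above the norm-group mean maps to 60.
print(to_standard_score(12, norm_mean=4.0, norm_sd=8.0))  # 60.0
```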

The GOTs have been tested repeatedly for reliability. People typically get similar results on these scales regardless of which version of the inventory they take (recall that it has been revised periodically since its initial release in the early 20th century) and regardless of whether they retake it months or even years later. Numerous studies over the last three decades have also tested the validity of the GOTs and have found that they are the strongest measure of the personality traits on the RIASEC hexagon and that they are equally valid for women and men.

Basic Interest Scales: Norming, Reliability, and Validity

The Basic Interest Scales (BISs) were developed by grouping items that correlate with each other and share similar content. For example, interests like “Literature”, “Foreign Languages”, and “Journalism” are grouped together under “Writing & Mass Communication”. In the assessment, individuals rate their interest in each individual item (e.g., “Strongly Like”, “Like”, “Indifferent”, “Dislike”, “Strongly Dislike”). Then, their responses are quantified so they can be more easily compared to a general norm.

The established norm uses the 2004 General Representative Sample (GRS) as a reference, with a mean of 50 and a standard deviation of 10. Individuals’ scores are converted to this distribution, so it is easy to evaluate how an individual compares to the population as a whole. Note that these norms do account for gender, so a woman’s responses are normed against those of other women and a man’s responses are normed against those of other men. For example, if a female individual has an adjusted score of 70 on a given interest, then their score is two standard deviations above the mean for all women. If scores are roughly normally distributed, that score is higher than about 97.7% of all female respondents, which indicates a very high proclivity for that interest. On the other hand, if their score is 45, then it is only half a standard deviation below the mean, so their interest is close to average compared to the total population of female respondents.
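
To make the link between a standard score and a percentile concrete, here is a minimal sketch that assumes scores in the norm group follow a normal distribution. That assumption is ours, not a claim from the manual, and the function name is hypothetical.

```python
from math import erf, sqrt

def percentile_from_standard_score(score, mean=50.0, sd=10.0):
    """Approximate percentile rank of a standard score.

    Assumes a normal distribution with the given mean and SD;
    real norm-group distributions may deviate from this.
    """
    z = (score - mean) / sd
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))

print(round(percentile_from_standard_score(70), 1))  # 97.7
print(round(percentile_from_standard_score(45), 1))  # 30.9
```

Under this assumption, a score of 70 sits near the 98th percentile, while a score of 45 sits only modestly below the middle of the distribution.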

Researchers have also examined “test-retest reliability”, that is, how consistent individuals’ scores are across multiple administrations, even months or years apart. They have found that for adults, test-retest reliability is very high: most adults give consistent answers and receive consistent scores over time. While the BISs may be less reliable when used with younger individuals (r = .56 with high school students; r = .68 with college students; compared with r = .82 for adults), these correlations are still high enough to warrant using the BISs with young adults. This variation is expected, not because of any issue with the metric itself, but because students’ interests commonly change during these formative years as their horizons broaden and their range of experience widens.
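
A test-retest correlation such as the r values above is typically a Pearson correlation between the scores from the first and second administrations. The sketch below computes it by hand; the two score lists are invented for illustration and are not data from any study of the inventory.

```python
from math import sqrt

def pearson_r(first, second):
    """Pearson correlation between paired scores from two administrations."""
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    ss_a = sum((a - mean_a) ** 2 for a in first)
    ss_b = sum((b - mean_b) ** 2 for b in second)
    return cov / sqrt(ss_a * ss_b)

# Hypothetical standard scores for five people, tested twice.
time_1 = [42, 55, 61, 48, 70]
time_2 = [45, 53, 64, 50, 66]
print(round(pearson_r(time_1, time_2), 2))
```

Values near 1 mean that people keep roughly the same rank order between administrations; values near 0 mean the two sets of scores are unrelated.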

Similarly, dozens of studies over several decades have demonstrated that the BISs are valid for distinguishing among groups, including those in or seeking to pursue various professions (e.g., musicians score similarly to other musicians; chemists to other chemists, etc.). Similar distinctions are also evident among pre-existing groups. For example, the responses of women and men differ systematically, which is why norming is gender-dependent.


Occupational Scales: Norming, Reliability, and Validity

The process for developing an Occupational Scale (OS) was devised by E. K. Strong and involved identifying how the responses of members of a particular occupation differ from responses of a general sample of employed adults. For example, if members of a given occupation respond “Strongly Like” to an item like “Computer Science” more than the general population, then liking computer science becomes an identifying feature of that group—a way of distinguishing them from the general population.
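
The idea of picking out items that separate an occupation from the general sample can be sketched as a comparison of endorsement rates. This is a simplified illustration, not Strong’s actual procedure: the function names, the sample data, and the `min_gap` cutoff are all hypothetical.

```python
def endorsement_rate(responses, label="Strongly Like"):
    """Fraction of respondents who gave `label` to one item."""
    return sum(1 for r in responses if r == label) / len(responses)

def distinguishing_items(occupation, general, min_gap=0.15):
    """Items whose endorsement rate differs between the two samples.

    occupation, general: dict mapping item name -> list of response labels
    min_gap: hypothetical cutoff for a difference worth keeping
    Returns a dict of item -> rate difference (occupation minus general).
    """
    selected = {}
    for item in occupation:
        gap = endorsement_rate(occupation[item]) - endorsement_rate(general[item])
        if abs(gap) >= min_gap:
            selected[item] = round(gap, 2)
    return selected

# Invented responses from ten people in each sample.
occupation = {
    "Computer Science": ["Strongly Like"] * 6 + ["Like"] * 4,
    "Gardening": ["Strongly Like"] * 2 + ["Dislike"] * 8,
}
general = {
    "Computer Science": ["Strongly Like"] * 2 + ["Indifferent"] * 8,
    "Gardening": ["Strongly Like"] * 2 + ["Like"] * 8,
}
print(distinguishing_items(occupation, general))  # {'Computer Science': 0.4}
```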

Once the necessary items have been determined and appropriately weighted, the next step is to “norm the scales”, that is, to adjust the raw scores using a conversion formula that yields a mean of 50 and a standard deviation of 10. This standardizes the comparison between an individual’s score and the general population, so one can easily determine whether one’s score on a given scale is higher than, lower than, or comparable to that of the general population.

Equally important is ensuring the reliability and validity of the OSs. Tests are reliable if their results are stable over time. In other words, an individual’s results should remain relatively consistent if they retake the assessment a few weeks, months, or even years later. Studies have reported test-retest correlations that average well over .72 and reach above .90, which indicates a high degree of reliability. Other studies have examined the concurrent validity of the OSs, that is, their ability to discriminate between two groups of people. These studies found that the OSs have high validity for tightly defined, distinct fields, such as Surgeons, Medical Illustrators, and Athletic Trainers, and lower validity for those that are less well-defined, such as Administrative Assistants or Paralegals.

Personal Style Scales: Norming, Reliability, and Validity

Norming the Personal Style Scales (PSSs) was done similarly to the other scales. A General Representative Sample (GRS) of 2,250 individuals, consisting of 50% women and 50% men, was used as a reference to convert raw scores to normed scores with a mean of 50 and a standard deviation of 10. Individuals can then use their normed scores to gauge where they fall with respect to the general population.

The PSSs have been evaluated based on their internal consistency and their test-retest reliability. Internal consistency for all five scales is fairly high (alpha = 0.82-0.87, depending on the scale). Test-retest reliability was measured using a sample of 174 participants, some of whom were retested 2-7 months later while others were retested 8-23 months later. In both cases, the general stability was good, with the Learning Environment and Work Styles scales having the highest correlation (0.86-0.91). While the Team Orientation correlation was lower—between 0.70 and 0.77—it is still high enough to suggest reasonable reliability.
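
The alpha values above are internal-consistency coefficients, which are commonly computed as Cronbach’s alpha. The sketch below shows the standard formula; it assumes that is the statistic being reported, and the item scores are invented for illustration.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a scale.

    item_scores: list of items, each a list of scores with one entry per
    respondent (same respondent order in every item).
    """
    k = len(item_scores)
    # Total scale score for each respondent.
    totals = [sum(person) for person in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical responses: three items, five respondents.
items = [
    [2, 1, 0, -1, -2],
    [2, 0, 1, -1, -2],
    [1, 1, 0, -2, -1],
]
print(round(cronbach_alpha(items), 2))
```

Alpha approaches 1 when the items rise and fall together across respondents, which is what a coherent scale should show.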

Further studies evaluated the validity of the PSSs by examining how they correlate with the Occupational Scales. These correlations support the validity of the PSSs. For example, people who score high on the Elementary School Teacher Occupational Scale are also likely to score high on the “works with people” Personal Style Scale. On the other hand, those who score high on the Chemist scale are likely to score low on the “works with people” scale and high on the “works with ideas/data/things” scale. These patterns appear again and again across multiple scales.

Strategies and Challenges in Interpretation of The Strong Interest Inventory®

Accurate interpretation is key to applying the results of the Strong Interest Inventory® assessment. As a result, the assessment is most effective when the client understands the tool’s strengths and application, the career professional is an expert interpreter, and these two individuals thoroughly and honestly discuss career interests. While each relationship and individual will have slightly different needs, there are some commonalities that apply across cases. In this section, we will first overview a number of general strategies and best practices. Then, we will examine ways of addressing specific challenges.

Strategy 1: Before the assessment is administered, it is important to lay a strong foundation and prepare the job-seeker by helping them understand what the inventory is and how it works. For example, they should be informed that the inventory will help them make career decisions by shedding light on patterns in their personal interests as well as how those patterns map onto those of professionals in many different careers. The questions touch on many different jobs, interests, activities, and more. It is not a test of their ability in any given field, but rather a way of measuring how much they like each item. They should also be told not to overthink their answers: there are no right or wrong answers, and the first “gut feeling” response is generally the best. Note that while the test can be used with any age group (though it is not generally administered to youth younger than 13 years old), it is written at roughly a 9th-grade reading level. Some careers, such as “Actuary” or “Radiologic Technologist”, may not be familiar to all job-seekers, particularly those who are not proficient in English. Because the Inventory is not a graded test or a measure of aptitude, it is permissible to offer definitions or explanations for these items. Providing this kind of guidance will make the Profile scores more meaningful than if the job-seeker simply guesses the meaning of unknown terms.

Strategy 2: Career professionals should study the Profile before their client arrives for the interpretation session. You should understand how the scales relate to one another and should check the Profile for consistency by examining Section 6, the typicality index. If there is an unusually high number of atypical responses, then the results may not be valid, for instance because the person struggled to understand many items or filled out certain parts randomly. You should also confirm that at least 276 of the 291 items had responses and look for patterns among any skipped items. If you find the assessment is valid, you can move on to identifying any parts of it that are overly flat or elevated. You should also note consistencies between the various sections. Use your analysis to develop hypotheses about the client’s interests to discuss with them during the interpretation session.
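
The response-count check mentioned above is easy to automate if the responses are available in electronic form. The sketch below is a hypothetical helper, not part of any official scoring software; it covers only the 276-of-291 rule from the text and does not attempt to reproduce the typicality index.

```python
TOTAL_ITEMS = 291    # number of items, as stated in the text
MIN_ANSWERED = 276   # minimum answered items, as stated in the text

def check_completeness(responses):
    """Check whether enough items were answered.

    responses: list with one entry per item, None for a skipped item.
    Returns (enough_answered, skipped_item_numbers), where item numbers
    are 1-based so they can be compared against the booklet.
    """
    skipped = [i + 1 for i, r in enumerate(responses) if r is None]
    answered = len(responses) - len(skipped)
    return answered >= MIN_ANSWERED, skipped

# Hypothetical response sheet with two skipped items.
sheet = ["Like"] * TOTAL_ITEMS
sheet[9] = None
sheet[199] = None
print(check_completeness(sheet))  # (True, [10, 200])
```

Returning the skipped item numbers, rather than just a count, makes it easier to look for patterns among the skipped items.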

Strategy 3: Organize your discussion with your client in three parts: Introduction, Interpretation, and Exploration. During the Introduction phase, start by reviewing the purpose of the Inventory and reiterate that it is designed to measure interests, not abilities. Emphasize that it explores general interests, personal and lifestyle interests, and specific occupational interests. Then, discuss how the client felt about taking the Inventory: was the process interesting? Frustrating? Challenging? You may even want to ask about their experience on the day of the assessment, as any stressful events (having a doctor’s appointment that morning, or even getting into a car accident on the way to the assessment) may have skewed their responses. Finally, be sure to emphasize that the client is the expert on their own preferences. As they review their results, have them think about why they answered the way they did and what experiences may have influenced their responses.

During the Interpretation phase, you may want to start with a brief explanation of the six General Occupational Themes and how they relate to one another. Then, discuss each section of the Inventory. Define each Theme (section one) and explain any interests that seem to align with or oppose that Theme. Then, discuss the pattern of likes and dislikes in the Basic Interest Scales, the Occupational Scales, and the Personal Style Scales.

In the last phase, Exploration, focus on the Theme code that best represents the client’s profile and consider additional vocational possibilities that might be a good fit for them.

Strategy 4: Look for patterns among the various scales. Most of the time, the profiles will be “consistent”. In other words, the GOTs, BISs, and OSs will all point to the same general interest areas. You should discuss each of these scales in turn and examine how they could support your client.

Challenge 1: The first challenge you may encounter is a “Flat Profile”: one with many “Little” or “Very Little” interest scores and little differentiation among the Occupational Scales. Job-seekers may have “flat” profiles for many reasons, including indecisiveness, low self-esteem, an underdeveloped vocational identity, and even family or peer pressure. Once the reason for the flat profile is identified, it should be addressed with vocational counseling, coaching, and possibly therapy.

Challenge 2: Another interpretive challenge is an “Elevated Profile”. This is the opposite of a “Flat Profile” and characterizes profiles with many “Like” or “Strongly Like” responses. Some job-seekers may have “elevated” profiles because they want to seem positive or want to please everyone. In these cases, it may be helpful to start with the 2-3 highest GOTs, then move to the 5 highest BISs and the 10 highest OSs. The Profile Summary is particularly useful for these job-seekers. In other, rarer cases, job-seekers may have highly diverse talents and interests. In these cases, counseling that explores each option or opportunity may be the best approach.
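
Narrowing an elevated profile to its highest scales amounts to sorting the scores and keeping the top few. The sketch below uses invented GOT scores and a hypothetical helper name; the same function could be applied to BIS or OS scores with a different `n`.

```python
def top_scales(scores, n):
    """Return the n highest-scoring (name, score) pairs, highest first."""
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:n]

# Hypothetical standard scores for an elevated profile.
got_scores = {
    "Realistic": 61,
    "Investigative": 66,
    "Artistic": 63,
    "Social": 64,
    "Enterprising": 58,
    "Conventional": 60,
}
print(top_scales(got_scores, 3))
# [('Investigative', 66), ('Social', 64), ('Artistic', 63)]
```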

Challenge 3: The last common challenge arises when clients have high scores on opposite GOTs (Realistic and Social, Investigative and Enterprising, or Artistic and Conventional). Clients with these results may feel conflicted about their long-term career options or may have trouble reconciling diverse interests. Some options for resolving these apparent conflicts include counseling clients to choose one interest for their vocation and the other for personal hobbies, or performing a job activity associated with one interest while in an environment that reflects the other (e.g., working in the technical or mechanical aspects of music or theater production).
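
The three opposite pairs listed above fall out of the hexagon ordering Realistic, Investigative, Artistic, Social, Enterprising, Conventional: each theme’s opposite sits three positions away. The sketch below flags opposite pairs that are both high; the cutoff of 60 and the example scores are assumptions for illustration, not thresholds from the manual.

```python
# Themes in hexagon order (RIASEC).
HEXAGON = ["Realistic", "Investigative", "Artistic",
           "Social", "Enterprising", "Conventional"]

def opposite_theme(theme):
    """Return the theme directly across the hexagon."""
    i = HEXAGON.index(theme)
    return HEXAGON[(i + 3) % 6]

def conflicting_pairs(scores, high=60):
    """List opposite theme pairs where both standard scores are >= high.

    `high` is a hypothetical cutoff chosen for this example.
    """
    pairs = []
    for theme in HEXAGON[:3]:  # each pair is checked once
        other = opposite_theme(theme)
        if scores[theme] >= high and scores[other] >= high:
            pairs.append((theme, other))
    return pairs

# Hypothetical standard scores.
scores = {"Realistic": 65, "Investigative": 48, "Artistic": 52,
          "Social": 63, "Enterprising": 45, "Conventional": 50}
print(conflicting_pairs(scores))  # [('Realistic', 'Social')]
```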

The strategies for interpretation and conflict resolution provided here serve as a strong start for helping clients appropriately use the results of their Strong Interest Inventory® Assessment to guide their career decisions.

References:

Strong Interest Inventory Manual (Donnay, D et al. CPP, 2005)

Strong Interest Inventory Manual Supplement (Thompson, Richard, 2005, CPP Inc.)
