Introverts and Extroverts

Ethan Holden
3 min read · Jan 10, 2020
A vector chart showing extrovert and introvert with ambivert in the overlap

When I started this project, my goal was to find a data set illustrating that most people fall somewhere in the middle of the introvert/extrovert spectrum rather than at either extreme.

I found a data set from an online personality survey made up of 91 statements; for each one, the survey taker rated how much they agreed on a scale of 1–5. After completing the questionnaire, each person was asked whether they self-identified as an introvert, an extrovert, or neither. For those interested, the raw data set can be found here and the study can be found here.

The first thing I wanted to explore was whether a Principal Component Analysis, or PCA, would reveal anything about the data. PCA is a technique that would reduce the number of inputs from 280 down to 2. I found this wasn’t as helpful as I thought it might be. The first problem was that the two components retained only 8.4% of the variance in the data, and to reach even 80% I had to keep over 200 components, far too many for the human mind to visualize. While this was not very helpful in analyzing the data, it did show there was some sort of relationship between the questions asked in the survey and the person’s personality type.

I did, however, take the fact that PCA had such a hard time with this data set as evidence that personality type is not clearly one or the other; it takes a very large number of components to make that determination.
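To make the explained-variance idea concrete, here is a minimal sketch of how those percentages can be computed. It assumes NumPy and uses randomly generated answers in place of the real survey, so the function names, the 500 x 91 shape, and the printed numbers are illustrative only and will not reproduce the 8.4% figure. It is not necessarily how the original analysis was done.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)  # PCA works on mean-centered columns
    # Singular values of the centered matrix give the component variances,
    # already sorted from largest to smallest.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2 / (X.shape[0] - 1)
    return variances / variances.sum()

def components_needed(ratios, target):
    """Smallest number of leading components whose cumulative share reaches target."""
    cumulative = np.cumsum(ratios)
    count = int(np.searchsorted(cumulative, target)) + 1
    return min(count, len(cumulative))

# Synthetic stand-in for the survey: 500 hypothetical respondents, 91 answers on a 1-5 scale.
rng = np.random.default_rng(0)
answers = rng.integers(1, 6, size=(500, 91))

ratios = explained_variance_ratio(answers)
two_component_share = ratios[:2].sum()
needed_for_80 = components_needed(ratios, 0.80)

print(f"First two components explain {two_component_share:.1%} of the variance")
print(f"Components needed to reach 80%: {needed_for_80}")
```

When the first two components explain only a small share and the count needed for 80% is close to the number of inputs, the variance is spread thinly across many directions, which is the pattern described above.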

Next, I used the data to identify which questions tended to draw higher answers from introverts or from extroverts. I took the average answer to each question for each personality type and graphed it. I also graphed the difference between the two types to see which questions might be more likely to indicate that a person was one personality type or the other. In this case, the positive blue lines indicate that introverts gave a more favorable response to the question, and the negative blue lines show the opposite, that extroverts gave a more favorable response.
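The per-question comparison described above can be sketched in a few lines of plain Python. The record layout (`type` and `answers` keys), the function names, and the four sample respondents are all hypothetical, made up only to show the calculation. A positive difference means introverts agreed more with that question, matching the sign convention of the chart.

```python
from statistics import mean

def mean_answers_by_group(responses, group):
    """Average answer per question for respondents who self-identified as `group`."""
    rows = [r["answers"] for r in responses if r["type"] == group]
    # zip(*rows) turns respondent rows into per-question columns.
    return [mean(column) for column in zip(*rows)]

def introvert_minus_extrovert(responses):
    """Per-question difference: positive means introverts agreed more."""
    intro = mean_answers_by_group(responses, "introvert")
    extro = mean_answers_by_group(responses, "extrovert")
    return [i - e for i, e in zip(intro, extro)]

# Made-up respondents answering three questions on a 1-5 scale.
responses = [
    {"type": "introvert", "answers": [5, 2, 3]},
    {"type": "introvert", "answers": [4, 1, 3]},
    {"type": "extrovert", "answers": [2, 5, 3]},
    {"type": "extrovert", "answers": [1, 4, 3]},
]

differences = introvert_minus_extrovert(responses)
print(differences)
```

In this toy data the first question leans introvert, the second leans extrovert, and the third does not separate the groups at all.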

While most questions have a clearly dominant personality type, the size of the difference between the types varies from question to question. This shows us that even though one type might favor certain questions, no single question determines which group a person falls under; each one is only a small indication of which direction along the spectrum a person leans.

As a side note, after looking at this data and drawing my conclusions, I explored the data showing how long each person took to answer each question, in milliseconds. After taking out the most extreme outliers, I got the chart below.

I then took the average time across all the questions to see which group answered faster and found that, more often than not, extroverts were faster than introverts. This held even when I compared the groups on the questions each was more likely to disagree with. After thinking about it, I realized that this makes sense, as extroverts tend to be more action-oriented people, while introverts are more introspective and often take more time to think things through.
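As a rough sketch of the timing comparison, the snippet below drops extreme response times and then compares group averages. The article does not say which outlier rule was used, so the cutoff here (ten times the group median) is purely an assumption, and the function names and timings are invented for illustration.

```python
from statistics import mean, median

def drop_extreme_times(times_ms, max_multiple=10):
    """Keep timings up to max_multiple x the median (an assumed outlier rule)."""
    cutoff = median(times_ms) * max_multiple
    return [t for t in times_ms if t <= cutoff]

def compare_groups(times_by_group):
    """Return the group with the lowest average time and every group's average."""
    averages = {
        group: mean(drop_extreme_times(times))
        for group, times in times_by_group.items()
    }
    fastest = min(averages, key=averages.get)
    return fastest, averages

# Invented response times in milliseconds; 950000 mimics someone who walked away mid-survey.
times_by_group = {
    "introvert": [3200, 4100, 2900, 950000],
    "extrovert": [2500, 3100, 2700, 2800],
}

fastest, averages = compare_groups(times_by_group)
print(fastest, averages)
```

Without the trimming step, the single 950000 ms entry would dominate the introvert average, which is why outliers need to be handled before comparing the groups.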

If you are interested in the code I used to explore this data, you can find it here.

Ethan Holden

I am a Data Scientist in training at Lambda School, a certified culinarian, a board game enthusiast, and a father of one.