PANDORA

Description

Paper

Basic Stats

General

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10295 entries, 0 to 10294
Data columns (total 38 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   author                     10295 non-null  object 
 1   mbti                       9084 non-null   object 
 2   introverted                9078 non-null   float64
 3   intuitive                  9083 non-null   float64
 4   thinking                   9080 non-null   float64
 5   perceiving                 9074 non-null   float64
 6   gender                     3227 non-null   object 
 7   age                        2324 non-null   float64
 8   enneagram                  794 non-null    object 
 9   country                    2146 non-null   object 
 10  state                      857 non-null    object 
 11  type                       1611 non-null   object 
 12  agreeableness              1606 non-null   float64
 13  openness                   1588 non-null   float64
 14  conscientiousness          1605 non-null   float64
 15  extraversion               1608 non-null   float64
 16  neuroticism                1603 non-null   float64
 17  is_description             151 non-null    float64
 18  is_percentile              368 non-null    object 
 19  is_score                   1098 non-null   float64
 20  contains_details           1386 non-null   float64
 21  num_comments               10295 non-null  int64  
 22  en_comments                10295 non-null  int64  
 23  en_comments_percentage     10295 non-null  float64
 24  region                     852 non-null    object 
 25  continent                  2146 non-null   object 
 26  country_code               2146 non-null   object 
 27  enneagram_type             794 non-null    float64
 28  enneagram_wing             790 non-null    float64
 29  is_native_english_country  2146 non-null   float64
 30  predicted_test             1677 non-null   float64
 31  test_name                  1677 non-null   object 
 32  test_scale                 1677 non-null   object 
 33  16pers_ta                  9 non-null      object 
 34  test_result_type           1677 non-null   object 
 35  is_female                  3084 non-null   float64
 36  is_female_pred             10295 non-null  int64  
 37  is_female_proba            10295 non-null  float64
dtypes: float64(20), int64(3), object(15)
memory usage: 3.0+ MB

Gender

        m     f
gender  1753  1331
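The m and f counts (1753 + 1331 = 3084) equal the 3084 non-null values of `is_female` in the column listing. This suggests, although the page does not state it, that `is_female` is 1.0 for `f`, 0.0 for `m`, and missing for any other `gender` value. Below is a minimal pandas sketch under that assumption. The rows (including the placeholder value `other`) are made up for illustration.

```python
import pandas as pd

# Toy rows for illustration only; not taken from the dataset.
df = pd.DataFrame({"gender": ["m", "f", None, "f", "other"]})

# Assumed encoding: f -> 1.0, m -> 0.0; any other or missing value -> NaN
# (Series.map returns NaN for keys not in the dict).
df["is_female"] = df["gender"].map({"f": 1.0, "m": 0.0})

# Count table restricted to m/f, in the same column order as above.
gender_counts = df["gender"].value_counts().reindex(["m", "f"])
print(gender_counts)
```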

Continuous variables (OCEAN, age, number of comments)

       age    agreeableness  openness  conscientiousness  extraversion  neuroticism  comments  in English
count  2324   1606           1588      1605               1608          1603         10295     10295
mean   25.68  42.40          62.45     40.16              37.38         49.78        1819      1714
std    7.07   31.04          27.78     30.39              30.47         32.37        4104      3866
min    14     0              0         0                  0             0            1         1
25%    21     14             44        13                 10            19           219       206
50%    24     40             69        35                 30            50           604       569
75%    29     70             85        65                 61            82           1729      1616
max    67     100            100       99                 99            100          101789    99568
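A table with these rows (count, mean, std, min, quartiles, max) has the shape that `pandas.DataFrame.describe()` returns. The sketch below assumes the stats were produced that way, using a few of the column names from the listing. The values are toy data, and rounding to two decimals is an assumption about how the page was formatted.

```python
import pandas as pd

# Toy data using some column names from the listing; values are made up.
df = pd.DataFrame({
    "age": [18, 25, 31, None],
    "openness": [44.0, 69.0, 85.0, 100.0],
    "num_comments": [1, 219, 604, 1729],
    "en_comments": [1, 206, 569, 1616],
})

# describe() skips NaN, so "count" is the number of non-null values per column.
# This is why age has a smaller count than the comment columns.
summary = df.describe().round(2)
print(summary)
```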

MBTI dimensions

     introverted  intuitive  thinking  perceiving
1.0  7152         8054       5857      5315
0.0  1926         1029       3223      3759
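Each MBTI type consists of four letters, one per dimension (I/E, N/S, T/F, P/J). The four binary columns can therefore, in principle, be derived from the `mbti` string. The sketch below assumes lowercase four-letter types, as in the MBTI types table. It encodes each dimension as 1.0/0.0, with NaN for a missing or unrecognized letter. The helper `mbti_to_dimensions` is hypothetical and not part of the dataset. The non-null counts differ slightly (9084 for `mbti` vs. 9078 for `introverted`), so the real columns were presumably not produced by exactly this rule.

```python
import math

import pandas as pd

# Position and letter that map to 1.0 for each dimension column.
DIMENSIONS = {
    "introverted": (0, "i"),
    "intuitive": (1, "n"),
    "thinking": (2, "t"),
    "perceiving": (3, "p"),
}
# The opposite letter at each position, which maps to 0.0.
OPPOSITES = {"i": "e", "n": "s", "t": "f", "p": "j"}


def mbti_to_dimensions(mbti):
    """Hypothetical helper: map e.g. 'intp' to four 1.0/0.0 flags (NaN if unknown)."""
    result = {}
    for column, (pos, letter) in DIMENSIONS.items():
        if not isinstance(mbti, str) or len(mbti) != 4:
            result[column] = math.nan
            continue
        ch = mbti[pos].lower()
        if ch == letter:
            result[column] = 1.0
        elif ch == OPPOSITES[letter]:
            result[column] = 0.0
        else:
            result[column] = math.nan
    return result


# Toy types; "inxj" shows a partially specified type.
df = pd.DataFrame({"mbti": ["intp", "esfj", None, "inxj"]})
dims = pd.DataFrame([mbti_to_dimensions(t) for t in df["mbti"]], index=df.index)
print(pd.concat([df, dims], axis=1))
```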

MBTI types

      intp  intj  infp  infj  entp  enfp  istp  entj  istj  enfj  isfp  isfj  estp  esfp  estj  esfj
mbti  2336  1847  1074  1051  631   617   407   320   195   163   123   109   72    50    43    29

Enneagram types

                5.0  4.0  9.0  7.0  6.0  2.0  8.0  1.0  3.0
enneagram_type  239  164  100  79   56   44   41   41   30
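The listing has a raw `enneagram` column alongside numeric `enneagram_type` and `enneagram_wing` columns. The page does not document the raw format. Assuming it uses the common "5w4" notation (core type, then "w", then wing), the two numeric columns could be extracted with a regular expression. A bare type such as "9" then leaves the wing missing. Everything below, including the format, is an assumption for illustration.

```python
import pandas as pd

# Assumed raw format: "<type>w<wing>", e.g. "5w4"; a bare "9" has no wing.
df = pd.DataFrame({"enneagram": ["5w4", "9", "4w5", None, "unknown"]})

# Named groups become column names; non-matching or missing strings give NaN,
# and so does the optional wing group when it is absent.
pattern = r"^\s*(?P<enneagram_type>[1-9])(?:\s*[wW]\s*(?P<enneagram_wing>[1-9]))?\s*$"
parts = df["enneagram"].str.extract(pattern)

# Convert the extracted strings to float64, matching the dtypes in the listing.
parts = parts.apply(pd.to_numeric).astype(float)
print(pd.concat([df, parts], axis=1))
```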

Continents

           north america  europe  asia  oceania  south america  africa
continent  1325           598     106   87       26             4

Top 10 countries

         us    canada  uk   australia  germany  netherlands  sweden  eu  france  finland
country  1125  188     177  73         54       38           33      27  27      24
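Count tables like the continent, country, and region tables match what `Series.value_counts()` returns. It drops missing values and sorts by frequency in descending order, so `.head(10)` yields a top-10 list. Below is a sketch with toy values, assuming the tables were built this way.

```python
import pandas as pd

# Toy country column; value_counts() excludes missing values by default.
country = pd.Series(["us", "canada", "us", None, "uk", "us", "canada"])

# Sorted by count, descending; head(n) keeps the n most frequent values.
top = country.value_counts().head(2)
print(top)
```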

Regions (US regions without prefix; Canadian regions with prefix c)

        w    mw   se   ne   sw   cw  ce
region  214  155  145  140  101  53  44

Terms of use

Please note the following in case you download the datasets:

  • You cannot transfer or reproduce any part of the dataset.
  • You cannot attempt to identify any user in the dataset.
  • You cannot contact any user in the dataset.
  • You cannot display users’ names or sensitive messages publicly.
  • You should check (via the Reddit API) whether users have removed any of their content and/or deleted their accounts.
  • You can report your findings publicly only on an aggregate level.
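The check itself has to go through the Reddit API, and that step is not shown here. Once comments have been re-fetched, Reddit displays the placeholder `[deleted]` (author or body) or `[removed]` (body) for content that is no longer available. Below is a minimal sketch that drops such comments from a local copy. The tables, the `id`/`author`/`body` column names, and the matching on comment id are all hypothetical. Detecting a fully deleted account needs a separate per-user check, which is not shown.

```python
import pandas as pd

# Hypothetical tables: comments as stored locally, and the same comment ids
# re-fetched from the Reddit API (the fetching step itself is not shown).
local = pd.DataFrame({
    "id": ["c1", "c2", "c3", "c4"],
    "author": ["user_a", "user_a", "user_b", "user_c"],
    "body": ["hello", "text", "more text", "bye"],
})
refetched = pd.DataFrame({
    "id": ["c1", "c2", "c3", "c4"],
    "author": ["user_a", "user_a", "[deleted]", "user_c"],
    "body": ["hello", "[removed]", "[deleted]", "bye"],
})

# Placeholders shown for deleted/removed content.
GONE = {"[deleted]", "[removed]"}
gone_ids = refetched.loc[
    refetched["author"].isin(GONE) | refetched["body"].isin(GONE), "id"
]

# Keep only comments that are still publicly available.
kept = local[~local["id"].isin(gone_ids)]
print(kept)
```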

Dataset Request Form

Feel free to contact us if you need one of our datasets for your research.