Published February 16, 2024 | Version Latest
Software Open

arnestc/political-compass: Reddit Political Compass Dataset

Description

This data set is described in the paper: Ernesto Colacrai, Federico Cinus, Gianmarco De Francisci Morales and Michele Starnini. 2024. "Navigating Multidimensional Ideologies with Reddit's Political Compass: Economic Conflict and Social Affinity." In Proceedings of the ACM Web Conference 2024 (WWW '24).

Each username is consistently replaced with an anonymized number.

  • blacklist_anonymized.joblib: list of users (anonymized) classified as bots.
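As a minimal sketch of how the bot list might be applied, the helper below drops rows whose author appears in it. It assumes the list has already been loaded (for example with `joblib.load("blacklist_anonymized.joblib")`) into a list-like object of anonymized identifiers; the column name `author` and the function name `drop_bots` are hypothetical and should be checked against the actual files.

```python
from typing import Iterable, Iterator, Mapping


def drop_bots(
    rows: Iterable[Mapping[str, str]],
    bot_ids: Iterable[object],
    author_key: str = "author",  # hypothetical column name
) -> Iterator[Mapping[str, str]]:
    """Yield only the rows whose author is not in the bot list."""
    # Compare as strings: CSV readers return text, while the stored
    # identifiers may be numbers.
    bots = {str(b) for b in bot_ids}
    for row in rows:
        if str(row[author_key]) not in bots:
            yield row


# Toy data standing in for rows read with csv.DictReader.
demo_rows = [
    {"author": "1", "flair": "libleft"},
    {"author": "2", "flair": "authright"},
    {"author": "3", "flair": "centrist"},
]
demo_kept = list(drop_bots(demo_rows, bot_ids=[2]))
print([row["author"] for row in demo_kept])
```

Normalizing both sides to strings avoids silently keeping bots when the identifier types differ between the CSV files and the stored list.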

For each of the analyzed subreddits (/r/PoliticalCompass, abbreviated PC, and /r/PoliticalCompassMemes, abbreviated PCM), the data set contains the following CSV files:

  • submissions_anonymized_SUBREDDIT.csv: each line corresponds to a submission on the SUBREDDIT and includes the anonymized username of its author, the flair associated with the author (their ideology on the Political Compass), and the creation time (UTC) of the submission. The data span 2012 to 2022; the notebook code/notebooks/0.1-Data-Pre-Processing-PC-PCM.ipynb keeps only the period 2020-2022.
  • comments_anonymized_SUBREDDIT.csv: each line corresponds to a comment on the SUBREDDIT and includes the anonymized username of its author, the flair associated with the author (their ideology on the Political Compass), and the creation time (UTC) of the comment. The data span 2012 to 2022; the notebook code/notebooks/0.1-Data-Pre-Processing-PC-PCM.ipynb keeps only the period 2020-2022.
  • edges_anonymized_SUBREDDIT.csv: each line corresponds to a comment posted on the SUBREDDIT during 2020-2022. The file lists the author of the comment, the author of the parent comment being replied to, and the sentiment of the text of the interaction. It can be read as a weighted graph among users.
  • popularity_anonymized_SUBREDDIT.csv: each line gives an author and the list of scores of their comments in the SUBREDDIT. These data are used to analyze possible confounding effects of Reddit.
  • socio_demographics_anonymized_SUBREDDIT.csv: for each Reddit user of the SUBREDDIT included in the analysis, this file reports their anonymized username and their scores on the age, gender, partisan, and affluence axes (ideology flairs are also included for analysis). Scores are quantile-normalized, so that, e.g., a score of 0.25 indicates the 25th percentile. The axes correspond, respectively, to the probability of being young (low) or old (high), male or female, left-leaning or right-leaning, and poor or rich.
  • edges_anonymized_with_toxicity_SUBREDDIT.csv: each line corresponds to an edge of the interaction network of the SUBREDDIT and includes the author of the comment, the author of the parent comment being replied to, the body (an empty string, for anonymization reasons), the social and economic ideologies of both the author and the parent author, and the toxicity value computed from the original body of the comment.
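To illustrate what a quantile-normalized score means, the sketch below maps raw values to the fraction of values less than or equal to each one, so that a result of 0.25 marks the 25th percentile. This is one common empirical-CDF convention, shown only for intuition; the procedure actually used to produce the socio-demographic scores is described in the paper and may differ. The function name `quantile_normalize` is hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def quantile_normalize(values: Sequence[float]) -> List[float]:
    """Map each value to the fraction of values that are <= it."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right counts how many sorted entries are <= v,
    # so tied values receive the same normalized score.
    return [bisect_right(ordered, v) / n for v in values]


raw_scores = [10.0, 20.0, 30.0, 40.0]
normalized = quantile_normalize(raw_scores)
print(normalized)  # [0.25, 0.5, 0.75, 1.0]
```

Under this convention the smallest of four distinct values gets 0.25 and the largest gets 1.0, and the ordering of users along an axis is preserved.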

See the paper for more details about how we extracted this information. The total numbers of users (nodes) and comments (edges) considered are:

| SUBREDDIT | /r/PC | /r/PCM |
|---------------|---------|---------|
| N. nodes | 18135 | 173672 |
| N. edges | 215111 | 6197901 |
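Counts like those in the table could, in principle, be recomputed from an edge file. The sketch below reads an edge list, counts distinct users as nodes and rows as edges, and averages the sentiment per (author, parent) pair to obtain a weighted graph. The column names `author`, `parent`, and `sentiment`, and the function name `summarize_edges`, are assumptions for illustration; whether the published counts use exactly this definition is not stated here.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO, Tuple


def summarize_edges(
    csv_file: TextIO,
    author_key: str = "author",      # hypothetical column name
    parent_key: str = "parent",      # hypothetical column name
    weight_key: str = "sentiment",   # hypothetical column name
) -> Tuple[int, int, Dict[Tuple[str, str], float]]:
    """Return (number of nodes, number of edges, mean weight per user pair)."""
    nodes = set()
    n_edges = 0
    weights = defaultdict(list)
    for row in csv.DictReader(csv_file):
        author, parent = row[author_key], row[parent_key]
        nodes.update((author, parent))
        n_edges += 1  # one row is one comment, i.e. one interaction
        weights[(author, parent)].append(float(row[weight_key]))
    mean_weight = {pair: sum(ws) / len(ws) for pair, ws in weights.items()}
    return len(nodes), n_edges, mean_weight


# Toy edge list standing in for edges_anonymized_SUBREDDIT.csv.
sample = io.StringIO(
    "author,parent,sentiment\n"
    "1,2,0.5\n"
    "1,2,-0.5\n"
    "3,1,1.0\n"
)
n_nodes, n_edges, mean_weight = summarize_edges(sample)
print(n_nodes, n_edges, mean_weight)
```

To run it on a real file, pass an object opened with `open(path, newline="")` instead of the in-memory sample.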

Files

arnestc/political-compass-Latest.zip

Files (821.0 kB)

md5:3ed03f62500e3e2bb5c3a0c7014b1611
