Question | Answer |
---|---|
Authorization | All API clients must authenticate with OAuth2. |
Data format | JSON |
Traffic rate limits | Clients connecting via OAuth2 may make up to 60 requests per minute. Reddit may also set and enforce limits on the number of API requests you make or the number of users you serve. |
API Wrappers | Wrappers exist for many languages, including Python, JavaScript, Go, and Rust. Scroll down for the list. |
Is it free | Yes, if you open-source your code. |
Commercial license | If your intended usage is commercial, contact [email protected] for approval. |
Privacy policy | You must disclose in your application through a privacy policy how you collect, use, store, and disclose data collected from Reddit. |
In August 2021, we held an in-house hackathon where we decided to develop a set of graph streaming applications. The main problem we encountered was developers wasting time finding accessible real-time data sources and learning how to connect to them. Out of this came the idea of consolidating this information in one place and sharing it with other developers.
And here we are now, a couple of months later, with the first of many real-time data sources.
The first source we wanted to cover is the Reddit API. Reddit is one of the most visited user-generated content sites in the world. It consists of a network of communities called subreddits, which are based on various user interests, and its users are called Redditors. I believe most of you are already familiar with Reddit, so I will not dwell on describing it and will go straight to the things that led you to this web page.
Yes. All API clients must authenticate with OAuth2. OAuth is an open protocol that allows secure authorization from web, mobile, and desktop applications.
Also, Reddit’s OAuth API is only accessible via HTTPS, not HTTP.
Most of the wrappers / libraries mentioned next in the article will have clear instructions on how to authorize via OAuth2.
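If you are not using a wrapper, the first OAuth2 step can be sketched by hand. The snippet below is a minimal sketch, not Reddit's official client code: it assumes the app-only "client credentials" flow, a token endpoint at `https://www.reddit.com/api/v1/access_token`, and HTTP Basic authentication with your app's client ID and secret. Check these assumptions against Reddit's OAuth2 documentation before relying on them. The function name `build_token_request` and the sample credentials are made up for illustration. The request is only built, not sent.

```python
import base64
import urllib.parse
import urllib.request

# Assumed token endpoint; note that it is HTTPS, as the API requires.
TOKEN_URL = "https://www.reddit.com/api/v1/access_token"


def build_token_request(client_id, client_secret, user_agent):
    """Build (but do not send) an app-only OAuth2 token request.

    Hypothetical helper: the flow and endpoint are assumptions,
    not something this article's source confirms.
    """
    # HTTP Basic auth: base64("client_id:client_secret")
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")

    # Form-encoded body asking for a token that is not tied to a user
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials"}
    ).encode("ascii")

    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "User-Agent": user_agent,
        },
        method="POST",
    )


# Placeholder credentials for illustration only
request = build_token_request("id", "secret", "my-app/0.1 by example_user")
```

Sending the request with `urllib.request.urlopen(request)` should return a JSON body containing an access token, which is then passed as a bearer token on later calls.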
The Reddit API transmits data in JSON format. Example (values show the field type):
{
  "kind": "string",
  "data": {
    "modhash": "string",
    "dist": "int",
    "children": [{
      "kind": "string",
      "data": {
        "approved_at_utc": "string",
        "subreddit": "string",
        "selftext": "string",
        ...,
        "is_video": "boolean"
      }
    }],
    "after": "",
    "before": ""
  }
}
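To make the nesting concrete, here is a minimal sketch of how such a response could be unpacked in Python. The sample payload is invented for illustration (real responses carry many more fields), and the helper names `iter_posts` and `next_page_cursor` are hypothetical.

```python
import json

# Invented sample that follows the structure shown above
SAMPLE = """
{
  "kind": "Listing",
  "data": {
    "modhash": "",
    "dist": 2,
    "children": [
      {"kind": "t3", "data": {"subreddit": "python", "selftext": "Hello", "is_video": false}},
      {"kind": "t3", "data": {"subreddit": "python", "selftext": "", "is_video": true}}
    ],
    "after": "t3_abc123",
    "before": null
  }
}
"""


def iter_posts(listing):
    """Yield the inner "data" object of every child in a listing."""
    for child in listing.get("data", {}).get("children", []):
        yield child.get("data", {})


def next_page_cursor(listing):
    """Return the "after" cursor, or None when it is empty or missing."""
    return listing.get("data", {}).get("after") or None


listing = json.loads(SAMPLE)
posts = list(iter_posts(listing))
videos = [post for post in posts if post.get("is_video")]
```

The `after` value is the cursor you would pass back to request the next page of results.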
There are traffic rate limits. Clients connecting via OAuth2 may make up to 60 requests per minute.
Also, Reddit may set and enforce limits on the number of API requests that you may make or the number of users you may serve.
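One simple way to stay under the 60 requests per minute limit is to space calls at least one second apart. The class below is a minimal sketch of that idea, not a feature of any Reddit library. The name `MinIntervalLimiter` is hypothetical, and the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time


class MinIntervalLimiter:
    """Space calls so that at most max_per_minute happen per minute."""

    def __init__(self, max_per_minute=60, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute  # seconds between calls
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may run

    def wait(self):
        """Block until the next call is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay


limiter = MinIntervalLimiter(max_per_minute=60)
first_delay = limiter.wait()  # the first call never has to wait
```

Call `limiter.wait()` immediately before each API request. This does not cover any additional limits Reddit may enforce on your application.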
https://www.reddit.com/r/{subreddit}/{listing}.json?limit={count}&t={timeframe}
Where:
Value | Description |
---|---|
subreddit | name of the subreddit |
listing | best, by_id/names, comments/article, controversial, duplicates/article, hot, new, random, rising, top, sort |
count | the maximum number of items to return |
timeframe | hour, day, week, month, year, all |
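The URL template above can be filled in with a small helper. This is a minimal sketch: the function name `build_listing_url` and its default values are my own choices, and only the timeframe is validated against the table.

```python
from urllib.parse import quote, urlencode

# Timeframe values taken from the table above
TIMEFRAMES = {"hour", "day", "week", "month", "year", "all"}


def build_listing_url(subreddit, listing="hot", count=25, timeframe="day"):
    """Fill in the listing URL template for a subreddit."""
    if timeframe not in TIMEFRAMES:
        raise ValueError(f"unknown timeframe: {timeframe!r}")
    if count < 1:
        raise ValueError("count must be a positive integer")
    query = urlencode({"limit": count, "t": timeframe})
    return f"https://www.reddit.com/r/{quote(subreddit, safe='')}/{listing}.json?{query}"


url = build_listing_url("python", listing="top", count=10, timeframe="week")
print(url)  # https://www.reddit.com/r/python/top.json?limit=10&t=week
```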
Language | Library / wrapper |
---|---|
Python | PRAW |
Javascript / Node.js | Snoowrap |
Rust | Roux |
.NET / C# | Reddit.NET |
Java | jReddit |
Scala | SCRAPI |
Ruby | Redd |
Go | graw |
Developers are allowed to access data to build apps on top of Reddit, but they must respect user privacy as defined by the Reddit API Terms of Use.
In short, you need to disclose in your application, through a privacy policy, how you collect, use, store, and share data collected from Reddit.
Yes, if you open-source your code.
If your intended usage is commercial, you’ll need approval from Reddit by emailing [email protected]. Use of the API is considered commercial if you are earning money from it, have in-app advertising, in-app purchases, or you intend to learn from the data and sell it.
Find the Redditors who spark large discussions and gain traction within a community. The good news is that the following real-time example lives inside your browser, so you can start playing with it right away. You'll also learn a bit more about how to interpret this type of data as a graph and how to apply algorithms such as the breadth-first search tree (bfs_tree) without much difficulty.
Reddit seems like an ideal place for real-time sentiment analysis projects, i.e. for studying affective states and subjective information. Check out how the team at Memgraph approached it with Python, the PRAW library, and Kafka as part of the hackathon I mentioned in the introduction.
OK, that's it for now. Next up is the Spotify API. Ping me with feedback on Discord or LinkedIn. For the next couple of weeks, this will be a work in progress, so if you notice that I've missed anything, please let me know.
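To give a feel for what a breadth-first search tree does on this kind of data, here is a minimal pure-Python sketch. It is my own illustration, not the implementation used in the browser example, and the reply graph is invented: an edge from one Redditor to another means the second replied to the first.

```python
from collections import deque


def bfs_tree(graph, source):
    """Return the BFS tree from source as {node: [children]}.

    graph maps each node to the list of nodes it points to.
    """
    tree = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in tree:  # first time we reach this node
                tree[neighbor] = []
                tree[node].append(neighbor)
                queue.append(neighbor)
    return tree


# Hypothetical reply graph between four Redditors
replies = {
    "alice": ["bob", "carol"],
    "bob": ["carol", "dave"],
    "carol": ["alice"],
    "dave": [],
}

tree = bfs_tree(replies, "alice")
reached = len(tree) - 1  # Redditors reached from alice's post
```

Here `tree` keeps one parent per Redditor, so the number of nodes below a given author is a rough measure of how far a discussion they started has spread.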