
Reddit JSON API Real-Time Streaming Data Source

TLDR

Question: Answer
Authorization: All API clients must authenticate with OAuth2.
Data format: JSON
Traffic rate limits: Clients connecting via OAuth2 may make up to 60 requests per minute. Reddit may also set and enforce limits on the number of API requests you make or the number of users you serve.
API wrappers: Wrappers exist for several languages, including Python, JavaScript, Go, and Rust. Scroll down for the list.
Free to use: Yes, if you open-source your code.
Commercial license: If your intended usage is commercial, contact [email protected] for approval.
Privacy policy: You must disclose in your application, through a privacy policy, how you collect, use, store, and disclose data collected from Reddit.

Introduction

In August 2021, we held an in-house hackathon where we decided to develop a set of graph streaming applications. The main problem we encountered was developers wasting time on finding the right accessible real-time data sources and learning how to connect to them. Out of this came the idea of consolidating this data in one place and sharing it with other developers.

And here we are now, a couple of months later, with the first of many real-time data sources.

The first source we wanted to cover is the Reddit API. In short, Reddit is one of the most visited user-generated content sites in the world. It consists of a network of communities called subreddits, which are based on various user interests. The users are called Redditors. I believe that most of you are already familiar with Reddit, so I will not dwell on describing it and will go straight to the things that led you to this web page.

Do I need to authenticate if I want to use the Reddit API?

Yes. All API clients must authenticate with OAuth2. OAuth is an open protocol that allows secure authorization from web, mobile, and desktop applications.

Also, Reddit’s OAuth API is only accessible via HTTPS, not HTTP.

Most of the wrappers and libraries listed later in this article come with clear instructions on how to authorize via OAuth2.
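If you are curious what OAuth2 authorization looks like without a wrapper, here is a minimal Python sketch that builds (but does not send) an application-only token request. The token endpoint, the `client_credentials` grant type, the helper name `build_token_request`, and the placeholder credentials are assumptions that do not come from this article, so check Reddit's OAuth2 documentation before relying on them.

```python
import base64
import urllib.parse
import urllib.request

# Assumed token endpoint; it is not stated in this article.
TOKEN_URL = "https://www.reddit.com/api/v1/access_token"


def build_token_request(client_id, client_secret, user_agent):
    """Build, without sending, an application-only OAuth2 token request."""
    # The client ID and secret travel as HTTP Basic credentials.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    # Assumed grant type for script-style, application-only access.
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,  # HTTPS only, as noted above
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "User-Agent": user_agent,
        },
        method="POST",
    )


# Hypothetical placeholder credentials, for illustration only.
request = build_token_request("my-client-id", "my-client-secret", "example-script/0.1")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it. In practice, a wrapper will usually handle this step for you.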

What type of data can you access with the Reddit API?

The Reddit API transmits data in JSON format. The schematic example below shows field types rather than real values:

{
    "kind": "string",
    "data": {
        "modhash": "string",
        "dist": int,
        "children": [{
            "kind": "string",
            "data": {
                "approved_at_utc": "string",
                "subreddit": "string",
                "selftext": "string",
                ...,
                "is_video": "boolean"
            }
        }],
        "after": "",
        "before": ""
    }
}
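To show how a response of this shape can be consumed, here is a small Python sketch that pulls the per-post `data` objects out of a listing. The sample payload is hand-written for illustration (it is not real Reddit data), and `extract_posts` is a hypothetical helper name.

```python
import json

# Hand-written sample that mirrors the structure above; not real Reddit data.
SAMPLE = """
{
  "kind": "Listing",
  "data": {
    "dist": 2,
    "children": [
      {"kind": "t3", "data": {"subreddit": "python", "selftext": "first", "is_video": false}},
      {"kind": "t3", "data": {"subreddit": "python", "selftext": "second", "is_video": true}}
    ],
    "after": null,
    "before": null
  }
}
"""


def extract_posts(listing):
    """Return the inner "data" dict of every child in a listing."""
    children = listing.get("data", {}).get("children", [])
    return [child["data"] for child in children if "data" in child]


posts = extract_posts(json.loads(SAMPLE))
for post in posts:
    print(post["subreddit"], post["selftext"], post["is_video"])
```

Using `.get` with defaults keeps the helper from raising on a response that has no `data` or `children` key, which seems a reasonable choice for data arriving over the network.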

Reddit API limitations

There are traffic rate limits. Clients connecting via OAuth2 may make up to 60 requests per minute.

Also, Reddit may set and enforce limits on the number of API requests that you may make or the number of users you may serve.
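One way to stay under the 60 requests per minute mentioned above is to throttle on the client side. The sketch below is a simple sliding-window limiter in Python. The class name `SlidingWindowLimiter` and its methods are hypothetical, and this is just one possible approach, not something Reddit prescribes.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls within any window_seconds span."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._timestamps = deque()

    def seconds_until_allowed(self):
        """Return how long to wait before the next request is allowed."""
        now = self.clock()
        # Forget requests that have already left the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._timestamps[0])

    def record(self):
        """Register that a request is being made right now."""
        self._timestamps.append(self.clock())

    def wait(self):
        """Sleep if needed, then register the request."""
        delay = self.seconds_until_allowed()
        if delay > 0:
            time.sleep(delay)
        self.record()


limiter = SlidingWindowLimiter()
limiter.wait()  # call this before each API request
```

Because Reddit may enforce additional limits of its own, treat this as a courtesy throttle rather than a guarantee.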

How to get data from the Reddit API?

  1. Via HTTP GET requests to a subreddit URL, similar to this:
https://www.reddit.com/r/{subreddit}/{listing}.json?limit={count}&t={timeframe}

Where:

Value: Description
subreddit: name of the subreddit
listing: best, by_id/names, comments/article, controversial, duplicates/article, hot, new, random, rising, top, sort
count: the maximum number of items to return
timeframe: hour, day, week, month, year, all

  2. Via wrappers or libraries:

Language: Library / wrapper
Python: PRAW
JavaScript / Node.js: Snoowrap
Rust: Roux
.NET / C#: Reddit.NET
Java: jReddit
Scala: SCRAPI
Ruby: Redd
Go: graw
Important things around user privacy

Developers are allowed to access data to build apps on top of Reddit, but they must respect the user privacy requirements defined by the Reddit API Terms of Use.

In short, you need to state in your application, through a privacy policy, how you collect, use, store, and share data collected from Reddit.

Is the Reddit API free?

Yes, if you open-source your code.

When do I have to pay to use the Reddit API?

If your intended usage is commercial, you’ll need to request approval from Reddit by emailing [email protected]. Use of the API is considered commercial if you earn money from it, have in-app advertising or in-app purchases, or intend to learn from the data and sell it.

What can I do with this data?

  1. Find Redditors who generate large discussions and traction within the community. The good news is that the following real-time example lives inside your browser, and you can start playing with it right away. You’ll also learn a bit more about how to interpret this type of data as a graph and how to apply algorithms like the breadth-first search tree (bfs_tree) without much difficulty.

  2. Reddit seems like an ideal place for real-time sentiment analysis projects, that is, for studying affective states and subjective information. Check out how the guys from Memgraph approached it with Python, the PRAW library, and Kafka as part of the hackathon that I mentioned in the introduction.
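
To make the first idea a bit more concrete, here is a minimal Python sketch of a breadth-first search tree over a reply graph. The `replies` data is made up, and this plain-Python `bfs_tree` function is an illustrative stand-in for the graph algorithm named above, not the implementation used in the linked example.

```python
from collections import deque


def bfs_tree(graph, root):
    """Return the BFS tree rooted at `root` as {node: [children]}."""
    tree = {root: []}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            # Keep only the first edge that reaches each node.
            if neighbor not in tree:
                tree[neighbor] = []
                tree[node].append(neighbor)
                queue.append(neighbor)
    return tree


# Hypothetical reply graph: an edge A -> B means Redditor B replied to A.
replies = {
    "alice": ["bob", "carol"],
    "bob": ["dave", "alice"],
    "carol": ["dave"],
    "dave": [],
}

tree = bfs_tree(replies, "alice")
print(tree)
# The number of Redditors reached from the root is a rough measure of traction.
print(len(tree) - 1)
```

Counting the nodes reachable from a Redditor is only one simple proxy for how much discussion that person sparks. A real analysis would likely weight replies by depth or time.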


OK, that’s it for now. Next up is the Spotify API. Ping me with feedback on Discord or LinkedIn. For the next couple of weeks, this will be a work in progress, so if you notice that I’ve missed anything, please let me know.
