Reddit AI Trend Reports is an open source Python project designed to help users automatically track and analyze trends in artificial intelligence (AI) discussions on Reddit. The tool fetches relevant posts from user-specified subreddits, then summarizes their content, performs sentiment analysis, and identifies popular keywords. Finally, it generates a variety of data visualization charts so users can intuitively grasp what is trending in AI and how the community feels about it. Ideal for AI researchers, market analysts, and anyone interested in AI topics on Reddit, the tool makes gathering and analyzing this information far more efficient.
Feature List
- Fetch Reddit posts: retrieve the latest or most popular posts from user-specified subreddits (e.g. `r/MachineLearning`, `r/StableDiffusion`, `r/ChatGPT`).
- Keyword filtering: filter the fetched posts against a user-defined keyword list, keeping only relevant content.
- Sentiment analysis: automatically analyze post titles and content to determine whether the community's attitude toward a given AI topic is positive, negative, or neutral.
- Post summarization: summarize post content using Large Language Models (LLMs) to quickly distill each post's core ideas.
  - Supports the OpenAI API as a summarization backend, using models such as GPT-3.5.
  - Supports Hugging Face models as a summarization backend, allowing additional open source models to be used.
- Hot keyword identification: analyze post content to automatically surface the keywords and trends currently hot in AI discussions on Reddit.
- Data visualization: use the `LIDA` library to automatically generate a variety of charts, including but not limited to bar charts, line charts, scatter plots, and word clouds, helping users understand the data more intuitively.
- Result output: save analysis results (including raw data, summaries, and sentiment scores) as CSV or JSON files, and save generated charts as image files.
Usage Guide
Reddit AI Trend Reports is a Python-based command line tool, so you will need basic familiarity with Python and the command line to use it. Detailed installation and usage steps, along with the workflow of the main features, follow below.
Step 1: Prepare the environment and install
Before using this tool, you need to prepare your Python environment and install all dependencies.
- Clone the code repository:
  First, open your terminal or command line tool. Then run the following command to clone the project from GitHub to your local machine:
  ```bash
  git clone https://github.com/liyedanpdx/reddit-ai-trends.git
  ```
  Once cloning is complete, change into the project directory:
  ```bash
  cd reddit-ai-trends
  ```
- Create and activate a virtual environment:
  To avoid conflicts with other Python projects on your system, it is recommended to create a dedicated virtual environment for this tool.
  Create the virtual environment:
  ```bash
  python -m venv venv
  ```
  Activate it:
  - On macOS or Linux:
    ```bash
    source venv/bin/activate
    ```
  - On Windows:
    ```bash
    .\venv\Scripts\activate
    ```
  Once the virtual environment is active, your command line prompt will be prefixed with `(venv)`.
- Install project dependencies:
  All of the project's dependencies are listed in the `requirements.txt` file. With the virtual environment active, run the following command to install them:
  ```bash
  pip install -r requirements.txt
  ```
  This can take some time depending on your internet speed.
- Get API credentials:
  This tool needs the Reddit API to fetch post data and a Large Language Model (LLM) API to summarize post content, so you need the following credentials:
  - Reddit API credentials (PRAW):
    - Visit the Reddit developer page (`https://www.reddit.com/prefs/apps/`).
    - Click "are you a developer? create an app...".
    - Select the "script" type.
    - Fill in an application name (e.g. `RedditAITrendsTracker`) and a description.
    - In "redirect uri", enter `http://localhost:8080` (the URL does not have to be real, but the field must be filled in).
    - Click "create app".
    - After the app is created, you will see the `client_id` (below the app name, a string like `xxxxxxxxxxxxxx`) and the `client_secret` (next to the word `secret`).
    - You also need a `user_agent`: usually your Reddit username or a string describing your app (e.g. `RedditAITrendsTracker by u/YourRedditUsername`).
  - LLM API credentials:
    This tool supports OpenAI and Hugging Face as LLM backends.
    - OpenAI API Key: needed if you want to summarize with GPT models; obtain one from the OpenAI website.
    - Hugging Face API Token: needed if you want to use Hugging Face models; obtain one from the Hugging Face website.
- Configure environment variables:
  Create a file named `.env` in the project root directory and fill in the API credentials you just obtained. Keep this information local only and never publish it. A quick way to verify these credentials is sketched below.
  ```
  # Reddit API credentials
  REDDIT_CLIENT_ID='your Reddit Client ID'
  REDDIT_CLIENT_SECRET='your Reddit Client Secret'
  REDDIT_USER_AGENT='your Reddit User Agent'
  REDDIT_USERNAME='your Reddit username'
  REDDIT_PASSWORD='your Reddit password'
  # Note: for security, if you do not need to post or perform other write actions,
  # you can omit the password and use anonymous read-only access.

  # LLM API credentials (configure one or both)
  OPENAI_API_KEY='your OpenAI API Key'
  HUGGINGFACE_API_TOKEN='your Hugging Face API Token'
  ```
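  The snippet below shows one way to load the `.env` file and build a read-only Reddit client to sanity-check your credentials. It is a minimal sketch assuming `python-dotenv` and `praw` are installed; the project's actual loading code may differ.
  ```python
  # Minimal credential check -- a sketch, not the project's actual code.
  import os

  import praw
  from dotenv import load_dotenv

  load_dotenv()  # reads the .env file from the current working directory

  reddit = praw.Reddit(
      client_id=os.getenv("REDDIT_CLIENT_ID"),
      client_secret=os.getenv("REDDIT_CLIENT_SECRET"),
      user_agent=os.getenv("REDDIT_USER_AGENT"),
  )
  print(reddit.read_only)  # True: anonymous read-only access suffices for fetching
  ```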
Step 2: Run and use
Once your environment and credentials are configured, you can run the `main.py` script to perform analysis tasks. The script's behavior is controlled through command line arguments.
- Basic run command:
  The simplest way to run the tool is to specify the subreddit to fetch from:
  ```bash
  python main.py --subreddits MachineLearning
  ```
  This command fetches a default number of posts from `r/MachineLearning`, without summarizing, sentiment analysis, or visualization.
- Core feature workflow:
  - Fetch and filter posts:
    Use the `--subreddits` parameter to specify one or more subreddits; separate multiple names with commas and no spaces.
    Use the `--keywords` parameter to filter posts by keyword; only posts whose title or content contains one of the keywords are processed. Multiple keywords are also comma-separated.
    Use the `--limit` parameter to cap the number of posts fetched.
    ```bash
    python main.py --subreddits MachineLearning,StableDiffusion --keywords "LLM,GPT-4,Diffusion" --limit 50
    ```
    This command fetches up to 50 posts containing the keywords "LLM", "GPT-4", or "Diffusion" from `r/MachineLearning` and `r/StableDiffusion`. A sketch of how this fetching and filtering could be implemented follows.
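    The sketch below shows roughly what fetching and keyword filtering could look like with PRAW. The `fetch_filtered_posts` helper and the dictionary fields are illustrative assumptions, not the project's actual implementation; `reddit` is the client from the earlier credential snippet.
    ```python
    # Hypothetical helper mirroring --subreddits/--keywords/--limit behavior.
    def fetch_filtered_posts(reddit, subreddits, keywords, limit=50):
        posts = []
        for name in subreddits:
            # .hot() yields the subreddit's current hot posts via PRAW
            for post in reddit.subreddit(name).hot(limit=limit):
                text = f"{post.title} {post.selftext}".lower()
                if not keywords or any(k.lower() in text for k in keywords):
                    posts.append({"subreddit": name, "title": post.title,
                                  "body": post.selftext, "score": post.score})
        return posts[:limit]

    posts = fetch_filtered_posts(reddit, ["MachineLearning", "StableDiffusion"],
                                 ["LLM", "GPT-4", "Diffusion"], limit=50)
    ```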
  - Perform sentiment analysis:
    To run sentiment analysis on posts, simply add the `--sentiment_analysis` flag:
    ```bash
    python main.py --subreddits ChatGPT --limit 20 --sentiment_analysis
    ```
    This analyzes the sentiment of 20 posts from `r/ChatGPT` and includes the sentiment scores in the results. A sketch of one possible scoring approach follows.
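    The project does not document which sentiment library it uses, so the sketch below uses NLTK's VADER analyzer as a stand-in to show the general idea, scoring the `posts` list from the previous snippet.
    ```python
    # Stand-in sentiment scoring with NLTK's VADER (an assumption -- the
    # tool's actual sentiment backend is not documented here).
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
    sia = SentimentIntensityAnalyzer()

    for post in posts:
        compound = sia.polarity_scores(post["title"])["compound"]  # -1..+1
        post["sentiment"] = ("positive" if compound > 0.05
                             else "negative" if compound < -0.05 else "neutral")
    ```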
  - Summarize post content:
    To enable post summarization, add the `--summarize_posts` flag. You also need `--llm_backend` to specify the LLM backend (`openai` or `huggingface`) and `--model_name` to specify the model.
    - Summarizing with OpenAI:
      ```bash
      python main.py --subreddits MachineLearning --limit 10 --summarize_posts --llm_backend openai --model_name gpt-3.5-turbo --summary_length 50
      ```
      This command uses OpenAI's `gpt-3.5-turbo` model to summarize each post in about 50 words.
    - Summarizing with Hugging Face:
      ```bash
      python main.py --subreddits StableDiffusion --limit 10 --summarize_posts --llm_backend huggingface --model_name facebook/bart-large-cnn --summary_length 100
      ```
      This command uses Hugging Face's `facebook/bart-large-cnn` model to summarize each post in about 100 words. Make sure the model you choose is a summarization model.
    A sketch of an OpenAI-backed summarization call follows.
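    For the OpenAI backend, a summarization call might look like the sketch below, written against the `openai>=1.0` Python client. The prompt wording and the `summarize` helper are illustrative assumptions.
    ```python
    # Hypothetical summarization helper using the openai>=1.0 client style.
    from openai import OpenAI

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment

    def summarize(text, length=50, model="gpt-3.5-turbo"):
        response = client.chat.completions.create(
            model=model,
            messages=[{
                "role": "user",
                "content": f"Summarize this Reddit post in about {length} words:\n\n{text}",
            }],
        )
        return response.choices[0].message.content

    print(summarize(posts[0]["body"] or posts[0]["title"]))
    ```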
  - Generate data visualizations:
    To generate charts automatically, add the `--visualize_data` flag. The tool uses the `LIDA` library to automatically generate a variety of charts from the fetched data.
    ```bash
    python main.py --subreddits ChatGPT,MachineLearning --limit 100 --visualize_data --output_dir my_results
    ```
    This command not only fetches the data but also generates charts and saves them to the `my_results` folder. A rough sketch of the LIDA flow follows.
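    The sketch below shows roughly how LIDA's `Manager` API can be driven against a CSV of fetched posts. The file path is hypothetical, and the exact arguments may differ across LIDA versions.
    ```python
    # Rough LIDA flow -- exact API details may vary between LIDA versions.
    from lida import Manager, llm

    lida = Manager(text_gen=llm("openai"))            # LIDA itself needs an LLM backend
    summary = lida.summarize("my_results/posts.csv")  # hypothetical exported CSV
    goals = lida.goals(summary, n=2)                  # LLM-suggested visualization goals
    charts = lida.visualize(summary=summary, goal=goals[0])
    ```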
  - Specify the output directory:
    Use the `--output_dir` parameter to specify the directory where analysis results (CSV and JSON files, plus generated images) are saved. If the directory does not exist, the script creates it automatically.
    ```bash
    python main.py --subreddits AITech --limit 30 --output_dir AI_Reports --summarize_posts --visualize_data
    ```
    All generated files will be saved in the `AI_Reports` folder. A sketch of how results could be written out follows.
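    Saving results as `--output_dir` describes could look like the sketch below; the file names and the use of pandas are illustrative assumptions.
    ```python
    # Illustrative result export: CSV via pandas plus raw JSON, into a
    # directory that is created on demand (file names are assumptions).
    import json
    import os

    import pandas as pd

    output_dir = "AI_Reports"
    os.makedirs(output_dir, exist_ok=True)  # create the directory if missing

    pd.DataFrame(posts).to_csv(os.path.join(output_dir, "posts.csv"), index=False)
    with open(os.path.join(output_dir, "posts.json"), "w", encoding="utf-8") as f:
        json.dump(posts, f, ensure_ascii=False, indent=2)
    ```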
List of Command Line Parameters
This is the complete list of parameters supported by the script:
- `--subreddits`: Required. Comma-separated list of subreddit names.
- `--keywords`: Optional. Comma-separated list of keywords used to filter posts.
- `--limit`: Optional. Maximum number of posts to fetch; the default is 50.
- `--llm_backend`: Optional. The LLM backend to use, `openai` or `huggingface`; required if `--summarize_posts` is enabled.
- `--model_name`: Optional. The LLM model name, such as `gpt-3.5-turbo` or `facebook/bart-large-cnn`.
- `--summary_length`: Optional. The length (in words) of each post summary; the default is 100.
- `--output_dir`: Optional. The directory where results and charts are saved; the default is `results`.
- `--sentiment_analysis`: Optional flag. If present, sentiment analysis is performed.
- `--summarize_posts`: Optional flag. If present, posts are summarized.
- `--visualize_data`: Optional flag. If present, data visualization charts are generated.
By combining these parameters, you can flexibly configure and run Reddit AI Trend Reports to suit your needs. The sketch below shows how such a command line interface might be defined.
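The following `argparse` definition mirrors the parameter list above. It is a reconstruction for illustration, not the project's actual `main.py`.
```python
# Hypothetical CLI definition matching the documented parameters.
import argparse

parser = argparse.ArgumentParser(description="Reddit AI trend analysis")
parser.add_argument("--subreddits", required=True,
                    help="comma-separated subreddit names (no spaces)")
parser.add_argument("--keywords", default="",
                    help="comma-separated keywords used to filter posts")
parser.add_argument("--limit", type=int, default=50,
                    help="maximum number of posts to fetch")
parser.add_argument("--llm_backend", choices=["openai", "huggingface"],
                    help="required when --summarize_posts is set")
parser.add_argument("--model_name",
                    help="e.g. gpt-3.5-turbo or facebook/bart-large-cnn")
parser.add_argument("--summary_length", type=int, default=100)
parser.add_argument("--output_dir", default="results")
parser.add_argument("--sentiment_analysis", action="store_true")
parser.add_argument("--summarize_posts", action="store_true")
parser.add_argument("--visualize_data", action="store_true")
args = parser.parse_args()

subreddits = args.subreddits.split(",")  # split the comma-separated list
```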
Application Scenarios
- AI researchers tracking technology hotspots
  By analyzing posts in `r/MachineLearning` or `r/ArtificialIntelligence`, researchers can quickly learn about the latest research results, popular algorithms, and industry trends, and adjust their research direction accordingly.
- Market analysts gauging user sentiment toward AI products
  Analysts can monitor `r/ChatGPT` or a specific AI product's community and use the sentiment analysis feature to understand users' reactions to new features, updates, or competing products, providing data to support product strategy.
- Content creators looking for trending AI topics
  Independent writers and bloggers can use the tool to identify trending AI topics and keywords on Reddit, helping them create content that resonates with readers and increases readership and engagement.
- AI developers monitoring community feedback on a tool or framework
  Developers can track subreddits for specific AI frameworks (e.g. TensorFlow, PyTorch) or tools (e.g. Stable Diffusion) to collect the issues, feature requests, and usage reports users post, and improve their products accordingly.
FAQ
- Q: How do I get Reddit API credentials?
  A: Visit the Reddit developer page at `https://www.reddit.com/prefs/apps/`, create a "script"-type app, and fill in the required information. After the app is created, the page displays your `client_id` and `client_secret`. You also need to set a `user_agent` for the application.
- Q: Why do I need an LLM API Key?
  A: The LLM API Key is used to call a Large Language Model service. This tool uses an LLM to automatically summarize Reddit post content, so if you want to summarize with OpenAI's GPT models or with models on Hugging Face, you must provide the corresponding API Key or Token.
- Q: Which LLM models are supported for post summarization?
  A: With OpenAI as the backend, you can use `gpt-3.5-turbo` and the other models OpenAI supports. With Hugging Face as the backend, you can use any model from the Hugging Face model hub that is suited to text summarization, such as `facebook/bart-large-cnn`. Specify the model by name on the command line.
- Q: How do I specify multiple subreddits or keywords?
  A: When passing the `--subreddits` or `--keywords` parameter, simply separate the subreddits or keywords with commas, e.g. `--subreddits MachineLearning,ChatGPT` or `--keywords "LLM,Diffusion"`. Note that there should be no spaces before or after the commas.
- Q: I have no Python background; can I use this tool?
  A: This tool is a Python-based command line script that must be run in a terminal. You therefore need to understand basic Python environment setup, virtual environments, and command line arguments. If you have no Python or command line experience at all, you may need to learn some basics before you can use it smoothly.