Zotero-arXiv-Daily is an open-source tool that uses GitHub Actions to recommend new arXiv papers related to the contents of a researcher's Zotero library. Users fork the project on GitHub, configure a Zotero API key and an email service, and then receive a daily list of new papers that match their research interests. The tool analyzes the papers in the Zotero library, generates an AI summary (TL;DR) for each recommended paper, and emails the results sorted by relevance. It runs for free within the GitHub Actions quota for public repositories and requires no local software installation, which makes it suitable for researchers who need to keep track of the research frontier.
Features
- Automatically fetches new papers from arXiv and recommends relevant ones based on the content of the user's Zotero library.
- Uses AI to generate a short summary (TL;DR) of each paper for rapid screening.
- Supports custom arXiv categories such as AI, computer vision, and natural language processing.
- Sends the recommended papers to the user's inbox through a daily automated GitHub Actions run.
- Supports fetching medical papers from medRxiv (requires configuring the relevant environment variables).
- Provides a debugging mode (Test-Workflow) that retrieves a fixed number of papers at any time.
- Sorts recommendations by relevance, taking into account when papers were added to the Zotero library.
- Supports filtering out unwanted Zotero collections to avoid irrelevant recommendations.
Usage Guide
Installation and Configuration
Zotero-arXiv-Daily runs through GitHub Actions and requires no local software installation. Here is the detailed configuration process:
- Fork the repository
  - Visit https://github.com/TideDra/zotero-arxiv-daily.
  - Click the "Fork" button in the upper right corner to copy the repository to your GitHub account. The fork will exist as <your-username>/zotero-arxiv-daily.
- Get a Zotero API key
  - Log in to the Zotero website (https://www.zotero.org) and go to the "Settings" page.
  - In the "API Key" section, generate a new API key and make sure it has read access.
  - Copy the generated key and save it for later use.
- Configure GitHub Actions environment variables
  - Go to your forked repository and click "Settings" > "Secrets and variables" > "Actions" > "New repository secret".
  - Add the following environment variables:
    - ZOTERO_USER_ID: Your Zotero user ID, which can be found in the Zotero settings.
    - ZOTERO_API_KEY: The Zotero API key generated in the previous step.
    - ARXIV_QUERY: The target arXiv categories, joined with "+", e.g. cs.AI+cs.CV+cs.CL (see the arXiv website for category abbreviations).
    - SMTP_SERVER: The SMTP server address of your mail provider (e.g. smtp.gmail.com for Gmail).
    - SMTP_PORT: The SMTP port number (e.g. 587 or 465 for Gmail).
    - SENDER_EMAIL: The email address that sends the message.
    - SENDER_PASSWORD: The authentication password for the SMTP service (note: Gmail requires an app password).
    - RECEIVER_EMAIL: The email address that receives the recommendations.
    - MAX_PAPER_NUM: The maximum number of papers to recommend per run (5-10 is recommended, because generating each TL;DR is time-consuming).
    - Optional: MEDRXIV_DAYS and MEDRXIV_SUBJECTS, used to enable medRxiv paper recommendations.
  - Save all variables.
- Enable GitHub Actions
  - Go to the "Actions" tab in the forked repository and enable workflows.
  - By default, the Send-emails-daily workflow on the main branch (main) runs automatically every day and retrieves the new papers posted the day before.
  - Test-Workflow can be triggered manually for debugging; it retrieves recommendations for a fixed 5 papers.
- Check the logs
  - Open the "Actions" tab to view the workflow run logs. If there are no new papers because of a weekend or holiday, the log may show "No new papers found".
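The configuration described above can be sanity-checked before a run. The following is a minimal sketch, not the project's own code: the `Settings` class and `load_settings` function are hypothetical names introduced for illustration, and only the environment variable names come from the list above. It assumes ARXIV_QUERY is split on "+" and that SMTP_PORT and MAX_PAPER_NUM are integers.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

# Variable names taken from the configuration list; everything else is illustrative.
REQUIRED_VARS = [
    "ZOTERO_USER_ID", "ZOTERO_API_KEY", "ARXIV_QUERY",
    "SMTP_SERVER", "SMTP_PORT", "SENDER_EMAIL",
    "SENDER_PASSWORD", "RECEIVER_EMAIL", "MAX_PAPER_NUM",
]


@dataclass
class Settings:
    """Hypothetical container for the non-secret, parsed settings."""
    zotero_user_id: str
    arxiv_categories: List[str]
    smtp_server: str
    smtp_port: int
    max_paper_num: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to os.environ) and validate them."""
    env = os.environ if env is None else env
    # Report every missing or empty variable at once instead of failing one by one.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        zotero_user_id=env["ZOTERO_USER_ID"],
        # "cs.AI+cs.CV" becomes ["cs.AI", "cs.CV"]
        arxiv_categories=[c for c in env["ARXIV_QUERY"].split("+") if c],
        smtp_server=env["SMTP_SERVER"],
        smtp_port=int(env["SMTP_PORT"]),
        max_paper_num=int(env["MAX_PAPER_NUM"]),
    )
```

Passing a plain dictionary instead of the real environment makes the check easy to try locally before the secrets are stored in GitHub.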
Main Functions
- Daily paper recommendations
The tool fetches new papers from arXiv's Atom feed every day and computes their relevance to the abstracts in the Zotero library using a SentenceTransformer model (avsolatorio/GIST-small-Embedding-v0 by default). Recommendations are sorted by score and sent to the configured email address with each paper's title, authors, abstract, AI-generated TL;DR, and download link. The email is formatted as HTML so that the information for each paper is clearly displayed.
- AI-generated TL;DR
The TL;DR for each paper is generated by a large language model and takes about 70 seconds per paper. Users can set MAX_PAPER_NUM to control the number of recommendations and avoid run timeouts. The TL;DR succinctly summarizes the core content of the paper to help users quickly decide whether to read it in depth.
- medRxiv support
By setting MEDRXIV_DAYS (e.g. 7, meaning the past 7 days) and MEDRXIV_SUBJECTS (e.g. Clinical Research), the tool can fetch medical papers from medRxiv. Recommendations are grouped by source (arXiv and medRxiv) in the email for easy reading.
- Debug mode
Test-Workflow lets the user run the workflow at any time and retrieve a fixed 5 papers to test whether the configuration is correct. The results of the run are also sent to the configured email address, and the logs can be viewed in GitHub Actions.
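The relevance ranking described under daily paper recommendations can be illustrated with a small sketch. This is not the tool's implementation: it uses toy vectors in place of SentenceTransformer embeddings, and the logarithmic recency weighting in `recency_weights` is an assumption chosen only to show how the time a paper was added to the library could influence the score. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recency_weights(n: int) -> List[float]:
    """Assumed decay: index 0 is the most recently added library paper."""
    raw = [1.0 / (1.0 + math.log10(i + 1)) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def rank_candidates(
    library_vecs: Sequence[Sequence[float]],
    candidate_vecs: Sequence[Sequence[float]],
) -> List[int]:
    """Return candidate indices sorted from most to least relevant."""
    weights = recency_weights(len(library_vecs))
    scores = [
        # Weighted average similarity to every paper in the library.
        sum(w * cosine(cand, lib) for w, lib in zip(weights, library_vecs))
        for cand in candidate_vecs
    ]
    return sorted(range(len(candidate_vecs)), key=lambda i: scores[i], reverse=True)


# Toy example: library sorted newest first, three new candidate papers.
library = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
order = rank_candidates(library, candidates)
```

In a real run the vectors would be embeddings of paper abstracts, and only the top MAX_PAPER_NUM indices would be kept.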
Notes
- Ensure that the Zotero library has a sufficient number of papers (especially those containing abstracts) to improve recommendation accuracy.
- The SMTP configuration must be accurate. Using a dedicated, rarely used email account is recommended to limit security exposure.
- Regularly check your fork and merge upstream updates (TideDra/zotero-arxiv-daily) to get new features and fixes.
- GitHub Actions imposes a runtime limit on workflows, so a small MAX_PAPER_NUM (e.g. 5) is recommended to ensure that the run completes.
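As a rough guide for choosing MAX_PAPER_NUM, the figure of about 70 seconds per TL;DR can be turned into a back-of-the-envelope estimate. The sketch below is illustrative only: the function names are hypothetical, the time budget is whatever limit you choose to plan for, and the estimate ignores the time spent fetching papers and computing embeddings.

```python
def estimated_minutes(max_paper_num: int, seconds_per_paper: float = 70.0) -> float:
    """Approximate TL;DR generation time in minutes for one run."""
    return max_paper_num * seconds_per_paper / 60.0


def max_papers_within(budget_minutes: float, seconds_per_paper: float = 70.0) -> int:
    """Largest paper count whose TL;DR generation fits in the given budget."""
    return int(budget_minutes * 60 // seconds_per_paper)


# With the default of 70 s per paper, 10 papers need roughly 11.7 minutes.
minutes_for_ten = estimated_minutes(10)
```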
Application Scenarios
- Academic researchers following developments in their field
Researchers can use the tool to receive daily recommendations of new papers related to their research direction (e.g. AI, physics), saving the time of manually browsing arXiv and quickly learning about the latest research progress.
- Students preparing a literature review
Graduate students can use the tool to collect the latest papers in related fields and, with the AI-generated TL;DR, quickly screen valuable literature to assist in thesis writing and review preparation.
- Extended reading for interdisciplinary researchers
Interdisciplinary researchers can configure multiple arXiv categories (e.g. cs.AI+physics.astro-ph) to obtain recommended papers from different fields and broaden their research horizons.
- Medical researchers following medRxiv
Researchers in medicine can enable medRxiv support to access clinical research or public health papers and keep abreast of cutting-edge medical developments.
Q&A
- How can I make sure that the recommended papers are relevant to my research interests?
The tool calculates the relevance of new papers by comparing them with the abstracts of papers in the Zotero library using a SentenceTransformer model. Make sure that the Zotero library contains papers relevant to your research direction, and periodically remove irrelevant literature to improve recommendation accuracy.
- Why didn't I get an email over the weekend?
arXiv usually does not publish new papers on weekends and holidays, and the log may show "No new papers found". This is normal, and recommendations resume on weekdays.
- How do I add medRxiv support?
Set MEDRXIV_DAYS (e.g. 7) and MEDRXIV_SUBJECTS (e.g. Epidemiology) in the GitHub Actions environment variables to enable medRxiv paper recommendations. The email will display arXiv and medRxiv papers separately.
- What if the run takes too long?
Generating the TL;DR is time-consuming, so setting MAX_PAPER_NUM to 5-10 is recommended. If the run still times out, try lowering the number of papers or using the optimized version on the dev branch.
- How do I update my repository for new features?
Visit https://github.com/TideDra/zotero-arxiv-daily regularly to check for updates. If there are new features, merge the upstream repository into your fork as described in the GitHub documentation.