Following the attention generated by tools such as Claude Code, Google has launched its free counterpart, Gemini CLI. This tool is designed to bring generative AI capabilities directly into the command-line terminal, opening up new possibilities for automated task processing and local file management.
Command-line AI: a new paradigm for human-computer interaction
Unlike AI-native integrated development environments (IDEs) such as Cursor, Gemini CLI has no graphical user interface (GUI). All interaction happens in the terminal through text commands. This positioning makes it not a closed programming tool but a general-purpose AI assistant that can manipulate local files and invoke system functions.
Its core strength is that it acts as a "translator" between natural language and traditional command-line tools. Users give instructions in everyday language, and Gemini CLI generates and executes the corresponding command-line code. It has built-in basic tools such as Google Search, file reading and writing, and content search. Users can type the /tools command to see all the built-in tools currently available to the model.
In addition, Gemini CLI supports MCP (Model Context Protocol), which allows developers or advanced users to connect additional toolkits to extend its functionality and further enhance the model's ability to handle complex tasks.
Easy for non-programmers to get started
For users unfamiliar with programming, the "command line" often implies complexity and a high barrier to entry. With Gemini CLI, however, the core interaction is typing natural-language prompts, not writing code. The experience is therefore not fundamentally different from common AI chat tools, as long as the network environment allows a smooth login.
Installation and configuration in two steps
All demonstrations in this guide use the terminal that comes with macOS. Most of the steps also apply to Windows, although Windows users may run into more environment-related issues during configuration.
Step 1: Prepare a working directory
Before you begin, it is highly recommended to create a new, dedicated folder for all the material needed for the task. This is a good safety practice: it keeps Gemini CLI's file operations confined to this directory and avoids unintended changes to important system files.
Once the folder is ready, open the Terminal application, type cd followed by a space, paste the path to the folder, and press Enter. All subsequent operations will then take place in this safe "sandbox" environment.
Step 2: Install and Start the Gemini CLI
In a terminal window, execute the following command:
npx https://github.com/google-gemini/gemini-cli
npx is a handy tool that temporarily downloads and runs Gemini CLI without permanently installing it on your system. This is ideal for a first try or one-time use.
After it starts successfully, you will be prompted to choose an interface color theme and to sign in with your Google account. In the terminal, use the up and down arrow keys to move between options and press Enter to confirm. Once the web authorization is complete, the prompt input box appears, which means the setup succeeded.
If you want a permanent installation, so that you can later start the tool directly with a simple gemini command, run the following command. This does, however, require some understanding of npm package management.
npm install -g @google/gemini-cli
Users who are not comfortable with the English interface can use a select-to-translate tool such as Bob to translate prompts in the terminal at any time.
Basic Functional Applications: Local Documentation and Knowledge Management
Gemini CLI's multimodal capabilities and file access permissions make it well suited to working with local documents and images.
Document generation and analysis
Gemini CLI can call Google Search to gather information and combine it with local documents to generate new reports. For example, it can be instructed to research a specific topic and consolidate the results with local Markdown files.
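Please use Google Search to find material on "the latest breakthroughs in quantum computing", read the related documents in my local /research/papers directory, then generate a comprehensive report in Markdown format and save it as quantum_computing_report.md.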
It is equally adept at analyzing, rewriting and summarizing existing documents. For example, rewriting a technical article into an easy-to-understand blog, or extracting key decisions and to-dos from meeting minutes.
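Based on the write-up of Andrej Karpathy's "Software 3.0" talk, rewrite it as a blog post of about 800 words in a light, entertaining style. Then generate 3 versions of a tweet suitable for posting on Twitter, each with the #AI #Tech hashtags.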
Obsidian Knowledge Base Automation
For Obsidian users, Gemini CLI can be a powerful knowledge base management tool. Launching it from the root directory of an Obsidian vault enables deep processing of the notes inside.
For example, it can be instructed to retrieve all clipped articles about a particular topic (such as MCP) and generate an index note with links back to each article for quick navigation and review.
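Search the current folder for all clipped articles about "MCP" and generate a new Markdown document. The document should use an unordered list to summarize the core points of each article, and each summary should be followed by a Markdown link to the original.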
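The search-and-link half of this prompt can be sketched in a few lines of Python. This is only an illustration of the kind of work being delegated, not what Gemini CLI actually runs: the function name `build_index` is hypothetical, the summaries (which need the model) are left out, and Obsidian-style `[[wikilinks]]` are used here as a simplifying assumption in place of the Markdown links the prompt asks for.

```python
from pathlib import Path


def build_index(vault: Path, keyword: str) -> str:
    """Return Markdown listing every note under `vault` that mentions `keyword`."""
    lines = [f"# Index: {keyword}", ""]
    for note in sorted(vault.rglob("*.md")):
        text = note.read_text(encoding="utf-8", errors="ignore")
        if keyword.lower() in text.lower():
            # Obsidian resolves [[Note name]] by file name without the .md suffix.
            lines.append(f"- [[{note.stem}]]")
    return "\n".join(lines) + "\n"
```

Writing the returned text to a new note in the vault, for example with `Path("MCP index.md").write_text(...)`, would produce the index file.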
Obsidian's Knowledge Graph feature relies on bi-directional links between notes, and adding those links manually is tedious. This task can now be automated: Gemini CLI can analyze the titles and contents of all notes in a folder and automatically add two-way links between related notes, building a web-like knowledge structure.
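Analyze the titles and body text of all documents in the current folder, and batch-add two-way links between documents with related content so that Obsidian can generate a knowledge graph.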
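To make "related" concrete, here is a deliberately naive sketch: two notes count as related when one mentions the other's title. A language model would judge relatedness by content rather than by literal title matches, so treat this as an assumption-laden stand-in. The function name `related_notes` is hypothetical.

```python
from pathlib import Path


def related_notes(vault: Path) -> dict[str, list[str]]:
    """Map each note title to the other note titles it should link to."""
    notes = {
        p.stem: p.read_text(encoding="utf-8", errors="ignore")
        for p in sorted(vault.glob("*.md"))
    }
    links: dict[str, set[str]] = {title: set() for title in notes}
    for title, body in notes.items():
        for other in notes:
            if other != title and other.lower() in body.lower():
                # Record the link in both directions so the graph is symmetric.
                links[title].add(other)
                links[other].add(title)
    return {title: sorted(found) for title, found in links.items()}
```

Each resulting pair could then be appended to the corresponding note as a `[[Title]]` link.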
Image content recognition and processing
Thanks to its multimodal capabilities, Gemini CLI can "see" and understand the content of an image, which makes it possible to batch process local images. For example, it can analyze a folder of confusingly named images and rename them in bulk according to their content.
Analyze all the images in the current folder and batch-rename them according to the core content of each image.
Captioning images (generating descriptive text) is a key step in training AI image-generation models. Gemini CLI can automate this process by generating a detailed description for each image and saving it as a .txt file with the same name as the image, which matches the layout commonly expected for training sets.
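Analyze all the images in this folder, generate a detailed description of each one (covering content, style, and composition), and store the description in a text file with the same name as the image.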
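The same-name caption layout described above can be sketched as follows. The `describe` callable is a placeholder for whatever produces the caption (in this workflow, the model); the function name and the list of image suffixes are assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def write_captions(folder: Path, describe: Callable[[Path], str]) -> list[Path]:
    """Write <image name>.txt next to each image; return the caption files created."""
    written = []
    for image in sorted(folder.iterdir()):
        if image.is_file() and image.suffix.lower() in IMAGE_SUFFIXES:
            caption_file = image.with_suffix(".txt")  # cat.png -> cat.txt
            caption_file.write_text(describe(image), encoding="utf-8")
            written.append(caption_file)
    return written
```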
System setup and file organization
Because Gemini CLI can execute system commands, it can also create automated workflow scripts. Users can define their own "deep work mode" that closes distracting applications, opens work software, and adjusts system settings with a single command.
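Create a shell script named `deep_work.sh`. The script should do the following: 1. Open Obsidian; 2. Close all browsers and messaging apps; 3. Turn on the system's "Do Not Disturb" mode; 4. Play the white noise in my local `/music/focus` folder.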
Likewise, it can help organize messy folders by automatically creating subfolders and sorting files into them by type.
Create two new folders, "Images" and "Captions", in the current directory, then move all image files into "Images" and all text documents into "Captions".
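For a sense of what this prompt amounts to, here is a minimal Python sketch of the same sorting step. The suffix sets are assumptions about what counts as an "image" or a "text document", and `organize` is a hypothetical name; files of any other type are left where they are.

```python
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
TEXT_SUFFIXES = {".txt", ".md"}


def organize(folder: Path) -> None:
    """Move images into Images/ and text files into Captions/ inside `folder`."""
    images = folder / "Images"
    captions = folder / "Captions"
    images.mkdir(exist_ok=True)
    captions.mkdir(exist_ok=True)
    # Snapshot the listing first, because files are moved while iterating.
    for item in list(folder.iterdir()):
        if not item.is_file():
            continue
        suffix = item.suffix.lower()
        if suffix in IMAGE_SUFFIXES:
            shutil.move(str(item), str(images / item.name))
        elif suffix in TEXT_SUFFIXES:
            shutil.move(str(item), str(captions / item.name))
```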
Advanced Applications: Driving Professional Command Line Tools
Gemini CLI's real potential lies in its ability to act as a natural-language interface to professional command-line tools that are powerful but lack a graphical interface. This significantly lowers the bar for specialized tasks such as video processing, image editing, and document conversion.
On macOS, most of these tools can be installed through Homebrew (a popular package manager). You can have Gemini CLI install Homebrew for you first:
Please help me install Homebrew and configure the environment variables.
Using ffmpeg for professional-grade video editing
ffmpeg is an open-source audio and video processing framework that underlies many commercial video editing applications. Once it is installed, complex video editing tasks can be carried out in natural language.
First, use Gemini CLI to install ffmpeg:
Please use Homebrew to install ffmpeg for me.
After the installation is complete, you can perform the following tasks:
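- Add a watermark:
Please use ffmpeg to add a text watermark reading "guizang" at 10% opacity to the top-right corner of the video "input.mp4", and save the result as a new video.
- Video to GIF:
Please use ffmpeg to convert "input.mp4" in this folder into a high-quality GIF.
- Replace the audio track:
Please use ffmpeg to merge "video.mp4" and "audio.mp3", make sure the audio length matches the video, and add fade-in and fade-out effects at the beginning and end.
- Extract sequence frames:
Please use ffmpeg to convert "video.mp4" into a PNG frame sequence and store it in a new folder.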
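To show what the "translation" step produces, the sketch below builds plausible ffmpeg argument lists for three of the tasks above. These are illustrative commands, not necessarily what Gemini CLI would generate: the function names, font size, margins, frame rate, and GIF width are all assumptions, and the `drawtext` filter requires an ffmpeg build with font support. The audio-replacement task is omitted because the fade-out timing depends on the video's duration.

```python
def watermark_cmd(src: str, dst: str, text: str, opacity: float = 0.1) -> list[str]:
    """Draw semi-transparent text in the top-right corner; audio is copied as-is."""
    drawtext = (
        f"drawtext=text={text}:fontcolor=white@{opacity}:fontsize=36:"
        "x=w-tw-20:y=20"  # 20 px in from the right and top edges
    )
    return ["ffmpeg", "-i", src, "-vf", drawtext, "-c:a", "copy", dst]


def gif_cmd(src: str, dst: str, fps: int = 12, width: int = 480) -> list[str]:
    """Build a GIF using a palette generated from the video itself."""
    graph = (
        f"[0:v]fps={fps},scale={width}:-1:flags=lanczos,split[a][b];"
        "[a]palettegen[p];[b][p]paletteuse"
    )
    return ["ffmpeg", "-i", src, "-filter_complex", graph, dst]


def frames_cmd(src: str, out_dir: str) -> list[str]:
    """Export every frame as a numbered PNG; out_dir must already exist."""
    return ["ffmpeg", "-i", src, f"{out_dir}/frame_%04d.png"]
```

Each list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is installed. Passing the arguments as a list avoids shell quoting issues.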
Using yt-dlp to download online videos
yt-dlp is a powerful online video download tool. With Gemini CLI installing and driving it, you can easily download a specified video along with its cover image.
Please use Homebrew to install yt-dlp.
Please use yt-dlp to download the video at this link [paste link here] together with its high-resolution cover image.
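A download request like this plausibly maps to a single yt-dlp invocation. The sketch below assembles it as an argument list. `download_cmd` is a hypothetical helper, and the choice of `--write-thumbnail` (save the cover image) and `-P` (output directory) is one reasonable reading of the prompt rather than the only one.

```python
def download_cmd(url: str, out_dir: str = ".") -> list[str]:
    """Download a video plus its thumbnail into out_dir."""
    return ["yt-dlp", "--write-thumbnail", "-P", out_dir, url]
```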
Using ImageMagick for advanced image processing
ImageMagick is to image processing what ffmpeg is to video. It is a feature-rich toolset for format conversion, scaling, cropping, filters, image stitching, and more.
Again, install first:
Please use Homebrew to install ImageMagick.
Batch image processing is available after installation:
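- Batch resize & add watermarks:
Please use ImageMagick to resize all images in the current folder to a uniform width of 800 pixels, add a gray, semi-transparent "Internal Use Only" watermark, and save the processed images in a new folder.
- Image stitching:
Please use ImageMagick to combine the four processed images into a single 2x2 grid image, keeping white separators between the images.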
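The two prompts above could translate into commands along these lines. This is a sketch assuming ImageMagick 7, where the entry point is `magick` (older installs expose `convert` and `montage` directly). The function names, font size, margins, and gap width are illustrative choices.

```python
from pathlib import Path


def resize_watermark_cmd(src: Path, out_dir: Path, width: int = 800) -> list[str]:
    """Resize to a fixed width and stamp gray, half-transparent text bottom-right."""
    return [
        "magick", str(src),
        "-resize", f"{width}x",            # width only: height keeps the aspect ratio
        "-gravity", "southeast",
        "-fill", "rgba(128,128,128,0.5)",  # gray at 50% opacity
        "-pointsize", "36",
        "-annotate", "+20+20", "Internal Use Only",
        str(out_dir / src.name),
    ]


def montage_cmd(images: list[Path], dst: Path) -> list[str]:
    """Tile the images in a 2x2 grid with white gaps between them."""
    return [
        "magick", "montage", *map(str, images),
        "-tile", "2x2",
        "-geometry", "+10+10",             # 10 px of spacing around each tile
        "-background", "white",
        str(dst),
    ]
```

Running `resize_watermark_cmd` once per image in a loop, with the output folder created beforehand, covers the batch case.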
Using Pandoc for universal document conversion
Pandoc is known as the "Swiss Army knife" of document format conversion. It is especially useful when dealing with office documents in different formats.
Please use Homebrew to install Pandoc.
Once installed, it can easily convert a Markdown file into a Word document while retaining most of the formatting.
Please use Pandoc to convert the Markdown document "Andrej Karpathy 软件 3.0 分享.md" to Word (.docx) format.
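The conversion itself is a one-line Pandoc call, which infers the input and output formats from the file extensions. A minimal sketch, with `docx_cmd` as a hypothetical helper name:

```python
from pathlib import Path


def docx_cmd(src: Path) -> list[str]:
    """Convert a Markdown file to a .docx file with the same base name."""
    dst = src.with_suffix(".docx")
    return ["pandoc", str(src), "-o", str(dst)]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once Pandoc is installed.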
Gemini CLI's emergence validates an important trend: large language models are becoming the universal interface that connects human intentions with complex machine instructions. Specialized tools that once shut out the average user because of their operational complexity are now becoming accessible through natural language.
This change is not only an increase in efficiency, but also a dissolution of the barriers to technology use. In this new interaction paradigm, the user's imagination, rather than his or her programming skills, will be the key to unlocking the potential of computing.