After the recent high-profile release of GPT-5, its code generation capabilities, especially in front-end development, became a focus of community attention. Has GPT-5 improved significantly over its predecessor? And how does it compare with major competitors such as Claude 4.1 and Gemini 2.5 Pro?
In this article, we run an in-depth side-by-side review of the coding capabilities of these three leading large language models through a series of front-end tasks ranging from simple to complex. The aim is to assess their real-world performance in code generation quality, prompt adherence, understanding of user experience design, and handling of complex requirements.
Test 1: Bento Grid Stylized Layout
The first test focuses on generating a static web page in the Bento Grid style. This grid layout imitates the slides used in Apple's launch events and emphasizes asymmetric visual elements and a clear information hierarchy. The test is designed to evaluate each model's understanding of modern UI design trends and its ability to implement them in CSS.
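Core prompt:
Based on the key information in the product introduction article below, help me generate a dynamic Chinese web page that presents it in a Bento Grid visual style similar to Apple's launch-event slides. Specific requirements:
1. Show all information on a single page where possible. The background is #F8F6F5, card backgrounds are white, the text color is #010101, and highlighted buttons and text backgrounds use a #F69AAC-DF95E3-7DBDE9 gradient. Inside each card, use a short main title plus a large icon.
2. Place the icons from the Markdown-format image links into suitable cards, and prevent icons from overlapping text.
3. Use oversized type or numbers to highlight key points. Include oversized visual elements that emphasize the focus and contrast in scale with small elements.
4. The page must be responsive and support wider displays, such as 1920px and above.
5. Mix Chinese and English: large bold type for Chinese, small English text as accents.
6. Use simple line-art graphics as data visualization or illustration elements.
7. Use transparency gradients of each highlight color on its own to create a high-tech feel, but do not blend different highlight colors into each other.
8. Data may use online chart components, styled consistently with the theme.
9. Use HTML5, TailwindCSS 3.0+ (loaded via CDN), and any necessary JavaScript.
10. Avoid using emoji as primary icons.
11. Do not omit any content points.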
Review Results:
In this round, Gemini 2.5 Pro performed best. Its page was very precise in spacing and use of color, and it added light-colored icons as accents, giving it the best detailing. GPT-5 and Claude 4.1 produced similar, passable results, but Claude 4.1 did not fully comply with the "avoid emoji" instruction, which exposes a weakness in its instruction following.
Test 2: WeChat Official Account Cover Generation
The second test requires the model to generate a set of WeChat official account cover images in a "minimalist grid" style. This task tests not only CSS layout ability but also the model's ability to understand and reproduce an abstract design style, especially precise control over details such as fonts, typography, and geometric elements.
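Core prompt:
You are an excellent web and marketing visual designer with extensive UI/UX design experience. You have created eye-catching marketing visuals for many well-known brands and excel at blending modern design trends with practical marketing strategy.
Please use HTML and CSS to create a WeChat official account cover image layout that follows the design style requirements below. The design should have strong visual impact and a modern feel.
## Basic requirements:
- **Size and proportions**:
  - The overall ratio must be strictly 3.35:1
  - The container height should adjust automatically with the width and always keep the ratio
  - The left area holds the main cover image at 2.35:1
  - The right area holds the Moments share cover at 1:1
- **Layout structure**:
  - The Moments cover only needs four large characters filling the whole area (two on top, two below)
  - Text must be the visual focus of the main cover, occupying at least 70% of the space
  - Both covers share the same background color and decorative accents
  - The outermost card must have square corners
- **Technical implementation**:
  - Write in pure HTML and CSS
  - If the user provides a background image link, incorporate the background image into the layout
  - Implement strictly responsive design, keeping the overall 16:10 ratio at any browser width
  - Load Tailwind CSS from an online CDN to help control proportions and styles
  - Inner elements should scale relative to the container so that the overall design and text proportions stay consistent
  - Load suitable modern fonts from Google Fonts or another CDN
  - Online icon resources (such as Font Awesome) may be used
  - The code should run directly in modern browsers
  - Provide a complete HTML document with all necessary styles
  - Add an image download button at the bottom that downloads the whole image when clicked
## Design style: minimalist grid cover style
- Black-and-white minimalism: a pure black background contrasts sharply with pure white content areas
- Strong geometry: simple lines, boxes, circles, and other basic geometric elements
- Grid-system layout: strict grid typography rules with a clear, orderly structure
- Measured white space: generous white space creates breathing room while keeping a visual center of gravity
- Photography combined with typography: real-scene photos complement minimalist typography
- Industrial-style decoration: thin-line arrows, indicator lines, and similar elements add a designed feel
- Subtle color accents: small areas of green or another accent color break the black-and-white monotony
Typography style
- Bold size contrast: the core title is enlarged to the extreme to form the main visual
- Geometrically split title: the main title is broken into independent blocks to improve recognition
- Mixed horizontal and vertical setting: text runs both horizontally and vertically to create rhythm
- Strong weight contrast: the main title uses an ultra-black weight, while secondary text is lighter
- Multi-level information: event name, date, and slogan are clearly ranked
- Strict text alignment: all text elements follow strict grid alignment
- Mixed Chinese and English: English serves as a decorative element that adds an international design feel
Visual element style
- Cropped photography: images are carefully cropped to highlight the subject
- Indicator lines: arrows, curves, and straight lines act as guiding visual elements
- Framed emphasis: boxes, color blocks, and similar elements emphasize key information
- Simple graphic symbols: minimal visual symbols convey the core message
- Symmetry and asymmetry together: the overall structure is orderly, but details are handled freely
- Spatial layering: element size and position create a sense of foreground and background
- Graphic treatment of numbers: date numerals are given a designed visual treatment
## User input
- The official account title is: [藏师傅暴论:AI工具尽头是生态|即梦AI创作者成长计划介绍]. The text on the right is "创作者生态" ("creator ecosystem"), split across two lines and center-aligned.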
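The ratio arithmetic in the prompt can be sketched as follows. This is a minimal illustration, not output from any of the tested models: the function names and the reference-width scaling approach are assumptions made for this example. It follows the 3.35:1 figure (a 2.35:1 main cover next to a 1:1 square cover); the prompt's later mention of a 16:10 overall ratio is not modeled here.

```typescript
// Hypothetical helpers; names and the scaling policy are illustrative assumptions.
interface CoverLayout {
  totalWidth: number;
  height: number;
  mainWidth: number;   // width of the 2.35:1 main cover
  squareWidth: number; // width of the 1:1 Moments cover
}

const MAIN_RATIO = 2.35;
const SQUARE_RATIO = 1;

// Both covers share one height, so the overall ratio is 2.35 + 1 = 3.35.
function coverLayout(totalWidth: number): CoverLayout {
  if (totalWidth <= 0) {
    throw new RangeError("totalWidth must be positive");
  }
  const height = totalWidth / (MAIN_RATIO + SQUARE_RATIO);
  return {
    totalWidth,
    height,
    mainWidth: height * MAIN_RATIO,
    squareWidth: height * SQUARE_RATIO,
  };
}

// Scale a font size designed at a reference width so that text keeps the
// same proportion of the canvas at any container width.
function scaledFontSize(
  basePx: number,
  referenceWidth: number,
  actualWidth: number
): number {
  return basePx * (actualWidth / referenceWidth);
}
```

Sizing text relative to the container, rather than in fixed pixels, is one way to avoid the kind of overflow and overlap that a fixed canvas can otherwise cause.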
Review Results:
GPT-5 outperformed the other models in this round. Although its result was not as aesthetically pleasing as Claude's, it managed to render everything within the fixed canvas without overflow and added the necessary decorative elements. In contrast, other models such as Gemini misjudged font sizes under the fixed canvas, producing overlapping content and no usable design. This suggests that GPT-5 has improved at understanding spatial relationships under complex constraints.
Test 3: Inventory Management Tool
This test requires building a multi-functional front-end inventory management tool that includes product management, stock-in and stock-out operations, and a data dashboard, with data persisted in localStorage. It is a comprehensive task that tests each model's ability to generate a well-structured, fully functional single-page application (SPA).
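Core prompt:
Please help me create a complete web-based product management tool with the following requirements:
Functional requirements
1. Product management
- Product entry: product name, type/category, SKU number, price, stock quantity
- Product image management: support image upload with preview (a file picker may be used to simulate this)
- Product list: show all products in a table, with search and filtering
- Product editing: support modifying product information
- Product deletion: support deleting products (with a confirmation prompt)
2. Inventory management
- Stock-in: increase a product's stock quantity and record the time and quantity
- Stock-out: decrease a product's stock quantity and record the time and quantity
- Stock records: show the stock change history for each product
3. Interface features
- Dashboard: show statistics such as total number of products, total stock value, and low-stock warnings
- Responsive design: adapt to desktop and mobile devices
- Data persistence: use localStorage to save data
Technical requirements
Styles and icons
- CSS framework: TailwindCSS 3.0+ loaded via CDN
- Icon library: Heroicons or Feather Icons loaded via CDN
- Fonts: Google Fonts
Code structure
- Single-page application: HTML + CSS + JavaScript
- Modular design: split functionality into separate JavaScript modules
- Data format: store product data as JSON
Interface design requirements
- Modern UI: clean, attractive interface design
- Color scheme: a professional business palette
- Interaction feedback: effects for button clicks, form validation, and similar interactions
- Form validation: required-field validation and data-format validation
Data structure example
Please generate a complete HTML file containing all necessary CSS and JavaScript, ensuring it is fully functional and can run directly in the browser.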
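To make the persistence requirement concrete, here is a minimal sketch of how stock-in and stock-out operations might be stored as JSON. It is not taken from any model's output. The `KeyValueStore` interface, the `inventory-state` key, the field names, and the low-stock threshold are all assumptions made for this example; the interface is a small subset of the Web Storage API, so the browser's `localStorage` can be passed in wherever `MemoryStore` is used.

```typescript
// Minimal subset of the Web Storage API; the browser's localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface Product { sku: string; name: string; category: string; price: number; stock: number; }
interface StockRecord { sku: string; type: "in" | "out"; quantity: number; time: string; }
interface InventoryState { products: Product[]; records: StockRecord[]; }

const STORAGE_KEY = "inventory-state"; // hypothetical key name

// In-memory stand-in so the sketch also runs outside a browser.
class MemoryStore implements KeyValueStore {
  private data: Record<string, string> = {};
  getItem(key: string): string | null {
    const value = this.data[key];
    return value === undefined ? null : value;
  }
  setItem(key: string, value: string): void { this.data[key] = value; }
}

// Read the saved state, falling back to an empty one if nothing valid is stored.
function loadState(store: KeyValueStore): InventoryState {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return { products: [], records: [] };
  try {
    const parsed = JSON.parse(raw);
    if (parsed && Array.isArray(parsed.products) && Array.isArray(parsed.records)) {
      return parsed as InventoryState;
    }
  } catch (e) {
    // Corrupt JSON: fall through to the empty state below.
  }
  return { products: [], records: [] };
}

function saveState(store: KeyValueStore, state: InventoryState): void {
  store.setItem(STORAGE_KEY, JSON.stringify(state));
}

function addProduct(store: KeyValueStore, product: Product): void {
  const state = loadState(store);
  for (const p of state.products) {
    if (p.sku === product.sku) throw new Error(`Duplicate SKU: ${product.sku}`);
  }
  state.products.push(product);
  saveState(store, state);
}

// Apply a stock-in or stock-out movement and append it to the history.
function moveStock(store: KeyValueStore, sku: string, type: "in" | "out", quantity: number): void {
  if (quantity <= 0 || Math.floor(quantity) !== quantity) {
    throw new RangeError("quantity must be a positive integer");
  }
  const state = loadState(store);
  let found = false;
  for (const p of state.products) {
    if (p.sku !== sku) continue;
    found = true;
    const next = type === "in" ? p.stock + quantity : p.stock - quantity;
    if (next < 0) throw new RangeError("Insufficient stock");
    p.stock = next;
  }
  if (!found) throw new Error(`Unknown SKU: ${sku}`);
  state.records.push({ sku, type, quantity, time: new Date().toISOString() });
  saveState(store, state);
}

// Dashboard statistics: product count, total stock value, low-stock SKUs.
function dashboard(state: InventoryState, lowStockThreshold: number) {
  let totalValue = 0;
  const lowStock: string[] = [];
  for (const p of state.products) {
    totalValue += p.price * p.stock;
    if (p.stock <= lowStockThreshold) lowStock.push(p.sku);
  }
  return { totalProducts: state.products.length, totalValue, lowStock };
}
```

In a browser, calling `moveStock(window.localStorage, "A1", "in", 4)` would persist the change across page reloads. Injecting the store keeps the logic testable without a DOM.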
Review Results:
Claude 4.1 performed best in this test. It generated a clear, logical multi-page layout in which each function has its own page, with a consistent style that users can understand intuitively.
GPT-5 inherits a problem from its predecessor, GPT-4o: a tendency to stack all functionality onto a single page, possibly because context window limitations prevent it from planning a complex application structure effectively. This seriously harms the user experience.
Gemini did implement multiple pages, but without strict style constraints the resulting interface was less attractive and its interaction design less intuitive.
Test 4: Drag-and-Drop BI Canvas
The fourth test requires building a customizable dashboard (BI canvas) that supports drag-and-drop, resizing, and state saving. This task is designed to evaluate each model's ability to handle complex JavaScript interactions and DOM operations, and to integrate third-party libraries such as drag-and-drop libraries.
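Core prompt:
Grid layout system: support a responsive grid in which widgets align and resize automatically.
Widget library: provide a list of widgets to choose from, such as charts (line chart, pie chart), data cards, task lists, and a clock.
Drag and drop: users can intuitively drag widgets to change their position on the dashboard.
Resizing: users can drag a widget's edges to adjust its size.
State saving: the dashboard layout and widget configuration (for example, a chart's data source) can be saved (for example in LocalStorage or on a back-end server) so that it is restored on the user's next visit.
Add/remove widgets: users can add new widgets from the library or close widgets they no longer need.
Professional user interface: clean, modern design, clear grid lines and placeholder hints, and smooth drag-and-drop animation.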
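The core of the grid, resize, and state-saving requirements can be sketched without any DOM code: pixel coordinates from a drag or resize gesture are snapped to grid cells, and the resulting layout is serialized as JSON. The sketch below is illustrative only. The `WidgetLayout` and `GridConfig` shapes and all function names are assumptions made for this example, not the structure used by any tested model or by a particular drag-and-drop library.

```typescript
// Hypothetical layout model: positions and sizes are measured in grid cells.
interface WidgetLayout { id: string; col: number; row: number; w: number; h: number; }
interface GridConfig { columns: number; cellSize: number; } // cellSize in pixels

function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// Snap a dragged widget's pixel position to the nearest cell, keeping it
// inside the grid horizontally and at or below the top edge.
function snapPosition(layout: WidgetLayout, pxX: number, pxY: number, grid: GridConfig): WidgetLayout {
  const col = clamp(Math.round(pxX / grid.cellSize), 0, grid.columns - layout.w);
  const row = Math.max(0, Math.round(pxY / grid.cellSize));
  return { ...layout, col, row };
}

// Snap a resized widget's pixel size to whole cells, at least 1x1 and never
// extending past the right edge of the grid.
function snapSize(layout: WidgetLayout, pxWidth: number, pxHeight: number, grid: GridConfig): WidgetLayout {
  const w = clamp(Math.round(pxWidth / grid.cellSize), 1, grid.columns - layout.col);
  const h = Math.max(1, Math.round(pxHeight / grid.cellSize));
  return { ...layout, w, h };
}

// Serialize the layout so it can be written to LocalStorage or sent to a server.
function serializeLayout(widgets: WidgetLayout[]): string {
  return JSON.stringify(widgets);
}

// Restore a saved layout, returning an empty dashboard for missing or invalid data.
function restoreLayout(raw: string | null): WidgetLayout[] {
  if (raw === null) return [];
  try {
    const parsed = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as WidgetLayout[]) : [];
  } catch (e) {
    return [];
  }
}
```

A pointer-event handler would call `snapPosition` or `snapSize` when a gesture ends and then save `serializeLayout(...)`. Keeping the snapping logic pure makes it easy to check separately from the drag handling itself.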
Review Results:
Surprisingly, on a task that many Chinese domestic models can handle, neither Claude 4.1 nor Gemini 2.5 Pro succeeded. Claude 4.1's drag-and-drop was flawed and its widgets could not be resized, and Gemini's widgets could not be resized either.
GPT-5, however, successfully implemented all the core functions, including drag-and-drop and resizing, and its default styling was also the most attractive.
Test 5: Luxury Website Checkout Process
The final test was a highly complex task: creating a high-fidelity UI for a luxury website's shopping cart and three-step checkout process. The prompt provides extremely detailed specifications for colors, fonts, layouts, and interactions, so it comprehensively tests each model's ability to follow complex, fine-grained design constraints.
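Core prompt:
Role and goal: act as a senior UX/UI designer. Create high-fidelity desktop web pages for the "VELLORA" online store (luxurious yet approachable fashion and accessories). Include a refined shopping cart page and a separate checkout flow (3 steps: Shipping • Payment • Review/Confirm).
[Quiet Luxury · Graphite Neutral]
Palette (Hex)
- Page background: #F7F7F7 (light gray); content cards: #FFFFFF
- Primary text: #222426; secondary text: #6B6E73; near-black emphasis: #0E0E0F
- Brand accent (choose one and use it consistently site-wide): #9AA18E (Sage) or #8E7C6D (Mocha)
- Dividers/strokes: #E7E7E7 (1px thin line); hairlines may use rgba(0,0,0,.06)
Typography
- H1/H2: elegant serif **Newsreader** (fallback: Cormorant Garamond)
- UI text: geometric sans-serif **Manrope** (fallback: Inter)
- Numbers/prices may use a monospaced style (Manrope Tabular)
Corner radius and shadows
- Radius: buttons and inputs 12px; cards/modals 16px
- Shadows: s-sm: 0 1px 2px rgba(17,17,19,.06); s-md: 0 8px 24px rgba(17,17,19,.08) (for overlays/modals)
- Separate mainly with 1px lines and use shadows sparingly; keep the overall look restrained
Screens and key layouts (keep the original functions/flows; produce high-fidelity output following the structure below)
1) Shopping cart (desktop & mobile)
- Desktop layout:
  · Left column: cart item list (long table omitted here). Each item card includes: thumbnail, product name, color/size, unit price, quantity stepper (+/–), "Save for later", remove (×)
  · Right column (sticky): order summary card (subtotal, estimated shipping, taxes/duties, promo code input and validation, total), primary CTA "Go to checkout", with a secondary "Continue shopping" link
- Mobile layout:
  · The list scrolls vertically; the summary card is fixed at the bottom (inside the safe area) and shows the total and primary CTA
  · Quantity stepping and removal happen in place, without navigating to another layer
  · The promo code is collapsed; tap to expand the input
2) Checkout flow (3 steps): Shipping • Payment • Review/Confirm
- General
  · Step bar at the top (current step highlighted, completed steps checked, clickable to go back and edit)
  · Form groups have clearly separated headings and helper text; errors are shown in place
  · Support both "Back to cart" and "Continue to next step" buttons (with a clear primary/secondary hierarchy)
- Step 1: Shipping
  · Fields: recipient, phone, email, country/region (with linked province/city/district), address line 1/2, postal code; invoice and notes (optional)
  · Shipping method cards: standard/express/same-day (price and estimated arrival time); the summary updates in real time after selection
- Step 2: Payment
  · Methods: credit card, Alipay/WeChat; card details are masked in real time; checkbox for "billing address same as shipping address"
  · Security and compliance notes (small print)
- Step 3: Review & Confirm
  · Summary: shipping information, delivery method, last digits of the payment method, item list and amounts; each can be edited in place by returning to the relevant step
  · Terms agreement checkbox; place-order CTA; after ordering, show the order number and next-step guidance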
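The "card details are masked in real time" and "last digits of the payment method" items in the prompt can be illustrated with a small formatting helper. This is a minimal sketch under stated assumptions: the function names, the bullet character used for masking, and the grouping into blocks of four are choices made for this example, and it deliberately does no card validation.

```typescript
// Hypothetical helpers for displaying a card number; no validation is performed.

// Keep only the digits the user has typed so far.
function cardDigits(input: string): string {
  return input.replace(/\D/g, "");
}

// Mask every digit except the last four, then group into blocks of four.
function maskCardNumber(input: string): string {
  const digits = cardDigits(input);
  const masked = digits.slice(0, -4).replace(/\d/g, "•") + digits.slice(-4);
  return masked.replace(/(.{4})(?=.)/g, "$1 ");
}

// Last four digits, as shown on the Review & Confirm step.
function lastFour(input: string): string {
  return cardDigits(input).slice(-4);
}
```

Calling `maskCardNumber` from an input event handler would update the displayed value on every keystroke, while the Review step only needs `lastFour`.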
Review Results:
GPT-5's performance in this round was stunning. It followed nearly all the design specifications, with impeccable content hierarchy, page consistency, and responsive design, demonstrating strong instruction following.
Claude 4.1's performance was fair, but its payment step had layout issues: it failed to use the horizontal space fully, leaving the input boxes too narrow, and it lacked form validation.
Gemini's result was quite bad. Its styles were almost completely lost, and the resulting page looked like a bare, unfinished shell that had never been rendered with CSS.
Conclusion and analysis
Taken together, GPT-5's front-end code generation represents a qualitative leap over OpenAI's previous models. It is especially strong at following complex style constraints and implementing complex JavaScript interactions.
However, GPT-5 is not without shortcomings. On tasks that require understanding and planning a complex application architecture (for example, the inventory management tool), it still does not perform as well as Claude 4.1. More importantly, the version available to Plus users reportedly has only a 32K context window, which becomes its Achilles' heel.
A 32K context window means the model quickly "forgets" earlier parts of the conversation after a single generation. For programming tasks that require repeated debugging, modification, and iteration, this limitation greatly reduces its usability and puts it at a competitive disadvantage against models with longer context windows, such as Gemini.