
GEO: Generative Engine Optimization

2025-07-23

Abstract

The emergence of large language models (LLMs) has ushered in a new paradigm of search engines that use generative models to gather and summarize information to answer user queries. We unify this emerging technology under the framework of generative engines (GEs), which can generate accurate and personalized responses and are rapidly replacing traditional search engines such as Google and Bing. Generative engines typically satisfy queries by synthesizing information from multiple sources and summarizing it using LLMs. While this shift significantly improves user utility and traffic to generative search engines, it poses a major challenge for a third stakeholder: website and content creators. Given the black-box and fast-moving nature of generative engines, content creators have little to no control over when and how their content is displayed. As generative engines become more popular, we must ensure that the creator economy is not disadvantaged as a result. To this end, we introduce Generative Engine Optimization (GEO), the first novel paradigm to help content creators improve the visibility of their content in generative engine responses through a flexible black-box optimization framework for optimizing and defining visibility metrics. We facilitate systematic evaluation by introducing GEO-bench, a large-scale benchmark of diverse user queries from multiple domains, along with relevant web sources for answering these queries. Through rigorous evaluation, we demonstrate that GEO can boost visibility in generative engine responses by up to 40%. In addition, we show that the efficacy of these strategies varies across domains, highlighting the need for domain-specific optimization methods. Our work opens a new frontier in information discovery systems, with profound implications for both developers of generative engines and content creators.

CCS Concepts

- Computing methodologies → Natural language processing; Machine learning;
- Information systems → Web searching and information discovery.

Keywords

Generative models, search engines, datasets and benchmarks

ACM Reference Format

Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, and Ameet Deshpande. 2024. GEO: Generative Engine Optimization. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24), August 25-29, 2024, Barcelona, Spain. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3637528.3671900

1 Introduction

The invention of traditional search engines three decades ago revolutionized the way information is accessed and disseminated globally [4]. Although they were powerful and ushered in numerous applications such as academic research and e-commerce, they were limited to providing a list of relevant websites for user queries. However, the recent success of large language models [5, 21] has paved the way for better systems such as BingChat, Google's SGE, and perplexity.ai, which combine traditional search engines with generative models. We refer to these systems as generative engines (GEs) because they search for information and generate multimodal responses by using multiple sources. Technically, generative engines (see Figure 2) retrieve relevant documents from a database (e.g., the Internet) and use large neural models to generate a response grounded in those sources, which ensures attribution and gives users a way to verify the information.

Figure 1: Our proposed Generative Engine Optimization (GEO) method optimizes websites to boost their visibility in generative engine responses. GEO's black-box optimization framework enables the owner of a pizza website, which originally lacked visibility, to optimize the site and increase its visibility in generative engines. Additionally, GEO's general framework allows content creators to define and optimize their own custom visibility metrics, giving them greater control in this emerging paradigm.

The utility of generative engines for developers and users is evident: users can access information faster and more accurately, and developers can craft precise and personalized responses, leading to increased user satisfaction and revenue. However, generative engines work against a third stakeholder: website and content creators. Unlike traditional search engines, generative engines remove the need to navigate to a website by delivering accurate and comprehensive responses directly, potentially reducing a website's organic traffic and harming its visibility [16]. Because millions of small businesses and individuals rely on online traffic and visibility for their livelihood, generative engines will significantly disrupt the creator economy. Furthermore, the black-box and proprietary nature of generative engines makes it difficult for content creators to control and understand how their content is ingested and presented.

In this paper, we present the first general creator-centric framework for optimizing content for generative engines, which we call Generative Engine Optimization (GEO), to help content creators navigate this new search paradigm. GEO is a flexible black-box optimization framework for optimizing the visibility of web content in proprietary and closed-source generative engines (Figure 1). GEO ingests a source website and outputs an optimized version that increases visibility in generative engines by adjusting and calibrating presentation, text style, and content.

In addition, GEO introduces a flexible framework for defining visibility metrics specifically designed for generative engines, because the notion of visibility in generative engines is more nuanced and multifaceted than in traditional search engines (Figure 3). While average ranking is a good measure of visibility on the response page of a traditional search engine, which presents a linear list of websites, it does not apply to generative engines. Generative engines provide rich, structured responses and embed websites as inline citations within the response, often at different lengths, in different positions, and in different styles. This calls for visibility metrics designed specifically for generative engines, which measure the visibility of a cited source along multiple dimensions, such as the relevance and influence of the citation with respect to the query, from both objective and subjective perspectives.

Figure 2: Overview of a generative engine. A generative engine consists mainly of a set of generative models and a search engine for retrieving relevant documents. It takes a user query as input and, through a series of steps, generates a final response that is grounded in the retrieved sources with inline attribution.

To facilitate faithful and extensive evaluation of GEO methods, we present GEO-bench, a benchmark of 10,000 queries from multiple domains and sources, adapted for generative engines.

Through systematic evaluation, we demonstrate that our Generative Engine Optimization methods can improve visibility by up to 40% across diverse queries, providing useful strategies for content creators. Among other things, we find that including citations, quotations from relevant sources, and statistics can significantly boost source visibility, by more than 40% across various queries. We also demonstrate the effectiveness of Generative Engine Optimization on Perplexity.ai, a real-world generative engine, and show visibility improvements of up to 37%.

In summary, our contributions are threefold:

1. We present Generative Engine Optimization, the first general-purpose optimization framework to help website owners optimize their sites for generative engines. Generative Engine Optimization can increase a website's visibility by up to 40% on a wide range of queries, domains, and real-world black-box generative engines.
2. Our framework proposes a comprehensive set of visibility metrics specifically designed for generative engines and gives content creators the flexibility to optimize their content with custom visibility metrics.
3. To facilitate faithful evaluation of GEO methods in generative engines, we present the first large-scale benchmark of diverse search queries from a wide range of domains and datasets, specifically tailored for generative engines.

2 Formulation and Methodology

2.1 Formulation of Generative Engines

Although numerous generative engines have been deployed to millions of users, there is no standardized framework for them. We provide a formulation that accommodates the various modular components in their design. We describe a generative engine as comprising several back-end generative models and a search engine for source retrieval. A generative engine (GE) accepts a user query q_u and returns a natural language response r, where P_U represents personalized user information. The GE can be represented as a function:

f_GE := (q_u, P_U) → r

A generative engine contains two key components: a.) a set of generative models G = {G₁, G₂, …, G_n}, each serving a specific purpose such as query rewriting or summarization, and b.) a search engine SE that, given a query q, returns a set of sources S = {s₁, s₂, …, s_m}. We show a representative workflow in Figure 2, which at the time of writing closely resembles the design of BingChat. This workflow decomposes an input query into a set of simpler queries that are easier for the search engine to consume. Given a query, a query-rewriting generative model G₁ = G_qr generates a set of queries Q¹ = {q₁, q₂, …, q_n}, which are then passed to the search engine SE to retrieve a set of ranked sources S = {s₁, s₂, …, s_m}. The source set S is passed to a summarization model G₂ = G_sum, which generates a summary Sum_j for each source, producing the summary set Sum = {Sum₁, Sum₂, …, Sum_m}. The summary set is passed to a response generation model G₃ = G_resp, which generates a cumulative response r backed by the sources. In this work, we focus on single-turn generative engines, but the formulation can be extended to multi-turn conversational generative engines (Appendix A).
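The workflow above can be sketched as a simple pipeline. This is a minimal illustration under stated assumptions, not the design of any deployed engine: the four callables stand in for G_qr, SE, G_sum, and G_resp, all function and parameter names are hypothetical, deduplication of sources is an added assumption, and personalization (P_U) is omitted.

```python
from typing import Callable, List


def generative_engine(
    query: str,
    rewrite: Callable[[str], List[str]],       # stands in for G_qr
    search: Callable[[str], List[str]],        # stands in for SE
    summarize: Callable[[str], str],           # stands in for G_sum
    respond: Callable[[str, List[str]], str],  # stands in for G_resp
) -> str:
    """Single-turn workflow: rewrite -> retrieve -> summarize -> respond."""
    sub_queries = rewrite(query)

    # Collect sources in retrieval order, dropping duplicates
    # (deduplication is an assumption of this sketch).
    sources: List[str] = []
    for sub_query in sub_queries:
        for source in search(sub_query):
            if source not in sources:
                sources.append(source)

    summaries = [summarize(source) for source in sources]
    return respond(query, summaries)


# Toy stand-ins so the sketch runs without any model or search API.
def toy_rewrite(q: str) -> List[str]:
    return [q, q + " history"]


def toy_search(q: str) -> List[str]:
    return ["doc:" + q, "doc:shared"]


def toy_summarize(source: str) -> str:
    return source.upper()


def toy_respond(q: str, summaries: List[str]) -> str:
    # Attach an inline citation marker [k] to each summary.
    cited = "; ".join(f"{text} [{i + 1}]" for i, text in enumerate(summaries))
    return f"{q} | {cited}"


answer = generative_engine("pizza", toy_rewrite, toy_search, toy_summarize, toy_respond)
print(answer)
```

Passing the components as callables mirrors the modular formulation: any of the stages can be swapped without changing the overall function f_GE.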

The response r is typically structured text with embedded citations. Citations are important given the tendency of LLMs to hallucinate information [10]. Specifically, consider a response r composed of sentences {l₁, l₂, …, l_o}. Each sentence may be supported by a set of citations C_i ⊂ S drawn from the retrieved documents. An ideal generative engine should ensure that all statements in the response are supported by relevant citations (high citation recall) and that all citations accurately support the statements they are associated with (high citation precision) [14]. We refer the reader to Figure 3 for a representative generative engine response.

2.2 Generative Engine Optimization

The invention of search engines led to search engine optimization (SEO), a process that helps website creators optimize their content to rank higher in search engines. Higher rankings bring greater visibility and more website traffic. However, traditional SEO methods are not directly applicable to generative engines. Unlike traditional search engines, the generative models in generative engines are not limited to keyword matching, and the use of language models for ingesting source documents and generating responses leads to a more nuanced understanding of text documents and user queries. As generative engines rapidly become the dominant paradigm of information delivery and SEO is not directly applicable to them, new techniques are needed. To this end, we propose Generative Engine Optimization, a new paradigm in which content creators aim to increase their visibility (or impression) in generative engine responses. We define the visibility of a website (also referred to as a citation) c_i in a generated response r by a function Imp(c_i, r), which the website creator wants to maximize. From the generative engine's perspective, the goal is to maximize the visibility of the citations most relevant to the user query, i.e., to maximize Σ_i f(Imp(c_i, r), Rel(c_i, q, r)), where Rel(c_i, q, r) measures the relevance of citation c_i to the query q in the context of the response r. The function f is determined by the exact algorithmic design of the generative engine and is a black box to end users. Moreover, the functions Imp and Rel are subjective and not yet well defined for generative engines; we define them next.

2.2.1 Impressions for Generative Engines

In SEO, a website's impression (or visibility) is determined by its average ranking over a range of queries. However, the nature of generative engine output requires different impression metrics. Unlike search engines, generative engines combine information from multiple sources in a single response. The length, uniqueness, and presentation of a cited website determine the true visibility of the citation. Therefore, as shown in Figure 3, while a simple ranking on the response page serves as a valid metric for impression and visibility in traditional search engines, such a metric does not apply to generative engine responses.

Figure 3: In traditional search engines, ranking and visibility metrics are straightforward: they list website sources in ranked order and display content verbatim. Generative engines, however, generate rich, structured responses and often embed citations within a single block, interleaved with one another. This makes ranking and visibility nuanced and multifaceted. In addition, unlike search engines, for which a great deal of research has been conducted on improving visibility, it remains unclear how to optimize visibility in generative engine responses. To address these challenges, our black-box optimization framework proposes a set of well-designed impression metrics that creators can use to measure and optimize their website's performance, and also allows creators to define their own impression metrics.

To address this challenge, we propose a set of impression metrics designed with three key principles in mind: 1.) they should be relevant to creators, 2.) they should be explainable, and 3.) they should be easily understood by a wide range of content creators. The first such metric is the "word count" metric, which is the normalized word count of the sentences associated with a citation. Mathematically, it is defined as:

Imp_wc(c_i, r) = (Σ_{s ∈ S_ci} |s|) / (Σ_{s ∈ S_r} |s|)

Here S_ci is the set of sentences that cite c_i, S_r is the set of sentences in the response, and |s| is the number of words in sentence s. When a sentence is cited by multiple sources, we distribute its word count equally among all of its citations. Intuitively, a higher word count means the source plays a more important role in the answer, and therefore the user is more exposed to that source.
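The word count metric described above can be computed as in the following sketch. It assumes a response is already split into sentences with their citation ids attached, and it counts words by whitespace splitting; both choices, along with all names, are illustrative assumptions rather than the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A response is a list of (sentence text, ids of the sources it cites).
Sentence = Tuple[str, List[str]]


def word_count_impression(response: List[Sentence]) -> Dict[str, float]:
    """Normalized word count per citation (Imp_wc)."""
    # Denominator: words in every sentence of the response, cited or not.
    total_words = sum(len(text.split()) for text, _ in response)
    if total_words == 0:
        return {}

    credited: Dict[str, float] = defaultdict(float)
    for text, citations in response:
        unique = list(dict.fromkeys(citations))  # de-duplicate, keep order
        for source in unique:
            # A sentence cited by several sources is split equally among them.
            credited[source] += len(text.split()) / len(unique)

    return {source: words / total_words for source, words in credited.items()}


response = [
    ("Neapolitan pizza uses a soft thin crust", ["s1"]),  # 7 words
    ("It is baked quickly at high heat", ["s1", "s2"]),   # 7 words, shared
    ("Many regional styles exist", []),                   # 4 words, uncited
]
scores = word_count_impression(response)
print(scores)
```

In this toy response, s1 is credited with 7 + 3.5 of the 18 words and s2 with 3.5, so the uncited sentence lowers every source's share without being credited to anyone.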

However, since the "word count" is not affected by citation ranking (e.g., whether it appears in the first place or not), we propose a position-adjusted count that reduces the weight by an exponential decay function of the citation position:

Imp_Pwc(c_i, r) = (Σ_{s ∈ S_ci} |s| · e^{-pos(s)/|S|}) / (Σ_{s ∈ S_r} |s|)

Intuitively, the earlier a sentence appears in a response, the more likely it is to be read, and the exponential term in the definition of Imp_Pwc gives higher weight to such citations. Thus, a website cited at the top may have a higher impression than one cited in the middle or at the end of the response, despite having fewer words. Furthermore, the choice of an exponential decay function is motivated by several studies showing that click-through rates follow a power law as a function of search engine ranking [7, 8]. While the impression metrics above are objective and well grounded, they ignore the subjective aspects of how citations affect user attention. To address this, we propose a "subjective impression" metric that incorporates factors such as relevance, influence of the citation, uniqueness of the material presented by the citation, subjective position, subjective count, probability of clicking on the citation, and diversity of the material presented. We measure these sub-metrics using G-Eval [15], the current state of the art for evaluation with LLMs.
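The position-adjusted count can be sketched in the same way. The text does not spell out pos(s) or |S|, so this sketch assumes pos(s) is the zero-based index of the sentence in the response (the first sentence gets weight 1) and |S| is the number of sentences in the response; it also assumes shared sentences are split equally, as for the plain word count. All names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# A response is a list of (sentence text, ids of the sources it cites).
Sentence = Tuple[str, List[str]]


def position_adjusted_impression(response: List[Sentence]) -> Dict[str, float]:
    """Word count per citation, weighted by exp(-pos(s) / |S|)."""
    total_words = sum(len(text.split()) for text, _ in response)
    if total_words == 0:
        return {}

    n_sentences = len(response)  # assumed meaning of |S|
    credited: Dict[str, float] = defaultdict(float)
    for pos, (text, citations) in enumerate(response):  # pos is zero-based
        decay = math.exp(-pos / n_sentences)
        unique = list(dict.fromkeys(citations))
        for source in unique:
            credited[source] += len(text.split()) * decay / len(unique)

    return {source: words / total_words for source, words in credited.items()}


response = [
    ("Short opening claim", ["top"]),           # 3 words, position 0
    ("A longer closing sentence", ["bottom"]),  # 4 words, position 1
]
scores = position_adjusted_impression(response)
print(scores)
```

Here the source cited first ends up with the higher score even though its sentence is shorter, which is the behavior the paragraph above describes.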

2.2.2 Generative Engine Optimization Methods for Websites

To improve impression metrics, content creators must make changes to their website content. We propose several generative-engine-agnostic strategies, which we call Generative Engine Optimization (GEO) methods. Mathematically, each GEO method is a function f: W → W'_i, where W is the initial website content and W'_i is the content after the GEO method has been applied. Modifications range from simple stylistic changes to adding new content in a structured format. A well-designed GEO method is equivalent to a black-box optimization method that improves the visibility of a website without knowing the exact algorithmic design of the generative engine, and that modifies the text independently of the exact query.

In our experiments, we apply GEO methods to website content by prompting a large language model to perform specific stylistic and content changes. In particular, the source content is modified according to the GEO method, which defines a specific set of desired characteristics. We propose and evaluate the following methods:

1. Authoritative: Modify the text style of the source content to make it more persuasive and authoritative.
2. Statistics Addition: Modify the content to include quantitative statistics instead of qualitative discussion wherever possible.
3. Keyword Stuffing: Modify the content to include more keywords from the query, as expected in classic SEO.
4. Cite Sources and 5. Quotation Addition: Add relevant citations and quotations from credible sources, respectively.
6. Easy-to-Understand: Simplify the language of the website. 7. Fluency Optimization: Improve the fluency of the website's text.
8. Unique Words and 9. Technical Terms: Add unique and technical terms, respectively, wherever possible.

These methods cover a diverse range of general strategies that website owners can implement quickly and use regardless of website content. In addition, with the exception of methods 3, 4, and 5, the methods enhance the presentation of existing content to make it more persuasive or appealing to generative engines without requiring additional content. Methods 3, 4, and 5, on the other hand, may require some form of additional content. To analyze the performance improvement of our methods, for each input user query we randomly select one source website to optimize and apply each GEO method separately to that same source. For more details on the GEO methods, we refer the reader to Appendix B.4.
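The idea of a GEO method as a function from website content to modified content, applied by prompting a language model, can be sketched as follows. The instruction strings are loose paraphrases of the nine method descriptions above, not the prompts used in the paper (those are in Appendix B.4), and `llm` is a hypothetical callable that takes a prompt and returns text.

```python
from typing import Callable, Dict

# Illustrative instructions only; paraphrased from the method list above.
GEO_INSTRUCTIONS: Dict[str, str] = {
    "authoritative": "Make the text style more persuasive and authoritative.",
    "statistics_addition": "Replace qualitative discussion with quantitative statistics wherever possible.",
    "keyword_stuffing": "Include more keywords that relevant queries would contain.",
    "cite_sources": "Add relevant citations from credible sources.",
    "quotation_addition": "Add relevant quotations from credible sources.",
    "easy_to_understand": "Simplify the language of the text.",
    "fluency_optimization": "Improve the fluency of the text.",
    "unique_words": "Add unique words wherever possible.",
    "technical_terms": "Add technical terms wherever possible.",
}


def apply_geo_method(content: str, method: str, llm: Callable[[str], str]) -> str:
    """Map website content W to modified content W' for one GEO method."""
    instruction = GEO_INSTRUCTIONS[method]
    prompt = f"{instruction}\n\nSource:\n{content}"
    return llm(prompt)


# Stand-in model that just echoes its prompt, so the sketch runs offline.
def echo_llm(prompt: str) -> str:
    return prompt


rewritten = apply_geo_method("Our pizza is popular.", "statistics_addition", echo_llm)
print(rewritten)
```

Because the function never sees the generative engine or the user query, it stays a black-box, query-independent transformation in the sense described above.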

3 Experimental setup

3.1 Evaluated Generative Engine

Following previous work [14], we use a two-step setup for the generative engine design. The first step fetches relevant sources for the input query, and in the second step an LLM generates a response based on the fetched sources. Similar to previous work, we do not use summarization and provide the entire response for each source. Because of context length limits and the quadratic scaling cost of context size in transformer models, only the top 5 sources are fetched from the Google search engine for each query. This setup closely resembles the workflow used in previous work and the general design adopted by commercial GEs such as you.com and perplexity.ai. The answers are then generated with the gpt3.5-turbo model [20] using the same prompts as in previous work [14]. We sample five different responses at temperature = 0.7 to reduce statistical variation.

In Section C.1, we evaluate the same Generative Engine Optimization methods on Perplexity.ai, a commercially deployed generative engine, highlighting the generality of our proposed methods.

3.2 Benchmark: GEO-bench

Since there is no publicly available dataset of queries relevant to generative engines, we curated GEO-bench, a benchmark of 10K queries drawn from multiple sources and repurposed for generative engines, together with synthetically generated queries. The benchmark includes queries from nine different sources, each further categorized by target domain, difficulty, query intent, and other dimensions.

Datasets: 1. MS MARCO, 2. ORCAS-1, and 3. Natural Questions [1, 6, 13]: These datasets contain real anonymized user queries from the Bing and Google search engines. Together, these three collections represent the datasets commonly used in search engine research. However, generative engines will face far more difficult and specific queries, with the aim of synthesizing answers from multiple sources rather than searching for them. For this reason, we repurpose several other publicly available datasets:

4. AllSouls: This dataset contains essay questions from "All Souls College, Oxford". The queries in this dataset require the generative engine to perform appropriate reasoning to aggregate information from multiple sources.
5. LIMA [25]: Contains challenging questions that require the generative engine not only to aggregate information but also to perform suitable reasoning to answer the question (e.g., writing a short poem or Python code).
6. Davinci-Debate [14]: Contains debate questions generated to test generative engines.
7. Perplexity.ai Discover: These queries are sourced from the Discover section of Perplexity.ai, which is an updated list of trending queries on the platform.
8. ELI-5: This dataset contains questions from the ELI5 subreddit, where users ask complex questions and expect answers in simple, plain language.
9. GPT-4 generated queries: To supplement diversity in the query distribution, we prompt GPT-4 to generate queries from different domains (e.g., science, history), based on query intent (e.g., navigational, transactional), and based on the difficulty and scope of the response to be generated (e.g., open-ended, fact-based).

Our benchmark consists of 10K queries divided into training, validation, and test splits of 8K, 1K, and 1K queries. We preserve the real-world query distribution: the benchmark contains 80% informational queries and 10% each of transactional and navigational queries. Each query is augmented with the cleaned text content of the top 5 search results from the Google search engine.

Tags: Optimizing website content usually requires targeted changes based on the domain of the task. In addition, users of Generative Engine Optimization may need to identify suitable strategies for only a subset of queries, taking into account multiple factors such as domain, user intent, and query nature. To facilitate this, we tag each query using the GPT-4 model and manually verify high recall and precision on the test split.

Overall, GEO-bench contains queries from 25 different domains, such as art, health, and gaming; spans a range of query difficulty from simple to multifaceted; includes nine different query types, such as informational and transactional; and covers seven different categorizations. Owing to its deliberately high diversity, its size, and its real-world nature, GEO-bench is a comprehensive benchmark for evaluating generative engines and serves as a standard testbed for evaluating them for a variety of purposes in this and future work. We provide more details about GEO-bench in Appendix B.2.

3.3 GEO Methods

We evaluate the nine GEO methods described in Section 2.2.2. We compare them against a baseline that measures impression metrics for unmodified website sources. We evaluate the methods on the full GEO-bench test split. In addition, to reduce variance in the results, we run the experiments with five different random seeds and report the mean.

3.4 Evaluation Metrics

We use the impression metrics defined in Section 2.2.1. Specifically, we use two impression metrics: 1. Position-Adjusted Word Count, which combines word count and position count. To analyze the effect of the individual components, we also report scores on the two sub-metrics separately. 2. Subjective Impression, a subjective metric that covers seven aspects: 1) relevance of the cited sentence to the user query, 2) influence of the citation, which assesses how much the generated response depends on the citation, 3) uniqueness of the material presented by the citation, 4) subjective position, which measures how prominently the source is placed from the user's point of view, 5) subjective count, which measures the amount of content presented from the citation as perceived by the user, 6) likelihood that the user clicks on the citation, and 7) diversity of the material presented. These sub-metrics assess different aspects that content creators can target to improve effectiveness in one or more areas. Each sub-metric is evaluated using GPT-3.5, following a methodology similar to that described in G-Eval [15]. In G-Eval, a form-based evaluation template is provided to the language model along with the GE-generated response with citations. The model outputs a score for each citation (computed by sampling multiple times). However, since G-Eval scores are poorly calibrated, we normalize them to have the same mean and variance as the position-adjusted word count for fair and meaningful comparison. We provide the exact templates in Appendix B.3.
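The rescaling of G-Eval scores to the mean and variance of the position-adjusted word count can be sketched as a simple affine transform. The text does not say whether this is done per response or over the whole test set, nor whether population or sample variance is used; the sketch assumes population statistics over whatever lists it is given, and the function name is hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def match_mean_and_variance(
    scores: Sequence[float], reference: Sequence[float]
) -> List[float]:
    """Affinely rescale `scores` to the mean and variance of `reference`."""
    mu_s, sd_s = fmean(scores), pstdev(scores)
    mu_r, sd_r = fmean(reference), pstdev(reference)
    if sd_s == 0:
        # All scores identical: only the mean can be matched.
        return [mu_r for _ in scores]
    return [mu_r + (x - mu_s) * sd_r / sd_s for x in scores]


subjective = [2.0, 3.0, 5.0]         # hypothetical raw G-Eval scores
position_adjusted = [0.1, 0.3, 0.6]  # hypothetical reference scores
rescaled = match_mean_and_variance(subjective, position_adjusted)
print(rescaled)
```

An affine transform preserves the ordering of citations under the subjective score while putting both metrics on a comparable scale.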

In addition, all impression metrics are normalized by multiplying them by a constant factor so that the impressions of all citations in a response sum to 1. In our analysis, we compare methods by computing the relative improvement in impression. For an initial response r generated from sources S_i ∈ {s₁, …, s_m} and a modified response r', the relative improvement in impression of each source s_i is measured as:

Improvement_si = (Imp_si(r') - Imp_si(r)) / Imp_si(r) × 100

The modified response r' is generated by applying the GEO method under evaluation to one of the sources s_i. The source selected for optimization is chosen at random but is kept constant across all GEO methods for a given query.
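The normalization and relative improvement above amount to a few lines of arithmetic. The following sketch uses made-up impression values for three sources; the function names are hypothetical, and raising an error for a zero baseline is an added assumption, since the formula is undefined in that case.

```python
from typing import Dict


def normalize(impressions: Dict[str, float]) -> Dict[str, float]:
    """Scale impressions so that all citations in a response sum to 1."""
    total = sum(impressions.values())
    if total == 0:
        raise ValueError("impressions sum to zero")
    return {source: value / total for source, value in impressions.items()}


def relative_improvement(
    before: Dict[str, float], after: Dict[str, float], source: str
) -> float:
    """Relative improvement (in percent) of one source between r and r'."""
    base = before[source]
    if base == 0:
        raise ValueError("baseline impression is zero; improvement undefined")
    return (after[source] - base) / base * 100


# Hypothetical raw impressions before and after optimizing source s1.
before = normalize({"s1": 10.0, "s2": 30.0, "s3": 60.0})
after = normalize({"s1": 20.0, "s2": 30.0, "s3": 50.0})
gain = relative_improvement(before, after, "s1")
print(gain)
```

Because impressions in a response sum to 1, a gain for the optimized source necessarily comes at the expense of at least one other source, as with s3 in this example.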

4 Results

We evaluate a variety of Generative Engine Optimization methods designed to optimize website content for better visibility in generative engine responses, and compare them against a baseline without optimization. Our evaluation uses GEO-bench, a diverse benchmark of user queries from multiple domains and settings. Performance is measured with two metrics: position-adjusted word count and subjective impression. The former considers word count and citation position in the GE response, while the latter combines multiple subjective factors into an overall impression score.

Table 1: Absolute impression metrics of GEO methods on GEO-bench.

Method	Position-Adjusted Word Count			Subjective Impression
Method	Word	Position	Overall	Rel.	Infl.	Unique	Div.	Follow-Up	Pos.	Count	Average
Performance without Generative Engine Optimization
No Optimization	19.5	19.3	19.3	19.3	19.3	19.3	19.3	19.3	19.3	19.3	19.3
Non-performing Generative Engine Optimization methods
Keyword Stuffing	17.8	17.7	17.7	19.8	19.1	20.5	20.4	20.3	20.5	20.4	20.2
Unique Words	20.7	20.5	20.5	20.5	20.1	19.9	20.4	20.2	20.7	20.2	20.4
High-performing Generative Engine Optimization methods
Easy-to-Understand	22.2	22.4	22.0	20.2	21.0	20.0	20.1	20.1	20.9	19.9	20.5
Authoritative	21.8	21.3	21.3	22.3	22.1	22.4	23.1	22.2	23.1	22.7	22.9
Technical Terms	23.1	22.7	22.7	20.9	21.7	20.5	21.2	20.8	21.9	20.8	21.4
Fluency Optimization	25.1	24.6	24.7	21.1	22.9	20.4	21.6	21.0	22.4	21.1	21.9
Cite Sources	24.9	24.5	24.6	21.4	22.5	21.0	21.6	21.2	22.2	20.7	21.9
Quotation Addition	27.8	27.3	27.2	23.8	25.4	23.9	24.4	22.9	24.9	23.2	24.7
Statistics Addition	25.9	25.4	25.2	22.5	24.5	23.0	23.3	21.6	24.2	23.0	23.7

Table 1 reports the absolute impression scores of the different methods on several metrics. The results show that our GEO methods consistently outperform the baseline on GEO-bench across all metrics. This demonstrates the robustness of these methods to different queries: they achieve significant improvements despite query diversity. Specifically, our best-performing methods, Cite Sources, Quotation Addition, and Statistics Addition, achieve relative improvements of 30-40% on the position-adjusted word count metric and 15-30% on the subjective impression metric. These methods, which add relevant statistics to website content (Statistics Addition), incorporate credible quotes (Quotation Addition), and include citations from reliable sources (Cite Sources), require minimal changes yet significantly improve visibility in GE responses while enhancing the credibility and richness of the content.

Interestingly, stylistic changes such as improving the fluency and readability of the source text (Fluency Optimization and Easy-to-Understand) also lead to significant visibility gains of 15-30%. This suggests that generative engines value not only the content but also the presentation of information.

Furthermore, given that generative models are typically designed to follow instructions, one would expect a more persuasive and authoritative tone in website content to improve visibility. However, we find no significant improvement, suggesting that generative engines are already somewhat robust to such changes. This highlights the need for website owners to focus on improving content presentation and credibility.

Finally, we evaluate keyword stuffing, which adds more relevant keywords to a website's content. Although widely used in search engine optimization, this method yields little to no improvement in generative engine responses. This underscores the need to rethink optimization strategies for generative engines, as techniques that work in search engines may not translate to success in this new paradigm.

5 Analysis

5.1 Domain-Specific Generative Engine Optimization

In Section 4, we presented the improvements achieved by GEO across the whole GEO-bench benchmark. However, in real-world SEO scenarios, domain-specific optimizations are usually applied. With this in mind, and given that we provide categories for each query in GEO-bench, we take a closer look at the performance of the various GEO methods across these categories.

Table 3 provides a detailed breakdown of the categories in which each GEO method proved most effective. A closer analysis of these results reveals several interesting observations. For example, the Authoritative method significantly improves performance on debate-style questions and on queries related to the "History" domain. This is consistent with our intuition that a more persuasive form of writing is likely to be more valuable in debates.

Figure 4: Relative improvement from combining GEO strategies. Using Fluency Optimization in combination with Statistics Addition yields the best performance. The rightmost column shows that combining Fluency Optimization with other strategies is the most beneficial.

Similarly, adding citations through Cite Sources is particularly beneficial for factual questions, possibly because citations provide a source of verification for the facts presented, thus enhancing the credibility of the response. The effectiveness of different GEO methods varies across domains. For example, as shown in row 5 of Table 3, domains such as "Law and Government" and question types such as "Opinion" benefit from the addition of relevant statistics to the website content, as implemented by Statistics Addition. This suggests that data-driven evidence can improve a website's visibility in particular contexts. The Quotation Addition method is most effective in the "People and Society", "Explanation", and "History" categories. This may be because these domains often involve personal narratives or historical events, where direct quotes can add authenticity and depth to the content. Overall, our analysis suggests that website owners should make domain-specific, targeted adjustments to their websites to achieve greater visibility.

5.2 Optimization of multiple websites

In the evolving landscape of generative engines, GEO methods are expected to be widely adopted, leading to a scenario in which all source content is optimized using GEO. To understand the implications, we evaluated the GEO methods by optimizing all source content simultaneously; the results are presented in Table 2. A key observation is that the impact of GEO on a website varies with its ranking on the search engine results page (SERP). Notably, lower-ranked websites, which typically struggle for visibility, benefit more from GEO. Traditional search engines rely on a variety of factors, such as the number of backlinks and domain presence, which are difficult for smaller creators to achieve. However, since generative engines use generative models conditioned on website content, factors such as backlink building should not put small creators at a disadvantage. This can be seen in the relative improvement in visibility shown in Table 2. For example, the Cite Sources method increased visibility by 115.1% for the site ranked fifth in the SERP, while the visibility of the top-ranked site decreased by 30.3% on average.

Table 2: Change in visibility from GEO methods for sources at different ranks. GEO is particularly helpful for lower-ranked sites.

Method	Relative improvement in visibility (%)
Method	Rank 1	Rank 2	Rank 3	Rank 4	Rank 5
Authoritative	-6.0	4.1	-0.6	12.6	6.1
Fluency Optimization	-2.0	5.2	3.6	-4.4	2.2
Cite Sources	-30.3	2.5	20.4	15.5	115.1
Quotation Addition	-22.9	-7.0	3.5	25.1	99.7
Statistics Addition	-20.6	-3.9	8.1	10.0	97.9

This finding highlights GEO as a tool to democratize the digital space. Many low-ranking websites are created by small content creators or independent businesses that have traditionally struggled to compete with larger organizations for top search engine results. The emergence of generative engines may initially seem to work against smaller entities. However, applying a GEO approach offers these content creators an opportunity to significantly increase their visibility in the generation engine responses. By enhancing their content with GEO, they can reach a broader audience, leveling the playing field and allowing them to compete more effectively with larger organizations.
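As a rough illustration of the rank effect discussed above, the sketch below averages the Table 2 entries over the five methods for each SERP rank. The numbers are copied from Table 2; the row labels are our English renderings of the method names, and the function name `mean_change_per_rank` is a hypothetical helper, not part of the released GEO code.

```python
# Relative visibility change (%) for SERP ranks 1..5, copied from Table 2.
TABLE_2 = {
    "Authoritative":        [-6.0, 4.1, -0.6, 12.6, 6.1],
    "Fluency Optimization": [-2.0, 5.2, 3.6, -4.4, 2.2],
    "Cite Sources":         [-30.3, 2.5, 20.4, 15.5, 115.1],
    "Quotation Addition":   [-22.9, -7.0, 3.5, 25.1, 99.7],
    "Statistics Addition":  [-20.6, -3.9, 8.1, 10.0, 97.9],
}


def mean_change_per_rank(table):
    """Average the relative change over all methods, one value per rank."""
    columns = zip(*table.values())  # transpose rows (methods) into columns (ranks)
    return [sum(col) / len(col) for col in columns]


if __name__ == "__main__":
    for rank, avg in enumerate(mean_change_per_rank(TABLE_2), start=1):
        print(f"Rank {rank}: {avg:+.1f}%")
```

Averaged this way, the top-ranked source loses visibility while the fifth-ranked source gains the most, which matches the trend described in the text. The unweighted mean over methods is our own simplification for illustration only.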

5.3 Combination of GEO strategies

While individual GEO strategies show significant improvements across domains, in practice website owners are expected to employ multiple strategies at once. To study the performance gains from combining GEO strategies, we considered pairwise combinations of the four best-performing GEO methods, namely Cite Sources, Fluency Optimization, Statistics Addition, and Quotation Addition. Figure 4 shows a heat map of the relative improvements achieved by combining different GEO strategies. The analysis shows that combining GEO methods improves performance, with the best combination (Fluency Optimization and Statistics Addition) outperforming any single GEO strategy by more than 5.5%. In addition, Cite Sources significantly boosts performance when combined with other methods (average: 31.4%), even though it is relatively less effective on its own (8% lower than Quotation Addition). These findings emphasize the importance of studying GEO methods in combination, as real-world content creators are likely to use them that way.
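A minimal sketch of how such pairwise combinations could be enumerated is given below. It assumes that a combination means applying two strategies one after the other; the text does not state the exact procedure, so the ordering is an assumption. The optimizers here are hypothetical stubs that only tag the text, whereas real GEO methods would rewrite the content with an LLM.

```python
from itertools import combinations
from typing import Callable, Dict, Tuple

# The four best-performing methods named in Section 5.3.
STRATEGIES = (
    "Cite Sources",
    "Fluency Optimization",
    "Statistics Addition",
    "Quotation Addition",
)


def strategy_pairs(names=STRATEGIES):
    """Return all unordered pairs of strategies (6 pairs for 4 strategies)."""
    return list(combinations(names, 2))


def apply_pair(text: str, pair: Tuple[str, str],
               optimizers: Dict[str, Callable[[str], str]]) -> str:
    """Apply both strategies of a pair in sequence (assumed ordering)."""
    for name in pair:
        text = optimizers[name](text)
    return text


# Hypothetical stand-ins: each stub just appends the strategy name to the text.
STUB_OPTIMIZERS = {
    name: (lambda text, tag=name: f"{text} [{tag}]") for name in STRATEGIES
}

if __name__ == "__main__":
    for pair in strategy_pairs():
        print(pair, "->", apply_pair("source text", pair, STUB_OPTIMIZERS))
```

Each of the six pairs would then be scored with the same impression metrics used for the single strategies, giving one cell of a heat map like Figure 4.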

5.4 Qualitative analysis

We present a qualitative analysis of the GEO methods in Table 4, which contains representative examples in which GEO methods improve source visibility with minimal changes. Each method optimizes the source through suitable text additions and deletions. In the first example, simply adding the source of a statement significantly improves visibility in the final answer, requiring minimal effort from the content creator. The second example shows that adding relevant statistics wherever possible improves the visibility of the source in the final generative engine response. Finally, the third row shows that merely emphasizing parts of the text and using a persuasive writing style can also improve visibility.

6 GEO in the real world: experiments with a deployed generation engine

To further validate the efficacy of our proposed GEO methods, we evaluated them on Perplexity.ai, a deployed generative engine with millions of active users. Table 5 shows the results. As with our own generative engine, Quotation Addition performs best on position-adjusted word count, 22% higher than the baseline. Methods that perform well on our generative engine, such as Cite Sources and Statistics Addition, show improvements of up to 9% and 37% on the two metrics. Our earlier observation that traditional SEO methods such as keyword stuffing are ineffective is reinforced here: keyword stuffing performs 10% below the baseline. These results are important for three reasons: 1) they underscore the importance of developing optimization methods specifically for generative engines, for the benefit of content creators; 2) they highlight the generalizability of our proposed GEO methods across different generative engines; and 3) they show that content creators can directly use our easy-to-implement GEO methods and thus achieve high real-world impact. We refer the reader to Appendix C.1 for more details.

Table 5: Absolute impression metrics of GEO methods on GEO-bench, with Perplexity.ai as the generative engine. While SEO methods such as keyword stuffing perform poorly, our proposed GEO methods generalize well across generative engines and significantly improve content visibility.

Method	Position-adjusted word count	Subjective impression
No optimization	24.1	24.7
Keyword Stuffing	21.9	28.1
Quotation Addition	29.1	32.1
Statistics Addition	26.2	33.9
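The relative improvements quoted in the text can be approximated from the absolute scores in Table 5 as (method - baseline) / baseline. The sketch below does this for every row. The scores are copied from Table 5, the row labels are our English renderings of the method names, and `relative_improvement` is a hypothetical helper. Because the table entries are rounded, the computed values may differ by a point or two from the percentages quoted in the text.

```python
# (position-adjusted word count, subjective impression), copied from Table 5.
TABLE_5 = {
    "No optimization":     (24.1, 24.7),
    "Keyword Stuffing":    (21.9, 28.1),
    "Quotation Addition":  (29.1, 32.1),
    "Statistics Addition": (26.2, 33.9),
}


def relative_improvement(value: float, baseline: float) -> float:
    """Percentage change of `value` relative to `baseline`."""
    return 100.0 * (value - baseline) / baseline


if __name__ == "__main__":
    base_words, base_subjective = TABLE_5["No optimization"]
    for method, (words, subjective) in TABLE_5.items():
        if method == "No optimization":
            continue
        print(
            f"{method}: "
            f"word count {relative_improvement(words, base_words):+.1f}%, "
            f"subjective {relative_improvement(subjective, base_subjective):+.1f}%"
        )
```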

7 Related work

Evidence-based answer generation: Previous work has used several techniques to generate answers grounded in sources. Nakano et al. [19] trained GPT-3 to navigate a web environment to generate source-backed answers. Similarly, other approaches [17, 23, 24] retrieve sources through search engines to generate answers. Our work unifies these approaches and provides a common benchmark for improving these systems in the future. In a recent working draft, Kumar and Lakkaraju [11] show that strategic text sequences can manipulate LLM recommendations to improve product visibility in generative engines. While their approach focuses on increasing product visibility through adversarial text, our approach introduces non-adversarial strategies to optimize any web content for better visibility in generative engine search results.

Retrieval-augmented language models: Several recent works address the limited memory of language models by retrieving relevant sources from a knowledge base to accomplish a task [3, 9, 18]. However, generative engines need to generate answers and provide attribution throughout the answer. Moreover, generative engines are not limited to a single text modality for either input or output. Finally, the framework of a generative engine is not limited to retrieving relevant sources, but includes multiple tasks such as query rewriting, source selection, and deciding how and when to perform them.

Search engine optimization: Over the past 25 years, a great deal of research has been devoted to optimizing website content for search engines [2, 12, 22]. These methods are categorized into on-page SEO, which improves content and user experience, and off-page SEO, which improves site authority through link building. In contrast, GEO deals with a more complex environment that is multimodal and conversational. Since GEO targets generative models and is not limited to simple keyword matching, traditional SEO strategies do not carry over to the generative engine setting, which highlights the need for GEO.

8 Conclusion

In this work, we formulate search engines augmented with generative models, which we call generative engines. We propose Generative Engine Optimization (GEO) to help content creators optimize their content for generative engines. We define impression metrics for generative engines and propose and release GEO-bench: a benchmark containing diverse user queries from multiple domains and settings, along with the sources needed to answer them. We present several methods for optimizing content for generative engines and show that these methods can improve the visibility of sources in generative engine responses by up to 40%. Among other findings, we show that including quotations, citations to relevant sources, and statistics can significantly improve source visibility. In addition, we find that the effectiveness of GEO methods depends on the query domain, and that combining multiple GEO strategies holds further potential. We show promising results on a commercially deployed generative engine with millions of active users, demonstrating the real-world impact of our work. In summary, our work is the first to formalize the important and timely GEO paradigm, releasing algorithms and infrastructure (benchmarks, datasets, and metrics) to facilitate rapid community progress on generative engines. This serves as a first step toward understanding the impact of generative engines on the digital space and the role of GEO in this new search engine paradigm.

9 Limitations

While we rigorously tested our proposed methods on two generative engines, including a publicly available one, these methods may need to be adapted as generative engines evolve, much as SEO has evolved. In addition, while we endeavor to keep the queries in GEO-bench as close as possible to real-world queries, the nature of queries may change over time, requiring continuous updates. Furthermore, due to the black-box nature of search engine algorithms, we do not evaluate how GEO methods affect search rankings. However, we note that the changes made by GEO methods are targeted edits to textual content, somewhat similar to SEO methods, and do not affect other metadata such as domain names or backlinks; they are therefore unlikely to affect search engine rankings. Moreover, as larger context lengths become economical in language models, future generative models are expected to ingest more sources, reducing the impact of search rankings. Finally, although each query in GEO-bench is labeled and manually checked, there may be discrepancies due to subjective interpretation or labeling errors.

10 Acknowledgements

This material is based on work supported by the National Science Foundation under grant number 2107048. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

bibliography

Appendices

Appendix A: Conversational Generative Engine

In Section 2.1, we discussed a single-turn generative engine that outputs a single response given a user query. However, an advantage of upcoming generative engines will be their ability to engage in an active back-and-forth dialog with the user. The dialog allows the user to provide clarifications and ask follow-up questions about the query or the generative engine's response. Specifically, in Equation 1, the input is no longer a single query $q_u$; instead, it is modeled as a dialog history $H$ of pairs $(q_u^t, r^t)$. The response $r^{t+1}$ is then defined as:

$$GE := f_{LE}(H, P_U) \rightarrow r^{t+1}$$

where $t$ is the turn number.

In addition, to engage the user in a dialog, a separate LLM, $L_{follow}$ or $L_{resp}$, may generate suggested follow-up queries based on $H$, $P_U$, and $r^{t+1}$. These suggested follow-up queries are usually designed to maximize the likelihood of user engagement. This benefits not only the generative engine provider, by increasing user interaction, but also website owners, by enhancing their visibility. In addition, these follow-up queries can help users obtain more detailed information.
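The multi-turn formulation above can be sketched as a small interface. This is an illustrative sketch under stated assumptions, not the authors' implementation: the history $H$ is a list of (query, response) pairs, the user profile $P_U$ is a plain dictionary, and the newest user query is passed alongside the history. The callables `generate` and `suggest` stand in for the response model and the follow-up model, and `echo_engine` and `echo_follow_ups` are hypothetical stubs used only to make the example runnable.

```python
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]      # one (user query q_u^t, engine response r^t) pair
History = List[Turn]        # the dialog history H
Profile = Dict[str, str]    # user information P_U (representation assumed)


def chat_turn(
    generate: Callable[[History, str, Profile], str],
    suggest: Callable[[History, Profile], List[str]],
    history: History,
    query: str,
    profile: Profile,
) -> Tuple[str, History, List[str]]:
    """Run one dialog turn: produce r^{t+1}, extend H, suggest follow-ups."""
    response = generate(history, query, profile)
    new_history = history + [(query, response)]   # leaves the old history intact
    follow_ups = suggest(new_history, profile)    # conditioned on H, P_U, r^{t+1}
    return response, new_history, follow_ups


# Hypothetical stubs standing in for the LLM-backed components.
def echo_engine(history: History, query: str, profile: Profile) -> str:
    return f"[turn {len(history) + 1}] answer to: {query}"


def echo_follow_ups(history: History, profile: Profile) -> List[str]:
    last_query, _ = history[-1]
    return [f"Tell me more about: {last_query}"]


if __name__ == "__main__":
    h: History = []
    for q in ["what is GEO?", "how does it differ from SEO?"]:
        r, h, suggestions = chat_turn(echo_engine, echo_follow_ups, h, q, {})
        print(r, suggestions)
```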

Appendix B: Experimental Setup

B.1 Evaluation of the generation engine

The exact prompt used is shown in Listing 1.

B.2 Benchmarks

GEO-bench contains queries from nine datasets. Listing 2 shows representative queries from each dataset. In addition, we tag each query along seven different categories. For tagging, we use the GPT-4 model and manually confirm the high recall and precision of the tags. However, because the tagging is automated, the tags may be noisy and should be interpreted with caution. Details of these categories are given below:

Listing 2: Representative questions for the 9 datasets in GEO-bench

Difficulty Level: the complexity of the query, from simple to complex.
Nature of Query: the type of information sought, such as factual, opinion, or comparison.
Genre: the category or domain of the query, such as arts and entertainment, finance, or science.
Specific Topics: the specific subject of the query, such as physics, economics, or computer science.
Sensitivity: whether the query involves sensitive topics.
User Intent: the purpose behind the user's query, such as research, purchase, or entertainment.
Answer Type: the format of the answer sought, such as fact, opinion, or list.

B.3 Evaluation metrics

We used seven different subjective impression metrics, whose prompts are available in our public repository: https://github.com/GEO-optim/GEO. GPT-3.5 Turbo was used for all experiments.

B.4 GEO methods

We present nine different generation engine optimization methods to optimize web content for generation engines. We evaluated these methods on the full GEO-bench test set. In addition, to reduce the variance in the results, we conducted experiments under five different random seeds and reported the mean values.

Table 6: Absolute impression metrics of GEO methods on GEO-bench. Simple methods traditionally used in SEO, such as keyword stuffing, do not perform well compared to the baseline. However, our proposed methods, such as Statistics Addition and Quotation Addition, show strong performance improvements across all metrics. The best methods improve over the baseline by 41% on position-adjusted word count and 28% on subjective impression.

Method	Position-adjusted word count			Subjective impression
Method	Word	Position	Overall	Relevance	Influence	Uniqueness	Diversity	Follow-up	Position	Count	Average
Performance without generative engine optimization
No optimization	19.7 (0.7)	19.6 (0.5)	19.8 (0.6)	19.8 (0.9)	19.8 (1.6)	19.8 (0.6)	19.8 (1.1)	19.8 (1.0)	19.8 (1.0)	19.8 (0.9)	19.8 (0.9)
Poorly performing generative engine optimization methods
Keyword Stuffing	19.6 (0.5)	19.5 (0.6)	19.8 (0.5)	20.8 (0.8)	19.8 (1.0)	20.4 (0.5)	20.6 (0.9)	19.9 (0.9)	21.1 (1.0)	21.0 (0.9)	20.6 (0.7)
Unique Words	20.6 (0.6)	20.5 (0.7)	20.7 (0.5)	20.8 (0.7)	20.3 (1.3)	20.5 (0.3)	20.9 (0.3)	20.4 (0.7)	21.5 (0.6)	21.2 (0.4)	20.9 (0.4)
Well-performing generative engine optimization methods
Easy-to-Understand	21.5 (0.7)	22.0 (0.8)	21.5 (0.6)	21.0 (1.1)	21.1 (1.8)	21.2 (0.9)	20.9 (1.1)	20.6 (1.0)	21.9 (1.1)	21.4 (0.9)	21.3 (1.0)
Authoritative	21.3 (0.7)	21.2 (0.9)	21.1 (0.8)	22.3 (0.8)	22.9 (0.8)	22.1 (0.9)	23.2 (0.7)	21.9 (0.4)	23.9 (1.2)	23.0 (1.1)	23.1 (0.7)
Technical Terms	22.5 (0.6)	22.4 (0.6)	22.5 (0.6)	21.2 (0.7)	21.8 (0.8)	20.5 (0.5)	21.1 (0.6)	20.5 (0.6)	22.1 (0.6)	21.2 (0.2)	21.4 (0.4)
Fluency Optimization	24.4 (0.8)	24.4 (0.6)	24.4 (0.8)	21.3 (0.9)	23.2 (1.5)	21.2 (1.0)	21.4 (1.4)	20.8 (1.3)	23.2 (1.8)	21.5 (1.3)	22.1 (1.2)
Cite Sources	25.5 (0.7)	25.3 (0.6)	25.3 (0.6)	22.8 (0.9)	26.7 (1.1)	24.6 (0.7)	24.9 (0.9)	23.2 (0.9)	26.4 (1.0)	24.1 (1.2)	25.5 (0.9)
Quotation Addition	27.5 (0.8)	27.6 (0.8)	27.1 (0.6)	23.1 (1.4)	26.1 (0.9)	23.6 (0.9)	24.5 (1.2)	22.4 (1.2)	26.1 (1.2)	23.8 (1.2)	24.8 (1.1)
Statistics Addition	25.8 (1.2)	26.0 (0.8)	25.5 (1.2)	23.1 (1.4)	24.2 (0.7)	21.7 (0.3)	22.3 (0.8)	21.3 (0.9)	23.5 (0.4)	21.7 (0.6)	22.9 (0.5)
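Entries in Table 6 have the form "19.7 (0.7)". Assuming the first number is the mean over the five random seeds and the parenthesized number is the sample standard deviation (the caption does not say which spread measure is used, so this reading is an assumption), such an entry could be produced as in the sketch below. The per-seed scores in the example are made-up placeholders, not results from the paper, and `summarize_runs` is a hypothetical helper.

```python
from statistics import mean, stdev
from typing import Sequence


def summarize_runs(scores: Sequence[float]) -> str:
    """Format per-seed scores as 'mean (std)' with one decimal place."""
    return f"{mean(scores):.1f} ({stdev(scores):.1f})"


if __name__ == "__main__":
    # Placeholder scores for five seeds; purely illustrative values.
    per_seed_scores = [19.0, 19.5, 20.0, 20.5, 21.0]
    print(summarize_runs(per_seed_scores))
```

Reporting the spread alongside the mean makes it easier to judge whether a gap between two methods exceeds the seed-to-seed variation.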

B.5 Prompts for GEO methods

All of our prompts are available in our public codebase: https://github.com/GEO-optim/GEO. All experiments were done using GPT-3.5 Turbo.

Appendix C: Additional Results and Discussion

C.1 GEO in the real world: experiments with the deployed generation engine

We also evaluated our proposed generation engine optimization approach on a real-world deployed generation engine, Perplexity.ai. Since Perplexity.ai does not allow users to specify source URLs, we instead uploaded the source text as a file to Perplexity.ai, while ensuring that all answers were generated using only the provided file source. We evaluated all of our methods on a 200-sample subset of our test set. The results using Perplexity.ai are shown in Table 7.

Table 7: Performance of GEO methods on GEO-bench with Perplexity.ai as the generative engine. Simple methods traditionally used in SEO, such as keyword stuffing, do not perform well compared to the baseline. However, our proposed methods, such as Statistics Addition and Quotation Addition, show strong performance improvements. The best methods improve over the baseline by 22% on position-adjusted word count and 37% on subjective impression.

Method	Position-adjusted word count			Subjective impression
Method	Word	Position	Overall	Relevance	Influence	Uniqueness	Diversity	Follow-up	Position	Count	Average
Performance without generative engine optimization
No optimization	24.0	24.4	24.1	24.7	24.7	24.7	24.7	24.7	24.7	24.7	24.7
Poorly performing generative engine optimization methods
Keyword Stuffing	21.9	21.4	21.9	26.3	27.2	27.2	30.2	27.9	28.2	26.9	28.1
Unique Words	24.0	23.7	23.6	24.9	25.1	24.7	23.0	23.6	23.9	24.1	24.1
Well-performing generative engine optimization methods
Authoritative	25.6	25.7	25.9	28.9	30.9	31.2	31.7	31.5	26.9	29.5	30.6
Fluency Optimization	25.8	26.2	26.0	28.9	29.4	29.8	30.6	30.1	29.6	29.6	30.0
Cite Sources	26.6	26.9	26.8	19.8	20.7	19.5	18.9	20.0	18.5	18.9	19.0
Quotation Addition	28.8	28.7	29.1	31.4	31.9	31.9	32.3	31.4	31.7	30.9	32.1
Statistics Addition	25.8	26.6	26.2	31.6	33.4	34.0	33.7	34.0	33.3	33.1	33.9

Results and analysis

Tables 5 and 7 show the absolute impression metrics of the GEO approach when using Perplexity.ai as the generation engine. The results show that our GEO method performs well in improving content visibility compared to the baseline. Specifically:

Quotation Addition: 22% improvement over the baseline on the position-adjusted word count metric.
Statistics Addition: 37% improvement over the baseline on the subjective impression metric.

These results are significant for three reasons:

Emphasize the importance of different GEO methods: These results suggest that developing different generation engine optimization methods can be beneficial for content creators.
Generalizability of the methodology: Our GEO method performs well on different generation engines, demonstrating its broad applicability.
Practical application value: Content creators can directly use the easy-to-implement GEO methodology we propose to make a significant impact in the real world.

In addition, we observe that traditional SEO methods (e.g., keyword stuffing) perform poorly on generative engines, scoring as much as 10% below the baseline. This further supports our view that generative engines require specialized optimization strategies rather than a simple reuse of traditional SEO techniques. Through experiments on Perplexity.ai, we validate the effectiveness of our GEO methods on a different generative engine. These methods not only improve content visibility but also demonstrate their potential for real-world applications. Our research provides content creators with a new tool to address the challenges posed by generative engines and to optimize their content for better visibility and user engagement.

C.2 Discussion

Impact of Domain-Specific Optimization

Our analysis shows that different GEO methods have different effects in different domains. For example:

Authoritative: Strong performance on debate-style questions and queries related to the "history" domain. This is consistent with our intuition that more persuasive forms of writing may be more valuable in debates.
Cite Sources: Particularly useful for factual questions, as citations provide a source of verification for the facts presented, thus enhancing the credibility of the response.
Statistics Addition: Significant gains on "Law and Government" queries and "Opinion" type questions, suggesting that data-driven evidence can improve a website's visibility in particular contexts.
Quotation Addition: Most effective in the "People and Society", "Explanation", and "History" domains. This may be because these domains usually involve personal narratives or historical events, where direct quotations can add authenticity and depth to the content.

Impact of combining strategies

Our study also shows that combining multiple GEO strategies can further enhance performance. For example, combining Fluency Optimization with Statistics Addition yielded the greatest performance. In addition, Cite Sources significantly improves performance when combined with other methods, despite relatively weaker results when used alone. These findings emphasize the importance of studying GEO methods in combination, as real-world content creators are likely to use them that way.

Impact on SEO

Our findings have important implications for the SEO field. With the rise of generative engines, traditional SEO techniques may no longer be sufficient. Website owners need to adopt new strategies to optimize their content for this new search paradigm. Our GEO approach offers a new way of thinking that emphasizes the importance of content quality and presentation, rather than relying solely on keyword stuffing and backlink building.

Future work

Future research could further explore the following areas:

Long-term effects: A study of the impact of the GEO approach on long-term website visibility and traffic.
User Behavior Analysis: Analyze user behavior patterns when interacting with the generation engine to better understand how to optimize content to attract and retain users.
Multimodal Content Optimization: Extending the GEO methodology to optimize images, videos, and other multimedia content for the generation engine's ability to process multimodal information.
Automation tool development: Develop automated tools to help content creators more easily implement GEO strategies and monitor and adjust their optimization strategies in real time.

Through these research directions, we can more fully understand the impact of generation engines on the digital space and provide content creators with more effective tools to cope with these changes.
