blog/2023-12-23-how-we-built-cost-effective-generative-ai-application/index.md
+18 -29
@@ -10,47 +10,45 @@ aiDisclaimer: true
# How we built a cost-effective Generative AI application
-Since its inception, CodeRabbit has experienced steady growth in its user base, comprising developers and organizations. Installed on thousands of repositories, CodeRabbit reviews several thousand pull requests (PRs) daily. We have [previously discussed](coderabbit-openai-rate-limits) our use of an innovative client-side request prioritization technique to navigate OpenAI rate limits. In this blog post, we will explore how we manage to deliver continuous, in-depth code analysis cost-effectively, while also providing a robust, free plan to open source projects.
+Since its inception, CodeRabbit has experienced steady growth in its user base, comprising developers and organizations. Installed on thousands of repositories, CodeRabbit reviews several thousand pull requests (PRs) daily. We have [previously discussed](/blog/coderabbit-openai-rate-limits) our use of an innovative client-side request prioritization technique to navigate OpenAI rate limits. In this blog post, we will explore how we manage to deliver continuous, in-depth code analysis cost-effectively, while also providing a robust, free plan to open source projects.
+<!--truncate-->
## CodeRabbit's Product Offering and LLM Consumption
CodeRabbit is an AI-first PR Review tool that uses GPT APIs for various functionalities. CodeRabbit offers the following tiers of service:
-* CodeRabbit Pro: A paid service providing in-depth code reviews for private repositories. It's priced according to the number of developers, starting with a full-featured 7-day free trial.
-* CodeRabbit for Open Source: A free service offering in-depth code reviews for open source (public) repositories.
-* CodeRabbit Free: A free plan for private repositories, providing summarization of code changes in a PR.
+- CodeRabbit Pro: A paid service providing in-depth code reviews for private repositories. It's priced according to the number of developers, starting with a full-featured 7-day free trial.
+- CodeRabbit for Open Source: A free service offering in-depth code reviews for open source (public) repositories.
+- CodeRabbit Free: A free plan for private repositories, providing summarization of code changes in a PR.
Our vision is to offer an affordable, AI-driven code review service to developers and organizations of all sizes while supporting the open source community. We are particularly mindful of open source projects, understanding the challenges in reviewing community contributions. Our goal is to reduce the burden of code reviews for open source maintainers by improving submission quality before the review process begins.
CodeRabbit's review process is automatically triggered when a PR is opened in GitHub or GitLab. Each review involves a complex workflow that builds context and reviews each file using large language models (LLMs). Code review is a complex task that requires an in-depth understanding of the changes and the existing codebase. High-quality review comments necessitate state-of-the-art language models such as gpt-4. However, these models are significantly more expensive than simpler models, as shown by the [10x-30x price difference](https://openai.com/pricing) between gpt-3.5-turbo and gpt-4 models.
-| Model | Context Size | Cost per 1k Input Tokens | Cost per 1k Output Tokens |
> gpt-4 model is 10-30x more expensive than gpt-3.5-turbo model
-
Our primary cost driver is using OpenAI's API to generate code review comments. We will share our cost optimization strategies in the following sections. Without these optimizations, our free offering to open source projects would not be feasible.
Let's take a look at the strategies that helped us optimize the cost and improve user experience.
-----
+---
## 1. Dual-models: Summarize & Triage Using Simpler Models
-For less complex tasks such as summarizing code diffs, simpler models such as gpt-3.5-turbo are adequate. As an initial optimization, we use a mix of models, as detailed in [our earlier blog post](coderabbit-deep-dive). We use gpt-3.5-turbo to compress large code diffs into concise summaries, which are then processed by gpt-4 for reviewing each file. This dual-model approach significantly reduces costs and enhances review quality, enabling us to manage PRs with numerous files and extensive code differences.
+For less complex tasks such as summarizing code diffs, simpler models such as gpt-3.5-turbo are adequate. As an initial optimization, we use a mix of models, as detailed in [our earlier blog post](/blog/coderabbit-deep-dive). We use gpt-3.5-turbo to compress large code diffs into concise summaries, which are then processed by gpt-4 for reviewing each file. This dual-model approach significantly reduces costs and enhances review quality, enabling us to manage PRs with numerous files and extensive code differences.
Additionally, we implemented triage logic to skip trivial changes from the review process. We use the simpler model to classify each diff as either trivial or complex, as part of the same prompt used for code diff summarization. Low-risk changes such as documentation updates, variable renames, and so on, are thus excluded from the thorough review process. This strategy has proven effective, as simpler models can accurately identify trivial changes.
By using this dual-model approach for summarization and filtering out trivial changes, we save almost 50% on costs.
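The summarize-and-triage flow described above can be sketched roughly as follows. This is a minimal illustration, not CodeRabbit's actual implementation: `cheap_model` and `strong_model` are hypothetical callables standing in for gpt-3.5-turbo and gpt-4 calls, and the `TRIAGE:` reply convention and prompt wording are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical prompt: one call asks for both a summary and a triage verdict.
PROMPT = (
    "Summarize the diff below in a few sentences, then end with a final line "
    "reading 'TRIAGE: TRIVIAL' (documentation, renames, formatting) "
    "or 'TRIAGE: COMPLEX'.\n\n"
)


@dataclass
class Triage:
    summary: str
    needs_review: bool


def parse_triage(reply: str) -> Triage:
    """Split the cheap model's reply into a summary and a verdict."""
    lines = [ln for ln in reply.strip().splitlines() if ln.strip()]
    verdict = "COMPLEX"  # fail safe: review the file if no verdict is found
    if lines and lines[-1].upper().startswith("TRIAGE:"):
        verdict = lines.pop().split(":", 1)[1].strip().upper()
    return Triage(summary="\n".join(lines), needs_review=verdict != "TRIVIAL")


def review_pr(
    diffs: Dict[str, str],
    cheap_model: Callable[[str], str],
    strong_model: Callable[[str], str],
) -> Dict[str, str]:
    """Summarize every file cheaply; send only non-trivial files for full review."""
    comments: Dict[str, str] = {}
    for path, diff in diffs.items():
        triage = parse_triage(cheap_model(PROMPT + diff))
        if triage.needs_review:
            # The expensive model sees the compressed summary plus the diff.
            comments[path] = strong_model(triage.summary + "\n\n" + diff)
    return comments
```

Defaulting to `COMPLEX` when the verdict line is missing trades a little cost for safety: an unparseable reply leads to a full review rather than a silently skipped file.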
-
## 2. Rate-limiting: Enforcing Fair Usage
Upon launching our free service for open source projects, we noticed individual developers using it as a coding co-pilot by making hundreds of incremental commits for continuous feedback. CodeRabbit, designed for thorough code reviews unlike tools such as GitHub Copilot, incurs high costs when used in this manner. Therefore, we implemented hourly rate-limits on the number of files and commits reviewed per user, to control excessive usage without compromising user experience. These limits vary across different product tiers. For example, we set more aggressive limits for open source users compared to trial and paid users.
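As a rough illustration of per-user hourly limits that differ by tier, here is a minimal in-process sliding-window limiter. It is a sketch under stated assumptions, not the FluxNinja Aperture SDK or CodeRabbit's policy: the class name, tier names, and limit values are all hypothetical, since the post does not state the real numbers.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple

# Hypothetical commits-per-hour limits; open source is the most restrictive tier.
HOURLY_COMMIT_LIMITS = {"open_source": 3, "trial": 10, "pro": 20}
WINDOW_SECONDS = 3600.0


class HourlyLimiter:
    """Sliding-window limiter: at most N reviews per user in any one-hour window."""

    def __init__(
        self,
        limits: Dict[str, int],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limits = limits
        self._clock = clock
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def try_acquire(self, user: str, tier: str) -> Tuple[bool, float]:
        """Return (allowed, wait_seconds); wait_seconds is 0.0 when allowed."""
        now = self._clock()
        events = self._events[user]
        # Drop timestamps that have aged out of the one-hour window.
        while events and now - events[0] >= WINDOW_SECONDS:
            events.popleft()
        if len(events) < self._limits[tier]:
            events.append(now)
            return True, 0.0
        # Denied: report how long until the oldest request leaves the window.
        return False, WINDOW_SECONDS - (now - events[0])
```

The returned wait time is what would let a bot tell the user when the next review can run, rather than failing silently.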
@@ -61,25 +59,16 @@ In FluxNinja Aperture, policies are decoupled from application logic through lab
Integration with FluxNinja Aperture SDK
-
-

-Rate limiting commits per hour for open source users
-
-
-
+Rate limiting commits per hour for open source users

-Wait time feedback to the user in a comment
+Wait time feedback to the user in a comment
Given the high cost and capacity constraints of state-of-the-art models such as gpt-4, rate-limiting is an essential requirement for any AI application. By implementing fair-usage rate limits, we are saving almost 20% on our costs.
-
-
-

-Rate limit metrics for open source users
-
+Rate limit metrics for open source users
## 3. Caching: Avoid Re-generating Similar Review Comments
@@ -91,8 +80,8 @@ Fortunately, Aperture also provides a simple caching mechanism for summaries fro
By using the more cost-effective gpt-3.5-turbo model as an advanced similarity filter before invoking the more expensive gpt-4 model for the same file, we have saved almost 20% of our costs by avoiding the generation of similar review comments.
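One way to picture this similarity filter is the sketch below. It is illustrative only: a plain dictionary stands in for the cache (the post uses Aperture's caching, whose API is not shown here), and `summarize`, `is_similar`, and `review` are hypothetical callables, where the first two would wrap the cheaper model and the last the more expensive one.

```python
from typing import Callable, MutableMapping, Optional


def review_if_changed(
    path: str,
    diff: str,
    cache: MutableMapping[str, str],
    summarize: Callable[[str], str],
    is_similar: Callable[[str, str], bool],
    review: Callable[[str], str],
) -> Optional[str]:
    """Review a file only if its change summary differs from the cached one.

    Returns the review comment, or None when the expensive review is skipped.
    """
    summary = summarize(diff)       # cheap model call
    previous = cache.get(path)      # summary stored for the previous commit
    cache[path] = summary           # remember the latest summary either way
    if previous is not None and is_similar(previous, summary):
        return None                 # similar to last time: skip the costly review
    return review(diff)             # expensive model call
```

The cheap comparison runs on every commit, so the saving depends on how often incremental commits leave a file's changes essentially unchanged.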
-----
+---
## Conclusion
-In this blog post, we briefly discussed how state-of-the-art LLMs such as gpt-4 can be expensive in production. We also shared our strategy of using a combination of simpler models, rate limits, and caching to optimize operational costs. We hope our experiences can assist other AI startups in optimizing their costs and developing cost-effective AI applications.
+In this blog post, we briefly discussed how state-of-the-art LLMs such as gpt-4 can be expensive in production. We also shared our strategy of using a combination of simpler models, rate limits, and caching to optimize operational costs. We hope our experiences can assist other AI startups in optimizing their costs and developing cost-effective AI applications.