Commit 4b15aa1

update blog post
1 parent 0d09e0b commit 4b15aa1

File tree

1 file changed: +49 −31 lines changed

  • blog/fluxninja-acquisition-2024-03-17
blog/fluxninja-acquisition-2024-03-17/blog.md

@@ -52,31 +52,48 @@ any other software engineering challenge of the past. Based on our learnings
 while building complex workflows, it became apparent that we need to invest in a
 platform that can solve the following problems:

-- Prompt server: Prompt design and runtime rendering is akin to responsive web
-  design, in which a page has to be rendered differently based on the screen
-  size and other parameters. We need a platform that can render prompts based on
-  the context windows of underlying models and prioritize the context packing
-  based on business attributes. For instance, it's impossible to include the
-  entire repository and past conversations in a single prompt for code review.
-  Even if it were possible, LLM models exhibit poor recall when doing an
-  inference on a large context window. While it may be acceptable for use cases
-  like chat, it's not for use cases like code reviews that require accurate and
-  precise outputs.
+- Prompt rendering: Prompt design and rendering is akin to responsive web
+  design. Web servers render pages based on the screen size and other
+  parameters; for example, on a mobile device, navigation bars are usually
+  rendered as hamburger menus, making it easier for human consumption.
+  Similarly, we need a prompt server that can render prompts based on the
+  context windows of underlying models and prioritize the packing of context
+  based on business attributes, making it easier for AI consumption. It's not
+  feasible to include the entire repository, past conversations, documentation,
+  learnings, etc. in a single code review prompt because of the context window
+  size limitations. Even if it were possible, AI models exhibit poor recall when
+  doing an inference on a completely packed context window. While tight packing
+  may be acceptable for use cases like chat, it's not for use cases like code
+  reviews that require accurate inferences. Therefore, it's critical to render
+  prompts in such a way that the quality of inference is high for each use case,
+  while being cost-effective and fast. In addition to packing logic, basic
+  guardrails are also needed, especially when rendering prompts based on inputs
+  from end users. Since we provide a free service to public repositories, we
+  have to ensure that our product is not misused beyond its intended purpose or
+  tricked into divulging sensitive information, which could include our base
+  prompts.

-- Observability into LLM outputs: One key challenge with prompting is that it's
+- Validating quality of inference: Generative AI models consume text and output
+  text. On the other hand, traditional code and APIs require structured data.
+  Therefore, the prompt service needs to expose a RESTful or gRPC API that can
+  be consumed by the other services in the workflow. We touched upon the
+  rendering of prompts based on structured requests in the previous point, but
+  the prompt service also needs to parse and validate responses into structured
+  data. This is a non-trivial problem, and multiple tries are often required to
+  ensure that the response is thorough. For instance, we found that when we pack
+  multiple files in a single code review prompt, AI models often miss hunks
+  within a file or miss files altogether, leading to incomplete reviews.
+
+- Observability: One key challenge with generative AI and prompting is that it's
   inherently non-deterministic. The same prompt can result in vastly different
   outputs, which can be frustrating, but this is precisely what makes AI systems
   powerful in the first place. Even slight variations in the prompt can result
-  in vastly inferior or noisy outputs, leading to a decline in user conversion.
-  At the same time, the underlying AI models are ever-evolving, and the same
-  prompts drift over time as the models get regular updates. Traditional
-  observability is of little use here, and we need to rethink how we classify
-  and track different outputs and their quality. Again, this is a problem that
-  we have to solve in-house.
-
-- Guardrails: Since we provide a free service to public repositories, we must
-  ensure that our product is not misused beyond its intended purpose or tricked
-  into divulging sensitive information, which could include our base prompts.
+  in vastly inferior or noisy outputs, leading to a decline in user engagement.
+  At the same time, the underlying AI models are ever-evolving, and the
+  established prompts drift over time as the models get regular updates.
+  Traditional observability is of little use here, and we need to rethink how we
+  classify and track generated output and measure quality. Again, this is a
+  problem that we have to solve in-house.

 While FluxNinja's Aperture project was limited to solving a different problem
 around load management and reliability, we found that the underlying technology
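The priority-based context packing described in the prompt-rendering bullet can be sketched roughly as follows. This is a minimal illustration of a greedy strategy under a token budget; the names (`ContextItem`, `pack_context`), the crude character-based token estimator, and the budget value are all hypothetical, not CodeRabbit's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    name: str
    text: str
    priority: int  # higher = more important to the business use case

def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(text) // 4)

def pack_context(items: list[ContextItem], budget_tokens: int) -> list[ContextItem]:
    """Greedily keep the highest-priority items that fit the model's window."""
    packed, used = [], 0
    for item in sorted(items, key=lambda i: i.priority, reverse=True):
        cost = estimate_tokens(item.text)
        if used + cost <= budget_tokens:
            packed.append(item)
            used += cost
    return packed

# Illustrative inputs: the diff hunk outranks style guides and chat history.
items = [
    ContextItem("diff_hunk", "def add(a, b): return a + b" * 10, priority=3),
    ContextItem("style_guide", "Use snake_case." * 200, priority=2),
    ContextItem("past_conversation", "LGTM!" * 500, priority=1),
]
selected = pack_context(items, budget_tokens=300)
print([i.name for i in selected])  # lower-priority items are dropped
```

A production renderer would use the model's real tokenizer and could truncate or summarize items rather than dropping them outright, but the priority ordering is the core idea.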
@@ -87,15 +104,16 @@ controlling AI behavior. Packing the context window with relevant documents
 of providing proprietary data compared to fine-tuning the model. Most AI labs
 focus on increasing the context window rather than making fine-tuning easier or
 cheaper. Despite the emergence of these clear trends, applied AI systems are
-still in their infancy. None of the recent AI vendors are building the "right"
-platform, as most of their focus has been on background/durable execution
-platforms, model routing proxies/gateways, chaining RAG pipelines using reusable
-components, and so on. Most of these approaches fall short of what a real-world
-AI workflow requires. The right abstractions and best practices will still have
-to appear, and the practitioners themselves will have to build them. Creating
-the "right" AI platform will be a differentiator for AI-first companies, and we
-are excited to tackle this problem head-on with a systems engineering mindset.
+still in their infancy. None of the recent AI vendors seem to be building the
+"right" platform, as most of their focus has been on background/durable
+execution frameworks, model routing proxies/gateways, composable RAG pipelines,
+and so on. Most of these approaches fall short of what a real-world AI workflow
+requires. The right abstractions and best practices have yet to emerge,
+and the practitioners themselves will have to build them. AI platforms will be a
+differentiator for AI-first companies, and we are excited to tackle this problem
+head-on with a systems engineering mindset.

 We are excited to have the FluxNinja team on board and to bring our users the
-best-in-class AI workflows. We are also happy to welcome Harjot Gill, the
-founder of FluxNinja, and the rest of the team to CodeRabbit.
+best-in-class AI workflows. We are also happy to welcome
+[Harjot Gill](https://www.linkedin.com/in/harjotsgill/), the founder of
+FluxNinja, and the rest of the team to CodeRabbit.
