feat(vertexai): Gemini multimodal output #8922

Merged
merged 8 commits on May 8, 2025
Changes from 4 commits
6 changes: 6 additions & 0 deletions .changeset/perfect-camels-try.md
@@ -0,0 +1,6 @@
---
'firebase': minor
'@firebase/vertexai': minor
---

Add support for Gemini multimodal output
12 changes: 12 additions & 0 deletions common/api-review/vertexai.api.md
@@ -124,6 +124,7 @@ export { Date_2 as Date }
export interface EnhancedGenerateContentResponse extends GenerateContentResponse {
// (undocumented)
functionCalls: () => FunctionCall[] | undefined;
inlineDataParts: () => InlineDataPart[] | undefined;
text: () => string;
}

@@ -304,6 +305,8 @@ export interface GenerationConfig {
// (undocumented)
presencePenalty?: number;
responseMimeType?: string;
// @beta
responseModalities?: ResponseModality[];
responseSchema?: TypedSchema | SchemaRequest;
// (undocumented)
stopSequences?: string[];
@@ -596,6 +599,15 @@ export interface RequestOptions {
timeout?: number;
}

// @beta
export const ResponseModality: {
readonly TEXT: "TEXT";
readonly IMAGE: "IMAGE";
};

// @beta
export type ResponseModality = (typeof ResponseModality)[keyof typeof ResponseModality];

// @public (undocumented)
export interface RetrievedContextAttribution {
// (undocumented)
11 changes: 11 additions & 0 deletions docs-devsite/vertexai.enhancedgeneratecontentresponse.md
@@ -24,6 +24,7 @@ export interface EnhancedGenerateContentResponse extends GenerateContentResponse
| Property | Type | Description |
| --- | --- | --- |
| [functionCalls](./vertexai.enhancedgeneratecontentresponse.md#enhancedgeneratecontentresponsefunctioncalls) | () =&gt; [FunctionCall](./vertexai.functioncall.md#functioncall_interface)<!-- -->\[\] \| undefined | |
| [inlineDataParts](./vertexai.enhancedgeneratecontentresponse.md#enhancedgeneratecontentresponseinlinedataparts) | () =&gt; [InlineDataPart](./vertexai.inlinedatapart.md#inlinedatapart_interface)<!-- -->\[\] \| undefined | Aggregates and returns all [InlineDataPart](./vertexai.inlinedatapart.md#inlinedatapart_interface)<!-- -->s from the [GenerateContentResponse](./vertexai.generatecontentresponse.md#generatecontentresponse_interface)<!-- -->'s first candidate. |
| [text](./vertexai.enhancedgeneratecontentresponse.md#enhancedgeneratecontentresponsetext) | () =&gt; string | Returns the text string from the response, if available. Throws if the prompt or candidate was blocked. |

## EnhancedGenerateContentResponse.functionCalls
@@ -34,6 +35,16 @@ export interface EnhancedGenerateContentResponse extends GenerateContentResponse
functionCalls: () => FunctionCall[] | undefined;
```

## EnhancedGenerateContentResponse.inlineDataParts

Aggregates and returns all [InlineDataPart](./vertexai.inlinedatapart.md#inlinedatapart_interface)<!-- -->s from the [GenerateContentResponse](./vertexai.generatecontentresponse.md#generatecontentresponse_interface)<!-- -->'s first candidate.

<b>Signature:</b>

```typescript
inlineDataParts: () => InlineDataPart[] | undefined;
```
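
A rough usage sketch (not part of the generated reference): assuming a `model` built with `getGenerativeModel()` and configured for image output via `responseModalities`, the helper could be consumed as follows. The prompt text and logging are illustrative only.

```typescript
import { GenerativeModel } from 'firebase/vertexai';

// Assumes `model` was created with
// generationConfig: { responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE] }.
async function logGeneratedImages(model: GenerativeModel): Promise<void> {
  const result = await model.generateContent('Generate an image of a sunset over mountains.');

  // inlineDataParts() aggregates InlineDataParts from the first candidate,
  // or returns undefined when the response contains no inline data.
  const imageParts = result.response.inlineDataParts();
  if (!imageParts) {
    console.log('No inline data returned; text was:', result.response.text());
    return;
  }

  for (const part of imageParts) {
    // Each part wraps a GenerativeContentBlob: a MIME type plus base64-encoded bytes.
    const { mimeType, data } = part.inlineData;
    console.log(`Received ${mimeType} image (${data.length} base64 characters)`);
  }
}
```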

## EnhancedGenerateContentResponse.text

Returns the text string from the response, if available. Throws if the prompt or candidate was blocked.
20 changes: 20 additions & 0 deletions docs-devsite/vertexai.generationconfig.md
@@ -27,7 +27,12 @@ export interface GenerationConfig
| [maxOutputTokens](./vertexai.generationconfig.md#generationconfigmaxoutputtokens) | number | |
| [presencePenalty](./vertexai.generationconfig.md#generationconfigpresencepenalty) | number | |
| [responseMimeType](./vertexai.generationconfig.md#generationconfigresponsemimetype) | string | Output response MIME type of the generated candidate text. Supported MIME types are <code>text/plain</code> (default, text output), <code>application/json</code> (JSON response in the candidates), and <code>text/x.enum</code>. |
| [responseModalities](./vertexai.generationconfig.md#generationconfigresponsemodalities) | [ResponseModality](./vertexai.md#responsemodality)<!-- -->\[\] | <b><i>(Public Preview)</i></b> Generation modalities to be returned in generation responses. |
| [responseSchema](./vertexai.generationconfig.md#generationconfigresponseschema) | [TypedSchema](./vertexai.md#typedschema) \| [SchemaRequest](./vertexai.schemarequest.md#schemarequest_interface) | Output response schema of the generated candidate text. This value can be a class generated with a [Schema](./vertexai.schema.md#schema_class) static method like <code>Schema.string()</code> or <code>Schema.object()</code> or it can be a plain JS object matching the [SchemaRequest](./vertexai.schemarequest.md#schemarequest_interface) interface. <br/>Note: This only applies when the specified <code>responseMIMEType</code> supports a schema; currently this is limited to <code>application/json</code> and <code>text/x.enum</code>. |
| [stopSequences](./vertexai.generationconfig.md#generationconfigstopsequences) | string\[\] | |
| [temperature](./vertexai.generationconfig.md#generationconfigtemperature) | number | |
| [topK](./vertexai.generationconfig.md#generationconfigtopk) | number | |
@@ -75,6 +80,21 @@ Output response MIME type of the generated candidate text. Supported MIME types
responseMimeType?: string;
```

## GenerationConfig.responseModalities

> This API is provided as a preview for developers and may change based on feedback that we receive. Do not use this API in a production environment.
>

Generation modalities to be returned in generation responses.

- Multimodal response generation is only supported in `gemini-2.0-flash-exp`<!-- -->, not `gemini-2.0-flash`<!-- -->.
- Only image generation (`ResponseModality.IMAGE`<!-- -->) is supported.

<b>Signature:</b>

```typescript
responseModalities?: ResponseModality[];
```
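
As an illustrative sketch (assumptions: the `firebase/vertexai` entry point, a placeholder Firebase config, and the `gemini-2.0-flash-exp` model name noted above), the option would typically be set when constructing the model:

```typescript
import { initializeApp } from 'firebase/app';
import { getVertexAI, getGenerativeModel, ResponseModality } from 'firebase/vertexai';

const app = initializeApp({ /* your Firebase config */ });
const vertexAI = getVertexAI(app);

// Per the note above, multimodal output is limited to gemini-2.0-flash-exp,
// and only image generation is supported in addition to text.
const model = getGenerativeModel(vertexAI, {
  model: 'gemini-2.0-flash-exp',
  generationConfig: {
    responseModalities: [ResponseModality.TEXT, ResponseModality.IMAGE]
  }
});
```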

## GenerationConfig.responseSchema

Output response schema of the generated candidate text. This value can be a class generated with a [Schema](./vertexai.schema.md#schema_class) static method like `Schema.string()` or `Schema.object()` or it can be a plain JS object matching the [SchemaRequest](./vertexai.schemarequest.md#schemarequest_interface) interface. <br/>Note: This only applies when the specified `responseMIMEType` supports a schema; currently this is limited to `application/json` and `text/x.enum`<!-- -->.
31 changes: 31 additions & 0 deletions docs-devsite/vertexai.md
@@ -125,12 +125,14 @@ The Vertex AI in Firebase Web SDK.
| Variable | Description |
| --- | --- |
| [POSSIBLE\_ROLES](./vertexai.md#possible_roles) | Possible roles. |
| [ResponseModality](./vertexai.md#responsemodality) | <b><i>(Public Preview)</i></b> Generation modalities to be returned in generation responses. |

## Type Aliases

| Type Alias | Description |
| --- | --- |
| [Part](./vertexai.md#part) | Content part - includes text, image/video, or function call/response part types. |
| [ResponseModality](./vertexai.md#responsemodality) | <b><i>(Public Preview)</i></b> Generation modalities to be returned in generation responses. |
| [Role](./vertexai.md#role) | Role is the producer of the content. |
| [Tool](./vertexai.md#tool) | Defines a tool that model can call to access external knowledge. |
| [TypedSchema](./vertexai.md#typedschema) | A type that includes all specific Schema types. |
@@ -223,6 +225,22 @@ Possible roles.
POSSIBLE_ROLES: readonly ["user", "model", "function", "system"]
```

## ResponseModality

> This API is provided as a preview for developers and may change based on feedback that we receive. Do not use this API in a production environment.
>

Generation modalities to be returned in generation responses.

<b>Signature:</b>

```typescript
ResponseModality: {
readonly TEXT: "TEXT";
readonly IMAGE: "IMAGE";
}
```

## Part

Content part - includes text, image/video, or function call/response part types.
@@ -233,6 +251,19 @@ Content part - includes text, image/video, or function call/response part types.
export type Part = TextPart | InlineDataPart | FunctionCallPart | FunctionResponsePart | FileDataPart;
```

## ResponseModality

> This API is provided as a preview for developers and may change based on feedback that we receive. Do not use this API in a production environment.
>

Generation modalities to be returned in generation responses.

<b>Signature:</b>

```typescript
export type ResponseModality = (typeof ResponseModality)[keyof typeof ResponseModality];
```
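
As a brief clarifying sketch (not part of the generated reference): because the constant is declared `as const` and the type is derived from it, `ResponseModality` values are checked as the string-literal union `"TEXT" | "IMAGE"`:

```typescript
import { ResponseModality } from 'firebase/vertexai';

// Both the named constants and the raw string literals satisfy the type;
// anything outside the union is rejected at compile time.
function describeModality(modality: ResponseModality): string {
  return modality === ResponseModality.IMAGE ? 'image output' : 'text output';
}

describeModality(ResponseModality.IMAGE); // OK
describeModality('TEXT');                 // OK: "TEXT" is part of the union
// describeModality('AUDIO');             // Error: not assignable to ResponseModality
```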

## Role

Role is the producer of the content.
64 changes: 64 additions & 0 deletions packages/vertexai/src/requests/response-helpers.test.ts
@@ -29,6 +29,7 @@ import {
FinishReason,
GenerateContentResponse,
ImagenGCSImage,
InlineDataPart,
ImagenInlineImage
} from '../types';
import { getMockResponse } from '../../test-utils/mock-response';
@@ -132,6 +133,44 @@ const fakeResponseMixed3: GenerateContentResponse = {
]
};

const inlineDataPart1: InlineDataPart = {
inlineData: {
mimeType: 'image/png',
data: 'base64encoded...'
}
};

const inlineDataPart2: InlineDataPart = {
inlineData: {
mimeType: 'image/jpeg',
data: 'anotherbase64...'
}
};

const fakeResponseInlineData: GenerateContentResponse = {
candidates: [
{
index: 0,
content: {
role: 'model',
parts: [inlineDataPart1, inlineDataPart2]
}
}
]
};

const fakeResponseTextAndInlineData: GenerateContentResponse = {
candidates: [
{
index: 0,
content: {
role: 'model',
parts: [{ text: 'Describe this:' }, inlineDataPart1]
}
}
]
};

const badFakeResponse: GenerateContentResponse = {
promptFeedback: {
blockReason: BlockReason.SAFETY,
@@ -148,13 +187,15 @@ describe('response-helpers methods', () => {
const enhancedResponse = addHelpers(fakeResponseText);
expect(enhancedResponse.text()).to.equal('Some text and some more text');
expect(enhancedResponse.functionCalls()).to.be.undefined;
expect(enhancedResponse.inlineDataParts()).to.be.undefined;
});
it('good response functionCall', async () => {
const enhancedResponse = addHelpers(fakeResponseFunctionCall);
expect(enhancedResponse.text()).to.equal('');
expect(enhancedResponse.functionCalls()).to.deep.equal([
functionCallPart1.functionCall
]);
expect(enhancedResponse.inlineDataParts()).to.be.undefined;
});
it('good response functionCalls', async () => {
const enhancedResponse = addHelpers(fakeResponseFunctionCalls);
@@ -163,31 +204,54 @@
functionCallPart1.functionCall,
functionCallPart2.functionCall
]);
expect(enhancedResponse.inlineDataParts()).to.be.undefined;
});
it('good response text/functionCall', async () => {
const enhancedResponse = addHelpers(fakeResponseMixed1);
expect(enhancedResponse.functionCalls()).to.deep.equal([
functionCallPart2.functionCall
]);
expect(enhancedResponse.text()).to.equal('some text');
expect(enhancedResponse.inlineDataParts()).to.be.undefined;
});
it('good response functionCall/text', async () => {
const enhancedResponse = addHelpers(fakeResponseMixed2);
expect(enhancedResponse.functionCalls()).to.deep.equal([
functionCallPart1.functionCall
]);
expect(enhancedResponse.text()).to.equal('some text');
expect(enhancedResponse.inlineDataParts()).to.be.undefined;
});
it('good response text/functionCall/text', async () => {
const enhancedResponse = addHelpers(fakeResponseMixed3);
expect(enhancedResponse.functionCalls()).to.deep.equal([
functionCallPart1.functionCall
]);
expect(enhancedResponse.text()).to.equal('some text and more text');
expect(enhancedResponse.inlineDataParts()).to.be.undefined;
});
it('bad response safety', async () => {
const enhancedResponse = addHelpers(badFakeResponse);
expect(enhancedResponse.text).to.throw('SAFETY');
expect(enhancedResponse.functionCalls).to.throw('SAFETY');
expect(enhancedResponse.inlineDataParts).to.throw('SAFETY');
});
it('good response inlineData', async () => {
const enhancedResponse = addHelpers(fakeResponseInlineData);
expect(enhancedResponse.text()).to.equal('');
expect(enhancedResponse.functionCalls()).to.be.undefined;
expect(enhancedResponse.inlineDataParts()).to.deep.equal([
inlineDataPart1,
inlineDataPart2
]);
});
it('good response text/inlineData', async () => {
const enhancedResponse = addHelpers(fakeResponseTextAndInlineData);
expect(enhancedResponse.text()).to.equal('Describe this:');
expect(enhancedResponse.functionCalls()).to.be.undefined;
expect(enhancedResponse.inlineDataParts()).to.deep.equal([
inlineDataPart1
]);
});
});
describe('getBlockString', () => {
60 changes: 60 additions & 0 deletions packages/vertexai/src/requests/response-helpers.ts
@@ -23,6 +23,7 @@ import {
GenerateContentResponse,
ImagenGCSImage,
ImagenInlineImage,
InlineDataPart,
VertexAIErrorCode
} from '../types';
import { VertexAIError } from '../errors';
@@ -89,6 +90,40 @@ export function addHelpers(
}
return '';
};
(response as EnhancedGenerateContentResponse).inlineDataParts = ():
| InlineDataPart[]
| undefined => {
if (response.candidates && response.candidates.length > 0) {
if (response.candidates.length > 1) {
logger.warn(
`This response had ${response.candidates.length} ` +
`candidates. Returning data from the first candidate only. ` +
`Access response.candidates directly to use the other candidates.`
);
}
if (hadBadFinishReason(response.candidates[0])) {
throw new VertexAIError(
VertexAIErrorCode.RESPONSE_ERROR,
`Response error: ${formatBlockErrorMessage(
response
)}. Response body stored in error.response`,
{
response
}
);
}
return getInlineDataParts(response);
} else if (response.promptFeedback) {
throw new VertexAIError(
VertexAIErrorCode.RESPONSE_ERROR,
`Data not available. ${formatBlockErrorMessage(response)}`,
{
response
}
);
}
return undefined;
};
(response as EnhancedGenerateContentResponse).functionCalls = () => {
if (response.candidates && response.candidates.length > 0) {
if (response.candidates.length > 1) {
@@ -164,6 +199,31 @@ export function getFunctionCalls(
}
}

/**
* Returns {@link InlineDataPart}s in the first candidate if present.
*
* @internal
*/
export function getInlineDataParts(
response: GenerateContentResponse
): InlineDataPart[] | undefined {
const data: InlineDataPart[] = [];

if (response.candidates?.[0].content?.parts) {
for (const part of response.candidates?.[0].content?.parts) {
if (part.inlineData) {
data.push(part);
}
}
}

if (data.length > 0) {
return data;
} else {
return undefined;
}
}

const badFinishReasons = [FinishReason.RECITATION, FinishReason.SAFETY];

function hadBadFinishReason(candidate: GenerateContentCandidate): boolean {
26 changes: 26 additions & 0 deletions packages/vertexai/src/types/enums.ts
@@ -240,3 +240,29 @@ export enum Modality {
*/
DOCUMENT = 'DOCUMENT'
}

/**
* Generation modalities to be returned in generation responses.
*
* @beta
*/
export const ResponseModality = {
Contributor commented on this line:
This is the code object we agreed we should be exporting instead of TS enums so I get that, but we've had build issues in the past mixing JS code in types files so we should probably put these in a separate file. Looks like it's not causing build issues now so maybe we can move it along with the others whenever we plan to convert all our enums to JS objects.

/**
* Text.
* @beta
*/
TEXT: 'TEXT',
/**
* Image.
* @beta
*/
IMAGE: 'IMAGE'
} as const;

/**
* Generation modalities to be returned in generation responses.
*
* @beta
*/
export type ResponseModality =
(typeof ResponseModality)[keyof typeof ResponseModality];