Description of the feature request:
Is there any documentation explaining how Gemini preprocesses input images or videos before generating tokens? For instance, how does it crop images of arbitrary resolution, and how does it sample frames from videos of arbitrary length?
What problem are you trying to solve with this feature?
No response
Any other information you'd like to share?
No response