
fix: Add V100 (older) GPU Support for Mistral 7b Models #1279


Closed
wants to merge 7 commits into from

Conversation


@xihajun xihajun commented Nov 23, 2023

What does this PR do?

Introduction

This PR introduces changes that significantly improve support for running Mistral 7b models on V100 GPU architectures. By updating to the latest transformers package, we ensure both compatibility and performance for users with V100 GPUs.

Changes

  • Altered the model instantiation logic for the case where the model type is 'mistral'.
  • In environments where FLASH_ATTENTION is not available, the code now falls back to the generic CausalLM, so Mistral models remain operational even without Flash Attention support (see the sketch below).
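
For reviewers, here is a minimal sketch of the fallback. It assumes the dispatch structure of text_generation_server/models/__init__.py; the import paths, the get_model signature, and the surrounding error handling are approximations for illustration, not the exact diff.

```python
# Sketch only: import paths and signatures approximate the repository layout.
from text_generation_server.models.causal_lm import CausalLM

try:
    # FlashMistral requires the flash-attention kernels, which are not
    # available for V100 (sm_70) GPUs.
    from text_generation_server.models.flash_mistral import FlashMistral

    FLASH_ATTENTION = True
except ImportError:
    FLASH_ATTENTION = False


def get_model(model_id: str, model_type: str, **kwargs):
    if model_type == "mistral":
        if FLASH_ATTENTION:
            # Use the optimized flash-attention implementation on supported GPUs.
            return FlashMistral(model_id, **kwargs)
        # Fall back to the generic transformers-based CausalLM so Mistral
        # models still load and run on V100 GPUs.
        return CausalLM(model_id, **kwargs)
    raise ValueError(f"Unsupported model type: {model_type}")
```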

Impact

These updates are particularly beneficial for users running on V100 GPUs, which previously hit compatibility issues because the flash-attention package does not support that architecture. With these changes, we expand hardware support, reduce barriers to entry, and streamline the experience for a significant segment of the user base.

Additional Notes

The transformers package has been updated to version 4.35.2, which now includes support for Mistral models. This update is essential since the flash-attention package does not support V100 GPUs, as discussed in Dao-AILab/flash-attention#148.
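
As an illustrative check (not code from this PR), the snippet below verifies that the installed transformers release matches the pinned version and exposes the Mistral architecture class:

```python
# Illustrative sanity check: confirm the pinned transformers version and
# that the Mistral model class is importable.
import transformers
from packaging import version  # packaging is already a transformers dependency

required = "4.35.2"  # version pinned by this PR
installed = transformers.__version__
if version.parse(installed) < version.parse(required):
    raise RuntimeError(f"transformers >= {required} required, found {installed}")

from transformers import MistralForCausalLM  # noqa: F401

print(f"transformers {installed} provides MistralForCausalLM")
```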

Testing

  • Comprehensive tests have been conducted to ensure that Mistral models perform as expected on V100 GPUs.
  • Additional benchmarks have been added to compare the performance on V100 GPUs with previous versions.

I invite reviewers to pull this branch, test rigorously with V100 GPUs, and provide feedback on any aspects of this enhancement.

Before submitting

Who can review?

@abhijithnair1

xihajun and others added 5 commits November 23, 2023 16:34
Modified the instantiation logic for the 'mistral' model type to support V100 GPU architectures. If FLASH_ATTENTION is not available, the code falls back to the generic CausalLM, ensuring functionality regardless of the underlying hardware. This change circumvents the incompatibility issue with the flash-attention package on V100 GPUs.
@xihajun xihajun changed the title Enhance V100 (older) GPU Support for Mistral 7b Models fix: Add V100 (older) GPU Support for Mistral 7b Models Nov 27, 2023