Commit 910c12d

Fine-Tuning Scheduler Tutorial Update for Lightning/PyTorch 2.2.0 (#298)
1 parent 684b63b commit 910c12d

File tree

1 file changed: +6 -9 lines changed


lightning_examples/finetuning-scheduler/finetuning-scheduler.py

Lines changed: 6 additions & 9 deletions
@@ -554,9 +554,7 @@ def train() -> None:
 # the implicit schedule will limit fine-tuning to just the last 4 parameters of the model, which is only a small fraction
 # of the parameters you'd want to tune for maximum performance. Since the implicit schedule is quite computationally
 # intensive and most useful for exploring model behavior, leaving [max_depth](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html?highlight=max_depth#finetuning_scheduler.fts.FinetuningScheduler.params.max_depth) 1 allows us to demo implicit mode
-# behavior while keeping the computational cost and runtime of this notebook reasonable. To review how a full implicit
-# mode run compares to the ``nofts_baseline`` and ``fts_explicit`` scenarios, please see the the following
-# [tensorboard experiment summary](https://tensorboard.dev/experiment/n7U8XhrzRbmvVzC4SQSpWw/).
+# behavior while keeping the computational cost and runtime of this notebook reasonable.
 
 
 # %%
@@ -579,16 +577,15 @@ def train() -> None:
 # %% [markdown]
 # ### Reviewing the Training Results
 #
-# See the [tensorboard experiment summaries](https://tensorboard.dev/experiment/n7U8XhrzRbmvVzC4SQSpWw/) to get a sense
-# of the relative computational and performance tradeoffs associated with these [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) configurations.
-# The summary compares a full ``fts_implicit`` execution to ``fts_explicit`` and ``nofts_baseline`` scenarios using DDP
+# It's worth considering the relative computational and performance tradeoffs associated with different [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) configurations.
+# The example below compares ``fts_implicit`` execution to ``fts_explicit`` and ``nofts_baseline`` scenarios using DDP
 # training with 2 GPUs. The full logs/schedules for all three scenarios are available
 # [here](https://drive.google.com/file/d/1LrUcisRLHeJgh_BDOOD_GUBPp5iHAkoR/view?usp=sharing) and the checkpoints
 # produced in the scenarios [here](https://drive.google.com/file/d/1t7myBgcqcZ9ax_IT9QVk-vFH_l_o5UXB/view?usp=sharing)
 # (caution, ~3.5GB).
 #
-# [![fts_explicit_accuracy](fts_explicit_accuracy.png){height="315px" width="492px"}](https://tensorboard.dev/experiment/n7U8XhrzRbmvVzC4SQSpWw/#scalars&_smoothingWeight=0&runSelectionState=eyJmdHNfZXhwbGljaXQiOnRydWUsIm5vZnRzX2Jhc2VsaW5lIjpmYWxzZSwiZnRzX2ltcGxpY2l0IjpmYWxzZX0%3D)
-# [![nofts_baseline](nofts_baseline_accuracy.png){height="316px" width="505px"}](https://tensorboard.dev/experiment/n7U8XhrzRbmvVzC4SQSpWw/#scalars&_smoothingWeight=0&runSelectionState=eyJmdHNfZXhwbGljaXQiOmZhbHNlLCJub2Z0c19iYXNlbGluZSI6dHJ1ZSwiZnRzX2ltcGxpY2l0IjpmYWxzZX0%3D)
+# ![fts_explicit_accuracy](fts_explicit_accuracy.png){height="315px" width="492px"}
+# ![nofts_baseline](nofts_baseline_accuracy.png){height="316px" width="505px"}
 #
 # Note that given execution context differences, there could be a modest variation in performance from the tensorboard summaries generated by this notebook.
 #
@@ -597,7 +594,7 @@ def train() -> None:
 # greater fine-tuning flexibility for model exploration in research. For example, glancing at DeBERTa-v3's implicit training
 # run, a critical tuning transition point is immediately apparent:
 #
-# [![implicit_training_transition](implicit_training_transition.png){height="272px" width="494px"}](https://tensorboard.dev/experiment/n7U8XhrzRbmvVzC4SQSpWw/#scalars&_smoothingWeight=0&runSelectionState=eyJmdHNfZXhwbGljaXQiOmZhbHNlLCJub2Z0c19iYXNlbGluZSI6ZmFsc2UsImZ0c19pbXBsaWNpdCI6dHJ1ZX0%3D)
+# ![implicit_training_transition](implicit_training_transition.png){height="272px" width="494px"}
 #
 # Our `val_loss` begins a precipitous decline at step 3119 which corresponds to phase 17 in the schedule. Referring to our
 # schedule, in phase 17 we're beginning tuning the attention parameters of our 10th encoder layer (of 11). Interesting!
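
For context on the configuration the updated text refers to, below is a minimal sketch (not part of this commit) of how a FinetuningScheduler callback with max_depth=1 might be attached to a Lightning Trainer for the implicit-mode demo, using a two-GPU DDP setup like the comparison scenarios mentioned in the diff. The import path and the max_depth parameter follow the finetuning-scheduler documentation linked above; the function name, model, and datamodule arguments are hypothetical placeholders rather than the tutorial's actual code.

# Minimal sketch (assumptions noted in comments): wiring FinetuningScheduler into a
# Lightning Trainer with max_depth=1, per the implicit-mode demo described in the diff.
import lightning.pytorch as pl
from finetuning_scheduler import FinetuningScheduler


def run_fts_implicit_demo(model: pl.LightningModule, datamodule: pl.LightningDataModule) -> None:
    # With no explicit schedule file supplied, FinetuningScheduler generates an implicit
    # schedule; max_depth=1 stops unfreezing after the earliest phases, which (per the
    # tutorial text above) keeps the notebook's computational cost and runtime reasonable.
    fts_callback = FinetuningScheduler(max_depth=1)
    trainer = pl.Trainer(
        accelerator="gpu",
        devices=2,          # the tutorial's comparison scenarios use DDP with 2 GPUs
        strategy="ddp",
        callbacks=[fts_callback],
    )
    trainer.fit(model, datamodule=datamodule)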
