LoRA (Low-Rank Adaptation)
An efficient technique for fine-tuning large models without updating their billions of original parameters.
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning (PEFT) technique that sharply reduces the number of trainable parameters, and with it the memory and compute needed to fine-tune Large Language Models.
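Instead of updating every weight in the network during fine-tuning, LoRA freezes the original model weights and injects a small set of trainable "rank decomposition matrices" into the layers of the architecture. Each adapted weight matrix W is used as W + BA, where B and A are two thin matrices whose product has the same shape as W but far fewer parameters, and only B and A are trained.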
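The freeze-and-inject idea can be illustrated with a minimal pure-Python sketch. It is not any library's implementation: the class name LoRALinear, the helper functions, and the 8x8 toy dimensions are hypothetical. The zero initialization of B, the small random initialization of A, and the alpha / rank scaling follow the conventions of the original LoRA paper and are assumptions not stated in this entry.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    """Element-wise sum of two same-shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(X, s):
    """Multiply every entry of X by the scalar s."""
    return [[s * a for a in row] for row in X]

class LoRALinear:
    """Hypothetical toy layer: frozen weight W plus a trainable low-rank update B @ A."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen: never modified during fine-tuning
        # A is rank x d_in with small random values (assumed init)
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        # B is d_out x rank and starts at zero, so the adapter is a no-op at first
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scaling = alpha / rank

    def forward(self, x):
        """Compute W x + scaling * B (A x) for a column vector x (d_in x 1)."""
        base = matmul(self.W, x)
        update = matmul(self.B, matmul(self.A, x))
        return add(base, scale(update, self.scaling))

    def trainable_params(self):
        """Number of entries in A and B, the only values that would be trained."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

# Toy usage: an 8x8 frozen weight with a rank-2 adapter.
rng = random.Random(1)
W = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]
x = [[1.0] for _ in range(8)]
layer = LoRALinear(W, rank=2)
print(layer.trainable_params(), "trainable values vs", 8 * 8, "frozen values")
```

Because B starts at zero, the layer initially reproduces the base model's output exactly; training would then adjust only A and B while W stays untouched.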
This means you can fine-tune a very large model on consumer-grade hardware, often in hours rather than weeks. The result is a tiny adapter file (often just a few megabytes) that can be swapped in and out on top of the unchanged base model.
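A rough calculation shows why the adapter file stays so small. The numbers below are hypothetical and chosen only for illustration: square 4096 x 4096 weight matrices, rank 8, 64 adapted matrices, and 2 bytes per value (16-bit storage).

```python
def full_param_count(d_in, d_out):
    """Parameters in one dense weight matrix."""
    return d_in * d_out

def lora_param_count(d_in, d_out, rank):
    """Parameters in one LoRA pair: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)

# All values below are illustrative assumptions, not taken from a specific model.
d = 4096            # hidden size of each adapted matrix
rank = 8            # LoRA rank
n_matrices = 64     # number of weight matrices that receive an adapter
bytes_per_value = 2 # 16-bit storage

full_params = full_param_count(d, d) * n_matrices
adapter_params = lora_param_count(d, d, rank) * n_matrices
adapter_mib = adapter_params * bytes_per_value / 2**20

print(f"frozen parameters in adapted matrices: {full_params:,}")
print(f"trainable adapter parameters:          {adapter_params:,}")
print(f"reduction factor:                      {full_params // adapter_params}x")
print(f"adapter size on disk (approx.):        {adapter_mib} MiB")
```

Under these assumptions the adapter holds about 4.2 million values (8 MiB) against roughly 1.07 billion frozen values in the matrices it adapts, a 256x reduction.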