Show HN: 80% faster, 50% less memory, 0% loss of accuracy Llama finetuning https://ift.tt/h2tTFwO

December 01, 2023

Hi HN! I'm sharing a project I've been working on during the LLM Efficiency Challenge: you can now finetune Llama with QLoRA 5x faster than Hugging Face's original implementation, on your own local GPU. Some highlights:

1. Manual autograd engine: hand-derived backprop steps.
2. QLoRA / LoRA finetuning is 80% faster and uses 50% less memory.
3. All kernels are written in OpenAI's Triton language.
4. 0% loss in accuracy: no approximation methods, everything is exact.
5. No hardware change necessary. Supports NVIDIA GPUs from 2018 onward (CUDA 7.5+).
6. Flash Attention support via xFormers.
7. Supports 4-bit and 16-bit LoRA finetuning.
8. Train on Slim Orca fully locally in 260 hours instead of 1301 hours (5x faster).
9. The open source version trains 5x faster, or you can check out the Unsloth Pro and Max codepaths for 30x faster training!

https://ift.tt/6giyAbw... has more info about Unsloth! Hopefully you can try it out! I wrote a blog post at https://ift.tt/LyO0awE if you want to learn more about our manual hand-derived backprop, the Triton kernels and so on. Thanks once again!

https://ift.tt/ptZQufH December 1, 2023 at 08:12PM
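To make the "manual autograd engine" highlight a bit more concrete, here is a minimal sketch of a hand-derived backward pass for a single LoRA layer. It is written in plain Python, with no PyTorch and no Triton. It assumes the standard LoRA formulation Y = x W^T + s * (x A^T) B^T, where W is a frozen weight, A and B are the trainable low-rank factors, and the scaling is s = alpha / r. All helper names (`lora_forward`, `lora_backward`, `numeric_grad`, ...) are hypothetical, and this is not Unsloth's actual kernel code. The sketch checks the hand-written gradients against central finite differences.

```python
import random

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    return [[s * v for v in row] for row in a]

def lora_forward(x, W, A, B, s):
    """Y = x W^T + s * (x A^T) B^T, with W frozen and A, B trainable."""
    h = matmul(x, transpose(A))                   # (n x r) low-rank projection
    return add(matmul(x, transpose(W)), scale(matmul(h, transpose(B)), s))

def lora_backward(dY, x, W, A, B, s):
    """Hand-derived gradients w.r.t. x, A and B (W is frozen, so it gets none)."""
    h = matmul(x, transpose(A))                   # recompute (n x r) activation
    dH = scale(matmul(dY, B), s)                  # (n x r)
    dB = scale(matmul(transpose(dY), h), s)       # (d_out x r)
    dA = matmul(transpose(dH), x)                 # (r x d_in)
    dX = add(matmul(dY, W), matmul(dH, A))        # (n x d_in)
    return dX, dA, dB

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def loss(x, W, A, B, s, G):
    """Scalar loss sum(Y * G), chosen so that dL/dY is exactly G."""
    Y = lora_forward(x, W, A, B, s)
    return sum(y * g for ry, rg in zip(Y, G) for y, g in zip(ry, rg))

def numeric_grad(f, M, eps=1e-6):
    """Central-difference gradient of f() w.r.t. matrix M (perturbed in place, then restored)."""
    grad = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            old = M[i][j]
            M[i][j] = old + eps
            up = f()
            M[i][j] = old - eps
            down = f()
            M[i][j] = old
            grad[i][j] = (up - down) / (2 * eps)
    return grad

def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

rng = random.Random(0)
n, d_in, d_out, r = 3, 5, 4, 2
s = 16 / r  # alpha / r, with alpha = 16 picked arbitrarily for this sketch
x, W = rand_matrix(n, d_in, rng), rand_matrix(d_out, d_in, rng)
# Real LoRA initializes B to zero; random values here make every gradient non-trivial.
A, B = rand_matrix(r, d_in, rng), rand_matrix(d_out, r, rng)
G = rand_matrix(n, d_out, rng)

dX, dA, dB = lora_backward(G, x, W, A, B, s)
f = lambda: loss(x, W, A, B, s, G)
errors = {name: max_abs_diff(analytic, numeric_grad(f, M))
          for name, analytic, M in [("x", dX, x), ("A", dA, A), ("B", dB, B)]}
print(errors)
```

The output is linear in each of x, A and B taken separately, so the finite-difference estimates should agree with the analytic gradients up to floating-point rounding. Writing the backward pass out by hand like this is what lets you fuse and reorder the matmuls yourself, instead of relying on a generic autograd graph.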
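The "80% faster" in the title and the "5x faster" in the highlights describe the same thing, provided "80% faster" is read as "80% less wall-clock time." That reading is an interpretation, but the Slim Orca numbers quoted in the post (1301 hours down to 260 hours) support it. A quick check:

```python
# Hours quoted in the post for training on Slim Orca.
baseline_hours = 1301
unsloth_hours = 260

speedup = baseline_hours / unsloth_hours          # how many times faster
time_saved = 1 - unsloth_hours / baseline_hours   # fraction of wall-clock time removed

print(f"speedup: {speedup:.2f}x")       # speedup: 5.00x
print(f"time saved: {time_saved:.1%}")  # time saved: 80.0%
```

Read the other way, as 80% more throughput, "80% faster" would only be a 1.8x speedup. The 5x figure rules that reading out.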
