Show HN: The fastest way to run Mixtral 8x7B on Apple Silicon Macs

I originally launched my app, Private LLM[1][2], on HN around 10 months ago with a single RedPajama Chat 3B model. The app has come a long way since then. About a month ago, I added support for a 4-bit OmniQuant quantized Mixtral 8x7B Instruct model, and it seems to outperform Q4 models on inference speed and Q8 models on text generation quality, while consuming only about 24GB of RAM[3] at 8k context length. The trick is: a) to use a better quantization algorithm, and b) to leave the embeddings and the MoE gates unquantized (the overhead is quite small).

Other notable features include many more downloadable models, support for App Intents (Siri, Apple Shortcuts), on-device grammar correction, summarization, etc. via macOS Services, and an iOS version (universal app), also with many smaller downloadable models and support for App Intents. There's a small community of users building and sharing LLM-based shortcuts on the app's Discord.

Last week, I also shipped support for the bilingual Yi-34B Chat model, which consumes ~18GB of RAM. iOS users and users with low-memory Macs can download the related Yi-6B Chat model.

Unlike most popular offline LLM apps out there, this app uses mlc-llm for inference, not llama.cpp. Also, all models in the app are quantized with OmniQuant[4] quantization, not RTN quantization.

[1]: https://privatellm.app/
[2]: https://ift.tt/rR5OtSi
[3]: https://www.youtube.com/watch?v=4AE8yXIWSAA
[4]: https://ift.tt/fBPvcCO
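
To make point (b) concrete, here is a minimal PyTorch-style sketch, not taken from the app, of selective 4-bit weight quantization that skips the embedding table and the MoE gate/router layers. It uses plain round-to-nearest group quantization purely to illustrate which tensors stay in full precision; OmniQuant itself additionally learns clipping/scaling parameters, which is exactly what the app uses instead of RTN. The "gate"/"router" name check and the group size are assumptions for illustration. For rough context, Mixtral 8x7B has about 47B total parameters, so 4-bit weights alone come to roughly 23GB, consistent with the ~24GB figure above.

    import torch
    import torch.nn as nn

    def fake_quantize_4bit(w: torch.Tensor, group_size: int = 64) -> torch.Tensor:
        # Round-to-nearest 4-bit group quantization (illustrative only; the app
        # uses OmniQuant, which learns clipping/shifting rather than plain RTN).
        out_features, in_features = w.shape
        assert in_features % group_size == 0  # assumed for simplicity
        g = w.reshape(out_features, in_features // group_size, group_size)
        scale = g.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0  # signed 4-bit range [-8, 7]
        q = torch.clamp(torch.round(g / scale), -8, 7)
        return (q * scale).reshape(out_features, in_features)

    def quantize_weights_selectively(model: nn.Module) -> None:
        for name, module in model.named_modules():
            if isinstance(module, nn.Embedding):
                continue  # (b) keep token embeddings unquantized
            if isinstance(module, nn.Linear):
                # (b) keep MoE gates/routers unquantized; the substring check is a
                # guess based on common Mixtral module naming ("block_sparse_moe.gate").
                if "gate" in name or "router" in name:
                    continue
                with torch.no_grad():
                    module.weight.copy_(fake_quantize_4bit(module.weight))

Skipping the embeddings and router weights costs little, since they are a tiny fraction of the total parameters, but it avoids degrading token lookup and expert-selection quality.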
