Microsoft Unveils New Inference Framework for 1-Bit Large Language Models

Nebula Nerd · October 21, 2024

Microsoft has launched an inference framework designed to optimize the performance of 1-bit large language models (LLMs) such as BitNet b1.58 on local devices. The framework improves the speed and efficiency of inference, enabling lossless inference of these models on CPUs, and Microsoft has announced plans to extend support to NPUs and GPUs.

The framework marks a significant step toward reducing the energy consumption of LLM inference while boosting processing speed: Microsoft reports that a 100B-parameter model can now run on a single CPU at speeds comparable to human reading. This opens up new possibilities for running large language models more sustainably and efficiently on a wider range of devices.
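To see why 1-bit models are so CPU-friendly, consider what happens to a matrix-vector product when the weights are constrained to the ternary values {-1, 0, +1}, as in BitNet b1.58: every multiplication becomes an addition, a subtraction, or a skip. The sketch below illustrates this idea in NumPy; it is a simplified illustration of the principle, not Microsoft's actual framework or its optimized kernels, and the absmean quantization shown is one published scheme for producing ternary weights.

```python
import numpy as np

def ternary_matvec(w_ternary, scale, x):
    """Matrix-vector product for ternary weights in {-1, 0, +1}.

    The inner accumulation uses only additions and subtractions:
    add x[j] where the weight is +1, subtract where it is -1,
    and skip zeros entirely. A single float scale restores magnitude.
    """
    out = np.empty(w_ternary.shape[0])
    for i, row in enumerate(w_ternary):
        acc = x[row == 1].sum() - x[row == -1].sum()  # no multiplies
        out[i] = scale * acc
    return out

# Quantize a small float matrix to ternary weights (absmean scheme).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
gamma = np.abs(W).mean()                    # per-tensor scale
Wt = np.clip(np.round(W / gamma), -1, 1)    # entries in {-1, 0, +1}

x = rng.standard_normal(8)
approx = ternary_matvec(Wt, gamma, x)       # multiplication-free result
exact = W @ x                               # full-precision reference
```

In a real implementation the ternary weights would also be bit-packed (about 1.58 bits per weight), which is where the large memory and energy savings over fp16 inference come from.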