Intel’s Arc Alchemist GPUs can run large language models like Llama 2, thanks to the company’s PyTorch extension, as demonstrated in a recent blog post. The extension, which works on Windows and Linux, allows LLMs to take advantage of FP16 performance on Arc GPUs. However, Intel states that you will need 14 GB of VRAM to run Llama 2 on its hardware, which means you will likely want the 16 GB Arc A770.
PyTorch is an open-source machine learning framework developed by Meta that is widely used to build and run LLMs. While it works out of the box, it is not written to fully exploit every piece of hardware by default, which is why Intel maintains its own PyTorch extension. The extension is designed to take advantage of the XMX cores inside Arc GPUs and saw its first release in January 2023. AMD and Nvidia maintain similar PyTorch optimizations for their own hardware.
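As a rough sketch of how the extension is typically wired in (not taken from Intel's post): the model is moved to the "xpu" device that the extension registers, and `ipex.optimize()` rewrites it for the GPU. The helper name and the broad fallback are illustrative, and the XPU path only works on a machine with an Arc GPU and the extension installed.

```python
# Hedged sketch of the usual intel_extension_for_pytorch (IPEX) flow:
# move a model to the "xpu" device the extension registers, then let
# ipex.optimize() rewrite it to use the GPU's XMX units.
def load_for_arc(model):
    try:
        import torch
        import intel_extension_for_pytorch as ipex

        model = model.to("xpu")  # the device name IPEX adds for Arc GPUs
        # dtype=torch.float16 requests the FP16 path Intel benchmarks
        model = ipex.optimize(model, dtype=torch.float16)
        return model, "xpu"
    except Exception:
        # No Arc GPU or no IPEX install: keep the unmodified model on CPU
        return model, "cpu"
```

The same pattern applies whether the model is a Hugging Face checkpoint or a hand-built `torch.nn.Module`; IPEX only needs the model object itself.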
In its blog post, Intel showcases the Arc A770 16GB running Llama 2 using the latest update to its PyTorch extension, released in December and specifically optimized for FP16 performance. FP16, or half-precision floating point, trades precision for performance, which is often a good trade-off for AI workloads.
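To make that precision trade-off concrete, Python's standard `struct` module can round-trip a value through the IEEE 754 half-precision ("e") format; the specific values below are illustrative, not from Intel's post:

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# FP16 keeps only 10 mantissa bits, so small increments near 1.0 vanish:
print(to_fp16(1.0 + 2**-10))  # 1.0009765625 — smallest FP16 step above 1.0
print(to_fp16(1.0 + 2**-12))  # 1.0 — the increment is rounded away
```

That lost precision rarely matters for LLM inference, while halving the memory footprint and doubling throughput on hardware with native FP16 units.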
The demonstration shows Llama 2-Chat, the dialogue-tuned variant of Llama 2, being asked questions like “can deep learning have such generalizing capacity as humans?” In response, the LLM was surprisingly humble, saying that deep learning is not on the same level as human intelligence. According to Intel, however, running LLMs like Llama 2 at FP16 requires 14 GB of VRAM, and the post does not share numbers on how quickly the model responded to inputs and queries.
Although this demonstration only shows FP16 performance, Arc Alchemist also supports BF16, INT8, INT4, and INT2. Of these other data formats, BF16 is particularly noteworthy: it is often considered even better for AI workloads because its eight exponent bits give it the same numerical range as FP32, whereas FP16 has only five. Optimizing BF16 performance may be at the top of Intel’s list for its next PyTorch extension update.
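The range gap between the two 16-bit formats follows directly from their bit layouts, and can be checked with plain Python arithmetic (the formula is the standard IEEE-style largest-finite-value rule):

```python
# Largest finite value for a binary float format:
# (2 - 2**-mantissa_bits) * 2**bias, where bias = 2**(exp_bits - 1) - 1.
def max_finite(exp_bits: int, mantissa_bits: int) -> float:
    bias = 2 ** (exp_bits - 1) - 1
    return (2 - 2 ** -mantissa_bits) * 2.0 ** bias

fp16_max = max_finite(exp_bits=5, mantissa_bits=10)   # 65504.0
bf16_max = max_finite(exp_bits=8, mantissa_bits=7)    # ~3.39e38
fp32_max = max_finite(exp_bits=8, mantissa_bits=23)   # ~3.40e38

print(fp16_max)           # 65504.0
print(f"{bf16_max:.2e}")  # essentially FP32's range, in half the bits
```

BF16 buys that FP32-like range by spending bits on the exponent instead of the mantissa, so it is coarser than FP16 near 1.0 but far less prone to overflow, which is why it is popular for training and inference alike.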