Most people jumping into LLMs know that they are trained in massive data centers, often on expensive A100s that pack 40GiB of VRAM each. Getting into training is much more expensive than running a model: take a look at the lists below, which show that most models require multiple A100s.
What factors into hardware requirements?
LLMs consume memory as a consequence of many factors, the primary ones being the number of parameters, the precision of the data type used to store them, and the framework used to train.
There are many different data types available today; if you are familiar with data types in Computer Science, then FP32 and FP16 will be familiar to you. However, due to the ever increasing number of parameters, new floating point formats have been developed. These focus on enabling a larger range of numbers than FP16, which tops out at 65,504: newer types like TF32 can represent values up to roughly 3.4028235 × 10^38, while using slightly more memory.
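You can see these ranges for yourself; below is a minimal sketch assuming a PyTorch install (TF32 has no standalone tensor dtype in PyTorch, so it is noted in a comment instead).

```python
# Print the representable range and width of common training dtypes.
import torch

for dtype in (torch.float16, torch.bfloat16, torch.float32):
    info = torch.finfo(dtype)
    print(f"{str(dtype):<16} bits={info.bits:<3} max={info.max:.4e}")

# TF32 is a compute mode rather than a storage dtype: it keeps FP32's 8-bit
# exponent (so the same ~3.4e38 maximum) with a reduced 10-bit mantissa.
```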
The main ones used for training today are BFLOAT16 and TensorFloat-32 (TF32), both of which reduce memory consumption compared to plain FP32.
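As a very rough sanity check, you can multiply the parameter count by the bytes needed per parameter during training. The sketch below assumes a naive mixed-precision Adam setup at roughly 16 bytes per parameter (weights, gradients, optimizer moments and master weights); the measured figures below bake in framework-specific savings and activation memory, so treat this purely as a back-of-the-envelope estimate.

```python
# Back-of-the-envelope training memory estimate. The default of 16 bytes/parameter
# assumes naive mixed-precision Adam (2B weights + 2B grads + 4B+4B moments + 4B
# master weights); real numbers vary with framework, optimizer and batch size.
GIB = 1024 ** 3

def estimate_training_vram_gib(num_params: float, bytes_per_param: float = 16) -> float:
    return num_params * bytes_per_param / GIB

for name, params in [("LLAMA-6B", 6e9), ("LLAMA-13B", 13e9), ("GPT-NeoX 20B", 20e9)]:
    print(f"{name}: ~{estimate_training_vram_gib(params):.0f} GiB (before activations)")
```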
So how much memory do I need to train?
Below you will find how much memory, both VRAM and system memory (RAM), you will need. For smaller models we include FP32, since it is still usable there, but it does not make sense for larger models. We divide the models into two categories: single-system models, which can theoretically be trained on a single computer, and multi-system models.
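Before comparing against the tables, it helps to know what you actually have. A quick way to check total VRAM per GPU, assuming a CUDA-enabled PyTorch install:

```python
# Print the total VRAM of each visible CUDA device.
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB VRAM")
```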
Single System
LLAMA-6B
  FP16:     60GiB VRAM, 32GiB RAM
  FP32:     80GiB VRAM, 32GiB RAM
  BFLOAT16: 60GiB VRAM, 32GiB RAM
  TF32:     70GiB VRAM, 32GiB RAM

LLAMA-13B
  FP16:     121GiB VRAM, 64GiB RAM
  FP32:     145GiB VRAM, 64GiB RAM
  BFLOAT16: 121GiB VRAM, 64GiB RAM
  TF32:     133GiB VRAM, 64GiB RAM

LLAMA-33B
  BFLOAT16: 310GiB VRAM, 256GiB RAM
  TF32:     OOM on a single system

GPT-NeoX 20B
  BFLOAT16: 205GiB VRAM, 128GiB RAM
  TF32:     60GiB VRAM, 128GiB RAM
Multi-System (VRAM > 320GiB)
LLAMA-65B
  BFLOAT16: 603GiB VRAM, 512GiB RAM
  TF32:     666GiB VRAM, 512GiB RAM

GPT-3 (~175B)
  BFLOAT16: 1630GiB VRAM
  TF32:     1790GiB VRAM
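Once the requirement exceeds what fits in one box, the weights, gradients and optimizer state have to be sharded across GPUs and nodes. Below is a minimal sketch of what that can look like with PyTorch FSDP; the tiny stand-in model and the launch command are placeholders, and the exact API varies between PyTorch versions.

```python
# Minimal multi-GPU/multi-node sharding sketch using PyTorch FSDP.
# Launch with e.g. `torchrun --nnodes=2 --nproc_per_node=8 train.py` (placeholder values).
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# Stand-in model: substitute your real transformer here.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 4096),
).cuda()

# Shard parameters, gradients and optimizer state across all ranks, training in BFLOAT16,
# so the per-GPU footprint is roughly the totals above divided by the number of GPUs.
model = FSDP(model, mixed_precision=MixedPrecision(param_dtype=torch.bfloat16))
```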
Why are these numbers so much larger than what I saw on HN?
A lot of the recent hype about LLMs has been about getting them to run on consumer hardware. That is a completely different problem from training: methods such as GPTQ shrink the number of bits required to run a model by compressing (quantizing) its weights. You still need to train the model at 16-bit precision (e.g. BFLOAT16) before you can run GPTQ to shrink it down to 3-4 bits instead of 16.
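To see why the inference numbers people post are so much smaller, compare the weight storage alone at different precisions. This is a rough illustration only, using an assumed ~7B-parameter model and ignoring activations, the KV cache and quantization metadata such as scales.

```python
# Rough weight-only memory comparison for a ~7B-parameter model at different precisions.
GIB = 1024 ** 3
params = 7e9

for label, bits in [("FP32", 32), ("BFLOAT16/FP16", 16), ("GPTQ 4-bit", 4), ("GPTQ 3-bit", 3)]:
    print(f"{label:>14}: ~{params * bits / 8 / GIB:.1f} GiB of weights")
```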