CS-fan-101 OP t1_jd649ko wrote

I wouldn't call it a workaround but rather an advantage.

Neural network models are made up of layers of neurons and connections between them. When there are missing connections, represented as zeros in the weight matrices, we refer to the model as sparse.
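To make that concrete, here's a quick numpy sketch (illustrative only, not Cerebras code) of a weight matrix where zeros stand in for missing connections:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 4))
# Zero out ~75% of entries at random; a zero weight = a missing connection
weights[rng.random((4, 4)) < 0.75] = 0.0

sparsity = np.mean(weights == 0)  # fraction of zero weights
print(f"sparsity: {sparsity:.0%}")
```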

Sparsity comes in different forms. It can be built into the model structure itself, with a connection pattern designed to link only a subset of the neurons. When a model is constructed intentionally with such a predefined pattern, we refer to this as structured sparsity.
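As a toy illustration of a predefined pattern (a hypothetical block-diagonal mask, just one possible structure), each neuron connects only to neurons within its own block:

```python
import numpy as np

n, block = 8, 4
# Fixed block-diagonal pattern, chosen before training and independent
# of the weight values: connections exist only inside each block
mask = np.kron(np.eye(n // block), np.ones((block, block)))

rng = np.random.default_rng(1)
weights = rng.standard_normal((n, n)) * mask
print(f"structured sparsity: {np.mean(weights == 0):.0%}")
```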

It turns out that even fully dense models, such as GPT, can also be made sparse. Setting individual weights to zero effectively prunes connections within the model, and when this pruning follows no fixed pattern, we refer to it as unstructured sparsity.
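One common way to induce this (an illustrative sketch, not necessarily the exact method used for GPT models) is magnitude pruning, where the smallest-magnitude weights are zeroed out wherever they happen to fall:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights; survivors follow no fixed pattern."""
    threshold = np.quantile(np.abs(weights), sparsity)
    pruned = weights.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

rng = np.random.default_rng(2)
dense = rng.standard_normal((768, 768))        # a dense GPT-style weight matrix
sparse = magnitude_prune(dense, sparsity=0.8)  # prune ~80% of the weights
print(f"unstructured sparsity: {np.mean(sparse == 0):.0%}")
```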

A key benefit of unstructured sparsity is that the model retains its original baseline structure, so there is no need to design a new model architecture. Additionally, the sparse model can provide speedups in both training and inference.

The Cerebras CS-2 is designed to accelerate unstructured sparsity, whereas GPUs are not.

If you are interested in learning more, please check out our blog - https://www.cerebras.net/blog/harnessing-the-power-of-sparsity-for-large-gpt-ai-models
