Model Compression Techniques

Nvidia says it can shrink LLM memory 20x without changing model weights

Nvidia's KV Cache Transform Coding (KVTC) compresses LLM key-value cache by 20x without model changes, cutting GPU memory costs and time-to-first-token by up to 8x for multi-turn AI applications.

The Next Web

Researchers claim this AI model achieves better compression rates than PNGs

Image compression has been one of the constantly evolving challenges in computer science. Programmers and researchers are always trying to improve current standards or create new ones to get better ...

Nature

Deep Learning Network Compression Techniques

Deep learning network compression techniques have emerged as a crucial research area, aiming to reduce the computational and storage requirements of neural networks without significantly compromising ...

TechCrunch

Pruna AI open sources its AI model optimization framework

Pruna AI, a European startup that has been working on compression algorithms for AI models, is making its optimization framework open source on Thursday. Pruna AI has been creating a framework that ...

EurekAlert!

Beyond bigger models: How efficient multimodal AI is redefining the future of intelligence

A generalized architectural blueprint for building efficient MLLMs. This template achieves efficiency through a combination of component choices and data flow optimization. Key strategies include: (1) ...

Dark Reading

Intel Discloses Max Severity Bug in Its AI Model Compression Software

Intel has disclosed a maximum severity vulnerability in some versions of its Intel Neural Compressor software for AI model compression. The bug, designated as CVE-2024-22476, provides an ...
