# Model Optimization

Original deep learning model architectures tend to be large and complex. In many cases, a smaller, simplified version does the job equally well while performing much better: an 8-bit model uses a fraction of the memory required by an FP64 one. That is why you should always use a model optimized for your use case. To do so, use the [Neural Network Compression Framework (NNCF)](https://docs.openvino.ai/nncf), a collection of optimization algorithms that make your models smaller and faster. To learn more, check out the NNCF documentation and the articles on:

:::{line-block}
[Quantization (no retraining)](https://docs.openvino.ai/2025/openvino-workflow/model-optimization-guide/quantizing-models-post-training.html)
The easiest way to optimize a model: it does not require retraining or fine-tuning, it just reduces the model size. Going from an FP64-based model to a quantized INT8 one greatly improves file size, memory footprint, throughput, and latency. It may result in a drop in accuracy, though, so check whether the accuracy-performance tradeoff is acceptable.

[Weight Compression](https://docs.openvino.ai/2025/openvino-workflow/model-optimization-guide/weight-compression.html)
An easy-to-use method targeting Large Language Models. It is a type of quantization that compresses only part of the model: its weights, not its activations. It provides increased performance with relatively little impact on accuracy.

[Training-time Optimization](https://docs.openvino.ai/2025/openvino-workflow/model-optimization-guide/compressing-models-during-training.html)
A more complex and time-consuming method involving multiple algorithms executed while the model is retrained. It also requires the model's original framework; for NNCF, that is either PyTorch or TensorFlow.
With features such as Structured or Unstructured Pruning and Quantization-aware Training, it gives you a model that fits your needs, optimally balancing performance and accuracy.
:::
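To see what post-training quantization does at the level of individual values, here is a minimal sketch of uniform symmetric INT8 quantization in plain Python. It is an illustrative toy, not NNCF's implementation (NNCF operates on whole models, e.g. via `nncf.quantize` with a calibration dataset); the weight values are made up:

```python
def quantize_int8(values):
    """Map floats to int8 using one per-tensor scale (symmetric scheme)."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    return [max(-128, min(127, round(v / scale))) for v in values], scale

def dequantize(quantized, scale):
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]

weights = [0.8, -1.27, 0.05, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each value now fits in one byte instead of eight (FP64), at the cost
# of a small rounding error per weight -- the accuracy/size tradeoff
# mentioned above.
error = max(abs(a - b) for a, b in zip(weights, restored))
```

The "drop in accuracy" comes entirely from that rounding step: every weight is snapped to one of 256 grid points, so the finer structure of the original values is lost.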
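The weights-only idea behind Weight Compression can be sketched the same way: the stored weights are 8-bit integers plus a scale, and they are dequantized on the fly while the activations stay in floating point. This is a hand-rolled illustration, not NNCF's `nncf.compress_weights` (which uses per-group scales and more elaborate schemes):

```python
def compress_weights_int8(weights):
    """Store weights as int8 codes plus one float scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dot_compressed(q_weights, scale, activations):
    """Dot product: weights are dequantized on the fly,
    activations remain full-precision floats."""
    return sum((q * scale) * a for q, a in zip(q_weights, activations))

w = [0.4, -0.2, 1.27]
q, s = compress_weights_int8(w)
y = dot_compressed(q, s, [1.0, 2.0, 3.0])
```

Because activations are never quantized, only the stored weights lose precision, which is why this method tends to cost less accuracy than full quantization while still shrinking the model (weights dominate an LLM's size).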
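Quantization-aware Training, one of the training-time methods above, relies on "fake quantization": during the forward pass, weights are rounded to the INT8 grid and immediately dequantized, so the training loss already reflects the rounding error the deployed model will see, and retraining can compensate for it. A minimal sketch, assuming a fixed per-tensor scale (real QAT learns or calibrates the scale and backpropagates through the rounding with a straight-through estimator):

```python
def fake_quantize(values, scale):
    """Round values to the INT8 grid, then return them as floats again,
    so downstream computation feels the quantization error."""
    return [max(-128, min(127, round(v / scale))) * scale for v in values]

w = [0.814, -0.03]
fq = fake_quantize(w, 0.01)  # values snapped to the 0.01 grid
```

During training, the float weights `w` keep being updated, while every forward pass uses their snapped copies `fq`; at export time only the integer codes are kept.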