

VRAM vs RAM:
VRAM (Video RAM): Dedicated memory on your graphics card (GPU) - used for graphics processing and AI model computations - much faster for the GPU to access than system memory - critical for running LLMs locally
RAM (System Memory): Main system memory used by the CPU for general operations - much slower for the GPU to reach - can be used as a fallback for model weights, but with a significant performance penalty
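If you want to check what your machine actually has, here is a minimal sketch, assuming an NVIDIA GPU with PyTorch (CUDA build) and psutil installed; nothing in it is specific to any particular model:

    # Minimal sketch: report total system RAM and per-GPU VRAM.
    # Assumes PyTorch with CUDA support and psutil are installed.
    import torch
    import psutil

    def report_memory() -> None:
        # System RAM as seen by the OS.
        ram_gb = psutil.virtual_memory().total / 1e9
        print(f"System RAM: {ram_gb:.1f} GB")

        # VRAM per visible CUDA device, if any.
        if torch.cuda.is_available():
            for i in range(torch.cuda.device_count()):
                props = torch.cuda.get_device_properties(i)
                print(f"GPU {i} ({props.name}): {props.total_memory / 1e9:.1f} GB VRAM")
        else:
            print("No CUDA GPU detected; inference would fall back to CPU/RAM.")

    if __name__ == "__main__":
        report_memory()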
So, for running a basic 7B-parameter LLM locally, you typically need:
Minimum: 8-12 GB VRAM - can run basic inference/tasks - generally requires quantization (4-bit/8-bit), since 7B weights alone take about 14 GB at 16-bit precision
Recommended: 16+ GB VRAM - smoother performance - handles larger context windows - runs without heavy quantization (rough arithmetic below)
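To see where those numbers come from: weights take roughly (parameter count × bits per weight) / 8 bytes, plus overhead for the KV cache, activations, and framework buffers. A small sketch of that arithmetic; the 20% overhead factor is an assumption, not a measured value:

    # Rough VRAM estimate for model weights at a given precision.
    # The overhead factor is an assumed allowance for KV cache,
    # activations, and framework buffers; real usage varies.
    def estimate_vram_gb(n_params: float, bits_per_weight: int, overhead: float = 1.2) -> float:
        weight_bytes = n_params * bits_per_weight / 8
        return weight_bytes * overhead / 1e9

    for bits in (32, 16, 8, 4):
        print(f"7B model at {bits:>2}-bit: ~{estimate_vram_gb(7e9, bits):.1f} GB")

    # Prints roughly: 32-bit ~33.6 GB, 16-bit ~16.8 GB, 8-bit ~8.4 GB, 4-bit ~4.2 GB,
    # which is why 8-12 GB cards need quantization and 16+ GB is comfortable.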
Quantization means reducing the precision of the model’s weights and calculations to use less memory. For example, instead of storing numbers with full 32-bit precision, they’re compressed to 4-bit or 8-bit representations. This significantly reduces VRAM requirements but can slightly reduce model quality and accuracy.
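As a concrete illustration of the idea, here is a toy symmetric int8 scheme in NumPy; real LLM quantizers (especially 4-bit ones) are more sophisticated, typically working per-group with outlier handling:

    # Toy illustration of quantization: compress float32 "weights" to int8
    # with a single scale factor, then reconstruct and measure the error.
    import numpy as np

    weights = np.random.randn(1024).astype(np.float32)    # stand-in for layer weights

    scale = np.abs(weights).max() / 127                    # map the value range onto int8
    quantized = np.round(weights / scale).astype(np.int8)  # 4x smaller than float32
    dequantized = quantized.astype(np.float32) * scale     # approximate reconstruction

    print(f"float32 size: {weights.nbytes} bytes")
    print(f"int8 size:    {quantized.nbytes} bytes")
    print(f"mean abs error: {np.abs(weights - dequantized).mean():.5f}")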
Options if you have less VRAM: CPU-only inference (very slow) - offloading part of the model to system RAM - using smaller models (3B/4B parameters)
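For example, with the Hugging Face transformers library, device_map="auto" places as many layers as fit on the GPU and overflows the rest to system RAM. A sketch, assuming transformers and accelerate are installed; the model name is only an example of a smaller model:

    # Sketch: load a smaller model and let it spill to system RAM if needed.
    # Assumes the transformers and accelerate packages are installed;
    # "microsoft/phi-2" (~2.7B parameters) is just an example checkpoint.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "microsoft/phi-2"

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",   # fill the GPU first, overflow layers to CPU/RAM
        torch_dtype="auto",  # use the checkpoint's native precision
    )

    inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))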
Bash is everywhere—embedded systems, minimal containers, legacy servers. Zero dependencies, just #!/bin/bash. In constrained environments, you often can’t install “proper” languages. It’s often the right and only tool for the job.