Check whether your device can run DeepSeek models of different sizes
VRAM ≈ Parameters × Precision (Bytes) + Activation Memory
Example: a 7B FP16 model ≈ 7B × 2 bytes = 14 GB (excluding activation memory)
Actual deployment needs at least a ~20% buffer on top of the weights (e.g., a 7B FP16 model is safer with 18 GB+ of VRAM)
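As a quick sanity check, the formula above can be expressed in a few lines of Python. This is only a sketch: the estimate_vram_gb name, the 20% buffer factor, and the use of decimal gigabytes are illustrative assumptions, not an official sizing tool.

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float, buffer: float = 1.2) -> float:
    """Rough VRAM estimate in GB: weight size plus a safety buffer for activations/overhead."""
    weights_gb = params_billion * bytes_per_param  # e.g., 7e9 params x 2 bytes = 14e9 bytes ~= 14 GB
    return weights_gb * buffer

# 7B model at FP16 (2 bytes per parameter): ~14 GB of weights, ~16.8 GB with a 20% buffer
print(f"{estimate_vram_gb(7, 2):.1f} GB")
```

Rounding the result up (to 18 GB in the example above) leaves extra headroom for the context window and framework overhead.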
System RAM must hold the model weights (roughly the same size as the VRAM estimate) plus runtime data; 1.5× the VRAM figure is recommended
8-bit quantization roughly halves VRAM usage compared with FP16, and 4-bit halves it again, typically with a 1-3% accuracy loss
Ollama uses 4-bit quantization by default, so VRAM usage is roughly 1/4 of the FP16 figure
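Putting the quantization and RAM rules together, the sketch below compares the approximate memory footprint of the DeepSeek-R1 sizes listed afterwards. The BYTES_PER_PARAM values, the memory_estimate helper, the 20% buffer, and the 1.5× RAM factor are assumptions taken from the guidance above, not measured values.

```python
# Approximate bytes per weight at each precision level (illustrative values)
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

def memory_estimate(params_billion, precision, buffer=1.2, ram_factor=1.5):
    """Return (estimated VRAM in GB, recommended RAM in GB) for a model size and precision."""
    vram = params_billion * BYTES_PER_PARAM[precision] * buffer
    return vram, vram * ram_factor

# DeepSeek-R1 sizes available via Ollama, at the default 4-bit quantization
for size in (1.5, 7, 8, 14, 32, 70):
    vram, ram = memory_estimate(size, "INT4")
    print(f"{size}B: ~{vram:.1f} GB VRAM, ~{ram:.1f} GB RAM")
```

Pick the size whose estimate fits your hardware and run it with Ollama: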
ollama run deepseek-r1:1.5b
ollama run deepseek-r1:7b
ollama run deepseek-r1:8b
ollama run deepseek-r1:14b
ollama run deepseek-r1:32b
ollama run deepseek-r1:70b
If you're interested in AI tools, you're welcome to join our ThinkInAI community.
Scan to join ThinkInAI