Expert · Fine-tuning
Quantize a 70B model for a single 4090
$950.00 held in escrow
The brief
Quantize our 70B base model to run on a single 24 GB RTX 4090. Target: >40 tok/s at acceptable quality (perplexity within +5% of the unquantized model). Deliverables: an inference server (vLLM or llama.cpp) plus the quantization recipe.
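A quick back-of-envelope check shows why the 24 GB budget is the hard constraint. This sketch assumes exactly 70 × 10^9 weights, 24 GiB of usable VRAM, a hypothetical 2 GiB reserve for KV cache and runtime overhead, and the RTX 4090's 1008 GB/s spec-sheet memory bandwidth. It ignores compute limits, so its tok/s figure is an upper bound for single-stream decoding, not a prediction. The function names are illustrative only.

```python
# Back-of-envelope feasibility check for the brief's targets.
# Assumptions (not from the brief): exactly 70e9 weights, 24 GiB of VRAM,
# a hypothetical 2 GiB reserve for KV cache, activations and CUDA context,
# and single-stream decoding that reads every weight once per token.

N_PARAMS = 70e9
VRAM_BYTES = 24 * 1024**3
RESERVE_BYTES = 2 * 1024**3          # hypothetical overhead budget
BANDWIDTH_BYTES_PER_S = 1008e9       # RTX 4090 spec-sheet memory bandwidth


def weight_bytes(bits_per_weight: float) -> float:
    """Size of the quantized weights in bytes."""
    return N_PARAMS * bits_per_weight / 8


def max_bits_per_weight() -> float:
    """Largest average bits/weight that fits in VRAM after the reserve."""
    return (VRAM_BYTES - RESERVE_BYTES) * 8 / N_PARAMS


def decode_ceiling_tok_s(bits_per_weight: float) -> float:
    """Bandwidth-bound upper limit on single-stream decode speed."""
    return BANDWIDTH_BYTES_PER_S / weight_bytes(bits_per_weight)


def meets_quality(ppl_base: float, ppl_quant: float,
                  max_rel_increase: float = 0.05) -> bool:
    """The brief's quality bar: perplexity at most +5% over the baseline."""
    return ppl_quant <= ppl_base * (1 + max_rel_increase)


if __name__ == "__main__":
    budget = max_bits_per_weight()
    print(f"max average bits/weight: {budget:.2f}")
    for bpw in (4.0, 3.0, budget):
        print(f"{bpw:.2f} bpw -> {weight_bytes(bpw) / 1024**3:.1f} GiB, "
              f"ceiling {decode_ceiling_tok_s(bpw):.1f} tok/s")
```

At 4 bits per weight the weights alone come to 35 GB, so under these assumptions a 70B model only fits at roughly 2.7 bits per weight on average, and the bandwidth ceiling at that size (about 42.7 tok/s) sits only slightly above the 40 tok/s target. That narrow margin is why the +5% perplexity bar is the real test of any recipe.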