Best Local LLMs for Writing & Creative Work
More on this topic: Best Models for Coding · Best Models Under 3B · VRAM Requirements
Cloud AI writes well, but it reads everything you write. Your novel drafts, journal entries, client work, half-formed ideas, all stored on someone else’s servers. Local models let you write, brainstorm, edit, and experiment without sending a single word to the cloud.
The catch: not every local model writes well. Some produce generic, stilted prose. Others refuse to write conflict, romance, or anything remotely dark. And the difference between a 7B and a 32B model for writing quality is enormous, far bigger than for coding or Q&A tasks.
This guide covers which models actually produce good writing, organized by what you want to write and what hardware you have.
What Makes a Good Writing Model?
Writing is one of the hardest tasks for local LLMs. Unlike coding (where output is correct or isn’t) or Q&A (where facts are verifiable), good writing requires:
- Coherence over long passages: maintaining tone, character, and narrative threads
- Stylistic range: matching different voices, genres, and registers
- Instruction following: doing what you ask without drifting
- Not being boring: avoiding the same safe, generic, corporate-sounding prose
Bigger models are significantly better at all of these. The quality jump from 14B to 32B for writing is more dramatic than for almost any other task. If you can run 32B, do it.
Best Models by VRAM Tier
| VRAM | Model | Quant | Best For | Quality |
|---|---|---|---|---|
| 8 GB | Nous Hermes 3 8B | Q4_K_M | Fiction, creative RP | Good for size |
| 8 GB | Llama 3.1 8B Instruct | Q4_K_M | Blog posts, structured content | Solid all-around |
| 8 GB | Mistral 7B Instruct | Q4_K_M | Quick drafts, brainstorming | Fast, serviceable |
| 12 GB | MN Violet Lotus 12B | Q4_K_M | Character-driven fiction | Strong emotional intelligence |
| 12 GB | Mistral Nemo 12B | Q4_K_M | Blog posts, editing | Clean, structured output |
| 16 GB | Qwen 2.5 14B | Q6_K | Articles, SEO content, editing | Very good |
| 16 GB | Qwen3-14B | Q4_K_M | Balanced creative + factual | Best value mid-range |
| 24 GB | Qwen 2.5 32B | Q4_K_M | Fiction, long-form, editing | Excellent; the sweet spot |
| 24 GB | DeepSeek-R1-Distill-Qwen-32B | Q4_K_M | Plotted fiction, complex narrative | Great reasoning + prose |
| 24 GB | Mistral Small 24B | Q4_K_M | Nonfiction, editing, rewriting | Reliable, structured |
| 48 GB+ | Midnight Miqu 70B v1.5 | Q4_K_M | Literary fiction, prose quality | Best local prose available |
| 48 GB+ | Llama 3.3 70B Euryale v2.3 | Q4_K_M | Immersive storytelling | Vivid, descriptive |
The jump that matters: 32B is where writing quality shifts from “useful assistant” to “genuinely good collaborator.” If you’re serious about using local AI for writing, a 24GB GPU running Qwen 2.5 32B is the target.
→ Check what fits your hardware with our Planning Tool.
Best for Fiction & Creative Writing
Fiction is the hardest test for a language model. It needs to maintain character voice, pace a scene, build tension, and produce prose that doesn’t sound like a corporate memo.
Top picks:
| Tier | Model | Why |
|---|---|---|
| Best overall (if hardware allows) | Midnight Miqu 70B v1.5 | Community’s top pick for prose quality. “Writes like a novelist.” Understands subtext, pacing, and tone in ways other models don’t. |
| Best on 24GB | Qwen 2.5 32B | Strong coherence, follows style instructions, good at sustained narrative. Detailed and contextually aware. |
| Best on 24GB (plotted work) | DeepSeek-R1-Distill-Qwen-32B | The reasoning capability helps with complex plotting and maintaining story logic. Can over-think simple scenes though. |
| Best on 12GB | MN Violet Lotus 12B | A merge of Violet Twilight and Lumimaid. High emotional intelligence: maintains character motivations and feelings across long conversations. |
| Best on 8GB | Nous Hermes 3 8B | Coherent long-form output, maintains character consistency. Best fiction model at this size. |
What to expect by size:
- 7-8B: Generates readable prose but tends to rush scenes, repeat phrases, and lose track of story details after a few thousand tokens.
- 14B: Noticeably better coherence; community testing puts the long-form quality gain over 8B at roughly 30%. Can maintain a scene but may struggle with complex multi-character interactions.
- 32B: Where fiction gets genuinely good. Models understand subtext, can maintain narrative threads across chapters, and produce prose with actual stylistic variety.
- 70B: The premium tier. Natural pacing, subtlety, sustained coherence. This is where the top community models (Midnight Miqu, Euryale) live.
Best for Blog Posts & Articles
Blog writing is more structured than fiction: you need clear sections, factual tone, and consistent formatting. The good news: smaller models handle this well because the structure does a lot of the heavy lifting.
Top picks:
| VRAM | Model | Why |
|---|---|---|
| 8 GB | Llama 3.1 8B | Clean, controllable prose. Good at following outline structures. |
| 16 GB | Qwen 2.5 14B | Detailed, contextually complete answers. Strong instruction following. |
| 24 GB | Qwen 2.5 32B | Best local option for researchy, detailed articles. |
| 24 GB | Mistral Small 24B | Reliable, well-structured nonfiction. Fast. |
Tips for blog writing with local models:
- Provide an outline in the prompt; models follow structure much better than they invent it
- Generate section by section, not the entire article at once (see the sketch after these tips)
- Use a system prompt that specifies tone and audience: “Write in a direct, practical tone for technically literate readers. No filler phrases.”
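A minimal sketch of that section-by-section flow, assuming a local Ollama server on its default port; the model name, outline, and wording are placeholders to adapt to your own setup.

```python
import requests

SYSTEM = ("Write in a direct, practical tone for technically literate readers. "
          "No filler phrases.")

# Hypothetical outline - in practice, paste your own.
outline = [
    "Why local models are enough for blog drafts",
    "Picking a model for your VRAM",
    "Prompting habits that improve output",
]

sections = []
for heading in outline:
    # One request per section: the outline carries the structure, so the
    # model only has to fill in a single chunk at a time.
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "qwen2.5:14b",  # swap in whatever you actually run
            "stream": False,
            "messages": [
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": (
                    f"Write the section '{heading}' of a blog post. "
                    f"Full outline for context: {', '.join(outline)}. "
                    "300-400 words, markdown, no intro or conclusion."
                )},
            ],
        },
        timeout=300,
    )
    sections.append(f"## {heading}\n\n" + resp.json()["message"]["content"])

print("\n\n".join(sections))
```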
Best for Editing & Rewriting
Editing is harder than generating. The model needs to understand your intent, preserve what’s good, and improve what isn’t, all without rewriting everything in its own voice.
Top picks:
| VRAM | Model | Why |
|---|---|---|
| 8 GB | Phi-4 (14B) | Excels at text tasks: rewriting, summarization, rephrasing. Tight fit at Q4 on 8 GB; use Q3 or offload a few layers. |
| 16 GB | Qwen 2.5 14B | Strong instruction following. Does what you ask without going rogue. |
| 24 GB | Mistral Small 24B | Good at targeted edits. Fast iteration. |
| 24 GB | Qwen 2.5 32B | Best at complex editing tasks that require understanding context. |
Key: For editing, instruction-following matters more than raw creativity. You want a model that can execute “rewrite this paragraph to be more concise while keeping the technical details” without deciding to restructure your entire article. Qwen models excel at this.
Best for Brainstorming & Outlining
Speed matters more than quality here. You want fast idea generation, not polished prose.
Any 7-8B model works well for brainstorming. Run it at Q4_K_M on 8GB VRAM and you’ll get 30-40+ tok/s, fast enough for real-time conversation.
Good picks: Llama 3.1 8B, Mistral 7B Instruct, Qwen 2.5 7B. Don’t waste 24GB of VRAM on brainstorming.
The Censorship Problem
You’re writing a thriller. A character picks up a knife. The model refuses to continue because “violence.”
This is the biggest frustration with local writing models. Default instruct models have safety filters that trigger on violence, romance, dark themes, morally complex characters, and sometimes even mild conflict. For fiction writing, this is crippling.
The Solution: Abliterated Models
Abliteration is the current standard for uncensored local models. Instead of retraining on “edgy” data (which degrades model intelligence), abliteration identifies the internal “refusal direction” in the model’s activations and surgically removes it from the weights. The result: same intelligence, no safety refusals.
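Under the hood it is essentially a projection. Here is a toy numpy sketch of the core step, assuming a refusal direction `r` has already been extracted from activation differences between refused and answered prompts; real abliteration pipelines apply this across many layers of an actual model, so treat this purely as illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 4096

# Stand-ins: a weight matrix that writes into the residual stream, and a
# unit-norm "refusal direction" (in practice estimated from activations).
W = rng.standard_normal((d_model, d_model)) * 0.02
r = rng.standard_normal(d_model)
r /= np.linalg.norm(r)

# Directional ablation: remove each output's component along r, so this
# layer can no longer write the refusal direction into the residual stream.
W_ablated = W - np.outer(r, r @ W)

x = rng.standard_normal(d_model)
print(r @ (W @ x))          # nonzero: original weights express the direction
print(r @ (W_ablated @ x))  # ~0: the ablated weights cannot
```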
Recommended uncensored models for writing:
| Model | Size | VRAM | Notes |
|---|---|---|---|
| Eva Qwen 2.5 | 7B-72B | 8-48GB | Uncensored Qwen variants. Good across sizes. |
| Dolphin 3.0 | 8B-70B | 8-48GB | Strong conversational flow and instruction following. |
| Nous Hermes 3 | 8B | ~8 GB | Creative writing focused. Coherent long-form. |
| Mistral Small 3.1 (abliterated) | 24B | ~14 GB | Budget uncensored option. |
| Llama 3.3 70B (abliterated) | 70B | ~40 GB | Full power, no filters. |
| Midnight Miqu 70B v1.5 | 70B | ~40 GB | Already uncensored. Best prose quality. |
Search HuggingFace for “abliterated” + your preferred model name. Most popular models have community-made abliterated variants.
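If you’d rather script the search, the huggingface_hub library can do it; the repo and file names below are hypothetical placeholders, not specific recommendations.

```python
from huggingface_hub import HfApi, hf_hub_download

api = HfApi()

# Search for GGUF repos combining "abliterated" with your base model's name.
for model in api.list_models(search="qwen2.5 32b abliterated gguf", limit=10):
    print(model.id)

# Then pull the quant you want (repo and filename here are hypothetical).
path = hf_hub_download(
    repo_id="some-user/Qwen2.5-32B-Instruct-abliterated-GGUF",
    filename="Qwen2.5-32B-Instruct-abliterated-Q4_K_M.gguf",
)
print(path)
```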
Tradeoff: Abliterated models occasionally produce lower-quality output on non-creative tasks compared to their filtered counterparts. Use the filtered version for factual work, abliterated for fiction.
System Prompts That Actually Help
The right system prompt makes a measurable difference in writing quality. Here are three that work:
For fiction writing:
You are an experienced literary fiction author. Write vivid, emotionally
engaging prose with natural dialogue. Show, don't tell. Focus on sensory
details and character psychology. Avoid these words: tapestry, delve,
testament, beacon, journey, realm. Avoid single-sentence paragraphs.
Do not summarize emotions; show them through action and dialogue.
Do not rush to resolution. Build scenes gradually.
For blog/article writing:
You are a technical writer who explains complex topics clearly. Write
in a direct, practical tone. Lead with the useful information, not
background context. Use short paragraphs. No filler phrases like
"it's worth noting" or "in today's landscape." Be specific: use
numbers, examples, and concrete details instead of vague claims.
For editing:
You are a careful editor. When asked to edit text, preserve the
author's voice and intent. Only change what is specifically requested.
Do not add new content unless asked. Do not restructure unless asked.
Explain each change you make and why.
Critical tip: Repeat your most important instructions. Models deprioritize instructions over long conversations. If “show, don’t tell” matters, say it in the system prompt and again in your scene prompt.
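Concretely, that repetition looks like this in a chat payload; the scene and wording are just illustrative.

```python
# The key instruction appears in the system prompt AND again in the user turn,
# right where the model is about to write.
messages = [
    {"role": "system", "content": (
        "You are an experienced literary fiction author. Show, don't tell. "
        "Build scenes gradually."
    )},
    {"role": "user", "content": (
        "Continue the scene: Mara finds the unopened letter in her father's desk. "
        "Reminder: show, don't tell - convey her reaction through action and "
        "dialogue, not summary. Do not rush to resolution."
    )},
]
```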
Context Length and Long-Form Writing
Models advertise 128K context windows, but writing quality degrades well before that. Research consistently shows performance drops start at 8K-16K tokens, even when models can technically handle more.
Practical limits for writing:
| Context | Pages | What Works | What Breaks |
|---|---|---|---|
| 2-4K tokens | 3-6 pages | Single scenes, dialogue | Nothing yet |
| 8-16K tokens | 12-24 pages | Chapters, short stories | Character details start drifting |
| 16-32K tokens | 24-49 pages | Multi-chapter with summaries | Tone inconsistency, repetition |
| 32K+ tokens | 49+ pages | Possible but quality suffers | Narrative coherence degrades |
The practical approach: Write chapter by chapter. Keep a running summary of characters, plot points, and tone in the system prompt. Feed the summary + current chapter into context rather than trying to keep the entire manuscript loaded. Most professional AI-assisted writers cap generation at 800-1200 words per turn for best quality.
Set num_ctx appropriately; don’t rely on defaults. On constrained VRAM, KV cache quantization (Q8) can double your usable context with minimal quality loss.
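A sketch of that chapter-by-chapter loop, assuming an Ollama server; num_ctx and num_predict are standard Ollama request options, while the KV-cache settings in the closing comment are server-side environment variables whose exact names you should verify against your Ollama version.

```python
import requests

STORY_BIBLE = """Characters: ...
Plot so far: ...
Tone and style notes: ..."""  # keep this updated as chapters accumulate

def write_next_chapter(chapter_brief: str, previous_chapter: str) -> str:
    # Feed only the running summary plus the previous chapter, not the whole
    # manuscript, and cap each turn at roughly 1,000-1,200 words of output.
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "qwen2.5:32b",  # placeholder; use whatever you run
            "stream": False,
            "options": {
                "num_ctx": 16384,     # set explicitly; defaults are often 2-4K
                "num_predict": 1600,  # roughly 1,100 words of output
            },
            "messages": [
                {"role": "system", "content": (
                    "You are an experienced novelist.\n\nStory bible:\n"
                    + STORY_BIBLE
                )},
                {"role": "user", "content": (
                    f"Previous chapter:\n{previous_chapter}\n\n"
                    f"Write the next chapter: {chapter_brief}"
                )},
            ],
        },
        timeout=600,
    )
    return resp.json()["message"]["content"]

# KV cache quantization is configured on the server, not per request, e.g.:
#   OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q8_0 ollama serve
```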
Hardware Quick Reference
| Writing Goal | Minimum VRAM | Recommended GPU | Model to Run |
|---|---|---|---|
| Brainstorming, outlining | 8 GB | RTX 3060 8GB, RX 6600 | Any 7-8B |
| Blog posts, articles | 12-16 GB | RTX 3060 12GB | Qwen 2.5 14B |
| Serious fiction | 24 GB | RTX 3090 (used, ~$750) | Qwen 2.5 32B |
| Best local prose | 40-48 GB | Dual 3090 or Mac Studio | Midnight Miqu 70B |
| CPU-only option | 16-32 GB RAM | None (CPU only) | 7-14B at Q4 |
The Bottom Line
Local models are genuinely useful for writing now, not just as gimmicks but as real tools. The key is matching the right model to the right task:
- Brainstorming: Any 7-8B model. Speed over quality.
- Blog posts: Qwen 2.5 14B on 16GB VRAM. Structured, reliable.
- Fiction: Qwen 2.5 32B on 24GB VRAM. The sweet spot where prose gets good.
- Best prose: Midnight Miqu 70B if you have the hardware. Nothing local comes close.
- Uncensored: Look for abliterated variants of whatever model you’re already using.
Don’t fight safety filters with clever prompting; switch to an abliterated model. Don’t try to generate entire novels in one shot; work chapter by chapter. And if your writing feels generic, go bigger. The 14B-to-32B jump is where local AI stops sounding like an AI.