📚 More on this topic: Best Models for Coding · Best Models Under 3B · VRAM Requirements

Cloud AI writes well, but it reads everything you write. Your novel drafts, journal entries, client work, half-formed ideas — all stored on someone else’s servers. Local models let you write, brainstorm, edit, and experiment without sending a single word to the cloud.

The catch: not every local model writes well. Some produce generic, stilted prose. Others refuse to write conflict, romance, or anything remotely dark. And the difference between a 7B and a 32B model for writing quality is enormous — far bigger than for coding or Q&A tasks.

This guide covers which models actually produce good writing, organized by what you want to write and what hardware you have.


What Makes a Good Writing Model?

Writing is one of the hardest tasks for local LLMs. Unlike coding (where output either works or it doesn’t) or Q&A (where facts are verifiable), good writing requires:

  • Coherence over long passages — maintaining tone, character, and narrative threads
  • Stylistic range — matching different voices, genres, and registers
  • Instruction following — doing what you ask without drifting
  • Not being boring — avoiding the same safe, generic, corporate-sounding prose

Bigger models are significantly better at all of these. The quality jump from 14B to 32B for writing is more dramatic than for almost any other task. If you can run 32B, do it.


Best Models by VRAM Tier

| VRAM | Model | Quant | Best For | Quality |
|---|---|---|---|---|
| 8 GB | Nous Hermes 3 8B | Q4_K_M | Fiction, creative RP | Good for size |
| 8 GB | Llama 3.1 8B Instruct | Q4_K_M | Blog posts, structured content | Solid all-around |
| 8 GB | Mistral 7B Instruct | Q4_K_M | Quick drafts, brainstorming | Fast, serviceable |
| 12 GB | MN Violet Lotus 12B | Q4_K_M | Character-driven fiction | Strong emotional intelligence |
| 12 GB | Mistral Nemo 12B | Q4_K_M | Blog posts, editing | Clean, structured output |
| 16 GB | Qwen 2.5 14B | Q6_K | Articles, SEO content, editing | Very good |
| 16 GB | Qwen3-14B | Q4_K_M | Balanced creative + factual | Best value mid-range |
| 24 GB | Qwen 2.5 32B | Q4_K_M | Fiction, long-form, editing | Excellent — the sweet spot |
| 24 GB | DeepSeek-R1-Distill-Qwen-32B | Q4_K_M | Plotted fiction, complex narrative | Great reasoning + prose |
| 24 GB | Mistral Small 24B | Q4_K_M | Nonfiction, editing, rewriting | Reliable, structured |
| 48 GB+ | Midnight Miqu 70B v1.5 | Q4_K_M | Literary fiction, prose quality | Best local prose available |
| 48 GB+ | Llama 3.3 70B Euryale v2.3 | Q4_K_M | Immersive storytelling | Vivid, descriptive |

The jump that matters: 32B is where writing quality shifts from “useful assistant” to “genuinely good collaborator.” If you’re serious about using local AI for writing, a 24GB GPU running Qwen 2.5 32B is the target.

→ Check what fits your hardware with our Planning Tool.
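For a quick sanity check without the tool, a quantized model's footprint is roughly parameters times bits per weight, plus headroom for the KV cache and buffers. A back-of-envelope sketch; the bits-per-weight figures and the 20% overhead multiplier are approximations, not exact numbers:

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float = 4.85,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate for a quantized GGUF model.

    bits_per_weight: ~4.85 for Q4_K_M, ~6.6 for Q6_K, ~8.5 for Q8_0
    (approximate effective rates, not exact).
    overhead: multiplier covering KV cache, activations, and buffers.
    """
    weights_gb = params_billion * bits_per_weight / 8  # bits -> bytes -> GB
    return weights_gb * overhead

# A 32B model at Q4_K_M lands around 23 GB: right at the 24 GB tier.
print(round(estimate_vram_gb(32), 1))
```

Treat the result as a floor, not a guarantee; long context windows grow the KV cache well beyond this estimate.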


Best for Fiction & Creative Writing

Fiction is the hardest test for a language model. It needs to maintain character voice, pace a scene, build tension, and produce prose that doesn’t sound like a corporate memo.

Top picks:

| Tier | Model | Why |
|---|---|---|
| Best overall (if hardware allows) | Midnight Miqu 70B v1.5 | Community’s top pick for prose quality. “Writes like a novelist.” Understands subtext, pacing, and tone in ways other models don’t. |
| Best on 24GB | Qwen 2.5 32B | Strong coherence, follows style instructions, good at sustained narrative. Detailed and contextually aware. |
| Best on 24GB (plotted work) | DeepSeek-R1-Distill-Qwen-32B | The reasoning capability helps with complex plotting and maintaining story logic. Can over-think simple scenes, though. |
| Best on 12GB | MN Violet Lotus 12B | A merge of Violet Twilight and Lumimaid. High emotional intelligence — maintains character motivations and feelings across long conversations. |
| Best on 8GB | Nous Hermes 3 8B | Coherent long-form output, maintains character consistency. Best fiction model at this size. |

What to expect by size:

  • 7-8B: Generates readable prose but tends to rush scenes, repeat phrases, and lose track of story details after a few thousand tokens.
  • 14B: Noticeably better coherence. ~30% quality improvement over 8B for long-form text per community testing. Can maintain a scene but may struggle with complex multi-character interactions.
  • 32B: Where fiction gets genuinely good. Models understand subtext, can maintain narrative threads across chapters, and produce prose with actual stylistic variety.
  • 70B: The premium tier. Natural pacing, subtlety, sustained coherence. This is where the top community models (Midnight Miqu, Euryale) live.

Best for Blog Posts & Articles

Blog writing is more structured than fiction โ€” you need clear sections, factual tone, and consistent formatting. The good news: smaller models handle this well because the structure does a lot of the heavy lifting.

Top picks:

| VRAM | Model | Why |
|---|---|---|
| 8 GB | Llama 3.1 8B | Clean, controllable prose. Good at following outline structures. |
| 16 GB | Qwen 2.5 14B | Detailed, contextually complete answers. Strong instruction following. |
| 24 GB | Qwen 2.5 32B | Best local option for researchy, detailed articles. |
| 24 GB | Mistral Small 24B | Reliable, well-structured nonfiction. Fast. |

Tips for blog writing with local models:

  • Provide an outline in the prompt — models follow structure much better than they invent it
  • Generate section by section, not the entire article at once
  • Use a system prompt that specifies tone and audience: “Write in a direct, practical tone for technically literate readers. No filler phrases.”
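The section-by-section workflow above is easy to automate. A minimal sketch, where `generate` is a stand-in for whatever call your local runtime exposes (Ollama, llama.cpp server, and LM Studio all offer HTTP APIs):

```python
from typing import Callable

SYSTEM = ("Write in a direct, practical tone for technically literate "
          "readers. No filler phrases.")

def draft_by_section(outline: list[str],
                     generate: Callable[[str, str], str]) -> str:
    """Draft an article one section at a time, feeding the model the
    most recent sections so it keeps continuity without a huge context."""
    draft = []
    for heading in outline:
        context = "\n\n".join(draft[-2:])  # last two sections only
        prompt = (f"Article so far:\n{context}\n\n"
                  f"Write the next section, titled '{heading}'.")
        draft.append(f"## {heading}\n\n{generate(SYSTEM, prompt)}")
    return "\n\n".join(draft)

# Stub generator for illustration; swap in a real API call.
article = draft_by_section(
    ["Why local models", "Hardware you need"],
    generate=lambda system, prompt: "(model output here)",
)
```

Keeping only the last couple of sections in context is the point: the outline supplies global structure, so the model never needs the whole article loaded.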

Best for Editing & Rewriting

Editing is harder than generating. The model needs to understand your intent, preserve what’s good, and improve what isn’t — without rewriting everything in its own voice.

Top picks:

| VRAM | Model | Why |
|---|---|---|
| 8 GB | Phi-4 (14B) | Excels at text tasks — rewriting, summarization, rephrasing. Fits at Q4. |
| 16 GB | Qwen 2.5 14B | Strong instruction following. Does what you ask without going rogue. |
| 24 GB | Mistral Small 24B | Good at targeted edits. Fast iteration. |
| 24 GB | Qwen 2.5 32B | Best at complex editing tasks that require understanding context. |

Key: For editing, instruction-following matters more than raw creativity. You want a model that can execute “rewrite this paragraph to be more concise while keeping the technical details” without deciding to restructure your entire article. Qwen models excel at this.


Best for Brainstorming & Outlining

Speed matters more than quality here. You want fast idea generation, not polished prose.

Any 7-8B model works well for brainstorming. Run it at Q4_K_M on 8GB VRAM and you’ll get 30-40+ tok/s — fast enough for real-time conversation.

Good picks: Llama 3.1 8B, Mistral 7B Instruct, Qwen 2.5 7B. Don’t waste 24GB of VRAM on brainstorming.


The Censorship Problem

You’re writing a thriller. A character picks up a knife. The model refuses to continue because “violence.”

This is the biggest frustration with local writing models. Default instruct models have safety filters that trigger on violence, romance, dark themes, morally complex characters, and sometimes even mild conflict. For fiction writing, this is crippling.

The Solution: Abliterated Models

Abliteration is the current standard for uncensored local models. Instead of retraining on “edgy” data (which degrades model intelligence), abliteration identifies the internal activation direction associated with refusals and removes it from the model’s weights. The result: same intelligence, no safety refusals.

Recommended uncensored models for writing:

| Model | Size | VRAM | Notes |
|---|---|---|---|
| Eva Qwen 2.5 | 7B-72B | 8-48GB | Uncensored Qwen variants. Good across sizes. |
| Dolphin 3.0 | 8B-70B | 8-48GB | Strong conversational flow and instruction following. |
| Nous Hermes 3 | 8B | ~8 GB | Creative writing focused. Coherent long-form. |
| Mistral Small 3.1 (abliterated) | 24B | ~12 GB | Budget uncensored option. |
| Llama 3.3 70B (abliterated) | 70B | ~40 GB | Full power, no filters. |
| Midnight Miqu 70B v1.5 | 70B | ~40 GB | Already uncensored. Best prose quality. |

Search HuggingFace for “abliterated” + your preferred model name. Most popular models have community-made abliterated variants.

Tradeoff: Abliterated models occasionally produce lower-quality output on non-creative tasks compared to their filtered counterparts. Use the filtered version for factual work, abliterated for fiction.


System Prompts That Actually Help

The right system prompt makes a measurable difference in writing quality. Here are three that work:

For fiction writing:

You are an experienced literary fiction author. Write vivid, emotionally
engaging prose with natural dialogue. Show, don't tell. Focus on sensory
details and character psychology. Avoid these words: tapestry, delve,
testament, beacon, journey, realm. Avoid single-sentence paragraphs.
Do not summarize emotions — show them through action and dialogue.
Do not rush to resolution. Build scenes gradually.

For blog/article writing:

You are a technical writer who explains complex topics clearly. Write
in a direct, practical tone. Lead with the useful information, not
background context. Use short paragraphs. No filler phrases like
"it's worth noting" or "in today's landscape." Be specific โ€” use
numbers, examples, and concrete details instead of vague claims.

For editing:

You are a careful editor. When asked to edit text, preserve the
author's voice and intent. Only change what is specifically requested.
Do not add new content unless asked. Do not restructure unless asked.
Explain each change you make and why.

Critical tip: Repeat your most important instructions. Models deprioritize instructions over long conversations. If “show, don’t tell” matters, say it in the system prompt and again in your scene prompt.
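One way to bake this repetition in is to build your message list so the key instruction appears in both the system prompt and the latest user turn. A minimal helper, assuming the OpenAI-style chat schema that most local servers accept:

```python
KEY_RULE = "Show, don't tell."

def scene_messages(system_prompt: str, scene_prompt: str) -> list[dict]:
    """Repeat the most important instruction in the latest user turn,
    where models tend to weight it most heavily."""
    return [
        {"role": "system", "content": f"{system_prompt}\n{KEY_RULE}"},
        {"role": "user", "content": f"{scene_prompt}\n\nRemember: {KEY_RULE}"},
    ]

# Hypothetical scene prompt, just for illustration.
msgs = scene_messages("You are an experienced literary fiction author.",
                      "Write the scene where the letter is found.")
```

The same trick works mid-conversation: append the rule to each new scene prompt rather than trusting a 50-turn-old system message.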


Context Length and Long-Form Writing

Models advertise 128K context windows, but writing quality degrades well before that. Long-context evaluations consistently show performance dropping from around 8K-16K tokens of input, even when models can technically accept far more.

Practical limits for writing:

| Context | Pages | What Works | What Breaks |
|---|---|---|---|
| 2-4K tokens | 3-6 pages | Single scenes, dialogue | — |
| 8-16K tokens | 12-24 pages | Chapters, short stories | Character details start drifting |
| 16-32K tokens | 24-49 pages | Multi-chapter with summaries | Tone inconsistency, repetition |
| 32K+ tokens | 49+ pages | Possible but quality suffers | Narrative coherence degrades |

The practical approach: Write chapter by chapter. Keep a running summary of characters, plot points, and tone in the system prompt. Feed the summary + current chapter into context rather than trying to keep the entire manuscript loaded. Most professional AI-assisted writers cap generation at 800-1200 words per turn for best quality.
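That summary-instead-of-manuscript approach can be sketched as a simple loop; `generate` and `summarize` below are stubs standing in for calls to your local model:

```python
from typing import Callable

def write_chapters(beats: list[str], generate: Callable[[str], str],
                   summarize: Callable[[str], str]) -> list[str]:
    """Generate a manuscript chapter by chapter, carrying a running
    summary forward instead of the full text, so context stays bounded."""
    chapters, summary = [], ""
    for beat in beats:
        prompt = (f"Story so far (summary): {summary or 'Nothing yet.'}\n"
                  f"Write the next chapter covering: {beat}\n"
                  f"Length: 800-1200 words.")
        chapters.append(generate(prompt))
        # Re-summarize after each chapter; this is what keeps the
        # prompt small no matter how long the manuscript gets.
        summary = summarize(summary + "\n" + chapters[-1])
    return chapters

# Stubs for illustration; real versions would call your model's API.
chapters = write_chapters(
    ["The heist goes wrong", "The double-cross"],
    generate=lambda p: "(chapter text)",
    summarize=lambda t: t[-500:],  # a real summarizer = another model call
)
```

In practice `summarize` is a second prompt to the same model ("Summarize the story so far in 300 words, tracking characters and open plot threads"), not a string slice.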

Set num_ctx appropriately — don’t rely on defaults. On constrained VRAM, KV cache quantization (Q8) can double your usable context with minimal quality loss.
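With Ollama, for instance, num_ctx is set per request (or in a Modelfile), and recent versions can quantize the KV cache via environment variables. A sketch, with the variable names and flash-attention requirement assumed from recent Ollama releases; check your version's docs:

```shell
# Quantize the KV cache to Q8 (recent Ollama; requires flash attention)
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0

# Set the context window per request instead of relying on the default
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:32b",
  "prompt": "Write the opening scene of chapter three.",
  "options": { "num_ctx": 16384 }
}'
```

Other runtimes expose the same knobs under different names (llama.cpp uses `--ctx-size` and `--cache-type-k`/`--cache-type-v`).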


Hardware Quick Reference

| Writing Goal | Minimum VRAM | Recommended GPU | Model to Run |
|---|---|---|---|
| Brainstorming, outlining | 8 GB | RTX 3060 8GB, RX 6600 | Any 7-8B |
| Blog posts, articles | 12-16 GB | RTX 3060 12GB | Qwen 2.5 14B |
| Serious fiction | 24 GB | RTX 3090 (used, ~$750) | Qwen 2.5 32B |
| Best local prose | 40-48 GB | Dual 3090 or Mac Studio | Midnight Miqu 70B |
| CPU-only option | 16-32 GB RAM | — | 7-14B at Q4 |

The Bottom Line

Local models are genuinely useful for writing now — not just as gimmicks, but as real tools. The key is matching the right model to the right task:

  • Brainstorming: Any 7-8B model. Speed over quality.
  • Blog posts: Qwen 2.5 14B on 16GB VRAM. Structured, reliable.
  • Fiction: Qwen 2.5 32B on 24GB VRAM. The sweet spot where prose gets good.
  • Best prose: Midnight Miqu 70B if you have the hardware. Nothing local comes close.
  • Uncensored: Look for abliterated variants of whatever model you’re already using.

Don’t fight safety filters with clever prompting — switch to an abliterated model. Don’t try to generate entire novels in one shot — work chapter by chapter. And if your writing feels generic, go bigger. The 14B-to-32B jump is where local AI stops sounding like an AI.