NOVA: Engineering Quality Discovery for Online Fiction

A ground-up platform rebuild treating creative writing with the rigor of software engineering.

NOVA Editor Interface

PROBLEM

The online fiction ecosystem optimizes for quantity over quality.

Platforms like Wattpad and WebNovel force writers into algorithmic servitude—daily uploads to survive engagement metrics, black-box discovery that buries craft under clickbait, and zero tooling for serious collaboration. Royal Road became a LitRPG monoculture. WuxiaWorld serves one niche brilliantly but leaves everyone else behind.

Worse, the technical infrastructure is archaic:

The core tension: Platforms prioritize metrics that drive engagement (clicks, time-on-site) over metrics that reflect craft (completion rates, structural quality, reader satisfaction).

SOLUTION

A semantic-first platform with Git-inspired workflows and transparent quality algorithms.

NOVA rebuilds online fiction publishing from first principles:

Real-Time Collaboration

Built on Operational Transformation (OT), the same technique that powers Google Docs, so multiple authors can edit simultaneously without conflicts. The system tracks changes via revision numbers and transforms concurrent edits to preserve both writers' intent.
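
A minimal sketch of the idea, assuming simple character-offset insert operations; the types and function names are illustrative, not NOVA's actual API:

```typescript
// Minimal sketch: transforming two concurrent insert operations so that
// applying them in either order preserves both writers' intent.
// Offsets, fields, and names here are illustrative, not NOVA's actual API.
type InsertOp = { pos: number; text: string; baseRevision: number };

// Transform `op` against a concurrent operation the server applied first:
// if the concurrent insert landed at or before op's position, shift op right.
function transformInsert(op: InsertOp, applied: InsertOp): InsertOp {
  return applied.pos <= op.pos
    ? { ...op, pos: op.pos + applied.text.length }
    : op;
}

// Both authors edit against revision 5; the server applies one operation,
// then transforms the other before applying it, so neither edit is lost.
const a: InsertOp = { pos: 10, text: "storm", baseRevision: 5 };
const b: InsertOp = { pos: 4, text: "the ", baseRevision: 5 };
const aAfterB = transformInsert(a, b); // now applies at pos 14
```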

Technical Implementation:

Semantic Discovery

Traditional platforms search keywords. NOVA searches meaning using 768-dimensional BERT embeddings:
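
As a rough illustration of what "searching meaning" looks like, here is a hedged sketch that ranks chapters by cosine similarity against a query embedding; the data shape is hypothetical, and in production the nearest-neighbour search itself is delegated to the vector database:

```typescript
// Rank chapters by cosine similarity between a 768-dimensional query
// embedding and pre-computed chapter embeddings. The data shape is
// hypothetical; NOVA delegates nearest-neighbour search to Qdrant.
interface ChapterVector { chapterId: string; embedding: Float32Array }

function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function rankBySimilarity(query: Float32Array, chapters: ChapterVector[], k = 10) {
  return chapters
    .map(c => ({ chapterId: c.chapterId, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```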

Technical Stack:

Quality-First Ranking

A transparent composite algorithm measuring:

Structural Score (40%):

Engagement Score (40%):

Newness Boost (20%):

Cold Start Strategy:

  Week 1: 80% structural, 20% newness (craft gets you visibility)
  Weeks 2-4: progressive transition to engagement weighting
  Week 5+: full composite scoring
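
A sketch of how the published weights and the cold-start schedule could combine into a single score; the sub-scores are assumed to be normalized to [0, 1], and the linear interpolation is illustrative rather than NOVA's exact formula:

```typescript
// Composite quality score using the weights stated above (40/40/20),
// with the cold-start schedule shifting weight toward engagement over time.
// Sub-scores are assumed normalized to [0, 1]; the interpolation is a sketch.
interface Scores { structural: number; engagement: number; newness: number }

function compositeScore(s: Scores, weeksSincePublish: number): number {
  if (weeksSincePublish < 1) {
    // Week 1: craft earns visibility before engagement data exists.
    return 0.8 * s.structural + 0.2 * s.newness;
  }
  if (weeksSincePublish < 5) {
    // Weeks 2-4: linear transition toward the full composite weighting.
    const t = (weeksSincePublish - 1) / 4; // 0 → 1 across the window
    const wEngage = 0.4 * t;
    const wStruct = 0.8 - 0.4 * t;
    return wStruct * s.structural + wEngage * s.engagement + 0.2 * s.newness;
  }
  // Week 5+: full composite scoring (40% / 40% / 20%).
  return 0.4 * s.structural + 0.4 * s.engagement + 0.2 * s.newness;
}
```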

Branch-Based Storytelling

Git for fiction. Writers can:

Version Control Features:

MY ROLE

Solo full-stack engineer, designer, and technical architect.

I built NOVA from zero to closed beta across 18 months:

Backend Infrastructure:

Frontend Architecture:

NLP Pipeline:

DevOps:

THE HARD PART

1. Real-Time Collaboration at Scale

Challenge: Resolving concurrent edits without locking the editor or losing work.

Solution: Implemented central-server OT with revision-based conflict resolution. Each operation includes a baseRevision number; if the server's revision has advanced past it, the operation is transformed against all intervening operations before being applied.

The hardest part wasn't the transform function—it was diffing Lexical's tree structure. I built a custom differ that:

  1. Flattens the JSON tree to plain text
  2. Runs Myers' Diff Algorithm (same as git diff)
  3. Generates Retain/Insert/Delete operations
  4. Reconstructs the tree with preserved formatting
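
A simplified sketch of steps 1-3: a real implementation runs Myers' algorithm, but a common prefix/suffix diff stands in here to show how a text diff becomes Retain/Insert/Delete operations:

```typescript
// Simplified stand-in for Myers' diff: find the common prefix and suffix,
// then emit OT operations for the changed region in between.
type Op =
  | { type: "retain"; count: number }
  | { type: "insert"; text: string }
  | { type: "delete"; count: number };

function diffToOps(before: string, after: string): Op[] {
  // Longest common prefix.
  let start = 0;
  while (start < before.length && start < after.length && before[start] === after[start]) start++;
  // Longest common suffix that doesn't overlap the prefix.
  let endB = before.length, endA = after.length;
  while (endB > start && endA > start && before[endB - 1] === after[endA - 1]) { endB--; endA--; }

  const ops: Op[] = [];
  if (start > 0) ops.push({ type: "retain", count: start });
  if (endB > start) ops.push({ type: "delete", count: endB - start });
  if (endA > start) ops.push({ type: "insert", text: after.slice(start, endA) });
  if (before.length - endB > 0) ops.push({ type: "retain", count: before.length - endB });
  return ops;
}

// diffToOps("The quick fox", "The quick brown fox")
// → [retain 10, insert "brown ", retain 3]
```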

Performance: Sub-100ms round-trip for most operations, handles 10+ simultaneous editors.

2. Quality Scoring Without Gaming

Challenge: Build transparent metrics writers can understand while preventing manipulation.

Solution: Multi-signal scoring with hidden telemetry:

Public Metrics: Structural quality (lexical diversity, pacing, dialogue density)
Private Metrics: Reader behavior patterns (scroll velocity, dwell time uniformity, interaction clustering)
Anomaly Detection: Gini coefficient for engagement distribution, coefficient of variation for read times

Writers see their structural scores with actionable feedback. They don't see the anti-gaming layer detecting bot activity.

Result: A suspicion score of 0.9+ triggers a weight reduction (not a ban), discouraging abuse while preserving the user experience.
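
For illustration, the two anomaly statistics named above can be computed as follows; the field semantics and any thresholds applied to them are hypothetical, not NOVA's actual tuning:

```typescript
// Gini coefficient over per-reader engagement counts and coefficient of
// variation over read times. Inputs and thresholds are illustrative only.
function gini(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  const total = sorted.reduce((s, v) => s + v, 0);
  if (n === 0 || total === 0) return 0;
  // Standard form: G = (2 * Σ i·x_i) / (n * Σ x_i) - (n + 1) / n, x sorted ascending.
  const weighted = sorted.reduce((s, v, i) => s + (i + 1) * v, 0);
  return (2 * weighted) / (n * total) - (n + 1) / n;
}

function coefficientOfVariation(values: number[]): number {
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  if (n === 0 || mean === 0) return 0;
  const variance = values.reduce((s, v) => s + (v - mean) ** 2, 0) / n;
  return Math.sqrt(variance) / mean;
}

// Suspiciously uniform read times (low CV) plus engagement concentrated in a
// handful of accounts (high Gini) both push the suspicion score up.
```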

3. Semantic Search Costs

Challenge: BERT embeddings are expensive. Running them on every save would bankrupt the infrastructure budget.

Solution: Aggressive caching architecture:

TIER 1: In-memory HashMap (FNV-1a hash) → ~10μs lookup
TIER 2: Disk cache (bincode serialization) → ~500μs lookup
TIER 3: BERT forward pass (Candle + CUDA) → ~800ms cold
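
A conceptual TypeScript version of the same tiering (the production implementation is Rust with FNV-1a hashing, bincode serialization, and a Candle/CUDA forward pass); the hashing, the .cache/ layout, and the injected model call are stand-ins:

```typescript
// Tiered embedding lookup: memory → disk → model. Helpers are simplified
// stand-ins for the Rust implementation described above.
import { createHash } from "node:crypto";
import { mkdir, readFile, writeFile } from "node:fs/promises";

type Embedding = number[];
const memoryCache = new Map<string, Embedding>();

const cacheKey = (text: string) =>
  createHash("sha256").update(text).digest("hex"); // FNV-1a in the Rust version

async function getEmbedding(
  text: string,
  model: (t: string) => Promise<Embedding>,
): Promise<Embedding> {
  const key = cacheKey(text);

  // Tier 1: in-memory map (~10μs).
  const hit = memoryCache.get(key);
  if (hit) return hit;

  // Tier 2: disk cache (~500μs); JSON here, bincode in the Rust version.
  try {
    const fromDisk = JSON.parse(await readFile(`.cache/${key}.json`, "utf8")) as Embedding;
    memoryCache.set(key, fromDisk);
    return fromDisk;
  } catch {
    // cache miss: fall through to the model
  }

  // Tier 3: BERT forward pass (~800ms cold).
  const fresh = await model(text);
  memoryCache.set(key, fresh);
  await mkdir(".cache", { recursive: true });
  await writeFile(`.cache/${key}.json`, JSON.stringify(fresh));
  return fresh;
}
```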

Optimization:

Impact: Analyzing a 10k-word chapter takes 800ms when cached and 1.5s from a cold start.

4. Cross-Language Type Safety

Challenge: Maintaining type safety across TypeScript (frontend), Node.js (backend), and Rust (NLP).

Solution:

ARCHITECTURE

Data Flow for Chapter Save:

  1. User edits in Lexical → generates JSON operations
  2. WebSocket sends operation to server with baseRevision
  3. Server validates permissions and revision
  4. Operation persisted to PostgreSQL, cached in Redis
  5. Broadcast to all connected clients
  6. BullMQ job queued for semantic analysis
  7. Rust NLP engine computes embeddings (cached)
  8. Qdrant vector DB updated for discovery
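
A hedged sketch of the server-side portion of this flow (steps 3-6); the Socket.IO and BullMQ calls use their public APIs, but the event names, payload shape, and helper functions are illustrative stand-ins for NOVA's real schema and persistence layer:

```typescript
// Server-side save path: validate, transform, persist, broadcast, enqueue.
import { Server } from "socket.io";
import { Queue } from "bullmq";

interface ChapterOp { chapterId: string; baseRevision: number; ops: unknown[] }

const io = new Server(3001);
const analysisQueue = new Queue("semantic-analysis");

io.on("connection", (socket) => {
  socket.on("chapter:operation", async (op: ChapterOp) => {
    // 3. Validate permissions and the client's baseRevision.
    if (!(await canEdit(socket.data.userId, op.chapterId))) return;
    const transformed = await transformAgainstNewer(op); // OT step if the revision advanced

    // 4. Persist to PostgreSQL / cache in Redis (stubbed below).
    const revision = await persistOperation(transformed);

    // 5. Broadcast to every other client editing this chapter.
    socket.to(`chapter:${op.chapterId}`).emit("chapter:operation", { ...transformed, revision });

    // 6. Queue semantic analysis; the Rust NLP worker consumes this job.
    await analysisQueue.add("analyze-chapter", { chapterId: op.chapterId, revision });
  });
});

// Stubs standing in for the real permission check, OT transform, and
// PostgreSQL/Redis persistence layer.
async function canEdit(_userId: string, _chapterId: string) { return true; }
async function transformAgainstNewer(op: ChapterOp) { return op; }
let revisionCounter = 0;
async function persistOperation(_op: ChapterOp) { return ++revisionCounter; }
```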

RESULTS

Technical Performance

Platform Metrics (Closed Beta)

Feature Completeness

Shipped (Wave 1):

In Development (Wave 2):

Design Principles

Every interface decision follows:

TECHNICAL INNOVATIONS

Semantic Analysis Pipeline

Rust NAPI Implementation:

Fiction-Specific Metrics:

Operational Transformation

Implementation Details:

Why OT over CRDTs?

Transparent Quality Algorithm

Unlike black-box competitors:

Anti-Gaming Measures:

BUSINESS MODEL

Subscription Tiers:

Revenue Splits:

What We'll Never Do:

LESSONS LEARNED

Technical Decisions

What Worked:

What I'd Change:

Product Insights

Community Feedback

WHAT'S NEXT

Wave 2 (Current):

Wave 3 (Q1 2026):

Wave 4 (Q3 2026):

Full roadmap: codex.novusatlas.org/blog/roadmap

KEY TAKEAWAYS

  1. Quality measurement is possible: Structural analysis + engagement signals create transparent rankings
  2. Real-time collaboration isn't optional: Writers expect Google Docs-level UX
  3. Semantic search changes discovery: Vector embeddings understand intent, not just keywords
  4. Transparency builds trust: Public algorithms and open roadmaps differentiate from competitors
  5. Infrastructure matters: Performance problems become UX problems at scale

OPEN QUESTIONS

LINKS

"NOVA exists because writers deserve platforms that treat them like artists, not content farms. Technology should serve creativity, not extract value from it."

— Founding Philosophy

Tech Stack Summary:

Performance:

Status: Closed Beta (Wave 1), Wave 2 in active development