# HeadlineSift — Database Migration Guide

## Current State

- **Provider:** SQLite
- **ORM:** Prisma (`@prisma/client` v5.x)
- **Database file:** `data/headlinesift.db`
- **Migrations:** `prisma/migrations/`

## Why Migrate to PostgreSQL?

SQLite works well for a single-server deployment with moderate traffic. Migrate to
PostgreSQL when you need:

- **Concurrent writes** — multiple workers fetching, clustering, and ranking
  simultaneously won't contend on the single-writer lock.
- **Connection pooling** — many concurrent dashboard readers.
- **Full-text search** — `tsvector` indexes on article titles and content.
- **Native JSONB** — query inside `affectedGroups`, `payload`, and `metadata`
  fields without parsing strings.
- **Native enums** — database-level constraint enforcement for status/type columns.
- **Point-in-time recovery** — WAL archiving, replication, managed backups.

---

## Step 1: Prepare the Schema for PostgreSQL

The existing `prisma/schema.prisma` is designed to be PostgreSQL-compatible.
Make these changes in order:

### 1a. Switch the datasource provider

```diff
datasource db {
-  provider = "sqlite"
+  provider = "postgresql"
   url      = env("DATABASE_URL")
}
```

### 1b. Enable native PostgreSQL enums (optional but recommended)

Add enum blocks and replace String fields with enum references:

```prisma
enum AdminRole {
  SUPER_ADMIN
  ADMIN
  EDITOR
  VIEWER
}

enum EntityStatus {
  ACTIVE
  INACTIVE
  SUSPENDED
}

enum SourceType {
  RSS
  API
  MANUAL
  SCRAPE
}

enum SourceStatus {
  ACTIVE
  INACTIVE
  ERROR
  RATE_LIMITED
}

enum FetchStatus {
  STARTED
  SUCCESS
  PARTIAL
  FAILED
}

enum CategoryLevel {
  GLOBAL
  COUNTRY
  BOTH
}

enum AiSafetyLevel {
  LOW
  MEDIUM
  HIGH
}

enum ArticleStatus {
  PENDING
  PROCESSED
  SKIPPED
  ERROR
}

enum StoryStatus {
  DRAFT
  NEEDS_REVIEW
  APPROVED
  PUBLISHED
  HIDDEN
  REJECTED
}

enum RelationType {
  EXACT_DUPLICATE
  SAME_STORY
  RELATED_STORY
  DIFFERENT_STORY
}

enum ImpactLevel {
  LOW
  MEDIUM
  HIGH
}

enum ConfidenceLevel {
  LOW
  MEDIUM
  HIGH
}

enum JobType {
  FETCH_SOURCE
  FETCH_ALL_SOURCES
  CLUSTER_ARTICLES
  RANK_STORIES
  GENERATE_AI_ANALYSIS
  PUBLISH_APPROVED_STORIES
  BACKUP_DATABASE
}

enum JobStatus {
  PENDING
  RUNNING
  COMPLETED
  FAILED
  CANCELLED
}

enum LogLevel {
  DEBUG
  INFO
  WARN
  ERROR
  FATAL
}
```

Then replace fields. Examples:

```diff
model AdminUser {
-  role   String   @default("EDITOR") // SUPER_ADMIN | ADMIN | EDITOR | VIEWER
+  role   AdminRole @default(EDITOR)

-  status String   @default("ACTIVE") // ACTIVE | INACTIVE | SUSPENDED
+  status EntityStatus @default(ACTIVE)
}
```

### 1c. Change JSON string fields to native JSONB

```diff
model AiStoryAnalysis {
-  affectedGroupsJson  String?
+  affectedGroups      Json?
}

model Job {
-  payloadJson  String?
+  payload      Json?
}

model SystemLog {
-  metadataJson String?
+  metadata     Json?
}
```

### 1d. Add PostgreSQL-specific optimizations

```prisma
// Full-text search vector (requires tsvector trigger or generated column)
model RawArticle {
  // ... existing fields ...
  searchVector Unsupported("tsvector")? // Managed by DB trigger
}

// Use @db.VarChar(n) for bounded strings (performance win)
model Country {
  code String @unique @db.VarChar(2)
}
```

---

## Step 2: Set Up PostgreSQL

### 2a. Install and start PostgreSQL

```bash
# macOS
brew install postgresql@16
brew services start postgresql@16

# Ubuntu/Debian
sudo apt install postgresql-16
sudo systemctl start postgresql

# cPanel/WHM
# PostgreSQL is available via "Manage Databases" in cPanel.
# Create the database and user through the cPanel interface.
```

### 2b. Create the database and user

```sql
CREATE USER headlinesift WITH PASSWORD 'strong-password-here';
CREATE DATABASE headlinesift OWNER headlinesift;
GRANT ALL PRIVILEGES ON DATABASE headlinesift TO headlinesift;
```

### 2c. Update DATABASE_URL in .env

```bash
DATABASE_URL="postgresql://headlinesift:strong-password-here@localhost:5432/headlinesift?schema=public"
```

### 2d. Add connection pooling (optional)

```bash
# Direct PostgreSQL URL (no pooling)
DATABASE_URL="postgresql://..."

# With PgBouncer (transaction mode)
DATABASE_URL="postgresql://headlinesift:password@pghost:6432/headlinesift?pgbouncer=true"
```

---

## Step 3: Migrate the Data

### Option A: Fresh start (recommended for dev/staging)

```bash
# Remove SQLite artifacts
rm -rf prisma/migrations/ data/headlinesift.db*

# Switch provider in schema.prisma (see Step 1)

# Create new PostgreSQL migration
npx prisma migrate dev --name init

# Seed the database
npx tsx seed/index.ts
```

### Option B: Migrate existing data (for production)

Use `pgloader` — the most reliable tool for SQLite→PostgreSQL migration:

```bash
# Install pgloader
# macOS: brew install pgloader
# Ubuntu: sudo apt install pgloader

# Create migration script
cat > migrate.load <<'EOF'
LOAD DATABASE
  FROM sqlite:///home/headlinesift/public_html/data/headlinesift.db
  INTO postgresql://headlinesift:password@localhost:5432/headlinesift

WITH
  data only,
  batch rows = 1000,
  batch concurrency = 4

CAST
  type text to varchar,
  type varchar to text

SET
  search_path TO 'public'

ALTER SCHEMA 'public' RENAME TO 'public';

-- Map SQLite tables → PostgreSQL tables (Prisma-mapped names)
-- pgloader auto-converts snake_case table names
EOF

pgloader migrate.load
```

After pgloader completes:

1. Run `npx prisma db pull` to introspect the PostgreSQL schema
2. Compare with your Prisma schema and resolve any drift
3. Run `npx prisma migrate dev --name postgres-migration` to baseline

### Option C: Application-level export/import

Write a migration script that reads from SQLite and writes to PostgreSQL
using Prisma clients connected to both databases.

```typescript
// scripts/migrate-to-pg.ts
import { PrismaClient as SQLiteClient } from "@prisma/client";
// After changing provider, generate a second client:
// import { PrismaClient as PgClient } from "./generated/pg-client";

async function migrate() {
  const sqlite = new SQLiteClient({
    datasources: { db: { url: "file:../data/headlinesift.db" } }
  });
  const pg = new PgClient();

  // Migrate table by table
  const countries = await sqlite.country.findMany();
  for (const c of countries) {
    await pg.country.create({ data: c });
  }
  // ... repeat for each model

  // Reset sequences (PostgreSQL auto-increment counters)
  await pg.$executeRawUnsafe(`
    SELECT setval(pg_get_serial_sequence('admin_users', 'id'), max(id)::int) FROM admin_users;
  `);
}
```

---

## Step 4: Update Application Code

### 4a. Prisma client singleton

Update `lib/db/client.ts`:

```diff
-  // SQLite pragmas — remove these
-  client.$connect().then(() => {
-    client.$executeRawUnsafe("PRAGMA journal_mode = WAL");
-    client.$executeRawUnsafe("PRAGMA busy_timeout = 5000");
-    client.$executeRawUnsafe("PRAGMA foreign_keys = ON");
-    client.$executeRawUnsafe("PRAGMA cache_size = -64000");
-  });

+  // PostgreSQL connection pooling is handled by Prisma's built-in pool
+  // or by PgBouncer. No pragma calls needed.
```

### 4b. JSON field access

```diff
// Before (SQLite — JSON stored as string, parsed manually)
- const groups = JSON.parse(analysis.affectedGroupsJson || "[]");

// After (PostgreSQL with JSONB — Prisma auto-parses)
+ const groups = analysis.affectedGroups || [];
```

### 4c. Update backup script

`scripts/backup-db.ts` — replace file copy with `pg_dump`:

```typescript
import { execSync } from "node:child_process";

const DB_URL = process.env.DATABASE_URL!;
const backupPath = path.join(BACKUP_DIR, `headlinesift-${timestamp}.sql.gz`);

execSync(
  `pg_dump ${DB_URL} | gzip > ${backupPath}`,
  { stdio: "inherit" }
);
```

---

## Step 5: Verify

```bash
# Validate schema
npx prisma validate

# Run type checks
npx tsc --noEmit

# Start the app and smoke-test the dashboard
npm run dev

# Run seed and verify row counts
npx tsx seed/index.ts
```

### Verification checklist

- [ ] All tables exist: `\dt` in psql
- [ ] All indexes exist: `\di` in psql
- [ ] Enums are registered: `\dT+` in psql
- [ ] Seed data is present
- [ ] Dashboard loads without errors
- [ ] Source fetching works
- [ ] Article deduplication works (urlHash unique constraint)
- [ ] Job polling works (query by status + runAfter)
- [ ] AI analysis generation works
- [ ] Backup script produces valid dumps

---

## Step 6: Production Considerations

### Connection pooling

Use PgBouncer in transaction mode for serverless/edge environments:

```ini
# pgbouncer.ini
[databases]
headlinesift = host=localhost port=5432 dbname=headlinesift

[pgbouncer]
pool_mode = transaction
default_pool_size = 20
max_client_conn = 100
```

### Full-text search

Add a generated `tsvector` column and trigger:

```sql
ALTER TABLE raw_articles ADD COLUMN search_vector tsvector;

CREATE FUNCTION update_search_vector() RETURNS trigger AS $$
BEGIN
  NEW.search_vector :=
    setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(NEW.rawContent, '')), 'B');
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_search_vector
  BEFORE INSERT OR UPDATE ON raw_articles
  FOR EACH ROW EXECUTE FUNCTION update_search_vector();

CREATE INDEX raw_articles_search_idx ON raw_articles USING gin(search_vector);
```

### Backups

```bash
# Daily cron job
0 3 * * * pg_dump -Fc headlinesift | gzip > /backups/headlinesift-$(date +\%Y\%m\%d).dump.gz
```

### Monitoring

Watch for slow queries:

```sql
-- Enable query logging
ALTER SYSTEM SET log_min_duration_statement = 1000; -- Log queries >1s
SELECT pg_reload_conf();

-- Check index usage
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE tablename IN ('raw_articles', 'story_clusters', 'jobs', 'fetch_logs')
ORDER BY idx_scan DESC;
```
