The SaaS Template Playbook - Part 1
A comprehensive, opinionated, actionable guide for building a professional, reusable SaaS template that you can fork and reskin for any vertical (CRM, project management, analytics, internal tooling, vertical SaaS, etc.).
If you read only one section first, read §3 The 12 Pillars and §5 Multi-Tenancy - those two ideas dictate every other decision in this document.
Table of Contents
- What "SaaS Template" Actually Means
- The 30-Second Mental Model
- The 12 Pillars of a Production SaaS
- Reference Architecture
- Multi-Tenancy - the Keystone Decision
- Authentication & Authorization
- Accounts, Organizations, Workspaces, Teams
- Onboarding & Activation
- Billing, Subscriptions & Metering
- Database Design Patterns
- API Design
- Background Jobs, Queues & Schedulers
- Real-time & Eventing
- Email, Notifications & Inbox
- File Storage, Uploads & CDN
- Search (Full-Text + Semantic)
- Feature Flags & Experiments
- Audit Logs, Activity Feeds & Telemetry
- Security, Compliance & Privacy
- Performance, Caching & Scaling
- Observability - Logs, Metrics, Traces, Errors
- Frontend Architecture
- Internationalization & Accessibility
- Admin & Internal Tooling
- Marketing Site, Docs & SEO
- CI/CD, Environments & Release Strategy
- Developer Experience (DX)
- Testing Strategy
- Pricing, Plans & Packaging Strategy
- Product Analytics & Growth
- Customer Support & Success
- Reusability - How to Make This a Template
- The 14-Phase Build Plan
- Common Pitfalls & Hard-Won Guardrails
- Cheat Sheet
1. What "SaaS Template" Actually Means
A reusable SaaS template is the boring 80% you'd otherwise rebuild for every product:
- Sign-up, login, password reset, SSO, MFA
- Organizations / workspaces / teams / invites
- Roles + permissions
- Billing, subscriptions, plans, usage metering, invoices
- Email + notifications + in-app inbox
- Audit logs + activity feeds
- Admin panel
- Feature flags
- Background jobs, scheduled jobs, webhooks
- File uploads + CDN
- API keys + rate limiting
- Observability + error tracking
- CI/CD + multi-environment deploys
- Marketing landing page + docs site
It is NOT:
- Your product's domain logic - that's the unique 20% you build on top.
- A no-code platform - it's a code starter.
- A magic SaaS-in-a-box - you still need product judgment.
The right mental model: infrastructure for the parts every SaaS has, with clean seams where your domain plugs in.
2. The 30-Second Mental Model
+--------------------------------------+
|    Marketing Site + Docs + Status    |
+------------------+-------------------+
                   |
+------------------v-------------------+
|            Web App (SPA)             |
|     + (optional) Mobile/Desktop      |
+--------+------------------+----------+
         | REST/GraphQL     | WS/SSE
+--------v------------------v----------+
|         Edge / API Gateway           |
|    (auth, rate limit, CORS, WAF)     |
+--------+-----------------------------+
         |
   +-----+-----------+---------------+
   v                 v               v
+---------+    +-----------+   +-----------+
| App API |--->| Worker(s) |   | Webhooks  |
|  (BFF)  |    | + Cron    |   |  Out/In   |
+----+----+    +-----+-----+   +-----+-----+
     |               |               |
     v               v               v
+-----------------------------------------------------+
|  Postgres (core)      *  Redis (cache+queue)        |
|  Object Storage (S3)  *  Search (PG/Meili/Elastic)  |
|  Time-series / Analytics (ClickHouse / DuckDB)      |
+-----------------------------------------------------+
         |
   +-----+----------+-----------------+
   v                v                 v
 Stripe       Email (Resend)     Auth (Clerk/
 (billing)    SMS (Twilio)       WorkOS) [opt]
 Sentry       Segment/PostHog    OpenAI/etc.
Three deployable surfaces, one source of truth:
| Surface | Built from | Where it runs |
|---|---|---|
| Marketing + docs | Next.js static / Astro | CDN (Vercel / Cloudflare Pages) |
| Web app | React SPA (Vite) or Next.js | CDN + edge |
| API + workers | Go / Python / Node | Container platform (Fly / Railway / ECS / k8s) |
3. The 12 Pillars of a Production SaaS
Every SaaS template needs all twelve. Skip one, and you eat scope creep later.
| # | Pillar | What "done" looks like |
|---|---|---|
| 1 | Identity | Email/password, OAuth (Google/GitHub), magic link, MFA, SSO (SAML/OIDC), session + token model. |
| 2 | Tenancy | Org/workspace boundary, every query filtered by workspace_id, RBAC + (optional) ABAC. |
| 3 | Billing | Stripe wired, plans configurable, trials, dunning, usage metering, invoice portal. |
| 4 | Lifecycle | Onboarding flow, email verification, invites, offboarding, account deletion (GDPR-clean). |
| 5 | Eventing | In-process bus → outbox → workers → webhooks. Idempotent. |
| 6 | Observability | Structured logs + traces + metrics + error tracker, all correlated by request_id + tenant_id. |
| 7 | Audit | Append-only audit log of every privileged action, queryable by tenant. |
| 8 | Notifications | Transactional email + in-app inbox + (opt) SMS/push, all with per-user preferences. |
| 9 | Files | Direct-to-S3 uploads via signed URLs; never proxy bytes through your API. |
| 10 | Admin | Internal dashboard for support: impersonate, refund, suspend, inspect tenant. |
| 11 | Flags | Feature flags per environment + per tenant + per user. Kill-switch culture. |
| 12 | DX | One command to dev (make dev), seed data, fast tests, docs that don't lie. |
4. Reference Architecture
4.1 The Spine
[Browser / Mobile / Desktop]
          |
          v
   [CDN / Edge Cache]
          |
          v
[Reverse Proxy / WAF]  <- TLS terminates here
 (Caddy: automatic HTTPS via Let's Encrypt,
  or Traefik: dynamic routing from Docker/K8s labels)
          |
    +-----+-------+----------------+
    v             v                v
[API Gateway] [WebSocket]  [Static Assets]
    |             |
    v             v
[App API (stateless, horizontally scalable)]
    |
 +--+------+-----------+-------------+
 v         v           v             v
[DB]    [Cache]     [Queue]    [Object Store]
Postgres  Redis    Redis/SQS        S3
 |         |           |             |
 v         v           v             v
[Read    [Pub/Sub   [Workers +  [CDN signed
replica]  for WS]     cron]       URLs]
4.2 What lives where
| Concern | Where |
|---|---|
| Source of truth | Postgres |
| Hot reads, sessions, idempotency keys, rate-limit counters | Redis |
| Heavy/slow work, retries, scheduled work | Workers consuming a queue |
| Real-time fanout to clients | WS hub backed by Redis pub/sub (multi-node) |
| Bulk analytics & reporting | ClickHouse / BigQuery / DuckDB (mirrored from Postgres) |
| Static UI | CDN |
| User-uploaded files | S3 + CDN with signed URLs |
| Secrets | Env (dev) / SSM / Vault / Doppler (prod) |
4.3 Suggested tech stack (opinionated, swappable)
| Layer | Default | Why |
|---|---|---|
| API (Go) | chi + sqlc + pgx (lean) or Gin + GORM (batteries-included) | Fast, predictable, low-overhead. Gin/GORM is the path-of-least-resistance combo most Go SaaS teams ship on. |
| API (Node) | Hono / Fastify + Prisma | Edge-friendly, ergonomic |
| ML / heavy compute | Python (FastAPI + uv + pydantic v2 + structlog) | Ecosystem advantage; structlog gives you JSON logs out of the box |
| Web | React 19 + TypeScript + Vite + TanStack Query + Zustand + Tailwind | Boring, excellent, zero magic |
| DB | Postgres 16+ (with pgvector, pg_trgm) | One DB to do 90% of jobs |
| Cache | Redis 7 | Battle-tested |
| Queue / Eventing | Redis (simple) → NATS JetStream (durable streams, replay, KV, multi-tenant subjects) | NATS is the right answer when you need at-least-once delivery, replay, or fan-out across services without standing up Kafka. |
| Search | Postgres FTS (start) → Meilisearch / Typesense (scale) | Cheap → fast |
| Object store | S3 / Cloudflare R2 (no egress) / Supabase Storage (if you're already on Supabase) | Standard |
| Email | Resend or Postmark | Reliable transactional, simple SDKs |
| Auth (managed SaaS) | Clerk (fast UX), WorkOS (enterprise SSO/SCIM), Supabase Auth (if you want auth + DB + storage in one) | Saves weeks; pick by where the rest of your stack lives. |
| Auth (self-hosted OSS) | Ory Kratos (identity) + Ory Hydra (OIDC) + Ory Keto (permissions) - pure API, no UI bundled. Casdoor - full-stack IAM with built-in admin UI, OIDC/SAML, RBAC, MFA. | Own your identity layer without writing it. Kratos = composable primitives; Casdoor = drop-in IAM. |
| Auth (DIY) | Lucia / Auth.js / your own JWT + refresh | Maximum ownership, maximum maintenance |
| Billing | Stripe (default) / Paddle or LemonSqueezy (Merchant-of-Record, global tax) / PayPal (add as a secondary payment method when you have non-card markets - LATAM, parts of EU, gamer/creator audiences) | Stripe owns card-first markets; PayPal is the second checkout option customers ask for. |
| Logging (Go) | zerolog (zero-allocation JSON) or slog (stdlib, 1.21+) | zerolog is the production default for Go SaaS - fast, structured, contextual. |
| Logging (Python) | structlog + orjson renderer | Structured, contextvars-aware, async-safe |
| Background jobs | Asynq (Go, Redis) / River (Go, Postgres) / BullMQ (Node) / Celery / Arq (Python) / NATS JetStream consumers (cross-language) | Match language, or use NATS if you already have it for eventing. |
| Reverse proxy / TLS | Caddy (automatic HTTPS, simplest config) or Traefik (dynamic config, great with Docker/K8s labels) - nginx if you have a reason. | Caddy = "it just works" for VMs. Traefik = service-discovery-driven for containerized stacks. |
| Observability | OpenTelemetry → Grafana / Honeycomb / Datadog | Vendor-neutral export |
| Errors | Sentry | Best-in-class |
| Analytics | PostHog (self-host or cloud) | Product + flags + session replay in one |
| CI/CD | GitHub Actions | Where your code already is |
| Infra (PaaS, fastest start) | Fly.io / Railway / Render | Push-to-deploy, no ops |
| Infra (cheap VMs, more control) | Hetzner (best €/CPU in the market - €4-€40/mo dedicated cores) or DigitalOcean (polished UX, managed PG/Redis, App Platform) | Most bootstrapped SaaS run profitably on a Hetzner box + DO managed Postgres. Pair with Caddy/Traefik. |
| Infra (hyperscaler, when you have to) | AWS / GCP / Azure | Compliance, region breadth, enterprise procurement |
Two reference stacks to pick from on day one:
- "Bootstrapped solo / small team": Go (Gin + GORM + zerolog) + Postgres + NATS JetStream + Caddy on a single Hetzner box, Casdoor or Ory Kratos for auth, Stripe + PayPal for payments. ~β¬30/mo, scales to thousands of paying customers.
- "Funded / enterprise-ready": Go (chi + sqlc) + managed Postgres + Redis + NATS cluster behind Traefik on Digital Ocean App Platform / Kubernetes, WorkOS or Supabase Auth, Stripe Billing, OTel β Grafana Cloud.
4.4 Cross-cutting building blocks (the glossary)
These are the load-bearing concepts every later section assumes. Define them once here; deeper coverage is in the linked sections.
The middleware chain
A request flows through a fixed stack of middleware before any handler runs. Order is load-bearing - wire it once in main.go and don't rearrange.
Request
  |
  v
[1]  Recovery     - catch panics, return 500 + Sentry capture
[2]  RequestID    - generate or accept X-Request-ID header
[3]  Logger       - bind request_id to ctx logger (zerolog/structlog)
[4]  Tracing      - OTel span for the request
[5]  CORS         - allowlist origins
[6]  RateLimit    - Redis token bucket per IP / API key (§11.7)
[7]  Auth         - verify session/JWT/API key -> set Actor in ctx (§6)
[8]  Tenant       - resolve workspace_id -> set in ctx + SET LOCAL app.workspace_id (§5)
[9]  CSRF         - cookie endpoints only
[10] Idempotency  - POSTs with Idempotency-Key header (§11.6)
  |
  v
Handler -> Service -> Repository
  |
  v
Response
  |
  v
[Logger middleware closes the span, emits access log line]
Auth comes before Tenant (you need an actor before resolving their workspace). Recovery is outermost so a panic anywhere still produces a clean 500. RateLimit goes before Auth so unauthenticated abuse hits the limiter first.
What ctx carries
context.Context is the request-scoped envelope. Everything below is bound by middleware and read by handlers/services/repos.
| Key | Set by | Read by |
|---|---|---|
| request_id | RequestID middleware | logs, error responses, traces |
| logger | Logger middleware | every layer (log.Ctx(ctx)) |
| actor | Auth middleware | permission checks, audit log |
| workspace_id | Tenant middleware | every repo query, RLS GUC |
| trace_id / span | OTel middleware | downstream HTTP/DB instrumentation |
| db (per-request handle with GUCs set) | Tenant middleware | repos |
Rule: if a function needs any of these, it takes ctx context.Context as the first argument. No globals. No req.Context() 3 layers deep - pass ctx explicitly.
The Actor type (polymorphic identity)
Every action in the system is performed by something - a human, an API key, or the system itself. Don't model "user" everywhere; model Actor.
type Actor struct {
    Type        ActorType // user | api_key | system
    ID          uuid.UUID
    // for users: cached membership in current workspace
    Role        Role      // owner | admin | member | viewer
    Permissions []string  // resolved at auth time
}

func (a *Actor) Can(action string, resource Resource) bool { /* §6.3 */ }
This pairs with the polymorphic-actor DB pattern (created_by_type, created_by_id - see §35) so audit logs, activity feeds, and created_by fields handle integrations and humans uniformly.
Layered architecture (handler → service → repo)
Each layer has a strict allowed-imports list. Violations are caught by golangci-lint depguard rules (or equivalent in other languages).
| Layer | Knows about | Forbidden |
|---|---|---|
| Handler | HTTP, Service interfaces, request/response DTOs | DB, SQL, third-party SDKs |
| Service | Domain logic, other Services, Repository interfaces, the Bus |
HTTP types (http.Request, gin.Context) |
| Repository | DB driver, SQL, models | HTTP, business rules, other repos |
A handler never touches the DB. A repo never decides whether an action is allowed. This is what makes services testable without a server and repos swappable.
The kernel interfaces (the seams)
Every cross-cutting capability is a Go interface (or TS type) defined in kernel/. The product imports the interface; wiring picks the implementation at startup. These are the seams that keep the template reusable.
type Auth interface { // §6
    Authenticate(ctx context.Context, token string) (*Actor, error)
    Issue(ctx context.Context, user *User) (Token, error)
}

type Bus interface { // §13
    Publish(ctx context.Context, subject string, payload []byte) error
    Subscribe(ctx context.Context, subject string, h Handler) (Subscription, error)
}

type Storage interface { // §15
    PresignPut(ctx context.Context, key string, opts PutOpts) (string, error)
    PresignGet(ctx context.Context, key string, ttl time.Duration) (string, error)
}

type Mailer interface { // §14
    Send(ctx context.Context, msg Message) error
}

type Meter interface { // §9.6
    Increment(ctx context.Context, workspaceID uuid.UUID, metric string, n int64) error
}

type Flags interface { // §17
    IsEnabled(ctx context.Context, key string, scope FlagScope) bool
}

type Cache interface { // §20
    Get(ctx context.Context, key string) ([]byte, bool, error)
    Set(ctx context.Context, key string, val []byte, ttl time.Duration) error
    Bump(ctx context.Context, tag string) error // tag-based invalidation
}
Implementations: casdoor.Auth, workos.Auth, kratos.Auth / nats.Bus, redis.Bus, inproc.Bus / s3.Storage, r2.Storage, supabase.Storage / resend.Mailer, postmark.Mailer / etc. Swapping providers = changing one line in main.go.
Transactions: the WithTx pattern
Don't manually Begin/Commit/Rollback - it leaks on panics and confuses nested calls. Use a closure helper that the repo layer owns:
func (r *Repo) WithTx(ctx context.Context, fn func(tx *Repo) error) error {
    return r.db.Transaction(func(db *gorm.DB) error {
        return fn(&Repo{db: db})
    })
}

// Service:
err := repo.WithTx(ctx, func(tx *Repo) error {
    if err := tx.Orders().Create(ctx, order); err != nil { return err }
    return tx.Outbox().Append(ctx, "order.created", order) // §12.4
})
Two rules:
- Never hold a transaction across a network call (HTTP, Stripe, S3). Read first, do external work, then write fast inside the tx.
- DB writes + event emission live in the same tx via the outbox pattern (§12.4). Anything else is eventually-inconsistent in failure modes.
Idempotency (everywhere, not just §11.6)
Three places idempotency shows up; same idea, different keys:
| Surface | Key | Storage |
|---|---|---|
| Public API POST | Idempotency-Key header (§11.6) | Redis, 24h TTL, scoped by (workspace_id, key) |
| Stripe/PayPal webhooks | event.id (§9.3) | Redis, 7-day TTL |
| Background jobs | (job_type, dedup_key) (§12.3) | Postgres unique index, or Redis SETNX |
The shape is always: check if you've seen this key → if yes, return cached result / no-op → else do work, then record the key.
ID conventions
- UUID v7 for all primary keys - sortable by time, a single column covers PK + chronology, no created_at index needed for ordering.
- Prefixed display IDs in API responses for human-readable references: proj_01HMZ..., inv_01HMZ.... The DB stores the raw UUID; the API serializer adds the prefix. Saves debugging time when a customer pastes an ID into a ticket.
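A sketch of the serializer-side helpers for prefixed display IDs (hypothetical function names; the stored value stays the raw UUID):

```go
package main

import (
	"fmt"
	"strings"
)

// toDisplayID adds the resource prefix for API responses; the DB keeps the raw UUID.
func toDisplayID(prefix, rawUUID string) string {
	return prefix + "_" + rawUUID
}

// parseDisplayID strips and validates the prefix when an ID comes back in,
// so a project ID pasted into an invoice endpoint fails loudly.
func parseDisplayID(prefix, displayID string) (string, error) {
	raw, ok := strings.CutPrefix(displayID, prefix+"_")
	if !ok {
		return "", fmt.Errorf("expected %s_ prefix in %q", prefix, displayID)
	}
	return raw, nil
}
```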
The standard handler shape
Every handler in the codebase looks the same. Deviation = reviewer flag.
func (h *ProjectHandler) Create(c *gin.Context) {
    ctx := c.Request.Context()
    actor := auth.ActorFrom(ctx)      // set by Auth middleware
    workspaceID := tenant.IDFrom(ctx) // set by Tenant middleware

    var req CreateProjectRequest
    if err := c.ShouldBindJSON(&req); err != nil {
        respondError(c, errs.Validation(err)); return
    }

    project, err := h.svc.Create(ctx, actor, workspaceID, req)
    if err != nil {
        respondError(c, err); return // single error envelope (§11.5)
    }
    c.JSON(201, project)
}
Five lines of mechanical work, then one line of actual business logic delegated to the service. If a handler grows past 20 lines, push the logic down a layer.
5. Multi-Tenancy - the Keystone Decision
The single most consequential architectural choice. Decide on day one and enforce it in code.
5.1 The three models
| Model | Description | When to use |
|---|---|---|
| Pool (shared) | One DB, every row tagged workspace_id (or org_id). | Default for B2B SaaS. Best ops/cost. |
| Bridge (silo schema) | One DB, one schema per tenant. | Mid-enterprise; per-tenant migrations possible. |
| Silo (isolated DB) | One DB per tenant. | Regulated tenants (banks, healthcare), VIP customers. |
Recommendation: Start with Pool. Add Silo later as an enterprise tier. Don't try to do all three on day one.
5.2 Hard rules for the Pool model
- Every tenant-owned table has workspace_id (or org_id) NOT NULL.
- Every query filters by workspace_id - no exceptions. Enforce via:
  - Repository methods that require workspaceID as a typed argument.
  - Postgres Row-Level Security (RLS) as a belt-and-suspenders defense.
- The active tenant is resolved once per request from the auth token and stored in context.Context / request-local state.
- Cross-tenant queries (admin, analytics) go through a separate, audited code path. Never inside the user request handler.
5.3 Postgres RLS as defense-in-depth
ALTER TABLE issue ENABLE ROW LEVEL SECURITY;
CREATE POLICY issue_tenant_isolation ON issue
USING (workspace_id = current_setting('app.workspace_id')::uuid);
In your handler middleware:
tx.Exec(`SET LOCAL app.workspace_id = $1`, workspaceID)
Even if a developer forgets a WHERE workspace_id = ?, RLS blocks the leak.
5.4 The "two-actor" rule for queries
Every query has two implicit parameters:
- actor_user_id (who's asking)
- tenant_id (which tenant they're acting in)
Don't accept "logged-in user" alone. The same user can belong to multiple workspaces.
5.5 Tenant resolution
Either:
- Subdomain: acme.app.yourtool.com → acme → workspace lookup.
- Path: app.yourtool.com/w/acme/...
- Header: X-Workspace-ID: <uuid> (good for APIs, but the UI needs a workspace switcher).
Most SaaS pick subdomain or path - pick one and stick with it.
6. Authentication & Authorization
6.1 Auth methods you must support
- Email + password (always - even if SSO is available).
- Magic link (best UX for low-stakes products).
- OAuth: Google + GitHub minimum. Apple if you ship an iOS app.
- MFA: TOTP (authenticator apps) - easy to add, big trust signal.
- Passkeys (WebAuthn) - increasingly expected.
- SSO (SAML 2.0 + OIDC) - gate behind the enterprise plan; outsource to WorkOS or Clerk unless you want to own the support burden.
- API keys - per-workspace, scoped, revocable, hashed at rest (sha256).
- Personal access tokens (PATs) - for CLIs, with rotation.
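For the API-key row, a sketch of mint-and-verify with sha256 at rest (helper names are hypothetical; only the digest is stored, the plaintext is shown to the user exactly once):

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
)

// newAPIKey returns the plaintext key (displayed once) and the hex sha256
// digest, which is the only thing persisted in the api_key table.
func newAPIKey(prefix string) (plaintext, digest string, err error) {
	buf := make([]byte, 24)
	if _, err := rand.Read(buf); err != nil {
		return "", "", err
	}
	plaintext = prefix + "_" + hex.EncodeToString(buf)
	sum := sha256.Sum256([]byte(plaintext))
	return plaintext, hex.EncodeToString(sum[:]), nil
}

// verifyAPIKey re-hashes the presented key and compares in constant time.
func verifyAPIKey(presented, storedDigest string) bool {
	sum := sha256.Sum256([]byte(presented))
	return subtle.ConstantTimeCompare(
		[]byte(hex.EncodeToString(sum[:])), []byte(storedDigest)) == 1
}
```

sha256 (not bcrypt/argon2) is fine here because the key is high-entropy random, not a human password.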
6.2 Sessions vs JWTs - pick a hybrid
| Use case | Mechanism |
|---|---|
| Browser session | HttpOnly secure cookie with opaque session ID → server-side session in Redis. Easy revocation. |
| Mobile / desktop / CLI | Short-lived JWT (15 min) + refresh token stored securely. |
| Public API | API key (long-lived, scoped, revocable). |
| Service-to-service | mTLS or signed JWT with short TTL. |
Rule: JWT or server-side session - pick per surface. Don't mix-and-match within one surface.
6.3 Authorization - RBAC, then ABAC if needed
Start with role-based access control (RBAC):
Workspace roles: owner | admin | member | viewer
Resource permissions derived from role
Only add attribute-based access control (ABAC) (e.g., "user X can edit only resources where assignee_id = user.id") when RBAC alone produces unmaintainable conditionals.
// Permission helper signature
func Can(actor *Actor, action string, resource Resource) bool
Centralize all permission logic in one package. Never inline if user.Role == "admin" checks in handlers.
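A hand-rolled version of that centralized helper can be a single role-to-permissions table (roles from §6.3; the permission strings are illustrative, not the template's canonical set):

```go
package main

// rolePerms is the single source of truth mapping workspace roles to
// permissions. All changes to authorization happen in this one table.
var rolePerms = map[string]map[string]bool{
	"owner":  {"workspace.delete": true, "member.invite": true, "issue.write": true, "issue.read": true},
	"admin":  {"member.invite": true, "issue.write": true, "issue.read": true},
	"member": {"issue.write": true, "issue.read": true},
	"viewer": {"issue.read": true},
}

type Actor struct {
	Role string // owner | admin | member | viewer
}

// Can is the one permission helper; handlers never inline role checks.
func Can(a Actor, action string) bool {
	return rolePerms[a.Role][action]
}
```

When ABAC arrives, Can grows a resource parameter and the table grows predicates, but the call sites stay unchanged.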
6.4 Open-source policy engines
- Casbin - Go, lightweight, RBAC + ABAC.
- OPA (Open Policy Agent) - sidecar, enterprise-grade.
- Oso - embedded, declarative.
- Ory Keto - Google Zanzibar-style relationship-based access control as a service.
For a template, hand-rolled Can() is fine until you hit ~20 permission rules.
6.5 Don't-build-it-yourself: managed & self-hostable identity
Auth is a tarpit. Ship a real identity service before you ship your second feature. Pick by where you want the trust boundary:
| Option | Type | Sweet spot | Watch out for |
|---|---|---|---|
| Clerk | Managed SaaS | B2C/PLG products that want pre-built React components and great DX. | Per-MAU pricing scales painfully past ~50k actives. |
| WorkOS | Managed SaaS | B2B selling into mid-market/enterprise - SSO (SAML/OIDC), SCIM, directory sync, audit log API. | Light on consumer-style password/magic-link flows; pair with Clerk or your own for those. |
| Supabase Auth (GoTrue) | Managed or self-hosted | You're already using Supabase Postgres + Storage; auth comes "free" with RLS hooks wired in. | You're now Supabase-shaped; migrating off later isn't trivial. |
| Casdoor | Self-hosted OSS | Single binary IAM with a built-in admin UI. OIDC/OAuth2/SAML/CAS providers, RBAC/ABAC, MFA, social logins, webhooks. | UI is functional, not premium - usually fine since admins use it, not end users. |
| Ory Kratos + Hydra + Keto | Self-hosted OSS | API-first, headless, composable. Kratos = identity + flows, Hydra = OIDC/OAuth2 server, Keto = permissions. You bring your own UI. | More moving parts; budget a week to wire flows + UI. |
| Authentik / Zitadel / Keycloak | Self-hosted OSS | Alternatives in the same shape as Casdoor - pick on UX preference and language affinity. | Keycloak is JVM-heavy; Authentik/Zitadel are lighter. |
Template recommendation by audience:
- Solo / bootstrapped: start with Casdoor (one container, admin UI, OIDC works in 30 minutes) or Supabase Auth if you want DB + auth co-located.
- Funded B2B: WorkOS for SSO/SCIM + your own password/magic-link, or Ory Kratos if you must self-host for compliance.
- Consumer-facing PLG: Clerk for the fastest path to a polished sign-in experience.
Your app should talk to identity through a thin auth package interface (Authenticate(token) → Actor, Issue(ctx, user) → token). Swapping Casdoor for WorkOS later is then a ~1-day adapter change, not a rewrite.
6.6 Auth security checklist
- [ ] Passwords hashed with argon2id (or bcrypt cost 12+).
- [ ] Email enumeration defended (same response for "email not found" and "wrong password").
- [ ] Rate limiting on /login (5/min/IP + 10/hr/email).
- [ ] Lockout after N failed attempts, with email notification.
- [ ] CSRF protection on cookie-auth endpoints.
- [ ] Session fixation defense: rotate session ID on login.
- [ ] Logout invalidates the server-side session.
- [ ] Refresh tokens rotated on use; revoke the entire family on reuse detection.
- [ ] Password reset tokens are single-use, expire in 1h, and are sent to the verified email only.
- [ ] MFA backup codes generated, shown once, hashed at rest.
7. Accounts, Organizations, Workspaces, Teams
7.1 The canonical hierarchy
User --+--> Membership --> Workspace (tenant)
       |                       |
       |                       +-- Teams (subgroups)
       |                       +-- Resources (projects, issues, ...)
       |                       +-- Subscription (Stripe)
       |                       +-- Settings (branding, SSO, etc.)
       |
       +--> Personal account (optional - for solo plans)
A User is a global identity. A Membership ties a user to a workspace with a role.
7.2 Required tables (minimum)
user (id, email, password_hash, email_verified_at, mfa_enabled, created_at, ...)
workspace (id, slug, name, plan, owner_user_id, created_at, ...)
membership (id, user_id, workspace_id, role, status, invited_by, joined_at)
invite (id, workspace_id, email, role, token_hash, expires_at, accepted_at)
team (id, workspace_id, name, parent_team_id NULL)
team_membership (id, team_id, user_id, role)
api_key (id, workspace_id, name, prefix, hash, scopes JSONB, created_by, last_used_at, revoked_at)
7.3 Invites
- Email a single-use signed token (expires in 7 days).
- Accepting creates the membership row.
- Critical: if the invitee already has an account, just attach a membership - don't force a separate signup flow.
7.4 Workspace switcher UI
A persistent UI element (sidebar dropdown or top nav) that:
- Shows the current workspace.
- Lets the user switch (changes URL: /w/<slug>/...).
- Lets the user create a new workspace.
- Caches the active workspace ID per user in a cookie/localStorage so it survives reloads.
7.5 Offboarding & deletion
- Delete account: GDPR right-to-be-forgotten. Anonymize PII; retain audit log entries with user_id = NULL + display_name = "Deleted user".
- Leave workspace: just removes the membership row.
- Delete workspace: 30-day soft-delete with a restore option. Hard-delete after the grace period via cron.
8. Onboarding & Activation
The 5-minute window between sign-up and first value is the highest-leverage UX you'll ever build.
8.1 The signup flow
1. /signup → email + password (or OAuth)
2. Send verification email immediately (but don't block app entry on it)
3. Land in the "create your workspace" step
4. Land in the product with a one-time guided tour
5. Trigger the first aha moment within ≤ 3 clicks
8.2 Activation events
Define the activation event - the action that predicts retention. Examples:
- Slack: send 2,000 team messages
- Dropbox: upload 1 file
- Linear: create 3 issues
- Figma: invite 1 collaborator
Track this as activated_at on the workspace, fire it from your event bus, and trigger lifecycle emails off it.
8.3 Email verification - required vs optional
- Required for sensitive actions (billing, inviting users, API keys).
- Optional for read-only browsing.
- Show a banner ("Verify your email - we sent a link to alice@…") and a one-click resend button.
8.4 Sample data / templates
For B2B SaaS, ship with a demo workspace that's pre-populated. Lets new users explore before they set up their own data.
8.5 Empty states are product surface
Every list view (/issues, /projects, …) needs an empty state with:
- One sentence of context ("No issues yet - issues are how you track work").
- A primary CTA button.
- An optional "import from CSV / Linear / Jira" hook.
9. Billing, Subscriptions & Metering
9.1 Use Stripe. (Or Paddle / LemonSqueezy if you want them to handle global tax.)
Don't build billing yourself. Stripe has solved every edge case you'd hit in year three.
On PayPal: Stripe is the default subscription engine. PayPal is a checkout option, not a billing system. A meaningful slice of customers - LATAM, parts of Asia/EU, freelancer/creator markets, B2C audiences who don't want to hand over a card - will bounce if PayPal isn't there. The right shape is:
- Subscriptions ledger lives in your DB. Plan, status, period, seats - your tables, your truth.
- Stripe for cards / Apple Pay / Google Pay / SEPA / ACH (subscription billing via Stripe Billing).
- PayPal Subscriptions API wired as a parallel payment provider - same subscription row, different payment_provider column.
- One webhook handler per provider writing into the same idempotent state machine. Don't try to unify webhooks; unify the resulting state.
subscription (
  id UUID PK,
  workspace_id UUID,
  plan_id UUID,
  status TEXT,                   -- trialing | active | past_due | canceled
  payment_provider TEXT,         -- 'stripe' | 'paypal' | 'manual'
  provider_subscription_id TEXT, -- stripe sub_… / paypal I-…
  provider_customer_id TEXT,
  current_period_end TIMESTAMPTZ,
  cancel_at TIMESTAMPTZ NULL,
  ...
)
Skip PayPal until a real customer asks for it twice. Then add it behind a feature flag and offer it only on the plan-selection page.
9.2 Required Stripe surfaces
| Surface | Stripe product |
|---|---|
| Plan selection at signup | Stripe Checkout (hosted) |
| In-app upgrade/downgrade | Stripe Billing Portal (hosted) - or build your own using the API |
| Usage-based billing | Metered prices |
| Trials | Set trial_period_days on subscription |
| Discounts / coupons | Stripe coupons + promotion codes |
| Invoices, payment methods, receipts | Customer Portal handles all this for free |
9.3 The webhook contract
Subscribe to (at minimum):
- customer.subscription.created
- customer.subscription.updated
- customer.subscription.deleted
- invoice.paid
- invoice.payment_failed
- customer.updated
- checkout.session.completed
Idempotency rule: every webhook handler must be idempotent. Stripe will retry. Use the event.id as a dedup key.
9.4 Plan model
plan (id, name, stripe_price_id, monthly_price_cents, yearly_price_cents, features JSONB, limits JSONB)
subscription (id, workspace_id, stripe_subscription_id, stripe_customer_id, plan_id, status, current_period_end, cancel_at, ...)
usage_record (id, workspace_id, metric, quantity, recorded_at, billed_at)
features and limits should be JSONB so you can add new feature gates without migrations:
{
"features": { "sso": false, "audit_log_export": false, "custom_domains": false },
"limits": { "members": 10, "projects": 5, "ai_credits_per_month": 1000 }
}
9.5 Feature gating
// Single helper, used everywhere
if (!can(workspace, "feature.sso")) {
return upgradePrompt("SSO is available on the Team plan and above");
}
Every paywall is a can() check + a UI prompt. Never silently 403.
9.6 Metering
For usage-based pricing (AI credits, API calls, storage GB, …):
// In the request path, fast and non-blocking:
meter.Increment(ctx, workspaceID, "ai.tokens", n)
meter.Increment writes to Redis (incr counter) + buffers writes to Postgres / Stripe in the worker. Never call Stripe synchronously in the request path.
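A minimal sketch of that buffering meter, with an in-memory map standing in for the Redis counters (names are illustrative; a worker would periodically drain Flush() into Postgres and Stripe):

```go
package main

import "sync"

// meter buffers usage counts in memory (standing in for Redis INCR) and
// drains them in batches, the way a background worker would.
type meter struct {
	mu     sync.Mutex
	counts map[string]int64 // key: workspaceID + ":" + metric
}

func newMeter() *meter { return &meter{counts: map[string]int64{}} }

// Increment is the cheap, non-blocking call used in the request path.
func (m *meter) Increment(workspaceID, metric string, n int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.counts[workspaceID+":"+metric] += n
}

// Flush atomically swaps out the buffer; the caller ships the batch
// downstream (usage_record rows, Stripe metered usage, ...).
func (m *meter) Flush() map[string]int64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := m.counts
	m.counts = map[string]int64{}
	return out
}
```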
9.7 Dunning (failed payments)
- 1st failure: email "We couldn't charge your card."
- 3rd failure (~7 days): downgrade to free + email.
- 30 days unpaid: suspend workspace (read-only) + email.
- 60 days: hard-delete or hand to collections.
Stripe handles the retry schedule (Smart Retries) - you handle the in-app messaging.
9.8 Trials done right
- Length: 14 days is the cultural norm. Don't overthink it.
- Card upfront vs not: card-up-front filters tire-kickers (lower volume, higher conversion); no-card maximizes top-of-funnel. For B2B SaaS template, default to no-card with trial countdown banners.
- Trial extension: offer once, free, no questions. ("Need more time? Extend 7 days.")
- Trial expiration UX: read-only mode + upgrade banner. Don't delete data.
9.9 When you'd outgrow Stripe-direct: Merchant-of-Record platforms
Stripe leaves you responsible for global tax (VAT, GST, US state sales tax). Below ~$1M ARR or with US-only customers, that's fine. Beyond that, or if you sell into the EU/UK as a non-resident, the compliance overhead becomes a real cost - at which point a Merchant-of-Record (MoR) steps in as the seller: the customer buys from the MoR, the MoR buys from you, and the tax problem moves off your plate.
| Option | Type | Sweet spot | Watch out for |
|---|---|---|---|
| Paddle | Managed MoR | Established (founded 2012), broad payment-method coverage, good for B2B SaaS selling globally. | Higher fees than raw Stripe (~5% all-in vs ~2.9% + 30¢); less granular control over the checkout. |
| LemonSqueezy | Managed MoR (Stripe-owned since 2024) | Indie/SMB-friendly, simple pricing, good license-key + digital-product support. | Acquired by Stripe; the long-term roadmap may converge with Stripe Tax. |
| Polar | OSS + managed MoR | Open-source, developer-focused, optimized for indie hackers and dev-tool SaaS. Native usage-based billing, GitHub integration, customer benefits/perks built in. The right pick when you want an MoR that feels native to a dev-first product. | Younger than Paddle/LemonSqueezy; smaller ecosystem of integrations. Verify supported regions/payment methods match your market. |
| Stripe Tax (add-on, not MoR) | Managed | You stay the merchant of record but Stripe calculates and (in some jurisdictions) files tax for you. The middle ground. | Doesn't solve "non-resident seller of digital services in the EU": you're still the entity registered for VAT. |
Decision rule: stay on raw Stripe until tax compliance starts costing you 1+ engineer-week per quarter. Then go MoR. Polar is the right default for indie / dev-tool / open-core SaaS; Paddle/LemonSqueezy for broader B2B.
The same pattern as PayPal (§9.1): your subscription table is provider-agnostic; a `payment_provider TEXT` column distinguishes `stripe` / `paypal` / `polar` / `paddle`. Switching MoRs later is a webhook-handler swap, not a rewrite.
10. ποΈ Database Design Patterns
10.1 Conventions
- Singular table names (`user`, `issue`), matching Go struct naming.
- Every table has: `id` (UUID v7, sortable), `created_at`, `updated_at`, and `workspace_id` (if tenant-scoped). UUID v7 is sortable by time, giving you primary key + chronological order in one column.
- Soft delete: `deleted_at TIMESTAMPTZ NULL` with a partial unique index where `deleted_at IS NULL`.
- Append-only history tables for things that need provenance (audit log, billing events, webhooks).
10.2 Migrations
- Always forward. Never edit an applied migration. Create a new one to fix mistakes.
- Use `goose` or `golang-migrate` (Go; both fine: `golang-migrate` ships a CLI + library + Docker image and supports many DB drivers, `goose` has nicer Go-based migrations) / `alembic` (Python) / `prisma migrate` / `drizzle-kit` / Atlas (declarative, language-agnostic).
- Number them sequentially: `001_init.up.sql`, `002_add_invites.up.sql`, …
- Run automatically on deploy (with a deploy gate / dry-run for prod).
- Online migrations: never block writes on a hot table. Add the column nullable, backfill in batches, then add `NOT NULL` in a later migration.
10.3 Indexes that pay rent
- Every foreign key.
- Every `WHERE` clause column you actually filter on (run `EXPLAIN ANALYZE`).
- `(workspace_id, status, created_at DESC)` for typical "list X for tenant" queries.
- Partial indexes for soft delete: `WHERE deleted_at IS NULL`.
10.4 Transactions
- Wrap every multi-write operation in a transaction.
- Use the outbox pattern for cross-service events (see Β§13.3).
- Don't hold transactions open across HTTP/RPC calls. Read first, do external work, write fast.
10.5 Ergonomics
- Use sqlc (Go) / Prisma (TS) / SQLAlchemy 2.0 + Alembic (Python). Skip ORMs that hide SQL.
- Co-locate migrations and queries in the repo; check them in.
- Seed scripts for local dev that create realistic data (`make seed`).
11. π API Design
11.1 REST is the default; GraphQL is the exception
- REST + JSON for 90% of endpoints. Predictable, cacheable, debuggable.
- GraphQL if you have a complex, deeply-nested data graph and many client surfaces. Otherwise it's overhead.
- gRPC for service-to-service inside your infra.
11.2 Resource conventions
GET /api/v1/projects list
POST /api/v1/projects create
GET /api/v1/projects/:id read
PATCH /api/v1/projects/:id partial update (preferred over PUT)
DELETE /api/v1/projects/:id delete
GET /api/v1/projects/:id/issues sub-collection
POST /api/v1/projects/:id/issues create in sub-collection
11.3 Pagination
- Cursor-based (`?cursor=<opaque>&limit=50`), not offset. Offsets break under concurrent inserts.
- Return `{ items: [], next_cursor, has_more }`.
- Cap `limit` at 100.
11.4 Filtering & sorting
?status=open&priority=high&sort=-created_at&limit=50
Document supported filters per endpoint. Reject unknown query params; if you silently ignore them, typos never surface.
11.5 Error envelope (one shape, everywhere)
{
"error": {
"code": "validation_error",
"message": "Title is required",
"fields": { "title": "must not be empty" },
"request_id": "req_01HMZ..."
}
}
Include request_id in every response (header + body) so support can grep your logs.
11.6 Idempotency
- For `POST` endpoints that create resources or trigger side effects, accept an `Idempotency-Key` header.
- Cache `(workspace_id, idempotency_key) -> response` in Redis for 24h.
- Return the cached response on retry. Stripe's API is the canonical example.
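The core of the pattern, sketched with an in-memory map standing in for Redis (the `idempotencyStore` type and `Do` method are illustrative; production would use `SET` with a 24h TTL and handle concurrent in-flight requests):

```go
package main

import (
	"fmt"
	"sync"
)

// idempotencyStore caches (workspace_id, idempotency_key) -> response.
type idempotencyStore struct {
	mu    sync.Mutex
	cache map[string]string
}

func newIdempotencyStore() *idempotencyStore {
	return &idempotencyStore{cache: make(map[string]string)}
}

// Do runs fn at most once per (workspace, key); retries with the same key
// get the cached response instead of re-executing the side effect.
func (s *idempotencyStore) Do(workspaceID, key string, fn func() string) string {
	k := workspaceID + ":" + key
	s.mu.Lock()
	if resp, ok := s.cache[k]; ok {
		s.mu.Unlock()
		return resp
	}
	s.mu.Unlock()
	resp := fn()
	s.mu.Lock()
	s.cache[k] = resp // production: Redis SET with 24h expiry
	s.mu.Unlock()
	return resp
}

func main() {
	s := newIdempotencyStore()
	calls := 0
	create := func() string { calls++; return `{"id":"proj_1"}` }
	a := s.Do("ws_1", "idem-abc", create)
	b := s.Do("ws_1", "idem-abc", create) // retry: cached, fn not re-run
	fmt.Println(a == b, calls)            // true 1
}
```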
11.7 Rate limiting
- Per API key + per IP + per workspace.
- Token bucket in Redis (`INCR` + `EXPIRE`).
- Return `429` with a `Retry-After` header.
- Document limits in your API docs and surface them in the response headers (`X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`).
11.8 Versioning
- URL versioning (`/api/v1/`, `/api/v2/`): boring, works.
- Or header-based (`Accept: application/vnd.yourtool.v2+json`): fancy, more work.
- Never break v1 once published. Add v2 alongside.
11.9 OpenAPI
- Maintain a hand-written or generated OpenAPI 3.1 spec.
- Generate client SDKs from it (`openapi-generator`, `oapi-codegen`).
- Render docs with Stoplight / Redoc / Mintlify.
11.10 Webhooks (outgoing)
- Per-workspace endpoints registered in settings.
- Sign every payload: `X-Signature: sha256=<hmac(body, secret)>`.
- Include `X-Event-Id` (idempotency) and `X-Timestamp` (replay defense).
- Retry with exponential backoff (1m, 5m, 30m, 2h, 12h); fail and notify after the final retry.
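The signature scheme above is a hex-encoded HMAC-SHA256 over the raw body; a sketch of both sides (function names are illustrative; `whsec_example` is a made-up secret):

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign produces the X-Signature value the sender attaches.
func sign(body []byte, secret string) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verify is what the receiving workspace runs. hmac.Equal is
// constant-time, which matters for signature checks.
func verify(body []byte, secret, header string) bool {
	return hmac.Equal([]byte(sign(body, secret)), []byte(header))
}

func main() {
	body := []byte(`{"event":"issue.created"}`)
	h := sign(body, "whsec_example")
	fmt.Println(verify(body, "whsec_example", h)) // true
	fmt.Println(verify(body, "wrong_secret", h))  // false
}
```

Verify over the raw request bytes, before any JSON parsing: re-serializing the parsed body will not byte-match what was signed.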
12. βοΈ Background Jobs, Queues & Schedulers
12.1 Three job categories
| Category | Examples | Constraint |
|---|---|---|
| Async (fire-and-forget) | Send email, post to webhook, sync to CRM | Must be retried on failure |
| Scheduled | Daily reports, dunning emails, data exports | Must run within window, not on hot path |
| Long-running | Imports, AI batch jobs, video transcode | Need progress tracking + cancellation |
12.2 Job system
- Pick one library per language and stick to it.
- Go: River (Postgres-backed, transactional) or Asynq (Redis-backed).
- Python: Arq (asyncio + Redis) or Celery (mature, heavy).
- Node: BullMQ.
12.3 Idempotency
Every handler must tolerate being called twice. Use a (job_type, dedup_key) unique key, or check-then-act inside a transaction.
12.4 Outbox pattern
When you need "DB write + event emission" to be transactional:
BEGIN;
INSERT INTO "order" ...;  -- "order" is an SQL reserved word, hence the quotes
INSERT INTO outbox (event_type, payload) VALUES ('order.created', '...');
COMMIT;
A separate worker polls outbox, fires the event (queue / webhook / Stripe sync), marks it done.
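One safe shape for that polling worker uses `FOR UPDATE SKIP LOCKED` so multiple relay workers can run concurrently without double-claiming rows (the `processed_at` column and batch size are illustrative, not from the schema above):

```sql
-- Claim up to 100 unprocessed outbox rows; SKIP LOCKED means rows locked
-- by another worker are skipped rather than waited on.
BEGIN;
SELECT id, event_type, payload
FROM outbox
WHERE processed_at IS NULL
ORDER BY id
LIMIT 100
FOR UPDATE SKIP LOCKED;

-- (fire each event: queue / webhook / Stripe sync)

UPDATE outbox
SET processed_at = now()
WHERE id IN (...);  -- the ids claimed above
COMMIT;
```

Because the event fires before the `UPDATE` commits, a crash between the two re-delivers the event, so consumers must be idempotent (§12.3).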
12.5 Cron / scheduled jobs
- Use a single, deduplicated scheduler, not `cron` per box (you'll get duplicate runs on multi-instance deploys).
- Postgres-backed `pg_cron` or library-level (`robfig/cron` + leader election) work fine.
- Every scheduled job logs its run + duration to a `cron_run` table for visibility.
12.6 Long-running progress
For jobs the user can see ("Importing 50,000 contactsβ¦"):
- Persist a `job` row with `status`, `progress_pct`, `total`, `current`, `result`, `error`.
- Worker updates progress every N items / N seconds.
- UI polls `GET /jobs/:id` or subscribes via WS.
12.7 The tier above queues: durable execution engines
A queue (Asynq, BullMQ) gives you "run this function later, retry on failure." That's enough for 80% of SaaS work. But once your jobs become multi-step workflows that can pause for hours, fan-out and join, survive worker crashes mid-step, and need exactly-once guarantees end-to-end (think: subscription onboarding flow, multi-day customer pipeline, agent runs that pause for human approval), a queue starts to bend. You end up rebuilding state machines, sagas, and resumability on top of it. That's the signal to step up to a durable execution engine.
| Tool | Type | Sweet spot | Watch out for |
|---|---|---|---|
| Temporal | OSS, self-host or Temporal Cloud (managed) | The category leader. Workflows-as-code in Go/TS/Python/Java/.NET, deterministic replay, built-in retries/timeouts/heartbeats/sagas/signals/queries. The right pick for serious multi-step orchestration (billing flows, KYC, ETL pipelines, long-running agents; §18 of the AI playbook). | Operationally non-trivial: a Temporal cluster needs Cassandra/PostgreSQL + history service + matching service. Use Temporal Cloud (~$200/mo starter) until you have a reason not to. Workflow code must be deterministic, which is surprising at first. |
| Hatchet | OSS, Postgres-backed | Temporal-shaped (durable workflows, retries, fan-out, human-in-the-loop) but runs on just Postgres, no separate cluster. Excellent fit for teams that already have Postgres and don't want to operate Temporal. Python and TS SDKs, Go in progress. | Younger project, smaller ecosystem. Postgres becomes a hot bottleneck at very high workflow volume: fine for thousands/sec, not millions. |
| Inngest | Managed (OSS dev tools) | Step-functions-style workflows in TS/Python, focused on developer ergonomics and event-driven triggers. Best for serverless/Vercel-shaped stacks. | Less control if you self-host; managed pricing scales with executions. |
| Restate | OSS, single binary | Newer durable execution runtime focused on simplicity (single binary, deterministic) with TS/Java/Kotlin/Python/Go/Rust SDKs. Worth watching. | Smaller community than Temporal/Hatchet today. |
When to pick a durable execution engine over a queue:
- A workflow has three or more steps, any of which can be retried independently.
- A workflow needs to pause and wait: for an external webhook, a human approval, or a timer measured in hours/days.
- "If the worker crashes mid-step, the work must continue from exactly where it left off" is a real requirement, not a nice-to-have.
- You're writing your fourth state-machine table this quarter.
Recommendation by stage:
- Day one of the template: stick with the queue from Β§12.2. Don't import Temporal complexity before you need it.
- Year one, indie/bootstrapped: if you cross the threshold above, Hatchet is the path of least resistance; it slots into your existing Postgres.
- Year two, funded / enterprise: Temporal Cloud is the safe pick: battle-tested, audited, used by Uber/Snap/Netflix, deep tooling. The managed offering removes the operational pain.
The same Bus / Worker interface pattern from Β§4.4 applies: workflows are invoked through a thin adapter so swapping queues for Temporal later is a worker rewrite, not an API rewrite. AI agents in particular (long pause, human-in-the-loop, hours-long runs) are the canonical fit β see the AI playbook Β§18.
13. π‘ Real-time & Eventing
13.1 In-process event bus (the spine)
A simple synchronous publisher with topic-based listeners:
bus.Publish(ctx, "issue.created", IssueCreated{ID: ..., WorkspaceID: ...})
Listeners write derived state, enqueue jobs, and broadcast over WS.
Important: subscribers register before publishers. Document the order in main.go. Order is load-bearing.
13.2 WebSocket vs SSE
| Need | Use |
|---|---|
| Bidirectional (chat, collaborative editing) | WebSocket |
| Server-to-client only (live dashboards, notifications) | SSE (simpler, plays nice with HTTP/2) |
For most SaaS, SSE is enough. WebSocket only if you have meaningful client-to-server messaging beyond the auth handshake.
13.3 Multi-node fanout
Single API node: in-memory hub. Multi-node: backend hub publishes to a pub/sub bus, every node subscribes and forwards to its connected clients.
| Bus | When to pick it |
|---|---|
| Redis pub/sub | You already have Redis. Fire-and-forget. No durability: a disconnected node misses messages. |
| Redis Streams | Same Redis, but with replay + consumer groups. Good middle ground. |
| NATS JetStream | The right answer for any SaaS that's growing into multiple services. Persistent streams, replay, exactly-once-on-ack consumers, KV + object store, per-tenant subjects (ws.<workspace_id>.>), works as eventing backbone and WS fan-out and job queue. Cheap to self-host (single binary), clusters trivially. |
| Kafka / Redpanda | You have a data team and analytics pipelines. Overkill as a starting point. |
[Browser] --WS--> [API node A] --pub--> [NATS JetStream] --sub--> [API node B] --WS--> [Browser]
                                              |
                                              +--> [Worker pool] (durable consumers, replay on crash)
Why NATS JetStream is the recommended template default once you outgrow single-node:
- One binary replaces Redis pub/sub + a job queue + an event log.
- Per-tenant subject hierarchy (`tenant.<workspace_id>.events.>`) maps cleanly to multi-tenancy.
- Durable consumers give you the outbox-pattern guarantees (§12.4) without an outbox table for cross-service events.
- KV bucket for ephemeral state (presence, rate-limit counters), so you can drop Redis in some deployments.
Don't make any of this required for the dev/single-node experience. Single-node self-host should run on Postgres alone, with the bus interface no-op'd to an in-memory channel.
// Bus abstraction: same interface, different backends.
type Bus interface {
Publish(ctx context.Context, subject string, payload []byte) error
Subscribe(ctx context.Context, subject string, h Handler) (Subscription, error)
}
// inproc.NewBus() | redis.NewBus(rdb) | nats.NewJetStreamBus(js)
13.4 Realtime β Cache invalidation rule
WS events invalidate the query cache. They never write directly to client stores.
Why: WS messages can arrive out of order, can be dropped, can be replayed. Cache invalidation is idempotent; direct writes are not.
ws.on("issue.updated", ({ id }) => {
queryClient.invalidateQueries(["issue", id])
})
14. π¨ Email, Notifications & Inbox
14.1 Three notification surfaces
| Surface | Provider | Use for |
|---|---|---|
| Transactional email | Resend / Postmark / SES | Verify, reset, invite, receipts, dunning |
| In-app inbox | Your own DB | Mentions, comments, status changes, system messages |
| Push / SMS | Twilio / OneSignal / APNS | Mobile-only critical alerts |
14.2 Templates
- Use MJML or React Email for transactional templates. Renders to bulletproof HTML across clients.
- Keep one template per email type. Centralize a "layout" component.
- Plain-text fallback always.
14.3 Per-user preferences
notification_preference (
user_id, workspace_id, channel TEXT, event_type TEXT, enabled BOOL
)
Every email and in-app alert checks preferences before sending. Default new events to "on", but always allow opt-out with one click.
14.4 Unsubscribe link
- Every transactional email except security/billing has a `List-Unsubscribe` header + footer link.
- One-click unsubscribe (`mailto:` + URL).
- Persist the opt-out; don't re-send on bounce-back-then-recreate.
14.5 In-app inbox
Same data shape as email events. Render a bell icon with unread count + a list view. Keys:
- `notification` rows: `user_id`, `workspace_id`, `kind`, `payload JSONB`, `read_at`.
- WS push for live updates.
- Mark-all-read endpoint.
14.6 Digesting / batching
For high-volume events (chat mentions, comment replies):
- Real-time push if user is online.
- Otherwise, batch into a digest email (hourly/daily), configurable per user.
15. π¦ File Storage, Uploads & CDN
15.1 The cardinal rule
Never proxy file bytes through your API server. Client uploads directly to S3 via signed URL.
[Client] --GET /upload-url--> [API] --signed PUT URL--> [Client]
[Client] --PUT------------------------------------------> [S3]
[Client] --POST /confirm--> [API] (records metadata)
15.2 Server-issued signed URLs
url := s3.PresignPutObject(ctx, bucket, key, ttl=15min, contentType=..., maxSize=...)
Always set:
- TTL (15 min usually).
- `Content-Type` constraint.
- `Content-Length` max (defense against unbounded uploads).
- Tenant-scoped key prefix: `s3://your-bucket/<workspace_id>/<file_id>`.
15.3 File metadata
file (
id UUID PK,
workspace_id UUID,
uploader_user_id UUID,
filename TEXT,
mime_type TEXT,
size_bytes BIGINT,
s3_key TEXT,
sha256 TEXT,
status TEXT, -- pending | uploaded | scanned | quarantined
created_at TIMESTAMPTZ
)
15.4 Virus / content scanning
- For user-uploaded files, scan on upload (S3 event -> Lambda / worker -> ClamAV / proprietary).
- Until scanned, mark `status = pending` and refuse to serve.
15.5 Serving private files
- Generate signed GET URLs (5-60 min TTL), or
- Stream from server with auth check (only for small / sensitive files).
15.6 CDN
- Cloudflare or CloudFront in front of S3.
- Use signed CloudFront URLs for private content.
- Public assets (avatars, public docs) get a permanent path with cache-busting via content hash.
16. π Search (Full-Text + Semantic)
16.1 Start with Postgres
CREATE INDEX idx_issue_search ON issue
USING GIN (to_tsvector('english', title || ' ' || coalesce(content, '')));
pg_trgm adds typo tolerance:
CREATE INDEX idx_issue_title_trgm ON issue USING GIN (title gin_trgm_ops);
This carries you to ~10M rows easily.
16.2 Move to a search engine when you need
- Fuzzy search across many fields with relevance tuning: Meilisearch or Typesense (both excellent DX).
- Massive scale + analytics: Elasticsearch / OpenSearch.
- Replicate from Postgres via CDC (Debezium) or write-on-write triggers.
16.3 Vector / semantic search
CREATE EXTENSION vector;
ALTER TABLE document ADD COLUMN embedding vector(1536);
CREATE INDEX ON document USING hnsw (embedding vector_cosine_ops);
Generate embeddings via OpenAI / local model in a worker after content changes. Don't generate them in the request path.
16.4 Hybrid search
Combine BM25 (keyword) and vector (semantic) with reciprocal rank fusion:
score(doc) = 1/(k + rank_bm25) + 1/(k + rank_vector)
This dramatically beats either alone for product search.
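The fusion itself is a few lines; a sketch over two ranked ID lists, using the conventional k = 60 (document IDs here are made up for illustration):

```go
package main

import (
	"fmt"
	"sort"
)

// rrf fuses two ranked lists of doc IDs with reciprocal rank fusion.
// A doc absent from one list simply contributes nothing from that list.
func rrf(bm25, vector []string, k float64) []string {
	score := map[string]float64{}
	for rank, id := range bm25 {
		score[id] += 1.0 / (k + float64(rank+1))
	}
	for rank, id := range vector {
		score[id] += 1.0 / (k + float64(rank+1))
	}
	ids := make([]string, 0, len(score))
	for id := range score {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return score[ids[i]] > score[ids[j]] })
	return ids
}

func main() {
	fused := rrf(
		[]string{"doc_a", "doc_b", "doc_c"}, // keyword (BM25) ranking
		[]string{"doc_b", "doc_d", "doc_a"}, // semantic (vector) ranking
		60,
	)
	fmt.Println(fused[0]) // doc_b: ranks high in both lists, so it wins
}
```

Because only ranks matter, RRF needs no score normalization between BM25 and cosine distance, which is exactly why it is the default fusion choice.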
17. π© Feature Flags & Experiments
17.1 Three flag scopes
flag -> environment (dev/staging/prod)
     -> workspace   (tenant-level rollout)
     -> user        (individual override)
Every flag check resolves: env default -> workspace override -> user override.
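The resolution order is easy to get wrong in ad-hoc checks, so it is worth one shared function; a sketch with illustrative names (maps model "override set for this workspace/user", absence means "fall through"):

```go
package main

import "fmt"

// flagConfig holds one flag's env default plus per-workspace and
// per-user overrides.
type flagConfig struct {
	envDefault bool
	workspace  map[string]bool // workspaceID -> override
	user       map[string]bool // userID -> override
}

// resolve applies: env default, then workspace override, then user
// override. The most specific scope wins.
func (f flagConfig) resolve(workspaceID, userID string) bool {
	v := f.envDefault
	if ov, ok := f.workspace[workspaceID]; ok {
		v = ov
	}
	if ov, ok := f.user[userID]; ok {
		v = ov
	}
	return v
}

func main() {
	f := flagConfig{
		envDefault: false,
		workspace:  map[string]bool{"ws_1": true}, // tenant-level rollout
		user:       map[string]bool{"u_2": false}, // individual opt-out
	}
	fmt.Println(f.resolve("ws_1", "u_1")) // true  (workspace override)
	fmt.Println(f.resolve("ws_1", "u_2")) // false (user override wins)
	fmt.Println(f.resolve("ws_9", "u_9")) // false (env default)
}
```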
17.2 Use a service
- Self-host: PostHog, Unleash, GrowthBook.
- Hosted: LaunchDarkly, Statsig.
- DIY: a simple `flag` table + Redis cache; fine for up to ~50 flags.
17.3 The kill-switch culture
Every risky new feature ships behind a flag. Rule: "if it's not behind a flag, it can't ship."
if flags.IsEnabled(ctx, "new_billing_engine", workspaceID) {
return newPath()
}
return oldPath()
After 2 weeks of stable rollout: clean up the flag and the dead branch.
17.4 Experiments / A-B tests
Ship via the same flag system with a randomized assignment. Log assignment + outcome to your analytics warehouse. Decide significance with a stats library or PostHog's experiment view β don't eyeball.
18. π Audit Logs, Activity Feeds & Telemetry
18.1 Three different things, often confused
| Concept | Audience | Retention | Mutability |
|---|---|---|---|
| Audit log | Compliance / security teams | Years | Immutable, append-only |
| Activity feed | End users ("Alice changed the title") | Months | Mutable summaries OK |
| Telemetry / analytics | Your team (product/eng) | Months to years | Aggregated, anonymized |
Don't try to use one table for all three.
18.2 Audit log table
audit_log (
id UUID PK,
workspace_id UUID,
actor_user_id UUID NULL,
actor_type TEXT, -- user | api_key | system
action TEXT, -- "issue.delete", "billing.plan.change", "auth.login"
target_type TEXT,
target_id UUID,
metadata JSONB,
ip_address INET,
user_agent TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- never UPDATE or DELETE this table; partition by month
Log every privileged action: settings change, role change, billing change, member invite/remove, file deletion, login, password change, MFA enable/disable.
18.3 Activity feed
For end-user "what happened to my project":
activity (
id, workspace_id, actor_user_id, verb, object_type, object_id, metadata, created_at
)
Render with templates: "{actor} {verb} {object}".
18.4 Export
Enterprise plan users want audit log export (CSV / JSON / Splunk-compatible). Build the endpoint behind a feature flag.
(... to be continued...) Read Part 2 here https://viblo.asia/p/the-saas-template-playbook-part-2-2vJPdW2MJeK
If you found this helpful, let me know by leaving a π or a comment, or share it if you think it could help someone. Thank you very much! π
All Rights Reserved