# The data-driven thesis

xly is sold to many printing-industry customers, each of whom wants the ERP
to behave a little differently — different forms, different reports,
different approval rules, sometimes different stored procedures. The naive
solution is a fork per customer: copy the codebase, modify, deploy. That
is unmaintainable past two or three customers.

xly's solution is the opposite: **a single codebase, a single deployment,
and per-customer behaviour expressed as data**. The application's modules,
forms, fields, dropdowns, permissions, document numbering, even the URL
slugs are all rows in metadata tables (`gdsmodule`, `gdsconfigformmaster`,
`gdsconfigformslave`, `gdsroute`, `gdsjurisdiction`, `gdsformconst`, …).

The runtime is an *interpreter*. When a request comes in, the framework
loads the relevant rows, joins the user's tenant context onto them, and
renders the resulting form / list / report on demand. The Java code is
generic; the application's behaviour is in the database. PMs (not
engineers) own the metadata and therefore own the application.
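
The interpreter idea can be sketched in a few lines. This is a minimal illustration in Python with an in-memory SQLite database, not the actual Java runtime. The table names and the `gdsconfigformmaster.sParentId` to `gdsmodule.sId` link come from this page; every other column (`sName`, `sTenantId`, `sFieldName`, `sLabel`, `iOrder`), the slave-to-master link, and the sample rows are assumptions made up for the example.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a toy metadata schema. Column names other than sId/sParentId are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gdsmodule (sId TEXT PRIMARY KEY, sName TEXT);
        CREATE TABLE gdsconfigformmaster (
            sId TEXT PRIMARY KEY, sParentId TEXT, sTenantId TEXT);
        CREATE TABLE gdsconfigformslave (
            sId TEXT PRIMARY KEY, sParentId TEXT,
            sFieldName TEXT, sLabel TEXT, iOrder INTEGER);

        INSERT INTO gdsmodule VALUES ('mod-sales', 'Sales order');
        -- One form definition per tenant for the same module.
        INSERT INTO gdsconfigformmaster VALUES ('fm-a', 'mod-sales', 'tenant-a');
        INSERT INTO gdsconfigformmaster VALUES ('fm-b', 'mod-sales', 'tenant-b');
        INSERT INTO gdsconfigformslave VALUES ('s1', 'fm-a', 'sCustomer', 'Customer', 1);
        INSERT INTO gdsconfigformslave VALUES ('s2', 'fm-a', 'dAmount', 'Amount', 2);
        INSERT INTO gdsconfigformslave VALUES ('s3', 'fm-b', 'sCustomer', 'Client', 1);
        INSERT INTO gdsconfigformslave VALUES ('s4', 'fm-b', 'sPressLine', 'Press line', 2);
        INSERT INTO gdsconfigformslave VALUES ('s5', 'fm-b', 'dAmount', 'Amount', 3);
        """
    )
    return conn


def render_form(conn: sqlite3.Connection, module_id: str, tenant_id: str) -> list:
    """Generic code path: the form is whatever the metadata rows say it is."""
    rows = conn.execute(
        """
        SELECT s.sFieldName, s.sLabel
        FROM gdsconfigformmaster AS m
        JOIN gdsconfigformslave AS s ON s.sParentId = m.sId
        WHERE m.sParentId = ? AND m.sTenantId = ?
        ORDER BY s.iOrder
        """,
        (module_id, tenant_id),
    ).fetchall()
    return [{"field": field, "label": label} for field, label in rows]
```

Calling `render_form(conn, "mod-sales", "tenant-a")` and `render_form(conn, "mod-sales", "tenant-b")` runs the same function but yields different forms, because the difference between the tenants lives entirely in the rows.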

## The cost

Three costs are baked into this design, and they are worth stating explicitly:

1. **Per-request metadata reads.** On a cache miss, a page load runs
   five queries: `gdsconfigformmaster` (with personalize/customslave
   overlays for the matching slave rows), `gdsformconst`,
   `sysjurisdiction`, `sysbillnosettings`, and `sysreport`. The
   `sysjurisdiction` read fetches per-user grants and is skipped for
   ADMIN; its map key is named `gdsjurisdiction`, but the table actually
   read is `sysjurisdiction`. The runtime caches aggressively, but
   these reads are unavoidable on a miss.

2. **A schema that won't stop growing.** Each new module adds a row in
   `gdsmodule`, 1 to 50 rows in `gdsconfigformslave`, and a backing
   physical table (often one per document type). The base-table count
   climbs as more business modules are introduced; production tenants
   typically carry more tables than a clean dev schema, since every
   customer-bespoke module survives in the shared schema.

3. **Relationships are conventions, not constraints.** With FKs disabled
   for performance and migration agility, every join from
   `gdsconfigformmaster.sParentId` to `gdsmodule.sId` (and a hundred
   similar joins) is a [semantic FK](semantic-fk.md). Orphan rows are
   possible.
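
Because nothing in the database enforces these links, orphans have to be found by query. The sketch below shows one way to check the `gdsconfigformmaster.sParentId` to `gdsmodule.sId` relationship named above, using Python and an in-memory SQLite database as a stand-in for the real engine. Only the two tables and the two columns come from this page; the sample rows and the helper names are invented for illustration.

```python
import sqlite3


def find_orphan_form_masters(conn: sqlite3.Connection) -> list:
    """Return (sId, sParentId) for form masters whose parent module does not exist.

    A LEFT JOIN keeps every master row; rows with no matching module come
    back with g.sId NULL. Rows whose sParentId is itself NULL are reported too.
    """
    return conn.execute(
        """
        SELECT m.sId, m.sParentId
        FROM gdsconfigformmaster AS m
        LEFT JOIN gdsmodule AS g ON g.sId = m.sParentId
        WHERE g.sId IS NULL
        ORDER BY m.sId
        """
    ).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Toy data: one healthy form master and one whose module row is gone."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- No FOREIGN KEY clause, mirroring the "conventions, not constraints" design.
        CREATE TABLE gdsmodule (sId TEXT PRIMARY KEY);
        CREATE TABLE gdsconfigformmaster (sId TEXT PRIMARY KEY, sParentId TEXT);
        INSERT INTO gdsmodule VALUES ('mod-sales');
        INSERT INTO gdsconfigformmaster VALUES ('fm-1', 'mod-sales');
        INSERT INTO gdsconfigformmaster VALUES ('fm-2', 'mod-deleted');
        """
    )
    return conn
```

The same LEFT JOIN pattern would have to be repeated for each semantic FK, since the schema carries no declaration of which columns reference which tables.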

## What the design enables (and what each enabler still costs)

- **One codebase serves dozens of customers.** Each customer's tenant
  has its own metadata rows; the Java is identical. — *Limit:* it
  *doesn't* serve all customers. The 18 directories under
  `script/客户/` (see [Slice 5](../slices/05-customer-sql-override.md))
  are the wall the data-driven design hits — when a customer needs
  different procedural logic, "single codebase" stops being true and
  becomes "single Java codebase + a fan-out of customer-specific SQL
  the database carries silently".
- **PMs evolve the application without engineering time.** They open
  BACK, add a module, define a form, set permissions, and the next user
  load shows the change. — *Limit:* the PM's effective vocabulary is
  whatever `gdsconfigformmaster` / `gdsconfigformslave` columns
  expose. Anything genuinely new (a custom calculation, a non-standard
  validation, a different save path) requires a stored procedure —
  which takes engineering time again, just in SQL instead of Java. And
  PMs without DB access can't reason about why their metadata change
  produced wrong output, because the procedural side is invisible from
  BACK.
- **Customizations are layered "cleanly"** ([Slice 4](../slices/04-custom-field.md)):
  per-tenant overrides sit *on top of* the shared base without forking.
  — *Limit:* the cleanliness is a Java-side property. The runtime
  merge logic in `BusinessBaseServiceImpl` is non-trivial (3,500+
  lines), and debugging "why does this tenant see field X but not Y"
  means chasing through `gdsconfigformpersonalize` +
  `gdsconfigformcustomslave` + `gdsconfigformuserslave` interactions.
  And the overlay model can't `ALTER TABLE` — adding a real new
  column still needs a coordinated schema migration.
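
To make the overlay debugging problem concrete, here is a deliberately simplified model of layered overrides in Python. It is not the logic in `BusinessBaseServiceImpl`. The layer order (base, then personalize, then customslave, then userslave), the `label` and `visible` attributes, and the field names are all assumptions chosen for the example. The sketch also records which layer last set each attribute, which is the information a "why is field X hidden" investigation needs.

```python
from typing import Dict, List, Tuple

FieldAttrs = Dict[str, object]
Layer = Tuple[str, Dict[str, FieldAttrs]]  # (layer name, field name -> partial attrs)


def merge_layers(layers: List[Layer]):
    """Apply layers in order; later layers override earlier ones per attribute."""
    merged: Dict[str, FieldAttrs] = {}
    provenance: Dict[Tuple[str, str], str] = {}
    for layer_name, fields in layers:
        for field_name, attrs in fields.items():
            target = merged.setdefault(field_name, {})
            for key, value in attrs.items():
                target[key] = value
                # Remember which layer last wrote this attribute.
                provenance[(field_name, key)] = layer_name
    return merged, provenance


def visible_fields(merged: Dict[str, FieldAttrs]) -> List[str]:
    """Fields are visible unless some layer set visible=False."""
    return sorted(name for name, attrs in merged.items() if attrs.get("visible", True))


# Hypothetical layer order and contents, for illustration only.
LAYERS: List[Layer] = [
    ("gdsconfigformslave", {
        "sCustomer": {"label": "Customer", "visible": True},
        "dAmount": {"label": "Amount", "visible": True},
    }),
    ("gdsconfigformpersonalize", {"dAmount": {"visible": False}}),
    ("gdsconfigformcustomslave", {"sPressLine": {"label": "Press line"}}),
    ("gdsconfigformuserslave", {"sCustomer": {"label": "Client"}}),
]
```

With this data, `merge_layers(LAYERS)` hides `dAmount`, and the provenance map attributes that to the personalize layer. Even in this toy form, the answer depends on three tables plus the base rows, which is the debugging cost described above.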

A more candid reading: the data-driven design **shifts complexity
out of Java and into the database and the PM-built metadata**. The
total complexity isn't lower; it's redistributed to people and tools
the framework can't compile-check.

## When it breaks down

Data-driven works until a customer needs behaviour that can't be expressed
as metadata — different SQL, different procedure body, an aggregation rule
that doesn't fit the framework's vocabulary. xly's response is the
[per-customer SQL override channel](../slices/05-customer-sql-override.md):
hand-written SQL committed to `script/客户/<customer>/` and applied
directly to that customer's schema, bypassing the framework entirely.

It's worth being blunt about what this means. "Bypassing the framework"
makes the entire data-driven thesis a *partial* property of the system.
For the 18 customers under `script/客户/` the runtime is **no longer
single-codebase** — the Java is shared but the actual proc bodies
running on each customer's DB diverge, with no automated way to
detect drift. A reviewer reading `Sp_SalSalesCheck` in source has no
guarantee it's what runs in production for any given customer. The
"escape hatch" framing is generous; in practice the override channel
has become the standard answer for material business-logic
differences, which is the failure mode the data-driven design was
supposed to prevent.
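
For a sense of what drift detection would involve, the following Python sketch compares a fingerprint of a procedure body in source against the bodies deployed per customer. It is purely illustrative: no such check exists in xly as far as this page describes, the customer names and procedure text are placeholders, and how the deployed bodies would be read from each customer database is left out because it depends on the database engine.

```python
import hashlib
from typing import Dict, List


def fingerprint(proc_body: str) -> str:
    """Hash a procedure body after collapsing whitespace.

    Crude normalisation: it also collapses whitespace inside string
    literals, so it can miss differences that only affect literals.
    """
    normalized = " ".join(proc_body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_drift(source_body: str, deployed_bodies: Dict[str, str]) -> List[str]:
    """Return the customers whose deployed body differs from the source body."""
    expected = fingerprint(source_body)
    return sorted(
        customer
        for customer, body in deployed_bodies.items()
        if fingerprint(body) != expected
    )


# Placeholder procedure text; the real Sp_SalSalesCheck body is not shown on this page.
SOURCE = "CREATE PROCEDURE Sp_SalSalesCheck() BEGIN SELECT 1; END"
DEPLOYED = {
    "customer-a": "CREATE PROCEDURE Sp_SalSalesCheck()\n  BEGIN SELECT 1; END",
    "customer-b": "CREATE PROCEDURE Sp_SalSalesCheck() BEGIN SELECT 2; END",
}
```

Here `find_drift(SOURCE, DEPLOYED)` reports only `customer-b`, since `customer-a` differs from source in whitespace alone. A real check would also need to decide whether an intentional override under `script/客户/` counts as drift or as the expected state for that customer.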

## What this means for reading the wiki

Every slice in this wiki documents one *application of the thesis*. Slice 1
is the metadata read on a CRUD module, the canonical instance. Slice 2 is
multi-tenant scoping through every layer. Slice 3 is the read-only /
view-backed variant. Slice 4 is the customization overlay. Slice 5 is the
escape hatch when the overlay isn't enough. Together they cover the
data-driven design from its centre out.