Commit ab4cf6c6c484a4b0d35edbd4966cd96b564fcc4b
1 parent: ff2ee55a
docs: en wiki — strip apologetic framing; state design choices with their drawbacks
Editorial pass per user direction: stop justifying the architecture.
For every "why this design works" passage, name the costs the design
imposes — not as a parenthetical aside but as substantive critical
analysis. Each major architectural-claim page now carries an explicit
drawbacks/costs section.
Pages revised:
concepts/thesis.md
- "The reward" → "What the design enables (and what each enabler still costs)":
for each promised benefit (single codebase, PMs evolve without
engineering, customisations layered cleanly), name the limit. Added
closing observation that data-driven design redistributes complexity
to people and tools the framework can't compile-check.
- "When it breaks down": rewrote to call out that "bypassing the
framework" via 18 customer dirs makes the data-driven thesis
partial, not complete.
concepts/semantic-fk.md
- "Why xly disabled FKs": added critical analysis. Both reasons
could be addressed surgically; the chosen "no FKs anywhere" is the
trade for DB-enforced integrity, paid every day the system runs.
concepts/master-slave.md
- "Slave naming caveat": stop framing retention as wise pragmatism.
The naming was a poor choice; preservation has a real ongoing cost.
concepts/modules-forms-vtables.md
- "Three nouns, one engine": the universal dispatch path concentrates
3,500+ lines + edge cases + special-case hardcodes in one class.
Naming the trade.
concepts/multi-tenancy.md
- "How the design scales" → "How the design scales — and where it
doesn't": shared schema = shared contention; tenant-filter index
discipline; no physical hard-delete; rigid (sBrandsId,
sSubsidiaryId) tenancy unit.
concepts/customization-channels.md
- Soften "90%+ should live here" claim — that's an aspirational
target, not a measured fact. The 18 customer override directories
are evidence the channel-2 demand is non-trivial.
concepts/api-surface.md
- "Why three tiers, not one" → "Why three tiers (and what splitting
them costs)": three WARs to deploy, duplicate code, no shared
session, three reverse-proxy entries. Note the alternative
(single-WAR with package boundaries) and what that would cost
vs gain.
reference/maintainer/proc-dispatch.md
- "Why dynamic proc dispatch matters": added five concrete costs
(no compile-time check, no type safety, no call-site discoverability,
no static analysis, broken stack traces). Reframed: dynamic
dispatch made it cheap to keep adding procs, which made the pile
grow, which made the pile harder to audit.
reference/maintainer/cache-invalidation.md
- New "Drawbacks of this design" section: confusing co-named systems,
eviction in same transaction as write (silent corruption on
Redis outage), allEntries=true blunt eviction, no batching,
direct DB writes bypass everything. Also fixed the "if cache is
local" hedge in section 3 (we've now empirically confirmed Redis-
backed, so cache is shared).
reference/maintainer/bi-engine.md
- New "Drawbacks of the homebrewed approach" section: every chart
needs a SQL author, charts run heavy SQL on OLTP DB, no semantic
consistency between charts, no drill-down, customer-divergent KPI
logic. Also dedup'd the duplicated "What this is not" section.
reference/maintainer/sql-templates.md
- "Why this is a 'template' library and not a code generator" →
added costs: no enforcement, no regeneration, no template-origin
tracking, customer overrides drift from scaffold. The 1,687 procs
the schema carries are the evidence that "discipline rather than
enforcement" doesn't fully hold.
reference/maintainer/activiti.md
- "Why this design works for xly's audience" → "Why xly avoided
Activiti — and what that costs": scattered workflow logic, no
central audit trail, no parallel-branch/reassignment, invisible
flow-graph evolution, idle Activiti engine paying boot cost
anyway.
- "Why xly bothered with Activiti at all" → "Why xly bothered with
Activiti — and whether it was worth it": named the costs (second
engine, second schema, second auth surface, modeler UI to learn)
and the damning fact that on this dev DB the engine is idle. A
future cleanup could plausibly remove Activiti entirely.
reference/maintainer/runtime.md
- New "What 'universal CRUD' means in practice" section: 3,500-line
single-point-of-failure class, no type system on Map<String,Object>,
poor discoverability ("what endpoints write to table X" is
unanswerable). The trade: adding a module is essentially free,
touching the runtime essentially never is.
- Updated cache-invalidation cross-link to drop the "open question"
hedge (now empirically resolved).
slices/04-custom-field.md
- "Why it works without code changes" → "Why it works without code
changes — and what that costs": merge runs on every request,
three near-empty tables on every schema, display-only extension
(real persisted fields still need ALTER TABLE), debuggability
requires diffing 3 overlay tables.
slices/05-customer-sql-override.md
- Added drawbacks: no version control on the deployed body, no
type-safety bridge, compounds the BI problem. Reframed the
"right rule of thumb": 18 customer override directories suggest
the channel-2 demand is structural, not exceptional — that's
evidence the metadata model isn't expressive enough, not a
celebration of the escape hatch.
slices/06-hardware.md
- "The cleanest story xly tells about an awkward problem" →
removed the "cleanest" framing. Added costs of "DB as the only
contract": no backpressure, no request/response, bridge-side
state invisible to the framework, three layers of polling
multiply latency, hardest code (byte protocols) gets least CI.
A real-time-aware architecture would use streaming end-to-end;
xly's choice trades latency, observability, flow control for
operational simplicity. Liveable for press tempo, not for
faster shop-floor signals.
Showing 16 changed files with 499 additions and 126 deletions
en/docs/concepts/api-surface.md
| @@ -17,22 +17,51 @@ database is what makes their separation work — internal-API writes show | @@ -17,22 +17,51 @@ database is what makes their separation work — internal-API writes show | ||
| 17 | up to external-API reads automatically because both run against the | 17 | up to external-API reads automatically because both run against the |
| 18 | same schema. | 18 | same schema. |
| 19 | 19 | ||
| 20 | -## Why three tiers, not one | 20 | +## Why three tiers (and what splitting them costs) |
| 21 | 21 | ||
| 22 | -Each tier answers a different question, and bundling them would | ||
| 23 | -sacrifice clarity: | 22 | +Each tier was originally split off to answer a different question: |
| 24 | 23 | ||
| 25 | - **Internal** is large (universal CRUD over all metadata-driven | 24 | - **Internal** is large (universal CRUD over all metadata-driven |
| 26 | modules), volatile (changes when the framework changes), and | 25 | modules), volatile (changes when the framework changes), and |
| 27 | intentionally untyped (the SPA decides what to ask for, server obeys). | 26 | intentionally untyped (the SPA decides what to ask for, server obeys). |
| 28 | - **External** is curated (only the endpoints integrators are allowed | 27 | - **External** is curated (only the endpoints integrators are allowed |
| 29 | to use), versioned by `sApiCode`, and authenticated with bearer | 28 | to use), versioned by `sApiCode`, and authenticated with bearer |
| 30 | - tokens — it survives across framework changes precisely because it's | ||
| 31 | - small and explicit. | 29 | + tokens. |
| 32 | - **Inbound webhooks** receive untrusted bodies from third-party | 30 | - **Inbound webhooks** receive untrusted bodies from third-party |
| 33 | systems and route them to xly handlers. The Swagger UI lives here | 31 | systems and route them to xly handlers. The Swagger UI lives here |
| 34 | because that audience benefits most from interactive documentation. | 32 | because that audience benefits most from interactive documentation. |
| 35 | 33 | ||
| 34 | +The split has real costs that the wiki should not gloss over: | ||
| 35 | + | ||
| 36 | +- **Three WARs to deploy, monitor, and version-pin.** A new release | ||
| 37 | + has to ship coordinated builds of `xlyEntry`, `xlyApi`, and | ||
| 38 | + `xlyInterface`. Mismatches (e.g., a schema change in `xlyEntry` | ||
| 39 | + that `xlyApi` hasn't picked up) are silent until the call path | ||
| 40 | + hits them. | ||
| 41 | +- **Duplicate code.** `RequestAddParamUtil` exists in both | ||
| 42 | + `xlyPersist` (for `xlyEntry`) and `xlyApi` (near-identical 56-vs-57 | ||
| 43 | + line copy). `InterfaceController` exists in both `xlyApi` and | ||
| 44 | + `xlyInterface` with overlapping `/interfaceDefine/callthirdparty/*` | ||
| 45 | + endpoints. Keeping the two halves in sync is operational discipline, | ||
| 46 | + not a compile-time guarantee. | ||
| 47 | +- **No shared session.** A user authenticated in BACK has no | ||
| 48 | + session in `xlyApi` — external callers fetch a separate bearer | ||
| 49 | + token. This is correct for *external* integrators but means | ||
| 50 | + internal cross-WAR calls (rare in practice, common in temptation) | ||
| 51 | + have to go through the public token flow. | ||
| 52 | +- **Three context-paths means three reverse-proxy entries.** | ||
| 53 | + The mapping from `BACK=:8597` and `FROUNT=:8598` to the actual | ||
| 54 | + WARs lives in nginx config that isn't in this repo. Misconfigured | ||
| 55 | + proxies are a common failure mode the codebase can't catch. | ||
| 56 | + | ||
| 57 | +Could the split have been a single deployable with internal package | ||
| 58 | +boundaries? Yes — Spring Boot supports it. The benefit of that | ||
| 59 | +alternative would be: one build, one set of dependencies, one | ||
| 60 | +session story, no duplicate utility classes. The cost: harder to | ||
| 61 | +scale tiers independently, harder to rate-limit external callers | ||
| 62 | +without affecting the SPA. xly chose the deployment-time isolation; | ||
| 63 | +the wiki's job is to acknowledge what that choice traded away. | ||
| 64 | + | ||
| 36 | ## What each tier looks like at runtime | 65 | ## What each tier looks like at runtime |
| 37 | 66 | ||
| 38 | - **Internal** — see [the five-key read](../reference/maintainer/runtime.md#the-five-key-read). One | 67 | - **Internal** — see [the five-key read](../reference/maintainer/runtime.md#the-five-key-read). One |
en/docs/concepts/customization-channels.md
| @@ -24,8 +24,12 @@ They are visible in the BACK UI so a PM can audit them. The framework's | @@ -24,8 +24,12 @@ They are visible in the BACK UI so a PM can audit them. The framework's | ||
| 24 | runtime reads them on every request (with caching). The Java code is | 24 | runtime reads them on every request (with caching). The Java code is |
| 25 | unchanged; the application's behaviour is what those rows say it is. | 25 | unchanged; the application's behaviour is what those rows say it is. |
| 26 | 26 | ||
| 27 | -This is the default path. **90%+ of customer customizations should live | ||
| 28 | -here.** | 27 | +This is the path the architecture intends customers to use. Whether |
| 28 | +the actual ratio is 90/10 in favour of Channel 1 isn't measured | ||
| 29 | +anywhere; the empirical signal is that 18 customer directories | ||
| 30 | +under `script/客户/` exist, which is a non-trivial slice of the | ||
| 31 | +customer base needing what Channel 1 can't express. Take "90%+ | ||
| 32 | +should live here" as an aspirational target, not a measured fact. | ||
| 29 | 33 | ||
| 30 | ## Channel 2 — Per-customer SQL overrides | 34 | ## Channel 2 — Per-customer SQL overrides |
| 31 | 35 |
en/docs/concepts/master-slave.md
| @@ -64,13 +64,16 @@ appears verbatim in 14k+ table and column names. | @@ -64,13 +64,16 @@ appears verbatim in 14k+ table and column names. | ||
| 64 | ## "Slave" — naming caveat | 64 | ## "Slave" — naming caveat |
| 65 | 65 | ||
| 66 | The term carries connotations in English that are absent from the | 66 | The term carries connotations in English that are absent from the |
| 67 | -Chinese 主表 / 从表. The wiki preserves "slave" because: | 67 | +Chinese 主表 / 从表. The wiki preserves "slave" verbatim because the |
| 68 | +codebase, schema, and auto-catalog use it in 14k+ identifiers and any | ||
| 69 | +translation would diverge from what developers actually grep. | ||
| 68 | 70 | ||
| 69 | -1. Renaming would break every cross-reference into the codebase, the | ||
| 70 | - schema, and the auto-catalog (14k+ identifiers). | ||
| 71 | -2. Mapping every occurrence to "detail" or "child" would distort | ||
| 72 | - searchability and produce wiki text that diverges from what | ||
| 73 | - developers actually grep. | ||
| 74 | - | ||
| 75 | -Future xly versions may rebrand to "detail" / "header"; until then, the | ||
| 76 | -wiki uses the in-codebase term verbatim and notes it once here. | 71 | +That preservation has a cost. The naming was a poor choice in the |
| 72 | +first place — `主表 / 从表` translates straightforwardly as | ||
| 73 | +`master / detail` or `header / line`, both of which would have | ||
| 74 | +matched both English convention and the actual relational | ||
| 75 | +semantics. The cost of retaining "slave" is borne by every English- | ||
| 76 | +speaking maintainer who has to type or read the term, and by any | ||
| 77 | +future rebrand effort that has to do the schema-wide rename xly | ||
| 78 | +should have done at the start. The wiki documenting it once here | ||
| 79 | +doesn't remove the cost; it just acknowledges it. |
en/docs/concepts/modules-forms-vtables.md
| @@ -87,10 +87,21 @@ sub-tabs. | @@ -87,10 +87,21 @@ sub-tabs. | ||
| 87 | ## Three nouns, one engine | 87 | ## Three nouns, one engine |
| 88 | 88 | ||
| 89 | The runtime — `BusinessBaseController` and `BusinessBaseServiceImpl`, | 89 | The runtime — `BusinessBaseController` and `BusinessBaseServiceImpl`, |
| 90 | -documented in [Slice 1](../slices/01-hello-world.md) — knows how to | ||
| 91 | -render any module / form / virtual-table combination. There is no | ||
| 92 | -per-module Java code. PMs creating new modules are creating new rows; | ||
| 93 | -they are not creating new code paths. | 90 | +documented in [Slice 1](../slices/01-hello-world.md) — handles every |
| 91 | +module / form / virtual-table combination through one universal | ||
| 92 | +dispatch path. There is no per-module Java; PMs creating new modules | ||
| 93 | +are creating new rows. | ||
| 94 | + | ||
| 95 | +The flip side: that one engine has accumulated 3,500+ lines in | ||
| 96 | +`BusinessBaseServiceImpl` alone, plus another 800+ in | ||
| 97 | +`BusinessGdsconfigformsServiceImpl`. Edge cases, special-case | ||
| 98 | +table handling (e.g., the `mftproductionplanslave` hardcode at | ||
| 99 | +`BusinessBaseServiceImpl.java:1768`), per-tenant overlay merge | ||
| 100 | +logic, and the multi-tenant scope-bypass list all live in this | ||
| 101 | +single class. Adding a new feature that the universal dispatch | ||
| 102 | +doesn't handle means either expanding this class or writing | ||
| 103 | +custom code that bypasses it — both of which erode the "one | ||
| 104 | +engine handles everything" property the design promised. | ||
| 94 | 105 | ||
| 95 | ## Business-data table prefixes | 106 | ## Business-data table prefixes |
| 96 | 107 |
en/docs/concepts/multi-tenancy.md
| @@ -69,7 +69,7 @@ tenant. That's the catastrophic data-leak case. Three places to watch: | @@ -69,7 +69,7 @@ tenant. That's the catastrophic data-leak case. Three places to watch: | ||
| 69 | doesn't validate the supplied table against the form's authorised | 69 | doesn't validate the supplied table against the form's authorised |
| 70 | tables, this is a privilege-escalation surface. See [Slice 1](../slices/01-hello-world.md#4-user-edits-a-row-clicks-save). | 70 | tables, this is a privilege-escalation surface. See [Slice 1](../slices/01-hello-world.md#4-user-edits-a-row-clicks-save). |
| 71 | 71 | ||
| 72 | -## How the design scales | 72 | +## How the design scales — and where it doesn't |
| 73 | 73 | ||
| 74 | The framework's multi-tenancy design scales by **row count**, not by | 74 | The framework's multi-tenancy design scales by **row count**, not by |
| 75 | code. A small SaaS deployment with one brand and one subsidiary uses | 75 | code. A small SaaS deployment with one brand and one subsidiary uses |
| @@ -77,3 +77,31 @@ exactly the same Java, MyBatis mappers, and stored procedures as a | @@ -77,3 +77,31 @@ exactly the same Java, MyBatis mappers, and stored procedures as a | ||
| 77 | deployment with dozens of brands × dozens of subsidiaries × several | 77 | deployment with dozens of brands × dozens of subsidiaries × several |
| 78 | editions; only the row distributions in `gdsmodule`, `sisversionflow`, | 78 | editions; only the row distributions in `gdsmodule`, `sisversionflow`, |
| 79 | and the business-data tables differ. | 79 | and the business-data tables differ. |
| 80 | + | ||
| 81 | +Scaling by row count is operationally simple but has limits the | ||
| 82 | +wiki should not paper over: | ||
| 83 | + | ||
| 84 | +- **Shared physical schema means shared resource contention.** | ||
| 85 | + Every tenant's queries hit the same MySQL instance, same tables, | ||
| 86 | + same indexes. A heavy report on tenant A's data competes for | ||
| 87 | + buffer-pool space and CPU with tenant B's order entry. There is | ||
| 88 | + no per-tenant resource isolation. | ||
| 89 | +- **Tenant filters in every WHERE clause.** Every read query | ||
| 90 | + carries `sBrandsId = ? AND sSubsidiaryId = ?`. Indexes have to | ||
| 91 | + lead with these columns to be useful — and almost all xly tables | ||
| 92 | + do, by convention, but a maintainer adding a new index has to | ||
| 93 | + remember. Forgetting produces a query plan that scans across all | ||
| 94 | + tenants' rows and silently slows down once the table gets large. | ||
| 95 | +- **No physical hard-delete boundary.** A tenant offboarding does | ||
| 96 | + not drop a database; it leaves the rows where they are | ||
| 97 | + (sometimes marked `bInvalid`, sometimes deleted, sometimes | ||
| 98 | + untouched). Permanent removal requires a custom cleanup script | ||
| 99 | + per tenant. From a GDPR / data-residency angle, "this tenant | ||
| 100 | + is gone" is hard to prove. | ||
| 101 | +- **`sBrandsId` / `sSubsidiaryId` everywhere is an inflexible | ||
| 102 | + tenancy unit.** "Tenant" means exactly the `(sBrandsId, | ||
| 103 | + sSubsidiaryId)` tuple. Alternate cuts (e.g., per-region access, | ||
| 104 | + per-department access without sub-tenanting) don't fit the model | ||
| 105 | + and would require parallel scoping columns. The model assumed | ||
| 106 | + this shape would always be right for every customer; in | ||
| 107 | + practice, it has been, but it's a hard commitment. |
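Note on the "Tenant filters in every WHERE clause" bullet above: the index-discipline point can be checked mechanically. What follows is a minimal sketch, not xly's tooling. It uses SQLite in place of MySQL so that it runs standalone, and the table `demo_order`, the column `tCreateDate`, and both index names are hypothetical; only `sBrandsId` and `sSubsidiaryId` come from the wiki text. It compares the query plan for a tenant-filtered read before and after an index that leads with the tenant columns exists.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # column 3 holds the human-readable step


conn = sqlite3.connect(":memory:")
# Hypothetical business table; only the two tenant columns are from the wiki.
conn.execute(
    "CREATE TABLE demo_order ("
    " sId TEXT PRIMARY KEY,"
    " sBrandsId TEXT, sSubsidiaryId TEXT, tCreateDate TEXT)"
)

QUERY = "SELECT * FROM demo_order WHERE sBrandsId = ? AND sSubsidiaryId = ?"
ARGS = ("brand-1", "sub-1")

# An index that does NOT lead with the tenant columns cannot serve the filter.
conn.execute("CREATE INDEX idx_created ON demo_order (tCreateDate)")
plan_without = plan(conn, QUERY, ARGS)

# An index that leads with the tenant tuple turns the scan into a search.
conn.execute(
    "CREATE INDEX idx_tenant_created"
    " ON demo_order (sBrandsId, sSubsidiaryId, tCreateDate)"
)
plan_with = plan(conn, QUERY, ARGS)

print(plan_without)
print(plan_with)
```

On MySQL the equivalent check would be `EXPLAIN` on the same statement; the plan text differs, but the full-scan versus index-lookup distinction is the same idea.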
en/docs/concepts/semantic-fk.md
| @@ -13,7 +13,7 @@ before reading any further. | @@ -13,7 +13,7 @@ before reading any further. | ||
| 13 | 13 | ||
| 14 | ## Why xly disabled FKs | 14 | ## Why xly disabled FKs |
| 15 | 15 | ||
| 16 | -Two reasons given by the architecture, both pragmatic: | 16 | +Two reasons the architecture gives: |
| 17 | 17 | ||
| 18 | 1. **Bulk-write performance.** Mass inserts (work-order calculation, | 18 | 1. **Bulk-write performance.** Mass inserts (work-order calculation, |
| 19 | month-end closures, batch imports) write hundreds of thousands of | 19 | month-end closures, batch imports) write hundreds of thousands of |
| @@ -23,8 +23,28 @@ Two reasons given by the architecture, both pragmatic: | @@ -23,8 +23,28 @@ Two reasons given by the architecture, both pragmatic: | ||
| 23 | 2. **Schema-migration agility.** xly evolves quickly: new modules, | 23 | 2. **Schema-migration agility.** xly evolves quickly: new modules, |
| 24 | new fields, new tables. With FKs, every schema change has to consider | 24 | new fields, new tables. With FKs, every schema change has to consider |
| 25 | the constraint graph; without them, a `CREATE TABLE` or `ALTER TABLE` | 25 | the constraint graph; without them, a `CREATE TABLE` or `ALTER TABLE` |
| 26 | - is a local operation. The cost of that agility is borne at runtime | ||
| 27 | - by the application code. | 26 | + is a local operation. |
| 27 | + | ||
| 28 | +Both are real considerations, but neither is a slam-dunk argument for | ||
| 29 | +"zero FKs across the entire schema": | ||
| 30 | + | ||
| 31 | +- **Bulk-write performance** can be addressed surgically: disable | ||
| 32 | + constraints during the batch (`SET FOREIGN_KEY_CHECKS = 0`), | ||
| 33 | + re-enable after, validate. xly's choice was instead to not have | ||
| 34 | + FKs at all, which means *every* read also pays the cost of trusting | ||
| 35 | + ad-hoc proc validation rather than DB-enforced integrity. | ||
| 36 | +- **Schema-migration agility** is improved by no-FKs, but at the | ||
| 37 | + price of moving every referential check into application code (or | ||
| 38 | + forgetting it). In practice this means the integrity work an FK | ||
| 39 | + would do automatically is now duplicated across hundreds of stored | ||
| 40 | + procedures, with no compile-time guarantee any given proc actually | ||
| 41 | + does the check (see Failure modes below). | ||
| 42 | + | ||
| 43 | +A more honest framing: the system traded **DB-enforced integrity** | ||
| 44 | +for **operational convenience at write time and DDL time**. The | ||
| 45 | +bug surface that trade introduced (orphan rows, cross-tenant | ||
| 46 | +references that go undetected, integrity bugs surfacing weeks | ||
| 47 | +later) is the cost paid every day the system runs. | ||
| 28 | 48 | ||
| 29 | ## What a "semantic FK" is | 49 | ## What a "semantic FK" is |
| 30 | 50 |
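Note on the "Bulk-write performance" bullet above: the "validate" step has to be explicit, because in MySQL turning `FOREIGN_KEY_CHECKS` back on does not re-scan rows written while it was off. The same kind of query also serves as an orphan audit for a schema that has no FKs at all. A minimal sketch under stated assumptions: it runs on SQLite so it is self-contained, and the tables `demo_master` / `demo_slave` and the columns `sId` / `sParentId` are hypothetical stand-ins, not names taken from the xly schema.

```python
import sqlite3


def find_orphans(conn, child, fk_column, parent, pk_column="sId"):
    """Return ids of child rows whose fk_column matches no parent row.

    Table and column names are interpolated into the SQL text, so this
    must only ever be called with trusted, hard-coded identifiers.
    """
    sql = (
        f"SELECT c.{pk_column} FROM {child} AS c "
        f"LEFT JOIN {parent} AS p ON p.{pk_column} = c.{fk_column} "
        f"WHERE c.{fk_column} IS NOT NULL AND p.{pk_column} IS NULL "
        f"ORDER BY c.{pk_column}"
    )
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE demo_master (sId TEXT PRIMARY KEY);
    CREATE TABLE demo_slave (sId TEXT PRIMARY KEY, sParentId TEXT);
    INSERT INTO demo_master VALUES ('m1');
    -- s1 has a parent, s2 points at a missing parent, s3 has no parent set.
    INSERT INTO demo_slave VALUES ('s1', 'm1'), ('s2', 'm-missing'), ('s3', NULL);
    """
)

orphans = find_orphans(conn, "demo_slave", "sParentId", "demo_master")
print(orphans)
```

Rows with a NULL reference are deliberately not reported; whether NULL is legal for a given semantic FK is a per-column decision the sketch does not make.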
en/docs/concepts/thesis.md
| @@ -43,28 +43,61 @@ Three costs are baked into this design and worth being explicit about: | @@ -43,28 +43,61 @@ Three costs are baked into this design and worth being explicit about: | ||
| 43 | similar joins) is a [semantic FK](semantic-fk.md). Orphan rows are | 43 | similar joins) is a [semantic FK](semantic-fk.md). Orphan rows are |
| 44 | possible. | 44 | possible. |
| 45 | 45 | ||
| 46 | -## The reward | ||
| 47 | - | ||
| 48 | -In exchange xly gets: | 46 | +## What the design enables (and what each enabler still costs) |
| 49 | 47 | ||
| 50 | - **One codebase serves dozens of customers.** Each customer's tenant | 48 | - **One codebase serves dozens of customers.** Each customer's tenant |
| 51 | - has its own metadata rows; the Java is identical. | 49 | + has its own metadata rows; the Java is identical. — *Limit:* it |
| 50 | + *doesn't* serve all customers. The 18 directories under | ||
| 51 | + `script/客户/` (see [Slice 5](../slices/05-customer-sql-override.md)) | ||
| 52 | + are the wall the data-driven design hits — when a customer needs | ||
| 53 | + different procedural logic, "single codebase" stops being true and | ||
| 54 | + becomes "single Java codebase + a fan-out of customer-specific SQL | ||
| 55 | + the database carries silently". | ||
| 52 | - **PMs evolve the application without engineering time.** They open | 56 | - **PMs evolve the application without engineering time.** They open |
| 53 | BACK, add a module, define a form, set permissions, and the next user | 57 | BACK, add a module, define a form, set permissions, and the next user |
| 54 | - load shows the change. | ||
| 55 | -- **Customizations are layered cleanly** ([Slice 4](../slices/04-custom-field.md)): | 58 | + load shows the change. — *Limit:* the PM's effective vocabulary is |
| 59 | + whatever `gdsconfigformmaster` / `gdsconfigformslave` columns | ||
| 60 | + expose. Anything genuinely new (a custom calculation, a non-standard | ||
| 61 | + validation, a different save path) requires a stored procedure — | ||
| 62 | + which takes engineering time again, just in SQL instead of Java. And | ||
| 63 | + PMs without DB access can't reason about why their metadata change | ||
| 64 | + produced wrong output, because the procedural side is invisible from | ||
| 65 | + BACK. | ||
| 66 | +- **Customizations are layered "cleanly"** ([Slice 4](../slices/04-custom-field.md)): | ||
| 56 | per-tenant overrides sit *on top of* the shared base without forking. | 67 | per-tenant overrides sit *on top of* the shared base without forking. |
| 68 | + — *Limit:* the cleanliness is a Java-side property. The runtime | ||
| 69 | + merge logic in `BusinessBaseServiceImpl` is non-trivial (3,500+ | ||
| 70 | + lines), debugging "why does this tenant see field X but not Y" | ||
| 71 | + involves chasing through `gdsconfigformpersonalize` + | ||
| 72 | + `gdsconfigformcustomslave` + `gdsconfigformuserslave` interactions. | ||
| 73 | + And the overlay model can't `ALTER TABLE` — adding a real new | ||
| 74 | + column still needs a coordinated schema migration. | ||
| 75 | + | ||
| 76 | +A more candid reading: the data-driven design **shifts complexity | ||
| 77 | +out of Java and into the database and the PM-built metadata**. The | ||
| 78 | +total complexity isn't lower; it's redistributed to people and tools | ||
| 79 | +the framework can't compile-check. | ||
| 57 | 80 | ||
| 58 | ## When it breaks down | 81 | ## When it breaks down |
| 59 | 82 | ||
| 60 | Data-driven works until a customer needs behaviour that can't be expressed | 83 | Data-driven works until a customer needs behaviour that can't be expressed |
| 61 | as metadata — different SQL, different procedure body, an aggregation rule | 84 | as metadata — different SQL, different procedure body, an aggregation rule |
| 62 | -that doesn't fit the framework's vocabulary. xly's escape hatch for that | ||
| 63 | -case is the [per-customer SQL override channel](../slices/05-customer-sql-override.md): | 85 | +that doesn't fit the framework's vocabulary. xly's response is the |
| 86 | +[per-customer SQL override channel](../slices/05-customer-sql-override.md): | ||
| 64 | hand-written SQL committed to `script/客户/<customer>/` and applied | 87 | hand-written SQL committed to `script/客户/<customer>/` and applied |
| 65 | directly to that customer's schema, bypassing the framework entirely. | 88 | directly to that customer's schema, bypassing the framework entirely. |
| 66 | -That channel is real and used. It is also the most expensive form of | ||
| 67 | -customization to maintain. | 89 | + |
| 90 | +It's worth being blunt about what this means. "Bypassing the framework" | ||
| 91 | +makes the entire data-driven thesis a *partial* property of the system. | ||
| 92 | +For the 18 customers under `script/客户/` the runtime is **no longer | ||
| 93 | +single-codebase** — the Java is shared but the actual proc bodies | ||
| 94 | +running on each customer's DB diverge, with no automated way to | ||
| 95 | +detect drift. A reviewer reading `Sp_SalSalesCheck` in source has no | ||
| 96 | +guarantee it's what runs in production for any given customer. The | ||
| 97 | +"escape hatch" framing is generous; in practice the override channel | ||
| 98 | +has become the standard answer for material business-logic | ||
| 99 | +differences, which is the failure mode the data-driven design was | ||
| 100 | +supposed to prevent. | ||
| 68 | 101 | ||
| 69 | ## What this means for reading the wiki | 102 | ## What this means for reading the wiki |
| 70 | 103 |
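Note on "no automated way to detect drift" in the "When it breaks down" hunk above: one possible shape for such a check is to fingerprint each procedure body in source and compare it with the body deployed on a customer's database. This is a hypothetical sketch, not something the repo contains. The function names and the sample bodies are invented for illustration; only the proc name `Sp_SalSalesCheck` appears in the wiki. In practice the deployed side could be read from MySQL's `information_schema.ROUTINES` (`ROUTINE_DEFINITION`), which is assumed here rather than shown.

```python
import hashlib
import re


def fingerprint(proc_body: str) -> str:
    """Hash a procedure body, ignoring comments and whitespace layout.

    The comment stripping is deliberately crude: it would also strip
    '--' or '#' sequences that appear inside string literals.
    """
    text = re.sub(r"/\*.*?\*/", " ", proc_body, flags=re.S)  # block comments
    text = re.sub(r"(--|#)[^\n]*", " ", text)  # line comments
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def drifted(source: dict[str, str], deployed: dict[str, str]) -> list[str]:
    """Names of procs missing on either side or whose bodies differ."""
    names = sorted(set(source) | set(deployed))
    return [
        name
        for name in names
        if name not in source
        or name not in deployed
        or fingerprint(source[name]) != fingerprint(deployed[name])
    ]


# Invented bodies: the first proc was changed on the customer DB,
# the second differs only in layout.
source = {
    "Sp_SalSalesCheck": "BEGIN\n  -- base check\n  UPDATE t SET bCheck = 1;\nEND",
    "Sp_Demo": "BEGIN SELECT 1; END",
}
deployed = {
    "Sp_SalSalesCheck": "BEGIN UPDATE t SET bCheck = 1; UPDATE t SET x = 2; END",
    "Sp_Demo": "BEGIN\n  SELECT 1;\nEND",
}

changed = drifted(source, deployed)
print(changed)
```

A check like this only reports that a body differs; deciding whether the difference is an intended customer override or accidental drift still needs the override scripts under `script/客户/<customer>/` as the reference.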
en/docs/reference/maintainer/activiti.md
| @@ -169,26 +169,47 @@ emit audit entries via a custom `sp_add_flow_log`. This is the | @@ -169,26 +169,47 @@ emit audit entries via a custom `sp_add_flow_log`. This is the | ||
| 169 | empirically-observed customisation channel — Activiti deployment | 169 | empirically-observed customisation channel — Activiti deployment |
| 170 | is not seen in any `script/客户/` directory. | 170 | is not seen in any `script/客户/` directory. |
| 171 | 171 | ||
| 172 | -### Why this design works for xly's audience | 172 | +### Why xly avoided Activiti — and what that costs |
| 173 | 173 | ||
| 174 | The printing-industry ERP customers run rule-driven business | 174 | The printing-industry ERP customers run rule-driven business |
| 175 | processes (quote → order → production → delivery → invoice → payment) | 175 | processes (quote → order → production → delivery → invoice → payment) |
| 176 | -where each step is **its own document with its own form** by | ||
| 177 | -convention. A user expects "Now I open the next form and fill it in" | ||
| 178 | -rather than "the system tells me a task is waiting for me." For | ||
| 179 | -that audience: | ||
| 180 | - | ||
| 181 | -- Path 1 + Path 2 cover every observed scenario in this dev DB. | ||
| 182 | -- Path 3's value (BPMN modeling, reassignment, parallel gateways) is | ||
| 183 | - reserved for the rare tenant whose approval graph genuinely needs | ||
| 184 | - it. | ||
| 185 | - | ||
| 186 | -The trade-off: workflow logic is **scattered across stored procedures** | ||
| 187 | -rather than declarable in one place. Adding a new step to a flow | ||
| 188 | -means writing or editing one or more procs, not editing a BPMN | ||
| 189 | -diagram. For complex, frequently-changing flows, this is brittle. | ||
| 190 | -For the printing-shop reality (quote-to-cash chain that doesn't | ||
| 191 | -change much per customer), it's pragmatic. | 176 | +where each step is conventionally its own document with its own form. |
| 177 | +The audience-fit argument: a user expects "Now I open the next form | ||
| 178 | +and fill it in" rather than "the system tells me a task is waiting | ||
| 179 | +for me," so Path 1 + Path 2 cover every observed scenario in this | ||
| 180 | +dev DB, and Path 3 is held in reserve. | ||
| 181 | + | ||
| 182 | +The costs of going proc-based instead of BPMN-based: | ||
| 183 | + | ||
| 184 | +- **Workflow logic is scattered across stored procedures, not | ||
| 185 | + declarable in one place.** Adding a step to "what happens after a | ||
| 186 | + quote is approved" means writing or editing one or more `Sp_*` procs, | ||
| 187 | + re-grepping every other proc that references the affected document, | ||
| 188 | + and hoping nothing was missed. A BPMN engine would have one diagram | ||
| 189 | + to look at. | ||
| 190 | +- **No central audit trail of who approved what when.** `bCheck = 1` | ||
| 191 | + records that *some* approval happened, plus who approved it via the | ||
| 192 | + `sCheckPerson` column — but the *path the document took* (which | ||
| 193 | + steps, in which order, with what comments) lives only in proc-side | ||
| 194 | + status flags, not in a queryable workflow history. | ||
| 195 | +- **No parallel-branch or reassignment semantics.** Path 1 + 2 cover | ||
| 196 | + linear single-approver flows. The first time a customer needs | ||
| 197 | + "two people must approve in parallel", or "if person A is on | ||
| 198 | + vacation, route to person B", the system has to either fall back | ||
| 199 | + to Path 3 (Activiti, currently disabled) or hand-code the routing | ||
| 200 | + in stored procs. | ||
| 201 | +- **Flow-graph evolution is invisible.** Changing the steps of a | ||
| 202 | + workflow means editing procs and document chains. There is no | ||
| 203 | + diff that says "the order-approval flow changed from N steps to | ||
| 204 | + N+1 steps on date X" — only commit history of individual procs. | ||
| 205 | +- **The Activiti engine is on the classpath and booted at runtime | ||
| 206 | + for nothing.** Memory + JAR + schema (24 `act_*` base tables + 3 | ||
| 207 | + identity views) are paid for in every deployment whether they're | ||
| 208 | + used or not. | ||
| 209 | + | ||
| 210 | +For the printing-shop reality the trade has been viable. It would | ||
| 211 | +not scale to a domain with frequently-changing approval flows or | ||
| 212 | +strict audit requirements. | ||
| 192 | 213 | ||
| 193 | ## Activiti is wired — engine ON | 214 | ## Activiti is wired — engine ON |
| 194 | 215 | ||
| @@ -320,21 +341,46 @@ For a flow to actually run, in roughly this order: | @@ -320,21 +341,46 @@ For a flow to actually run, in roughly this order: | ||
| 320 | transitions; downstream queries that filter on `bCheck = 1` start | 341 | transitions; downstream queries that filter on `bCheck = 1` start |
| 321 | seeing it. | 342 | seeing it. |
| 322 | 343 | ||
| 323 | -## Why xly bothered with Activiti at all | 344 | +## Why xly bothered with Activiti — and whether it was worth it |
| 324 | 345 | ||
| 325 | The codebase has its own `biz_flow` / `biz_todo_item` tables that | 346 | The codebase has its own `biz_flow` / `biz_todo_item` tables that |
| 326 | -*could* implement a hand-rolled approval system. The decision to put | ||
| 327 | -Activiti behind them buys: | 347 | +*could* implement a hand-rolled approval system. The arguments for |
| 348 | +putting Activiti behind them: | ||
| 328 | 349 | ||
| 329 | - Standard BPMN modeling (the JS modeler pulls the same stencilset as | 350 | - Standard BPMN modeling (the JS modeler pulls the same stencilset as |
| 330 | Activiti Explorer). | 351 | Activiti Explorer). |
| 331 | -- Free state-machine semantics — the engine handles "task A done → | ||
| 332 | - task B available" without xly maintaining the FSM in SQL. | 352 | +- Engine-managed state-machine semantics — "task A done → task B |
| 353 | + available" without xly maintaining the FSM in SQL. | ||
| 333 | - Diagram rendering (the page-as-PNG in `ProcessActController`). | 354 | - Diagram rendering (the page-as-PNG in `ProcessActController`). |
| 334 | 355 | ||
| 335 | -The cost: a second engine running in the JVM, a second DB schema with | ||
| 336 | -its own DDL drift, a second authentication surface (which xly papers | ||
| 337 | -over via the `act_id_*` views). | 356 | +The costs are not minor: |
| 357 | + | ||
| 358 | +- A second engine running in the JVM, with its own startup cost, | ||
| 359 | + memory footprint, and operational surface. | ||
| 360 | +- A second DB schema (24 `act_*` tables + 3 identity views) that | ||
| 361 | + diverges from xly's `gds*`/`biz*` conventions and needs its own | ||
| 362 | + DDL migrations across Activiti versions (and indeed: see the | ||
| 363 | + 5.17 vs 6.0 version skew elsewhere on this page). | ||
| 364 | +- A second authentication surface that xly papers over via the | ||
| 365 | + `act_id_*` views projecting xly's own users into Activiti's shape | ||
| 366 | + — a hack that works but creates two-way coupling between user- | ||
| 367 | + table changes and Activiti correctness. | ||
| 368 | +- A modeler UI (Angular 1.x era) that maintainers have to learn | ||
| 369 | + separately from BACK. | ||
| 370 | +- And — the most damning cost — **on this dev DB the engine is | ||
| 371 | + idle**. The `act_re_procdef` and `biz_flow` tables are empty, and | ||
| 372 | + Path 1 / Path 2 handle every observed workflow scenario. The | ||
| 373 | + Activiti dependency is paid for at every startup whether it's | ||
| 374 | + exercised or not. | ||
| 375 | + | ||
| 376 | +A more honest framing: Activiti was bet on as the "real" workflow | ||
| 377 | +solution; in practice the simpler proc-driven paths covered the | ||
| 378 | +actual demand. The wiring stayed because removing it isn't free | ||
| 379 | +either, but the value the engine delivers in the current deployment | ||
| 380 | +is approximately zero. A future cleanup could plausibly remove | ||
| 381 | +Activiti entirely and consolidate on the document-chain pattern, | ||
| 382 | +trading away the *option* of BPMN-style flows for a smaller | ||
| 383 | +codebase and one fewer schema to maintain. | ||
| 338 | 384 | ||
| 339 | ## What this page is *not* | 385 | ## What this page is *not* |
| 340 | 386 |
en/docs/reference/maintainer/bi-engine.md
| @@ -148,21 +148,44 @@ several `Sp_SalesOrder_Kpi*` procs (matches the | @@ -148,21 +148,44 @@ several `Sp_SalesOrder_Kpi*` procs (matches the | ||
| 148 | [per-customer SQL override channel](../../slices/05-customer-sql-override.md) | 148 | [per-customer SQL override channel](../../slices/05-customer-sql-override.md) |
| 149 | — customers who want different KPI rules ship their own proc). | 149 | — customers who want different KPI rules ship their own proc). |
| 150 | 150 | ||
| 151 | -## Why this matters | ||
| 152 | - | ||
| 153 | -xly's BI layer demonstrates the data-driven thesis at scale: | ||
| 154 | - | ||
| 155 | -1. **Adding a new dashboard card requires no Java change** — a PM | ||
| 156 | - inserts a `gdsconfigcharmaster` row pointing at a `Sp_chart_*` proc, | ||
| 157 | - sets `sCharType` and `iWidth`, the SPA picks it up on the next | ||
| 158 | - `getModelBysId` cache miss. | ||
| 159 | -2. **Adding a new chart proc** does require a SQL author (the proc | ||
| 160 | - has to follow the standard tenant-scoped shape so generic dispatch | ||
| 161 | - can call it through `CharServiceImpl`). | ||
| 162 | -3. **No OLAP cube, no MDX, no semantic layer.** Each chart is a | ||
| 163 | - purpose-built SQL stored procedure. This trades reusability for | ||
| 164 | - simplicity — perfect-fit aggregations, no general-purpose ad-hoc | ||
| 165 | - query builder. | 151 | +## Drawbacks of the homebrewed approach |
| 152 | + | ||
| 153 | +The metadata + per-chart-proc design is consistent with xly's data- | ||
| 154 | +driven thesis, and it avoids carrying a heavy OLAP engine. The costs: | ||
| 155 | + | ||
| 156 | +1. **Every new chart needs a SQL author.** "PM adds a metadata row" | ||
| 157 | + is true *after* an engineer has written the matching `Sp_chart_*` | ||
| 158 | + proc. There is no aggregation builder, no field-picker, no auto- | ||
| 159 | + generated query — every metric is a hand-coded stored procedure | ||
| 160 | + the engineering team has to write, review, and maintain. The | ||
| 161 | + 20-proc catalogue and 11 chart types are the **whole** set of | ||
| 162 | + shapes the system can render today. | ||
| 163 | +2. **Charts run heavy SQL on the OLTP DB.** No warehouse, no | ||
| 164 | + pre-aggregation, no incremental rollup. A "today's profit" | ||
| 165 | + chart is a SELECT against the live transactional schema. | ||
| 166 | + Heavy customers will see chart loads contend with order-entry | ||
| 167 | + load on the same MySQL instance. Caching helps, but only on hit; | ||
| 168 | + the first load after metadata change pays full cost. | ||
| 169 | +3. **No semantic consistency between charts.** Each `Sp_chart_*` | ||
| 170 | + proc decides for itself how to compute "monthly profit", "today's | ||
| 171 | + sales", etc. Two charts purporting to show the same metric can | ||
| 172 | + silently disagree because they're separate proc bodies. A real | ||
| 173 | + semantic layer would prevent that; the homebrewed model can't. | ||
| 174 | +4. **No drill-down, no slice-and-dice.** Each chart is a frozen | ||
| 175 | + query shape. Users can't pivot on different dimensions or drill | ||
| 176 | + from a summary card into the underlying transactions without an | ||
| 177 | + engineer authoring a separate proc for each path. | ||
| 178 | +5. **Customer-divergent KPI logic.** Customers under | ||
| 179 | + `script/客户/` ship their own `spKPImodule` and | ||
| 180 | + `Sp_SalesOrder_Kpi*` overrides — different KPI math per | ||
| 181 | + customer, in code that lives only on that customer's DB. This | ||
| 182 | + makes "what does this KPI mean" depend on which schema the | ||
| 183 | + reader is connected to. | ||
| 184 | + | ||
| 185 | +The simpler design is fine for "show me the same 20 cards xly has | ||
| 186 | +always shown". It is not fine if the goal is ad-hoc analytics or | ||
| 187 | +self-service reporting — those would require a separate semantic / | ||
| 188 | +warehouse layer that xly does not have. | ||
| 166 | 189 | ||
| 167 | ## What this is *not* | 190 | ## What this is *not* |
| 168 | 191 |
en/docs/reference/maintainer/cache-invalidation.md
| @@ -117,14 +117,47 @@ against the DB does **not** trigger any cleaner. The cache will serve | @@ -117,14 +117,47 @@ against the DB does **not** trigger any cleaner. The cache will serve | ||
| 117 | stale metadata until either: | 117 | stale metadata until either: |
| 118 | 118 | ||
| 119 | 1. The cache TTL expires (check the cache config for the actual TTL). | 119 | 1. The cache TTL expires (check the cache config for the actual TTL). |
| 120 | -2. A bounce of the application servers (one node at a time if the | ||
| 121 | - cache is local; once if shared). | 120 | +2. A bounce of the application servers (one bounce suffices since the |
| 121 | + cache is Redis-backed and shared — see above). | ||
| 122 | 3. A manual call to one of the | 122 | 3. A manual call to one of the |
| 123 | `BusinessCleanRedisDataImpl.delCleanRedisDataByTableName(<table>, …)` | 123 | `BusinessCleanRedisDataImpl.delCleanRedisDataByTableName(<table>, …)` |
| 124 | - methods is invoked from inside the application (e.g., via a | ||
| 125 | - maintenance endpoint). Note this clears whatever the local | ||
| 126 | - `CacheManager` is bound to; if that turns out to be in-memory, | ||
| 127 | - the cleanup must run on every node. | 124 | + methods is invoked from inside the application — once, on any |
| 125 | + node, since it clears the shared Redis store. | ||
| 126 | + | ||
| 127 | +## Drawbacks of this design | ||
| 128 | + | ||
| 129 | +The synchronous `@CacheEvict`-during-save model is operationally | ||
| 130 | +simple and (with Redis backing) genuinely cross-node coherent. It is | ||
| 131 | +also fragile in ways worth naming: | ||
| 132 | + | ||
| 133 | +- **Two systems with confusingly similar names.** The JMS path | ||
| 134 | + `CHANGE_GDS_MODULE` + `ConsumerChangeGdsModuleThread` *sounds* | ||
| 135 | + like it should be cache invalidation but isn't. This page exists | ||
| 136 | + partly because that conflation is a recurring source of bugs and | ||
| 137 | + reader confusion. A renaming pass (proc and queue → e.g. | ||
| 138 | + `MERGE_BASE_GDS_MODULE`) would help, but isn't free. | ||
| 139 | +- **Eviction is inline with the save, not atomic with it.** If the | ||
| 140 | + Redis call fails mid-save, the row commits but the cache stays | ||
| 141 | + stale. The framework does not detect or recover from this; a | ||
| 142 | + Redis outage during save silently leaves stale entries for | ||
| 143 | + affected rows until TTL expiry. | ||
| 144 | +- **Eviction is "all or nothing per cache region".** Most | ||
| 145 | + `@CacheEvict` annotations on `CleanRedisServiceImpl` use | ||
| 146 | + `allEntries=true`, which dumps the entire cache region rather | ||
| 147 | + than the affected key. Heavy save throughput causes high | ||
| 148 | + cache-miss rates immediately afterwards — fine for small | ||
| 149 | + metadata caches, expensive when dropping a region with thousands | ||
| 150 | + of entries. | ||
| 151 | +- **No invalidation budget / batching.** Bulk metadata changes | ||
| 152 | + (e.g., editing 100 form fields) trigger 100 `@CacheEvict` fires, | ||
| 153 | + each one round-tripping to Redis. There is no mechanism to | ||
| 154 | + coalesce evictions into one batch. | ||
| 155 | +- **Direct DB writes bypass everything.** Any tooling that touches | ||
| 156 | + the schema outside `BusinessBaseServiceImpl` — including database | ||
| 157 | + admin scripts, `script/客户/` overrides applied via `mysql` | ||
| 158 | + command line, and Channel-2 SQL replacements — leaves the cache | ||
| 159 | + stale until manually invalidated. This is a real operational | ||
| 160 | + hazard for the deployment pattern xly actually uses. | ||
| 128 | 161 | ||
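The batching gap named in the list above can be illustrated with a small coalescing buffer: collect the cache regions touched during a bulk edit, then evict each region once at the end. This is a sketch in Python under stated assumptions. `EvictionBatch`, the `evict_region` callback, and the region names are hypothetical, and the real eviction path in xly goes through Spring's `@CacheEvict`, which this does not model.

```python
from typing import Callable, List, Set


class EvictionBatch:
    """Collects cache regions to evict and flushes each region once."""

    def __init__(self, evict_region: Callable[[str], None]) -> None:
        # Hypothetical callback: stands in for one round trip to the cache.
        self._evict_region = evict_region
        self._pending: Set[str] = set()

    def mark(self, region: str) -> None:
        # Marking the same region repeatedly adds no extra eviction.
        self._pending.add(region)

    def flush(self) -> int:
        """Evict every pending region once; return how many calls were made."""
        evicted = 0
        for region in sorted(self._pending):
            self._evict_region(region)      # may raise; region then stays pending
            self._pending.discard(region)
            evicted += 1
        return evicted


calls: List[str] = []
batch = EvictionBatch(calls.append)
for _ in range(100):            # e.g. editing 100 fields of one form
    batch.mark("formFields")    # region name is made up for the example
batch.mark("moduleMeta")        # also made up
print(batch.flush(), calls)
```

With region-wide eviction (`allEntries=true`), one call per touched region is all a bulk edit needs, so 101 marks collapse into two evictions here.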
| 129 | ## Common bug: the cache is the bug | 162 | ## Common bug: the cache is the bug |
| 130 | 163 | ||
| @@ -135,10 +168,11 @@ old value", check (in this order): | @@ -135,10 +168,11 @@ old value", check (in this order): | ||
| 135 | 2. Did the change go through a path that invokes | 168 | 2. Did the change go through a path that invokes |
| 136 | `BusinessCleanRedisData`? (Direct DB writes or controllers that | 169 | `BusinessCleanRedisData`? (Direct DB writes or controllers that |
| 137 | bypass `BusinessBaseServiceImpl` won't.) | 170 | bypass `BusinessBaseServiceImpl` won't.) |
| 138 | -3. Is the cache shared across nodes (Redis-backed) or local | ||
| 139 | - (`ConcurrentMapCacheManager`)? Confirm by inspecting the active | ||
| 140 | - `CacheManager` bean on a running node. | ||
| 141 | -4. If the cache is local, did every node get the eviction call? | 171 | +3. Was Redis reachable when the save committed? A failed eviction |
| 172 | + does not roll back the save. | ||
| 173 | +4. Is the change in a cache region that's evicted by the table that | ||
| 174 | + was written? `CleanRedisServiceImpl` maps writes to specific | ||
| 175 | + regions; an unmapped table will not invalidate its readers. | ||
| 142 | 176 | ||
| 143 | The five-key composite returned by | 177 | The five-key composite returned by |
| 144 | [`getModelBysId` in Slice 1](../../slices/01-hello-world.md) | 178 | [`getModelBysId` in Slice 1](../../slices/01-hello-world.md) |
en/docs/reference/maintainer/proc-dispatch.md
| @@ -43,9 +43,38 @@ by name lets the framework call any proc the metadata names without a | @@ -43,9 +43,38 @@ by name lets the framework call any proc the metadata names without a | ||
| 43 | code change. The framework treats the proc as a black box: name in, | 43 | code change. The framework treats the proc as a black box: name in, |
| 44 | parameters in, result out. | 44 | parameters in, result out. |
| 45 | 45 | ||
| 46 | -The downside: the runtime cannot statically know which procs exist or | ||
| 47 | -what their effects are. A typo in `gdsmodule.sSaveProName` produces a | ||
| 48 | -runtime "proc not found" error, not a compile error. | 46 | +That convenience comes with substantial costs that are worth being |
| 47 | +explicit about: | ||
| 48 | + | ||
| 49 | +- **No compile-time check** on proc names. A typo in | ||
| 50 | + `gdsmodule.sSaveProName` produces a runtime "proc not found" | ||
| 51 | + error, not a compile error. Refactoring a proc name requires | ||
| 52 | + hand-grepping the metadata; the IDE can't help. | ||
| 53 | +- **No type safety on parameters.** The framework binds parameters | ||
| 54 | + positionally from a `Map<String, Object>`. A proc whose signature | ||
| 55 | + changed but whose callers didn't is a runtime crash with no IDE | ||
| 56 | + warning. | ||
| 57 | +- **No call-site discoverability.** "Which Java code calls | ||
| 58 | + `Sp_SalSalesCheck`?" can't be answered by IDE find-usages because | ||
| 59 | + no Java code does — `gdsmodule` rows do. Maintainers must search | ||
| 60 | + *both* metadata tables *and* the SQL bodies of other procs that | ||
| 61 | + may invoke this one. | ||
| 62 | +- **Effectively no static analysis.** Side effects of any given | ||
| 63 | + proc are invisible to anyone who hasn't read the proc body. A | ||
| 64 | + `Sp_SalSalesCheck` named in `gdsmodule.sProcName` could be a | ||
| 65 | + read-only SELECT or could be doing INSERTs and UPDATEs across a | ||
| 66 | + dozen tables; the framework treats them identically. | ||
| 67 | +- **Stack traces that stop at the boundary.** Errors raised | ||
| 68 | + inside a proc surface in Java as a generic `BadSqlGrammarException` | ||
| 69 | + or `MySQLSyntaxErrorException`. To get the real error you have | ||
| 70 | + to enable MyBatis SQL logging and re-run. | ||
| 71 | + | ||
| 72 | +A more honest framing: hard-wiring 1000+ procs in Java would be | ||
| 73 | +painful, but most of that pain comes from xly *having* 1000+ procs | ||
| 74 | +in the first place. Dynamic dispatch made it cheap to keep adding | ||
| 75 | +them, which made the pile grow, which made the pile harder to | ||
| 76 | +audit. The mechanism is what it is; the *amount* of behaviour | ||
| 77 | +pushed into the SQL layer is the more interesting design question. | ||
| 49 | 78 | ||
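One way to soften the "no compile-time check" cost is an offline audit that cross-checks the proc names stored in metadata against the procs that actually exist in the schema. The sketch below is in Python and works on plain in-memory data. How the two inputs are fetched (for example from `gdsmodule` and from MySQL's `information_schema.ROUTINES`) is an assumption, the `sId` key is a made-up row identifier, and `find_dangling_procs` is a hypothetical helper, not part of xly.

```python
from typing import Dict, Iterable, List, Tuple


def find_dangling_procs(
    metadata_rows: Iterable[Dict[str, str]],
    existing_procs: Iterable[str],
    proc_columns: Tuple[str, ...] = ("sSaveProName", "sProcName"),
) -> List[Tuple[str, str, str]]:
    """Return (row id, column, proc name) for every metadata reference
    that names a proc missing from the schema.

    Assumptions: each row is a dict with an "sId" key plus the proc-name
    columns; routine names are compared case-insensitively, as MySQL does.
    """
    known = {name.lower() for name in existing_procs}
    dangling = []
    for row in metadata_rows:
        for column in proc_columns:
            proc = (row.get(column) or "").strip()
            if proc and proc.lower() not in known:
                dangling.append((row.get("sId", "?"), column, proc))
    return dangling


rows = [
    {"sId": "m1", "sSaveProName": "Sp_SalSalesCheck"},
    {"sId": "m2", "sSaveProName": "Sp_SalSalesChek"},  # typo on purpose
    {"sId": "m3", "sSaveProName": ""},                 # no proc configured
]
print(find_dangling_procs(rows, ["sp_salsalescheck"]))
```

Run before a release, a check like this turns the runtime "proc not found" error into a report. It does not cover procs that are called from inside other procs.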
| 50 | ## The conventions procs follow | 79 | ## The conventions procs follow |
| 51 | 80 |
en/docs/reference/maintainer/runtime.md
| @@ -221,6 +221,38 @@ Two flagged in slices that belong here permanently: | @@ -221,6 +221,38 @@ Two flagged in slices that belong here permanently: | ||
| 221 | load entirely for `UserType.ADMIN`. ADMIN account governance must | 221 | load entirely for `UserType.ADMIN`. ADMIN account governance must |
| 222 | come from outside the app. | 222 | come from outside the app. |
| 223 | 223 | ||
| 224 | +## What "universal CRUD" means in practice | ||
| 225 | + | ||
| 226 | +The "one controller writes any row in any table" pattern is the | ||
| 227 | +core data-driven move. It also concentrates risk: | ||
| 228 | + | ||
| 229 | +- **`BusinessBaseServiceImpl` is ~3,500 lines** of tightly | ||
| 230 | + intertwined logic: per-tenant scope-bypass list, special-case | ||
| 231 | + table hardcodes (`mftproductionplanslave` at line 1768), | ||
| 232 | + pre/post-save hook dispatch, sTable-driven write routing. Every | ||
| 233 | + bug fix has to navigate the whole class. | ||
| 234 | +- **The class is the single point of failure for the entire | ||
| 235 | + business runtime.** A regression in `addUpdateDelBusinessData` | ||
| 236 | + breaks save for every form in every tenant simultaneously. | ||
| 237 | + Module-specific controllers would localise the blast radius; | ||
| 238 | + the universal one cannot. | ||
| 239 | +- **No type system on `Map<String, Object>`.** The frontend ships | ||
| 240 | + a bag of (key, value) pairs. The runtime trusts the keys | ||
| 241 | + match column names and the values cast to the column types. | ||
| 242 | + Mismatches surface as `BadSqlGrammarException` at the DAO layer | ||
| 243 | + — far from where the wrong value originated. There is no | ||
| 244 | + schema-aware request validation. | ||
| 245 | +- **Discoverability is poor.** "What endpoints write to | ||
| 246 | + `mftproductionplanslave`?" can't be answered by IDE find-usages | ||
| 247 | + — the answer is "any controller that calls | ||
| 248 | + `BusinessBaseServiceImpl.addBusinessData` with `sTable` set to | ||
| 249 | + `mftproductionplanslave`", which is everything. | ||
| 250 | + | ||
| 251 | +The universal pattern is what makes the data-driven thesis work. | ||
| 252 | +It is also the reason adding a new module is essentially free | ||
| 253 | +*and* the reason that touching the runtime is essentially never | ||
| 254 | +free. | ||
| 255 | + | ||
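What "schema-aware request validation" could look like, as a minimal sketch: check the incoming key/value bag against a column map before it reaches the DAO layer. Python is used for brevity. `validate_payload`, the column map, and the example column names are hypothetical; in practice the map would have to be loaded from the database catalogue.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical column map for one table: column name -> accepted Python types.
COLUMNS: Dict[str, Tuple[type, ...]] = {
    "sId": (str,),
    "dQty": (int, float),
    "sMemo": (str, type(None)),
}


def validate_payload(
    payload: Dict[str, Any], columns: Dict[str, Tuple[type, ...]]
) -> List[str]:
    """Return human-readable problems; an empty list means the bag is acceptable."""
    problems = []
    for key, value in payload.items():
        if key not in columns:
            problems.append(f"unknown column: {key}")
        elif not isinstance(value, columns[key]):
            problems.append(f"bad type for {key}: {type(value).__name__}")
    return problems


print(validate_payload({"sId": "A1", "dQty": "ten", "sColour": "red"}, COLUMNS))
```

Rejecting the request at this point keeps the error next to the value that caused it, instead of surfacing as a `BadSqlGrammarException` further down.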
| 224 | ## Cache invalidation | 256 | ## Cache invalidation |
| 225 | 257 | ||
| 226 | When BACK saves a metadata change, the save service synchronously | 258 | When BACK saves a metadata change, the save service synchronously |
| @@ -229,5 +261,5 @@ calls `BusinessCleanRedisData.delCleanRedisData*`, which fires | @@ -229,5 +261,5 @@ calls `BusinessCleanRedisData.delCleanRedisData*`, which fires | ||
| 229 | A separate JMS path (`ConsumerChangeGdsModuleThread`) exists with a | 261 | A separate JMS path (`ConsumerChangeGdsModuleThread`) exists with a |
| 230 | similar name but does base-data merging via stored proc, not cache | 262 | similar name but does base-data merging via stored proc, not cache |
| 231 | invalidation. See [cache invalidation on metadata change](cache-invalidation.md) | 263 | invalidation. See [cache invalidation on metadata change](cache-invalidation.md) |
| 232 | -for the full story (including the open question about cross-node | ||
| 233 | -coherence). | 264 | +for the full story (cross-node coherence is empirically Redis-backed, |
| 265 | +no longer an open question). |
en/docs/reference/maintainer/sql-templates.md
| @@ -61,20 +61,39 @@ the target schema. | @@ -61,20 +61,39 @@ the target schema. | ||
| 61 | document family the proc operates on. | 61 | document family the proc operates on. |
| 62 | - Other placeholders depending on the scaffold. | 62 | - Other placeholders depending on the scaffold. |
| 63 | 63 | ||
| 64 | -## Why this is a "template" library and not a code generator | 64 | +## "Template" library, not a code generator — and what that costs |
| 65 | 65 | ||
| 66 | The framework does **not** auto-generate procs from these templates | 66 | The framework does **not** auto-generate procs from these templates |
| 67 | -based on metadata. The scaffolds exist because xly's procs follow a | ||
| 68 | -common conventional shape; copying the scaffold ensures the new proc: | 67 | +based on metadata. The scaffolds are convention-enforcing copy-paste |
| 68 | +starters, nothing more. They exist to nudge a new proc into the | ||
| 69 | +shape that [generic dispatch](proc-dispatch.md) can call: | ||
| 69 | 70 | ||
| 70 | -- Accepts the standard parameter list `(sGuid, sFormGuid, sLoginId, sBrId, sSuId)` | ||
| 71 | - that [generic dispatch](proc-dispatch.md) can call. | ||
| 72 | -- Returns success/error via the standard `OUT sCode INT, OUT sReturn LONGTEXT`. | 71 | +- Standard parameter list `(sGuid, sFormGuid, sLoginId, sBrId, sSuId)`. |
| 72 | +- Returns success/error via `OUT sCode INT, OUT sReturn LONGTEXT`. | ||
| 73 | - Honours the multi-tenant filter `sBrandsId = sBrId AND sSubsidiaryId = sSuId`. | 73 | - Honours the multi-tenant filter `sBrandsId = sBrId AND sSubsidiaryId = sSuId`. |
| 74 | 74 | ||
| 75 | -A proc that *doesn't* follow these conventions cannot be invoked | ||
| 76 | -through generic dispatch and would have to be called from custom Java | ||
| 77 | -code instead. | 75 | +Costs of staying at "template" instead of "generator": |
| 76 | + | ||
| 77 | +- **No enforcement.** A proc that drifts from the convention compiles | ||
| 78 | + fine. The framework discovers the mismatch at runtime as a | ||
| 79 | + `BadSqlGrammarException` or wrong-shaped result. There is no | ||
| 80 | + pre-merge check. | ||
| 81 | +- **No regeneration.** When the convention itself changes (e.g., a | ||
| 82 | + new standard `OUT` param), the existing procs do not update. | ||
| 83 | + Engineers have to grep + rewrite, with no automation. | ||
| 84 | +- **No knowledge of which proc came from which template.** A proc in | ||
| 85 | + the live DB doesn't record its origin scaffold; understanding what | ||
| 86 | + was customised away requires diffing against the scaffold by hand. | ||
| 87 | +- **Customer overrides under `script/客户/` can — and do — diverge | ||
| 88 | + from the scaffold shape.** This is reasonable per customer but | ||
| 89 | + means the conventions are observed by social contract, not by | ||
| 90 | + any mechanical check. | ||
| 91 | + | ||
| 92 | +A real code-generation pipeline (template + metadata → emitted SQL, | ||
| 93 | +checked in or applied at deploy time) would catch these. The | ||
| 94 | +trade xly made: less tooling to maintain, but discipline-rather- | ||
| 95 | +than-enforcement on proc shapes — visible in the 1,687 procs the | ||
| 96 | +schema currently carries, not all of which follow the conventions. | ||
| 78 | 97 | ||
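A pre-merge check of the kind the list above says is missing could start as a signature lint over the `.sql` files. The sketch below, in Python, only verifies that the convention's parameter names appear, in order, in the `CREATE PROCEDURE` header. The regex, the function name `check_signature`, the sample proc, and its `VARCHAR(50)` types are illustrative assumptions; a real check would need a proper SQL parser.

```python
import re
from typing import List

# Names taken from the convention listed above, in order.
EXPECTED = ["sGuid", "sFormGuid", "sLoginId", "sBrId", "sSuId", "sCode", "sReturn"]

# Simplification: assumes the parameter list is directly followed by BEGIN.
HEADER = re.compile(
    r"CREATE\s+PROCEDURE\s+`?(\w+)`?\s*\((.*?)\)\s*BEGIN",
    re.IGNORECASE | re.DOTALL,
)


def check_signature(sql: str) -> List[str]:
    """Return the expected parameter names that are missing from `sql`,
    or a single marker entry if they are all present but out of order."""
    match = HEADER.search(sql)
    if match is None:
        return ["<no CREATE PROCEDURE header found>"]
    declared = []
    for param in match.group(2).split(","):
        words = param.split()
        # Skip a leading IN / OUT / INOUT mode keyword if present.
        if words and words[0].upper() in ("IN", "OUT", "INOUT"):
            words = words[1:]
        if words:
            declared.append(words[0].strip("`"))
    missing = [name for name in EXPECTED if name not in declared]
    present = [name for name in declared if name in EXPECTED]
    if not missing and present != EXPECTED:
        return ["<parameters out of order>"]
    return missing


good = """CREATE PROCEDURE `Sp_Demo`(
    IN sGuid VARCHAR(50), IN sFormGuid VARCHAR(50), IN sLoginId VARCHAR(50),
    IN sBrId VARCHAR(50), IN sSuId VARCHAR(50),
    OUT sCode INT, OUT sReturn LONGTEXT)
BEGIN
    SET sCode = 1;
END"""
drifted = good.replace("OUT sReturn LONGTEXT", "OUT sMsg TEXT")
print(check_signature(good), check_signature(drifted))
```

A lint like this checks shape only. It says nothing about whether the body honours the tenant filter, which would still rely on review.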
| 79 | ## Two loaders | 98 | ## Two loaders |
| 80 | 99 |
en/docs/slices/04-custom-field.md
| @@ -112,15 +112,36 @@ ignored at merge time. A maintainer audit script that flags such orphans | @@ -112,15 +112,36 @@ ignored at merge time. A maintainer audit script that flags such orphans | ||
| 112 | is on the [Maintainer Reference](../reference/maintainer/runtime.md)'s | 112 | is on the [Maintainer Reference](../reference/maintainer/runtime.md)'s |
| 113 | TODO list. | 113 | TODO list. |
| 114 | 114 | ||
| 115 | -## Why it works without code changes | ||
| 116 | - | ||
| 117 | -The end-customer never asks an engineer for a new column. They open the | ||
| 118 | -BACK builder, add the row, the field appears in FROUNT for their tenant | ||
| 119 | -only. The system's other tenants are untouched. That single-codebase | ||
| 120 | -property is what xly's data-driven thesis ([Concepts → Thesis](../concepts/thesis.md)) | ||
| 121 | -buys — at the cost of the runtime cost of merging metadata on every | ||
| 122 | -request, plus the schema bloat of three customization tables that most | ||
| 123 | -forms never use. | 115 | +## Why it works without code changes — and what that costs |
| 116 | + | ||
| 117 | +The end-customer never has to ask an engineer to add a field on the | ||
| 118 | +*display* side. They open the BACK builder, add the row, and the field | ||
| 119 | +appears in FROUNT for their tenant only. The system's other tenants | ||
| 120 | +are untouched. | ||
| 121 | + | ||
| 122 | +The price for that property: | ||
| 123 | + | ||
| 124 | +- **The merge runs on every request** (not just on overlay-row | ||
| 125 | + changes). Even tenants with zero `gdsconfigformcustomslave` rows | ||
| 126 | + pay the runtime cost of checking — the framework can't tell upfront | ||
| 127 | + whether a tenant has overrides, so the merge code path always runs. | ||
| 128 | +- **Three near-empty tables on every schema.** The three customization | ||
| 129 | + tables exist whether the tenant uses them or not. In this dev DB | ||
| 130 | + `gdsconfigformcustomslave` has 0 rows; the table is still indexed, | ||
| 131 | + backed up, and queried. | ||
| 132 | +- **Display extension only.** The overlay can render an extra field; | ||
| 133 | + it cannot store its value unless the underlying physical table | ||
| 134 | + already has the column. So "no code change for a new field" is true | ||
| 135 | + only for *display-only* fields. Real new persisted fields still | ||
| 136 | + need a coordinated `ALTER TABLE` (Slice 5 territory) — which means | ||
| 137 | + the wins from "no code change" don't apply to the cases that | ||
| 138 | + actually move business value. | ||
| 139 | +- **Debuggability gets worse.** "Why does tenant A see this field | ||
| 140 | + but tenant B doesn't?" requires diffing | ||
| 141 | + `gdsconfigformcustomslave` + `gdsconfigformpersonalize` + | ||
| 142 | + `gdsconfigformuserslave` rows for both tenants. The merge logic in | ||
| 143 | + `BusinessBaseServiceImpl` is non-trivial; reproducing the exact | ||
| 144 | + layout a user sees often means re-running the merge by hand. | ||
| 124 | 145 | ||
| 125 | ## Concepts this slice introduces | 146 | ## Concepts this slice introduces |
| 126 | 147 |
en/docs/slices/05-customer-sql-override.md
| @@ -90,8 +90,9 @@ framework doesn't know; the framework can't tell. | @@ -90,8 +90,9 @@ framework doesn't know; the framework can't tell. | ||
| 90 | 90 | ||
| 91 | This makes overrides: | 91 | This makes overrides: |
| 92 | 92 | ||
| 93 | -- **Powerful.** Anything you can write in MySQL stored-procedure SQL, | ||
| 94 | - you can use to replace standard behaviour. | 93 | +- **Capable in the technical sense.** Anything you can write in MySQL |
| 94 | + stored-procedure SQL can replace standard behaviour. (This isn't a | ||
| 95 | + good thing per se — see drawbacks below.) | ||
| 95 | - **Operationally fragile.** The override must be re-applied (or kept | 96 | - **Operationally fragile.** The override must be re-applied (or kept |
| 96 | alive) whenever the customer's schema is rebuilt, restored, or | 97 | alive) whenever the customer's schema is rebuilt, restored, or |
| 97 | migrated. It does not travel with backups of the codebase, only with | 98 | migrated. It does not travel with backups of the codebase, only with |
| @@ -101,10 +102,25 @@ This makes overrides: | @@ -101,10 +102,25 @@ This makes overrides: | ||
| 101 | the proc on the live DB is a different piece of code with the same | 102 | the proc on the live DB is a different piece of code with the same |
| 102 | name. Stack traces and "what does this proc do" depend on which | 103 | name. Stack traces and "what does this proc do" depend on which |
| 103 | schema you're connected to. | 104 | schema you're connected to. |
| 104 | - | ||
| 105 | -The right rule of thumb: prefer Slice-4 metadata customization. Reach | ||
| 106 | -for Slice-5 SQL overrides only when the metadata model genuinely cannot | ||
| 107 | -express what the customer needs. | 105 | +- **No version control on the deployed body.** The `.sql` file in |
| 106 | + `script/客户/` shows what *should* have been applied. There is no | ||
| 107 | + audit trail confirming what *was* applied (or when, or by whom), | ||
| 108 | + and no automated re-apply on schema rebuild. | ||
| 109 | +- **No type-safety bridge.** When the override changes a result-set | ||
| 110 | + shape, every Java caller that reads from `Sp_SalSalesCheck` may | ||
| 111 | + silently break for that one customer with a `BadSqlGrammarException` | ||
| 112 | + or — worse — a wrong-shaped row that propagates as a wrong number. | ||
| 113 | +- **Compounds the BI problem.** Charts on customers with overridden | ||
| 114 | + procs ([bi-engine.md](../reference/maintainer/bi-engine.md)) | ||
| 115 | + will silently disagree across tenants because the underlying data | ||
| 116 | + is computed by different SQL. | ||
| 117 | + | ||
| 118 | +The "prefer Slice 4, reach for Slice 5 only as last resort" advice is | ||
| 119 | +correct in principle, but the existence of 18 customer directories | ||
| 120 | +suggests that in practice this channel has become the standard answer | ||
| 121 | +for material business-logic differences. That's a signal the metadata | ||
| 122 | +model isn't expressive enough for the actual customer-customisation | ||
| 123 | +demand the system encounters — not a celebration of the escape hatch. | ||
| 108 | 124 | ||
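The missing audit trail can be partly compensated by a drift check that compares the body in the repository with the body deployed on a customer's schema. A minimal sketch in Python, assuming both bodies are already available as strings (fetching the deployed one, for example via `SHOW CREATE PROCEDURE`, is left out). `body_fingerprint` and `has_drifted` are hypothetical helpers, and the whitespace normalisation is a deliberate simplification.

```python
import hashlib
import re


def body_fingerprint(sql: str) -> str:
    """Hash a proc body after collapsing whitespace, so reformatting
    alone does not count as drift. Comments and letter case are NOT
    normalised, and whitespace inside string literals is collapsed too."""
    normalised = re.sub(r"\s+", " ", sql).strip()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_drifted(repo_sql: str, deployed_sql: str) -> bool:
    """True when the deployed body differs from the repository body."""
    return body_fingerprint(repo_sql) != body_fingerprint(deployed_sql)


repo = "BEGIN\n    SELECT 1;\nEND"
same_but_reflowed = "BEGIN SELECT 1;   END"
edited_on_server = "BEGIN SELECT 2; END"
print(has_drifted(repo, same_but_reflowed), has_drifted(repo, edited_on_server))
```

A check like this confirms what is deployed now. It still cannot say when or by whom an override was applied.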
| 109 | ## Worked-example: 重庆展印's `Sp_SalSalesCheck` vs the standard | 125 | ## Worked-example: 重庆展印's `Sp_SalSalesCheck` vs the standard |
| 110 | 126 |
en/docs/slices/06-hardware.md
| @@ -83,25 +83,50 @@ other data: | @@ -83,25 +83,50 @@ other data: | ||
| 83 | 83 | ||
| 84 | ## The framework / hardware boundary | 84 | ## The framework / hardware boundary |
| 85 | 85 | ||
| 86 | -This is the cleanest story xly tells about an awkward problem: | 86 | +xly's response to the press-PLC problem is a strict separation: |
| 87 | 87 | ||
| 88 | - **Above the line (xlyEntry, xlyApi, all the metadata machinery): | 88 | - **Above the line (xlyEntry, xlyApi, all the metadata machinery): |
| 89 | generic framework.** No knowledge of presses, PLCs, byte protocols. | 89 | generic framework.** No knowledge of presses, PLCs, byte protocols. |
| 90 | - **Below the line (xlyPlc): hardware-specific.** Knows how to talk to a | 90 | - **Below the line (xlyPlc): hardware-specific.** Knows how to talk to a |
| 91 | press. | 91 | press. |
| 92 | 92 | ||
| 93 | -The two communicate only through the database. The bridge writes rows; | ||
| 94 | -the framework reads rows. There's no RPC, no shared in-process state, | ||
| 95 | -no callback. This makes xlyPlc: | ||
| 96 | - | ||
| 97 | -- Independently deployable (and several customers run it on a machine | ||
| 98 | - next to the press, separate from the central ERP server). | ||
| 99 | -- Independently failable: if the bridge crashes, the framework keeps | ||
| 100 | - running on stale machine-state data. If the framework is down, the | ||
| 101 | - bridge keeps writing — when the framework comes back, it sees the | ||
| 102 | - buffered rows. | ||
| 103 | -- Hard to test end-to-end without an actual press. Most CI tests stub | ||
| 104 | - the PLC reads. | 93 | +The two communicate only through the database — the bridge writes rows, |
| 94 | +the framework reads rows. No RPC, no shared in-process state, no | ||
| 95 | +callback. The benefits: | ||
| 96 | + | ||
| 97 | +- Independently deployable; some customers run xlyPlc on a machine next | ||
| 98 | + to the press, separate from the central ERP server. | ||
| 99 | +- Independently failable: if the bridge crashes the framework serves | ||
| 100 | + stale machine-state data; if the framework is down the bridge keeps | ||
| 101 | + writing and the framework picks up the buffered rows on recovery. | ||
| 102 | + | ||
| 103 | +The costs of "DB as the only contract" are real and worth naming: | ||
| 104 | + | ||
| 105 | +- **No backpressure.** If the bridge writes faster than xly can ingest | ||
| 106 | + (or if a slow `mftProduceReportMachineState` index update piles up), | ||
| 107 | + the bridge has no signal to slow down — it just blocks on the next | ||
| 108 | + INSERT. There is no flow-control message between the two halves. | ||
| 109 | +- **No request/response semantics.** The framework cannot ask the | ||
| 110 | + bridge "is the press alive right now?" — it can only read whatever | ||
| 111 | + the bridge last wrote, which may be seconds-to-minutes old depending | ||
| 112 | + on the cron cadence. | ||
| 113 | +- **Bridge-side state is invisible to the framework.** "Why is the | ||
| 114 | + bridge not writing?" requires logging into the bridge host to read | ||
| 115 | + its log; the framework UI shows only the absence of new rows. | ||
| 116 | +- **Polling at every layer.** xlyPlc polls the press; the | ||
| 117 | + framework polls the DB; the SPA polls the framework. Worst-case | ||
| 118 | + latency from "press state changes" to "user sees it" is the sum | ||
| 119 | + of the three polling intervals (3x when all share one cadence). | ||
| 120 | +- **Hard to test end-to-end without an actual press.** Most CI tests | ||
| 121 | + stub the PLC reads, which means the bridge's most error-prone code | ||
| 122 | + (byte protocol per press model) gets the least automated coverage. | ||
| 123 | + | ||
| 124 | +A real-time-aware architecture would use a streaming channel | ||
| 125 | +(MQTT / Kafka / WebSocket) end-to-end instead of cron + DB. xly's | ||
| 126 | +choice is operationally simpler but trades off latency, observability, | ||
| 127 | +and flow control. For the printing-press tempo (machine state changes | ||
| 128 | +every few seconds, reports every minute) the trade is liveable; for | ||
| 129 | +faster shop-floor signals it would not be. | ||
| 105 | 130 | ||
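The compounding of the polling layers is simple arithmetic: in the worst case a change lands just after each layer has polled, so the waits add up. A small Python sketch follows. The interval values are made-up placeholders, not measured xly settings, and the average assumes the pollers are independent and unsynchronised.

```python
from typing import Sequence


def worst_case_latency(poll_intervals_s: Sequence[float]) -> float:
    """Upper bound on propagation delay through chained pollers,
    ignoring processing and network time: each layer can add up to
    one full interval of waiting."""
    return float(sum(poll_intervals_s))


def average_latency(poll_intervals_s: Sequence[float]) -> float:
    """Expected delay when a change arrives at a uniformly random point
    in each layer's interval: half an interval per layer."""
    return sum(poll_intervals_s) / 2.0


# Placeholder cadences: bridge -> press, framework -> DB, SPA -> framework.
intervals = [5.0, 60.0, 30.0]
print(worst_case_latency(intervals), average_latency(intervals))
```

The slowest layer dominates the total, so shortening the other two buys little until that one is addressed.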
| 106 | ## Concepts this slice introduces | 131 | ## Concepts this slice introduces |
| 107 | 132 |