# vibe_erp — Architecture Design (v1)

**Status:** Approved (brainstorm output, pre-implementation plan)

**Date:** 2026-04-07

**Scope:** High-level architecture for the entire framework. Implementation plans for individual PBCs and the v1 cut will be written separately.

---

## 1. Context and intent

`vibe_erp` is an **ERP/EBC framework** (not an ERP application) targeting the **printing industry**, intended to be **sold worldwide** and deployed **self-hosted-first** with a managed/hosted version added later. The reference business documentation under `raw/业务流程设计文档/` describes one example printing shop and is treated as a *fixture / acceptance test*, never as a specification: no part of its workflow is hard-coded into the core.

The design satisfies, and in several cases *establishes*, the architectural guardrails in `CLAUDE.md`. The pre-existing guardrails (1–6) plus documentation discipline:

1. Core stays domain-agnostic (no printing terms in the core)
2. Workflows are data, not code
3. Extensibility seams come first
4. The reference customer is a test, not a requirement
5. Multi-tenant from day one in spirit
6. Global / i18n from day one

(plus "Documentation discipline" as a separate section in CLAUDE.md)

This design adds five more guardrails to CLAUDE.md (numbered 7–11), derived from validating the current design against 2026 SOTA (Gartner's **Composable ERP** frame, the **MACH** principles, SAP S/4HANA's **Clean Core** extension model, and ERPNext / Frappe's **metadata-driven Doctype** system):

7. **Clean Core** (extensions never modify the core; A/B/C/D extension grading)
8. **Two-tier extensibility** (key-user no-code metadata + developer pro-code plug-ins, both first-class)
9. **PBC boundaries are sacred** (modular monolith with strict bounded contexts; PBCs never import each other)
10. **`api.v1` is the only stable contract** (semver-governed; everything else is internal)
11. **AI agents are a first-class client** (REST/OpenAPI surface must be MCP-callable; v1.0 architects the seam, v1.1 ships the endpoint)

---

## 2. Foundational decisions

| Decision | Choice | Why |
|---|---|---|
| Deployment model | **Self-hosted-first**, hosted later, same artifact for both | User requirement; matches Odoo/ERPNext/Tryton/SAP S/4HANA self-host story |
| Architecture style | **Modular monolith** with strict bounded contexts (PBCs) | MACH allows "modularity OR microservices"; modular monolith is operationally sane for self-host; every successful self-hostable ERP made the same call |
| Backend language | **Kotlin on the JVM** | Mature ERP ecosystem (Hibernate, Flowable, ICU4J, JasperReports, PF4J), modern ergonomics, large global hiring pool |
| Backend framework | **Spring Boot** | De facto JVM application framework; PF4J integrates cleanly; Spring Data JPA, Spring Security, Actuator are all standard |
| Workflow engine | **Embedded Flowable (BPMN 2.0)** | Workflows-as-data is non-negotiable; BPMN is the standard; embedding avoids extra processes |
| Persistence | **PostgreSQL** as the only mandatory external dependency | Matches every modern open-source ERP; excellent JSONB + RLS support, both critical to this design |
| Multi-tenancy | **Row-level `tenant_id` + Postgres RLS (defense in depth)** | Same code path for self-host (one tenant) and hosted (many tenants); no schema explosion; two independent walls against data leaks |
| Custom fields | **JSONB `ext` column on every business table**, described by metadata rows | One row, one read, indexable via GIN, no migrations needed for additions, no joins; EAV is the wrong tool |
| Plug-in framework | **PF4J + Spring Boot child contexts** | Classloader isolation, manifest-based lifecycle, cleanest plug-in story on the JVM |
| Web client (v1) | **React + TypeScript SPA** | Single SPA covers desktop and tablet office workflows |
| Mobile client | **React Native (v2, not v1)** | Defer until core API is stable; reuses TS types from web |
| API style | **REST + OpenAPI**, MCP-callable surface | OpenAPI is the universal integration standard; the MCP server is a separate v1.1 deliverable, the seam exists in v1.0 |
| Reporting | **JasperReports** | Mature, customer-skinnable, JVM-native |
| i18n | **ICU MessageFormat (ICU4J) + Spring `MessageSource`** | Plurals, gender, number/date formatting, locale fallback, all required for "sold worldwide" |
| Auth | **Built-in JWT** + **OIDC** (Keycloak-compatible) | Self-hosters get something out of the box; enterprise customers get SSO from day one |

---

## 3. Topology

```
┌──────────────────────────────────────────────────────────────────────┐
│ Customer's network                                                   │
│                                                                      │
│ Browser (React SPA) ─┐                                               │
│ AI agent (MCP, v1.1)─┼─► Reverse proxy ──► vibe_erp backend (1 image)│
│ 3rd-party system    ─┘                        │                      │
│                                               │                      │
│ Inside the image (one Spring Boot process):  │                      │
│     ┌─────────────────────────────────────┐   │                      │
│     │ HTTP layer (REST + OpenAPI + MCP)   │   │                      │
│     ├─────────────────────────────────────┤   │                      │
│     │ Public Plug-in API (api.v1.*)       │◄──┤ loaded from          │
│     │   (the only stable contract)        │   │ ./plugins/*.jar      │
│     ├─────────────────────────────────────┤   │ via PF4J             │
│     │ Core PBCs (modular monolith):       │   │                      │
│     │   identity · catalog · partners ·   │   │                      │
│     │   inventory · warehousing ·         │   │                      │
│     │   orders-sales · orders-purchase ·  │   │                      │
│     │   production · quality · finance    │   │                      │
│     ├─────────────────────────────────────┤   │                      │
│     │ Cross-cutting:                      │   │                      │
│     │   • Flowable (workflows-as-data)    │   │                      │
│     │   • Metadata store (Doctype-style)  │   │                      │
│     │   • i18n (ICU MessageFormat)        │   │                      │
│     │   • Reporting (JasperReports)       │   │                      │
│     │   • Job scheduler (Quartz)          │   │                      │
│     │   • Audit, security, events         │   │                      │
│     └─────────────────────────────────────┘   │                      │
│                                               ▼                      │
│                                    PostgreSQL (mandatory)            │
│                                   File store (local or S3)           │
└──────────────────────────────────────────────────────────────────────┘

Optional sidecars for larger deployments (off by default):
  • Keycloak (OIDC)   • Redis (cache + queue)   • OpenSearch (search)   • SMTP relay
```

The PBC names above are **illustrative core capabilities**; none is printing-specific. Printing-specific behavior lives in plug-ins under `./plugins/`.

---

## 4. Two-tier extensibility (the "Clean Core" model)

The framework supports **two extension paths**, modeled on SAP S/4HANA's clean-core extensibility levels.

### Tier 1: Key user, no-code

Business analysts customize the system through the web UI. Everything they create is stored as **rows in the metadata tables**, scoped to their tenant, and tagged `source = 'user'` so it's preserved across plug-in install/uninstall and core upgrades.

| Capability | Stored in |
|---|---|
| Custom field on an existing entity | `metadata__custom_field` → JSONB `ext` column at runtime |
| Custom form layout | `metadata__form` (JSON Schema + UI Schema) |
| Custom list view, filter, column set | `metadata__list_view` |
| Custom workflow | `metadata__workflow` → deployed to Flowable as BPMN |
| Simple "if X then Y" automation | `metadata__rule` |
| Custom entity (Doctype-style) | `metadata__entity` → auto-generated table at apply time |
| Custom report | `metadata__report` |
| Translations override | `metadata__translation` |

No build, no restart, no deploy. The OpenAPI spec, the AI-agent function catalog, and the REST API auto-update from the metadata.

### Tier 2: Developer, pro-code

Software developers (the customer's IT, an integrator, or vibe_erp itself) ship a **PF4J plug-in JAR**.
The plug-in:

- Sees only `org.vibeerp.api.v1.*`, the public, semver-governed contract
- Cannot import `org.vibeerp.platform.*` or any PBC's internal classes (rejected by the plug-in linter at install time)
- Lives in its own classloader, its own Spring child context, its own DB schema namespace (`plugin___*`), its own metadata-source tag
- Can register: new entities, new REST endpoints, new workflow tasks, new form widgets, new report templates, new event listeners, new permissions, new menu entries, new React micro-frontends

### Extension grading (borrowed from SAP)

| Grade | Definition | Upgrade safety |
|---|---|---|
| **A** | Tier 1 only (metadata) | Always safe across any core version |
| **B** | Tier 2, uses only `api.v1` stable surface | Safe within a major version |
| **C** | Tier 2, uses deprecated-but-supported `api.v1` symbols | Safe until next major; loader emits warnings |
| **D** | Tier 2, reaches into internal classes via reflection | UNSUPPORTED; loader rejects unless `--allow-grade-d` is set; will break |

A core principle: **anything a Tier 2 plug-in does should also be possible to do as a Tier 1 customization eventually.** Tier 2 is the escape hatch where Tier 1 isn't expressive enough yet.

---

## 5. Module structure (Gradle multi-project)

```
vibe-erp/
├── api/
│   └── api-v1/                    ← THE CONTRACT (semver-governed)
│
├── platform/                      ← Framework runtime (internal)
│   ├── platform-bootstrap/
│   ├── platform-http/             REST + OpenAPI + MCP host
│   ├── platform-security/         AuthN/AuthZ, tenant resolution, OIDC
│   ├── platform-persistence/      JPA, multi-tenant routing, RLS, Liquibase
│   ├── platform-metadata/         Doctype-equivalent metadata store
│   ├── platform-workflow/         Flowable host
│   ├── platform-i18n/             ICU MessageFormat, locale resolution
│   ├── platform-events/           In-process bus + outbox
│   ├── platform-jobs/             Quartz scheduler
│   ├── platform-reporting/        JasperReports
│   ├── platform-files/            Local + S3 abstraction
│   └── platform-plugins/          PF4J host, lifecycle, classloader isolation
│
├── pbc/                           ← Core PBCs (each = bounded context)
│   ├── pbc-identity/
│   ├── pbc-catalog/
│   ├── pbc-partners/
│   ├── pbc-inventory/
│   ├── pbc-warehousing/
│   ├── pbc-orders-sales/
│   ├── pbc-orders-purchase/
│   ├── pbc-production/
│   ├── pbc-quality/
│   └── pbc-finance/
│
├── reference-customer/            ← NOT shipped in core
│   └── plugin-printing-shop/      Real PF4J plug-in expressing the
│                                  raw/业务流程设计文档/ workflow.
│                                  Built and tested in CI; not loaded by default.
│
├── web/                           ← React + TypeScript SPA
│
└── docs/                          ← Framework documentation
```

### Dependency rule (strictly enforced)

```
api/api-v1            depends on: nothing (Kotlin stdlib + jakarta.validation only)
platform/*            depends on: api/api-v1 + Spring + libs
pbc/*                 depends on: api/api-v1 + platform/*  (NEVER another pbc)
plugins (incl. ref)   depend on:  api/api-v1 only
```

PBCs communicate **only** through (a) the event bus and (b) service interfaces declared in `api.v1.ext.`. This is the rule that makes "modular monolith now, splittable later" real.
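The dependency rule above can be read as a predicate over a module graph. The following is a minimal sketch of that idea, not the project's build logic: the `Module` type, `isAllowed`, `violations`, and the sample graph are hypothetical names introduced for illustration, and allowing one `platform/*` module to depend on another is an assumption the rule does not state either way. A real build would more likely enforce this inside Gradle itself.

```kotlin
// Hypothetical sketch: module paths follow the layout above; all types and functions are illustrative.
data class Module(val path: String, val dependsOn: List<String>)

// True when `dep` is a permitted project dependency for `module` under the dependency rule.
fun isAllowed(module: String, dep: String): Boolean = when {
    // api-v1 depends on no other project module.
    module.startsWith("api/") -> false
    // Assumption: platform modules may depend on api-v1 and on each other.
    module.startsWith("platform/") ->
        dep.startsWith("api/") || dep.startsWith("platform/")
    // PBCs may use api-v1 and the platform, never another PBC.
    module.startsWith("pbc/") ->
        dep.startsWith("api/") || dep.startsWith("platform/")
    // Everything else is treated as a plug-in (including the reference one): api-v1 only.
    else -> dep == "api/api-v1"
}

// Lists every forbidden edge as "module -> dependency".
fun violations(modules: List<Module>): List<String> =
    modules.flatMap { m ->
        m.dependsOn
            .filterNot { isAllowed(m.path, it) }
            .map { "${m.path} -> $it" }
    }

fun main() {
    val graph = listOf(
        Module("api/api-v1", emptyList()),
        Module("platform/platform-events", listOf("api/api-v1")),
        Module("pbc/pbc-inventory", listOf("api/api-v1", "platform/platform-events")),
        // Deliberate violation: one PBC declaring another as a dependency.
        Module("pbc/pbc-orders-sales", listOf("api/api-v1", "pbc/pbc-inventory")),
        Module("reference-customer/plugin-printing-shop", listOf("api/api-v1")),
    )
    // Prints: pbc/pbc-orders-sales -> pbc/pbc-inventory
    violations(graph).forEach(::println)
}
```

Keeping the rule as a pure function over module paths makes it easy to unit-test independently of whichever build tool ends up enforcing it.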
### Per-PBC layout (every PBC follows this)

```
pbc-orders-sales/
├── api/             ← service contracts re-exported by api.v1
├── domain/          ← entities, value objects, domain services
├── application/     ← use cases / application services
├── infrastructure/  ← Hibernate mappings, repositories
├── http/            ← REST controllers
├── workflow/        ← BPMN files, task handlers
├── metadata/        ← seed metadata (default forms, rules)
├── i18n/            ← message bundles
└── migrations/      ← Liquibase changesets (own table prefix)
```

---

## 6. The `api.v1` package

The single most important contract in the codebase. Everything in `api.v1` is binary-stable within the `1.x` line. Everything not in `api.v1` is internal and can change in any release.

```
org.vibeerp.api.v1
├── core/         Tenant, Locale, Money, Quantity, Id, Result
├── entity/       Entity, Field, FieldType, EntityRegistry
├── persistence/  Repository, Query, Page, Transaction
├── workflow/     WorkflowTask, WorkflowEvent, TaskHandler
├── form/         FormSchema, UiSchema
├── http/         @PluginEndpoint, RequestContext, ResponseBuilder
├── event/        DomainEvent, EventListener, EventBus
├── security/     Principal, Permission, PermissionCheck
├── i18n/         MessageKey, Translator, LocaleProvider
├── reporting/    ReportTemplate, ReportContext
├── plugin/       Plugin, PluginManifest, ExtensionPoint
└── ext/          Typed extension interfaces a plug-in implements
```

`api.v1` is published as `api-v1.jar` to Maven Central so plug-in authors can build against it without pulling the entire vibe_erp source tree.

---

## 7. Plug-in lifecycle

```
 1. Boot            ./plugins/*.jar scanned by platform-plugins
 2. Manifest        plugin.yml read: id, version, requires-api, deps
 3. Compatibility   rejected if requires-api ≠ current api.v1 major
 4. Lint            rejected if it imports anything outside api.v1.*
 5. Classload       PF4J creates an isolated classloader per plug-in
 6. Register        plug-in's entry class implements api.v1.plugin.Plugin
                    and registers Extensions via @Extension
 7. Wire            Spring child context per plug-in; plug-in's @Components
                    live there only
 8. Migrate         plug-in's Liquibase changesets run in plugin___*
 9. Seed metadata   plug-in's metadata YAML is upserted, tagged with plug-in id
10. Ready           endpoints, workflow tasks, forms, reports, listeners live
11. Disable         deregister, drop child context; data preserved
12. Uninstall       explicit operator action; only then is the schema dropped
```

---

## 8. Data model and multi-tenancy

### Schema namespacing

PBCs and plug-ins use **table name prefixes**, not Postgres schemas:

```
identity__user, identity__role
catalog__item, catalog__item_attribute
inventory__stock_item, inventory__movement
orders_sales__order, orders_sales__order_line
production__work_order, production__operation
plugin_printingshop__plate_spec          (reference plug-in)
metadata__custom_field, metadata__form, metadata__workflow
flowable_*                               (Flowable's own tables, untouched)
```

This keeps Hibernate, RLS policies, and migrations all in one logical schema (`public`), avoids `search_path` traps, and gives clean uninstall semantics.

### Tenant isolation

- Every business table has `tenant_id`, NOT NULL
- Hibernate `@TenantId` filters every query at the application layer
- Postgres Row-Level Security policies filter every query at the database layer
- Two independent walls; a bug in one is not a data leak

Self-hosted single-customer = one tenant row called `default`. Hosted multi-tenant = many tenant rows. **Same code path.**

### Custom fields

Every business table has:

```sql
ext jsonb not null default '{}',
ext_meta text generated
```

Custom fields are JSON keys inside `ext`. A GIN index on `ext` makes them queryable. The `metadata__custom_field` table describes the JSON shape per entity per tenant. The form designer, list views, OpenAPI generator, and AI-agent function catalog all read from this table.

For the rare hot-path custom field, an operator can promote a JSON key to a real generated column via an auto-generated Liquibase changeset.
This is an optimization, not the default.

### The metadata store

```
metadata__entity          metadata__form          metadata__permission
metadata__custom_field    metadata__list_view     metadata__role_permission
metadata__workflow        metadata__rule          metadata__menu
metadata__report          metadata__translation   metadata__plugin_config
```

Every row carries `tenant_id`, `source` (`core` / `plugin:` / `user`), `version`, `is_active`. The `source` column makes uninstall/upgrade safe: removing a plug-in cleans up its metadata; user-created metadata is sacred.

### Migrations

- Each PBC owns a Liquibase changelog under `pbc-/migrations/`
- Plug-ins ship their own changelogs inside their JAR
- Forward-only and idempotent by default
- Rollback blocks mandatory; CI rejects PRs without them
- Tenant onboarding is `INSERT INTO identity__tenant` + seed metadata, not a migration, so it completes in under a second

### Data sovereignty (sold worldwide)

- **Self-hosted** is automatically compliant: the customer chooses where Postgres lives
- **Hosted** supports **per-region tenant routing**: each tenant row carries a region; `platform-persistence` routes connections to the right regional Postgres cluster
- **PII tagging** on field metadata (`pii: true`) drives auto-generated **DSAR exports** and **erasure jobs** (GDPR Articles 15/17)
- **Audit log** (`platform__audit`, append-only, monthly partitions) records access to PII fields when audit-strict mode is on

---

## 9. Cross-cutting concerns

| Concern | Approach |
|---|---|
| Security | `PermissionCheck` declared in `api.v1.security`; plug-ins register their own permissions, auto-listed in role editor |
| Transactions | Spring `@Transactional` at application-service layer; plug-ins use `api.v1.persistence.Transaction`, never Spring directly |
| Audit | `created_at`, `created_by`, `updated_at`, `updated_by`, `tenant_id` on every entity, applied by JPA listener; plug-ins inherit by extending `api.v1.entity.AuditedEntity` |
| Events | Typed `DomainEvent`s on every state change; in-process bus by default; **outbox table** in Postgres for cross-crash reliability and as the seam where Kafka/NATS plugs in later without changing PBC code |
| AI-agent surface | Same business operations exposed through REST are exposable through an MCP server; v1.1 ships the MCP endpoint, v1.0 architects the seam |

---

## 10. Packaging and deployment

### Shipping artifact

**One Docker image** (`ghcr.io/vibeerp/vibe-erp:1.0.0`), plus an optional fat JAR for non-container environments.

```
/app/vibe-erp.jar
/app/api-v1.jar
/app/migrations/, /app/i18n/, /app/reports/   ← read-only

/opt/vibe-erp/                                ← customer-mounted volume
├── config/vibe-erp.yaml                      single config file
├── plugins/                                  drop *.jar to install
├── i18n-overrides/
├── files/                                    if not using S3
└── logs/
```

### Single config file (closed key set)

`vibe-erp.yaml` covers: instance mode, database, file store, auth, i18n, plugins, observability. Plug-ins read their own config from `metadata__plugin_config`, not from the YAML.

### Install (1 command)

```bash
docker run -d --name vibe-erp \
  -p 8080:8080 \
  -v /srv/vibeerp:/opt/vibe-erp \
  -e DB_PASSWORD=... \
  ghcr.io/vibeerp/vibe-erp:1.0.0
```

First boot: connect → migrate → create `default` tenant → bootstrap admin → ready. Under 30 seconds.

### Upgrade (1 command)

`docker rm` + `docker run` with the new image tag. Within a major version, all plug-ins continue to load.
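The compatibility step of the plug-in lifecycle (a plug-in is rejected when its `requires-api` does not match the API major the core exposes) can be sketched as a small pure function. This is an illustration under stated assumptions, not the loader's implementation: `majorOf` and `canLoad` are hypothetical names, the `"1.x"` style of version string is assumed to be the manifest format, and modelling the core as a set of supported majors is an assumption that leaves room for a core exposing more than one API major at once.

```kotlin
// Hypothetical helper: extracts the major component from strings such as "1.x" or "1.4.2".
// Returns null when the text before the first dot is not an integer.
fun majorOf(version: String): Int? =
    version.trim().substringBefore('.').toIntOrNull()

// A plug-in may load only if the API major it requires is one the running core supports.
fun canLoad(requiresApi: String, supportedApiMajors: Set<Int>): Boolean {
    val required = majorOf(requiresApi) ?: return false
    return required in supportedApiMajors
}

fun main() {
    println(canLoad("1.x", setOf(1)))     // true: same major
    println(canLoad("2.x", setOf(1)))     // false: core does not expose api.v2
    println(canLoad("1.x", setOf(1, 2)))  // true: core exposes both majors
    println(canLoad("latest", setOf(1)))  // false: unparseable requirement is rejected
}
```

Running the check at install time rather than at first use keeps a mismatched plug-in from ever reaching the classloading step.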
Across a major version, `api.v1` and `api.v2` ship side by side for at least one major release. Customer data is never destroyed by an upgrade by default.

### Upgrade contract

| Change | Allowed within 1.x? |
|---|---|
| Add a class to `api.v1` | yes |
| Add a method to an `api.v1` interface (with default impl) | yes |
| Remove or rename anything in `api.v1` | no (major bump) |
| Change behavior of an `api.v1` symbol in a way plug-ins can observe | no (major bump) |
| Anything in `platform.*` or `pbc.*.internal.*` | yes (that's why it's internal) |

---

## 11. v1.0 cut line

### v1.0 ships

- Single Docker image, fat JAR alternative
- Core PBCs: identity, catalog, partners, inventory, warehousing, orders-sales, orders-purchase, production (basic), quality (basic), finance (basic)
- `api.v1` published to Maven Central
- PF4J plug-in loader with classloader isolation, manifest validation, lifecycle
- Metadata store: custom fields, forms, list views, simple rules
- Embedded Flowable + BPMN designer in web UI
- JSON Schema form designer in web UI
- Built-in JWT auth + OIDC SSO
- React web SPA covering all core PBCs and customization UIs
- REST + OpenAPI on every endpoint
- ICU i18n with shipping locales: `en-US`, `zh-CN`, `de-DE`, `ja-JP`, `es-ES`
- Reference printing-shop plug-in (built and CI-tested, not loaded by default)
- Liquibase migrations with mandatory rollback blocks
- Audit log, PII tagging, basic DSAR export
- Documentation site
- One-command install, one-command upgrade
- Health, metrics, structured logs

### v1.0 deferred (architecturally accommodated)

- React Native mobile app (v2)
- MCP server for AI agents (v1.1)
- Hosted multi-tenant deployment with per-region routing, billing, tenant provisioning UI (v2)
- Plug-in marketplace / signed plug-ins (v2)
- Webhooks-out and Kafka/NATS event streaming (v1.1, outbox seam already exists)
- Advanced finance: tax engines, multi-currency revaluation (v1.2+)
- Production scheduling / APS (v1.2+)
- Hot plug-in reload without restart (v1.2+)
- Full-text search beyond Postgres `tsvector` (v1.2+)

### Release policy

- Semver on `api.v1`. Major bumps overlap with previous major for ≥1 major release window
- Semver on the core image
- Plug-ins declare `requires-api: "1.x"`; mismatches fail at install, never at runtime
- Minor releases every 6 weeks
- LTS on every other major (`1.x`, `3.x`, `5.x`), supported 3 years

---

## 12. Risks and how the design addresses them

| Risk | Mitigation |
|---|---|
| Core gradually accreting printing-specific concepts | The dependency rule + the reference plug-in: anything printing-specific that creeps into core breaks the build of `plugin-printing-shop` only if it's wrong; reviewers must reject any printing terminology in `pbc/*` |
| Plug-in API churn breaks the ecosystem | `api.v1` is the only supported surface; plug-in linter rejects internal imports at install time; semver discipline + 1-major deprecation window |
| Cross-PBC coupling silently appears | Gradle dependency rule enforced by the build (`pbc-orders-sales` cannot declare `pbc-inventory` as a dependency); CI fails on violations |
| Multi-tenancy bug causes data leak in hosted version | Two independent walls (Hibernate filter + Postgres RLS); integration tests with multiple tenants in every PBC |
| "Workflows as data" turns into a custom DSL | BPMN 2.0 standard via Flowable; the temptation to invent a vibe_erp-only workflow language must be rejected |
| Metadata store becomes a write-once, read-by-no-one configuration graveyard | Every consumer (form renderer, list view, OpenAPI generator, AI function catalog, role editor) reads from it; no parallel sources of truth |
| JVM RAM cost makes self-hosting on small shops painful | Minimum spec documented (2 GB RAM, 1 vCPU); GraalVM native image evaluated for v2 |
| Customer wants a different DB | Hibernate makes Postgres-only a soft constraint; JSONB and RLS make it harder; we explicitly do not support other DBs in v1.0 and document this |

---
## 13. Verification (how the design will be proved out)

The design is verified by **building the framework AND simultaneously building the reference printing-shop plug-in**. The plug-in is the executable acceptance test:

- If the plug-in can express the workflows in `raw/业务流程设计文档/` using **only `api.v1`**, the framework is sufficient
- If the plug-in needs to reach into a `platform.*` or `pbc.*` internal class, the seam is wrong and `api.v1` needs to grow (deliberately)
- If a feature in `pbc/*` is only there to make the printing plug-in work, the design is failing guardrail #1 and the feature must move into the plug-in

CI runs the full vibe_erp test suite **and** loads `plugin-printing-shop` in an integration test environment, exercising its key flows end-to-end against a real Postgres.

---

## 14. What happens after this spec

This spec is the **architecture-level** design. It is NOT an implementation plan. The next steps are:

1. The user reviews this document and either approves it or requests changes
2. On approval, hand off to the **writing-plans** skill to produce a sequenced implementation plan, broken into work units (each PBC, each platform module, each major capability)
3. CLAUDE.md is updated to reflect the named patterns adopted here (Clean Core, two-tier extensibility, PBCs, `api.v1`, AI-agent seam)
4. The plan is executed incrementally, with the reference printing-shop plug-in built alongside the framework so the abstraction is constantly stress-tested