
vibe_erp — Architecture Design (v1)

Status: Approved (brainstorm output, pre-implementation plan)
Date: 2026-04-07
Scope: High-level architecture for the entire framework. Implementation plans for individual PBCs and the v1 cut will be written separately.


1. Context and intent

vibe_erp is an ERP/EBC framework (not an ERP application) targeting the printing industry, intended to be sold worldwide and deployed self-hosted-first with a managed/hosted version added later. The reference business documentation under raw/业务流程设计文档/ describes one example printing shop and is treated as a fixture / acceptance test, never as a specification — no part of its workflow is hard-coded into the core.

The design satisfies, and in several cases establishes, the architectural guardrails in CLAUDE.md. The pre-existing guardrails (1–6), alongside the separate documentation-discipline section, are:

  1. Core stays domain-agnostic (no printing terms in the core)
  2. Workflows are data, not code
  3. Extensibility seams come first
  4. The reference customer is a test, not a requirement
  5. Multi-tenant from day one in spirit
  6. Global / i18n from day one (plus "Documentation discipline" as a separate section in CLAUDE.md)

This design adds five more guardrails to CLAUDE.md (numbered 7–11), derived from validating the current design against 2026 SOTA (Gartner's Composable ERP frame, the MACH principles, SAP S/4HANA's Clean Core extension model, and ERPNext / Frappe's metadata-driven Doctype system):

  7. Clean Core (extensions never modify the core; A/B/C/D extension grading)
  8. Two-tier extensibility (key-user no-code metadata + developer pro-code plug-ins, both first-class)
  9. PBC boundaries are sacred (modular monolith with strict bounded contexts; PBCs never import each other)
  10. api.v1 is the only stable contract (semver-governed; everything else is internal)
  11. AI agents are a first-class client (REST/OpenAPI surface must be MCP-callable; v1.0 architects the seam, v1.1 ships the endpoint)

2. Foundational decisions

| Decision | Choice | Why |
|---|---|---|
| Deployment model | Self-hosted-first, hosted later, same artifact for both | User requirement; matches the Odoo/ERPNext/Tryton/SAP S/4HANA self-host story |
| Architecture style | Modular monolith with strict bounded contexts (PBCs) | MACH allows "modularity OR microservices"; a modular monolith is operationally sane for self-host; the self-hostable ERPs cited above (Odoo, ERPNext, Tryton) made the same call |
| Backend language | Kotlin on the JVM | Mature ERP ecosystem (Hibernate, Flowable, ICU4J, JasperReports, PF4J), modern ergonomics, large global hiring pool |
| Backend framework | Spring Boot | De facto JVM application framework; PF4J integrates cleanly; Spring Data JPA, Spring Security, Actuator are all standard |
| Workflow engine | Embedded Flowable (BPMN 2.0) | Workflows-as-data is non-negotiable; BPMN is the standard; embedding avoids extra processes |
| Persistence | PostgreSQL as the only mandatory external dependency | Matches Odoo and Tryton; excellent JSONB + RLS support, both critical to this design |
| Multi-tenancy | Row-level tenant_id + Postgres RLS (defense in depth) | Same code path for self-host (one tenant) and hosted (many tenants); no schema explosion; two independent walls against data leaks |
| Custom fields | JSONB ext column on every business table, described by metadata rows | One row, one read, indexable via GIN, no migrations needed for additions, no joins; EAV is the wrong tool |
| Plug-in framework | PF4J + Spring Boot child contexts | Classloader isolation, manifest-based lifecycle, cleanest plug-in story on the JVM |
| Web client (v1) | React + TypeScript SPA | Single SPA covers desktop and tablet office workflows |
| Mobile client | React Native (v2, not v1) | Defer until core API is stable; reuses TS types from web |
| API style | REST + OpenAPI, MCP-callable surface | OpenAPI is the universal integration standard; the MCP server is a separate v1.1 deliverable, the seam exists in v1.0 |
| Reporting | JasperReports | Mature, customer-skinnable, JVM-native |
| i18n | ICU MessageFormat (ICU4J) + Spring MessageSource | Plurals, gender, number/date formatting, locale fallback; all required for "sold worldwide" |
| Auth | Built-in JWT + OIDC (Keycloak-compatible) | Self-hosters get something out of the box; enterprise customers get SSO from day one |

3. Topology

┌──────────────────────────────────────────────────────────────────────┐
│                          Customer's network                          │
│                                                                      │
│  Browser (React SPA) ─┐                                              │
│  AI agent (MCP, v1.1)─┼─► Reverse proxy ──► vibe_erp backend (1 image)│
│  3rd-party system    ─┘                       │                      │
│                                                │                      │
│   Inside the image (one Spring Boot process):  │                      │
│     ┌─────────────────────────────────────┐   │                      │
│     │  HTTP layer (REST + OpenAPI + MCP)  │   │                      │
│     ├─────────────────────────────────────┤   │                      │
│     │   Public Plug-in API  (api.v1.*)    │◄──┤  loaded from         │
│     │   — the only stable contract        │   │  ./plugins/*.jar     │
│     ├─────────────────────────────────────┤   │  via PF4J            │
│     │   Core PBCs (modular monolith):     │   │                      │
│     │   identity · catalog · partners ·   │   │                      │
│     │   inventory · warehousing ·         │   │                      │
│     │   orders-sales · orders-purchase ·  │   │                      │
│     │   production · quality · finance    │   │                      │
│     ├─────────────────────────────────────┤   │                      │
│     │   Cross-cutting:                    │   │                      │
│     │   • Flowable (workflows-as-data)    │   │                      │
│     │   • Metadata store (Doctype-style)  │   │                      │
│     │   • i18n (ICU MessageFormat)        │   │                      │
│     │   • Reporting (JasperReports)       │   │                      │
│     │   • Job scheduler (Quartz)          │   │                      │
│     │   • Audit, security, events         │   │                      │
│     └─────────────────────────────────────┘   │                      │
│                                                ▼                      │
│                                       PostgreSQL  (mandatory)        │
│                                       File store  (local or S3)      │
└──────────────────────────────────────────────────────────────────────┘

Optional sidecars for larger deployments (off by default):
   • Keycloak (OIDC)        • Redis (cache + queue)
   • OpenSearch (search)    • SMTP relay

The PBC names above are illustrative core capabilities; none is printing-specific. Printing-specific behavior lives in plug-ins under ./plugins/.


4. Two-tier extensibility (the "Clean Core" model)

The framework supports two extension paths, modeled on SAP S/4HANA's clean-core extensibility levels.

Tier 1 — Key user, no-code

Business analysts customize the system through the web UI. Everything they create is stored as rows in the metadata tables, scoped to their tenant, and tagged source = 'user' so it's preserved across plug-in install/uninstall and core upgrades.

| Capability | Stored in |
|---|---|
| Custom field on an existing entity | metadata__custom_field → JSONB ext column at runtime |
| Custom form layout | metadata__form (JSON Schema + UI Schema) |
| Custom list view, filter, column set | metadata__list_view |
| Custom workflow | metadata__workflow → deployed to Flowable as BPMN |
| Simple "if X then Y" automation | metadata__rule |
| Custom entity (Doctype-style) | metadata__entity → auto-generated table at apply time |
| Custom report | metadata__report |
| Translations override | metadata__translation |

No build, no restart, no deploy. The OpenAPI spec, the AI-agent function catalog, and the REST API auto-update from the metadata.
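To make "automation as data" concrete, here is a minimal Kotlin sketch of how a simple "if X then Y" rule row might be evaluated at runtime. The `Rule` shape, the `Op` operators, and the action strings are assumptions for illustration; the actual metadata__rule schema is not defined in this document.

```kotlin
// Hypothetical shape of one metadata__rule row: "if <field> <op> <value> then <action>".
// Field names, operators, and action strings are illustrative assumptions.
enum class Op { EQ, GT, LT }

data class Rule(
    val field: String,
    val op: Op,
    val value: Double,
    val action: String,
)

// Returns the actions of every rule whose condition holds for the record.
// Records are plain maps here; a missing or non-numeric field never matches.
fun evaluate(rules: List<Rule>, record: Map<String, Any?>): List<String> =
    rules.filter { rule ->
        val actual = (record[rule.field] as? Number)?.toDouble() ?: return@filter false
        when (rule.op) {
            Op.EQ -> actual == rule.value
            Op.GT -> actual > rule.value
            Op.LT -> actual < rule.value
        }
    }.map { it.action }
```

Because the rule is a row rather than compiled code, adding or changing one needs no build or restart, which is the property this tier depends on.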

Tier 2 — Developer, pro-code

Software developers (the customer's IT, an integrator, or the vibe_erp team itself) ship a PF4J plug-in JAR. The plug-in:

  • Sees only org.vibeerp.api.v1.* — the public, semver-governed contract
  • Cannot import org.vibeerp.platform.* or any PBC's internal classes (rejected by the plug-in linter at install time)
  • Lives in its own classloader, its own Spring child context, its own table-name namespace (plugin_<id>__*), its own metadata-source tag
  • Can register: new entities, new REST endpoints, new workflow tasks, new form widgets, new report templates, new event listeners, new permissions, new menu entries, new React micro-frontends

Extension grading (borrowed from SAP)

| Grade | Definition | Upgrade safety |
|---|---|---|
| A | Tier 1 only (metadata) | Always safe across any core version |
| B | Tier 2, uses only api.v1 stable surface | Safe within a major version |
| C | Tier 2, uses deprecated-but-supported api.v1 symbols | Safe until next major; loader emits warnings |
| D | Tier 2, reaches into internal classes via reflection | UNSUPPORTED; loader rejects unless --allow-grade-d is set; will break |

A core principle: anything a Tier 2 plug-in does should also be possible to do as a Tier 1 customization eventually. Tier 2 is the escape hatch where Tier 1 isn't expressive enough yet.
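The grading scheme in this section can be read as a small decision function. The sketch below is one possible encoding; `ExtensionFacts` and its fields are hypothetical inputs that a loader or linter would have to collect, and only the grade definitions and the --allow-grade-d opt-in come from the grading table.

```kotlin
enum class Grade { A, B, C, D }

// Facts a loader or linter might collect about one extension (hypothetical fields).
data class ExtensionFacts(
    val shipsPluginJar: Boolean,        // false = Tier 1 metadata only
    val usesDeprecatedApi: Boolean,     // touches deprecated-but-supported api.v1 symbols
    val reflectsIntoInternals: Boolean, // reaches internal classes via reflection
)

// Worst applicable grade wins: reflection into internals outranks deprecated usage.
fun grade(f: ExtensionFacts): Grade = when {
    !f.shipsPluginJar -> Grade.A
    f.reflectsIntoInternals -> Grade.D
    f.usesDeprecatedApi -> Grade.C
    else -> Grade.B
}

// Grade D is rejected unless the operator explicitly opts in.
fun mayLoad(g: Grade, allowGradeD: Boolean): Boolean = g != Grade.D || allowGradeD
```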


5. Module structure (Gradle multi-project)

vibe-erp/
├── api/
│   └── api-v1/                 ← THE CONTRACT (semver-governed)
│
├── platform/                   ← Framework runtime (internal)
│   ├── platform-bootstrap/
│   ├── platform-http/          REST + OpenAPI + MCP host
│   ├── platform-security/      AuthN/AuthZ, tenant resolution, OIDC
│   ├── platform-persistence/   JPA, multi-tenant routing, RLS, Liquibase
│   ├── platform-metadata/      Doctype-equivalent metadata store
│   ├── platform-workflow/      Flowable host
│   ├── platform-i18n/          ICU MessageFormat, locale resolution
│   ├── platform-events/        In-process bus + outbox
│   ├── platform-jobs/          Quartz scheduler
│   ├── platform-reporting/     JasperReports
│   ├── platform-files/         Local + S3 abstraction
│   └── platform-plugins/       PF4J host, lifecycle, classloader isolation
│
├── pbc/                        ← Core PBCs (each = bounded context)
│   ├── pbc-identity/
│   ├── pbc-catalog/
│   ├── pbc-partners/
│   ├── pbc-inventory/
│   ├── pbc-warehousing/
│   ├── pbc-orders-sales/
│   ├── pbc-orders-purchase/
│   ├── pbc-production/
│   ├── pbc-quality/
│   └── pbc-finance/
│
├── reference-customer/         ← NOT shipped in core
│   └── plugin-printing-shop/   Real PF4J plug-in expressing the
│                               raw/业务流程设计文档/ workflow.
│                               Built and tested in CI; not loaded by default.
│
├── web/                        ← React + TypeScript SPA
│
└── docs/                       ← Framework documentation

Dependency rule (strictly enforced)

api/api-v1            depends on: nothing (Kotlin stdlib + jakarta.validation only)
platform/*            depends on: api/api-v1 + Spring + libs
pbc/*                 depends on: api/api-v1 + platform/*  (NEVER another pbc)
plugins (incl. ref)   depend on:  api/api-v1 only

PBCs communicate only through (a) the event bus and (b) service interfaces declared in api.v1.ext.<pbc>. This is the rule that makes "modular monolith now, splittable later" real.
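As an illustration of how the dependency rule could be checked mechanically, the following sketch validates a project dependency map against the four rules above. The map is hypothetical input (for example, extracted from the Gradle project graph), third-party libraries are left out, and allowing platform modules to depend on each other is an assumption the rule does not spell out.

```kotlin
// True when a project-internal dependency from -> to is permitted by the rule.
fun allowed(from: String, to: String): Boolean = when {
    from.startsWith("api/") -> false                                   // api-v1 depends on no project module
    from.startsWith("platform/") -> to == "api/api-v1" || to.startsWith("platform/")
    from.startsWith("pbc/") -> to == "api/api-v1" || to.startsWith("platform/")
    else -> to == "api/api-v1"                                         // plug-ins, including the reference plug-in
}

// Lists every forbidden edge as "from -> to"; an empty list means the graph is clean.
fun violations(deps: Map<String, Set<String>>): List<String> =
    deps.flatMap { (module, targets) ->
        targets.filterNot { allowed(module, it) }.map { "$module -> $it" }
    }
```

A CI step built along these lines would fail the build on any non-empty result, which is what keeps the "NEVER another pbc" clause from eroding over time.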

Per-PBC layout (every PBC follows this)

pbc-orders-sales/
├── api/                ← service contracts re-exported by api.v1
├── domain/             ← entities, value objects, domain services
├── application/        ← use cases / application services
├── infrastructure/     ← Hibernate mappings, repositories
├── http/               ← REST controllers
├── workflow/           ← BPMN files, task handlers
├── metadata/           ← seed metadata (default forms, rules)
├── i18n/               ← message bundles
└── migrations/         ← Liquibase changesets (own table prefix)

6. The api.v1 package

The single most important contract in the codebase. Everything in api.v1 is binary-stable within the 1.x line. Everything not in api.v1 is internal and can change in any release.

org.vibeerp.api.v1
├── core/         Tenant, Locale, Money, Quantity, Id<T>, Result<T,E>
├── entity/       Entity, Field, FieldType, EntityRegistry
├── persistence/  Repository<T>, Query, Page, Transaction
├── workflow/     WorkflowTask, WorkflowEvent, TaskHandler
├── form/         FormSchema, UiSchema
├── http/         @PluginEndpoint, RequestContext, ResponseBuilder
├── event/        DomainEvent, EventListener, EventBus
├── security/     Principal, Permission, PermissionCheck
├── i18n/         MessageKey, Translator, LocaleProvider
├── reporting/    ReportTemplate, ReportContext
├── plugin/       Plugin, PluginManifest, ExtensionPoint
└── ext/          Typed extension interfaces a plug-in implements

api.v1 is published as api-v1.jar to Maven Central so plug-in authors can build against it without pulling the entire vibe_erp source tree.


7. Plug-in lifecycle

1. Boot           ./plugins/*.jar scanned by platform-plugins
2. Manifest       plugin.yml read: id, version, requires-api, deps
3. Compatibility  rejected if requires-api ≠ current api.v1 major
4. Lint           rejected if it references vibe_erp classes outside org.vibeerp.api.v1.*
5. Classload      PF4J creates an isolated classloader per plug-in
6. Register       plug-in's entry class implements api.v1.plugin.Plugin
                  and registers Extensions via @Extension
7. Wire           Spring child context per plug-in; plug-in's @Components
                  live there only
8. Migrate        plug-in's Liquibase changesets run in plugin_<id>__*
9. Seed metadata  plug-in's metadata YAML is upserted, tagged with plug-in id
10. Ready         endpoints, workflow tasks, forms, reports, listeners live
11. Disable       deregister, drop child context; data preserved
12. Uninstall     explicit operator action; only then are the plug-in's plugin_<id>__* tables dropped
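Steps 3 and 4 are the two gates that decide whether a JAR is loaded at all. A minimal sketch of both checks follows, assuming the "<major>.x" requirement format used in the release policy and a hypothetical list of class names referenced by the JAR; the actual manifest parsing and bytecode scanning are not specified here.

```kotlin
// Step 3: compare a "1.x"-style requirement with the running api.v1 major version.
// Anything that does not start with an integer major is treated as incompatible.
fun isCompatible(requiresApi: String, currentMajor: Int): Boolean {
    val major = requiresApi.substringBefore('.').toIntOrNull() ?: return false
    return major == currentMajor
}

// Step 4: flag references to vibe_erp classes outside the public contract.
// `imports` is a hypothetical list of fully qualified class names found in the JAR.
fun forbiddenImports(imports: List<String>): List<String> =
    imports.filter { it.startsWith("org.vibeerp.") && !it.startsWith("org.vibeerp.api.v1.") }
```

Running both checks before classloading means a rejected plug-in never executes any of its own code.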

8. Data model and multi-tenancy

Schema namespacing

PBCs and plug-ins use table name prefixes, not Postgres schemas:

identity__user, identity__role
catalog__item, catalog__item_attribute
inventory__stock_item, inventory__movement
orders_sales__order, orders_sales__order_line
production__work_order, production__operation
plugin_printingshop__plate_spec  (reference plug-in)
metadata__custom_field, metadata__form, metadata__workflow
act_*, flw_*  (Flowable's own tables, untouched)

This keeps Hibernate, RLS policies, and migrations all in one logical schema (public), avoids search_path traps, and gives clean uninstall semantics.
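The prefix convention is simple enough to capture in a helper. The sketch below derives table names the way the examples above suggest; mapping '-' to '_' mirrors orders_sales__order, while the identifier regex and the function names are assumptions.

```kotlin
// Lowercase identifier check; the exact allowed character set is an assumption.
val IDENT = Regex("[a-z][a-z0-9_]*")

// Joins an owner prefix and an entity name with the double-underscore separator.
fun qualified(prefix: String, entity: String): String {
    require(IDENT.matches(prefix) && IDENT.matches(entity)) { "invalid identifier: $prefix / $entity" }
    return "${prefix}__$entity"
}

// PBC names such as "orders-sales" become the prefix "orders_sales".
fun pbcTable(pbc: String, entity: String): String = qualified(pbc.replace('-', '_'), entity)

// Plug-in tables always live under plugin_<id>__*.
fun pluginTable(pluginId: String, entity: String): String = qualified("plugin_$pluginId", entity)
```

Since every plug-in table shares the plugin_<id>__ prefix, uninstall can locate a plug-in's tables by name alone.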

Tenant isolation

  • Every business table has tenant_id, NOT NULL
  • Hibernate @TenantId filters every query at the application layer
  • Postgres Row-Level Security policies filter every query at the database layer
  • Two independent walls; a bug in one is not a data leak

Self-hosted single-customer = one tenant row called default. Hosted multi-tenant = many tenant rows. Same code path.
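The database-side wall can be generated per table. The sketch below emits standard PostgreSQL row-level-security statements; the policy name, the `app.tenant_id` session setting, and the uuid type of tenant_id are assumptions, since this document does not fix them.

```kotlin
// Generates the RLS statements for one business table.
// Assumes the application sets app.tenant_id on each connection before querying.
fun rlsStatements(table: String): List<String> {
    require(Regex("[a-z][a-z0-9_]*").matches(table)) { "unexpected table name: $table" }
    return listOf(
        "ALTER TABLE $table ENABLE ROW LEVEL SECURITY",
        // FORCE makes the policy apply to the table owner as well.
        "ALTER TABLE $table FORCE ROW LEVEL SECURITY",
        "CREATE POLICY tenant_isolation ON $table " +
            "USING (tenant_id = current_setting('app.tenant_id')::uuid)",
    )
}
```

With a policy like this in place, a query that slips past the Hibernate filter still sees only rows of the tenant bound to the connection, which is what makes the two walls independent.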

Custom fields

Every business table has:

ext       jsonb   not null  default '{}',
ext_meta  text    generated always as (...) stored   -- generation expression not specified in this document

Custom fields are JSON keys inside ext. A GIN index on ext makes them queryable. The metadata__custom_field table describes the JSON shape per entity per tenant. The form designer, list views, OpenAPI generator, and AI-agent function catalog all read from this table.

For the rare hot-path custom field, an operator can promote a JSON key to a real generated column via an auto-generated Liquibase changeset. This is an optimization, not the default.
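Since the ext column itself is schemaless, the declared shape in metadata__custom_field has to be enforced in application code. A minimal validation sketch follows; the `CustomField` shape and the three field types are assumptions, not the actual metadata schema.

```kotlin
enum class FieldType { TEXT, NUMBER, BOOLEAN }

// One metadata__custom_field row, reduced to what validation needs (hypothetical shape).
data class CustomField(val key: String, val type: FieldType, val required: Boolean = false)

// Returns human-readable problems; an empty list means `ext` matches the declared shape.
fun validateExt(fields: List<CustomField>, ext: Map<String, Any?>): List<String> {
    val byKey = fields.associateBy { it.key }
    val problems = mutableListOf<String>()
    for (key in ext.keys) {
        if (key !in byKey) problems.add("unknown key: $key")
    }
    for (f in fields) {
        val v = ext[f.key]
        if (v == null) {
            if (f.required) problems.add("missing required key: ${f.key}")
            continue
        }
        val ok = when (f.type) {
            FieldType.TEXT -> v is String
            FieldType.NUMBER -> v is Number
            FieldType.BOOLEAN -> v is Boolean
        }
        if (!ok) problems.add("wrong type for ${f.key}: expected ${f.type}")
    }
    return problems
}
```

The same field list could also feed the form designer and the OpenAPI generator, so validation and documentation cannot drift apart.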

The metadata store

metadata__entity         metadata__form           metadata__permission
metadata__custom_field   metadata__list_view      metadata__role_permission
metadata__workflow       metadata__rule           metadata__menu
metadata__report         metadata__translation    metadata__plugin_config

Every row carries tenant_id, source (core / plugin:<id> / user), version, is_active. The source column makes uninstall/upgrade safe: removing a plug-in cleans up its metadata; user-created metadata is sacred.
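The role of the source column during uninstall can be shown in a few lines. In this sketch `MetadataRow` is a reduced, hypothetical row shape; the "plugin:<id>" tag format is the one given above.

```kotlin
// A metadata row reduced to the columns that matter for cleanup (hypothetical shape).
data class MetadataRow(
    val tenantId: String,
    val source: String, // "core", "plugin:<id>", or "user"
    val name: String,
    val isActive: Boolean = true,
)

// Rows that survive uninstalling a plug-in: everything except that plug-in's own rows.
// Rows tagged "core" or "user" are never touched.
fun afterUninstall(rows: List<MetadataRow>, pluginId: String): List<MetadataRow> =
    rows.filterNot { it.source == "plugin:$pluginId" }
```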

Migrations

  • Each PBC owns a Liquibase changelog under pbc-<name>/migrations/
  • Plug-ins ship their own changelogs inside their JAR
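  • Applied changesets are forward-only (never edited after release) and idempotent by default
  • Every changeset still carries a rollback block for operator-initiated recovery; CI rejects PRs without one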
  • Tenant onboarding is INSERT INTO identity__tenant + seed metadata, not a migration — sub-second

Data sovereignty (sold worldwide)

  • Self-hosted meets data-residency requirements by construction: the customer chooses where Postgres lives
  • Hosted supports per-region tenant routing: each tenant row carries a region; platform-persistence routes connections to the right regional Postgres cluster
  • PII tagging on field metadata (pii: true) drives auto-generated DSAR exports and erasure jobs (GDPR Articles 15/17)
  • Audit log (platform__audit, append-only, monthly partitions) records access to PII fields when audit-strict mode is on
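The PII tag can drive both export and erasure from the same metadata. The following is a rough sketch under stated assumptions: `FieldMeta` is a hypothetical reduction of field metadata to the pii flag, and treating erasure as nulling the tagged fields is one possible policy, not something this document prescribes.

```kotlin
// Field metadata reduced to the pii flag; the shape is a hypothetical simplification.
data class FieldMeta(val entity: String, val name: String, val pii: Boolean)

// Names of the PII-tagged fields of one entity.
fun piiFields(meta: List<FieldMeta>, entity: String): Set<String> =
    meta.filter { it.entity == entity && it.pii }.map { it.name }.toSet()

// DSAR export (GDPR Art. 15): keep only the PII-tagged fields of a record.
fun dsarExport(meta: List<FieldMeta>, entity: String, record: Map<String, Any?>): Map<String, Any?> {
    val pii = piiFields(meta, entity)
    return record.filterKeys { it in pii }
}

// Erasure (GDPR Art. 17): null out PII-tagged fields and keep the rest of the row.
fun erase(meta: List<FieldMeta>, entity: String, record: Map<String, Any?>): Map<String, Any?> {
    val pii = piiFields(meta, entity)
    return record.mapValues { (k, v) -> if (k in pii) null else v }
}
```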

9. Cross-cutting concerns

| Concern | Approach |
|---|---|
| Security | PermissionCheck declared in api.v1.security; plug-ins register their own permissions, auto-listed in role editor |
| Transactions | Spring @Transactional at application-service layer; plug-ins use api.v1.persistence.Transaction, never Spring directly |
| Audit | created_at, created_by, updated_at, updated_by, tenant_id on every entity, applied by JPA listener; plug-ins inherit by extending api.v1.entity.AuditedEntity |
| Events | Typed DomainEvents on every state change; in-process bus by default; outbox table in Postgres for cross-crash reliability and as the seam where Kafka/NATS plugs in later without changing PBC code |
| AI-agent surface | Same business operations exposed through REST are exposable through an MCP server; v1.1 ships the MCP endpoint, v1.0 architects the seam |
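The events approach (in-process delivery plus an outbox) can be sketched in memory. Everything below is illustrative: `InProcessBus`, `OrderConfirmed`, and the event type string are hypothetical names, and the in-memory list only stands in for the Postgres outbox table.

```kotlin
interface DomainEvent { val type: String }

// Example event; the name and type string are made up for this sketch.
data class OrderConfirmed(val orderId: String) : DomainEvent {
    override val type = "orders_sales.order_confirmed"
}

class InProcessBus {
    private val listeners = mutableMapOf<String, MutableList<(DomainEvent) -> Unit>>()

    // Stand-in for the outbox table; a real outbox row would be written in the
    // same database transaction as the state change that produced the event.
    val outbox = mutableListOf<DomainEvent>()

    fun subscribe(type: String, listener: (DomainEvent) -> Unit) {
        listeners.getOrPut(type) { mutableListOf() }.add(listener)
    }

    fun publish(event: DomainEvent) {
        outbox.add(event)                                     // record first
        listeners[event.type].orEmpty().forEach { it(event) } // then deliver in-process
    }
}
```

A later relay to Kafka or NATS would only need to read the outbox, so publishers and listeners inside the PBCs stay unchanged.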

10. Packaging and deployment

Shipping artifact

One Docker image (ghcr.io/vibeerp/vibe-erp:1.0.0), plus an optional fat JAR for non-container environments.

/app/vibe-erp.jar
/app/api-v1.jar
/app/migrations/, /app/i18n/, /app/reports/   ← read-only
/opt/vibe-erp/                                ← customer-mounted volume
  ├── config/vibe-erp.yaml                    single config file
  ├── plugins/                                drop *.jar to install
  ├── i18n-overrides/
  ├── files/                                  if not using S3
  └── logs/

Single config file (closed key set)

vibe-erp.yaml covers: instance mode, database, file store, auth, i18n, plugins, observability. Plug-ins read their own config from metadata__plugin_config, not from the YAML.
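A closed key set means unknown keys are an error rather than being silently ignored. A minimal check might look like the sketch below; the section spellings are assumptions derived from the list above, and the config is taken as an already-parsed map so that no particular YAML library is implied.

```kotlin
// Top-level sections named in the text; the exact key spellings are assumptions.
val ALLOWED_SECTIONS = setOf(
    "instance", "database", "files", "auth", "i18n", "plugins", "observability",
)

// Returns unknown top-level keys so startup can fail fast on a typo.
fun unknownSections(config: Map<String, Any?>): Set<String> = config.keys - ALLOWED_SECTIONS
```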

Install (1 command)

docker run -d --name vibe-erp \
  -p 8080:8080 \
  -v /srv/vibeerp:/opt/vibe-erp \
  -e DB_PASSWORD=... \
  ghcr.io/vibeerp/vibe-erp:1.0.0

First boot: connect → migrate → create default tenant → bootstrap admin → ready. Under 30 seconds.

Upgrade (1 command)

docker rm + docker run with the new image tag. Within a major version, all plug-ins continue to load. Across a major version, api.v1 and api.v2 ship side by side for at least one major release. Customer data is never destroyed by an upgrade by default.

Upgrade contract

| Change | Allowed within 1.x? |
|---|---|
| Add a class to api.v1 | yes |
| Add a method to an api.v1 interface (with default impl) | yes |
| Remove or rename anything in api.v1 | no (major bump) |
| Change behavior of an api.v1 symbol in a way plug-ins can observe | no (major bump) |
| Anything in platform.* or pbc.*.internal.* | yes (that's why it's internal) |
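The "add a method with a default impl" row is the one most easily misapplied, so a small example may help. The interface and method names below are hypothetical and do not describe the actual api.v1 surface.

```kotlin
// Hypothetical api.v1 interface: handle() shipped in 1.0, describe() added in a later 1.x.
interface TaskHandler {
    fun handle(taskId: String): String

    // Added later; the default body keeps existing implementations compiling.
    fun describe(): String = "task handler"
}

// A plug-in class written against 1.0: it knows nothing about describe().
class LegacyHandler : TaskHandler {
    override fun handle(taskId: String): String = "done:$taskId"
}
```

This demonstrates source compatibility. Binary compatibility for already-compiled plug-in JARs additionally depends on the Kotlin compiler emitting real JVM default methods (for example via the -Xjvm-default=all option), which is a build setting worth verifying before relying on this row.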

11. v1.0 cut line

v1.0 ships

  • Single Docker image, fat JAR alternative
  • Core PBCs: identity, catalog, partners, inventory, warehousing, orders-sales, orders-purchase, production (basic), quality (basic), finance (basic)
  • api.v1 published to Maven Central
  • PF4J plug-in loader with classloader isolation, manifest validation, lifecycle
  • Metadata store: custom fields, forms, list views, simple rules
  • Embedded Flowable + BPMN designer in web UI
  • JSON Schema form designer in web UI
  • Built-in JWT auth + OIDC SSO
  • React web SPA covering all core PBCs and customization UIs
  • REST + OpenAPI on every endpoint
  • ICU i18n with shipping locales: en-US, zh-CN, de-DE, ja-JP, es-ES
  • Reference printing-shop plug-in (built and CI-tested, not loaded by default)
  • Liquibase migrations with mandatory rollback blocks
  • Audit log, PII tagging, basic DSAR export
  • Documentation site
  • One-command install, one-command upgrade
  • Health, metrics, structured logs

v1.0 deferred (architecturally accommodated)

  • React Native mobile app (v2)
  • MCP server for AI agents (v1.1)
  • Hosted multi-tenant deployment with per-region routing, billing, tenant provisioning UI (v2)
  • Plug-in marketplace / signed plug-ins (v2)
  • Webhooks-out and Kafka/NATS event streaming (v1.1, outbox seam already exists)
  • Advanced finance: tax engines, multi-currency revaluation (v1.2+)
  • Production scheduling / APS (v1.2+)
  • Hot plug-in reload without restart (v1.2+)
  • Full-text search beyond Postgres tsvector (v1.2+)

Release policy

  • Semver on api.v1. Major bumps overlap with previous major for ≥1 major release window
  • Semver on the core image
  • Plug-ins declare requires-api: "1.x"; mismatches fail at install, never at runtime
  • Minor releases every 6 weeks
  • LTS on every other major (1.x, 3.x, 5.x), supported 3 years

12. Risks and how the design addresses them

| Risk | Mitigation |
|---|---|
| Core gradually accreting printing-specific concepts | The dependency rule + the reference plug-in: plugin-printing-shop builds against api.v1 only, so printing-specific logic has a home outside the core; reviewers must reject any printing terminology in pbc/* |
| Plug-in API churn breaks the ecosystem | api.v1 is the only supported surface; plug-in linter rejects internal imports at install time; semver discipline + 1-major deprecation window |
| Cross-PBC coupling silently appears | Gradle dependency rule enforced by the build (pbc-orders-sales cannot declare pbc-inventory as a dependency); CI fails on violations |
| Multi-tenancy bug causes data leak in hosted version | Two independent walls (Hibernate filter + Postgres RLS); integration tests with multiple tenants in every PBC |
| "Workflows as data" turns into a custom DSL | BPMN 2.0 standard via Flowable; the temptation to invent a vibe_erp-only workflow language must be rejected |
| Metadata store becomes a write-once, read-by-no-one configuration graveyard | Every consumer (form renderer, list view, OpenAPI generator, AI function catalog, role editor) reads from it; no parallel sources of truth |
| JVM RAM cost makes self-hosting painful for small shops | Minimum spec documented (2 GB RAM, 1 vCPU); GraalVM native image evaluated for v2 |
| Customer wants a different DB | Hibernate alone would make Postgres-only a soft constraint, but the JSONB and RLS dependencies make it a hard one; we explicitly do not support other DBs in v1.0 and document this |

13. Verification (how the design will be proved out)

The design is verified by building the framework AND simultaneously building the reference printing-shop plug-in. The plug-in is the executable acceptance test:

  • If the plug-in can express the workflows in raw/业务流程设计文档/ using only api.v1, the framework is sufficient
  • If the plug-in needs to reach into a platform.* or pbc.* internal class, the seam is wrong and api.v1 needs to grow (deliberately)
  • If a feature in pbc/* is only there to make the printing plug-in work, the design is failing guardrail #1 and the feature must move into the plug-in

CI runs the full vibe_erp test suite and loads plugin-printing-shop in an integration test environment, exercising its key flows end-to-end against a real Postgres.


14. What happens after this spec

This spec is the architecture-level design. It is NOT an implementation plan. The next steps are:

  1. The user reviews this document and either approves it or requests changes
  2. On approval, hand off to the writing-plans skill to produce a sequenced implementation plan, broken into work units (each PBC, each platform module, each major capability)
  3. CLAUDE.md is updated to reflect the named patterns adopted here (Clean Core, two-tier extensibility, PBCs, api.v1, AI-agent seam)
  4. The plan is executed incrementally, with the reference printing-shop plug-in built alongside the framework so the abstraction is constantly stress-tested