vibe_erp — Architecture Design (v1)
Status: Approved (brainstorm output, pre-implementation plan)
Date: 2026-04-07
Scope: High-level architecture for the entire framework. Implementation plans for individual PBCs and the v1 cut will be written separately.
1. Context and intent
vibe_erp is an ERP/EBC framework (not an ERP application) targeting the printing industry, intended to be sold worldwide and deployed self-hosted-first with a managed/hosted version added later. The reference business documentation under raw/业务流程设计文档/ describes one example printing shop and is treated as a fixture / acceptance test, never as a specification — no part of its workflow is hard-coded into the core.
The design satisfies, and in several cases establishes, the architectural guardrails in CLAUDE.md. The pre-existing guardrails (1–6), plus documentation discipline, are:
- Core stays domain-agnostic (no printing terms in the core)
- Workflows are data, not code
- Extensibility seams come first
- The reference customer is a test, not a requirement
- Multi-tenant from day one in spirit
- Global / i18n from day one

"Documentation discipline" is kept as a separate section in CLAUDE.md.
This design adds five more guardrails to CLAUDE.md (numbered 7–11), derived from validating the current design against 2026 SOTA (Gartner's Composable ERP frame, the MACH principles, SAP S/4HANA's Clean Core extension model, and ERPNext / Frappe's metadata-driven Doctype system):
- Clean Core (extensions never modify the core; A/B/C/D extension grading)
- Two-tier extensibility (key-user no-code metadata + developer pro-code plug-ins, both first-class)
- PBC boundaries are sacred (modular monolith with strict bounded contexts; PBCs never import each other)
- api.v1 is the only stable contract (semver-governed; everything else is internal)
- AI agents are a first-class client (REST/OpenAPI surface must be MCP-callable; v1.0 architects the seam, v1.1 ships the endpoint)
2. Foundational decisions
| Decision | Choice | Why |
|---|---|---|
| Deployment model | Self-hosted-first, hosted later, same artifact for both | User requirement; matches Odoo/ERPNext/Tryton/SAP S/4HANA self-host story |
| Architecture style | Modular monolith with strict bounded contexts (PBCs) | MACH allows "modularity OR microservices"; modular monolith is operationally sane for self-host; every successful self-hostable ERP made the same call |
| Backend language | Kotlin on the JVM | Mature ERP ecosystem (Hibernate, Flowable, ICU4J, JasperReports, PF4J), modern ergonomics, large global hiring pool |
| Backend framework | Spring Boot | De facto JVM application framework; PF4J integrates cleanly; Spring Data JPA, Spring Security, Actuator are all standard |
| Workflow engine | Embedded Flowable (BPMN 2.0) | Workflows-as-data is non-negotiable; BPMN is the standard; embedding avoids extra processes |
| Persistence | PostgreSQL as the only mandatory external dependency | Matches every modern open-source ERP; excellent JSONB + RLS support, both critical to this design |
| Multi-tenancy | Row-level tenant_id + Postgres RLS (defense in depth) | Same code path for self-host (one tenant) and hosted (many tenants); no schema explosion; two independent walls against data leaks |
| Custom fields | JSONB ext column on every business table, described by metadata rows | One row, one read, indexable via GIN, no migrations needed for additions, no joins; EAV is the wrong tool |
| Plug-in framework | PF4J + Spring Boot child contexts | Classloader isolation, manifest-based lifecycle, cleanest plug-in story on the JVM |
| Web client (v1) | React + TypeScript SPA | Single SPA covers desktop and tablet office workflows |
| Mobile client | React Native (v2, not v1) | Defer until core API is stable; reuses TS types from web |
| API style | REST + OpenAPI, MCP-callable surface | OpenAPI is the universal integration standard; the MCP server is a separate v1.1 deliverable, the seam exists in v1.0 |
| Reporting | JasperReports | Mature, customer-skinnable, JVM-native |
| i18n | ICU MessageFormat (ICU4J) + Spring MessageSource | Plurals, gender, number/date formatting, locale fallback: all required for "sold worldwide" |
| Auth | Built-in JWT + OIDC (Keycloak-compatible) | Self-hosters get something out of the box; enterprise customers get SSO from day one |
3. Topology
┌──────────────────────────────────────────────────────────────────────┐
│ Customer's network                                                   │
│                                                                      │
│ Browser (React SPA) ─┐                                               │
│ AI agent (MCP, v1.1)─┼─► Reverse proxy ──► vibe_erp backend (1 image)│
│ 3rd-party system    ─┘                       │                       │
│                                              │                       │
│ Inside the image (one Spring Boot process):  │                       │
│  ┌─────────────────────────────────────┐     │                       │
│  │ HTTP layer (REST + OpenAPI + MCP)   │     │                       │
│  ├─────────────────────────────────────┤     │                       │
│  │ Public Plug-in API (api.v1.*)       │◄────┤ loaded from           │
│  │   (the only stable contract)        │     │ ./plugins/*.jar       │
│  ├─────────────────────────────────────┤     │ via PF4J              │
│  │ Core PBCs (modular monolith):       │     │                       │
│  │   identity · catalog · partners ·   │     │                       │
│  │   inventory · warehousing ·         │     │                       │
│  │   orders-sales · orders-purchase ·  │     │                       │
│  │   production · quality · finance    │     │                       │
│  ├─────────────────────────────────────┤     │                       │
│  │ Cross-cutting:                      │     │                       │
│  │  • Flowable (workflows-as-data)     │     │                       │
│  │  • Metadata store (Doctype-style)   │     │                       │
│  │  • i18n (ICU MessageFormat)         │     │                       │
│  │  • Reporting (JasperReports)        │     │                       │
│  │  • Job scheduler (Quartz)           │     │                       │
│  │  • Audit, security, events          │     │                       │
│  └─────────────────────────────────────┘     │                       │
│                                              ▼                       │
│                                   PostgreSQL (mandatory)             │
│                                  File store (local or S3)            │
└──────────────────────────────────────────────────────────────────────┘
Optional sidecars for larger deployments (off by default):
• Keycloak (OIDC) • Redis (cache + queue)
• OpenSearch (search) • SMTP relay
The PBC names above are illustrative core capabilities; none is printing-specific. Printing-specific behavior lives in plug-ins under ./plugins/.
4. Two-tier extensibility (the "Clean Core" model)
The framework supports two extension paths, modeled on SAP S/4HANA's clean-core extensibility levels.
Tier 1 — Key user, no-code
Business analysts customize the system through the web UI. Everything they create is stored as rows in the metadata tables, scoped to their tenant, and tagged source = 'user' so it's preserved across plug-in install/uninstall and core upgrades.
| Capability | Stored in |
|---|---|
| Custom field on an existing entity | metadata__custom_field → JSONB ext column at runtime |
| Custom form layout | metadata__form (JSON Schema + UI Schema) |
| Custom list view, filter, column set | metadata__list_view |
| Custom workflow | metadata__workflow → deployed to Flowable as BPMN |
| Simple "if X then Y" automation | metadata__rule |
| Custom entity (Doctype-style) | metadata__entity → auto-generated table at apply time |
| Custom report | metadata__report |
| Translations override | metadata__translation |
No build, no restart, no deploy. The OpenAPI spec, the AI-agent function catalog, and the REST API auto-update from the metadata.
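To make the "custom field as metadata row" idea concrete, here is a minimal sketch of how a write to ext could be checked against the rows in metadata__custom_field. The CustomField shape, the FieldType values, and the validateExt function are all assumptions made for illustration; the real column set is not defined in this document.

```kotlin
// Hypothetical projection of one metadata__custom_field row (column names are assumptions).
enum class FieldType { STRING, NUMBER, BOOLEAN }

data class CustomField(
    val entity: String,
    val key: String,
    val type: FieldType,
    val required: Boolean = false,
)

/** Returns a list of human-readable problems; an empty list means `ext` is valid. */
fun validateExt(fields: List<CustomField>, ext: Map<String, Any?>): List<String> {
    val problems = mutableListOf<String>()
    val known = fields.associateBy { it.key }

    // Keys that no metadata row describes are reported rather than silently stored.
    for (key in ext.keys) {
        if (key !in known) problems += "unknown custom field: $key"
    }
    for (field in fields) {
        val value = ext[field.key]
        if (value == null) {
            if (field.required) problems += "missing required field: ${field.key}"
            continue
        }
        val ok = when (field.type) {
            FieldType.STRING -> value is String
            FieldType.NUMBER -> value is Number
            FieldType.BOOLEAN -> value is Boolean
        }
        if (!ok) problems += "wrong type for ${field.key}: expected ${field.type}"
    }
    return problems
}
```

Because the check is driven entirely by data, adding a field in the web UI changes validation behavior without a build or restart, which is the property Tier 1 depends on.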
Tier 2 — Developer, pro-code
Software developers (the customer's IT, an integrator, or vibe_erp itself) ship a PF4J plug-in JAR. The plug-in:
- Sees only org.vibeerp.api.v1.* (the public, semver-governed contract)
- Cannot import org.vibeerp.platform.* or any PBC's internal classes (rejected by the plug-in linter at install time)
- Lives in its own classloader, its own Spring child context, its own DB schema namespace (plugin_<id>__*), its own metadata-source tag
- Can register: new entities, new REST endpoints, new workflow tasks, new form widgets, new report templates, new event listeners, new permissions, new menu entries, new React micro-frontends
Extension grading (borrowed from SAP)
| Grade | Definition | Upgrade safety |
|---|---|---|
| A | Tier 1 only (metadata) | Always safe across any core version |
| B | Tier 2, uses only api.v1 stable surface | Safe within a major version |
| C | Tier 2, uses deprecated-but-supported api.v1 symbols | Safe until next major; loader emits warnings |
| D | Tier 2, reaches into internal classes via reflection | UNSUPPORTED; loader rejects unless --allow-grade-d is set; will break |
A core principle: anything a Tier 2 plug-in does should also be possible to do as a Tier 1 customization eventually. Tier 2 is the escape hatch where Tier 1 isn't expressive enough yet.
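The grading table can be read as a small decision procedure. The sketch below is one possible encoding, not the loader's actual logic: gradeOf, loaderAccepts, and the idea that a scanner hands over the set of referenced org.vibeerp.* class names are assumptions. A static scan like this cannot see every reflective access, so a real Grade D check would need more than what is shown.

```kotlin
enum class Grade { A, B, C, D }

// Package prefix of the public contract, as named in the Tier 2 list above.
val API_PREFIX = "org.vibeerp.api.v1."

/**
 * shipsCode: false for a metadata-only (Tier 1) extension.
 * referenced: org.vibeerp.* class names the plug-in JAR refers to (scanner not shown).
 * deprecated: api.v1 symbols currently marked deprecated-but-supported.
 */
fun gradeOf(shipsCode: Boolean, referenced: Set<String>, deprecated: Set<String>): Grade = when {
    !shipsCode -> Grade.A
    referenced.any { !it.startsWith(API_PREFIX) } -> Grade.D
    referenced.any { it in deprecated } -> Grade.C
    else -> Grade.B
}

/** Grade D loads only when the operator opts in, mirroring --allow-grade-d. */
fun loaderAccepts(grade: Grade, allowGradeD: Boolean): Boolean =
    grade != Grade.D || allowGradeD
```

The ordering of the branches matters: a plug-in that touches internals is Grade D even if it also uses deprecated api.v1 symbols.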
5. Module structure (Gradle multi-project)
vibe-erp/
├── api/
│ └── api-v1/ ← THE CONTRACT (semver-governed)
│
├── platform/ ← Framework runtime (internal)
│ ├── platform-bootstrap/
│ ├── platform-http/ REST + OpenAPI + MCP host
│ ├── platform-security/ AuthN/AuthZ, tenant resolution, OIDC
│ ├── platform-persistence/ JPA, multi-tenant routing, RLS, Liquibase
│ ├── platform-metadata/ Doctype-equivalent metadata store
│ ├── platform-workflow/ Flowable host
│ ├── platform-i18n/ ICU MessageFormat, locale resolution
│ ├── platform-events/ In-process bus + outbox
│ ├── platform-jobs/ Quartz scheduler
│ ├── platform-reporting/ JasperReports
│ ├── platform-files/ Local + S3 abstraction
│ └── platform-plugins/ PF4J host, lifecycle, classloader isolation
│
├── pbc/ ← Core PBCs (each = bounded context)
│ ├── pbc-identity/
│ ├── pbc-catalog/
│ ├── pbc-partners/
│ ├── pbc-inventory/
│ ├── pbc-warehousing/
│ ├── pbc-orders-sales/
│ ├── pbc-orders-purchase/
│ ├── pbc-production/
│ ├── pbc-quality/
│ └── pbc-finance/
│
├── reference-customer/ ← NOT shipped in core
│ └── plugin-printing-shop/ Real PF4J plug-in expressing the
│ raw/业务流程设计文档/ workflow.
│ Built and tested in CI; not loaded by default.
│
├── web/ ← React + TypeScript SPA
│
└── docs/ ← Framework documentation
Dependency rule (strictly enforced)
api/api-v1 depends on: nothing (Kotlin stdlib + jakarta.validation only)
platform/* depends on: api/api-v1 + Spring + libs
pbc/* depends on: api/api-v1 + platform/* (NEVER another pbc)
plugins (incl. ref) depend on: api/api-v1 only
PBCs communicate only through (a) the event bus and (b) service interfaces declared in api.v1.ext.<pbc>. This is the rule that makes "modular monolith now, splittable later" real.
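As an illustration of how the dependency rule could be checked mechanically, the sketch below validates a module graph. The colon-separated module paths and the dependencyViolations function are hypothetical, and allowing platform modules to depend on one another is an assumption, since the rule above does not say either way.

```kotlin
/** Returns one "from -> to" entry per violation; an empty list means the graph is clean. */
fun dependencyViolations(deps: Map<String, Set<String>>): List<String> {
    val violations = mutableListOf<String>()
    for ((module, targets) in deps) {
        for (target in targets) {
            val allowed = when {
                // The contract depends on no other project module.
                module == "api:api-v1" -> false
                // Assumption: platform modules may use each other as well as api-v1.
                module.startsWith("platform:") ->
                    target == "api:api-v1" || target.startsWith("platform:")
                // A PBC may use api-v1 and platform modules, never another PBC.
                module.startsWith("pbc:") ->
                    target == "api:api-v1" || target.startsWith("platform:")
                // Everything else is treated as a plug-in: api-v1 only.
                else -> target == "api:api-v1"
            }
            if (!allowed) violations += "$module -> $target"
        }
    }
    return violations
}
```

A build could run a check of this kind over the declared project dependencies and fail when the returned list is non-empty.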
Per-PBC layout (every PBC follows this)
pbc-orders-sales/
├── api/ ← service contracts re-exported by api.v1
├── domain/ ← entities, value objects, domain services
├── application/ ← use cases / application services
├── infrastructure/ ← Hibernate mappings, repositories
├── http/ ← REST controllers
├── workflow/ ← BPMN files, task handlers
├── metadata/ ← seed metadata (default forms, rules)
├── i18n/ ← message bundles
└── migrations/ ← Liquibase changesets (own table prefix)
6. The api.v1 package
The single most important contract in the codebase. Everything in api.v1 is binary-stable within the 1.x line. Everything not in api.v1 is internal and can change in any release.
org.vibeerp.api.v1
├── core/ Tenant, Locale, Money, Quantity, Id<T>, Result<T,E>
├── entity/ Entity, Field, FieldType, EntityRegistry
├── persistence/ Repository<T>, Query, Page, Transaction
├── workflow/ WorkflowTask, WorkflowEvent, TaskHandler
├── form/ FormSchema, UiSchema
├── http/ @PluginEndpoint, RequestContext, ResponseBuilder
├── event/ DomainEvent, EventListener, EventBus
├── security/ Principal, Permission, PermissionCheck
├── i18n/ MessageKey, Translator, LocaleProvider
├── reporting/ ReportTemplate, ReportContext
├── plugin/ Plugin, PluginManifest, ExtensionPoint
└── ext/ Typed extension interfaces a plug-in implements
api.v1 is published as api-v1.jar to Maven Central so plug-in authors can build against it without pulling the entire vibe_erp source tree.
7. Plug-in lifecycle
1. Boot: ./plugins/*.jar is scanned by platform-plugins
2. Manifest: plugin.yml is read (id, version, requires-api, deps)
3. Compatibility: rejected if requires-api ≠ current api.v1 major
4. Lint: rejected if it imports vibe_erp classes outside api.v1.*
5. Classload: PF4J creates an isolated classloader per plug-in
6. Register: plug-in's entry class implements api.v1.plugin.Plugin and registers Extensions via @Extension
7. Wire: Spring child context per plug-in; plug-in's @Components live there only
8. Migrate: plug-in's Liquibase changesets run in plugin_<id>__*
9. Seed metadata: plug-in's metadata YAML is upserted, tagged with plug-in id
10. Ready: endpoints, workflow tasks, forms, reports, listeners live
11. Disable: deregister, drop child context; data preserved
12. Uninstall: explicit operator action; only then is the schema dropped
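The Compatibility and Lint steps are the two gates that can reject a plug-in before any of its code runs. A minimal sketch of those two gates follows; the Manifest class, majorOf, and rejectionReason are hypothetical names, and the set of imported org.vibeerp.* classes is assumed to come from a bytecode scanner that is not shown.

```kotlin
// Hypothetical subset of plugin.yml; field names follow the lifecycle list above.
data class Manifest(val id: String, val version: String, val requiresApi: String)

/** Extracts the major version from values such as "1.x" or "1.4.2"; null if malformed. */
fun majorOf(version: String): Int? = version.substringBefore('.').toIntOrNull()

/**
 * Runs the Compatibility and Lint gates. Returns null when the plug-in may
 * proceed to classloading, or a rejection reason otherwise.
 */
fun rejectionReason(manifest: Manifest, vibeErpImports: Set<String>, currentApi: String): String? {
    val required = majorOf(manifest.requiresApi)
        ?: return "${manifest.id}: malformed requires-api '${manifest.requiresApi}'"
    if (required != majorOf(currentApi)) {
        return "${manifest.id}: requires api v$required, host provides $currentApi"
    }
    // Anything under org.vibeerp that is not in api.v1 counts as an internal import.
    val internal = vibeErpImports.filterNot { it.startsWith("org.vibeerp.api.v1.") }.sorted()
    if (internal.isNotEmpty()) {
        return "${manifest.id}: imports internal classes: ${internal.joinToString()}"
    }
    return null
}
```

Running both gates before classloading is what lets a mismatch fail at install time rather than at runtime.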
8. Data model and multi-tenancy
Schema namespacing
PBCs and plug-ins use table name prefixes, not Postgres schemas:
identity__user, identity__role
catalog__item, catalog__item_attribute
inventory__stock_item, inventory__movement
orders_sales__order, orders_sales__order_line
production__work_order, production__operation
plugin_printingshop__plate_spec (reference plug-in)
metadata__custom_field, metadata__form, metadata__workflow
flowable_* (Flowable's own tables, untouched)
This keeps Hibernate, RLS policies, and migrations all in one logical schema (public), avoids search_path traps, and gives clean uninstall semantics.
Tenant isolation
- Every business table has tenant_id, NOT NULL
- Hibernate @TenantId filters every query at the application layer
- Postgres Row-Level Security policies filter every query at the database layer
- Two independent walls; a bug in one is not a data leak
Self-hosted single-customer = one tenant row called default. Hosted multi-tenant = many tenant rows. Same code path.
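For the database-side wall, the sketch below generates the Postgres statements that would put one table under a tenant policy. The setting name app.tenant_id, the policy name tenant_isolation, the uuid type of tenant_id, and the rlsStatements helper are all assumptions; only the SQL keywords themselves are standard Postgres.

```kotlin
// Assumption: the application sets this per transaction before running queries.
val SET_TENANT_SQL = "SELECT set_config('app.tenant_id', ?, true)"

/** DDL that enables RLS on one prefixed table and restricts rows to the current tenant. */
fun rlsStatements(table: String): List<String> {
    // Table names are interpolated into DDL, so accept only plain identifiers.
    require(Regex("[a-z_][a-z0-9_]*").matches(table)) { "unexpected table name: $table" }
    return listOf(
        "ALTER TABLE $table ENABLE ROW LEVEL SECURITY",
        // FORCE applies the policy to the table owner as well.
        "ALTER TABLE $table FORCE ROW LEVEL SECURITY",
        "CREATE POLICY tenant_isolation ON $table " +
            "USING (tenant_id = current_setting('app.tenant_id')::uuid)",
    )
}
```

With a policy of this shape, a query that slips past the Hibernate filter still sees only rows whose tenant_id matches the session setting.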
Custom fields
Every business table has:
ext jsonb not null default '{}',
ext_meta text generated
Custom fields are JSON keys inside ext. A GIN index on ext makes them queryable. The metadata__custom_field table describes the JSON shape per entity per tenant. The form designer, list views, OpenAPI generator, and AI-agent function catalog all read from this table.
For the rare hot-path custom field, an operator can promote a JSON key to a real generated column via an auto-generated Liquibase changeset. This is an optimization, not the default.
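A GIN index on a jsonb column serves containment queries, so a filter on a custom field can be expressed with the @> operator. The sketch below builds such a query as a SQL string plus one JSON parameter; ExtQuery, extEquals, and jsonString are hypothetical helpers, and the statement is meant to be bound through a prepared statement rather than concatenated.

```kotlin
/** Encodes a string as a JSON string literal, escaping quotes, backslashes and control characters. */
fun jsonString(s: String): String = buildString {
    append('"')
    for (c in s) when {
        c == '"' -> append("\\\"")
        c == '\\' -> append("\\\\")
        c < ' ' -> append("\\u%04x".format(c.code))
        else -> append(c)
    }
    append('"')
}

data class ExtQuery(val sql: String, val jsonParam: String)

/** Builds "rows whose ext contains key = value" for one prefixed table. */
fun extEquals(table: String, key: String, value: Any): ExtQuery {
    require(Regex("[a-z_][a-z0-9_]*").matches(table)) { "unexpected table name: $table" }
    val literal = when (value) {
        is String -> jsonString(value)
        is Int, is Long, is Boolean -> value.toString()
        else -> throw IllegalArgumentException("unsupported value type: ${value::class}")
    }
    return ExtQuery(
        sql = "SELECT * FROM $table WHERE ext @> ?::jsonb",
        jsonParam = "{${jsonString(key)}:$literal}",
    )
}
```

Tenant filtering is deliberately absent from the generated SQL; in this design it is applied by the Hibernate filter and the RLS policy.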
The metadata store
metadata__entity metadata__form metadata__permission
metadata__custom_field metadata__list_view metadata__role_permission
metadata__workflow metadata__rule metadata__menu
metadata__report metadata__translation metadata__plugin_config
Every row carries tenant_id, source (core / plugin:<id> / user), version, is_active. The source column makes uninstall/upgrade safe: removing a plug-in cleans up its metadata; user-created metadata is sacred.
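The role of the source column during uninstall can be shown in a few lines. MetadataRow and afterUninstall are hypothetical names, and the in-memory list stands in for the metadata tables; the point is only that the filter keys on source and nothing else.

```kotlin
// Hypothetical projection of a metadata row: which table it lives in, its key, and its source tag.
data class MetadataRow(val table: String, val key: String, val source: String)

/** Rows that remain after uninstalling a plug-in: only rows tagged plugin:<id> are removed. */
fun afterUninstall(rows: List<MetadataRow>, pluginId: String): List<MetadataRow> =
    rows.filterNot { it.source == "plugin:$pluginId" }
```

Rows tagged core or user never match the filter, which is how user-created metadata survives a plug-in being removed.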
Migrations
- Each PBC owns a Liquibase changelog under pbc-<name>/migrations/
- Plug-ins ship their own changelogs inside their JAR
- Forward-only and idempotent by default
- Rollback blocks mandatory; CI rejects PRs without them
- Tenant onboarding is INSERT INTO identity__tenant + seed metadata, not a migration (sub-second)
Data sovereignty (sold worldwide)
- Self-hosted deployments satisfy data residency by construction: the customer chooses where Postgres lives
- Hosted supports per-region tenant routing: each tenant row carries a region; platform-persistence routes connections to the right regional Postgres cluster
- PII tagging on field metadata (pii: true) drives auto-generated DSAR exports and erasure jobs (GDPR Articles 15/17)
- Audit log (platform__audit, append-only, monthly partitions) records access to PII fields when audit-strict mode is on
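Since DSAR export and erasure are driven by the pii flag on field metadata, both jobs reduce to a projection over a record. The sketch below assumes a flat record and hypothetical names (FieldMeta, dsarExport, erasePii); a real job would also have to walk related tables and the ext column.

```kotlin
// Hypothetical projection of field metadata: the field key and its pii flag.
data class FieldMeta(val key: String, val pii: Boolean)

private fun piiKeys(fields: List<FieldMeta>): Set<String> =
    fields.filter { it.pii }.map { it.key }.toSet()

/** Access request: returns only the fields tagged pii: true. */
fun dsarExport(fields: List<FieldMeta>, record: Map<String, Any?>): Map<String, Any?> {
    val keys = piiKeys(fields)
    return record.filterKeys { it in keys }
}

/** Erasure request: nulls out pii fields and leaves everything else untouched. */
fun erasePii(fields: List<FieldMeta>, record: Map<String, Any?>): Map<String, Any?> {
    val keys = piiKeys(fields)
    return record.mapValues { (key, value) -> if (key in keys) null else value }
}
```

Because both functions read the same metadata, tagging one more field as PII changes export and erasure together.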
9. Cross-cutting concerns
| Concern | Approach |
|---|---|
| Security | PermissionCheck declared in api.v1.security; plug-ins register their own permissions, auto-listed in role editor |
| Transactions | Spring @Transactional at application-service layer; plug-ins use api.v1.persistence.Transaction, never Spring directly |
| Audit | created_at, created_by, updated_at, updated_by, tenant_id on every entity, applied by JPA listener; plug-ins inherit by extending api.v1.entity.AuditedEntity |
| Events | Typed DomainEvents on every state change; in-process bus by default; outbox table in Postgres for cross-crash reliability and as the seam where Kafka/NATS plugs in later without changing PBC code |
| AI-agent surface | Same business operations exposed through REST are exposable through an MCP server; v1.1 ships the MCP endpoint, v1.0 architects the seam |
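The Events row describes an in-process bus backed by an outbox. The sketch below shows one possible arrangement in which every published event is first recorded and only then delivered; OutboxBus, OrderConfirmed, and the event type string are hypothetical, and an in-memory queue stands in for the Postgres outbox table.

```kotlin
interface DomainEvent { val type: String }

// Hypothetical event; the type string is an illustrative naming choice.
data class OrderConfirmed(val orderId: String) : DomainEvent {
    override val type = "orders_sales.order_confirmed"
}

/** In-memory stand-in for the outbox table plus the in-process bus. */
class OutboxBus {
    private val listeners = mutableMapOf<String, MutableList<(DomainEvent) -> Unit>>()
    private val pending = ArrayDeque<DomainEvent>()

    fun subscribe(type: String, listener: (DomainEvent) -> Unit) {
        listeners.getOrPut(type) { mutableListOf() }.add(listener)
    }

    /** In a real system this insert would share the transaction of the state change. */
    fun publish(event: DomainEvent) {
        pending.addLast(event)
    }

    /** Delivers everything still pending and returns how many events were dispatched. */
    fun drain(): Int {
        var dispatched = 0
        while (pending.isNotEmpty()) {
            val event = pending.removeFirst()
            listeners[event.type].orEmpty().forEach { it(event) }
            dispatched++
        }
        return dispatched
    }
}
```

Swapping drain for a relay that forwards pending rows to Kafka or NATS would leave publish and subscribe unchanged, which is the seam the table refers to.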
10. Packaging and deployment
Shipping artifact
One Docker image (ghcr.io/vibeerp/vibe-erp:1.0.0), plus an optional fat JAR for non-container environments.
/app/vibe-erp.jar
/app/api-v1.jar
/app/migrations/, /app/i18n/, /app/reports/ ← read-only
/opt/vibe-erp/ ← customer-mounted volume
├── config/vibe-erp.yaml single config file
├── plugins/ drop *.jar to install
├── i18n-overrides/
├── files/ if not using S3
└── logs/
Single config file (closed key set)
vibe-erp.yaml covers: instance mode, database, file store, auth, i18n, plugins, observability. Plug-ins read their own config from metadata__plugin_config, not from the YAML.
Install (3 commands)
docker run -d --name vibe-erp \
-p 8080:8080 \
-v /srv/vibeerp:/opt/vibe-erp \
-e DB_PASSWORD=... \
ghcr.io/vibeerp/vibe-erp:1.0.0
First boot: connect → migrate → create default tenant → bootstrap admin → ready. Under 30 seconds.
Upgrade (1 command)
docker rm + docker run with the new image tag. Within a major version, all plug-ins continue to load. Across a major version, api.v1 and api.v2 ship side by side for at least one major release. Customer data is never destroyed by an upgrade by default.
Upgrade contract
| Change | Allowed within 1.x? |
|---|---|
| Add a class to api.v1 | yes |
| Add a method to an api.v1 interface (with default impl) | yes |
| Remove or rename anything in api.v1 | no (major bump) |
| Change behavior of an api.v1 symbol in a way plug-ins can observe | no (major bump) |
| Anything in platform.* or pbc.*.internal.* | yes (that's why it's internal) |
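The "add a method with a default impl" row is the main tool for growing api.v1 inside 1.x. The sketch below shows why existing implementations keep working; the TaskHandler signature here is invented for illustration and is not the real api.v1.workflow.TaskHandler.

```kotlin
// Hypothetical interface shape, for illustration only.
interface TaskHandler {
    // Present since the first release.
    fun handle(taskId: String): String

    // Added in a later 1.x minor; the default body means older plug-ins need no change.
    fun describe(): String = "unnamed handler"
}

// A plug-in class written against the first release: it knows nothing about describe().
class LegacyHandler : TaskHandler {
    override fun handle(taskId: String) = "done:$taskId"
}
```

This demonstrates source compatibility. For plug-in JARs that are not recompiled, binary compatibility also depends on the Kotlin compiler emitting real JVM default methods for the interface (for example via the -Xjvm-default=all option on older compilers), which is worth verifying for the api-v1 build.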
11. v1.0 cut line
v1.0 ships
- Single Docker image, fat JAR alternative
- Core PBCs: identity, catalog, partners, inventory, warehousing, orders-sales, orders-purchase, production (basic), quality (basic), finance (basic)
- api.v1 published to Maven Central
- PF4J plug-in loader with classloader isolation, manifest validation, lifecycle
- Metadata store: custom fields, forms, list views, simple rules
- Embedded Flowable + BPMN designer in web UI
- JSON Schema form designer in web UI
- Built-in JWT auth + OIDC SSO
- React web SPA covering all core PBCs and customization UIs
- REST + OpenAPI on every endpoint
- ICU i18n with shipping locales: en-US, zh-CN, de-DE, ja-JP, es-ES
- Reference printing-shop plug-in (built and CI-tested, not loaded by default)
- Liquibase migrations with mandatory rollback blocks
- Audit log, PII tagging, basic DSAR export
- Documentation site
- One-command install, one-command upgrade
- Health, metrics, structured logs
v1.0 deferred (architecturally accommodated)
- React Native mobile app (v2)
- MCP server for AI agents (v1.1)
- Hosted multi-tenant deployment with per-region routing, billing, tenant provisioning UI (v2)
- Plug-in marketplace / signed plug-ins (v2)
- Webhooks-out and Kafka/NATS event streaming (v1.1, outbox seam already exists)
- Advanced finance: tax engines, multi-currency revaluation (v1.2+)
- Production scheduling / APS (v1.2+)
- Hot plug-in reload without restart (v1.2+)
- Full-text search beyond Postgres tsvector (v1.2+)
Release policy
- Semver on api.v1. Major bumps overlap with previous major for ≥1 major release window
- Semver on the core image
- Plug-ins declare requires-api: "1.x"; mismatches fail at install, never at runtime
- Minor releases every 6 weeks
- LTS on every other major (1.x, 3.x, 5.x), supported 3 years
12. Risks and how the design addresses them
| Risk | Mitigation |
|---|---|
| Core gradually accreting printing-specific concepts | The dependency rule + the reference plug-in: anything printing-specific that creeps into core breaks the build of plugin-printing-shop only if it's wrong; reviewers must reject any printing terminology in pbc/* |
| Plug-in API churn breaks the ecosystem | api.v1 is the only supported surface; plug-in linter rejects internal imports at install time; semver discipline + 1-major deprecation window |
| Cross-PBC coupling silently appears | Gradle dependency rule enforced by the build (pbc-orders-sales cannot declare pbc-inventory as a dependency); CI fails on violations |
| Multi-tenancy bug causes data leak in hosted version | Two independent walls (Hibernate filter + Postgres RLS); integration tests with multiple tenants in every PBC |
| "Workflows as data" turns into a custom DSL | BPMN 2.0 standard via Flowable; the temptation to invent a vibe_erp-only workflow language must be rejected |
| Metadata store becomes a write-once, read-by-no-one configuration graveyard | Every consumer (form renderer, list view, OpenAPI generator, AI function catalog, role editor) reads from it; no parallel sources of truth |
| JVM RAM cost makes self-hosting on small shops painful | Minimum spec documented (2 GB RAM, 1 vCPU); GraalVM native image evaluated for v2 |
| Customer wants a different DB | Hibernate makes Postgres-only a soft constraint; JSONB and RLS make it harder; we explicitly do not support other DBs in v1.0 and document this |
13. Verification (how the design will be proved out)
The design is verified by building the framework AND simultaneously building the reference printing-shop plug-in. The plug-in is the executable acceptance test:
- If the plug-in can express the workflows in raw/业务流程设计文档/ using only api.v1, the framework is sufficient
- If the plug-in needs to reach into a platform.* or pbc.* internal class, the seam is wrong and api.v1 needs to grow (deliberately)
- If a feature in pbc/* is only there to make the printing plug-in work, the design is failing guardrail #1 and the feature must move into the plug-in
CI runs the full vibe_erp test suite and loads plugin-printing-shop in an integration test environment, exercising its key flows end-to-end against a real Postgres.
14. What happens after this spec
This spec is the architecture-level design. It is NOT an implementation plan. The next steps are:
- The user reviews this document and either approves it or requests changes
- On approval, hand off to the writing-plans skill to produce a sequenced implementation plan, broken into work units (each PBC, each platform module, each major capability)
- CLAUDE.md is updated to reflect the named patterns adopted here (Clean Core, two-tier extensibility, PBCs, api.v1, AI-agent seam)
- The plan is executed incrementally, with the reference printing-shop plug-in built alongside the framework so the abstraction is constantly stress-tested