03.11.2025
8 min read

Speed at the Edges, Trust at the Core: Why Data Mesh and Lakehouse Aren't Competing Capital Markets Architectures

Article by

Ed Simmons

The false choice holding trading firms back

A persistent misconception plagues data architecture discussions: teams believe they must choose between Data Mesh and Lakehouse. This framing misunderstands what each approach actually solves.

Lakehouse is an infrastructure layer. It addresses how you technically store, manage, and query data with warehouse-like capabilities on lake-scale storage.

Data Mesh is an organizational paradigm. It addresses who owns data, how teams collaborate, and how governance scales across domains.

They're not alternatives. They're complementary. For trading firms balancing speed against regulatory requirements, combining both approaches isn't just possible – it's necessary.

Why the confusion exists

Both emerged as critiques of centralized data warehouses, but they critique different failures:

  • Lakehouse fixes technical fragmentation between lakes and warehouses – the costly duplication, inconsistent state, and complex ETL.
  • Mesh fixes organizational bottlenecks where central data teams can't keep pace with domain needs, creating shadow pipelines and ungoverned sprawl.

The real power comes from using Lakehouse as the technical foundation that makes Data Mesh principles practical at scale.

How they work together in trading

The Lakehouse provides the unified substrate:

  • Single storage layer for structured trades, semi-structured logs, and unstructured alternative data
  • ACID transactions ensure consistent P&L calculations during high-velocity updates
  • Centralized governance enforces security, retention, lineage, and access controls for BCBS 239, MiFID II, and SOX
  • Elastic compute handles petabyte-scale backtests without data movement

Data Mesh principles run on top:

  • Each trading desk owns its domain data products, such as equities features, FX signals, and credit risk models
  • Domains publish to the shared Lakehouse substrate with contracts, SLAs, and documentation
  • A federated governance council sets enterprise standards while domains choose their own tools and iteration speed
  • Cross-domain discovery happens through a unified catalog, all pointing to the same underlying storage
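To make "contracts, SLAs, and documentation" concrete, here is a minimal sketch of what a published contract might carry. The `DataProductContract` class, its field names, and the `gold.fx_signals` table are hypothetical and chosen only for illustration; a real contract format would be whatever the governance council agrees on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProductContract:
    """Hypothetical descriptor a domain publishes alongside its table."""
    name: str                   # product name, e.g. "fx.signals"
    owner_domain: str           # desk accountable for the product
    table: str                  # table identifier in the shared catalog
    columns: dict               # column name -> declared type
    freshness_sla_minutes: int  # maximum acceptable staleness
    description: str = ""       # consumer-facing documentation

    def problems(self) -> list:
        """Return the reasons this contract is not publishable (empty means OK)."""
        issues = []
        if not self.description.strip():
            issues.append("missing documentation")
        if self.freshness_sla_minutes <= 0:
            issues.append("SLA must be a positive number of minutes")
        if not self.columns:
            issues.append("no columns declared")
        return issues


fx_signals = DataProductContract(
    name="fx.signals",
    owner_domain="fx-desk",
    table="gold.fx_signals",
    columns={"pair": "string", "ts": "timestamp", "signal": "double"},
    freshness_sla_minutes=15,
    description="Intraday FX trading signals published by the FX desk.",
)
print(fx_signals.problems())  # []
```

A catalog could refuse to list any product whose `problems()` result is non-empty, which keeps the governance check cheap and owned by the platform rather than by each desk.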

A global trading firm described their implementation: "We maintain vetted market and trade data in the Lakehouse as the firm's golden source. Each desk builds its alpha datasets as domain products on that foundation. Time to deploy new signals dropped by half, and audit findings decreased because lineage flows end-to-end through a single substrate."

Why Apache Iceberg is critical

Open table formats, particularly Apache Iceberg, make this hybrid architecture practical by solving the problems that would otherwise block Data Mesh on Lakehouse infrastructure.

Schema evolution without breaking consumers

Trading data schemas change constantly: new alternative data feeds add fields, risk models require additional calculations, and regulations introduce new columns. Iceberg enables safe evolution through:

  • Additive changes: Domains add columns without affecting existing queries. An FX desk adds a volatility metric; consumers who do not use it see no impact.
  • Type promotion: Widen integer fields to longs, floats to doubles, or increase decimal precision as range and precision requirements grow.
  • Schema versioning: Every change creates a new schema version. Time-travel queries against an older snapshot can read it with the schema that was in effect at that point.

When the equities desk owns its feature product, it needs to evolve it independently without coordinating downtime across consuming domains. Iceberg makes that possible.
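The evolution rules above can be expressed as a simple compatibility check. This is a sketch under stated assumptions: schemas are modeled as plain dictionaries of column name to type name, the `breaking_changes` helper is hypothetical, and treating column removal as breaking is this sketch's consumer-protection policy rather than a limit of the table format.

```python
# Widening promotions treated as safe in this sketch: int -> long, float -> double.
SAFE_PROMOTIONS = {("int", "long"), ("float", "double")}


def breaking_changes(old: dict, new: dict) -> list:
    """List the changes from `old` to `new` that could break existing consumers.

    Added columns are ignored because existing queries never reference them.
    """
    issues = []
    for column, old_type in old.items():
        if column not in new:
            issues.append(f"column removed: {column}")
        elif new[column] != old_type and (old_type, new[column]) not in SAFE_PROMOTIONS:
            issues.append(f"unsafe type change on {column}: {old_type} -> {new[column]}")
    return issues


v1 = {"pair": "string", "qty": "int", "price": "float"}
# The FX desk widens two types and adds a volatility metric: nothing breaks.
v2 = {"pair": "string", "qty": "long", "price": "double", "volatility": "double"}
# Dropping a column and changing a type incompatibly: both are flagged.
v3 = {"pair": "string", "qty": "string"}

print(breaking_changes(v1, v2))  # []
print(breaking_changes(v1, v3))
```

Running a check like this at publish time lets a domain ship additive changes on its own schedule while anything flagged goes through the breaking-change procedure.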

Hidden partitioning eliminates coupling

Traditional data lakes force consumers to understand physical partitioning. When the partitioning strategy changes, consumer queries that filter on the old partition columns can break or silently fall back to full scans.

Iceberg hides partitioning behind the table abstraction. Consumers write logical queries against ordinary columns, and Iceberg automatically prunes to the relevant partitions. When a domain reorganizes its partitioning for performance, users typically notice no difference. This decouples producers from consumers, which is critical for Data Mesh autonomy.
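The idea can be illustrated with a toy in-memory table. This is not Iceberg's implementation: the `DayPartitionedTable` class is hypothetical and only shows how a partition value derived from a timestamp lets the table prune storage while the consumer filters on the timestamp alone.

```python
from collections import defaultdict
from datetime import datetime


class DayPartitionedTable:
    """Toy table partitioned by day(ts), with no partition column exposed to readers."""

    def __init__(self):
        self._files = defaultdict(list)  # partition value (a date) -> rows

    def append(self, row: dict) -> None:
        # The table derives the partition value from the row's own timestamp.
        self._files[row["ts"].date()].append(row)

    def scan(self, start: datetime, end: datetime):
        """Return (rows with start <= ts < end, number of partitions read)."""
        # Pruning: only partitions whose day overlaps the requested range are opened.
        touched = [d for d in self._files if start.date() <= d <= end.date()]
        rows = [r for d in touched for r in self._files[d] if start <= r["ts"] < end]
        return rows, len(touched)


table = DayPartitionedTable()
for day in (18, 19, 20):
    table.append({"ts": datetime(2025, 10, day, 11, 0), "pair": "EURUSD", "px": 1.1})

# The consumer filters on ts only and never mentions a partition column.
rows, partitions_read = table.scan(datetime(2025, 10, 19, 10, 0),
                                   datetime(2025, 10, 19, 12, 0))
print(len(rows), partitions_read)  # 1 1
```

Because the caller's query mentions only `ts`, the producer could switch the internal layout (say, to hourly partitions) without any change on the consumer side.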

Time travel and audit trails

Iceberg keeps a snapshot for every table version. Snapshots share unchanged data files, so retaining history adds storage only for data that actually changed:

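-- What did positions look like at market close yesterday?
-- Time-travel syntax varies by engine: Hive accepts FOR SYSTEM_TIME AS OF,
-- Spark SQL uses TIMESTAMP AS OF, and Trino uses FOR TIMESTAMP AS OF.
SELECT * FROM positions
FOR SYSTEM_TIME AS OF '2025-10-19 16:00:00';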

This supports operational debugging and regulatory requirements for reconstructing decisions. When domains own their data products, built-in time travel enables them to provide audit trails without needing to build versioning infrastructure.
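Conceptually, an "as of" query resolves the timestamp to the latest snapshot committed at or before it. The sketch below shows that lookup over a hypothetical snapshot log; the snapshot IDs and commit times are invented for illustration.

```python
from bisect import bisect_right
from datetime import datetime

# (commit time, snapshot id), sorted by commit time. Hypothetical values.
SNAPSHOTS = [
    (datetime(2025, 10, 19, 9, 30), 101),
    (datetime(2025, 10, 19, 15, 45), 102),
    (datetime(2025, 10, 19, 17, 5), 103),
]


def snapshot_as_of(snapshots, as_of: datetime) -> int:
    """Return the id of the latest snapshot committed at or before `as_of`."""
    commit_times = [committed for committed, _ in snapshots]
    index = bisect_right(commit_times, as_of)
    if index == 0:
        raise LookupError(f"no snapshot at or before {as_of}")
    return snapshots[index - 1][1]


# Market close at 16:00 falls between the 15:45 and 17:05 commits.
print(snapshot_as_of(SNAPSHOTS, datetime(2025, 10, 19, 16, 0)))  # 102
```

The same resolution step is why snapshot retention matters for audits: a timestamp older than the earliest retained snapshot cannot be reconstructed.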

ACID transactions across distributed domains

Data Mesh distributes ownership, but some operations span domains. A trade execution updates positions, impacts the P&L, triggers risk checks, and creates compliance events – all of which require consistency.

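Iceberg provides serializable isolation for commits to a table, using optimistic concurrency: readers always see a complete snapshot, and concurrent writers never expose partial updates. These guarantees apply per table. An update that must land atomically across several domains' tables needs coordination above the table format, for example a catalog that supports multi-table commits. With that in place, domains can maintain strict boundaries while still participating in cross-domain transactions when needed.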
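Optimistic concurrency, the general technique behind atomic table commits, can be sketched as a compare-and-swap on a pointer to the current snapshot. This is a simplified illustration for a single table, not Iceberg's code; `TablePointer`, `CommitConflict`, and `commit_with_retry` are hypothetical names.

```python
import threading


class CommitConflict(Exception):
    """Raised when another writer committed first."""


class TablePointer:
    """Toy catalog entry holding the current snapshot id for one table."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0

    def compare_and_swap(self, expected: int, new: int) -> None:
        # The swap succeeds only if nobody committed since `expected` was read.
        with self._lock:
            if self.current != expected:
                raise CommitConflict(f"expected {expected}, found {self.current}")
            self.current = new


def commit_with_retry(pointer: TablePointer, max_attempts: int = 3) -> int:
    """Commit a new snapshot on top of the latest one, retrying on conflict."""
    for _ in range(max_attempts):
        base = pointer.current  # snapshot this write builds on
        try:
            pointer.compare_and_swap(base, base + 1)
            return base + 1
        except CommitConflict:
            continue  # another writer won the race; re-read and try again
    raise CommitConflict("gave up after retries")


pointer = TablePointer()
print(commit_with_retry(pointer))  # 1

# A writer still holding the stale base snapshot 0 is rejected, never half-applied.
try:
    pointer.compare_and_swap(expected=0, new=5)
except CommitConflict as exc:
    print("rejected:", exc)
```

Readers in this model always follow the pointer to a complete snapshot, which is why they never observe a write in progress.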

Medallion architecture within the hybrid model

The medallion architecture – bronze (raw), silver (cleansed), gold (curated) – naturally maps onto the Lakehouse-Mesh hybrid:

Bronze layer (centralized in Lakehouse core): Raw market data, trade feeds, and reference data land here with minimal transformation. They are stored as Iceberg tables with full time-travel capability for regulatory reconstruction. The central platform team manages ingestion and retention.

Silver layer (boundary between core and domains): Cleansed, validated data with consistent schemas and quality checks. Core datasets, such as canonical trades and positions, remain centrally governed. Domain-specific transformations begin here – FX desk standardizes its volatility calculations, while the equities desk normalizes corporate actions.

Gold layer (owned by domains as Mesh products): Business-ready datasets optimized for consumption. Domains publish these as certified products with SLAs and contracts:

  • Equities features for alpha models
  • FX signals for trading strategies
  • Risk aggregations for compliance reporting

Each gold table is an Iceberg table with schema contracts, hidden partitioning, and time travel enabled. Domains evolve independently while the platform team ensures interoperability standards.

The medallion pattern provides clarity: bronze and most silver are centralized for consistency; silver-to-gold transformations and gold products are domain-owned for speed and efficiency. The Lakehouse substrate (Iceberg) enables this split without data duplication.
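The ownership split can be written down as a small rule, which is useful when a catalog needs to assign an owner automatically. This is an illustrative sketch: the `owner_of` function, the `platform-team` label, and the contents of `CORE_SILVER` are assumptions based on the examples in the text.

```python
PLATFORM = "platform-team"
# Core silver datasets that stay centrally governed (examples from the text).
CORE_SILVER = {"trades", "positions"}


def owner_of(layer: str, dataset: str, domain: str) -> str:
    """Return who owns a dataset under the medallion split described above."""
    if layer == "bronze":
        return PLATFORM
    if layer == "silver":
        # Canonical datasets stay central; domain-specific transformations do not.
        return PLATFORM if dataset in CORE_SILVER else domain
    if layer == "gold":
        return domain
    raise ValueError(f"unknown layer: {layer}")


print(owner_of("bronze", "fx_ticks", "fx-desk"))       # platform-team
print(owner_of("silver", "trades", "fx-desk"))         # platform-team
print(owner_of("silver", "fx_volatility", "fx-desk"))  # fx-desk
print(owner_of("gold", "fx_signals", "fx-desk"))       # fx-desk
```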

Implementation blueprint

1. Standardize on Iceberg across all layers

Bronze, silver, and gold all use Iceberg tables. This creates a common substrate for schema evolution, time travel, and ACID guarantees. Define policies for:

  • Schema evolution rules (what changes require version bumps)
  • Snapshot retention (90 days operational, 7 years regulatory)
  • Partitioning strategies (time-based for tick data, entity-based for reference)
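A retention policy like the one above reduces to a single decision per snapshot. The following sketch assumes a hypothetical `regulatory_hold` flag marks snapshots needed for regulatory reconstruction, and it approximates seven years as 7 × 365 days.

```python
from datetime import datetime, timedelta

OPERATIONAL_RETENTION = timedelta(days=90)
# Seven years approximated as 7 * 365 days; leap days are ignored in this sketch.
REGULATORY_RETENTION = timedelta(days=7 * 365)


def can_expire(committed_at: datetime, now: datetime, regulatory_hold: bool) -> bool:
    """Return True when a snapshot is older than its applicable retention window."""
    limit = REGULATORY_RETENTION if regulatory_hold else OPERATIONAL_RETENTION
    return now - committed_at > limit


now = datetime(2025, 11, 3)
old_snapshot = datetime(2025, 6, 1)  # 155 days before `now`

print(can_expire(old_snapshot, now, regulatory_hold=False))  # True
print(can_expire(old_snapshot, now, regulatory_hold=True))   # False
```

In practice the result would feed whatever snapshot-expiration job the platform team runs, so the policy lives in one place instead of in each domain's pipeline.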

2. Establish federated governance

Create a council with domain owners and platform leads, setting enterprise policies for:

  • Which datasets live in centralized silver versus domain-owned gold
  • Schema evolution standards and breaking change procedures
  • Metadata requirements (tags, documentation, SLA definitions)
  • Quality gates before promoting to production

3. Build enabling platform capabilities

  • Unified catalog built on Iceberg metadata showing ownership, contracts, lineage
  • End-to-end lineage tracking across Iceberg snapshots and medallion layers
  • Automated schema validation against contracts at publish time
  • Self-service tools for time travel queries and partition analysis
  • Role-based access control enforced at the Iceberg table level

4. Define clear domain boundaries

Map your data landscape: which datasets demand centralized control (bronze and core silver) versus domain ownership (domain silver and gold)? That boundary defines where platform responsibility ends and domain autonomy begins.

Real outcomes

Speed: New data products move from concept to production in days. Safe schema evolution removes the need for cross-domain schema reviews on additive changes.

Quality: ACID guarantees prevent inconsistent states. Time travel enables instant debugging of production issues.

Reuse: Certified gold products replace one-off extracts. Hidden partitioning means consumers don't need producer expertise to query efficiently.

Auditability: Lineage flows through medallion layers. Time travel reconstructs any historical state for regulatory inquiries.

Autonomy: Domains evolve gold products independently. The platform team focuses on bronze/silver infrastructure and governance standards.

Pilot approach

Scope: Select one domain, like FX trading. Establish Lakehouse substrate with Iceberg for bronze (raw market data) and silver (validated trades). Build a domain workspace for gold products (features and signals).

Deliverables:

  • Bronze: Raw FX tick data, immutable with time travel
  • Silver: Validated FX trades with canonical schema
  • Gold: FX volatility surfaces and trading signals owned by the FX desk

Success metrics:

  • Time from idea to production signal
  • Schema changes that broke consumers (target: zero)
  • Audit query performance using time travel
  • Cross-domain reuse of gold products

The clarity you need

Stop debating Lakehouse versus Mesh. Ask instead: Do we have infrastructure supporting unified, governed storage with safe schema evolution? Do we have an organizational model that enables domain autonomy without sacrificing interoperability?

You need Lakehouse technology, built on Iceberg, to consolidate fragmented storage and provide the capabilities that domains require – including schema evolution, time travel, and ACID guarantees. You need Mesh principles to distribute ownership and accelerate delivery without creating chaos. You need medallion architecture to clarify what stays centralized (bronze, core silver) versus what becomes domain-owned (gold products).

Iceberg bridges these worlds. It provides the technical substrate making federated ownership practical. Domains evolve independently because schemas evolve safely. Consumers trust products because ACID transactions guarantee consistency. Auditors get answers because time travel reconstructs any historical state.

The architecture that emerges isn't Lakehouse or Mesh. It's Lakehouse with Mesh – infrastructure designed for governance, organization designed for speed, and medallion architecture providing clarity, with Iceberg as the technical foundation that enables all three.

Your challenge: Map your current landscape. Which data lives in centralized medallion layers for consistency? Which becomes domain-owned gold products for speed? Do you have the table format capabilities to support safe evolution across that boundary?

Interested in implementing this model? Contact DataArt's capital markets team to design an architecture matching your trading domains, data volumes, and regulatory requirements.