Runtime Infrastructure

EDP should be designed to run repeatably. The goal is to package platform services, connectors, data models, dashboards, and operational applications so they can be rebuilt from source control and deployed into a target environment with minimal manual work.

The ideal pattern is: code and configuration define the platform, containers run the workloads, GitOps applies the desired state, and credentials are supplied by the target environment.

Runtime Goals

The platform runtime should support:

  • Repeatable deployments from Git
  • Containerized services and jobs
  • Prebuilt connector and model packages
  • Environment-specific credentials and endpoints
  • Clear separation between code, configuration, and secrets
  • Development, test, and production environments
  • Backup, restore, and disaster recovery
  • Observability, logging, and operational alerting
  • Secure ingress, TLS, identity integration, and access control

Containerized Workloads

Use containers as the packaging boundary for EDP services.

Each major platform component should run as a containerized workload where practical:

  • Connectors
  • Airflow workers and schedulers
  • dbt runners
  • Superset
  • Grafana
  • FastAPI services
  • Vue or static web frontends
  • Metadata and observability services
  • Supporting jobs such as migrations, seed loads, and validation checks

For production-like environments, Kubernetes is the recommended orchestration layer. It provides a common way to schedule services, run jobs, manage configuration, expose services, scale workloads, and recover failed containers.

For local development and small proof-of-concept environments, Docker Compose can be useful. It should be treated as a developer convenience, not the long-term production operating model.

Operating System Environment

The preferred operating system environment is a minimal Linux host designed for container workloads.

Strong options include:

  • Talos Linux for an immutable, Kubernetes-focused operating system
  • Flatcar Container Linux for container-focused infrastructure
  • Ubuntu Server or Debian for teams that need a familiar general-purpose Linux environment
  • Enterprise Linux variants when organizational standards require them

The production host operating system should be kept boring, minimal, patched, observable, and dedicated to running platform workloads. Avoid treating the host as an application server where tools are manually installed and configured outside source control.

Infrastructure Placement

EDP can run in several infrastructure models:

| Runtime Location | When It Fits |
| --- | --- |
| Existing virtualization infrastructure | Good for early adoption when the organization already has reliable VM operations |
| Dedicated on-premises hardware | Good when data locality, cost control, or isolated platform capacity matters |
| Managed Kubernetes | Good when cloud operations and managed control planes are available |
| Hybrid deployment | Good when some connectors must run near on-prem systems while dashboards or apps run elsewhere |
| Developer workstation or lab | Good for local testing, demos, and connector development |

The architecture should remain infrastructure-agnostic where possible. EDP should not depend on one specific hosting provider unless that is an intentional organizational decision.

GitOps Model

GitOps means the desired platform state is declared in Git and applied automatically or semi-automatically to the runtime environment.

EDP should store these items in version control:

  • Kubernetes manifests or Helm charts
  • Container build definitions
  • Airflow DAGs
  • dbt projects and model definitions
  • Database migration scripts
  • Connector code and connector configuration templates
  • Superset dashboard exports or provisioning definitions where practical
  • Grafana dashboards and datasources
  • API and application source code
  • Documentation, runbooks, and architecture decisions

GitOps tools such as Argo CD or Flux can continuously reconcile the target environment with the desired state in Git.
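The reconciliation step can be illustrated with a small sketch. It is a conceptual model only, not how Argo CD or Flux work internally: it assumes that desired state (rendered from Git) and live state (read from the cluster) are both available as dictionaries keyed by a hypothetical identifier such as "Deployment/superset", and it lists the create, update, and delete actions needed to converge.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Hypothetical resource spec: a plain dict describing one deployed object.
Spec = Mapping[str, Any]


@dataclass(frozen=True)
class Action:
    verb: str      # "create", "update", or "delete"
    resource: str  # illustrative identifier, e.g. "Deployment/superset"


def plan_reconcile(desired: Mapping[str, Spec], live: Mapping[str, Spec],
                   prune: bool = True) -> list[Action]:
    """Compare desired state (from Git) with live state and list the changes needed."""
    actions: list[Action] = []
    for name in sorted(desired):
        if name not in live:
            actions.append(Action("create", name))
        elif desired[name] != live[name]:
            actions.append(Action("update", name))
    if prune:
        # Resources running in the environment but no longer declared in Git.
        for name in sorted(set(live) - set(desired)):
            actions.append(Action("delete", name))
    return actions


if __name__ == "__main__":
    # Illustrative image tags and schedules, not recommended versions.
    desired = {
        "Deployment/superset": {"image": "superset:4.0", "replicas": 2},
        "CronJob/dbt-run": {"schedule": "0 * * * *"},
    }
    live = {
        "Deployment/superset": {"image": "superset:3.1", "replicas": 2},
        "Deployment/legacy-report": {"image": "report:1.0", "replicas": 1},
    }
    for action in plan_reconcile(desired, live):
        print(action.verb, action.resource)
```

On the example data this prints a create for the CronJob, an update for the changed Superset image, and a delete for the resource no longer declared in Git. The prune flag mirrors the choice both Argo CD and Flux expose about whether undeclared resources are removed automatically.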

Credentials-Only Connection Setup

Connectors and prebuilt data models should be reusable across deployments. A new environment should not require rewriting connector logic or model code just to point at different systems.

Prefer this pattern:

  • Connector code is packaged in containers.
  • Source-specific settings are supplied through environment configuration.
  • Secrets are supplied through the target environment's secret management system.
  • Connection tests validate credentials before scheduled jobs run.
  • dbt sources, schemas, and model variables are parameterized per environment.
  • Dashboards and applications depend on governed views or marts, not hardcoded source details.
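A connector can follow this pattern by reading non-secret settings from environment variables, reading secrets from files mounted by the secret management system, and running a connection test before any scheduled work. The sketch below is illustrative, not a prescribed EDP interface: the variable names (EDP_SOURCE_URL, EDP_TARGET_SCHEMA), the mount directory, and the injected probe function standing in for a real source-specific check are all hypothetical.

```python
import os
import sys
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Mapping

# Hypothetical mount point where the target environment's secret store
# (for example, a Kubernetes Secret volume) exposes one file per secret.
DEFAULT_SECRETS_DIR = Path("/var/run/secrets/edp")
REQUIRED_ENV = ("EDP_SOURCE_URL", "EDP_TARGET_SCHEMA")  # hypothetical names


class ConfigError(Exception):
    """Raised when required configuration or secrets are missing."""


@dataclass(frozen=True)
class ConnectorSettings:
    source_url: str                     # configuration: per-environment endpoint
    target_schema: str                  # configuration: landing schema name
    api_token: str = field(repr=False)  # secret: kept out of repr() and logs


def load_settings(env: Mapping[str, str] = os.environ,
                  secrets_dir: Path = DEFAULT_SECRETS_DIR) -> ConnectorSettings:
    """Read configuration from environment variables and the secret from a file."""
    missing = [key for key in REQUIRED_ENV if not env.get(key)]
    token_file = secrets_dir / "api_token"
    if not token_file.is_file():
        missing.append(str(token_file))
    if missing:
        raise ConfigError("missing required settings: " + ", ".join(missing))
    return ConnectorSettings(
        source_url=env["EDP_SOURCE_URL"],
        target_schema=env["EDP_TARGET_SCHEMA"],
        api_token=token_file.read_text(encoding="utf-8").strip(),
    )


def preflight(probe: Callable[[ConnectorSettings], bool],
              env: Mapping[str, str] = os.environ,
              secrets_dir: Path = DEFAULT_SECRETS_DIR) -> int:
    """Load settings and run a connection test; return a process exit code."""
    try:
        settings = load_settings(env, secrets_dir)
    except ConfigError as exc:
        print(f"config error: {exc}", file=sys.stderr)
        return 2
    if not probe(settings):  # probe stands in for a source-specific check
        print("connection test failed", file=sys.stderr)
        return 1
    return 0
```

Because preflight returns an exit code, it can run as the first step of a scheduled job so that a misconfigured environment fails fast instead of partway through a load. Moving to a new environment then means supplying different variables and secret files, not changing connector code.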

The platform should distinguish between:

  • Code: connector logic, transformations, APIs, applications
  • Configuration: endpoints, schedules, feature flags, schema names, environment identifiers
  • Secrets: passwords, tokens, certificates, API keys, private keys

Secrets should never be committed to Git.
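A lightweight guard is a scan that runs in CI or as a pre-commit hook and fails when a file looks like it contains a secret. The sketch below uses a few assumed regular-expression rules purely for illustration; dedicated scanners such as gitleaks or detect-secrets cover far more patterns and are the better choice in practice. These rules will both miss real secrets and flag harmless text.

```python
import re
import sys
from pathlib import Path

# Illustrative rules only; real scanners maintain much larger rule sets.
RULES = {
    "private-key": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # key = value or key: value with a literal value of 8+ characters;
    # values starting with "${" (environment placeholders) are ignored.
    "inline-credential": re.compile(
        r"(?i)(password|passwd|secret|api_key|token)\s*[:=]\s*['\"]?(?!\$\{)[^\s'\"]{8,}"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, rule name) pairs for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def main(paths: list[str]) -> int:
    """Scan the given files; exit code 1 means at least one finding."""
    exit_code = 0
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        for lineno, rule in scan_text(text):
            print(f"{path}:{lineno}: possible secret ({rule})")
            exit_code = 1
    return exit_code


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Placeholders such as `${DB_PASSWORD}` pass the check, which matches the intended split: configuration files in Git reference secrets, while the values themselves live in the target environment's secret store.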

Prebuilt Models and Connectors

Repeatability improves when EDP provides reusable starter packages:

  • Connector templates for common API and database patterns
  • Standard raw landing tables and metadata fields
  • ODS entity patterns for identities, assets, tickets, licenses, memberships, and source records
  • Data Vault hub, link, and satellite templates
  • Data Mart templates for common operational views
  • dbt tests for freshness, uniqueness, accepted values, relationships, and source reconciliation
  • Dashboard starter packs for operational health and common pilot use cases
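As an example of what a Data Vault hub template can standardize, Data Vault 2.0 implementations commonly derive a hub's key by hashing a normalized business key and store it with a load timestamp and record source. The sketch below assumes trim-and-uppercase normalization, a "||" delimiter for composite keys, SHA-256, and hypothetical column names (hub_hash_key, business_key, load_ts, record_source). Whatever conventions are chosen should be fixed once in the template so that every environment computes identical keys.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def hub_hash_key(*business_key_parts: str, delimiter: str = "||") -> str:
    """Hash a (possibly composite) business key after normalizing each part."""
    normalized = [part.strip().upper() for part in business_key_parts]
    return hashlib.sha256(delimiter.join(normalized).encode("utf-8")).hexdigest()


def hub_row(business_key: str, record_source: str,
            load_ts: Optional[datetime] = None) -> dict:
    """Build one hub record with the assumed standard metadata fields."""
    return {
        "hub_hash_key": hub_hash_key(business_key),
        "business_key": business_key.strip().upper(),
        "load_ts": load_ts or datetime.now(timezone.utc),
        "record_source": record_source,
    }


if __name__ == "__main__":
    # The same identity arriving from two hypothetical sources with different formatting.
    a = hub_row(" jdoe@example.org ", record_source="hr_system")
    b = hub_row("JDOE@EXAMPLE.ORG", record_source="ticketing")
    print(a["hub_hash_key"] == b["hub_hash_key"])  # True
```

Normalizing before hashing is what lets records from different sources land on the same hub row; the delimiter keeps composite keys such as ("A", "B") distinct from a single key "AB".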

The target environment should only need to supply credentials, endpoints, schedule choices, and organization-specific mapping decisions.

Document source-specific connector status, required credentials, supported modes, and prebuilt packages in the Connector Catalog.
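Catalog entries can also be kept in machine-readable form, so that a deployment can confirm the target environment has supplied every required credential before a connector is enabled. The field names, status values, and example source below are hypothetical; they only show how the Connector Catalog could drive a credentials-only readiness check.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PLANNED = "planned"
    BETA = "beta"
    SUPPORTED = "supported"


@dataclass(frozen=True)
class CatalogEntry:
    source: str
    status: Status
    required_credentials: frozenset[str]
    supported_modes: frozenset[str] = frozenset({"full"})
    prebuilt_packages: tuple[str, ...] = ()


def missing_credentials(entry: CatalogEntry, supplied: set[str]) -> list[str]:
    """Credentials the catalog requires that the environment has not supplied."""
    return sorted(entry.required_credentials - supplied)


def ready_to_enable(entry: CatalogEntry, supplied: set[str], mode: str) -> bool:
    """Enable only if the connector exists, supports the mode, and has credentials."""
    return (entry.status is not Status.PLANNED
            and mode in entry.supported_modes
            and not missing_credentials(entry, supplied))


if __name__ == "__main__":
    tickets = CatalogEntry(
        source="example_ticketing_api",  # hypothetical source system
        status=Status.BETA,
        required_credentials=frozenset({"client_id", "client_secret"}),
        supported_modes=frozenset({"full", "incremental"}),
        prebuilt_packages=("ods_tickets", "mart_ticket_backlog"),
    )
    print(missing_credentials(tickets, {"client_id"}))  # ['client_secret']
```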

Environment Strategy

At minimum, plan for:

  • Local development for connector and model work
  • Development or sandbox environment for integration testing
  • Production environment for trusted operational use

As the platform matures, add:

  • Test data patterns
  • Promotion gates
  • Database migration review
  • Backup and restore testing
  • Rollback procedures
  • Release notes
  • Environment-specific access policies
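Promotion gates can start as an explicit checklist that a CI job evaluates before a release moves from development to production. The check names below are hypothetical and simply mirror the items in this list, and the 90-day limit on restore tests is an assumed example threshold. How each piece of evidence is produced (test runs, a reviewed migration change, a recent restore test) is left to the pipeline.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class ReleaseEvidence:
    tests_passed: bool
    migrations_reviewed: bool
    last_restore_test: Optional[date]
    rollback_documented: bool
    release_notes: str


def gate_failures(ev: ReleaseEvidence, today: date,
                  max_restore_age: timedelta = timedelta(days=90)) -> list[str]:
    """Return the reasons a release may not be promoted; empty means the gate passes."""
    failures = []
    if not ev.tests_passed:
        failures.append("tests did not pass")
    if not ev.migrations_reviewed:
        failures.append("database migrations not reviewed")
    if ev.last_restore_test is None or today - ev.last_restore_test > max_restore_age:
        failures.append("no recent backup restore test")
    if not ev.rollback_documented:
        failures.append("rollback procedure missing")
    if not ev.release_notes.strip():
        failures.append("release notes empty")
    return failures
```

A CI job could build ReleaseEvidence from earlier pipeline steps and block promotion whenever the returned list is non-empty, which also gives reviewers a readable reason for the failure.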

Storage and Stateful Services

Containers make services repeatable, but data remains stateful. Plan carefully for:

  • PostgreSQL persistent volumes or managed database storage
  • Backup and restore procedures
  • Point-in-time recovery where required
  • Object storage for large raw history or Parquet archives
  • Database migration tooling
  • Storage capacity monitoring
  • Retention and archival policies

Do not treat stateful platform data as disposable just because the services are containerized.
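Retention and archival policies are easier to review and test when they are written as code. The sketch below assigns daily partitions to a tier by age: hot (kept in PostgreSQL), archived (object storage, for example Parquet files), or expired. The tier names, the thresholds, and the assumption that raw history is partitioned by day are illustrative. The function only plans; it does not move or delete any data.

```python
from datetime import date
from enum import Enum
from typing import Iterable, Optional


class Tier(Enum):
    HOT = "keep in PostgreSQL"
    ARCHIVED = "object storage only (e.g. Parquet)"
    EXPIRED = "eligible for deletion"


def plan_retention(partition_days: Iterable[date], today: date, hot_days: int,
                   archive_days: Optional[int] = None) -> dict[date, Tier]:
    """Assign each daily partition to a tier by age in days.

    archive_days=None means archived data is retained indefinitely.
    """
    if archive_days is not None and archive_days < hot_days:
        raise ValueError("archive_days must be >= hot_days")
    plan: dict[date, Tier] = {}
    for day in partition_days:
        age = (today - day).days
        if age <= hot_days:
            plan[day] = Tier.HOT
        elif archive_days is None or age <= archive_days:
            plan[day] = Tier.ARCHIVED
        else:
            plan[day] = Tier.EXPIRED
    return plan


if __name__ == "__main__":
    days = [date(2024, 6, 1), date(2024, 1, 1), date(2022, 1, 1)]
    for day, tier in plan_retention(days, date(2024, 6, 30), hot_days=90, archive_days=730).items():
        print(day, tier.name)
```

With a 90-day hot window and a 730-day archive window, the three example partitions fall into HOT, ARCHIVED, and EXPIRED respectively. A separate job would then perform the export and verify it before anything is dropped from PostgreSQL.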

Network and Access

Runtime infrastructure should include:

  • Ingress and reverse proxy configuration
  • TLS certificate management
  • Internal service DNS
  • Network segmentation where appropriate
  • Egress rules for connector access to source systems
  • Identity provider integration for user-facing tools
  • Service accounts for machine access
  • Audit logging for administrative and data access

Recommended Baseline

For an early production-capable EDP environment:

  • Run workloads in containers.
  • Use Kubernetes when operationally feasible.
  • Use Docker Compose only for local development or prototypes.
  • Use PostgreSQL as the primary stateful database.
  • Add object storage for large raw history and Parquet archives once those volumes outgrow PostgreSQL.
  • Store deployment definitions in Git.
  • Use GitHub Actions for build and validation.
  • Use Argo CD or Flux when the deployment process is ready for GitOps reconciliation.
  • Use SOPS plus age, Vault, OpenBao, or a managed secret store for credentials.
  • Treat host operating systems as minimal container infrastructure, not manually managed application servers.

This approach keeps the platform portable, repeatable, and easier to rebuild when hardware, hosting, or organizational requirements change.