Active Directory

Active Directory is the first documented EDP source system. It provides foundational identity, group, computer, organizational unit, and membership data for identity lifecycle visibility and cross-system correlation.

This page describes an on-premises Active Directory Domain Services source. Entra ID and Microsoft 365 should be documented as separate source systems because their APIs, identifiers, permissions, and data shapes differ.

Overview

Active Directory is commonly used as a system of record or synchronization source for users, groups, computers, service accounts, organizational units, and access control structures.

EDP should use Active Directory data to support:

  • Identity lifecycle reporting
  • Onboarding and offboarding validation
  • User and service account inventory
  • Group and nested group membership visibility
  • Computer and endpoint correlation
  • Organizational unit and directory structure visibility
  • Access review support
  • Cross-system identity matching
  • Audit and governance evidence

Ownership

| Role | Responsibility |
| --- | --- |
| Source system owner | Owns Active Directory service health, access policies, and operational use |
| Technical owner | Supports connector access, directory queries, and network reachability |
| Data steward | Defines identity, group, computer, and membership meaning for EDP consumers |
| Platform owner | Runs the connector, raw landing, transformations, quality checks, and downstream models |

Data Domains

Initial Active Directory data domains:

  • Users
  • Groups
  • Group memberships
  • Computers
  • Organizational units
  • Contacts where useful
  • Service accounts where distinguishable
  • Directory metadata such as domains, naming contexts, and collection timestamps

Later domains may include:

  • Group policy metadata
  • Password and account policy metadata
  • Delegation and administrative control metadata
  • Selected security descriptors where governance requirements justify the sensitivity

Connector Options

| Connector Option | When It Fits | Notes |
| --- | --- | --- |
| LDAPS query | Preferred production pattern when direct directory queries are allowed | Uses encrypted LDAP; supports repeatable scheduled collection |
| LDAP query | Useful only in trusted internal environments where LDAPS is unavailable | Should be avoided when credentials or sensitive attributes may traverse the network |
| PowerShell export | Useful when a Windows-based collection host is easier to approve | Can export CSV or JSON into a landing location for ingestion |
| Scheduled file export | Useful when direct connector access is not allowed | Requires strong file naming, validation, and transfer controls |
| Manual seed | Acceptable only for early testing | Should not be used as a production identity source |

Prebuilt Package Status

Status: prebuilt planned.

A reusable Active Directory connector package should include:

  • Containerized connector runtime where feasible
  • Optional Windows-based export pattern if direct Linux container LDAP access is not approved
  • Configuration schema for domain controllers, base DNs, object classes, attributes, and schedules
  • Credential requirements for a least-privilege read-only service account
  • Connection test
  • Airflow DAG template
  • Raw landing tables for users, groups, computers, OUs, and memberships
  • dbt source definitions
  • ODS identity, group, device, and membership mappings
  • Data Vault hub, link, and satellite templates
  • Data quality tests
  • Operational runbook
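
As an illustration of the configuration schema item above, the following is a minimal sketch of what a connector configuration object might look like. The class name `AdConnectorConfig`, every field name, and the default values are assumptions made for this example; they are not part of any existing EDP package.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AdConnectorConfig:
    """Hypothetical connector settings; all field names are illustrative."""

    domain_controllers: list[str]
    base_dns: list[str]
    object_classes: list[str] = field(
        default_factory=lambda: ["user", "group", "computer", "organizationalUnit"]
    )
    attributes: list[str] = field(
        default_factory=lambda: ["objectGUID", "distinguishedName"]
    )
    schedule_cron: str = "0 2 * * *"  # assumed default: once a day at 02:00
    use_ldaps: bool = True

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the config looks usable."""
        problems = []
        if not self.domain_controllers:
            problems.append("at least one domain controller is required")
        if not self.base_dns:
            problems.append("at least one base DN is required")
        if "objectGUID" not in self.attributes:
            problems.append("objectGUID must be in the attribute allowlist")
        if not self.use_ldaps:
            problems.append("plain LDAP is discouraged; prefer LDAPS")
        return problems
```

A connection test or DAG template could call `validate()` first and refuse to run when the returned list is not empty.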

Required Access

The connector should use a read-only service account with the minimum directory permissions required to read approved objects and attributes.

Document these items before production use:

  • Domain or forest scope
  • Domain controllers or discovery method
  • Base distinguished names
  • Object classes to collect
  • Attribute allowlist
  • Authentication method
  • Network path and firewall requirements
  • TLS or certificate requirements for LDAPS
  • Account lockout and rotation expectations
  • Approval from the directory owner

Sensitive attributes should be excluded unless there is a clear operational need and an approved governance decision.
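
One way to enforce the attribute allowlist is to filter every directory entry before it reaches raw landing. The sketch below assumes entries arrive as plain dictionaries keyed by attribute name; the allowlist contents and the function name `apply_allowlist` are illustrative only, and the real list should come from the approved governance decision.

```python
# Illustrative allowlist; the approved list should be defined by the data steward.
ATTRIBUTE_ALLOWLIST = frozenset(
    {"objectGUID", "distinguishedName", "sAMAccountName", "userAccountControl", "whenChanged"}
)


def apply_allowlist(entry: dict, allowlist=ATTRIBUTE_ALLOWLIST):
    """Split an entry into kept attributes and the names of dropped attributes.

    Matching is case-insensitive because LDAP attribute names are case-insensitive.
    """
    allowed_lower = {name.lower() for name in allowlist}
    kept = {k: v for k, v in entry.items() if k.lower() in allowed_lower}
    dropped = sorted(k for k in entry if k.lower() not in allowed_lower)
    return kept, dropped
```

Logging the dropped attribute names (never their values) gives reviewers evidence that unapproved attributes were excluded.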

Raw Landing Design

Recommended raw tables:

| Raw Table | Purpose |
| --- | --- |
| raw.ad_users | User account payloads and source metadata |
| raw.ad_groups | Group payloads and source metadata |
| raw.ad_group_memberships | Direct group membership edges |
| raw.ad_computers | Computer account payloads and source metadata |
| raw.ad_organizational_units | OU payloads and directory hierarchy metadata |
| raw.ad_collection_runs | Connector run metadata, counts, timing, and status |

Each raw record should include:

  • Domain
  • Distinguished name
  • Object GUID
  • Security identifier when available and approved
  • Source object class
  • Extraction timestamp
  • Ingestion run identifier
  • Connector version
  • Raw payload
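
The record fields listed above could be assembled as in the following sketch. It assumes the connector hands over each entry as a dictionary in which `objectGUID` is the 16-byte binary value returned by the directory and `objectClass` is a list ordered from most general to most specific class. The function name `build_raw_record` and the output column names are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone


def build_raw_record(domain, entry, run_id, connector_version):
    """Wrap one directory entry in the raw landing envelope (illustrative column names)."""
    # Active Directory stores objectGUID with little-endian leading fields,
    # which is the layout uuid.UUID(bytes_le=...) expects.
    guid = uuid.UUID(bytes_le=entry["objectGUID"])
    classes = entry.get("objectClass") or []
    payload = {k: v for k, v in entry.items() if k != "objectGUID"}
    return {
        "domain": domain,
        "distinguished_name": entry["distinguishedName"],
        "object_guid": str(guid),
        "object_sid": entry.get("objectSid"),  # only when collected and approved
        "source_object_class": classes[-1] if classes else None,
        "extracted_at": datetime.now(timezone.utc).isoformat(),
        "ingestion_run_id": run_id,
        "connector_version": connector_version,
        # default=str is a fallback for values json cannot serialize directly
        "raw_payload": json.dumps(payload, sort_keys=True, default=str),
    }
```

Storing the GUID in its canonical string form keeps joins between raw tables independent of the binary encoding.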

ODS Mapping

Recommended ODS entities:

  • ods.identity
  • ods.account
  • ods.group
  • ods.group_membership
  • ods.device
  • ods.org_unit
  • ods.source_record

Common mapping rules:

  • Use immutable source identifiers such as object GUIDs for source identity.
  • Preserve distinguished names as source attributes rather than keys, because they change when objects are renamed or moved.
  • Normalize enabled or disabled account state.
  • Normalize user, service account, and shared account classifications where possible.
  • Model direct group memberships first.
  • Add nested membership expansion as a derived model with clear lineage.
  • Keep source-specific attributes available without making every attribute part of the canonical identity model.
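
The rule about normalizing enabled or disabled state can be sketched by decoding the `userAccountControl` bit flags. The flag values below are the documented Active Directory constants; the function name and the output keys are assumptions for this example, and classifying service or shared accounts would need additional site-specific rules.

```python
# Documented userAccountControl bit flags.
ACCOUNTDISABLE = 0x0002
NORMAL_ACCOUNT = 0x0200
WORKSTATION_TRUST_ACCOUNT = 0x1000


def normalize_account_state(user_account_control) -> dict:
    """Derive normalized booleans from a userAccountControl value (int or numeric string)."""
    uac = int(user_account_control)
    return {
        "is_enabled": not (uac & ACCOUNTDISABLE),
        "is_user_account": bool(uac & NORMAL_ACCOUNT),
        "is_computer_account": bool(uac & WORKSTATION_TRUST_ACCOUNT),
    }
```

For example, a value of 512 decodes to an enabled normal account and 514 to the same account disabled.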

Data Vault Mapping

Recommended Data Vault structures:

  • Hub: identity
  • Hub: account
  • Hub: group
  • Hub: device
  • Hub: organizational unit
  • Link: account to identity
  • Link: account to group
  • Link: group to group for nested membership
  • Link: account or device to organizational unit
  • Satellites: descriptive attributes, status history, source metadata, and classification history

The Data Vault should preserve historical changes in memberships, account state, OU placement, and key descriptive attributes when those changes are collected over time.
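
Hub and link keys are often derived by hashing normalized business keys. The sketch below shows one such convention as an assumption, not as a decision already made for EDP: the delimiter, the uppercase normalization, the choice of SHA-256, and both function names are illustrative.

```python
import hashlib

DELIMITER = "||"  # assumed separator between key parts


def hub_hash_key(*business_key_parts: str) -> str:
    """Hash a business key such as (source, domain, objectGUID) into a hub key."""
    normalized = DELIMITER.join(part.strip().upper() for part in business_key_parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def link_hash_key(*hub_keys: str) -> str:
    """Hash the participating hub keys, in a fixed order, into a link key."""
    return hashlib.sha256(DELIMITER.join(hub_keys).encode("utf-8")).hexdigest()
```

Because the object GUID is part of the business key, renaming or moving an object changes only satellite attributes, not the hub key.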

Data Mart Outputs

Initial Active Directory marts:

  • Identity lifecycle mart
  • Disabled and stale accounts mart
  • Group membership mart
  • Nested membership mart
  • Privileged group visibility mart
  • Computer inventory mart
  • OU structure mart
  • Access review support mart

These marts should be designed for dashboards, access reviews, operational reconciliation, and downstream correlation with service desk, endpoint, collaboration, licensing, and HR data.
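
A disabled and stale accounts mart typically depends on logon timestamps, which Active Directory stores as Windows FILETIME values (100-nanosecond intervals since 1601-01-01 UTC). The following sketch converts such a value and applies a staleness threshold. The 90-day default and the function names are assumptions. Note also that `lastLogonTimestamp` is replicated with a delay of several days, so short thresholds may be unreliable.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a FILETIME value to an aware UTC datetime; 0 or less means never set."""
    ticks = int(filetime)
    if ticks <= 0:
        return None
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def is_stale(last_logon_filetime, as_of, threshold_days=90):
    """Treat an account as stale if it never logged on or last did so before the threshold."""
    last_logon = filetime_to_datetime(last_logon_filetime)
    if last_logon is None:
        return True
    return as_of - last_logon > timedelta(days=threshold_days)
```

Passing `as_of` explicitly, rather than reading the clock inside the function, keeps mart rebuilds reproducible for a given collection run.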

Quality Checks

Recommended checks:

  • Collection run completed successfully.
  • User, group, computer, and membership counts are within expected ranges.
  • Object GUIDs are present and unique within each object type.
  • Distinguished names are present.
  • Group membership references resolve to known users, groups, contacts, or computers where possible.
  • Required attributes are populated for active accounts.
  • Disabled, expired, locked, or stale account flags are normalized consistently.
  • Nested group expansion detects and handles membership cycles.
  • Freshness meets the expected schedule.
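
Several of the checks above reduce to small set and count comparisons. The sketch below assumes records are dictionaries with `object_type`, `object_guid`, and `member_guid` keys; those key names and the function names are hypothetical, and in practice the same checks might be written as dbt tests instead.

```python
from collections import Counter


def duplicate_guids(records):
    """Return (object_type, object_guid) pairs that occur more than once."""
    counts = Counter(
        (r["object_type"], r["object_guid"]) for r in records if r.get("object_guid")
    )
    return sorted(key for key, n in counts.items() if n > 1)


def records_missing_guid(records):
    """Return records whose object GUID is absent or empty."""
    return [r for r in records if not r.get("object_guid")]


def unresolved_members(memberships, known_guids):
    """Return member GUIDs that do not resolve to any collected object."""
    return sorted({m["member_guid"] for m in memberships if m["member_guid"] not in known_guids})


def count_in_range(actual, expected_min, expected_max):
    """Check that a collected object count falls inside the expected range."""
    return expected_min <= actual <= expected_max
```

Unresolved members are not always errors: members from other domains or excluded object classes can legitimately be missing, which is why the check above says "where possible".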

Refresh Schedule

Start with daily collection for most directory inventory and membership data.

Use more frequent collection when the use case requires near-current operational visibility, such as onboarding, offboarding, privileged group review, or access remediation.

Historical changes should be preserved in the Data Vault or snapshot structures when source history is not available directly.
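
When the directory does not expose change history, changes can be derived by comparing consecutive collection snapshots. A minimal sketch, assuming each snapshot is a set of `(group_guid, member_guid)` tuples; the function name and the output shape are illustrative.

```python
def diff_membership_snapshots(previous: set, current: set) -> dict:
    """Compare two sets of (group_guid, member_guid) edges from consecutive runs."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }
```

Changes that are made and reverted between two collections are invisible to this approach, which is one reason to collect more frequently for sensitive groups.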

Operational Runbook

The Active Directory connector runbook should include:

  • How to validate service account access
  • How to test LDAPS connectivity
  • How to run a manual collection
  • How to review object and membership counts
  • How to handle domain controller reachability failures
  • How to handle schema or attribute changes
  • How to backfill or reprocess a collection
  • How to rotate connector credentials
  • How to disable the connector safely

Known Limitations

Active Directory may not contain all authoritative identity lifecycle information. HR, Entra ID, Microsoft 365, endpoint management, service desk, and application systems may each hold different parts of the lifecycle.

Distinguished names and account names can change. Use immutable source identifiers for durable correlation.

Nested group memberships require careful expansion and cycle handling.
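
The expansion and cycle handling mentioned above can be sketched as a graph traversal over direct membership edges. This example assumes edges are `(group_id, member_id)` pairs and that `group_ids` is the set of identifiers known to be groups; the function name and the return shape are hypothetical. It reports only non-group (leaf) members for each group and flags groups that can reach themselves.

```python
from collections import defaultdict


def expand_nested_memberships(direct_edges, group_ids):
    """Return (effective_members, groups_in_cycles) from direct membership edges."""
    children = defaultdict(set)
    for group_id, member_id in direct_edges:
        children[group_id].add(member_id)

    effective = {}
    groups_in_cycles = set()
    for start in group_ids:
        seen_groups = {start}  # prevents revisiting a group, so cycles terminate
        members = set()
        stack = [start]
        while stack:
            current = stack.pop()
            for member in children.get(current, ()):
                if member not in group_ids:
                    members.add(member)  # leaf member such as a user or computer
                elif member == start:
                    groups_in_cycles.add(start)  # the group is nested inside itself
                elif member not in seen_groups:
                    seen_groups.add(member)
                    stack.append(member)
        effective[start] = members
    return effective, groups_in_cycles
```

This per-group traversal favors clarity over speed; large directories may need memoization or a recursive SQL model instead.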

Some useful attributes may be sensitive. Attribute collection should be allowlisted and approved.