How Machine Learning Tracks the Same Builder Across Dozens of Legal Entities

DAIS Analytics, June 2026

A major residential builder rarely goes to court under one name. By the time a construction-defect case is filed, the defendant is typically a subsidiary -- a development-phase entity with a project-specific name -- while the parent company and dozens of sibling entities remain invisible in that record. Tracking what that enterprise actually does, across its full legal footprint, is an entity resolution problem. This is how machine learning solves it.

The entity name problem in construction defect litigation

Large residential builders do not operate as a single legal entity. The standard structure involves a holding company, one or more regional operating entities organized by geography or division, project-specific LLCs formed for individual subdivisions or phases, and in many cases successor entities that carry new names following acquisitions or internal reorganizations. Each layer of that structure is a distinct legal entity with its own public record footprint. And the naming conventions across those layers are rarely consistent.

When a construction-defect case is filed against a project-specific LLC -- a name that might reference a street, a subdivision, or a phase number -- the court record reflects that one entity. The parent company’s history, the sibling entities’ records across other markets, and the same principals’ development activities under earlier or later corporate names are not in that record. They exist in other public records, under other names, in other jurisdictions. Nothing in the defendant’s court file connects them.

An attorney doing manual research who searches for the defendant name gets one picture: the thin record of a project-level LLC that may have been incorporated specifically for that development and has no activity outside it. The full enterprise picture -- the builder’s aggregate litigation history, its permit volume across markets, its registered-agent footprint, the principals who appear across related entities -- requires connecting dozens of records that share no common identifier. The defendant name is the least useful place to start.

What entity resolution does

Entity resolution is the task, approached here with machine learning, of determining whether records from different sources refer to the same real-world entity, and of grouping those records accordingly. In the context of builder intelligence, it means probabilistic matching across multiple fields simultaneously: entity names in all their variations, registered agent names, principal officer names, mailing and registered addresses, state of formation, and related-party disclosures that appear in licensing and corporate filings.
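As a minimal sketch of what multi-field comparison can look like, the Python below normalizes two entity names and emits one signal per field for a candidate pair. The record layout, the field names, and the suffix list are assumptions made for illustration, not a description of any production schema.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"llc", "inc", "corp", "co", "ltd", "lp", "llp"}

def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)

def compare_records(a: dict, b: dict) -> dict:
    """Return one weak signal per field for a candidate entity pair."""
    return {
        # Fuzzy similarity in [0, 1] between the normalized names.
        "name_similarity": SequenceMatcher(
            None, normalize_name(a["name"]), normalize_name(b["name"])
        ).ratio(),
        "same_agent": a["registered_agent"] == b["registered_agent"],
        "same_address": a["address"] == b["address"],
        "shared_officers": len(set(a["officers"]) & set(b["officers"])),
        "same_state": a["state_of_formation"] == b["state_of_formation"],
    }
```

None of these signals decides a match on its own; the comparison only produces the raw evidence that a scoring step would weigh.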

The approach is probabilistic rather than deterministic because no single field is a reliable identifier. A registered agent might represent hundreds or thousands of unrelated entities. A mailing address might resolve to a corporate service firm or a law office. A principal officer name might be common enough that it appears in dozens of unrelated filings. Each individual field produces weak evidence. The algorithm’s job is to evaluate how many independent weak signals align across a candidate entity pair, and to translate that alignment into a match probability.
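One common way to combine weak signals is a Fellegi-Sunter-style log-odds sum, sketched below. This is an illustrative stand-in rather than the specific model described in this article, and every number in it (the per-field probabilities and the prior) is a made-up assumption.

```python
import math

# Hypothetical (m, u) values per field:
#   m = P(field agrees | the two entities belong to the same enterprise)
#   u = P(field agrees | the two entities are unrelated)
FIELD_PARAMS = {
    "registered_agent": (0.80, 0.10),
    "mailing_address": (0.70, 0.05),
    "principal_officer": (0.60, 0.02),
    "state_of_formation": (0.90, 0.30),
}

def match_probability(agreements: dict, prior: float = 0.001) -> float:
    """Turn per-field agreement flags into a match probability.

    Assumes the fields are conditionally independent, which is a
    simplification; `prior` is an assumed base rate of true matches
    among candidate pairs.
    """
    log_odds = math.log(prior / (1 - prior))
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        # Agreement adds evidence; disagreement subtracts it.
        log_odds += math.log(m / u) if agrees else math.log((1 - m) / (1 - u))
    return 1 / (1 + math.exp(-log_odds))
```

With these assumed parameters, a shared registered agent alone leaves the probability close to the prior, while agreement on all four fields pushes it well above one half. That is the behavior the paragraph above describes: alignment across signals, not any single field, drives the score.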

The model is trained against ground-truth relationships -- known parent-subsidiary pairs, known successor entities, known related-party disclosures -- and the match-probability threshold applied to candidate pairs is tuned against that ground truth. Entity pairs that exceed the threshold are grouped into a cluster. The cluster represents a unified enterprise profile that aggregates all the records belonging to the same economic actor, regardless of how many distinct legal names they carry. The output is not a flat list of names but a structured graph: legal entities are the nodes, matched relationships are the edges, and each edge carries a confidence score.
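The grouping step can be sketched as connected components over the pairs that clear the threshold. The union-find version below is one simple way to do that; the function name, the input format of (entity, entity, probability) tuples, and the 0.9 default are assumptions for illustration.

```python
from collections import defaultdict

def cluster_entities(scored_pairs, threshold=0.9):
    """Group entities linked by pairs scoring at or above the threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, prob in scored_pairs:
        root_a, root_b = find(a), find(b)  # registers both entities
        if prob >= threshold and root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for entity in list(parent):
        clusters[find(entity)].add(entity)
    # Sorted output so results are stable from run to run.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])
```

A caveat on this design: plain connected components are transitive, so a chain of individually plausible links can merge entities that were never directly compared. Keeping the per-edge scores alongside the clusters makes those chains inspectable.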

The data inputs that make resolution possible

The quality of entity resolution depends directly on the breadth and variety of the input records. Different source types contribute different signals, and the signals are most powerful in combination.

State contractor licensing records often include principal officer names, DBA registrations, and license numbers that persist across corporate reorganizations. A license number assigned to an individual qualifier can link entities that share no common name -- the same qualifier may appear across multiple corporate licensee records over time. Certificate-of-authority filings and secretary-of-state records add registered agent relationships, officer overlaps, and state-of-formation data. These records also capture dates: when an entity was formed, when it changed its registered agent, when it was administratively dissolved. Timing patterns are signals.
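To illustrate how a qualifier's license number can link entities with dissimilar names, the sketch below generates candidate pairs from licensee records that share one. The record fields (`entity`, `qualifier_license`) are hypothetical, and the output is a list of candidates for scoring, not a list of confirmed matches.

```python
from collections import defaultdict
from itertools import combinations

def pairs_by_qualifier(license_records):
    """Return sorted entity pairs that share a qualifier license number."""
    by_license = defaultdict(set)
    for rec in license_records:
        by_license[rec["qualifier_license"]].add(rec["entity"])

    pairs = set()
    for entities in by_license.values():
        # Every pair of entities under the same qualifier is a candidate.
        pairs.update(combinations(sorted(entities), 2))
    return sorted(pairs)
```

Generating candidates this way also limits the work downstream: only pairs with at least one shared identifier need a full multi-field comparison.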

County permit records contribute project addresses and contractor license numbers that frequently link entities across name variations. A permit application for a subdivision filed by entity A may carry a license number that also appears in permits filed by entity B in a different county under a different name. That shared license number is a strong signal of corporate relationship even when the entity names are entirely dissimilar. Circuit court civil dockets add defendant aliases, related-party disclosures, and in some cases corporate-structure admissions that appear in pleadings or discovery. The resolution model draws on all of these source types, weighting each field’s signal according to its discriminative power and combining them into a per-pair match probability. The model is validated against independently verified ground-truth relationships and updated as new source records are ingested.
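Weighting by discriminative power can be approximated very simply by rarity: a value shared by almost every record says little, while a value shared by only a few says a lot. The inverse-frequency weight below is a stand-in for whatever weighting the resolution model actually learns, and the record fields in the usage note are hypothetical.

```python
import math
from collections import Counter

def value_weights(records, field):
    """Weight each value of `field` by its rarity: log(N / count).

    A value that appears on every record gets weight 0; rarer values
    get larger weights.
    """
    counts = Counter(rec[field] for rec in records)
    total = len(records)
    return {value: math.log(total / n) for value, n in counts.items()}
```

Applied to a `registered_agent` field dominated by one corporate service firm, that firm's weight falls toward zero, while a contractor license number that appears on only two permits keeps a high weight.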

What a resolved enterprise profile makes possible

Once entity relationships are resolved, the analysis available is qualitatively different from what per-entity records allow. The most direct application is aggregate exposure: with a unified enterprise profile, it becomes possible to see the builder’s total construction-defect footprint across all subsidiaries, not just the one entity that appears on the complaint in front of you. A project LLC that has no prior litigation history on its own may belong to an enterprise with hundreds of resolved records across a dozen states.
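Given a mapping from legal entities to resolved enterprises, aggregate exposure reduces to a roll-up. The sketch below counts cases and collects states per enterprise; the mapping, the case-record fields (`defendant`, `state`), and the function name are all hypothetical.

```python
from collections import Counter

def aggregate_exposure(entity_to_enterprise, case_records):
    """Roll per-entity case records up to the enterprise level."""
    case_counts = Counter()
    states = {}
    for case in case_records:
        enterprise = entity_to_enterprise.get(case["defendant"])
        if enterprise is None:
            continue  # defendant has not been resolved to any enterprise
        case_counts[enterprise] += 1
        states.setdefault(enterprise, set()).add(case["state"])
    return {
        e: {"cases": n, "states": sorted(states[e])}
        for e, n in case_counts.items()
    }
```

A project LLC with a single case of its own then appears alongside every sibling entity's cases under the same enterprise key.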

Related-entity patterns also become visible. It becomes possible to identify whether the builder has been sued by the same plaintiff-side attorney in other jurisdictions under a different entity name -- a pattern that carries useful strategic information about how the builder has responded to that attorney in prior matters. It becomes possible to see whether a settlement in one case was followed by an uptick in defect filings across sibling entities in the same time window, which may indicate that the underlying construction issue was systemic rather than isolated. And it becomes possible to benchmark the enterprise’s conduct against the broader market of builders operating at similar permit volume -- to situate one builder’s aggregate posture in the context of what the market as a whole shows for comparable claim types and project categories.
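The repeat-counsel pattern is one of the easier ones to express once entities are resolved. The sketch below finds plaintiff attorneys who have sued more than one legal entity within the same enterprise; the entity-to-enterprise mapping and the case fields (`defendant`, `plaintiff_attorney`) are hypothetical.

```python
from collections import defaultdict

def repeat_counsel(entity_to_enterprise, case_records):
    """Map (enterprise, attorney) to the distinct defendant entities sued.

    Only combinations involving more than one entity are returned.
    """
    seen = defaultdict(set)
    for case in case_records:
        enterprise = entity_to_enterprise.get(case["defendant"])
        if enterprise is not None:
            key = (enterprise, case["plaintiff_attorney"])
            seen[key].add(case["defendant"])
    return {k: sorted(names) for k, names in seen.items() if len(names) > 1}
```

Without the resolved mapping, each of those cases would sit under a different defendant name and the repetition would not be visible.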

None of that analysis is accessible at the subsidiary level. All of it emerges from the unified enterprise view. The resolved profile is the precondition for the intelligence layer above it.

“A subsidiary might have been created specifically for the project at issue and have no assets, no prior history, and no apparent pattern of conduct. The enterprise behind it may have thousands of resolved records. Entity resolution is what makes the difference visible.”

Why this matters for construction defect cases specifically

Construction-defect litigation has a structural information problem that makes enterprise-level intelligence particularly valuable. The defendant in a construction-defect case is almost always a project entity whose existence predates the claim by years. By the time the case is filed, that entity may have completed its development purpose and have no ongoing operations. Its financial picture, litigation posture, and strategic resources are not meaningfully described by its own thin record. What describes them is the enterprise behind it -- the parent and sibling entities that share its ownership, its management, and its institutional approach to claims.

When you are evaluating a case, the defendant’s entity name is often the least important fact about who you are dealing with. What matters is the enterprise: its financial depth, its litigation history across markets, and its pattern of conduct in response to similar claims over time. Those questions are not answerable from the defendant’s record alone. They become answerable when entity resolution has connected that defendant to its enterprise context -- when the project LLC is understood as one node in a graph that includes the regional operating entity, the parent, the sibling projects, and the principals who appear across all of them.

That enterprise context is what experienced defense counsel already has. They represent the same client across every entity, in every jurisdiction, and they carry institutional knowledge of how the enterprise has handled similar claims before. Entity resolution is what gives the plaintiff side access to the same structural picture -- not from inside the enterprise, but from the public record, assembled by a system built specifically to surface what the fragmented record alone does not show.

