Privacy & "State of CI Failures" aggregates

How Exlogare derives the public quarterly report on CI root causes — what is aggregated, what we never publish, and how to opt out.

Exlogare publishes a public quarterly report — State of CI Failures — YYYY Qn — describing the dominant root causes of CI failures across our customer base. The report is generated automatically from the same root cause analyses that power your dashboard, but with strict privacy rules baked into the aggregation pipeline.

This page documents exactly what is included, what is excluded, and how to opt out.

What we publish

Every issue of State of CI Failures contains:

Top root causes — a representative root-cause string, the number of distinct teams that hit it (k), the total hit count, and the severity bucket.
Top CI providers — a count of failures per provider (github_actions, gitlab, bitbucket, circleci, jenkins, drone, teamcity, generic, …) so readers can see where the failures concentrate.
A narrative section written by an Exlogare team member, summarising the trends and linking to product docs.

Drafts are generated by an automated quarterly job and reviewed manually before they are flipped to published. The draft body is rendered from the aggregated data only — humans add interpretation, never additional rows.

What we never publish

The aggregation pipeline is constrained by three hard rules. None of these can be bypassed by the human reviewer.

Opt-out is honoured at aggregation time, not at draft time. A team that toggles Share anonymised stats off will be excluded from the next quarterly run. The check is performed live against tenants.share_anonymized_stats; we do not cache the flag.
k-anonymity of 5. A root cause cluster only enters the report if it was seen by at least 5 distinct teams. Anything below that threshold is dropped silently. Top providers is computed from the same filtered input set, so the same threshold applies to it implicitly.
No identifiers. The aggregation never emits team ids, project ids, project paths, repository slugs, branch names, commit SHAs, user emails or pipeline URLs. Failures are grouped by a SHA-1 fingerprint of the normalised root cause string + severity, which is computed at ingestion time.
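
The first two rules can be illustrated with a minimal sketch. It assumes a hypothetical flattened ClusterRow view of a failure_clusters row joined with the tenant's current flag value; the field names other than share_anonymized_stats and fingerprint_hash are illustrative, not the actual schema. Note that the output rows carry only the fingerprint, k, the hit total and the severity, never a team id.

```python
from collections import defaultdict
from dataclasses import dataclass

K_MIN_TEAMS = 5  # k-anonymity threshold stated in the rule above


@dataclass(frozen=True)
class ClusterRow:
    # Hypothetical shape; the real failure_clusters schema may differ.
    team_id: int
    fingerprint_hash: str
    severity: str
    hit_count: int
    share_anonymized_stats: bool  # current value of the tenant flag


def aggregate(rows):
    """Return report rows for clusters seen by >= K_MIN_TEAMS opted-in teams."""
    teams = defaultdict(set)
    hits = defaultdict(int)
    severity = {}
    for row in rows:
        if not row.share_anonymized_stats:
            continue  # rule 1: opted-out teams never enter the aggregation
        teams[row.fingerprint_hash].add(row.team_id)
        hits[row.fingerprint_hash] += row.hit_count
        severity[row.fingerprint_hash] = row.severity
    report = [
        {"fingerprint": fp, "k": len(members), "hits": hits[fp], "severity": severity[fp]}
        for fp, members in teams.items()
        if len(members) >= K_MIN_TEAMS  # rule 2: drop clusters below the threshold
    ]
    # Most widely seen clusters first, ties broken by total hits.
    return sorted(report, key=lambda r: (-r["k"], -r["hits"]))
```

Team ids are used only inside the function to count distinct teams; they are discarded before anything is returned.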

The fingerprint is a one-way hash. Even Exlogare staff cannot recover the original root cause text from a fingerprint without already having access to the underlying analysis row in your tenant.
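
As an illustration of how such a fingerprint could be derived, here is a minimal sketch. The only details taken from this page are the hash function (SHA-1) and its inputs (normalised root cause string plus severity). The normalisation steps (lowercasing, collapsing whitespace) and the newline separator between the two inputs are assumptions made for the example and may differ from what Exlogare does at ingestion time.

```python
import hashlib
import re


def normalise(root_cause: str) -> str:
    # Assumed normalisation: trim, lowercase, collapse runs of whitespace.
    return re.sub(r"\s+", " ", root_cause.strip().lower())


def fingerprint(root_cause: str, severity: str) -> str:
    # Assumed layout: normalised text and severity joined by a newline.
    payload = f"{normalise(root_cause)}\n{severity}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()  # 40 hex characters
```

Two failures with the same normalised text and severity map to the same fingerprint, which is what lets the aggregation group them without carrying the text or any team identifier along.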

How the aggregation runs

The quarterly task runs at 10:00 UTC on the 2nd of January, April, July and October. It aggregates the previous calendar quarter (≈92 days) of failure_clusters rows whose tenants have share_anonymized_stats = true. The output is two BlogPost rows — state-of-ci-YYYY-qN for lang=en and lang=ru — both with published_at = NULL.

Drafts that already exist for the same slug are skipped, so re-running the job is idempotent.
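
The window and slug logic described above can be sketched as follows. The function names are hypothetical, and keying the skip check on the (slug, lang) pair is an assumption; this page only states that drafts already existing for the same slug are skipped.

```python
from datetime import date, timedelta


def previous_quarter(run_date: date):
    """Return (year, quarter, first_day, last_day) of the quarter before run_date."""
    current_q = (run_date.month - 1) // 3 + 1
    if current_q == 1:
        year, quarter = run_date.year - 1, 4
    else:
        year, quarter = run_date.year, current_q - 1
    first_day = date(year, 3 * quarter - 2, 1)
    # Last day = first day of the following quarter minus one day.
    if quarter == 4:
        next_start = date(year + 1, 1, 1)
    else:
        next_start = date(year, 3 * quarter + 1, 1)
    return year, quarter, first_day, next_start - timedelta(days=1)


def draft_slugs(run_date: date):
    """One (slug, lang) pair per language draft for the aggregated quarter."""
    year, quarter, _, _ = previous_quarter(run_date)
    slug = f"state-of-ci-{year}-q{quarter}"
    return [(slug, lang) for lang in ("en", "ru")]


def slugs_to_create(run_date: date, existing):
    """Skip pairs that already exist, so a rerun creates nothing new."""
    return [pair for pair in draft_slugs(run_date) if pair not in existing]
```

For a run on 2 January 2025, previous_quarter returns Q4 2024 (1 October to 31 December), and a second run with both drafts already present yields an empty list.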

How to opt out

In the dashboard: Settings → Privacy → toggle off Share anonymised stats.
From the API: PATCH /api/tenants/me/privacy with body {"share_anonymized_stats": false}. Available to any authenticated tenant member.
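
For scripted use, the API call might look like the sketch below, using only the Python standard library. The path, method and body come from this page; the base URL and the way session credentials are passed (here a Cookie header) are placeholders you would need to replace with the values for your own session.

```python
import json
import urllib.request


def build_opt_out_request(base_url: str, session_cookie: str) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that turns sharing off."""
    body = json.dumps({"share_anonymized_stats": False}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/tenants/me/privacy",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Cookie": session_cookie,  # assumption: session auth via cookie
        },
    )


# Usage (hypothetical host and cookie value):
# req = build_opt_out_request("https://exlogare.example", "session=...")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```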

The change takes effect immediately. If you toggle off mid-quarter, you are excluded from the upcoming quarterly aggregation as well as every future one until you toggle back on. Past published reports are not regenerated retroactively (they are markdown rows in the blog), but they never contained your team’s identifiers in the first place.

The first time a user from your team visits the dashboard after the share_anonymized_stats flag was introduced, a banner appears on the Overview page asking them to confirm or opt out. Clicking Keep enabled, Opt out, or X records the choice and stamps share_anonymized_stats_acknowledged_at. After that, the banner never reappears for that team.

Audit trail

Every flip of the privacy flag is recorded in your team’s audit log under the action tenant_privacy_updated, with the actor email and the previous / new values. The audit log lives in the dashboard under Settings → Audit log (admin role required) and can be exported to CSV at any time.

Cross-references

Failure clusters — where the per-team count and fingerprint_hash fields come from.
API tokens — the privacy endpoint does not require an API token with the read scope; PATCH uses your normal session credentials.