
Tool sprawl auditor: Python scripts to analyze SaaS usage and recommend consolidation
Automated Python scripts to pull billing and usage, score overlap, and recommend SaaS consolidation for engineering and product teams.
You're paying for 27 SaaS products but actively use 7 — here's an automated way to know which ones to keep
Tool sprawl drains engineering time, fragments data, and inflates recurring costs. If your product and engineering teams spend more time deciding which app to use than shipping features, you have a problem common in 2026: fragmented SaaS stacks, opaque usage telemetry, and rising usage-based pricing. This article gives a practical, reusable set of Python scripts and connector templates that pull billing and usage data from SaaS APIs, score overlap, and generate prioritized consolidation recommendations for engineering and product leaders.
Quick summary — what you'll get
- Audit pattern: inventory → billing → usage → identity → scoring → recommendations.
- Reusable Python library (async) to fetch and normalize billing/usage from diverse SaaS APIs.
- Scoring model (Jaccard + cost-per-active-user + feature overlap) with tunable weights.
- Connector templates (Python + examples for Bash/JS) and deployment notes for scale, security, and FinOps integration.
- Actionable checklist and a sample consolidation report you can run in hours.
Why this matters in 2026
From late 2023 through 2025, the market exploded with vertical AI SaaS, adding dozens of point solutions to teams' workflows. By 2026, enterprises are extending FinOps practices beyond cloud infrastructure into SaaS procurement and usage governance. In practice this means:
- Hybrid pricing models—seat-based + usage-based billing is common; unclear usage drivers increase waste.
- Decentralized purchasing—business units buy tools with corporate cards; central visibility is limited.
- Identity and provisioning are the most reliable signal for active users (SSO, SCIM, IdP logs).
- APIs are improving, but no standard billing schema exists; normalization is required.
Audit approach — the inverted pyramid
- Inventory: Gather vendor names, billing emails, and invoice IDs (procurement, corporate cards, employee-submitted receipts).
- Billing: Pull invoices/subscription lines, cost per period, and billing model (seat vs usage).
- Usage: Gather active users, feature usage, API calls, storage consumed.
- Identity: Map users via SSO/SCIM/IdP logs to measure adoption and duplicates across tools.
- Score & recommend: Compute overlap and consolidation potential. Prioritize by savings and operational risk.
Reusable Python scripts — architecture
The library is intentionally modular so teams can add connectors. Key modules:
- connectors: one file per SaaS vendor implementing a standard interface (fetch_billing, fetch_users, fetch_features).
- normalizer: maps vendor-specific fields to canonical schema (cost, period, seats, active_users, feature_tags).
- scorer: computes consolidation_score and breakdowns.
- reporter: writes CSV/JSON, generates human-readable recommendations.
- runner: orchestrates concurrent runs, rate-limit handling, and caching.
Core connector interface (contract)
class BaseConnector:
    async def fetch_billing(self) -> dict:
        # returns {'vendor': '', 'lines': [{'sku': '', 'cost': 123.45, 'period_start': '', 'period_end': ''}]}
        raise NotImplementedError

    async def fetch_users(self) -> list:
        # returns [{'user_id': '', 'email': '', 'status': 'active', 'last_active': '2026-01-12'}]
        raise NotImplementedError

    async def fetch_features(self) -> dict:
        # returns {'feature_tags': ['chat', 'analytics'], 'usage_metrics': {...}}
        raise NotImplementedError
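The runner in the next section imports connector_factory from the connectors package, which isn't listed here. A minimal sketch of that factory, assuming each connector module self-registers with a decorator (the register helper and registry are assumptions, not part of the original library):

# connectors/__init__.py (sketch): map vendor names to connector classes.
_REGISTRY = {}

def register(name):
    """Class decorator so each connector module can self-register, e.g. @register('stripe')."""
    def wrap(cls):
        _REGISTRY[name] = cls
        return cls
    return wrap

def connector_factory(name, cfg):
    try:
        return _REGISTRY[name](cfg)
    except KeyError:
        raise ValueError(f'no connector registered for vendor: {name!r}')

With this in place, decorating StripeConnector with @register('stripe') makes it reachable from config.yaml by vendor name.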
Example async runner (abridged)
import asyncio

import yaml

from connectors import connector_factory

async def run_connector(name, cfg):
    conn = connector_factory(name, cfg)
    billing = await conn.fetch_billing()
    users = await conn.fetch_users()
    features = await conn.fetch_features()
    return {'vendor': name, 'billing': billing, 'users': users, 'features': features}

async def main(config):
    tasks = [run_connector(n, cfg) for n, cfg in config.items()]
    return await asyncio.gather(*tasks)

if __name__ == '__main__':
    # load config.yaml with endpoints and creds
    with open('config.yaml') as f:
        config = yaml.safe_load(f)
    results = asyncio.run(main(config))
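For reference, the config main() expects maps vendor names to connector settings. A placeholder example shown as a Python dict; the per-vendor keys depend on your connectors, and real secrets should come from a secrets manager rather than a file:

config = {
    'stripe': {'api_key': '...'},   # placeholder; load from Vault/Secrets Manager
    'okta': {'base_url': 'https://yourorg.okta.com', 'token': '...'},
}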
Scoring model — how to measure overlap and consolidation potential
We combine signals into a consolidation_score (0..100). Components and rationale:
- User overlap (40%): Jaccard similarity between active user sets. If two tools share most active users, consolidation is feasible.
- Feature overlap (30%): Intersection of feature_tags — e.g., both provide 'in-app chat', 'analytics', 'forms'.
- Cost inefficiency (20%): Cost per active user vs category median.
- Integration overhead (10%): Number of custom integrations (webhooks/ETL) — more integrations increase consolidation risk/cost.
Scoring functions (Python)
def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def consolidation_score(a, b, weights=None):
    weights = weights or {'user': 0.4, 'feature': 0.3, 'cost': 0.2, 'integration': 0.1}
    user_score = jaccard(set(a['active_users']), set(b['active_users']))
    feature_score = jaccard(set(a['features']), set(b['features']))
    # cost: relative gap in cost per active user between the two tools
    # (0 = equally efficient, 1 = one tool far more expensive per user);
    # swap in a comparison against the category median once you track one
    cost_per_user_a = a['cost'] / max(1, len(a['active_users']))
    cost_per_user_b = b['cost'] / max(1, len(b['active_users']))
    cost_score = abs(cost_per_user_a - cost_per_user_b) / max(cost_per_user_a, cost_per_user_b, 1)
    # integration overhead: more custom integrations lower the score (harder to consolidate)
    integration_score = 1 - min(1, (a['integrations'] + b['integrations']) / 10)
    raw = (weights['user'] * user_score
           + weights['feature'] * feature_score
           + weights['cost'] * cost_score
           + weights['integration'] * integration_score)
    return round(raw * 100, 1)
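To make the expected input shape concrete, here is a minimal call with two hand-built, already-normalized records; every value is illustrative:

acme = {
    'active_users': {'ana@example.com', 'bo@example.com', 'cy@example.com'},
    'features': {'chat', 'analytics'},
    'cost': 1200.0,        # cost for the billing period, USD
    'integrations': 3,
}
teamchat = {
    'active_users': {'ana@example.com', 'bo@example.com'},
    'features': {'chat', 'forms'},
    'cost': 900.0,
    'integrations': 1,
}
print(consolidation_score(acme, teamchat))  # ~44.9 with the default weights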
Connector templates — examples you can copy
Below are compact connector templates to adapt. Replace secrets with your secrets manager calls.
Stripe (billing) — Python
import asyncio

import stripe

class StripeConnector(BaseConnector):
    def __init__(self, cfg):
        stripe.api_key = cfg['api_key']

    async def fetch_billing(self):
        # the stripe python client is sync; run it in a thread so the event loop stays responsive
        def get_invoices():
            invoices = stripe.Invoice.list(limit=100)
            lines = []
            for inv in invoices.auto_paging_iter():
                for li in inv.lines.data:
                    lines.append({
                        'sku': li.description,
                        'cost': li.amount / 100.0,  # invoice line items expose 'amount' in cents
                        'period_start': li.period.start,
                        'period_end': li.period.end,
                    })
            return {'vendor': 'stripe', 'lines': lines}
        return await asyncio.to_thread(get_invoices)

    async def fetch_users(self):
        return []  # Stripe doesn't provide app users; combine with product connectors

    async def fetch_features(self):
        return {}
Okta (users via SCIM / Admin Logs) — Python
import aiohttp

class OktaConnector(BaseConnector):
    def __init__(self, cfg):
        self.base = cfg['base_url']
        self.token = cfg['token']

    async def fetch_users(self):
        hdr = {'Authorization': f'SSWS {self.token}', 'Accept': 'application/json'}
        users = []
        async with aiohttp.ClientSession() as s:
            # note: Okta paginates via Link headers; follow them when the org exceeds one page of users
            url = f'{self.base}/api/v1/users'
            async with s.get(url, headers=hdr) as r:
                data = await r.json()
                for u in data:
                    users.append({'user_id': u['id'], 'email': u['profile']['email'], 'status': u['status']})
        return users
Slack (admin API) — JS snippet to list workspace members
// Node.js example using @slack/web-api
const { WebClient } = require('@slack/web-api');
const client = new WebClient(process.env.SLACK_TOKEN);

async function listMembers() {
  const res = await client.users.list();
  return res.members.map(m => ({ id: m.id, email: m.profile.email, is_bot: m.is_bot }));
}

module.exports = { listMembers };
Quick Bash example — kick off the Python audit
#!/usr/bin/env bash
set -euo pipefail
CONFIG_FILE=${1:-config.yaml}
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python run_audit.py --config "$CONFIG_FILE" --output report.json
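The wrapper invokes run_audit.py, which isn't listed above. A minimal sketch, assuming the async runner lives in runner.py and that the raw per-vendor results are dumped as JSON (scoring and reporting would slot in before the dump):

# run_audit.py (sketch): CLI entry point the Bash wrapper calls.
import argparse
import asyncio
import json

import yaml

from runner import main  # assumed module name for the async runner shown earlier

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='SaaS tool-sprawl audit')
    parser.add_argument('--config', default='config.yaml')
    parser.add_argument('--output', default='report.json')
    args = parser.parse_args()

    with open(args.config) as f:
        config = yaml.safe_load(f)

    results = asyncio.run(main(config))

    with open(args.output, 'w') as f:
        json.dump(results, f, indent=2, default=str)
    print(f'wrote raw audit data for {len(results)} vendors to {args.output}')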
Handling real-world friction points
Pagination and rate limits
Many vendor SDKs are synchronous; wrap them in thread pools for concurrency. Respect rate limits — implement exponential backoff and token buckets. Cache invoices and user snapshots to avoid repeated expensive calls.
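As one concrete pattern, a small async GET helper with exponential backoff; the status codes and retry policy here are generic assumptions, so tune them to each vendor's published limits:

import asyncio
import random

import aiohttp

async def get_json_with_backoff(session, url, headers=None, max_retries=5):
    for attempt in range(max_retries):
        async with session.get(url, headers=headers) as resp:
            if resp.status == 429 or resp.status >= 500:
                # honor Retry-After when present, otherwise back off exponentially with jitter
                delay = float(resp.headers.get('Retry-After', 2 ** attempt + random.random()))
                await asyncio.sleep(delay)
                continue
            resp.raise_for_status()
            return await resp.json()
    raise RuntimeError(f'gave up on {url} after {max_retries} attempts')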
Data normalization
Map vendor fields to canonical terms: cost_usd, period, active_users_count, feature_tags, integrations_count. Keep a transformation pipeline so new connectors can reuse normalization rules.
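A minimal canonical record and normalizer sketch built around the connector outputs shown earlier; the field mapping is an assumption and will differ per vendor:

from dataclasses import dataclass, field

@dataclass
class VendorRecord:
    vendor: str
    cost_usd: float
    period: str                      # e.g. '2026-01'
    active_users_count: int
    feature_tags: set = field(default_factory=set)
    integrations_count: int = 0

def normalize(vendor, billing, users, features):
    return VendorRecord(
        vendor=vendor,
        cost_usd=sum(line.get('cost', 0.0) for line in billing.get('lines', [])),
        period=str(billing.get('period', '')),
        active_users_count=sum(1 for u in users if u.get('status', '').lower() == 'active'),
        feature_tags=set(features.get('feature_tags', [])),
        integrations_count=features.get('usage_metrics', {}).get('integrations', 0),
    )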
Identity mapping
Use emails as primary keys but account for aliases and SSO-linked addresses. Implement heuristics to merge duplicates (lowercase, strip domains if needed), and flag ambiguous entries for manual review. Treat identity incidents with the same urgency as any large-scale compromise — see enterprise playbooks for response expectations.
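A sketch of those merge heuristics, assuming plus-addressing is the main alias pattern in your IdP; anything ambiguous is returned for manual review rather than silently merged:

def canonical_email(email):
    local, _, domain = email.strip().lower().partition('@')
    local = local.split('+', 1)[0]          # drop plus-aliases like jane+tool@
    return f'{local}@{domain}'

def merge_user_sets(*user_lists):
    """Union of users across tools keyed by canonical email; collisions are flagged, not merged blindly."""
    merged, ambiguous = {}, set()
    for users in user_lists:
        for u in users:
            key = canonical_email(u['email'])
            if key in merged and merged[key].get('user_id') != u.get('user_id'):
                ambiguous.add(key)          # same email, different IdP ids: flag for manual review
            merged[key] = {**merged.get(key, {}), **u}
    return merged, ambiguous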
Security and compliance
- Store secrets in Vault/Secrets Manager, not in configs.
- Use least-privilege API tokens (read-only billing and SCIM scopes when available).
- Log access and use immutable audit trails for recommendations.
- Redact PII in any reports shared outside procurement or legal.
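For the last point, a tiny redaction helper is often enough for reports leaving procurement or legal; the email regex is a simplification, and real reports may also need names, tokens, and invoice IDs masked:

import hashlib
import re

EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+\.[\w.-]+')

def redact_emails(text):
    """Replace anything that looks like an email with a stable, non-reversible token."""
    def _mask(m):
        digest = hashlib.sha256(m.group(0).lower().encode()).hexdigest()[:8]
        return f'user-{digest}'
    return EMAIL_RE.sub(_mask, text)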
Example run — sample output and interpretation
Below is a condensed sample of what the reporter produces. This is an illustrative run based on synthesized data to demonstrate interpretation.
[
  {
    'pair': ['AcmeChat', 'TeamChat+'],
    'user_overlap': 0.82,
    'feature_overlap': 0.75,
    'cost_savings_est': 24000.0,
    'consolidation_score': 86.4,
    'recommendation': 'Consolidate TeamChat+ into AcmeChat; migrate integrations; estimated 6-8 weeks dev effort.'
  },
  {
    'pair': ['DocStoreX', 'BlobDocs'],
    'user_overlap': 0.12,
    'feature_overlap': 0.2,
    'consolidation_score': 18.0,
    'recommendation': 'Low priority — different audiences and low overlap.'
  }
]
Interpretation guidance:
- Scores > 75: high consolidation potential — add to roadmap after stakeholder review.
- Scores 40–75: moderate — consider pilot and check integrations and data migration complexity.
- Scores < 40: low priority — focus on governance (access review, offboarding).
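These thresholds translate directly into a small helper the reporter can attach to each pair (a sketch mirroring the guidance above):

def priority(score):
    if score > 75:
        return 'high: add to consolidation roadmap after stakeholder review'
    if score >= 40:
        return 'moderate: pilot first; check integrations and data migration complexity'
    return 'low: focus on governance (access review, offboarding)'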
Operationalizing recommendations
- Stakeholder review: product, engineering, security, procurement.
- Data migration plan and test exports (ensure no data loss for core workflows).
- Integration plan: move webhooks, reconfigure OAuth, map APIs.
- Offboarding: revoke keys, cancel subscriptions after validation and final invoice reconciliation.
- Post-migration metrics: adoption, defect regressions, cost realization.
Advanced strategies and 2026 predictions
Trends and strategies to watch as SaaS landscapes continue to evolve:
- AI-assisted feature mapping: In 2026, expect vendor-agnostic ML models to map feature semantics from docs and usage logs, reducing manual taxonomy work; pair this with explainability and model provenance tooling.
- Procurement + FinOps integrations: Automated actions (pause seats, downgrade plans) triggered by consolidation score thresholds will become best practice, paired with approval workflows and procurement playbooks.
- Unified usage telemetry: Emerging standards for billing/usage exports (inspired by cloud cost APIs) will reduce normalization work; adopt these as vendors support them.
- Dynamic purchasing: Usage-based pricing will push teams to autoscale features and centralize procurement to avoid bill surprises.
Checklist — run a SaaS consolidation audit in one week
- Collect invoice exports from procurement and corporate cards (Day 0–1).
- Inventory vendors and prioritize top 80% of spend (Day 1).
- Wire connectors for the top vendors (Day 2–3).
- Run the auditor and review the consolidation report (Day 4).
- Run stakeholder reviews and create 90-day consolidation backlog (Day 5).
Security, licensing, and trust
When reusing third-party scripts, validate license compatibility (MIT/Apache vs closed-source). Vet any community connectors for secure secret handling and minimal scopes. For production, prefer running connectors in a centrally managed environment (CI, serverless job with restricted runtime).
Practical takeaways
- Don’t guess — measure: Use identity (SSO/SCIM) plus billing to compute active usage and cost-per-active-user.
- Score by multiple signals: User overlap + feature overlap + cost capture consolidation potential more accurately than cost alone.
- Automate, but validate: Generate recommendations automatically; require human signoff for any cancellation or migration.
- Integrate with FinOps: Feed consolidation outputs into procurement workflows and budget planners.
Getting started — code & next steps
Clone a starter repo that contains the connector contracts, an async runner, and sample connectors for Stripe and Okta. Use the runner to prioritize the top 10 vendors by spend. Tune the scoring weights to reflect business priorities (e.g., compliance risk may lower consolidation appetite for certain tools).
Example: a product org used this approach in a pilot and identified three consolidations representing 22% of annual SaaS spend with a 10-week migration plan. Your results will vary; start small and iterate.
Call to action
If you want the starter script bundle and a one-page checklist to run your first audit this week, download the repo and adapt the connector templates. Run the scripts against your top 10 vendors and share the consolidated report with procurement and engineering for a 30-minute review — you’ll be surprised how quickly waste becomes actionable savings.
Ready to start? Download the starter audit scripts, or reach out to our team for a hands-on workshop to run a pilot and integrate the output into your FinOps pipeline.