Technology audit: when to scale and when to optimize what you have
Symptoms that look like scaling problems but aren’t
The most expensive mistake in infrastructure decision-making is scaling a system that doesn’t need to scale; it needs to be fixed. In our consulting practice, roughly 70% of the “we need to scale” conversations we have with CTOs turn out to be optimization problems, not capacity problems. The system isn’t slow because it lacks resources. It’s slow because it wastes the resources it has.
Here are the patterns we see most often.
N+1 queries. A page that loads a list of 50 items makes 51 database queries: one to fetch the list, and one per item to fetch related data. The page takes 3 seconds. The team’s instinct is to add a read replica or a bigger database instance. The actual fix is an eager-loading join that reduces 51 queries to 1. Response time drops to 200ms. The existing database was never the bottleneck.
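The pattern is easy to reproduce. Here is a minimal sketch using SQLite with a hypothetical posts/authors schema (the table and column names are illustrative, not from any real system): the first version issues one query per row, the second fetches the same data in a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT,
                        author_id INTEGER REFERENCES authors(id));
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (1, 'First', 1), (2, 'Second', 2);
""")

# N+1 pattern: one query for the list, then one query per row.
posts = conn.execute("SELECT id, title, author_id FROM posts ORDER BY id").fetchall()
n_plus_1 = []
for post_id, title, author_id in posts:
    (author,) = conn.execute(
        "SELECT name FROM authors WHERE id = ?", (author_id,)
    ).fetchone()
    n_plus_1.append((title, author))

# Eager-loading join: the same data in a single round trip.
joined = conn.execute("""
    SELECT p.title, a.name
    FROM posts p JOIN authors a ON a.id = p.author_id
    ORDER BY p.id
""").fetchall()
```

With 50 items the loop version issues 51 queries; the join issues 1. Most ORMs expose the same fix as an eager-loading option (e.g. a join- or batch-load strategy) rather than raw SQL.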
Missing indexes. A query that scans 10 million rows to return 50 results will be slow regardless of how powerful the database server is. We routinely find tables with millions of rows and no indexes on columns used in WHERE clauses, ORDER BY, or JOIN conditions. Adding the correct index turns a 5-second query into a 5-millisecond query. No infrastructure change needed.
Synchronous where async works. A request handler that calls three external APIs sequentially, waiting for each response before starting the next, takes the sum of all three latencies. Making those calls concurrently takes the maximum of the three. If each API responds in 200ms, sequential processing takes 600ms and concurrent processing takes 200ms. Same infrastructure, three times faster.
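The sum-versus-maximum effect is easy to demonstrate. A sketch with asyncio, using sleep as a stand-in for external API latency (the call names and latencies are illustrative):

```python
import asyncio
import time

async def call_api(name: str, latency: float) -> str:
    # Stand-in for an external API call; sleep simulates network latency.
    await asyncio.sleep(latency)
    return f"{name}: ok"

async def sequential() -> list:
    # Waits for each response before starting the next call.
    return [await call_api(n, 0.05) for n in ("a", "b", "c")]

async def concurrent() -> list:
    # Starts all three calls at once; total time ~= the slowest call.
    return await asyncio.gather(*(call_api(n, 0.05) for n in ("a", "b", "c")))

start = time.perf_counter()
asyncio.run(sequential())
seq_elapsed = time.perf_counter() - start

start = time.perf_counter()
asyncio.run(concurrent())
conc_elapsed = time.perf_counter() - start
```

With 50ms per call, the sequential version takes roughly 150ms and the concurrent one roughly 50ms, on the same hardware.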
Over-fetching. An API that returns the full user object (with all 40 fields, nested relationships, and audit history) when the frontend only needs the user’s name and email is transferring 100x more data than necessary. This wastes bandwidth, serialization time, database load, and memory. Trimming the response to what’s actually needed often eliminates the performance complaint entirely.
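The simplest fix is a projection layer between the data model and the response. A minimal sketch, with a made-up user record standing in for the 40-field object:

```python
import json

# Hypothetical full user record, as an ORM might serialize it.
full_user = {
    "id": 7,
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "audit_history": [{"event": "login"}] * 50,
    # ... imagine ~40 fields in total
}

def project(record: dict, fields: list) -> dict:
    """Return only the fields the client actually asked for."""
    return {k: record[k] for k in fields if k in record}

trimmed = project(full_user, ["name", "email"])

full_bytes = len(json.dumps(full_user))
trimmed_bytes = len(json.dumps(trimmed))
```

The same idea underlies sparse fieldsets in JSON:API and field selection in GraphQL: the client declares what it needs, and the server serializes nothing else.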
Untuned connection pools. A service with 10 instances, each maintaining a connection pool of 20 connections to a PostgreSQL database with a max_connections of 100, has a problem. 200 potential connections competing for 100 slots means connection timeouts under load. The fix isn’t a bigger database. It’s right-sizing the pools or adding PgBouncer as a connection pooler.
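The arithmetic is worth making explicit, because it fails silently until load arrives. A sketch of the sizing check (the reserved-connection headroom is an assumption modeled on PostgreSQL’s superuser_reserved_connections default):

```python
def pool_fits(instances: int, pool_size: int, max_connections: int,
              reserved: int = 3) -> bool:
    """Check that worst-case connection demand fits the server limit.

    `reserved` leaves headroom for superuser/maintenance sessions.
    """
    return instances * pool_size <= max_connections - reserved

# The scenario from the text: 10 instances x 20 connections vs. 100 slots.
over_subscribed = pool_fits(instances=10, pool_size=20, max_connections=100)

# Right-sized: with 10 instances, ~9 connections each stays under the limit.
right_sized = pool_fits(instances=10, pool_size=9, max_connections=100)
```

Note that the pool size must be set per instance, so it has to be recomputed whenever the instance count changes; a pooler like PgBouncer removes that coupling by multiplexing many client connections onto few server connections.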
Every one of these problems presents symptoms that look like infrastructure limitations: slow responses, timeouts, high resource utilization. But none of them are solved by adding more hardware. The diagnostic step — actually measuring where time and resources are spent — is what separates effective engineering from expensive guesswork.
The diagnostic framework: measure before deciding
Before making any scaling or optimization decision, you need data. Not intuition, not vendor benchmarks, not architectural diagrams. Actual measurements from your production system under your actual workload.
Step 1: Profile the request path. For the endpoints that are slow or failing, trace the full execution path. Where is time spent? Database queries? External API calls? Serialization? Computation? Use APM tools (Datadog, New Relic, or open-source alternatives like OpenTelemetry with Jaeger) to break down latency by component. The bottleneck is almost never where the team assumes it is.
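A full APM setup does this for you, but the core idea fits in a few lines. A toy sketch of per-component timing (the component names and sleep durations are invented; a real trace would come from OpenTelemetry spans):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Minimal stand-in for an APM span: accumulates wall time per component.
timings = defaultdict(float)

@contextmanager
def span(component: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[component] += time.perf_counter() - start

def handle_request() -> None:
    with span("db_query"):
        time.sleep(0.03)       # simulated query time
    with span("external_api"):
        time.sleep(0.01)       # simulated upstream call
    with span("serialization"):
        time.sleep(0.002)      # simulated response encoding

handle_request()
bottleneck = max(timings, key=timings.get)
```

Even this crude breakdown answers the key question: which component dominates. In this invented request, it is the database query, not the upstream API the team might have blamed.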
Step 2: Analyze resource utilization patterns. High CPU? Check if it’s application code or garbage collection. High memory? Check for leaks or oversized caches. High disk I/O? Check query patterns and index coverage. High network? Check payload sizes and connection reuse. Each resource type has different optimization strategies, and “the server is at 90% CPU” is a symptom, not a diagnosis.
Step 3: Calculate cost-per-request economics. What does each request cost in infrastructure terms? If your monthly infrastructure bill is $10,000 and you serve 30 million requests, your cost is $0.00033 per request. Now model the alternatives: if optimization reduces the cost per request by 40%, that’s $4,000/month saved. If scaling doubles capacity at $10,000/month more, the cost per request stays flat. This economic framing makes the decision concrete.
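The arithmetic from this step, worked through in code with the numbers above:

```python
def cost_per_request(monthly_bill: float, monthly_requests: int) -> float:
    return monthly_bill / monthly_requests

# $10,000/month over 30 million requests ~= $0.00033 per request.
baseline = cost_per_request(10_000, 30_000_000)

# Optimization: 40% less cost per request at the same traffic
# means 40% of the bill saved.
optimized_saving = 10_000 * 0.40          # $4,000/month

# Scaling: double the capacity at double the bill
# leaves the unit cost unchanged.
scaled = cost_per_request(20_000, 60_000_000)
```

Framed this way, optimization improves the unit economics while scaling merely preserves them, which is why optimization should usually be evaluated first.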
Step 4: Identify the actual bottleneck. Amdahl’s Law applies: optimizing a component that accounts for 5% of total latency will improve total performance by at most 5%, regardless of how dramatic the optimization is. Focus on the component that dominates the execution time. If 70% of request latency is database queries, that’s where optimization effort belongs.
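Amdahl’s Law can be checked numerically. The overall speedup is 1 / ((1 − f) + f / s), where f is the fraction of time spent in the optimized component and s is that component’s speedup:

```python
def amdahl_speedup(fraction: float, component_speedup: float) -> float:
    """Overall speedup when `fraction` of total time gets `component_speedup`."""
    return 1 / ((1 - fraction) + fraction / component_speedup)

# Optimizing a component that is 5% of latency, even by a factor of a million:
minor = amdahl_speedup(0.05, 1_000_000)   # capped near 1.05x

# Optimizing the 70% that is database queries by 10x:
major = amdahl_speedup(0.70, 10)          # roughly 2.7x overall
```

The 5% component caps out at about a 1.05x improvement no matter how hard you optimize it, while a 10x win on the 70% component nearly triples overall throughput.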
When to actually scale
Sometimes, after honest measurement, the answer genuinely is “we need more capacity.” Here are the indicators that scaling is the correct response.
Genuine load growth. Traffic has doubled in six months and the trend continues. You’ve already optimized the hot paths. The system is efficient but approaching its capacity ceiling. This is healthy scaling — the business is growing and the infrastructure needs to keep pace.
Latency requirements that optimization can’t meet. You need p99 latency under 50ms, and the physics of a single-server architecture prevent it regardless of optimization. Geographic distribution, read replicas for query parallelism, or edge caching may be necessary. This is scaling driven by latency constraints, not throughput.
Availability targets that single-instance architecture can’t provide. A 99.99% availability SLO (52 minutes of downtime per year) requires redundancy. A single database server, no matter how well optimized, will have maintenance windows and occasional failures that exceed that budget. Scaling for availability — replicas, multi-region deployment, failover mechanisms — is a fundamentally different problem from scaling for performance.
Isolation requirements. When one tenant’s workload impacts another’s performance, workload isolation through separate instances or resource quotas may be necessary. This is scaling for fairness, and it’s particularly relevant in multi-tenant SaaS architectures.
The optimization-first approach
When the diagnostic reveals optimization opportunities (which it usually does), here is the priority order that delivers the best return on engineering investment.
Query optimization. Add missing indexes. Rewrite N+1 patterns as joins or batch queries. Add appropriate LIMIT clauses. Review execution plans for full table scans. This is almost always the highest-ROI optimization category. A single index can eliminate a scaling discussion.
Caching strategy. Identify data that’s read frequently and written infrequently. Application-level caching with Redis or Memcached for API responses, database query results, or computed values can reduce database load by 80% or more. But cache invalidation must be designed carefully — stale data is often worse than slow data. Time-based TTLs work for data that can tolerate eventual consistency. Event-driven invalidation works for data that needs immediacy.
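Both invalidation strategies can be sketched in a single minimal cache. This toy in-process version stands in for Redis or Memcached (the class and key names are illustrative):

```python
import time

class TTLCache:
    """Minimal time-based cache; entries expire after `ttl` seconds."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]      # lazy eviction on read
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def invalidate(self, key):
        # Event-driven invalidation: call this when the source data changes.
        self._store.pop(key, None)

cache = TTLCache(ttl=0.05)
cache.set("user:7", {"name": "Ada"})
hit = cache.get("user:7")             # served from cache
time.sleep(0.06)
expired = cache.get("user:7")         # TTL elapsed: miss, refetch from source
```

The `invalidate` path is what makes stale data tolerable: time-based expiry bounds how stale a value can get, while explicit invalidation on writes removes the staleness window for data that needs immediacy.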
Connection management. Connection pooling for databases (PgBouncer for PostgreSQL, ProxySQL for MySQL), HTTP connection reuse with keep-alive, and gRPC for service-to-service communication with persistent connections. Connection establishment is expensive. Reusing connections is cheap.
Async processing. Move work that doesn’t need to happen in the request path to background queues. Email sending, report generation, analytics events, webhook delivery — none of these need to block the user’s response. A message queue (RabbitMQ, SQS, or Redis Streams for simpler cases) decouples the request from the processing.
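The decoupling pattern looks the same regardless of broker. A toy sketch using an in-process queue and worker thread in place of RabbitMQ or SQS (the handler and job names are invented):

```python
import queue
import threading

# Work that doesn't need to block the response goes onto a queue.
jobs = queue.Queue()
sent = []

def worker() -> None:
    while True:
        job = jobs.get()
        if job is None:               # sentinel: shut the worker down
            break
        sent.append(f"emailed {job}")  # stand-in for the slow work
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

def handle_signup(email: str) -> str:
    jobs.put(email)                   # enqueue and return immediately
    return "202 Accepted"

status = handle_signup("ada@example.com")
jobs.join()                           # only for this demo; real workers run forever
jobs.put(None)
t.join()
```

The request handler’s latency is now just the enqueue, not the email delivery. With a real broker you also gain durability and retries, which an in-process queue does not provide.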
Payload optimization. Compress responses. Paginate large lists. Use GraphQL or sparse fieldsets to return only requested data. Implement ETags for client-side caching. Reduce serialization overhead with efficient formats (Protocol Buffers or MessagePack for internal services, compressed JSON for external APIs).
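Two of these techniques, pagination and compression, composed in a short sketch (the item shape is invented for illustration):

```python
import gzip
import json

items = [
    {"id": i, "name": f"item-{i}", "description": "lorem " * 20}
    for i in range(1_000)
]

# Pagination: return one page, not the whole list.
def paginate(rows: list, page: int, per_page: int = 50) -> list:
    start = (page - 1) * per_page
    return rows[start:start + per_page]

page_one = paginate(items, page=1)

# Compression: gzip the JSON body before sending it over the wire.
raw = json.dumps(page_one).encode()
compressed = gzip.compress(raw)
```

Repetitive JSON compresses very well, so the two fixes multiply: a 50-item page instead of 1,000 items, and a gzip body a fraction of the raw size. In practice the web server or reverse proxy usually handles the compression when the client sends Accept-Encoding: gzip.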
The decision tree: scale, optimize, or get help
Here is the decision framework we use with our clients.
Start: Is the system measurably slow or failing?
If no: you don’t have a problem yet. Monitor and revisit when you do. Premature scaling is waste.
If yes: Have you profiled the actual bottleneck?
If no: profile first. No decisions without data.
If yes: Is the bottleneck in application code or infrastructure?
If application code: optimize. Query patterns, caching, async processing, connection management. This is cheaper, faster, and more durable than scaling.
If infrastructure: Is current utilization above 70% after optimization?
If no: there’s still optimization headroom. Revisit the application bottleneck analysis.
If yes: Is growth projected to continue?
If no: optimize to buy time. The load may plateau.
If yes: scale. But scale the specific bottleneck, not everything. If the database is the constraint, add read replicas or shard. If compute is the constraint, add application instances behind a load balancer. If network is the constraint, add edge caching or CDN.
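The decision tree above can be written down as a function, which makes the branch order explicit. A sketch (the threshold and the 'app'/'infra' labels come straight from the tree; the function itself is illustrative, not a real tool):

```python
def decide(slow: bool, profiled: bool, bottleneck: str,
           utilization: float, growth_continuing: bool) -> str:
    """The decision tree as code. `bottleneck` is 'app' or 'infra';
    `utilization` is post-optimization utilization of the constrained resource."""
    if not slow:
        return "monitor"                 # no measurable problem yet
    if not profiled:
        return "profile first"           # no decisions without data
    if bottleneck == "app":
        return "optimize"
    if utilization <= 0.70:
        return "optimize"                # still headroom; revisit app analysis
    if not growth_continuing:
        return "optimize to buy time"    # the load may plateau
    return "scale the specific bottleneck"

verdict = decide(slow=True, profiled=True, bottleneck="infra",
                 utilization=0.85, growth_continuing=True)
```

Note how many branches terminate before “scale”: only a profiled infrastructure bottleneck, above 70% utilization after optimization, with continuing growth, actually reaches it.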
When to bring in external help. If your team has been debating “scale vs optimize” for more than two sprints without resolution, an external technology audit pays for itself. A fresh pair of eyes with cross-industry experience can identify patterns that are invisible to the team that built the system. We typically deliver a diagnostic in one to two weeks that includes: a bottleneck analysis with measurements, a prioritized optimization roadmap, a scaling recommendation with cost projections, and a technical debt assessment that identifies time bombs.
The most common outcome of our audits is not “scale everything.” It’s “fix these three things and you have 18 months of headroom.” That’s a much better answer than a larger infrastructure bill.
About the author
abemon engineering
Engineering team
Multidisciplinary engineering, data and AI team headquartered in the Canary Islands. We build, deploy and operate custom software solutions for companies at any scale.

