
Computer Vision in Industry: Real Cases and ROI

abemon | 12 min read | Written by practitioners

Beyond the proof of concept

Computer vision has a credibility problem. There are thousands of impressive demos on LinkedIn: a model detecting cracks in bridges, another counting people in a store, another identifying defects on a production line. Three things are missing from those demos: the real cost of operating the system, the real accuracy under production conditions (not on the curated dataset), and the numbers proving the investment pays back.

This article documents three real implementations across different sectors, with their actual numbers. These are not hypothetical projects. They are systems that process images every day and generate measurable value.

Case 1: Construction progress tracking

The problem

A property developer with 6 simultaneous construction sites needed to verify actual progress against the schedule. The existing method: a site manager visits each site weekly, takes photos, writes a report, and sends it by email. The developer receives 6 subjective reports with 3-5 days of delay.

The consequences of not detecting delays early: contractual penalties of up to EUR 1,500 per day per site, cost overruns from re-planning, and tension with buyers expecting delivery dates.

The solution

Fixed cameras installed at each site (one per main facade, typically 4 per site) capture images every hour. An object detection model trained with YOLOv8 identifies structural elements: pillars, floor slabs, cladding, roofing, window frames. A second model compares detected state against the project BIM and calculates completion percentage by phase.

The pipeline: camera -> S3 storage -> GPU inference (an AWS instance with an NVIDIA T4) -> BIM comparison -> Grafana dashboard with progress percentage by site and phase -> alerts if actual progress diverges from planned by more than 5%.
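
The BIM-comparison step can be sketched as counting detected structural elements against the counts the BIM expects for each phase. This is an illustrative sketch, not the production code: the element names, the dictionary representation, and the alert helper are assumptions; only the 5% divergence threshold comes from the text.

```python
# Minimal sketch of the BIM-comparison step: detected element counts
# (output of the YOLOv8 detector) against expected counts from the BIM.
# Element names and data shapes are illustrative assumptions.

ALERT_THRESHOLD = 0.05  # alert if actual vs planned diverges by more than 5%


def completion_pct(detected: dict, expected: dict) -> float:
    """Share of expected structural elements actually detected, capped per class."""
    total_expected = sum(expected.values())
    if total_expected == 0:
        return 0.0
    done = sum(min(detected.get(k, 0), n) for k, n in expected.items())
    return done / total_expected


def needs_alert(actual: float, planned: float) -> bool:
    """True when actual progress lags planned progress by more than the threshold."""
    return (planned - actual) > ALERT_THRESHOLD


# Example: structure phase of one site
expected = {"pillar": 40, "floor_slab": 10, "window_frame": 60}
detected = {"pillar": 40, "floor_slab": 9, "window_frame": 31}
actual = completion_pct(detected, expected)  # 80 of 110 elements ≈ 0.727
```

The per-class cap (`min`) avoids over-counting when the detector fires twice on the same element from overlapping camera views.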

Production accuracy: 91.4% in structural element detection (measured against quarterly manual inspection). Errors concentrate in the early phases (foundations), where elements are visually ambiguous from fixed cameras. In structure and finishing phases, accuracy rises to 95.8%.

The ROI

Costs:

  • Hardware: 24 cameras (4 per site x 6 sites) at EUR 280 each = EUR 6,720.
  • Cloud infrastructure: T4 GPU spot instance + storage = EUR 340/month.
  • Development and integration: EUR 35,000 (one-time).
  • Maintenance: EUR 2,000/month (monitoring, quarterly retraining).

Savings:

  • Early delay detection: 40% reduction in contractual penalties. The developer averaged EUR 22,000 annually in penalties. Reduction: EUR 8,800/year.
  • Elimination of 80% of physical site visits by the site manager (only visits when the system detects anomalies). Time and travel savings: EUR 14,400/year.
  • Planning improvement: 15% reduction in re-planning cost overruns. Estimated savings: EUR 45,000/year.

Payback: Total first-year cost (hardware + development + twelve months of operations) is approximately EUR 69,800. Annual savings are EUR 68,200. Payback in roughly 12.3 months. From the second year, operational cost is approximately EUR 28,000 against EUR 68,200 in savings. Net ROI is 144%.
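
The payback arithmetic is straightforward to reproduce from the cost and savings bullets above; the figures below are taken from the text and the function is just the standard payback formula.

```python
# Payback arithmetic for case 1, using the figures from the bullets above.

def payback_months(first_year_cost: float, annual_savings: float) -> float:
    """Months until cumulative savings cover the first-year cost."""
    return first_year_cost / annual_savings * 12


hardware, development = 6_720, 35_000
monthly_ops = 340 + 2_000                               # cloud + maintenance
first_year = hardware + development + 12 * monthly_ops  # EUR 69,800
savings = 8_800 + 14_400 + 45_000                       # EUR 68,200

months = payback_months(first_year, savings)            # ≈ 12.3 months
roi_year2 = (savings - 12 * monthly_ops) / (12 * monthly_ops)  # ≈ 1.43
```

The same two lines of arithmetic apply to the retail and logistics cases below; only the inputs change.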

Case 2: Shelf monitoring (retail)

The problem

A supermarket chain with 45 stores was losing an estimated 3.2% of sales to undetected stockouts. Empty shelves represent direct lost sales. The existing method: scheduled restocking every 4 hours and manual reports from floor staff. Gaps between restocking cycles went undetected until a customer complained or the next cycle arrived.

The solution

Existing security cameras (already installed in all stores) repurposed for shelf monitoring. A void detection model trained on a proprietary dataset of 12,000 images labeled by chain staff. The model detects empty shelf sections and cross-references with the planogram to identify which product is missing.

The pipeline processes one image per aisle every 15 minutes. When it detects a void persisting across two consecutive captures (30 minutes), it generates an alert to the restocking team with the location and identified product.
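
The two-capture persistence rule is a simple debounce: a void only alerts once it has been seen in two consecutive 15-minute captures. A minimal sketch, assuming voids are tracked as sets of planogram positions per aisle (the state shape and names are illustrative):

```python
# Sketch of the persistence rule: a shelf void triggers an alert only
# after appearing in two consecutive captures of the same aisle.
# State layout and position labels are illustrative assumptions.

def update_voids(state: dict, aisle: str, void_positions: set) -> list:
    """Record this capture's voids; return positions seen twice in a row."""
    previous = state.get(aisle, set())
    alerts = sorted(previous & void_positions)  # persisted across captures
    state[aisle] = void_positions
    return alerts


state = {}
update_voids(state, "aisle-3", {"B2", "C5"})           # first sighting: no alert
alerts = update_voids(state, "aisle-3", {"C5", "D1"})  # C5 persisted -> ["C5"]
```

B2 disappeared between captures (restocked or a transient occlusion), so it never alerts; D1 will alert only if it is still empty in the next capture.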

Accuracy: 88.7% in void detection, 76.3% in missing product identification. Product identification is the weak point because security cameras lack the optimal resolution for reading labels. To improve this, we combine visual detection with inventory data: if the shelf is empty and backroom stock for the product assigned to that position is above zero, the probability it is that product is high.
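
The inventory cross-reference described above can be sketched as a lookup: the planogram maps the empty position to a product, and backroom stock confirms it is plausibly the one missing. The function, field names, and SKU format here are illustrative assumptions:

```python
# Hedged sketch of the product-identification fallback: planogram lookup
# plus a backroom-stock check. Names and SKU format are illustrative.

def identify_missing(position: str, planogram: dict, backroom_stock: dict):
    """Return the SKU likely missing at a shelf position, or None."""
    sku = planogram.get(position)
    if sku is None:
        return None                   # position not covered by the planogram
    if backroom_stock.get(sku, 0) > 0:
        return sku                    # restockable: alert with this product
    return None                       # no backroom stock: likely a true stockout


planogram = {"C5": "SKU-1042"}
stock = {"SKU-1042": 12}
identify_missing("C5", planogram, stock)  # -> "SKU-1042"
```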

The ROI

Costs:

  • Hardware: no new cameras (existing ones reused); one NVR with export capability added per store at EUR 800 x 45 stores = EUR 36,000.
  • Cloud infrastructure: image processing = EUR 1,200/month.
  • Development: EUR 42,000 (one-time, including dataset labeling).
  • Maintenance: EUR 1,500/month.

Savings:

  • Stockout reduction from 3.2% to 1.8%. For a chain averaging EUR 180,000 monthly revenue per store, this represents recovered sales of EUR 2,520/store/month. Across 45 stores: EUR 113,400/month.
  • 25% reduction in restocking hours through better prioritization. Estimated savings: EUR 18,000/month across the chain.

Payback: First-year cost: EUR 110,400 (hardware + development + 12 months of operations). First-year savings: EUR 1,576,800. Payback in 25 days. These numbers explain why computer vision in retail is one of the fastest-adopting use cases.

Case 3: Package inspection (logistics)

The problem

A logistics operator processing 8,000 packages daily needed to detect visible damage (crushing, tears, moisture) before delivery to recipients. The existing method: manual visual inspection in the dispatch area. With 8,000 packages and 12 operators, each operator had an average of 6 seconds per package. The damage detection rate was 62% (measured against random audits).

Damaged packages delivered generated claims, returns, and re-shipping costs averaging EUR 18,000 monthly.

The solution

Four high-speed cameras installed in the dispatch tunnel, capturing each package from four angles as it passes along the conveyor belt. An anomaly detection model trained with ResNet-50 as backbone and fine-tuned on a dataset of 8,500 package images (4,200 with damage, 4,300 without). The model classifies each package as “OK” or “requires review” in 180 milliseconds.

Packages flagged as “requires review” are automatically diverted to a manual inspection line where an operator verifies the condition and decides whether to deliver, repackage, or return.

Production accuracy: 94.1% recall (detects 94.1% of actual damage) with an 8.3% false positive rate (packages flagged as damaged that are fine). The false positive rate is acceptable because the cost of reviewing a package unnecessarily is low (30 seconds of an operator’s time), while the cost of missing damage is high (claim, re-shipping, dissatisfied customer).
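
That tradeoff can be made explicit as an expected cost per package. The recall, false positive rate, and the "review is cheap, missed damage is expensive" asymmetry come from the text; the per-event euro costs and the damage base rate below are illustrative assumptions.

```python
# Expected-cost view of the review tradeoff. Per-event costs and the
# damage base rate are assumed for illustration; recall and FP rate
# are the production figures from the text.

COST_REVIEW = 0.25   # EUR, ~30 s of operator time (assumed rate)
COST_MISSED = 30.0   # EUR, average claim + re-shipping per missed damage (assumed)


def expected_cost_per_package(recall: float, fp_rate: float, damage_rate: float) -> float:
    """Average cost per package at a given model operating point."""
    missed = damage_rate * (1 - recall)                        # damage that slips through
    reviews = damage_rate * recall + (1 - damage_rate) * fp_rate
    return missed * COST_MISSED + reviews * COST_REVIEW


with_model = expected_cost_per_package(0.941, 0.083, 0.02)
manual_only = expected_cost_per_package(0.62, 0.0, 0.02)  # 62% detection rate
```

Under these assumptions the model's 8.3% false positive rate adds only cents of review time per package, while each point of recall saves a large multiple of that in avoided claims, which is why the operating point favors recall.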

The ROI

Costs:

  • Hardware: 4 industrial cameras + tunnel lighting + inference PC with GPU = EUR 12,800.
  • Conveyor belt integration (automatic diversion) = EUR 8,500.
  • Development and training: EUR 28,000.
  • Maintenance: EUR 1,200/month.

Savings:

  • 52% reduction in damage claims (from EUR 18,000 to EUR 8,640/month). Savings: EUR 9,360/month.
  • Reduction of 4 operators dedicated to visual inspection (reassigned to other tasks). Opportunity cost savings: EUR 7,200/month.
  • 35% reduction in re-shipping costs. Savings: EUR 3,150/month.

Payback: First-year cost: EUR 63,700. First-year savings: EUR 236,520. Payback in 3.2 months.

Common implementation patterns

All three cases share patterns that recur in any production computer vision deployment.

Edge vs. cloud. The decision to process at the edge (on location) or in the cloud depends on required latency and image volume. Package inspection requires 180ms latency: edge is mandatory. Construction monitoring can tolerate minutes: cloud is more efficient. Retail is an intermediate case where processing can be local (on the NVR) or in the cloud with batching every 15 minutes.

Continuous retraining. Models degrade when visual conditions change (new lighting, new product types, seasonal changes). A quarterly retraining pipeline with fresh production images maintains accuracy. Automating assisted labeling (the model proposes, a human validates) reduces retraining cost by 60%.
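
The "model proposes, a human validates" loop typically reduces to a confidence triage: high-confidence predictions become proposed labels, the rest go to a human queue. A minimal sketch, with the cutoff value and data shapes as illustrative assumptions:

```python
# Sketch of assisted labeling: auto-accept confident predictions as
# proposed labels, queue the rest for human review. Cutoff is illustrative.

CONFIDENCE_CUTOFF = 0.90


def triage(predictions: list) -> tuple:
    """Split (image_id, label, confidence) tuples into auto-proposed vs review."""
    auto, review = [], []
    for image_id, label, conf in predictions:
        target = auto if conf >= CONFIDENCE_CUTOFF else review
        target.append((image_id, label))
    return auto, review


preds = [("img1", "damage", 0.97), ("img2", "ok", 0.74), ("img3", "ok", 0.91)]
auto, review = triage(preds)  # img1 and img3 auto-proposed, img2 to a human
```

Only the `review` queue consumes labeler time, which is where the retraining cost reduction comes from.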

Business metrics, not model metrics. The model’s mAP is relevant for the technical team. The business wants to know: how many damage events did we catch, how many stockouts did we prevent, how much did we save in penalties. Model metrics feed business metrics, but the dashboard the CFO sees shows euros, not precision percentages.

Start simple. All three projects started with pre-trained models and fine-tuning on relatively small datasets (between 4,000 and 12,000 images). None required training a model from scratch. The investment in data labeling is significant but predictable. The investment in exotic model architectures is unpredictable and rarely justified in industrial applications.

Computer vision in industry is no longer experimental technology. It is production technology with demonstrable ROI. The limiting factor is not the model. It is the engineering that connects the camera to the business decision.

For the retail-specific case, our article on smart retail and AI-powered stock management goes deeper into shelf monitoring. And to take these models to production with a reliable pipeline, see our MLOps guide.

About the author

abemon engineering

Engineering team

Multidisciplinary engineering, data and AI team headquartered in the Canary Islands. We build, deploy and operate custom software solutions for companies at any scale.