Networking+

Datadog announces GPU Monitoring to help businesses optimise spend and performance as they aim to scale AI projects

May 9, 2026

Datadog, Inc., the AI-powered observability and security platform, has announced that GPU Monitoring is available to customers everywhere. The new product addresses one of the most prevalent issues facing organisations today: finding a scalable and effective way to manage expanding AI costs.

“GPU instances account for 14 percent of compute costs – which is a huge issue as companies are struggling to build AI-first technology in scalable and smart ways. While these companies can see their costs climbing, they can’t chargeback GPU spend across business units, see workload context or identify clear next steps for improvement. As a result, it is very challenging to budget and plan in thoughtful ways,” said Yanbing Li, Chief Product Officer at Datadog. 

The launch of GPU Monitoring marks one of the first times a single solution has provided unified visibility across the AI stack. It gives customers a single view that links GPU fleet health, cost, and performance directly to the teams relying on those GPUs, enabling faster troubleshooting of slow workloads and cost savings.

“Smartly managing AI spend becomes a board-level conversation when capacity is misallocated, training and inference workloads stall, and costs escalate. We all know managing GPU costs is a huge problem we need to solve, but most companies are experimenting with solutions and it is still very difficult to get a single view of what is happening across the stack. GPU Monitoring fixes that with efficiency and reliability that we haven’t seen before,” said Li.

Today, most GPU tools provide high-level device health metrics, but they don’t surface cross-functional resource contention issues, explain why training and inference workloads fail, or provide visibility into which devices are idle or ineffectively used. This lack of visibility slows down investigations and means that teams overprovision as the safest default – leading to wasted spend.

GPU Monitoring streamlines this work by linking fleet telemetry directly to the workloads consuming those resources, and gives platform engineering and machine learning teams a shared view to investigate together, enabling them to:

  • Scale AI without overspending: With visibility and forecasting based on the usage patterns of fleets and direct guidance on whether to buy new GPUs or free up existing ones, platform teams avoid expensive purchases and long procurement cycles, machine learning teams get capacity faster, and leadership gets better ROI with predictable spend.
  • Accelerate AI delivery: Stalled workloads are correlated directly to the underlying GPUs, pods and processes running them so that teams can troubleshoot performance bottlenecks in minutes instead of hours, allowing engineers to focus on shipping AI projects.
  • Avoid costly disruptions: Unhealthy GPUs are proactively identified before failures cascade across a cluster and cause training and inference delays.
  • Maximise ROI on GPU spend: Teams are accountable for their own GPU utilisation and costs, and can easily pinpoint where they are over-reserving or underutilising their GPUs. This allows teams to reclaim and reallocate resources in order to reduce wasted spend.
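
To make the chargeback and underutilisation ideas above concrete, here is a minimal Python sketch. It is not Datadog's API or method: the `GpuUsage` record, its field names, the sample figures and the 20 percent utilisation threshold are all hypothetical, chosen only to illustrate how GPU spend might be attributed to teams and how idle devices might be flagged.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuUsage:
    """Hypothetical per-device usage record (not a Datadog data model)."""
    device_id: str
    team: str
    hours_reserved: float
    avg_utilisation: float  # fraction between 0.0 and 1.0
    hourly_cost: float      # cost per reserved hour, in any one currency


def chargeback_by_team(records):
    """Sum reserved-hours cost per team so spend can be charged back."""
    totals = defaultdict(float)
    for r in records:
        totals[r.team] += r.hours_reserved * r.hourly_cost
    return dict(totals)


def underutilised_devices(records, threshold=0.2):
    """Return IDs of devices whose average utilisation is below the threshold.

    The 0.2 default is an illustrative assumption, not a recommended value.
    """
    return [r.device_id for r in records if r.avg_utilisation < threshold]


# Illustrative sample data only.
records = [
    GpuUsage("gpu-0", "search", 100.0, 0.85, 2.0),
    GpuUsage("gpu-1", "search", 100.0, 0.10, 2.0),
    GpuUsage("gpu-2", "ads", 50.0, 0.05, 4.0),
]

costs = chargeback_by_team(records)
idle = underutilised_devices(records)
```

With this sample data, `costs` attributes 400.0 to the hypothetical "search" team and 200.0 to "ads", while `idle` lists `gpu-1` and `gpu-2` as candidates to reclaim. A real deployment would draw these figures from fleet telemetry rather than hard-coded records.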

Kai Huang, head of product at Hyperbolic, said: “Datadog GPU Monitoring has made it easy for us to stay on top of our multi-tenant GPU infrastructure. We get per-instance, per-device visibility into core utilisation, memory, power and thermals right out of the box with no extra setup. The dashboards are rich out of the gate and simple to customise, and standing up isolated views per customer takes minutes.

“Layering on LLM Observability ties it all together. We can go from a model latency spike straight to the underlying GPU metrics without switching tools. Full stack AI observability in one platform means both our team and our customers can move faster with confidence.”

© 2026 Networking+ - A Denyan Media Ltd Publication.
