WEKA Integrates NeuralMesh with NVIDIA STX to Address AI Inference Memory Bottlenecks



April 10, 2026
WEKA has announced the integration of its NeuralMesh platform with the NVIDIA STX reference architecture, establishing its Augmented Memory Grid as a key building block for next-generation AI infrastructure. The combined solution addresses one of the most significant bottlenecks in large-scale inference environments: memory constraints that directly affect performance, total cost of ownership, and scalable growth.

Operating through NeuralMesh, WEKA’s Augmented Memory Grid expands GPU memory by externalizing and persisting key-value caches. When deployed with NVIDIA STX, this architecture delivers high-throughput context memory storage for agentic AI workloads, supporting long-context reasoning across sessions, tools, and end-to-end workflows. According to the company, configurations combining NVIDIA Vera Rubin NVL72 systems, BlueField-4 DPUs, and Spectrum-X Ethernet can boost context memory token throughput by 4x to 10x. The platform is also projected to deliver at least 320 GB/s read and 150 GB/s write throughput, more than doubling the performance of traditional AI storage architectures.
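The appeal of externalized KV cache is easiest to see with back-of-envelope arithmetic: streaming a saved cache back over a fast read path can be far cheaper than rebuilding it through prefill. The sketch below uses the article's 320 GB/s read figure, but the cache size and prefill rate are illustrative assumptions, not WEKA-published benchmarks.

```python
# Back-of-envelope: time to restore a KV cache over a 320 GB/s read path
# versus recomputing it on-GPU. Cache size and prefill rate are assumed
# illustrative values, not measured figures.

def restore_time_s(cache_gb: float, read_gbps: float = 320.0) -> float:
    """Seconds to stream a cache of `cache_gb` GB at `read_gbps` GB/s."""
    return cache_gb / read_gbps

def recompute_time_s(tokens: int, prefill_tok_per_s: float) -> float:
    """Seconds to rebuild the same context by re-running prefill."""
    return tokens / prefill_tok_per_s

# Assume a 128k-token context whose KV cache occupies ~40 GB (model-dependent),
# and an assumed prefill rate of 10,000 tokens/s on the same hardware.
restore = restore_time_s(40.0)                   # 0.125 s
recompute = recompute_time_s(128_000, 10_000.0)  # 12.8 s
print(f"restore: {restore:.3f}s, recompute: {recompute:.1f}s")
```

Under these assumed numbers, restoring the cache is roughly two orders of magnitude faster than recomputing it, which is the economic argument behind persisting KV state off-GPU.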


Memory Infrastructure Becomes the Inference Bottleneck


WEKA centers this integration on the growing memory wall challenge in modern AI deployments. Within today’s inference pipelines, limited high-bandwidth GPU memory forces frequent KV cache evictions, leading to repeated recomputation and diminished operational efficiency. As system concurrency rises, these inefficiencies multiply, increasing infrastructure expenses and reducing performance predictability.
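To make the memory-wall point concrete, the per-token KV cache footprint of a transformer follows directly from its layer count, KV-head count, and head dimension. The model layout below (80 layers, 8 grouped-query KV heads, head dimension 128, fp16) is an assumed Llama-70B-like configuration, not a figure from the article.

```python
# Why HBM fills up: per-token KV cache footprint for a transformer.
# The model dimensions are illustrative assumptions, not article figures.

def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """K and V vectors at every layer: 2 * layers * kv_heads * head_dim."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

# Assumed: 80 layers, 8 grouped-query KV heads, head_dim 128, fp16 (2 bytes).
per_tok = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128)
ctx_gb = per_tok * 128_000 / 1e9   # one 128k-token session
print(f"{per_tok} bytes/token, ~{ctx_gb:.1f} GB per 128k context")
```

At roughly 42 GB for a single long-context session under these assumptions, even a large HBM pool holds only a handful of concurrent contexts, which is why eviction and recomputation dominate as concurrency rises.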

The company promotes shared KV cache infrastructure as the solution. By preserving persistent context across users and sessions, shared caching eliminates redundant processing and stabilizes token throughput. NVIDIA STX provides the validated reference architecture for this model, while WEKA delivers the storage and memory extension layer.
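The sharing mechanism can be sketched as a cache keyed by token prefix: sessions that begin with the same prompt prefix reuse one stored KV entry instead of each paying the prefill cost. This toy in-process dict stands in for an external persistent store; it is a minimal illustration of the idea, not the Augmented Memory Grid's actual interface.

```python
# Minimal sketch of a shared, prefix-keyed KV cache: sessions sharing a
# common prompt prefix reuse one cached entry instead of recomputing it.
# A toy in-process dict stands in for the external persistent store.

import hashlib

class SharedPrefixCache:
    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    @staticmethod
    def _key(token_ids: list[int]) -> str:
        return hashlib.sha256(str(token_ids).encode("utf-8")).hexdigest()

    def put(self, token_ids: list[int], kv_blob: bytes) -> None:
        self._store[self._key(token_ids)] = kv_blob

    def get(self, token_ids: list[int]) -> "bytes | None":
        """Return cached KV state for this exact prefix, or None on a miss."""
        return self._store.get(self._key(token_ids))

cache = SharedPrefixCache()
system_prompt = [101, 2023, 2003]           # toy token ids
cache.put(system_prompt, b"fake-kv-state")  # first session pays the prefill
hit = cache.get(system_prompt)              # later sessions reuse it
print("hit" if hit else "miss")             # prints "hit"
```

Production systems key on block-aligned prefixes rather than whole prompts so that partial overlaps also hit, but the reuse principle is the same.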

NeuralMesh and Augmented Memory Grid Architecture


NeuralMesh acts as WEKA’s distributed storage platform, built to integrate seamlessly across the full NVIDIA STX stack. It delivers high-performance data services optimized for AI workloads, while the Augmented Memory Grid serves as a dedicated memory expansion layer that consolidates KV cache outside of GPU memory.

This design allows inference environments to sustain long-context sessions without overloading GPU resources. By retaining cache state and enabling reuse across workloads, the platform maintains high utilization and consistent performance as deployments scale.

WEKA notes that the Augmented Memory Grid, first unveiled at GTC 2025 and now generally available, has been validated on NVIDIA Grace CPU platforms paired with BlueField DPUs. The architecture delivers measurable gains in inference efficiency, including drastically faster time-to-first-token, higher per-GPU token throughput, and stable performance under increased concurrency. Offloading the data path to BlueField-4 also reduces CPU overhead and alleviates I/O bottlenecks.

Performance and Efficiency Gains


In production-like environments, the platform is engineered to enhance responsiveness and infrastructure efficiency. WEKA states that the Augmented Memory Grid can reduce time-to-first-token by 4x to 20x, while increasing per-GPU token output by up to 6.5x. These improvements stem from higher KV cache hit rates and fewer recomputation cycles, enabling systems to maintain performance as context sizes and user counts expand.
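The connection between hit rate and time-to-first-token is simple expected-value arithmetic: observed TTFT is a weighted blend of the fast hit path and the slow miss path. The timings below are assumed illustrative values; the article's 4x to 20x range will depend on workload and context length.

```python
# Expected TTFT as a function of cache hit rate. Hit/miss timings are
# assumed illustrative values, not measured benchmarks.

def expected_ttft(hit_rate: float, ttft_hit_s: float, ttft_miss_s: float) -> float:
    """Weighted blend of the hit path and the miss (full prefill) path."""
    return hit_rate * ttft_hit_s + (1.0 - hit_rate) * ttft_miss_s

# Assume a miss (full prefill) takes 8.0 s and a hit (cache restore) 0.4 s.
baseline = expected_ttft(0.0, 0.4, 8.0)   # no shared cache: 8.0 s
cached = expected_ttft(0.9, 0.4, 8.0)     # 90% hit rate: ~1.16 s
print(f"speedup: {baseline / cached:.1f}x")
```

Under these assumptions a 90% hit rate yields roughly a 7x TTFT improvement, which shows how the claimed gains hinge on sustaining high hit rates as concurrency grows.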

Firmus, an AI infrastructure provider, is highlighted as an early adopter leveraging NeuralMesh with NVIDIA-based infrastructure. The firm reports improved token throughput and lower latency at scale, with gains coming from more efficient use of existing GPUs rather than additional hardware deployments.

Implications for AI Infrastructure Design


This integration highlights a shift in AI system design, where memory and storage strategies increasingly define overall performance and cost efficiency. As agentic AI workloads expand and context windows widen, DRAM-only approaches become unsustainable due to rising recomputation costs and underutilized GPUs.

WEKA positions persistent, shared KV cache as a foundational capability for AI factories. Organizations adopting this model can achieve higher GPU utilization, lower energy consumption per inference task, and more predictable scaling. In contrast, environments relying exclusively on local GPU memory will likely face rising operational costs and diminishing returns as workloads grow.

Beijing Qianxing Jietong Technology Co., Ltd.
Sandy Yang/Global Strategy Director
WhatsApp / WeChat: +86 13426366826
Email: yangyd@qianxingdata.com
Website: www.qianxingdata.com / www.storagesserver.com
Business Focus:
ICT Product Distribution/System Integration & Services/Infrastructure Solutions
With 20+ years of IT distribution experience, we partner with leading global brands to deliver reliable products and professional services.
“Using Technology to Build an Intelligent World.” Your Trusted ICT Product Service Provider!