As AI moves from massive data centers to local “Edge” deployments, the industry is adopting a new vocabulary. We sat down with Matt Williams from Cornelis Networks to demystify the terminology and explore why the network is the secret to getting more “tokens” out of your AI hardware.

Understanding the “Token”

In the world of AI inference, everything boils down to tokens. But what exactly are they?

  • Definition: A token is a numerical representation of a chunk of text. When you type a prompt, the model breaks it into words or sub-word pieces and maps each one to a number (a token) so it can “digest” them.
  • The Process: You feed the model tokens (input), and it generates new tokens (output) to form an answer.
  • The Goal: The metric that matters for AI efficiency is tokens per second (see the sketch after this list). If your network is slow, your expensive GPUs sit idle, waiting for data. A better network “gets out of the way,” allowing your system to generate more tokens in less time.
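
As a minimal sketch of both ideas, the toy C program below maps words to integer IDs using a throwaway whitespace vocabulary, then computes tokens per second for a hypothetical run. Real models use learned sub-word vocabularies (such as BPE), and every number here is illustrative, not drawn from any real model.

```c
#include <stdio.h>
#include <string.h>

#define MAX_VOCAB 256

int main(void) {
    /* Toy vocabulary: each distinct word gets the next free integer ID.
     * Real models use learned sub-word vocabularies (e.g., BPE), but the
     * principle is the same: text in, integer token IDs out. */
    char vocab[MAX_VOCAB][32];
    int vocab_size = 0;

    char prompt[] = "the network feeds the model tokens";
    int ids[64], n_tokens = 0;

    for (char *word = strtok(prompt, " "); word; word = strtok(NULL, " ")) {
        int id = -1;
        for (int i = 0; i < vocab_size; i++)        /* look up an existing ID */
            if (strcmp(vocab[i], word) == 0) { id = i; break; }
        if (id < 0) {                               /* new word: assign next ID */
            strncpy(vocab[vocab_size], word, 31);
            vocab[vocab_size][31] = '\0';
            id = vocab_size++;
        }
        ids[n_tokens++] = id;
    }

    printf("token IDs:");
    for (int i = 0; i < n_tokens; i++) printf(" %d", ids[i]);
    printf("\n");   /* "the" repeats, so its ID (0) appears twice */

    /* The headline metric: tokens generated divided by wall-clock time. */
    double generated = 512.0, seconds = 4.0;        /* hypothetical run */
    printf("throughput: %.1f tokens/sec\n", generated / seconds);
    return 0;
}
```
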
Q&A: Inference and the Edge

Q: Is high-performance networking only for massive supercomputers?

Matt: Not at all. Five years ago, the most challenging applications were traditional High-Performance Computing (HPC) simulations. Today, AI is essentially a specialized form of HPC. The requirements are identical: low latency, high bandwidth, and lossless delivery. Whether you have a massive cluster or a single rack at the edge, if you’re running AI, you need a fabric that can keep up.

Q: How does Cornelis optimize for “Edge” AI specifically?

Matt: Edge deployments are often compact—maybe just half a rack. Our architecture is incredibly flexible; we can “bifurcate” (split) our ports to support high density in small form factors. Because we are vendor-neutral, we can support a mix of compute types in one cluster—one GPU optimized for video inference and another for audio—all running on the same high-performance fabric.
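
To make the bifurcation trade-off concrete, here is a back-of-the-envelope sizing sketch. Every number in it (port count, port speed, split factor) is a hypothetical illustration, not a Cornelis product specification.

```c
#include <stdio.h>

/* Back-of-the-envelope density math for a compact edge deployment. All
 * numbers are hypothetical, for illustration only; none of them are
 * Cornelis product specifications. */
int main(void) {
    int switch_ports = 48;   /* physical ports on a hypothetical edge switch */
    int port_gbps    = 400;  /* native speed per port */
    int split_factor = 2;    /* bifurcate each port into two lanes */

    int endpoints = switch_ports * split_factor;    /* attachable nodes */
    int gbps_each = port_gbps / split_factor;       /* bandwidth per node */

    printf("%d ports bifurcated %dx -> %d endpoints at %d Gbps each\n",
           switch_ports, split_factor, endpoints, gbps_each);
    return 0;
}
```

Splitting ports trades per-node bandwidth for node count, which is why it suits dense, half-rack edge footprints.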

Q: What makes your software “Open Source” and why does it matter?

Matt: Most proprietary networks lock you into their ecosystem. We do the opposite.

  • LibFabric: We invented this efficient software stack, and it’s now being adopted as the industry standard for Ultra Ethernet (a minimal usage sketch follows this list).
  • Upstreamed Drivers: Our drivers are built directly into the mainline Linux kernel (kernel.org). You don’t have to install “magic” third-party software; it just works.
  • Seamless Stacks: We support NVIDIA CUDA and AMD ROCm out of the box. Whether you’re training or running inference, your existing software runs better because the underlying network is faster.
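
As a minimal libfabric (OFI) sketch, the program below enumerates the fabric providers available on a host; on a machine with the upstream drivers installed, the matching provider would show up in this list. The endpoint type and capability flags are one common choice, not the only valid configuration.

```c
/* Minimal libfabric (OFI) sketch: list the fabric providers available on
 * this host. Build with: cc list_providers.c -lfabric */
#include <stdio.h>
#include <rdma/fabric.h>
#include <rdma/fi_errno.h>

int main(void) {
    struct fi_info *hints = fi_allocinfo();
    struct fi_info *info = NULL, *cur;
    if (!hints) return 1;

    hints->ep_attr->type = FI_EP_RDM;   /* reliable datagram endpoint */
    hints->caps = FI_MSG;               /* basic send/receive messaging */

    int ret = fi_getinfo(FI_VERSION(1, 9), NULL, NULL, 0, hints, &info);
    if (ret) {
        fprintf(stderr, "fi_getinfo: %s\n", fi_strerror(-ret));
        fi_freeinfo(hints);
        return 1;
    }
    for (cur = info; cur; cur = cur->next)
        printf("provider: %-12s fabric: %s\n",
               cur->fabric_attr->prov_name, cur->fabric_attr->name);

    fi_freeinfo(info);
    fi_freeinfo(hints);
    return 0;
}
```
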
The Roadmap to Ultra Ethernet

Cornelis isn’t just building for today: the company is a key player in the Ultra Ethernet Consortium (UEC).

  • CN5000 (shipping now): 400Gbps Omni-Path. Edge benefit: the lowest latency for real-time inference.
  • CN6000 (late 2026): 800Gbps dual-protocol. Edge benefit: one port for AI performance, one for standard storage.
  • CN7000 (2027): 1.6T Ultra Ethernet. Edge benefit: full compatibility with integrated RISC-V offloads.

Summary: The Connectivity Advantage

Whether you are dealing with copper cables (up to 3 meters) or optical fibers (tens of meters), Cornelis focuses on bidirectional throughput. Unlike some cards that might struggle with heavy “in and out” traffic, the Cornelis SuperNIC delivers 800 million messages per second in both directions simultaneously. To put that rate in perspective, see the quick calculation below.
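
A quick worked example of what that message rate implies. The 800 million messages per second figure comes from the article above; the 64-byte payload size is purely an illustrative assumption.

```c
#include <stdio.h>

/* Rough scale of the quoted message rate. The 800M msg/s figure is from
 * the article; the 64-byte payload is an illustrative assumption. */
int main(void) {
    double msgs_per_sec  = 800e6;   /* per direction */
    double payload_bytes = 64.0;    /* hypothetical small message */

    double one_way_gbps = msgs_per_sec * payload_bytes * 8 / 1e9;
    printf("one direction: %.0f Gbps of 64B payloads\n", one_way_gbps);
    printf("both directions: %.0f Gbps aggregate\n", 2 * one_way_gbps);
    printf("per-message budget: %.2f ns\n", 1e9 / msgs_per_sec);
    return 0;
}
```

At that rate the NIC has roughly 1.25 nanoseconds to process each message, which is why sustained bidirectional throughput is a meaningful differentiator.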

For the enterprise, this means your AI isn’t just “smart”—it’s fast, reliable, and built on an open foundation that won’t lock you in.

Additional Resources

  • Cornelis Customer Webinar – ASI Technology Summit
  • ASI Blog – Cornelis – Network Performance for AI Inference and Edge Computing
  • ASI Blog – The Rise of Cornelis Networks – Unlocking AI/HPC Performance with Omni-Path and Ultra-Ethernet