Broadcom’s Tomahawk Ultra asks: why UALink over Ethernet?

Chip vendors like AMD may be closing the gap with Nvidia on GPU FLOPS, memory bandwidth, and HBM capacity, but without a high-speed interconnect and switch, like NVLink and NVSwitch, their ability to scale that performance remains limited.

These technologies have allowed Nvidia to build rack-scale systems with 72 GPUs, while Intel and AMD are still stuck at eight. To get around this limitation, many in the industry have thrown their weight behind the emerging Ultra Accelerator Link (UALink) protocol, an open alternative to Nvidia’s NVLink.

But not everyone agrees that a new protocol is necessary or is willing to wait for the first UALink hardware to be taped out. Once a founding member of the UALink consortium, Broadcom now believes that Ethernet is more than capable of getting the job done sooner.

“There is a huge benefit to having the same technology for all parts of the network,” Pete Del Vecchio, product line manager for Broadcom’s Tomahawk line, told El Reg. “There’s a lot of goodness that comes with using Ethernet as far as monitoring, telemetry, and debugging tools. That’s why we just don’t think UALink is going to go anywhere.”

Broadcom hasn’t turned in its UALink membership card just yet. It still has a voice at the table, and Del Vecchio won’t rule out the possibility of a UALink switch down the line. But as things stand, it’s not on the roadmap, he said.

“Our position is you don’t need to have some spec that’s under development that maybe you’ll have a chip a couple of years from now,” Del Vecchio said.

Instead, Broadcom is pushing ahead with a competing technology it’s calling scale-up Ethernet, or SUE for short. The technology, Broadcom claims, will support scale-up systems with at least 1,024 accelerators using any Ethernet platform. For comparison, Nvidia says its NVLink switch tech can support 576 accelerators, though to date we’re not aware of any deployments that scale beyond 72 GPU sockets.

Tomahawk Ultra

Broadcom’s headline silicon for SUE is the newly announced Tomahawk Ultra, a 51.2 Tbps switch ASIC that’s been specifically tuned to compete with Nvidia’s InfiniBand in traditional supercomputers and HPC clusters, as well as NVLink in rack-scale-style deployments akin to Nvidia’s GB200 NVL72 or AMD’s Helios.

In case you’re wondering, while Tomahawk Ultra does share the same package and is pin-compatible with Broadcom’s Tomahawk 5 (TH5), it’s completely different silicon under the hood.

Along with having a relatively large radix with 512 x 100 Gbps serializer/deserializers (SerDes), the chip has been tuned specifically for high-performance networks and will purportedly deliver latency as low as 250 nanoseconds while pushing 64-byte packets at rates as high as 77 billion per second.
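
That packet rate roughly checks out against the chip’s aggregate bandwidth. A back-of-envelope sketch, assuming standard Ethernet framing overhead (the figures below are our arithmetic, not Broadcom’s):

```python
# Sanity check on the 77 Gpps figure: a 64-byte Ethernet frame occupies
# 84 bytes on the wire once the 7-byte preamble, 1-byte start-of-frame
# delimiter, and 12-byte inter-frame gap are counted.
SWITCH_BW_BPS = 51.2e12          # Tomahawk Ultra aggregate bandwidth, bits/s
WIRE_BYTES = 64 + 7 + 1 + 12     # frame + preamble + SFD + inter-frame gap

packets_per_second = SWITCH_BW_BPS / (WIRE_BYTES * 8)
print(f"{packets_per_second / 1e9:.1f} billion packets/sec")  # ~76.2
```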

This is significant as these smaller packets are common in HPC systems and can prove problematic for networking kit not equipped for the higher message rates that come with them. Tomahawk Ultra gets around this in part by implementing an optimized Ethernet header that trims per-packet overhead, leaving more of each small packet for actual payload.
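
To see why header size matters at these packet sizes, here’s an illustrative comparison. Broadcom hasn’t published its header format in detail, so the 10-byte "optimized" header below is a placeholder assumption, not the real SUE encoding:

```python
# Illustrative only: the 10-byte "optimized" header is a stand-in to show
# why trimming headers matters when payloads are small.
PAYLOAD = 64                     # bytes of useful data per packet
STD_HEADERS = 14 + 20 + 8 + 4    # Ethernet + IPv4 + UDP + FCS
OPT_HEADERS = 10                 # hypothetical slimmed SUE-style header
WIRE_GAP = 7 + 1 + 12            # preamble + SFD + inter-frame gap

for name, hdr in [("standard", STD_HEADERS), ("optimized", OPT_HEADERS)]:
    total = PAYLOAD + hdr + WIRE_GAP
    print(f"{name}: {PAYLOAD / total:.0%} of wire bandwidth is payload")
# standard: ~49%   optimized: ~68%
```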

The chip also features a full complement of reliability and flow-control mechanisms, including forward error correction and credit-based flow control, to mitigate packet loss while maintaining compatibility with existing Ethernet NICs and DPUs.
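
Credit-based flow control is the general mechanism that makes a link lossless: the sender only transmits when the receiver has advertised free buffer space. A minimal sketch of the idea, not Broadcom’s implementation:

```python
# Credit-based flow control in miniature: the sender may only transmit
# when it holds a credit, so the receiver's buffer can never overflow
# and no packet is dropped for lack of buffer space.
from collections import deque

class CreditLink:
    def __init__(self, rx_buffer_slots: int):
        self.credits = rx_buffer_slots   # one credit per free receive buffer
        self.rx_queue = deque()

    def send(self, packet) -> bool:
        if self.credits == 0:
            return False                 # back-pressure: wait, don't drop
        self.credits -= 1
        self.rx_queue.append(packet)
        return True

    def receiver_drain(self):
        packet = self.rx_queue.popleft() # buffer slot freed...
        self.credits += 1                # ...credit returned to the sender
        return packet

link = CreditLink(rx_buffer_slots=2)
assert link.send("p1") and link.send("p2")
assert not link.send("p3")               # no credit: sender stalls, no loss
link.receiver_drain()
assert link.send("p3")                   # credit returned, sending resumes
```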

The switch also supports in-network collectives, a feature Nvidia calls SHARP in its NVLink switches, which allows operations like all-reduce to be offloaded onto the network itself. That improves network efficiency by reducing the amount of bandwidth required to complete these operations.
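
The bandwidth saving is easiest to see with the standard cost model for all-reduce. A rough sketch, using our own numbers rather than any Broadcom figure:

```python
# Why in-network collectives save bandwidth (standard ring all-reduce cost
# model, not a vendor benchmark). Ring all-reduce: each GPU sends roughly
# 2*(N-1)/N times the buffer size. In-network reduction: each GPU sends the
# buffer once and receives the reduced result once; the switch does the adds.
N = 72                    # accelerators in the scale-up domain
buf_gb = 1.0              # gradient buffer size per GPU, GB

ring_traffic = 2 * (N - 1) / N * buf_gb   # per-GPU bytes on the fabric
innet_traffic = 1 * buf_gb                # send once; result comes back once

print(f"ring all-reduce: {ring_traffic:.2f} GB sent per GPU")
print(f"in-network:      {innet_traffic:.2f} GB sent per GPU "
      f"(~{ring_traffic / innet_traffic:.1f}x less)")
```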

Speaking of scale-up switch architectures, compared to Nvidia’s fifth-gen NVLink switches, Tomahawk Ultra offers just under twice the bandwidth at 51.2 Tbps versus 28.8 Tbps. That means that, using the same number of switches as we see in Nvidia’s 72-GPU NVL systems, Broadcom could support a scale-up fabric with 128 accelerators (72 × 51.2 / 28.8 = 128).

Against UALink, Del Vecchio claims Tomahawk Ultra already offers better latency, though that claim is hard to assess until the first hardware actually ships.

As Kurtis Bowman, director of architecture and strategy at AMD and chairman of the UALink Consortium, recently told our sibling site The Next Platform, the consortium expects switch latency in the 100-150 ns range, which, if they can pull it off, could give the protocol an edge in certain applications.

With that said, we’ll have to wait and see just how Broadcom’s latest silicon actually stacks up against NVLink and eventually UALink in the real world. Thankfully, we shouldn’t have to wait long. Broadcom says Tomahawk Ultra ASICs are already shipping to customers, and since they’re pin-compatible with TH5, it should be relatively straightforward to repurpose existing switch chassis.

Best of both worlds?

Of course, just because UALink hardware hasn’t hit the market doesn’t mean the protocol is out of reach for AMD or Intel. Back in April, the UALink consortium published its first spec, and at its Advancing AI event in June, AMD revealed its Helios rack systems, which will use both UALink and Ethernet for its scale-up fabric.

That’s right – for its first rack-scale systems, AMD will be tunneling the UALink protocol over conventional Ethernet switches, which means that AMD will start working out any potential gremlins in the v1.0 spec while its network partners are still bringing their first UALink silicon to market.

“Other transport protocols, such as UALink or Infinity Fabric, can be transported over Ethernet. If you already have silicon that already does the low latency, high reliability, talk whatever you want, just do it over Ethernet,” Robin Grindley, a principal product line manager at Broadcom, told us.
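
At the frame level, that kind of tunneling amounts to carrying the transport’s messages as the payload of ordinary Ethernet frames under a dedicated EtherType. A hedged sketch of the idea; the 0x88B5 EtherType below is the IEEE value reserved for local experiments, and the real UALink-over-Ethernet encapsulation is whatever the vendors define, not this:

```python
# Sketch of tunneling a transport protocol over Ethernet: the message rides
# as the payload of a standard frame under a dedicated EtherType. 0x88B5 is
# the IEEE Std 802 local-experimental EtherType, used here as a placeholder.
import struct

def ethernet_encap(dst_mac: bytes, src_mac: bytes, payload: bytes) -> bytes:
    ETHERTYPE_EXPERIMENTAL = 0x88B5          # IEEE local-use EtherType
    header = struct.pack("!6s6sH", dst_mac, src_mac, ETHERTYPE_EXPERIMENTAL)
    return header + payload                  # FCS and padding added by the NIC

frame = ethernet_encap(
    dst_mac=bytes.fromhex("020000000001"),   # locally administered MACs
    src_mac=bytes.fromhex("020000000002"),
    payload=b"\x01\x02\x03\x04",             # stand-in for a tunneled message
)
print(frame.hex())
```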

However, tunneling UALink over Ethernet isn’t exactly ideal. Most notably, you won’t get anywhere close to UALink’s 100-150 ns latency target. On the other hand, you can’t ship what you don’t have, and if AMD waited until 2027 to bring its Helios rack to market, it’d have to contend with Nvidia’s 600 kW, 144-GPU-socket Kyber systems instead. ®