IBM’s NorthPole chip, nearly ten years in the making, has reached a new milestone, with researchers publishing a set of strong benchmarking results in the journal Science.
The 12nm chip, built on the TrueNorth architecture, is 25 times more energy efficient than commonly used 12nm GPUs and 14nm CPUs, according to testing on the ResNet-50 model. Efficiency was measured as the number of frames interpreted per joule of energy.
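To make the frames-per-joule metric concrete, here is a minimal sketch of how such a figure can be derived from throughput and power draw. Since one watt is one joule per second, frames per second divided by watts gives frames per joule. The function name and both sets of numbers are hypothetical, chosen only so the ratio works out to 25; they are not measurements from the Science paper.

```python
def frames_per_joule(frames_per_second: float, power_watts: float) -> float:
    """Return energy efficiency as frames processed per joule of energy.

    One watt is one joule per second, so (frames/s) / (J/s) = frames/J.
    """
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return frames_per_second / power_watts


# Hypothetical figures for illustration only (not measured results).
baseline = frames_per_joule(frames_per_second=1000.0, power_watts=50.0)
candidate = frames_per_joule(frames_per_second=1000.0, power_watts=2.0)

# How many times more energy efficient the candidate is than the baseline.
efficiency_ratio = candidate / baseline

print(f"baseline:  {baseline:.1f} frames/J")
print(f"candidate: {candidate:.1f} frames/J")
print(f"ratio:     {efficiency_ratio:.1f}x")
```

With these made-up inputs, two chips delivering the same throughput differ 25-fold in efficiency purely because one draws far less power, which is why the metric is reported per joule rather than per second.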
NorthPole is also much better in terms of latency and space required to compute, and outperforms all major architectures, including a GPU implemented using a 4nm process, according to IBM.
Melding computing power with memory
How does it achieve such results? The memory sits on the chip itself rather than being connected separately, embedded in each of the chip's 256 cores. NorthPole also comprises 22 billion transistors, and each core can perform 2,048 operations per cycle at 8-bit precision.
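As a rough back-of-the-envelope illustration, the core count and per-core figure can be multiplied out to a chip-wide peak. The sketch below reads the 2,048 figure as operations per core per clock cycle, which is an assumption here, and the clock frequency is a hypothetical placeholder rather than a published specification.

```python
CORES = 256                    # core count from the article
OPS_PER_CORE_PER_CYCLE = 2048  # article's per-core figure, read as per cycle (assumption)


def peak_ops_per_cycle(cores: int, ops_per_core: int) -> int:
    """Chip-wide peak operations per clock cycle."""
    return cores * ops_per_core


def peak_ops_per_second(ops_per_cycle: int, clock_hz: float) -> float:
    """Scale the per-cycle peak by a clock frequency in hertz."""
    return ops_per_cycle * clock_hz


chip_ops_per_cycle = peak_ops_per_cycle(CORES, OPS_PER_CORE_PER_CYCLE)

# Hypothetical clock, for illustration only; not a published NorthPole figure.
HYPOTHETICAL_CLOCK_HZ = 1e9
chip_ops_per_second = peak_ops_per_second(chip_ops_per_cycle, HYPOTHETICAL_CLOCK_HZ)

print(f"{chip_ops_per_cycle:,} operations per cycle across the chip")
print(f"{chip_ops_per_second:.3e} operations per second at the assumed clock")
```

A peak like this is an upper bound: it only holds if every core is kept fed with data, which is the point of placing memory inside each core.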
According to the firm, this architecture eliminates the Von Neumann bottleneck: the delays caused by the need for data to travel between the CPU and RAM in most systems. As a result, IBM says, it can run much faster than the best GPUs available, including Nvidia's top AI-centric graphics cards.
“Architecturally, NorthPole blurs the boundary between compute and memory,” said IBM Research’s Dharmendra Modha. “At the level of individual cores, NorthPole appears as memory-near-compute and from outside the chip, at the level of input-output, it appears as an active memory.”
AMD has also tapped into the concept of combining memory and compute on a single component. Building on the theme of processor-in-memory (PIM), its Xilinx business showcased the Virtex XCVU7P card last month, which had eight accelerator-in-memory (AiM) modules.
IBM, which adds memory to each compute core in its NorthPole chip, sees the chip as well suited to emerging AI use cases, including computer vision. It was also tested on natural language processing and speech recognition. NorthPole is also suited to edge applications that require massive amounts of data processing in real time.