Smarter, faster multicore processors


Tuesday, 07 April, 2015


Computer chips’ clocks have stopped getting faster. To keep delivering performance improvements, chipmakers are instead giving chips more processing units, or cores, which can execute computations in parallel.

But the ways in which a chip carves up computations can make a big difference to performance.

In a 2013 paper, Daniel Sanchez, the TIBCO Founders Assistant Professor in MIT’s Department of Electrical Engineering and Computer Science, and his student, Nathan Beckmann, described a system that cleverly distributes data around multicore chips’ memory banks, improving processing speeds by 18% on average while actually increasing energy efficiency.

Sanchez’s group have now developed an extension of the system that controls the distribution of not only data but computations as well. In simulations involving a 64-core chip, the system increased computational speeds by 46% while reducing power consumption by 36%.

“Now that the way to improve performance is to add more cores and move to larger-scale parallel systems, we’ve really seen that the key bottleneck is communication and memory accesses,” Sanchez said. “A large part of what we did in the previous project was to place data close to computation. But what we’ve seen is that how you place that computation has a significant effect on how well you can place data nearby.”

Disentanglement

The problem of jointly allocating computations and data is very similar to one of the canonical problems in chip design, known as ‘place and route’. The place-and-route problem begins with the specification of a set of logic circuits, and the goal is to arrange them on the chip so as to minimise the distances between circuit elements that work in concert.

This problem is what’s known as NP-hard, meaning that as far as anyone knows, for even moderately sized chips, all the computers in the world couldn’t find the optimal solution in the lifetime of the universe. But chipmakers have developed a number of algorithms that, while not absolutely optimal, seem to work well in practice.

Adapted to the problem of allocating computations and data in a 64-core chip, these algorithms will arrive at a solution in the space of several hours. Sanchez, Beckmann and Po-An Tsai, another student in Sanchez’s group, developed their own algorithm, which finds a solution that is more than 99% as efficient as that produced by standard place-and-route algorithms. But it does so in milliseconds.

“What we do is we first place the data roughly,” Sanchez said. “You spread the data around in such a way that you don’t have a lot of [memory] banks overcommitted or all the data in a region of the chip. Then you figure out how to place the [computational] threads so that they’re close to the data, and then you refine the placement of the data given the placement of the threads. By doing that three-step solution, you disentangle the problem.”
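Conceptually, that three-step heuristic could be sketched as follows. This Python is purely illustrative: the tile mesh, the page-access counts and the greedy choices are all assumptions made for the sake of the example, not the researchers’ actual implementation, which runs at the hardware level.

```python
# Illustrative sketch of a "spread data, place threads, refine data" heuristic.
# Inputs, data structures and greedy choices are assumed for illustration.

def manhattan(a, b):
    """Distance between two tiles on a mesh, given as (row, col) pairs."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def place(threads, pages, tiles, bank_capacity):
    """threads: {thread_id: {page_id: access_count}}, pages: list of page ids,
    tiles: list of (row, col) tile positions, bank_capacity: pages per bank.
    Assumes len(pages) <= bank_capacity * len(tiles) and one thread per tile."""
    # Step 1: spread data roughly, so no memory bank is overcommitted.
    data_at, load = {}, {t: 0 for t in tiles}
    for page in pages:
        tile = min(tiles, key=lambda t: load[t])          # least-loaded bank
        data_at[page] = tile
        load[tile] += 1

    # Step 2: put each thread on the free tile closest to the data it uses.
    thread_at, free = {}, set(tiles)
    for tid, accesses in threads.items():
        tile = min(free, key=lambda t: sum(
            n * manhattan(t, data_at[p]) for p, n in accesses.items()))
        thread_at[tid] = tile
        free.discard(tile)

    # Step 3: refine data placement given the threads, respecting capacity.
    load = {t: 0 for t in tiles}
    for page in pages:
        users = [(tid, acc[page]) for tid, acc in threads.items() if page in acc]
        if users:
            candidates = [t for t in tiles if load[t] < bank_capacity]
            data_at[page] = min(candidates, key=lambda t: sum(
                n * manhattan(thread_at[tid], t) for tid, n in users))
        load[data_at[page]] += 1

    return thread_at, data_at
```

Repeating steps 2 and 3 is the iteration Beckmann describes next.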

In principle, Beckmann added, that process could be repeated, with computations again reallocated to accommodate data placement and vice versa. “But we achieved 1%, so we stopped,” he said. “That’s what it came down to, really.”

Keeping tabs

The MIT researchers’ system monitors the chip’s behaviour and reallocates data and threads every 25 milliseconds. That sounds fast, but it’s enough time for a computer chip to perform 50 million operations.

During that span, the monitor randomly samples the requests that different cores are sending to memory, and it stores the requested memory locations, in an abbreviated form, in its own memory circuit.

Every core on a chip has its own cache - a local, high-speed memory bank where it stores frequently used data. On the basis of its samples, the monitor estimates how much cache space each core will require, and it tracks which cores are accessing which data.
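A software analogue of that monitoring step might look something like the sketch below. Everything here (the sampling rate, the truncated hashing, the footprint estimate) is an assumption made for illustration; the real monitor is a dedicated memory circuit on the chip, not software.

```python
# Illustrative sketch only: sampling memory requests and estimating each
# core's cache footprint. All constants and details are assumed.

import random
from collections import defaultdict

SAMPLE_RATE = 0.01        # fraction of memory requests inspected (assumed)
LINE_BITS = 6             # 64-byte cache lines (assumed)

class SamplingMonitor:
    def __init__(self):
        # Per-core set of sampled line addresses, stored in an abbreviated
        # (line-granularity, truncated-hash) form to bound storage;
        # occasional collisions are tolerated.
        self.sampled_lines = defaultdict(set)

    def observe(self, core_id, address):
        """Called on every memory request; records only a random sample."""
        if random.random() < SAMPLE_RATE:
            line = address >> LINE_BITS
            self.sampled_lines[core_id].add(hash(line) & 0xFFFF)

    def estimated_footprint(self, core_id):
        """Crude estimate of how many distinct cache lines a core touched in
        the interval, scaled up by the sampling rate. It over-counts lines
        that are accessed repeatedly; real estimators are more careful."""
        return int(len(self.sampled_lines[core_id]) / SAMPLE_RATE)

    def reset(self):
        """Cleared at the start of each 25 ms reallocation interval."""
        self.sampled_lines.clear()
```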

The monitor does take up about 1% of the chip’s area, which could otherwise be allocated to additional computational circuits. But Sanchez believes that chipmakers would consider that a small price to pay for significant performance improvements.

Reprinted courtesy of MIT

Image: Daniel Sanchez, Nathan Beckmann and Po-An Tsai. Photo by Bryce Vickmark.
