Speaker
Description
computer architecture targeting RAM chips
Highly parallel computation algorithms on structured data can remain
inside the memory chip, removing the need to pass all the data across a
bus to a CPU chip and back.
This can save a great deal of power for very little added complexity of
the RAM chip itself.
As a proof of concept, this computer architecture is implemented inside
an FPGA, mapping the FPGA block RAM to a 1024-bit square array, with
1024 bit-serial processors, one for each row.
Each processor consists of a single-bit full adder, a little more logic,
and a 6-bit stack.
All processors are controlled, SIMD fashion, by a sequencer.
Variable-bit-width math functions (add, subtract, multiply and divide)
are implemented. As support operations there are also 8-, 16- and 32-bit
transposes, and conversions between floating point and fixed point.
All these operations are mapped onto the bit-serial processor. Thus all
1024 rows are processed at the same time.
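
As a minimal sketch of what bit-serial SIMD addition could look like, the Python simulation below steps through the bit positions once and applies a single-bit full adder to every row at each step. The function names, the least-significant-bit-first layout and the 12-bit operand width are assumptions made for illustration; they are not taken from the FPGA design.

```python
def full_adder(a, b, carry_in):
    """Single-bit full adder: returns (sum_bit, carry_out)."""
    total = a + b + carry_in
    return total & 1, total >> 1


def to_bits(value, width):
    """Bit list of the given width, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def simd_add(rows_a, rows_b, width):
    """Add operand pairs row by row, one bit position per sequencer step."""
    n_rows = len(rows_a)
    carry = [0] * n_rows                       # one carry bit per row processor
    result = [[0] * width for _ in range(n_rows)]
    for bit in range(width):                   # the sequencer walks the bit positions
        for row in range(n_rows):              # in hardware, all rows act in the same cycle
            result[row][bit], carry[row] = full_adder(
                rows_a[row][bit], rows_b[row][bit], carry[row])
    return result


WIDTH = 12  # arbitrary operand width, chosen for illustration
a_values = [5, 100, 1023, 2047]
b_values = [7, 28, 1, 2048]
sums = simd_add([to_bits(v, WIDTH) for v in a_values],
                [to_bits(v, WIDTH) for v in b_values], WIDTH)
print([from_bits(s) for s in sums])
```

The inner loop over rows exists only because this is a software simulation; the number of sequencer steps depends on the operand width, not on the number of rows.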
To demonstrate how it might be used, two algorithms are implemented: a
prime number finder, simple enough for the audience to follow the
workings of the bit-serial engine, and a single-precision floating-point
matrix multiply, to demonstrate the architecture's utility.
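
The abstract does not say which prime-finding algorithm is used. Purely as a hypothetical illustration of the SIMD style, the sketch below assigns one candidate number to each row and has a sequencer loop broadcast one trial divisor at a time, with every row updating its own flag in the same step. The function name `simd_primes` and the trial-division approach are assumptions, not the talk's method.

```python
def simd_primes(n_rows):
    """Return the primes below n_rows, using one candidate per row."""
    candidates = list(range(n_rows))             # row r holds the candidate r
    is_prime = [n >= 2 for n in candidates]      # one flag bit per row
    d = 2
    while d * d < n_rows:                        # sequencer: one divisor per step
        # Every row tests the same broadcast divisor d against its own candidate.
        is_prime = [flag and (n == d or n % d != 0)
                    for n, flag in zip(candidates, is_prime)]
        d += 1
    return [n for n, flag in zip(candidates, is_prime) if flag]


print(simd_primes(30))
```

With 1024 rows the sequencer loop needs only 30 divisor steps (2 through 31), each applied to all rows at once.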
Were this to be realised within a 4 Gbit RAM chip, there would be
space for a million processors, each with 4k bits of storage -
easily sufficient for the matrix multiply algorithm used in the FPGA
demonstrator.
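
The arithmetic behind these figures can be checked in a few lines, assuming binary prefixes (4 Gbit = 4 x 2^30 bits, 4k bits = 4096 bits) and 32-bit single-precision values. The variable names are illustrative only.

```python
CHIP_BITS = 4 * 2**30            # 4 Gbit, assuming binary prefixes
BITS_PER_PROCESSOR = 4 * 2**10   # 4k bits of storage per row processor
FLOAT_BITS = 32                  # single-precision floating point

processors = CHIP_BITS // BITS_PER_PROCESSOR
floats_per_processor = BITS_PER_PROCESSOR // FLOAT_BITS

print(processors)            # 1048576, about a million
print(floats_per_processor)  # 128
```

Under these assumptions each processor could hold up to 128 single-precision values, before allowing for any working storage.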
The FPGA demonstrator is intended for algorithm research, as very few
present-day problems have solutions targeting millions of SIMD processors.
| Institute | Cambridge University |
|---|---|
| Presenting Author | Andy Rabagliati |