
New analytical demands require a paradigm change in computing

Even as enterprises embrace algorithmic automation and data-driven decision support, the business of data analysis is continually evolving to address changing and unanticipated analytic requirements.

To serve these massive and changing computational needs, the computing platforms supporting these data-driven algorithms must be capable of dynamically adjusting their processing capabilities.

Most computing platforms have evolved significantly in cost, form-factor, frequency and method of deployment, yet remain unchanged in Programmability and Time-Scalability.



Programmability is the characteristic that enables engineers to develop applications for a diverse array of computing needs across many industries using the same platform.



Time-Scalability is the ability of a platform to efficiently execute software written years after the platform itself was developed. It allows today’s smartphone to run apps and launch software ecosystems that were not even imagined when smartphone chips first appeared.

Houston, we have a problem!


Present and future machine learning algorithms demand new paradigms in compute capabilities.


Existing CPUs and GPUs are programmable but do not provide the computational efficiency required for AI time-scalability.


Solutions focusing on computational efficiency for AI algorithms lack programmability.


At the rapid pace at which AI algorithms continually evolve, these solutions are virtually obsolete on arrival.


The new AI compute solutions coming to the market offer neither programmability NOR time-scalability.


Composable Computing:
the paradigm shift



SimpleMachines delivers a powerful breakthrough platform that marries computational efficiency with programmability and time-scalability.

The hardware paradigm shift lies in the design of Composable Behavior Execution, a clean-slate design that breaks away from 30 years of incremental innovation. Traditional CPUs execute one "instruction", or line of code, at a time; the chip has no knowledge of the data or of that instruction's role in the program as a whole.

The SimpleMachines chip instead directly understands and manipulates program properties: data size and shape, and whole-program size and shape.

With this global information, our software stack transforms the chip's storage and execution mechanisms on the fly to match the application's data and computation patterns, achieving the same effect as having a custom chip built for that application.
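To make the idea of using whole-program information concrete, here is a minimal toy sketch in Python. It is not SimpleMachines' software stack or API: the `Op` record, the `summarize` and `choose_config` functions, the buffer size, and the "resident"/"tiled" policy are all hypothetical names and assumptions chosen for illustration. It only shows how knowing data shapes for an entire graph up front lets software pick a storage configuration before execution, something a single instruction cannot reveal.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Op:
    """Hypothetical graph-level operation with known tensor shapes."""
    name: str
    in_shape: tuple
    out_shape: tuple


def summarize(graph, bytes_per_elem=4):
    """Collect whole-program properties: op count and peak data footprint."""
    peak_elems = max(prod(op.in_shape) + prod(op.out_shape) for op in graph)
    return {"num_ops": len(graph), "peak_bytes": peak_elems * bytes_per_elem}


def choose_config(summary, buffer_bytes):
    """Assumed policy: keep data resident if it fits, otherwise split into tiles."""
    if summary["peak_bytes"] <= buffer_bytes:
        return {"mode": "resident", "tiles": 1}
    tiles = -(-summary["peak_bytes"] // buffer_bytes)  # ceiling division
    return {"mode": "tiled", "tiles": tiles}


# A two-op toy graph; shapes are illustrative only.
graph = [
    Op("matmul", in_shape=(64, 128), out_shape=(64, 32)),
    Op("relu", in_shape=(64, 32), out_shape=(64, 32)),
]
summary = summarize(graph)
config = choose_config(summary, buffer_bytes=16 * 1024)
print(summary, config)
```

With a larger assumed buffer the same graph would be kept resident, which is the point of the sketch: the configuration follows from the program's data shapes, not from any one operation.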

64 person-years of research
6 PhDs
7 best-paper
13 patents
20 invention

Composable Computing:

Our breakthrough technology allows our compiler to decompose any algorithm into four fundamental behaviors:

Our compiler can take any program at the TensorFlow/ONNX/PyTorch graph level and deconstruct it into these four behaviors. The four behaviors are implemented directly on the chip, creating an engine that runs as efficiently as a customized chip.

This allows us to create a platform that is completely under software control, while running at the efficiency of a fully customized chip designed for an application.

The software stack is directly integrated into standard frameworks like TensorFlow, PyTorch, and ONNX. It also allows for cloud deployment, multi-tenancy, and workload consolidation. A C/C++ backend is available for power users, and a Julia backend for optimization fans. Most importantly, plugging into additional frameworks is a straightforward software engineering project: no physical manufacturing revisions of the chip are required.




Flexibly implement, decode, execute, and writeback as coarse-grained blocks on chip through our composable behavior engine architecture. The program’s machine code lists functions in the order in which they occur, whether concurrently or independently one after another. For any application, the relative balance and interaction between these behaviors are controlled and orchestrated by our dynamic run-time engine.



Automate the design, implementation, and synthesis tasks on the fly, in real time, with our software stack, which leverages advances in machine learning (integer linear programming), compiler technology, and chip architecture. This is work that takes an ASIC designer months for a custom chip.



Deliver the power of custom chips for custom applications, without the time-to-market obsolescence and expensive production costs traditionally associated with ASICs.