The size and complexity of recent deep learning models continue to increase exponentially, causing a serious amount of hardware overheads for training those models. Contrary to inference-only hardware, neural network training is very sensitive to computation errors; hence, training processors must support high-precision computation to avoid a large performance drop, severely limiting their processing efficiency. This talk will introduce a comprehensive design approach to arrive at an optimal training processor design. More specifically, the talk will discuss how we should make important design decisions for training processors in more depth, including i) hardware-friendly training algorithms, ii) optimal data formats, and iii) processor architecture for high precision and utilization.
|