

## Nhan Tran

+ Javier Duarte, Lindsey Gray, Sergo Jindariani, Kevin Pedro, Bill Pellico, Gabe Perdue, Ryan Rivera, Brian Schupbach, Kiyomi Seiya, Jason St. John, Mike Wang,...

May 10, 2019

















mostly for computing group https://arxiv.org/abs/1904.08986





Figure: latency scale, with 1 ms and 1 s marked.

Above ~1 ms (network switching latencies), you are in the domain of CPUs and GPUs, and you're better off going to industry tools.

But...

- no time for CPU
- heavy calculation
- high throughput

Custom real-time detector AI applications are for you!



## HIGH RATE AND INTELLIGENT EDGE

## Traditionally, FPGAs are programmed with low-level languages like Verilog and VHDL

## **High level synthesis (HLS)**

New languages: C-level programming with specialized preprocessor directives, which is synthesized into optimized firmware. Drastically reduces firmware development time.



## NEURAL NETWORKS AND LINEAR ALGEBRA



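$$\overrightarrow{O}_{j} = \Phi\left(\overrightarrow{I}_{i} \times W_{ij} + b_{j}\right)$$

Figure: network diagram with $M$ hidden layers and an output layer; hidden layer $m$ has $N_m$ nodes, up to $N_M$ for layer $M$.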
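The layer equation above can be written out as a few lines of plain Python. This is only an illustrative sketch: the function name `dense_layer`, the choice of ReLU for $\Phi$, and the example numbers are assumptions made here, not taken from the slides.

```python
def relu(x):
    """Example activation Phi; the slides do not specify which one is used."""
    return max(0.0, x)


def dense_layer(inputs, weights, biases, activation=relu):
    """Compute O_j = Phi(sum_i I_i * W_ij + b_j) for one layer.

    inputs  : list of I_i (length n_in)
    weights : nested list W[i][j] (n_in rows, n_out columns)
    biases  : list of b_j (length n_out)
    """
    outputs = []
    for j in range(len(biases)):
        acc = biases[j]
        for i, x in enumerate(inputs):
            acc += x * weights[i][j]  # one multiply-accumulate per weight
        outputs.append(activation(acc))
    return outputs


# Hypothetical 2-input, 2-output layer.
inputs = [1.0, 2.0]
weights = [[0.5, -1.0],
           [0.25, 0.5]]
biases = [0.0, -0.5]
outputs = dense_layer(inputs, weights, biases)
print(outputs)
```

Each weight costs one multiply-accumulate, which is why the number of multipliers (and how they are shared) dominates the FPGA resource budget for this kind of network.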

## **PROJECT OVERVIEW**

## Quantization, Compression, Parallelization made easy with hls4ml!

Figure: hls4ml workflow (Keras / TensorFlow / PyTorch model, HLS conversion, co-processing kernel).



## **Results and outlook:**

- 4000 parameter network inferred in < 100 ns with 30% of FPGA resources!
- Muon $p_T$ reconstruction with NN reduces rate by 80%
- Larger networks and different architectures actively developed (CNN, RNN, Graph)

Figure labels: HLS project; tune configuration (precision, reuse/pipeline).
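The two tuning knobs named here, precision and reuse, can be illustrated with a small stand-alone sketch. It does not call hls4ml; the helper names `quantize_fixed` and `multipliers_needed` are hypothetical. The rounding and saturation behaviour is an assumption chosen for clarity and may differ from what the HLS fixed-point types do by default. The reuse model (each multiplier is used `reuse` times per inference) is likewise a simplified assumption.

```python
import math


def quantize_fixed(value, total_bits, int_bits):
    """Map a float to a signed fixed-point value.

    total_bits : total word width, including the sign bit
    int_bits   : bits left of the binary point, including the sign bit
    Assumption: round to nearest (Python's round) and saturate at the range limits.
    """
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    code = round(value * scale)
    code = max(lowest, min(highest, code))  # saturate instead of wrapping
    return code / scale


def multipliers_needed(n_in, n_out, reuse):
    """Multipliers for a fully connected layer when each one is reused `reuse` times."""
    return math.ceil(n_in * n_out / reuse)


# Hypothetical numbers: an 8-bit word with 2 integer bits, and a 16x64 layer.
w_q = quantize_fixed(0.3, total_bits=8, int_bits=2)
w_sat = quantize_fixed(5.0, total_bits=8, int_bits=2)
fully_parallel = multipliers_needed(16, 64, reuse=1)
shared = multipliers_needed(16, 64, reuse=4)
print(w_q, w_sat, fully_parallel, shared)
```

Lower precision shrinks each multiplier, and a larger reuse value trades latency for fewer multipliers, which is the trade-off the configuration step is meant to expose.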









# \_DRD:

Add "reinforcement learning" to improve accelerator operations: will be a first for accelerators and critical for future machines

Tuning the Gradient Magnet Power Supply (GMPS) system for the Booster complex: a first proof-of-concept, could apply across the accelerator



## FUTURISTIC IDEAS





## FLEXIBILITY











## Edge TPU




## FRANKENSTEINS



## Xilinx Versal

Figure: Xilinx Versal diagram (legible labels: Scalar, PCIe, CCIX).



## PHOTONICS



## SUMMARY

- Real-time AI brings processing power on-detector
- Recovers losses in efficiency/performance for triggers, gaining back physics
- Other physics scenarios? A lot of efficiency is lost in high-bandwidth systems...
- Want to demonstrate that it helps with automation and efficiency of system operation
- Futuristic technologies could bring even more front-end processing power: hardened vector DSPs, electronics and photonics

