



### FPGA-Based Architectures & DAQ Pathways for Distributed ML Systems

- M.A. Ibrahim, J.R. Berlioz
- EPICS Collaboration Meeting, April 2023
- April 26-28, 2023

In partnership with:





# **Overview of AI/ML Activities/Projects**



https://indico.cern.ch/event/1133593/



- Machine learning for Linac RF optimization (longitudinal optimization)
- Booster Gradient Magnet Power Supply Control
- "Big Data" Booster Control
- Orbit Alignment at PIP2IT Using Bayesian Optimization
- AI/ML for NuMI Target System Monitoring
- Real-time quench detection
- FAST/IOTA RF gun stabilization and optimization
- Loss minimization vs MI or RR situation
- Stabilization of 8 GeV slow extraction from Muon-C ring
- 6D Cooling optics design with ML elements

Fermilab

# **Beam Loss Monitors (BLMs)**









### **Real-Time Edge AI for Distributed System (READS) Network**






# **READS Distributed VME Reader Nodes**




# **READS Central Deblending Node**




# SoC (System on Chip)

An SoC combines the following on a single chip:

- A dual-core ARM Cortex-A9 processor
- Field-Programmable Gate Array (FPGA) fabric



## SoM (System on Module)











### Data Transmission Considerations





#### Through ARM:

- Rapid prototyping
- More flexibility
- C, C++, Rust, etc.
- TCP/IP Stack readily available
- Longer latency
- Images can be loaded over the network via SSH
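Because the ARM side already has a TCP/IP stack, shipping a block of readings off the board can be prototyped in a few lines. The following is a minimal sketch, not the READS implementation: the frame layout (a sequence number and a sample count followed by 32-bit signed samples) and the helper names `pack_frame`, `recv_exact` and `recv_frame` are hypothetical and chosen only for illustration.

```python
import socket
import struct

# Hypothetical frame header: sequence number and sample count (little-endian uint32 each).
HEADER = struct.Struct("<II")


def pack_frame(seq, samples):
    """Serialize a sequence number and a list of 32-bit signed samples."""
    return HEADER.pack(seq, len(samples)) + struct.pack(f"<{len(samples)}i", *samples)


def recv_exact(sock, n):
    """Read exactly n bytes; TCP may deliver a frame in several pieces."""
    data = bytearray()
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed before the frame was complete")
        data += chunk
    return bytes(data)


def recv_frame(sock):
    """Receive one frame and return (sequence number, list of samples)."""
    seq, count = HEADER.unpack(recv_exact(sock, HEADER.size))
    samples = struct.unpack(f"<{count}i", recv_exact(sock, 4 * count))
    return seq, list(samples)
```

On the sending side, `sock.sendall(pack_frame(seq, samples))` on a connected TCP socket is enough. The equivalent path in FPGA fabric would need the network stack to be implemented in hardware description language, which is the trade-off the two lists describe.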

#### Through FPGA:

- Needs place and route
- Less flexibility
- VHDL/Verilog
- No IP Stack
- Shorter latency (hardware response)
- Images can be loaded through JTAG





### **FPGA/HPS** Data Bridges

| hps                | Hard Processor System Intel Arria 10 FPGA IP |
|--------------------|----------------------------------------------|
| f2h_cold_reset_req | Reset Input                                  |
| f2h_warm_reset_req | Reset Input                                  |
| emif               | Conduit                                      |
| hps_io             | Conduit                                      |
| h2f_reset          | Reset Output                                 |
| h2f_axi_clock      | Clock Input                                  |
| h2f_axi_reset      | Reset Input                                  |
| h2f_axi_master     | AXI Master                                   |
| h2f_lw_axi_clock   | Clock Input                                  |
| h2f_lw_axi_reset   | Reset Input                                  |
| h2f_lw_axi_master  | AXI Master                                   |
| f2h_axi_clock      | Clock Input                                  |
| f2h_axi_reset      | Reset Input                                  |
| f2h_axi_slave      | AXI Slave                                    |
| f2h_irq0           | Interrupt Receiver                           |
| f2h_irq1           | Interrupt Receiver                           |



#### Table 2-2: Common Address Space Regions

| Region Name             | Base Address | Size   |
|-------------------------|--------------|--------|
| FPGA slaves             | 0xC0000000   | 960 MB |
| Peripheral              | 0xFC000000   | 64 MB  |
| Lightweight FPGA slaves | 0xFF200000   | 2 MB   |

Source: Cyclone V Technical Manual
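As a rough illustration of how an HPS program running Linux might use these regions, the sketch below records the address map and maps a window of one region through `/dev/mem`. It assumes the FPGA slaves base is 0xC0000000 (the value for which the 960 MB window ends exactly at the peripheral base 0xFC000000), that the program runs with sufficient privileges, and that `/dev/mem` access is enabled on the target. The names `REGIONS`, `region_for` and `map_window` are hypothetical.

```python
import mmap
import os

MB = 1024 * 1024

# (base address, size in bytes) per region; the FPGA slaves base is an assumption, see above.
REGIONS = {
    "FPGA slaves": (0xC0000000, 960 * MB),
    "Peripheral": (0xFC000000, 64 * MB),
    "Lightweight FPGA slaves": (0xFF200000, 2 * MB),
}


def region_for(address):
    """Return the most specific (smallest) region containing address, or None."""
    matches = [
        (size, name)
        for name, (base, size) in REGIONS.items()
        if base <= address < base + size
    ]
    return min(matches)[1] if matches else None


def map_window(region, length=mmap.PAGESIZE, device="/dev/mem"):
    """Map `length` bytes from the start of a region (needs root on the target)."""
    base, size = REGIONS[region]
    if length > size:
        raise ValueError("requested window is larger than the region")
    fd = os.open(device, os.O_RDWR | os.O_SYNC)
    try:
        return mmap.mmap(fd, length, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE, offset=base)
    finally:
        os.close(fd)  # mmap keeps its own duplicate of the descriptor
```

The lightweight bridge window lies inside the 64 MB peripheral region, which is why `region_for` prefers the smallest matching region rather than the first one found.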

**Exercise**: Read 4 KB (1024 32-bit integers) of data through the h2f AXI bridge and calculate the time it takes to read the data.

An HPS program can access the HPS on-chip memory using memcpy. 10,000 readouts were taken.

- **MIN**: 85.02 µs
- **MEDIAN**: 85.061 µs (~45.8 MB/s)
- **MAX**: 220.772 µs

For comparison, an FPGA with a 100 MHz clock has a 10 ns period; writing one integer per clock cycle would take 10 ns × 1024 = 10.24 µs.
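The measurement described above can be sketched as follows. This is an assumption-laden illustration rather than the code used for the reported numbers: a plain `bytearray` stands in for the memory-mapped bridge window, a Python buffer copy stands in for `memcpy`, and the function names are hypothetical. Timings taken this way on a desktop will not resemble the figures quoted for the HPS.

```python
import statistics
import time


def time_reads(src, nbytes=4096, repeats=10_000):
    """Copy nbytes out of src `repeats` times; return per-read times in microseconds."""
    view = memoryview(src)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        _ = bytes(view[:nbytes])  # forces a real copy of the block
        samples.append((time.perf_counter_ns() - start) / 1000.0)
    return samples


def summarize(samples_us):
    """Min, median and max of a list of timings."""
    return {
        "min": min(samples_us),
        "median": statistics.median(samples_us),
        "max": max(samples_us),
    }


def ideal_fpga_time_us(n_words, clock_hz):
    """Time for the fabric to move one word per clock cycle, in microseconds."""
    return n_words / clock_hz * 1e6


if __name__ == "__main__":
    buf = bytearray(4096)  # stand-in for an mmap of the bridge window
    print(summarize(time_reads(buf)))
    print(ideal_fpga_time_us(1024, 100e6))
```

Reporting the minimum, median and maximum rather than a mean keeps occasional scheduler-induced outliers, such as the maximum quoted above, from distorting the typical figure.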



## **FPGA Notifications (through PIOs)**


#### Register Map for the PIO Core

| Offset | Register Name       | R/W | Description                                                                                                            |
|--------|---------------------|-----|------------------------------------------------------------------------------------------------------------------------|
| 0      | data (read access)  | R   | Data value currently on PIO inputs                                                                                     |
| 0      | data (write access) | W   | New value to drive on PIO outputs                                                                                      |
| 1      | direction           | R/W | Individual direction control for each I/O port. A value of 0 sets the direction to input; 1 sets the direction to output. |
| 2      | interruptmask       | R/W | IRQ enable/disable for each input port. Setting a bit to 1 enables interrupts for the corresponding port.              |
| 3      | edgecapture         | R/W | Edge detection for each input port.                                                                                    |
| 4      | outset              | W   | Specifies which bit of the output port to set. Not a physical register.                                                |
| 5      | outclear            | W   | Specifies which output bit to clear. Not a physical register.                                                          |
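A polling loop over the PIO registers can be sketched as below. The sketch assumes 32-bit registers at byte address offset × 4, a buffer-like object (for example a memory map of the lightweight bridge window, or a `bytearray` for testing) and a core configured so that writing a 1 to an edge capture bit clears it. The class `Pio` and the function `poll_edge` are hypothetical names, and a production driver would also need to guarantee true single 32-bit bus accesses, which `struct` on a memory map does not promise.

```python
import struct
import time

# Word offsets from the PIO register map; byte address = offset * 4 (assumed 32-bit registers).
DATA, DIRECTION, INTERRUPT_MASK, EDGE_CAPTURE, OUTSET, OUTCLEAR = range(6)


class Pio:
    """Thin register accessor over any writable buffer (mmap, bytearray, ...)."""

    def __init__(self, mem, base=0):
        self.mem = mem
        self.base = base

    def read(self, reg):
        return struct.unpack_from("<I", self.mem, self.base + 4 * reg)[0]

    def write(self, reg, value):
        struct.pack_into("<I", self.mem, self.base + 4 * reg, value & 0xFFFFFFFF)


def poll_edge(pio, bit_mask, timeout_s=1.0, interval_s=0.0):
    """Poll edgecapture until a bit in bit_mask is set; return those bits, or 0 on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        captured = pio.read(EDGE_CAPTURE) & bit_mask
        if captured:
            # Assumed write-1-to-clear behaviour; depends on how the core was generated.
            pio.write(EDGE_CAPTURE, captured)
            return captured
        if time.monotonic() >= deadline:
            return 0
        time.sleep(interval_s)
```

Polling keeps the software simple at the cost of CPU time; the interruptmask register offers the alternative of letting the fabric raise an interrupt instead.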





Polling API



### **READS Communications**



### To EPICS/ACNET



The State Machine in the central node manages the incoming DDCP stream and presents it to the HLS model.

The output of the hls4ml model is presented, together with the original inputs and a timestamp, to an EPICS API that loads it into ACNET.
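The flow just described can be pictured with a small software model. Everything in it is an assumption made for illustration: the state names, the idea that one frame per reader node completes an input set, the record layout, and the `model` and `publish` callables that stand in for the hls4ml model and the EPICS-facing API. The real central node implements this logic in firmware.

```python
import time
from enum import Enum, auto


class State(Enum):
    WAIT_FRAMES = auto()
    RUN_MODEL = auto()
    PUBLISH = auto()


class CentralNode:
    """Collects one frame per reader node, runs the model, publishes a timestamped record."""

    def __init__(self, node_ids, model, publish, clock=time.time):
        self.node_ids = set(node_ids)
        self.model = model      # stand-in for the hls4ml model
        self.publish = publish  # stand-in for the EPICS-facing API
        self.clock = clock
        self.state = State.WAIT_FRAMES
        self.frames = {}

    def on_frame(self, node_id, readings):
        """Feed one frame; return the published record once every node has reported."""
        if node_id not in self.node_ids:
            raise ValueError(f"unknown node {node_id!r}")
        self.frames[node_id] = list(readings)
        if set(self.frames) != self.node_ids:
            return None  # still waiting for other nodes

        self.state = State.RUN_MODEL
        inputs = [v for nid in sorted(self.frames) for v in self.frames[nid]]
        outputs = self.model(inputs)

        self.state = State.PUBLISH
        record = {"timestamp": self.clock(), "inputs": inputs, "outputs": outputs}
        self.publish(record)

        self.frames.clear()
        self.state = State.WAIT_FRAMES
        return record
```

Injecting `model`, `publish` and `clock` keeps the sketch independent of any particular inference or EPICS library and makes the sequencing easy to exercise in isolation.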



### **READS** Publications

- Accelerator Real-time Edge AI for Distributed Systems (READS) Proposal (March 2020) https://arxiv.org/abs/2103.03928
- Real-Time Edge AI for Distributed Systems (READS): Progress on Beam Loss De-Blending for the Fermilab Main Injector and Recycler (August 2021) https://jacow.org/ipac2021/papers/mopab288.pdf
- Optimizing Mu2e Spill Regulation System Algorithms (August 2021) https://jacow.org/ipac2021/papers/THPAB243.pdf
- Synchronous High-Frequency Distributed Readout for Edge Processing at the Fermilab Main Injector and Recycler (August 2022) https://napac2022.vrws.de/papers/mopa15.pdf
- Semantic Regression for Disentangling Beam Losses in the Fermilab Main Injector and Recycler (August 2022) https://napac2022.vrws.de/papers/mopa28.pdf
- Machine Learning for Slow Spill Regulation in the Fermilab Delivery Ring for Mu2e (August 2022) https://napac2022.vrws.de/papers/mopa75.pdf



