## **Microelectronics for next generation of HEP instrumentation**

Farah Fahim, Engineering Retreat, 24 Feb 2021

### **Microelectronics Growth**

Traditionally based on Moore's law

Technology scales 2× every 18 months – sustained by transistor scaling
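As a rough illustration of what a fixed doubling period implies, the sketch below compounds the 2× per 18 months figure quoted above. The function name and the example time horizons are illustrative assumptions, not taken from the talk.

```python
def scaling_factor(years: float, doubling_period_months: float = 18.0) -> float:
    """Growth factor after `years` if capability doubles every `doubling_period_months`."""
    doublings = (years * 12.0) / doubling_period_months
    return 2.0 ** doublings


if __name__ == "__main__":
    # Example horizons are arbitrary; only the 18-month period comes from the slide.
    for years in (1.5, 3, 6, 15):
        print(f"{years:>4} years -> {scaling_factor(years):,.0f}x")
```

Changing `doubling_period_months` shows how sensitive the long-term projection is to the assumed period.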





# More than Moore?

- World is inherently analog (mandates an analog interface)
- System requires various functions (beyond just digital)
- Multi technology platforms



### Why do we need more than Moore?



Source: CEA-LETI



### **Microelectronics enabling next generation HEP instrumentation**

- Novel devices
  - Skipper CCD-in-CMOS
- Deep Cryogenic electronics
  - Quantum Communication & Computing
- Hybrid integration
  - Electronic Photonic Integration
  - 3D integration
- Hardware Software codesign to enable edge compute
  - On-chip machine learning





# Novel Devices

Farah Fahim | Microelectronics for next generation HEP instrumentation

## **Future CCD technologies**

- Fermilab has been pioneering Skipper CCD technologies: averaging multiple samples for ultra-low noise performance (~1000 averages for << 1 e- noise) (Juan Estrada)
- 1<sup>st</sup> step: enable parallel readout with low-cost multiple channels. Translate the PCB design into a Readout Integrated Circuit (ROIC) with lower noise (1/3 of the CCD noise) and ~8 mW power per channel.
- 4" × 4" board reduced to ~4 × 4 mm<sup>2</sup>
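The benefit of Skipper averaging can be sketched numerically. The example below assumes uncorrelated Gaussian read noise, for which averaging N non-destructive samples reduces the RMS noise by a factor of √N. The single-read noise value `SIGMA_1` is a hypothetical placeholder, not a figure from the talk.

```python
import math
import random
import statistics


def averaged_noise(single_sample_noise: float, n_samples: int) -> float:
    """Expected RMS noise after averaging n uncorrelated samples: sigma / sqrt(N)."""
    return single_sample_noise / math.sqrt(n_samples)


def simulate_averaged_noise(single_sample_noise: float, n_samples: int,
                            n_pixels: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate: spread of the per-pixel mean of n noisy reads."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, single_sample_noise) for _ in range(n_samples))
        for _ in range(n_pixels)
    ]
    return statistics.pstdev(means)


if __name__ == "__main__":
    SIGMA_1 = 3.0  # hypothetical single-read noise in electrons (assumption)
    for n in (1, 10, 100, 1000):
        print(f"N = {n:>4}: expected noise = {averaged_noise(SIGMA_1, n):.3f} e-")
    print(f"simulated, N = 100: {simulate_averaged_noise(SIGMA_1, 100):.3f} e-")
```

Real devices also have correlated (1/f) noise, so the √N improvement is an idealization.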

T. England, F. Alcalde Bessia, H. Sun, L. Stefana (SCD)

- Midna ASIC
- Cost reduction of 100×

# **Future CCD technologies**

- 2<sup>nd</sup> method: increase readout speed without increasing noise, using the SiSeRO approach (collaboration with MIT Lincoln Lab)
- 10× speed improvement
- Additional advantage: the readout is DC-coupled at low operating voltages, which removes the AC coupling capacitors, increases system integration, and reduces footprint







M. Sofo Haro, A. Birman (TJ)

# **Future CCD technologies**

- Ultimate approach: skipper-in-CMOS
- Utilize a commercial CMOS image sensor process for lower noise performance (collaboration with SLAC and Tower Semiconductor)
- Noise of a pinned photodiode: 0.7 e-
- With 10 averages we hope to achieve ~0.2 e- noise
- Allows a hybrid pixel sensor with fully parallel (per-pixel) readout, achieving the ultimate goal of 1 kfps readout over large areas (6 cm<sup>2</sup>), ~2.5 Mpixels per chip
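The numbers quoted above can be cross-checked with a few lines of arithmetic. This is a sketch under stated assumptions: uncorrelated noise (so averaging scales as 1/√N), and square pixels that tile the full 6 cm² area. The derived pixel pitch and pixel rate are implications of the slide's figures, not values stated in the talk.

```python
import math

PPD_NOISE_E = 0.7        # single-sample pinned-photodiode noise, electrons (from the slide)
N_AVERAGES = 10          # number of averaged samples (from the slide)
AREA_CM2 = 6.0           # sensor area per chip (from the slide)
N_PIXELS = 2.5e6         # pixels per chip (from the slide)
FRAME_RATE_FPS = 1_000   # target frame rate (from the slide)


def averaged_noise(sigma: float, n: int) -> float:
    """RMS noise after averaging n uncorrelated samples."""
    return sigma / math.sqrt(n)


def pixel_pitch_um(area_cm2: float, n_pixels: float) -> float:
    """Pitch of square pixels tiling the whole area (assumption), in micrometres."""
    area_um2 = area_cm2 * 1e8  # 1 cm^2 = 1e8 um^2
    return math.sqrt(area_um2 / n_pixels)


def pixel_rate_per_s(n_pixels: float, fps: float) -> float:
    """Pixel values produced per second at full-frame readout."""
    return n_pixels * fps


if __name__ == "__main__":
    print(f"noise after {N_AVERAGES} averages: {averaged_noise(PPD_NOISE_E, N_AVERAGES):.2f} e-")
    print(f"implied pixel pitch: {pixel_pitch_um(AREA_CM2, N_PIXELS):.1f} um")
    print(f"pixel rate: {pixel_rate_per_s(N_PIXELS, FRAME_RATE_FPS):.2e} pixels/s")
```

The averaged noise comes out near 0.22 e-, consistent with the ~0.2 e- target.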



# Deep Cryoelectronics for Quantum Computing


# Quantum communication: Superconducting Nanowire Single-Photon Detectors

SNSPD best performance (operating at 1–4 K):

- Time-correlated single-photon counting from the deep UV to the mid-infrared
- Extremely low dark counts and very high precision

Quantum internet: high-bandwidth communication

### 1<sup>st</sup> step: low-noise amplification in collaboration with Georgia Tech and JPL

| Parameter        | Goal by 2025                | SOA 2019                    |
|------------------|-----------------------------|-----------------------------|
| Efficiency       | >80% @ 10 µm                | 98% @ 1550 nm               |
| Dark Counts      | < 1e-6 cps/mm<sup>2</sup>   | < 1e-4 cps/mm<sup>2</sup>   |
| Energy Threshold | 12.5 meV (100 µm)           | 0.125 eV (10 µm)            |
| Timing Jitter    | < 1 ps                      | 2.7 ps                      |
| Active Area      | 100 cm<sup>2</sup>          | 0.92 mm<sup>2</sup>         |
| Max Count Rate   | 100 Gcps                    | 1.2 Gcps                    |
| Pixel Count      | 1.6e7 (4096x4096)           | 1024 (32x32)                |
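The energy thresholds in the table pair a photon energy with a wavelength. A minimal sketch of that conversion, using E = hc/λ, is shown below; the function name is illustrative, and the table's values appear to be rounded versions of what the formula gives.

```python
HC_EV_UM = 1.239841984  # Planck constant times speed of light, in eV*um


def photon_energy_ev(wavelength_um: float) -> float:
    """Photon energy in eV for a wavelength given in micrometres (E = h*c / lambda)."""
    return HC_EV_UM / wavelength_um


if __name__ == "__main__":
    # Wavelengths that appear in the table above.
    for wavelength_um in (1.55, 10.0, 100.0):
        energy_mev = photon_energy_ev(wavelength_um) * 1e3
        print(f"{wavelength_um:>6.2f} um -> {energy_mev:8.2f} meV")
```

Moving the threshold from 10 µm to 100 µm means detecting photons with one tenth of the energy.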






# Beyond the NISQ era: QC utilizing cryo-electronics

- Collaboration with industry (Microsoft – High speed ADC)
- High speed and high resolution are often conflicting goals
- Key transistor behavior, such as low-noise performance, improves at cryogenic temperatures
- Why a national lab? Cryogenic electronics for DUNE covered the full cycle from modelling to testing; the approach for radiation environments is very similar
- Modelling is key (collaboration with EPFL / TU Delft)

### Architecture Suitable for 72 Qubit Computer



### Scale by Integrating Control Electronics





## **Benefits of Cryogenics for Trapped-Ion QIP**

J. Chiaverini, MIT LL



- Greatly reduced electric-field noise
  - This noise is a limiting factor for the error in trapped-ion two-qubit gates (2QGs) in small traps
  - Measured to be much larger than Johnson noise (JN), hence "anomalous"
  - Source unknown
  - Empirically, 2 orders of magnitude lower at ~5K when compared to room temp.
  - This is true when technical noise is under control



**MIT-LL** measurements




J. Chiaverini, MIT LL



Pino et al., arXiv:2003.01293 (2020) [HQS]



JC et al., Quant Inf. Comp. 5, 419 (2005) [NIST]

- Controlled ion motion through variation of electrode potentials
  - Each electrode segment requires a dedicated voltage for ion array control
  - For large arrays, electrical interconnects will become a limit
- Multiplexing can reduce wiring overhead, but at the cost of speed
- On-chip analog voltage generation can directly address this issue
  - Further level of on-chip control: a microprocessor to implement time-dependent voltage updates
    - Standard motion subroutines, calibration, etc.

# **Cryo-electronics control for Ion-Traps (QSC - ORNL)**

### Design challenges:

- Low output noise: < 100 nV/√Hz over a wide frequency range (0.5–5 MHz) and at low frequency.
- Low power: < 5 mW/DAC (limited by the cooling power of the cryostat) while driving a wide range of load capacitance (70–1800 pF) at ±10 V full scale and a 10 MHz waveform update rate.
- High resolution: 14–16 bit for precise control without disturbing the RF electrodes.
- Memory: 100 electrodes × 14 bit × 5000 points ~ 2.5 MB





# Hybrid Integration


# **Atomic Clock: Joint DOD – DOE project**


Portable optical atomic clock with frequency instability of 10<sup>-16</sup> over 10,000 sec

DOD – Atomic Photonic Integration; DOE – Electronic Photonic Integration

# Create a determined loop compact system



H. Sun, S. Li

# Hardware-Software codesign: AI


# Why do we need data processing on the edge?

### POWER: CV<sup>2</sup>f × (data volume) problem

- Total power consumption to move data from pixel to periphery: 1 pJ/bit (~ 5mm distance)
- Total power consumption to move data off-chip: > 0.1 nJ/bit
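To make the per-bit figures above concrete, the sketch below converts them into link power for a given data rate and evaluates the CV²f switching-power expression from the slide heading. The 10 Gb/s data rate and the capacitance, voltage, and frequency values are hypothetical placeholders, not numbers from the talk; the off-chip figure is a lower bound, so the off-chip result is a lower bound as well.

```python
ON_CHIP_J_PER_BIT = 1e-12     # pixel to periphery, ~5 mm (from the slide)
OFF_CHIP_J_PER_BIT = 0.1e-9   # lower bound for moving data off-chip (from the slide)


def link_power_w(bit_rate_bps: float, energy_per_bit_j: float) -> float:
    """Average power needed to move data at a given rate and energy per bit."""
    return bit_rate_bps * energy_per_bit_j


def dynamic_power_w(c_farads: float, v_volts: float, f_hz: float) -> float:
    """Switching power in the form used on the slide: P = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hz


if __name__ == "__main__":
    BIT_RATE = 10e9  # hypothetical 10 Gb/s data stream (assumption)
    print(f"on-chip : {link_power_w(BIT_RATE, ON_CHIP_J_PER_BIT) * 1e3:.1f} mW")
    print(f"off-chip: at least {link_power_w(BIT_RATE, OFF_CHIP_J_PER_BIT) * 1e3:.1f} mW")

    # Halving the signaling voltage cuts switching power by 4x (hypothetical C and f).
    full = dynamic_power_w(1e-12, 1.0, 1e9)
    half = dynamic_power_w(1e-12, 0.5, 1e9)
    print(f"power ratio at half voltage: {half / full:.2f}")
```

The quadratic dependence on V is why low-voltage signaling and low-capacitance interconnect are the two levers for reducing this cost.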

### Minimize C,V

- 3D Integration (high density, low capacitance interconnect)
- Low voltage signaling

### **Reduce data**

Typically just zero-suppression for sparse on-detector data

### HL-LHC:

Higher granularity, higher occupancy, higher precision

=> needs NEW APPROACH

| Operation           | Energy (pJ) | Area (µm²) |
|---------------------|-------------|------------|
| 8b Add              | 0.03        | 36         |
| 16b Add             | 0.05        | 67         |
| 32b Add             | 0.1         | 137        |
| 16b FP Add          | 0.4         | 1360       |
| 32b FP Add          | 0.9         | 4184       |
| 8b Mult             | 0.2         | 282        |
| 32b Mult            | 3.1         | 3495       |
| 16b FP Mult         | 1.1         | 1640       |
| 32b FP Mult         | 3.7         | 7700       |
| 32b SRAM Read (8KB) | 5           | N/A        |
| 32b DRAM Read       | 640         | N/A        |

(Relative energy and area costs, normalized to the 8b add, were plotted as bars on log axes.)

Memory access is orders of magnitude higher energy than compute.

Source: Vivienne Sze (@eems_mit) [Horowitz, ISSCC 2014]

Power Dominated by Data Movement
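The relative energy costs that the table's lost bar chart conveyed can be recomputed from the energy column. The sketch below normalizes each operation to the 8b add; the energy values are copied from the table, while the helper name is illustrative.

```python
# Energy per operation in pJ, copied from the table above [Horowitz, ISSCC 2014].
ENERGY_PJ = {
    "8b Add": 0.03,
    "16b Add": 0.05,
    "32b Add": 0.1,
    "16b FP Add": 0.4,
    "32b FP Add": 0.9,
    "8b Mult": 0.2,
    "32b Mult": 3.1,
    "16b FP Mult": 1.1,
    "32b FP Mult": 3.7,
    "32b SRAM Read (8KB)": 5.0,
    "32b DRAM Read": 640.0,
}


def relative_cost(table: dict, baseline: str = "8b Add") -> dict:
    """Energy of each operation expressed as a multiple of the baseline operation."""
    reference = table[baseline]
    return {operation: energy / reference for operation, energy in table.items()}


if __name__ == "__main__":
    ranked = sorted(relative_cost(ENERGY_PJ).items(), key=lambda item: item[1])
    for operation, ratio in ranked:
        print(f"{operation:<22} {ratio:>10.1f}x")
```

A single 32b DRAM read costs roughly as much energy as twenty thousand 8b additions, which is the quantitative basis for processing data close to where it is produced.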

# **Deep Neural Network: Autoencoder for data-compression**



- Enable edge compute: data compression for efficient usage of power and bandwidth
- Programmable and reconfigurable: ability to reprogram weights to adjust for detector conditions, eventually leading to self-learning intelligent detectors
- Hardware-software codesign: algorithm-driven architectural approach
- Optimized: low power and low latency
- Operating in an extreme radiation environment: 200 Mrad
- An autoencoder for data compression is the first use case towards DNN-based on-chip learning and inference<sup>1</sup>

## **Tool-kit development and Operation in rad-hard environment**

- Integration of HLS-generated and expert-written RTL
- Design-code-agnostic approach for implementing various triplication methods
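The talk does not say which triplication methods are used. As one common example, triple modular redundancy (TMR) stores three copies of each register and takes a bitwise majority vote, so a single radiation-induced bit flip in any one copy is masked. The sketch below models that voter in Python for illustration only; a real implementation would be in RTL, and the function names are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit follows at least two of the inputs."""
    return (a & b) | (b & c) | (a & c)


def flip_bit(word: int, bit: int) -> int:
    """Model a single-event upset by inverting one bit of a stored word."""
    return word ^ (1 << bit)


if __name__ == "__main__":
    stored = 0b1011_0010                       # arbitrary example register value
    copies = [stored, stored, stored]
    copies[1] = flip_bit(copies[1], 3)         # upset in one of the three copies
    recovered = majority_vote(*copies)
    print(f"corrupted copy: {copies[1]:#010b}")
    print(f"voted output  : {recovered:#010b}")
    print("recovered correctly:", recovered == stored)
```

The vote fails if the same bit is upset in two copies before the error is corrected, which is why triplication schemes differ in how often and where they vote.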



# HL-LHC High Granularity Calorimeter\*: Data flow

### **CNN:** Encodes information by correlating spatial features

- **conv2D layer** extracts spatially correlated geometric features
- **Flatten layer** vectorizes the 2D image from the conv2D layer [8 × 4 × 4 = 128 × 1]
- **Dense layer** aggregates the various features to provide higher-order information
- **ReLU** is an activation function that introduces non-linearity by applying thresholds (part of both the conv2D and dense layers)

U. Columbia: G. Di Guglielmo

NU: M.B. Valentin, S. Memik
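The data flow through the flatten, dense, and ReLU stages described above can be sketched in plain Python. This is an illustration only: the conv2D stage is replaced by random stand-in feature maps with the 8 × 4 × 4 shape quoted on the slide, the weights are random, and the 16-element output size (`LATENT`) is a hypothetical choice, not a parameter from the talk.

```python
import random


def flatten(feature_maps):
    """Flatten [channels][rows][cols] into a single vector."""
    return [value for channel in feature_maps for row in channel for value in row]


def relu(vector):
    """Apply the ReLU threshold element-wise: negative values become zero."""
    return [max(0.0, value) for value in vector]


def dense(vector, weights, biases):
    """Fully connected layer: one weighted sum of all inputs per output."""
    return [
        sum(w * x for w, x in zip(weight_row, vector)) + bias
        for weight_row, bias in zip(weights, biases)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    CHANNELS, ROWS, COLS = 8, 4, 4   # conv2D output shape quoted on the slide
    LATENT = 16                      # hypothetical encoder output size (assumption)

    # Stand-in for the conv2D output; a real encoder would compute this from sensor data.
    conv_out = [[[rng.uniform(-1, 1) for _ in range(COLS)] for _ in range(ROWS)]
                for _ in range(CHANNELS)]
    conv_out = [[relu(row) for row in channel] for channel in conv_out]

    flat = flatten(conv_out)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(len(flat))] for _ in range(LATENT)]
    biases = [0.0] * LATENT
    encoded = relu(dense(flat, weights, biases))
    print(len(flat), len(encoded))  # prints "128 16"
```

The compression comes from the dense layer mapping 128 inputs to a smaller number of outputs; reprogramming `weights` is what would adapt the encoder to changing detector conditions.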



SCD: N. Tran, C. Herwig, J. Hirschauer

# Towards heterogeneous system-on-chip

DOMAIN SPECIFIC COMPILER

Resource Tuning



**OPTIMIZATION REQUIREMENTS** 

- Analog Mixed-Signal Kernels
- Neuromorphic computing (event driven)

OPTO-ELECTRONIC COMPUTE

- In-memory compute (**non-von Neumann approaches**), new materials
- Electronic-Photonic conversion
- Hybrid integration



# **Vector Matrix Multiplication for Neural Networks**

### Vector-by-Matrix Multiplication ...
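In an idealized resistive crossbar, input voltages are applied to the rows and each column wire sums the device currents, so the column currents form a vector-by-matrix product: I_j = Σ_i V_i · G_ij (Ohm's law per device, Kirchhoff's current law per column). The sketch below models only that ideal behavior; it ignores wire resistance, sneak paths, and device non-linearity, and the voltage and conductance ranges are hypothetical.

```python
import random


def crossbar_vmm(voltages, conductances):
    """Column currents of an ideal crossbar: I[j] = sum over rows i of V[i] * G[i][j]."""
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    SIZE = 64  # matches the 64 x 64 array size mentioned in the talk

    # Hypothetical values: input voltages in volts, conductances in siemens.
    voltages = [rng.uniform(0.0, 0.2) for _ in range(SIZE)]
    conductances = [[rng.uniform(1e-6, 1e-4) for _ in range(SIZE)] for _ in range(SIZE)]

    currents = crossbar_vmm(voltages, conductances)
    print(f"{len(currents)} column currents, first = {currents[0]:.3e} A")
```

All 64 × 64 multiply-accumulate operations happen in parallel in the analog domain, with the weights stored in place as conductances, which avoids the memory-access energy that dominates digital implementations.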



### **UC Santa Barbara's Metal-Oxide Memristors**

### 64 × 64 passive crossbar circuit



#### H. Kim et al. arXiv 2019

Background work: M. Prezioso et al., Nature 521, 61 (2015); M. Prezioso et al., IEDM'15, p. 17.4.1 (2015); F. Merrikh Bayat et al., Nature Comm. (2018)

Typical I-V characteristics



### Details:

- Al<sub>2</sub>O<sub>3</sub>/TiO<sub>2-x</sub> active bilayer by reactive sputtering
- CMOS-compatible CMP/dry etching process and TiN/Al electrodes for higher conductance
- ~250 nm wide lines
- The largest functional analog-grade passive memristor crossbar circuit supported by proper statistics



**D. Strukov, UCSB** 


# **In-Pixel AI**



Analog – Mixed Signal implementation using floating gates or memristive cross-bar arrays



# Thank you

Farah Fahim | Microelectronics for next generation HEP instrumentation