Prof.
Norman Christ
(Columbia University)
23/07/2018, 14:00
Algorithms and Machines
The analysis of the Hybrid Monte Carlo (HMC) algorithm developed by Luscher and Schaefer is generalized to include Fourier acceleration. We show for the $\phi^4$ theory examined by Luscher and Schaefer that Fourier acceleration removes the non-renormalizable, singular behavior which they discovered and likely defines a renormalizable theory for the five-dimensional correlation functions in...
Mr
Yidi Zhao
(Columbia University)
23/07/2018, 14:20
Algorithms and Machines
In hybrid Monte Carlo evolution, by imposing a physical gauge condition, simple Fourier acceleration can be used to generate conjugate momenta and potentially reduce critical slowing down. This modified gauge evolution algorithm does not change the gauge-independent properties of the resulting gauge field configurations. We describe this algorithm and present results from our first...
Xiaoyong Jin
(ANL)
23/07/2018, 14:40
Algorithms and Machines
We present a modification of the hybrid Monte Carlo algorithm for
tackling the critical slowing down of generating Markov chains of
lattice gauge configurations towards the continuum limit. We propose
a new method to exchange information between an ensemble of Markov
chains, and use it to construct an approximate inverse Hessian
matrix of the action, inspired by quasi-Newton algorithms...
Dr
Alessandro Nada
(DESY Zeuthen)
23/07/2018, 15:00
Algorithms and Machines
The computation of hadronic correlation functions in lattice QCD is severely hindered by a signal-to-noise ratio that decreases exponentially with the distance between source and sink.
Recent developments for the factorization of both the fermion propagator and the fermion determinant pave the way for the implementation of multilevel Monte Carlo integration techniques, which are already known...
Dr
Tim Harris
(Milano Bicocca)
23/07/2018, 15:20
Algorithms and Machines
We combine multi-level integration with a variance-reduction technique for the stochastic
estimate of disconnected diagrams of various bilinear operators, and present preliminary
numerical results with O($a$)-improved Wilson fermions.
Jiqun Tu
(Columbia)
23/07/2018, 16:10
Algorithms and Machines
We show that using the multisplitting algorithm as a preconditioner for the conjugate gradient inversion of domain wall fermion Dirac operators effectively reduces the inter-node communication cost, at the expense of performing more on-node floating-point operations. Compared to Schwarz domain decomposition solver algorithms, our approach enforces Dirichlet boundary conditions consistently on...
Alexei Strelchenko
(FNAL)
23/07/2018, 16:30
Algorithms and Machines
Global all-to-all communication in Krylov subspace iterative methods is
one of the major performance-limiting factors on large-scale parallel machines.
In this report we give a brief overview of recent algorithmic approaches
to mitigate communication cost in the iterative solvers. We present several variants
of communication-optimized fermion matrix inverters implemented in the...
Mr
Daniel Richtmann
(University of Regensburg)
23/07/2018, 16:50
Algorithms and Machines
With the ever-growing number of computing architectures, performance portability is an important aspect of (Lattice QCD) software.
The Grid library provides a good framework for writing such code, as it thoroughly separates hardware-specific code from algorithmic functionality and already supports many modern architectures.
The Regensburg group (RQCD) decided to deprecate its Xeon Phi...
Dr
Stephan DURR
(University of Wuppertal)
23/07/2018, 17:10
Algorithms and Machines
A simple-minded approach to implementing three discretizations of the Dirac operator (Brillouin, Wilson, staggered) on two architectures (KNL and core_i7) is presented. The idea is to use a high-level compiler along with OpenMP parallelization and SIMD pragmas, but to stay away from cache-line optimization and/or assembly tuning. The implementation is for Nv right-hand sides, and this extra index...
Yuzhi Liu
24/07/2018, 16:10
Algorithms and Machines
We present recent developments on our lattice simulations of fully dynamical $SU(3)\times U(1)$. Including electromagnetic effects is critical for the next level of precision in phenomenology. Examples include calculating the (higher order) QED contributions to the hadronic-vacuum-polarization contribution to the muon anomalous magnetic moment and calculating the QED contributions to meson and...
Prof.
Phiala Shanahan
(Massachusetts Institute of Technology)
24/07/2018, 16:30
Algorithms and Machines
Critical slowing-down of HMC algorithms presents a significant challenge in achieving LQCD calculations at fine lattice spacings. A number of methods have been proposed that circumvent this issue by acting at multiple physical length scales, including perfect actions that aim to achieve almost-continuum physics at finite lattice spacings, and multi-scale thermalisation techniques. Such...
Dr
Patrick Dreher
(NC State University)
24/07/2018, 16:50
Algorithms and Machines
A traditional approach to constructing a gauge field theory on a lattice employs a basic Wilson-type procedure, with additional enhancements to this formulation to improve computational performance and accuracy. This type of lattice gauge formulation has been successfully implemented on many different high-performance computing systems and has yielded useful computational results...
Dr
Kate Clark
(NVIDIA)
25/07/2018, 16:10
Algorithms and Machines
We report on recent work to integrate and optimize QUDA's adaptive multi-grid solver into Chroma RHMC Wilson-clover gauge evolution. Particular emphasis has been placed on optimization for the new Volta-powered Summit supercomputer. When combined with other recent improvements to Chroma's molecular dynamics implementation, in moving from Titan to Summit we achieve close to an aggregate 100x...
Mr
Ahmed Yousif
(Michigan State University)
25/07/2018, 16:30
Algorithms and Machines
We introduce an OpenCL library for computation of disconnected contributions for application with FPGAs and GPUs. We look at the advantages of FPGAs vs. traditional GPUs for stochastic estimation of disconnected contributions, as well as gains achieved with enhancements such as mixed precision and the truncated solver method. We also prospectively consider variance reduction algorithms and the...
Dr
Jarno Rantaharju
(Swansea Academy of Advanced Computing)
25/07/2018, 16:50
Algorithms and Machines
We publish an extension of openQCD-1.6 with AVX512 vector instructions using Intel intrinsics. Recent Intel processors support extended instruction sets with operations on 512-bit wide vectors, increasing both the capacity for simultaneous floating-point operations and the amount of register memory. Optimal use of the new capabilities requires a reorganisation of data and floating-point operations into...
Prof.
Ting-Wai Chiu
(National Taiwan University)
25/07/2018, 17:10
Chiral Symmetry
Parallel
We perform a hybrid Monte Carlo simulation of $N_f=2+1+1$ lattice QCD with domain-wall/overlap quarks at the physical point. The simulation is carried out on a $64^4$ lattice with lattice spacing $a \sim 0.06$ fm, using the Nvidia DGX-1 (8 Volta GPUs interconnected by NVLink). To attain the maximal chiral symmetry for a finite extent ($N_s = 16$) in the fifth dimension, we use the optimal...