Computing Frontier: E1 Cosmic, T2 Astrophysics and Cosmology
-----------------------------------------------------------------------------
• Continued growth in data from Cosmic Frontier experiments (currently exceeds 1 PB total, 50 PB total in 10 years, 400 PB per year in 10-20 years)
  • DES at end of lifetime: 1 PB of images, 100 TB of catalogues (?)
  • LSST (2020-2030): 6 PB of images, 1 PB of catalogues
    • At end of lifetime: 100 PB of images, 20 PB of catalogues
  • CMB
  • Murchison Widefield Array (2013-): 15.8 GB/s processed down to 400 MB/s
  • Square Kilometer Array (2020+): PB/s to correlators to synthesize images; 300-1500 PB per year of storage
• From technology: no flattening of the data rates coming from instruments; they will continue to grow
• Computational resources will have to grow to match the associated data rates (data intensive, not just compute intensive)
• Data preservation and archiving (including the development of data storage and archiving technologies) becomes a challenge
  • Grow databases to 10-20 PB?
• Infrastructure for data analytics applicable to large- and small-scale experiments will need to grow over the next decade (with an emphasis on sustainable software)
  • Currently experiments develop and maintain their own software, but this creates problems => sharing of algorithms, approaches, and codes seems to be the way forward
  • Question: is there a role for sharing of frameworks and software? Yes
    • Mostly middleware and map-reduce type algorithms => share not only the algorithms but also the infrastructure to run them
  • Question: what about ownership of software, and how does this work, for example with LSST? => has to be handled at a higher level, outside the individual experiments, because of the limited lifetime of experiments
  • Question on planning: 1st requirements, 2nd anticipated technology developments, 3rd how these advances can change the user requirements and how users use the systems: will be talked about later (?)
• Catalogues will not be able to use commercial products (like Google, Amazon, Facebook) because of different requirements
  • Will need massively parallel databases; Richard Gerber comments to get in contact with the NERSC people who are also doing research in parallel databases
• Simulations (cosmological and instrument) will play a critical role in evaluating and interpreting the capabilities of current and planned experiments
  • Sky surveys will produce synthetic catalogs, different catalogs for different purposes
  • Need more memory: systems with 100s of PB in the next decade are OK
  • Adapt to ongoing and future architectural changes (different types of complex nodes, end of naive weak scaling as memory per core shrinks, communication bottlenecks, multi-level memory hierarchies, power restrictions, ...)
  • Supercomputers for simulation campaigns
  • Advent of new programming models -- how to rewrite large code bases?
  • Need data-intensive and even interactive analysis facilities; worry about data archives and databases
  • Development of powerful, easy-to-use remote analysis tools, motivated by network bandwidth restrictions on large-scale data motion (compute and data-intensive platforms will likely be co-located)
  • Question: size of the community that does simulation? A handful of teams in the US and Europe have the capability to run these simulations
  • Question: is there an organized effort to share, similar to lattice QCD? It is currently starting => needs to be done
• Require new computational models for distributed computing (including many-core systems)
  • Also Hadoop (map-reduce), etc. (a minimal sketch of the map-reduce pattern follows this list)
• Career paths (including tenure stream) for researchers who work at the forefront of computational techniques and science, and the training of the next generation of researchers, are critical to data-intensive cosmology
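As a concrete illustration of the shared "middleware and map-reduce type algorithms" point above, here is a minimal map-reduce sketch over hypothetical catalogue shards. The file names, catalogue contents, and magnitude-histogram task are invented for illustration; this is not code from any survey pipeline.

```python
# Minimal map-reduce sketch: histogram a quantity over independent catalogue shards.
# Everything concrete here (file names, schema, magnitude range) is a placeholder.
from multiprocessing import Pool
import numpy as np

CATALOG_SHARDS = ["catalog_000.npy", "catalog_001.npy", "catalog_002.npy"]  # hypothetical
BINS = np.linspace(15.0, 28.0, 66)   # assumed apparent-magnitude binning

def map_shard(path):
    """Map step: histogram one shard independently of all others."""
    magnitudes = np.load(path)                      # assumed: 1-D array of magnitudes
    counts, _ = np.histogram(magnitudes, bins=BINS)
    return counts

def reduce_counts(partial_counts):
    """Reduce step: sum the per-shard histograms into one."""
    return np.sum(partial_counts, axis=0)

if __name__ == "__main__":
    with Pool() as pool:                            # stands in for a cluster/Hadoop layer
        partials = pool.map(map_shard, CATALOG_SHARDS)
    print(reduce_counts(partials))
```

The same split (an embarrassingly parallel map over shards plus a cheap reduce) is what a shared Hadoop-style infrastructure would run at scale.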
Computing Frontier: E2 Energy
---------------------------------------------
• Computing doesn't drive the research program, but it does enable it.
• Looking at the machine plans: they are all high-luminosity machines with potentially very high trigger rates and complicated events
• Comparison of the Tevatron and the LHC in their 3rd year: what it shows is that new machines can lead to big jumps in some resources
  • Trigger rate, event size, and reconstruction time all rise by a factor of 10
  • Collaborations increase by a factor of 3
  • Resources and challenges increase at different rates (a factor of 100 in most cases)
  • Processing has increased by a factor of 30 in capacity
    • This is essentially what would be expected from a Moore's law increase with a 2-year cycle (see the back-of-envelope check after this list); says we spent similar amounts
  • Storage and networking have both increased by a factor of 100 (10 times the trigger rate and 10 times the event size)
• LHC increases
  • LHC computing adds about 25k processor cores and 34 PB of disk per year
  • It is currently increasing at a sustainable rate
  • A decade from now would be a factor of 4-5 in capacity
• Remark: technology changes are important, for example the stepwise increase in network capacity
• Remark: super-high-luminosity LHC: need a balance between computing capabilities and trigger rates (tighter triggers carry a higher risk of missing a discovery)
• Remark: computing models cannot be based on single-core performance; they need to be multi-threaded and multi-core
  • LHC experiments take this into account; planning assumes 75% efficiency for the multi-core case
• Crossroads in computing:
  • In one direction are clouds
    • Commercial clouds are still very expensive for resources we use a lot
    • More opportunistic and academic resources will move to cloud provisioning methods
    • Even sites we control will move to cloud provisioning tools because it simplifies operations
    • We should expect our current service architecture to change to new provisioning tools
  • In the other direction are very specialized systems (high performance, low power, massively multi-core, GPUs)
    • Hardware: we buy or get access to specialized gear, like a supercomputer allocation
    • We need to spend a lot of effort to be able to use them all: we cannot restrict ourselves to one architecture
    • What about analysis? We don't think analysts will do anything advanced, which affects efficiency
• Data management
  • We will have a mix of local, cloud, opportunistic, and specialized resources, and we will need a data management system that deals with all of them
  • On the cloud, the concept of data locality begins to lose a lot of meaning
  • Given the connectivity of our clusters and the expectations of the users, we will have to evolve to content delivery networks
  • Comment: R&D opportunity: the field has to play a role in making the cloud data intensive
• Strong requirements on networking:
  • Currently a 10k-core cluster (typical for 2020) would require 10 Gb/s networking for organized processing like reconstruction; analysis would require 100 Gb/s (also checked in the sketch after this list)
  • Technology comment: servers are more and more connected directly at 40 Gbps, and soon 100 Gbps, already in the server
• Also: "Becoming More Selective" -> reduce the actively analyzed data
  • We can afford a lot of data on tape, but the active dataset is much more expensive
  • Reduce the number of events while we improve our understanding, but keep them on tape
  • New concept of essentially two trigger levels => needs to be exposed to the physics side during Snowmass (LHCb is already doing this (stripping))
• Comments:
  • Data management worries: it uses a lot of manpower!
  • Big improvements so far, but lots of improvement opportunities remain
  • A data content delivery network is a transformational thing: need to plan carefully, learn from industry
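Two of the numbers quoted above can be sanity-checked with simple arithmetic; the snippet below does so, with the assumptions (roughly 10 years between the Tevatron and LHC third-year snapshots, ideal sharing of the cluster link) stated in the comments.

```python
# Back-of-envelope checks of two figures quoted in the E2 notes above.

# 1) Factor ~30 in processing capacity between Tevatron year 3 and LHC year 3,
#    quoted as consistent with Moore's law on a 2-year doubling cycle.
years_between_snapshots = 10        # assumed separation between the two "year 3" points
doubling_time_years = 2
print(2 ** (years_between_snapshots / doubling_time_years))   # ~32, close to the quoted 30

# 2) Per-core bandwidth implied by "10 Gb/s for a 10k-core cluster" doing
#    organized reconstruction (assuming the link is shared evenly).
cluster_gbps, cores = 10, 10_000
print(cluster_gbps * 1e3 / cores, "Mb/s per core")            # 1 Mb/s per core
```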
Computing Frontier: E3 Intensity
-----------------------------------------------
• In this exercise we targeted the computing model for charged lepton processes, neutrinos, and baryon number violation.
• The impact of the US contribution to the physics results of these experiments is strongly correlated with the availability of computing resources and the efficiency of the computing model adopted
• There are many experiments, which can potentially lead to fragmentation of efforts
• The broad range of experiments leads to a broad range of needs
  • A broad range of experiments inevitably leads to a broad range of frameworks
  • The Fermilab-based IF experiments (from g-2 and NOvA to LAr experiments including MicroBooNE and LBNE) have converged on ART as a framework for job control, I/O operations, and tracking of data provenance.
  • Experiments outside of Fermilab (or predating ART) use LHC-derived frameworks such as Gaudi, or homegrown frameworks like MINOS(+), IceTray, and RAT.
• ROOT and Geant4 are the bread and butter of all HEP experiments. They are critical to all experiments in the IF, and support for these packages is essential.
  • Geant4 has traditionally focused on EF experimental support. More ties to, and stronger support for, IF experiments is a requirement.
  • As an example, Geant4 is barely suitable for large scintillation detectors, given a complex geometry and a large number of photons to track.
  • The community desires improved efficiency for both of these packages, for example better ROOT I/O and Geant4 parallelization.
  • Question: can the IF organize itself better to have more impact on Geant4? Yes, this is already happening
• There are also more specialized software packages
• Hardware demands of IF experiments and IF R&D are modest compared to those of EF experiments, but the needs are NOT insignificant.
  • Efficient use of available grid resources has had, and could continue to have, a huge impact on IF experiments and IF R&D
  • Dedicated storage resources are needed for internationally or university-run IF experiments as well as for IF R&D
  • Large difference between FNAL-based and non-FNAL-based experiments
• Access to grid resources is all-important and high on every experiment's list
  • Comment: it is more an education problem; the resources are out there and can be used, they are just not used efficiently
  • The use of FermiGrid and the Open Science Grid is essential to all experiments responding to the survey
  • Fermilab-based experiments indicated that all data is stored on site.
• Professional support is required for methods to seamlessly use Fermilab and non-Fermilab resources through job submission protocols
• We choose to highlight three efforts that have interesting commonalities among the three frontiers and the potential for high impact:
• Agencies have recently begun requiring open data policies
  • For us, there is no clear avenue for sharing multi-TB or PB data samples
  • There are no additional resources to support these data sets
• IF: Chroma is an open-source optical photon Monte Carlo that is roughly 200 times faster than the same simulation with Geant4, using GPUs (an illustrative toy of the underlying parallelism follows this list)
• EF: Geant4 on HPCs (example: ATLAS)
• CF: Self-assembling data: a multidisciplinary group at the CF proposes to design fault-tolerant, real-time association of information across a large-scale experiment containing distributed sensors by creating a self-assembling data paradigm
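The Chroma speed-up quoted above comes from the fact that optical photons propagate independently, so the work maps naturally onto GPUs. The toy below (plain NumPy, not Chroma's API, with made-up detector numbers) only illustrates that per-photon independence; a GPU implementation does the same thing with one thread per photon.

```python
# Illustrative toy, not Chroma: decide the fate of a large batch of independent
# optical photons in lock-step. All numbers below are invented for the sketch.
import numpy as np

rng = np.random.default_rng(42)
n_photons = 1_000_000
attenuation_length = 10.0                        # m, assumed absorption length
path_lengths = rng.uniform(1.0, 8.0, n_photons)  # m, assumed spread of geometric paths

# Each photon's fate is independent of every other photon's: this is the
# embarrassing parallelism a GPU exploits (one thread per photon).
survived = rng.random(n_photons) < np.exp(-path_lengths / attenuation_length)
print(f"{survived.sum()} of {n_photons} photons reach the photodetectors")
```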
Computing Frontier: T1 Accelerator Science
----------------------------------------------------------------
Some motivations: the science drivers are developing new techniques and technologies and designing accelerators based on new concepts. Want to maximize the performance of "conventional" techniques and technologies, e.g. optimize operational parameters and beam dynamics in 6D phase space. The desired outcome is higher gradients (shorter accelerators) and minimized losses for IF applications. For the EF, need to model infrastructure for higher gradients, optimize existing technologies, and optimize/test new designs. For the IF, understand and control instabilities, minimize/mitigate beam losses, optimize existing technologies, and optimize/test designs.

Requirements: the group met a few times and requested white papers (some of which are not in the official repository and will need further cleanup). EF accelerators are LWFA (10 GeV/m stages for a linear collider), PWFA (10-100 GeV/m), DLA (accelerator on a chip, a few GeV/m), muon colliders, and electron colliders. All require multi-scale modeling and design R&D. IF accelerators are proton linear and circular machines.

Some specific examples. High-intensity proton drivers have to deal with a wide range of scales -- from an accelerator complex of 10^3 m down to particle bunches of 10^-3 m. Need to model intensity-dependent beam effects through all sorts of elements and over many revolutions: 10^9 particles going through structures many times, interacting with each other and with the structures at every step. Roser: do you really have to do the whole machine, or just a section? The former; this is a resonant effect that develops over many passes. Another example is LWFA multi-scale physics: use a plasma structure to maintain a wakefield driven by a laser. 10^9 grid cells, 10^10 particles, 10^7 iterations, 1-10 TB of memory -- this requires petascale computing (a rough check of the memory figure follows at the end of this section).

Summary of requirements: the IF needs beam-loss characterization and control, control-room feedback (fast turnaround of simulations so that operators can study things in real time!), and direct comparison between beam diagnostics/detectors and simulated data (needs new tools). The EF needs beam-stability characterization, the ability to produce end-to-end designs, control-room feedback, new physics-model capabilities (e.g. radiation and scattering), and better numerical models. In general, this is integrated, multi-physics modeling requiring massive computing resources. Want common interfaces for geometry description and job submission.

Findings: need algorithms that can make use of massive computing resources. Want consolidation of interfaces and tools. Now going very quickly through slides!

Users very much want analysis tools to make sense of the simulation results on their own; right now the computing people work closely with the users. Oli notes the CMS T0 gives feedback in an hour with automated processes -- can't a similar thing be done here? They don't have enough people. Liz: who are the users? Accelerator scientists. Simulations currently take too long to run; it's not the analysis that needs to be done in real time, it's the simulation. But the analyses can take a long time too.

Need to evolve tools to new infrastructures. Simulated data volume is increasing, but it is not a driver in storage or networking; the HEP experiments can lead the way there. Scaling to a large number of cores currently works well. Trying to take advantage of emerging technology research (GPU, multicore), but this requires a lot of changes to working tools; it could, however, provide solutions to integrated high-fidelity modeling. See slides! Richard says the requirement isn't just "how much" but also what sort of turnaround time is desired. Yes, they know.
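The 1-10 TB memory figure quoted for the LWFA example is easy to reproduce from the stated particle and grid counts; the per-particle and per-cell sizes below are assumptions typical of particle-in-cell codes, not numbers from the talk.

```python
# Rough reconstruction of the LWFA memory estimate (10^9 cells, 10^10 particles).
n_cells = 1e9
n_particles = 1e10
bytes_per_particle = 8 * 8     # assumed: ~8 double-precision attributes per particle
bytes_per_cell = 6 * 8         # assumed: E and B field components stored as doubles

total_tb = (n_particles * bytes_per_particle + n_cells * bytes_per_cell) / 1e12
print(f"~{total_tb:.1f} TB of state")   # ~0.7 TB, the low end of the quoted 1-10 TB
```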
T4: Perturbative QCD
-----------------------
PQCD is needed to interpret LHC data; the Higgs discovery needed it. There is good interaction with the experimental community, which demands great accuracy. PQCD enters into all aspects of the scattering process. The community has produced many tools, with the Tevatron and the LHC motivating a burst of activity. But the level of sophistication in computing has reached a point where they have to revisit their approaches and reassess non-optimal use. It would be good to have a uniform environment that could be shared, including with experimenters, with adequate computational means and resources.

Short-term goals: provide the best theoretical predictions, automate the process, and facilitate the progress of new ideas and techniques. Longer term, take advantage of large-scale computing facilities and work more closely with the computing community to pioneer new ideas. See slides for the specific charges about current computing needs and the infrastructure needed for the future. What can they do with what they have, and what do they need?

Start with NLO, which is well established (NNLO is the cutting-edge frontier). Conceptual and technical challenges have been met, new and old techniques are implemented for several processes, and they can do things that couldn't be imagined even five years ago: one-loop corrections, MC generators, interfaces to parton-shower MCs. Most codes are public, some are not. Still working on user-friendliness. Spentz: what is it that you want the casual user to be able to do? Both pre-packaged codes and tools that can generate codes. Ideally, want to be able to calculate any process at one loop.

See slides for the CPU/storage needed for V+jets calculations. E.g. W+5 jets with Blackhat+Sherpa is 600K CPU hours and 1.5 TB of output (a quick wall-clock translation of this figure follows below). This is enough for meaningful comparisons with data and to give a scale-dependence error estimate. Oli: how long is each individual job? Finding in CMS that single jobs have to run for a week. This assumes parallelization. Will we be looking at how we can run the generators? Yes. One could imagine an NLO repository for doing multiple runs. Beyond parton level, add exclusive event generators; see slides for the table. Very demanding for high-multiplicity calculations.

NNLO is the cutting edge. The state of the art is 2 -> 2 with massive particles, or 2 -> 1 fully differential. There are still big challenges and they are still building tools. Since it's still evolving, having more computational power will help them try things out.
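For orientation, the 600K CPU-hour W+5 jets figure above translates into the following wall-clock times under the optimistic assumption of ideal parallel efficiency (core counts chosen to bracket the weak-scaling runs described below).

```python
# Wall-clock time for the quoted 600K CPU-hour W+5 jets run, assuming ideal scaling.
cpu_hours = 600_000
for cores in (512, 8192):                       # 8192 matches the weak-scaling runs below
    print(f"{cores:5d} cores -> ~{cpu_hours / cores / 24:.1f} days")
```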
pp -> ttbar requires 1M CPU hours, but it is hard to estimate resources because they are process- and method-dependent. Being able to prototype more quickly will make a big difference for development. They are studying the impact of parallelization through multi-threading, MPI, and local vs. distributed computing. Spentz: what is listed as "not scalable" sounds like perfect scaling? The issue is the number of cores that can share memory. On local vs. distributed, the question is how best to scale up (local = something like NERSC here). MC simulations can be split into integration and event generation; the former is well suited to HPC, though there can be big memory requirements (a minimal sketch of MPI-parallel integration follows at the end of this section).

DOE gave them 10^6 hours at NERSC for case studies; there is a resulting white paper, plus several presentations and discussions, and an online tutorial. Several people have tested on that basis. They used NERSC, OLCF, ALCF, and the OSG. Porting to the Cray was easier, as it is more like a Linux environment. They implemented MPI in the generator frameworks. For W+5 jets, they observed weak scaling to 8192 cores at NERSC and OLCF; more cores were needed at ALCF due to lower clock speeds. MPI was more efficient than multithreading. Positive experience with MPI on the OSG, the limitation being the number of cores accessible. They tested GPUs too; the execution-time gains are very promising, coming from the parallel nature of the underlying algorithm.

Conclusions: this was a first quantitative look at prototype NLO and NNLO calculations. It is hard to estimate at this point what resources will be needed in five years; they are just getting started. The community needs to grow into this; a very useful exercise, and they are thankful to DOE for suggesting it. And now there is a repository of codes at NERSC! Access to HPC will help make calculational tools available with greater support in a well-tested, coherent framework, and gives new resources for new calculations in development. But it won't substitute for local resources, and they could still effectively use distributed systems like the OSG. Daniel: is the code fully ported to GPU, or is there some hybrid? Sounds like it's not totally clear.
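The integration/event-generation split described above is the part that maps most cleanly onto MPI. The sketch below is a generic Monte Carlo integration parallelized with mpi4py, not the actual Blackhat/Sherpa implementation; the integrand, dimensionality, and sample counts are placeholders.

```python
# Minimal sketch of MPI-parallel phase-space integration (not the real generator code).
# Run with e.g.: mpirun -n 8 python mc_integrate.py   (requires mpi4py and NumPy)
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def integrand(x):
    # Placeholder weight; in a real generator this is the matrix-element evaluation.
    return np.exp(-np.sum(x * x, axis=1))

n_per_rank = 1_000_000
rng = np.random.default_rng(seed=rank)     # independent random stream per rank
samples = rng.random((n_per_rank, 4))      # toy 4-dimensional unit hypercube
weights = integrand(samples)

# Each rank accumulates its partial sums; a single reduce combines them on rank 0.
local = np.array([weights.sum(), (weights ** 2).sum()])
total = np.zeros_like(local)
comm.Reduce(local, total, op=MPI.SUM, root=0)

if rank == 0:
    n = n_per_rank * size
    mean = total[0] / n
    err = np.sqrt(max(total[1] / n - mean ** 2, 0.0) / n)
    print(f"integral estimate: {mean:.6f} +/- {err:.6f}")
```

Event generation, by contrast, is closer to a throughput problem, which is why the notes above distinguish the two steps when matching them to HPC versus distributed (OSG-style) resources.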