Testbeds for Reproducible Research


Lucas Nussbaum <lucas.nussbaum@loria.fr>

Outline
1. Presentation of Grid'5000
2. A look at two recent testbeds: CloudLab and Chameleon

The Grid'5000 testbed
- World-leading testbed for HPC & Cloud
- 10 sites, 1200 nodes, 7900 cores
- Dedicated 10-Gbps backbone network
- 550 users and 100 publications per year
[Map: the ten sites are Lille, Rennes, Luxembourg, Reims, Nancy, Bordeaux, Lyon, Grenoble, Toulouse and Sophia]
Not a typical grid / cluster / Cloud:
- Used by CS researchers for HPC / Clouds / Big Data research
- No users from the computational sciences
Design goals:
- Large-scale, shared infrastructure
- Support high-quality, reproducible research on distributed computing

Outline
1. Description and verification of the environment
2. Resources selection and reservation
3. Reconfiguring the testbed to meet experimental needs
4. Monitoring experiments, extracting and analyzing data

Description and verification of the environment
Typical needs:
- How can I find suitable resources for my experiment?
- How sure can I be that the actual resources will match their description?
- What was the hard drive on the nodes I used six months ago?
[Diagram: users and high-level tools issue OAR commands and API requests to three services: selection and reservation of resources (OAR, driven by OAR properties), description of resources (Reference API, providing node descriptions), and verification of resources (g5k-checks)]

Description of resources
- Describing resources is a prerequisite for understanding results
- Detailed description on the Grid'5000 wiki
- Machine-parsable format (JSON); an example query is sketched below
- Archived (what was the state of the testbed 6 months ago?)
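
As an illustration of how this machine-parsable description can be consumed, here is a minimal sketch using curl and jq; the site, cluster, node name and JSON field names are assumptions for illustration, and authentication is omitted, so check the Grid'5000 API documentation for the actual layout.

  # Fetch the JSON description of one node from the Reference API
  # (URL layout, node name and field names are assumptions, not verified endpoints)
  curl -s https://api.grid5000.fr/stable/sites/nancy/clusters/griffon/nodes/griffon-1 \
      | jq '{processor, main_memory, network_adapters}'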

Verification of resources
Inaccuracies in resource descriptions have dramatic consequences:
- They mislead researchers into making false assumptions
- They generate wrong results, leading to retracted publications!
- They happen frequently: maintenance, broken hardware (e.g. RAM)
Our solution: g5k-checks
- Runs at node boot (can also be run manually by users)
- Retrieves the current description of the node from the Reference API
- Acquires information on the node using OHAI, ethtool, etc.
- Compares it with the Reference API (a hand-rolled sketch of this idea follows below)
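
To make the idea concrete, here is a sketch of the kind of check g5k-checks automates, not its actual implementation: gather one hardware fact locally and fetch the corresponding recorded value for manual comparison (the API URL, JSON path and interface name are assumptions).

  # Sketch only: compare the measured NIC speed with the recorded one
  # (API URL, JSON field names and interface name are assumptions; ethtool may require root)
  expected=$(curl -s https://api.grid5000.fr/stable/sites/nancy/clusters/griffon/nodes/$(hostname -s) \
      | jq -r '.network_adapters[0].rate')
  measured=$(ethtool eth0 | awk -F': ' '/Speed/ {print $2}')
  echo "Reference API rate: $expected bit/s; ethtool reports: $measured"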

Resources selection and reservation
The roots of Grid'5000 are in the HPC community, so the obvious idea was to use an HPC resource manager: OAR (developed in the context of Grid'5000), http://oar.imag.fr/
- Supports resource properties (tags)
  - Can be used to select resources (multi-criteria search)
  - Generated from the Reference API
- Supports advance reservation of resources
  - In addition to the typical HPC resource manager's batch mode: request resources at a specific time
  - On Grid'5000, used to implement a special policy: large experiments during nights and week-ends, experiment preparation during the day

Using properties to reserve specific resources
Reserving two nodes for two hours; the nodes must have a GPU and power monitoring:
oarsub -p "wattmeter='YES' and gpu='YES'" -l nodes=2,walltime=2 -I
Reserving one node on cluster a, and two nodes with a 10 Gbps network adapter on cluster b:
oarsub -l "{cluster='a'}/nodes=1+{cluster='b' and eth10g='Y'}/nodes=2,walltime=2"
Advance reservation of 10 nodes on the same switch, with support for Intel VT (virtualization):
oarsub -l "{virtual='ivt'}/switch=1/nodes=10,walltime=2" -r "2014-11-08 09:00:00"
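
Once a reservation is granted, the standard OAR client tools can be used to follow up on it; the commands below are a hedged sketch based on the usual OAR client, so treat the exact options as assumptions.

  oarstat -u            # list your submissions and their state (Waiting, Running, ...)
  oarsub -C <job_id>    # open a shell inside an already running job
  oardel <job_id>       # release the resources before the walltime expires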

Visualization of usage
[Screenshot of the usage visualization interface]

Reconfiguring the testbed
Typical needs:
- How can I install $SOFTWARE on my nodes?
- How can I add $PATCH to the kernel running on my nodes?
- Can I run a custom MPI to test my fault-tolerance work?
- How can I experiment with that Cloud/Grid middleware?
- Can I get a stable (over time) software environment for my experiment?

Reconfiguring the testbed
Operating system reconfiguration with Kadeploy:
- Provides a Hardware-as-a-Service Cloud infrastructure
- Enables users to deploy their own software stack & get root access
- Scalable, efficient, reliable and flexible: 200 nodes deployed in about 5 minutes (120 s with Kexec)
- (A sketch of a typical deployment command follows after this slide)
Customization of the networking environment with KaVLAN:
- Deploy intrusive middlewares (Grid, Cloud)
- Protect the testbed from experiments
- Avoid network pollution
- Achieved by reconfiguring VLANs: almost no overhead
- Recent work: support for several interfaces
[Diagram: VLAN types spanning two sites (site A, site B):
- default VLAN: routing between Grid'5000 sites
- local, isolated VLAN: only accessible through an SSH gateway connected to both networks
- routed VLAN: separate level-2 network, reachable through routing
- global VLAN: all nodes connected at level 2, no routing]
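
To give an idea of the workflow, here is a hedged sketch of a typical deployment launched from the frontend inside an OAR job; the environment name, the OAR_NODEFILE variable and the exact options are recalled from the Kadeploy and Grid'5000 documentation and should be treated as assumptions.

  # Deploy a reference environment on all nodes of the current OAR job and
  # install an SSH key for root access (environment name and options are assumptions)
  kadeploy3 -f $OAR_NODEFILE -e debian8-x64-base -k ~/.ssh/id_rsa.pub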

Creating and sharing Kadeploy images
Avoid manual customization:
- Easy to forget some changes
- Difficult to describe
- The full image must be provided
- Cannot really serve as a basis for future experiments (similar to binary vs source code)
Kameleon: reproducible generation of software appliances (a sketch of the workflow follows below)
- Uses recipes (high-level descriptions)
- Persistent cache to allow re-generation without external resources (Linux distribution mirror): a self-contained archive
- Supports Kadeploy images, LXC, Docker, VirtualBox, qemu, etc.
http://kameleon.imag.fr/
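
The typical Kameleon workflow, as a hedged sketch: the command names and arguments below are recalled from the Kameleon documentation and may differ between versions, so treat them as assumptions.

  # Create a recipe from an existing template, then build the appliance from it
  # (command names, template name and recipe name are assumptions)
  kameleon new my-appliance debian8      # generate my-appliance.yaml from a template
  kameleon build my-appliance.yaml       # build the self-contained appliance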

Changing experimental conditions
Reconfigure experimental conditions with Distem:
- Introduce heterogeneity in a homogeneous cluster
- Emulate complex network topologies
[Figure: physical nodes hosting virtual nodes (VN 1, VN 2, VN 3, ...) pinned to subsets of CPU cores with degraded CPU performance, interconnected by emulated links with given bandwidth and latency (e.g. 5 Mbps, 10 ms; 10 Mbps, 5 ms; 4 Mbps, 12 ms; 6 Mbps, 16 ms)]
http://distem.gforge.inria.fr/

Monitoring experiments
Goal: enable users to understand what happens during their experiment
- Power consumption
- CPU, memory, disk
- Network: backbone, internal networks

Kwapi: a new framework to monitor experiments
- Initially designed as a power consumption measurement framework for OpenStack, then adapted to Grid'5000's needs and extended
- Covers energy consumption and network traffic
- Measurements taken at the infrastructure level (SNMP on network equipment, power distribution units, etc.)
- High frequency (aiming at 1 measurement per second)
- Data visualized using a web interface
- Data exported as RRD, HDF5 and through the Grid'5000 REST API
[Plot: global power consumption (W), Jan 29 to Feb 19 2015, between 0 and 8000 W, distinguishing nights or weekends from days and weekdays]
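
If you retrieve one of the HDF5 exports, the standard HDF5 command-line tools are enough to inspect it; the file name and dataset path below are hypothetical.

  # Inspect a (hypothetical) HDF5 export of power measurements
  h5ls -r power_nancy.h5                      # list groups and datasets recursively
  h5dump -d /power/griffon-1 power_nancy.h5   # dump one (hypothetical) dataset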

Kwapi: example output
- 18:39:28: machines are turned off
- 18:40:28: machines are turned on again and generate network traffic as they boot via PXE
- 18:49:28: the reservation of the machines is terminated, causing a reboot to the default system

Other testbeds
Two recent projects (Oct. 2014 to Sep. 2017), each funded by the National Science Foundation for $10M. All information below is to the best of my knowledge: please correct me!
Chameleon
- Led by Kate Keahey (ANL / Univ. of Chicago)
- https://www.chameleoncloud.org/
CloudLab
- Led by Robert Ricci (Univ. of Utah)
- http://www.cloudlab.us
- Federated with GENI: CloudLab can be used with a GENI account, and vice-versa

Comparison
Software stack used as a base:
- Grid'5000: mostly their own
- Chameleon: OpenStack
- CloudLab: Emulab
Resources description and verification:
- Grid'5000: Reference API + g5k-checks (+ human-readable description)
- Chameleon: same as Grid'5000
- CloudLab: machine-readable description using the RSpec advertisement format (less detailed than Grid'5000's, though) + human-readable description in the docs; verification: nothing similar to g5k-checks, but LinkTest [1] can validate the network configuration
[1] D. S. Anderson et al. Automatic Online Validation of Network Configuration in the Emulab Network Testbed. In: ICAC'06.

Comparison (2)
Resources reservation:
- Grid'5000: batch scheduler with advance reservation
- Chameleon: leases using OpenStack Blazar
- CloudLab: experiments start immediately, default duration of a few hours, can be extended on demand (no advance reservations)
Resources reconfiguration / software:
- Grid'5000: Kadeploy
- Chameleon: OpenStack Ironic
- CloudLab: Emulab's Frisbee
Network reconfiguration and Software Defined Networking:
- Grid'5000: KaVLAN (+ higher-level tools)
- Chameleon: planned, using OpenFlow
- CloudLab: yes: Emulab's network emulation features, OpenFlow access on switches [2], interconnection to Internet2's AL2S
[2] http://cloudlab-announce.blogspot.com/2015/06/using-openflow-in-cloudlab.html

Comparison (3)
Monitoring:
- Grid'5000: Kwapi (power + network)
- Chameleon: planned, using OpenStack Ceilometer
- CloudLab: planned [3]
Long-term storage between experiments:
- Grid'5000: storage5k (file-based and block-based)
- Chameleon: object store (OpenStack Swift) available soon
- CloudLab: yes [4], with snapshots (using ZFS) to version data (the snapshot features are not documented yet)
[3] http://docs.cloudlab.us/planned.html
[4] http://cloudlab-announce.blogspot.fr/2015/04/persistant-dataset.html

Conclusions
We are moving:
- From small testbeds, built on a per-team or per-lab basis
- To large-scale, shared infrastructures built with reproducibility in mind
A bright and exciting future: paving the way to Open Science for HPC and Cloud!
(Also: you can get accounts on all of them through Open Access / Preview / Early Users programs.)
"One could determine the age of a science by looking at the state of its measurement tools." Gaston Bachelard, La formation de l'esprit scientifique, 1938

Bibliography
Resources management:
- Resources Description, Selection, Reservation and Verification on a Large-scale Testbed. http://hal.inria.fr/hal-00965708
Kadeploy:
- Kadeploy3: Efficient and Scalable Operating System Provisioning for Clusters. http://hal.inria.fr/hal-00909111
KaVLAN, virtualization, Clouds deployment:
- Adding Virtualization Capabilities to the Grid'5000 Testbed. http://hal.inria.fr/hal-00946971
- Enabling Large-Scale Testing of IaaS Cloud Platforms on the Grid'5000 Testbed. http://hal.inria.fr/hal-00907888
Kameleon:
- Reproducible Software Appliances for Experimentation. https://hal.inria.fr/hal-01064825
Distem:
- Design and Evaluation of a Virtual Experimental Environment for Distributed Systems. https://hal.inria.fr/hal-00724308
Kwapi:
- A Unified Monitoring Framework for Energy Consumption and Network Traffic. https://hal.inria.fr/hal-01167915
XP management tools:
- A survey of general-purpose experiment management tools for distributed systems. https://hal.inria.fr/hal-01087519
- XPFlow: A workflow-inspired, modular and robust approach to experiments in distributed systems. https://hal.inria.fr/hal-00909347
- Using the EXECO toolbox to perform automatic and reproducible cloud experiments. https://hal.inria.fr/hal-00861886
- Expo: Managing Large Scale Experiments in Distributed Testbeds. https://hal.inria.fr/hal-00953123