The Session. Rosaria Silipo, Phil Winters (KNIME). 2016 KNIME.com AG. All Rights Reserved.



Past KNIME Summits: Merging Techniques, Data and MUSIC!

Analytics, Machine Learning, Data Science, Data Mining, Predictive Analytics (Big Data)
"Any sufficiently advanced technology is indistinguishable from magic." Arthur C. Clarke, 1973
- Trend 1: More Magicians!
- Trend 2: Power to the People!
"Data Scientist: The Sexiest Job of the 21st Century" (Harvard Business Review)
DataHookup Pair_ship

Guided Analytics: Power to the People. Rosaria Silipo, Phil Winters, Christian Albrecht (KNIME)

Agenda
- Power to the People: 4 approaches
- Guided Analytics: The User Perspective
- Guided Analytics: The Platform
- Summary, Thoughts and Next Actions

Power to the People: 4 approaches
- Generic Black Box Machine Learning
- Citizen Data Scientists
- Analytic Cheat Sheets
- Guided Analytics

Critical Capabilities                     Citizen Data Scientists
Data Access                               10%
Data Preparation and Exploration          22%
Advanced Modelling                         5%
Visual Composition Framework (VCF)        22%
Automation                                 1%
Delivery, Integration & Deployment         1%
Platform and Project Management            1%
Performance and Scalability                1%
User Experience                           22%
Collaboration                              1%
Leverage and Productivity                 14%
Total                                    100%


Guided Analytics: Automate Understand

The Business Issue: Product Upsell by a Campaign Manager
- Lawyer's Insurance: a successful product
- Content marketing is key: right message, right person
  - Young men: insurance for "those things that happen" (car, rent, purchase); discount sensitive!
  - Family-age women: protection for your family and children; not discount sensitive!
  - Older adults: complaints, purchase protection, contracts; not discount sensitive
- A field in the Campaign Management system is needed to indicate whether a customer is likely to buy Lawyer's Insurance
- High-likelihood individuals should be targeted with an offer
- Each target group should be created around those demographics!

The Data: Classic Marketing Data!
- Demographics
- Information about previous product purchases, including whether the target product has been purchased or not
- Information about channel activity with the organization
- Some social media data
- Information about the value to the company

Goals and Requirements
- CRM data => upselling of a Lawyer's Insurance product
- Calculate the propensity to buy a Lawyer's Insurance product
- Cluster customers into demographic groups
- Interactive analytics process:
  Upload Data -> Check Data Quality -> Cleaning & Preproc. -> Clustering -> Refine Clustering -> Classification -> Scoring

The Analytics Process
Upload Data -> Check Data Quality -> Cleaning & Preproc. -> Clustering -> Refine Clustering -> Classification -> Scoring
- Data quality and pre-processing checks: X-validation error, ratio std dev/mean, missing values, outliers, low variance, zero skewness, high correlation
- Clustering: 3 clusters on demographic features (gender, income, age)
- Refine clustering: explore the clusters; if necessary, split one existing cluster into 3 sub-clusters
- Classification: dedicated classifier (linear regression) for each cluster and sub-cluster
- Scoring: evaluate overall accuracy
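The column checks named on this slide can be illustrated outside KNIME. The following is a minimal sketch in plain Python (standard library only), not the KNIME workflow from the talk. The function names, the 3-standard-deviation outlier rule and the sample columns are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev

def column_report(values):
    """Summarise one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    n = len(present)
    m = mean(present)
    sd = pstdev(present)
    # Population skewness: third standardised moment (0 for a symmetric column).
    skew = 0.0 if sd == 0 else sum(((v - m) / sd) ** 3 for v in present) / n
    return {
        "missing_ratio": 1 - n / len(values),
        "std_over_mean": sd / abs(m) if m != 0 else math.inf,
        "skewness": skew,
        # Assumed rule of thumb: flag values more than 3 standard deviations from the mean.
        "outliers": [v for v in present if sd > 0 and abs(v - m) > 3 * sd],
    }

def pearson(xs, ys):
    """Pearson correlation of two equally long, fully populated columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sample column with one missing entry.
age = [25, 31, None, 47, 52, 38]
report = column_report(age)
```

A column with a high `missing_ratio`, a `std_over_mean` close to zero (low variance), or a `pearson` value close to 1 against another column would be a candidate for removal in the cleaning step.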


[Screenshot; displayed value: 0.996]

Generic Black Box Analytics


Threshold set to 0.5


Summary: the Analytics Process

Summary: Overall Workflow
1. Upload file and check data quality
2. Interactive pre-processing
3. Clustering and cluster refinement
4. Linear regression and threshold-based decision
Accuracy evaluation: loop until you are satisfied with the total accuracy value

1. Upload File and Check Data Quality
- Upload the RIGHT data file!
- File Upload wrapped node
- File Correct? wrapped node (HTML)
- Wrapped node description
- Data set quality

2. Interactive Pre-processing
- Column Cleaning by Missing Values
- Outlier Removal
- Column Cleaning by
- Column Cleaning by Sorting Views on a Grid through JSON

3. Clustering and Cluster Refinement
- K-Means: 3 clusters on age, income, gender
- Wrapped node: Viz Clusters
- Summary statistics (no interactivity!)
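The K-Means step above (3 clusters on age, income, gender) can be sketched in plain Python. This is an illustration of Lloyd's algorithm, not the KNIME node used in the talk. Seeding the centroids with the first k rows, scaling every feature to [0, 1] and the customer rows themselves are assumptions.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical customers: (age, income, gender), each already scaled to [0, 1].
customers = [
    (0.10, 0.20, 1.0), (0.50, 0.60, 0.0), (0.90, 0.50, 1.0),
    (0.15, 0.25, 1.0), (0.55, 0.65, 0.0), (0.85, 0.45, 0.0),
]
labels, centroids = kmeans(customers, k=3)
```

Refining a cluster, as described in the process overview, would amount to calling `kmeans` again on the rows of one cluster only.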

4. Linear Regression and Threshold-based Decision
- Linear regression model on each cluster and sub-cluster
- prediction > threshold => 1
- prediction <= threshold => 0
- Default threshold = 0.5
- Correct vs. wrong visualization
- Save or loop?
- Would it not be nice to have threshold selection and visual inspection of correct vs. wrong results in the same frame?
- Automatic adjustment of threshold through scatter plot visualization
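The decision rule in step 4 (one linear regression per cluster, then prediction > threshold => 1, otherwise 0) can be sketched as follows. This is a simplified illustration with a single input feature per row, not the workflow from the talk. The cluster names and data rows are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with one input feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def decide(prediction, threshold=0.5):
    """prediction > threshold => 1, prediction <= threshold => 0."""
    return 1 if prediction > threshold else 0

# Hypothetical rows per cluster: (feature, bought) with bought in {0, 1}.
clusters = {
    "young": [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)],
    "older": [(0.1, 1), (0.3, 1), (0.7, 0), (0.9, 0)],
}

correct = total = 0
for name, rows in clusters.items():
    xs, ys = zip(*rows)
    a, b = fit_line(xs, ys)  # one dedicated model per cluster
    for x, y in rows:
        correct += decide(a + b * x) == y
        total += 1
accuracy = correct / total
```

Passing a different `threshold` to `decide` and recomputing `accuracy` mirrors the "save or loop?" choice: the user repeats the step until the overall accuracy is acceptable.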

5. Audit Report


What we did ... and could have done: Data Audit
- Missings? How handled? Too many missings?
- Strange minimum or maximum values?
- Strange mean values or large differences between mean and median?
- Large skew or excessive kurtosis (for algorithms assuming a normal distribution)?
- Gaps in distribution, bi-modal or multi-modal?
- Values in categorical variables that don't match valid values
- High-cardinality categorical variables (possibly needing binning or other treatment)
- Categorical variables with a large percentage of a single value
- Unusually strong relationships with the target variable?
- High correlation (possibly indicating redundancy)?
- Report on the data audit and the entire sequence of actions to produce the result
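Several of the audit questions above lend themselves to simple automated checks. The sketch below covers the categorical checks and the mean-versus-median comparison in plain Python. The function names, the cut-off values (`max_cardinality`, `dominant_share`) and the sample column are assumptions chosen for illustration, not values from the talk.

```python
from collections import Counter
from statistics import mean, median

def audit_categorical(values, valid_values, max_cardinality=20, dominant_share=0.95):
    """Return audit flags for one categorical column; thresholds are illustrative."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    top_share = counts.most_common(1)[0][1] / len(present)
    return {
        "missing": len(values) - len(present),
        "invalid": sorted(set(present) - set(valid_values)),
        "high_cardinality": len(counts) > max_cardinality,
        "single_value_dominates": top_share >= dominant_share,
    }

def mean_median_gap(values):
    """Relative gap between mean and median; a large gap hints at skew or outliers."""
    m, med = mean(values), median(values)
    return abs(m - med) / abs(med) if med != 0 else float("inf")

# Hypothetical column with one missing entry and one invalid code.
gender = ["f", "m", "m", None, "x", "f"]
flags = audit_categorical(gender, valid_values={"f", "m"})
```

The returned flags could feed directly into the audit report, alongside the sequence of cleaning actions the user took.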


What we did ... and could have done: CRM Artificially Generated Data Set
Iris workflow from the EXAMPLES Server to generate:
- Existing first names and last names
- Existing streets and cities
- Income and age with binomial distribution (???)
- Gaussian random gender assignment
- PLZ (postal code) for certain groups of (age, income)
- Shopping basket: 5 insurance products assigned depending on income and age
- Target as 0/1 if customer bought lawyer insurance
- Lawyer assigned following purchase of lawyer insurance
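A generator along these lines can be sketched in plain Python. This is not the KNIME workflow mentioned on the slide. The binomial parameters, the value ranges and the purchase rule are all hypothetical, since the slide only states that income and age are binomially distributed, that gender comes from a Gaussian draw, and that products depend on income and age.

```python
import random

def binomial(rng, n, p):
    """Number of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))

def make_customer(rng):
    age = 18 + binomial(rng, 60, 0.4)             # hypothetical range 18..78
    income = 1000 * binomial(rng, 100, 0.35)      # hypothetical units
    gender = "f" if rng.gauss(0, 1) > 0 else "m"  # sign of a Gaussian draw
    # Hypothetical purchase rule; the slide only says products depend on income and age.
    bought = 1 if (income > 35000 and age > 40) else 0
    return {"age": age, "income": income, "gender": gender, "target": bought}

rng = random.Random(42)  # fixed seed so the sample is reproducible
customers = [make_customer(rng) for _ in range(1000)]
```

Because the seed is fixed, rerunning the generator yields the same data set, which helps when the guided workflow needs to be tested repeatedly.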

What we did ... and could have done
- Predictive modelling: using multiple models / smarter decision criteria / ensembles
- Clustering
- Time series
- Recommendation

What worked well
- The Guided packaging around a functional area
- The number of functions we could quickly make
- The mixing/matching to guide through the analytics
- Generating the data!
- Auditing

Guided Analytics: This was just a first step! And now:
- Wrapped workflows for standard tasks?
- Feature reduction, creation, etc.?
- Automated decisioning about methods?
- Data testing environment?
- Sharing, discussing, developing best practices.
Everyone at KNIME would love to discuss your ideas!

Material, white paper, etc.
A white paper on initial first steps:
- The approach
- The workflow
- The data generation
- The auditing

Guided Analytics: The User Perspective. Power to the People. Rosaria Silipo, Phil Winters, Christian Albrecht (KNIME)