CSC475 Music Information Retrieval

CSC475 Music Information Retrieval Tags and Music George Tzanetakis University of Victoria 2014 G. Tzanetakis 1 / 53

Table of Contents
1 Indexing music with tags
2 Tag acquisition
3 Autotagging
4 Evaluation
5 Ideas for future work

Tags
Definition: A tag is a short phrase or word that can be used to characterize a piece of music. Examples: bouncy, heavy metal, or hand drums.
Tags can be related to instruments, genres, emotions, moods, usages, geographic origins, musicological terms, or anything the users decide.
Similarly to a text index, a music index associates music documents with tags. A document can be a song, an album, an artist, a record label, etc. We consider songs/tracks to be our musical documents.

Music Index
Vocabulary   s1    s2    s3
happy        0.8   0.2   0.6
pop          0.7   0     0.1
a capella    0.1   0.1   0.5
saxophone    0     0.7   0.9
A query can either be a list of tags or a song. Using the music index, the system can return a playlist of songs that somehow match the specified tags.
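
A minimal sketch of a tag query against the example index above. The scoring rule (summing the affinities of the query tags) and the function name `query_by_tags` are assumptions made for illustration; the slides do not specify how matches are ranked.

```python
# Tag-by-song affinities copied from the example index on this slide.
MUSIC_INDEX = {
    "happy":     {"s1": 0.8, "s2": 0.2, "s3": 0.6},
    "pop":       {"s1": 0.7, "s2": 0.0, "s3": 0.1},
    "a capella": {"s1": 0.1, "s2": 0.1, "s3": 0.5},
    "saxophone": {"s1": 0.0, "s2": 0.7, "s3": 0.9},
}


def query_by_tags(index, tags):
    """Rank songs by the summed affinity to the query tags (assumed rule)."""
    scores = {}
    for tag in tags:
        for song, weight in index[tag].items():
            scores[song] = scores.get(song, 0.0) + weight
    # Highest score first; ties are broken by song name for determinism.
    return sorted(scores, key=lambda song: (-scores[song], song))


playlist = query_by_tags(MUSIC_INDEX, ["happy", "saxophone"])
print(playlist)
```

With this rule, a query for happy and saxophone ranks s3 first, because s3 has the highest combined affinity for the two tags.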

Tag research terminology
Cold-start problem: songs that are not annotated cannot be retrieved.
Popularity bias: popular songs (in the short head) tend to be annotated more thoroughly than unpopular songs (in the long tail).
Strong labeling versus weak labeling.
Extensible or fixed vocabulary.
Structured or unstructured vocabulary.
Evaluation is a big challenge due to subjectivity.
Tags generalize classification labels.

Many thanks to
Material for these slides was generously provided by: Mohamed Sordo, Emanuele Coviello, Doug Turnbull.

Tagging a song

Tagging multiple songs

Text query

Sources of Tags
Human participation: surveys, social tags, games
Automatic: text mining, autotagging

Survey
Pandora: a team of approximately 50 expert music reviewers (each with a degree in music and 200 hours of training) annotates songs using a structured vocabulary of between 150 and 200 tags. Tags are objective, i.e., there is a high degree of inter-reviewer agreement. Between 2000 and 2010, Pandora annotated about 750,000 songs. Annotation takes approximately 20-30 minutes per song.
CAL500: one song from each of 500 unique artists, each annotated by a minimum of 3 nonexpert reviewers using a structured vocabulary of 174 tags. Standard dataset for training and evaluating tag-based retrieval systems.

Harvesting social tags
Last.fm is a music discovery website that allows users to contribute social tags through a text box in their audio player interface. It is an example of crowdsourcing.
In 2007, 40 million active users built up a vocabulary of 960,000 free-text tags and used it to annotate millions of songs.
All data is available through a public web API.
Tags typically annotate artists rather than songs.
Problems include multiple spellings and polysemous tags (such as progressive).

Last.fm tags for Adele

Playing Annotation Games
At ISMIR 2007, music annotation games were presented for the first time: ListenGame, Tag-a-Tune, and MajorMiner.
ListenGame uses a structured vocabulary and is real time.
Tag-a-Tune and MajorMiner are inspired by the ESP Game for image tagging. In this approach, two players listen to a track and are asked to enter free-text tags until they both enter the same tag. This results in an extensible vocabulary.

Tag-a-tune

Mining web documents
There are many text sources of information associated with a music track. These include artist biographies, album reviews, song reviews, social media posts, and personal blogs.
The set of documents associated with a song is typically processed by text mining techniques, resulting in a vector space representation that can then be used as input to data mining/machine learning techniques (text mining will be covered in more detail in a future lecture).
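
One common way to obtain a vector space representation is TF-IDF weighting. The slides do not name a specific weighting scheme, so the sketch below is an illustration under that assumption; the three text snippets are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Map each tokenized document to a {term: tf-idf weight} dictionary."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical snippets of web text about three songs.
docs = [
    "mellow acoustic guitar ballad".split(),
    "heavy distorted guitar riff".split(),
    "mellow piano ballad".split(),
]
vectors = tfidf_vectors(docs)
print(vectors[0])
```

Terms that appear in few documents (such as acoustic) receive a larger weight than terms shared by several documents (such as guitar).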

cal500.sness.net

Audio feature extraction
Audio features for tagging are typically very similar to the ones used for audio classification, i.e., statistics of the short-time magnitude spectrum over different time scales.

Bag of words for text

Bag of words for audio

Multi-label classification (with twists)
Classic classification is single-label and multi-class. In multi-label classification each instance can be assigned more than one label.
Tag annotation can be viewed as multi-label classification with some additional twists:
  Synonyms (female voice, woman singing)
  Subpart relations (string quartet, classical)
  Sparse (only a small subset of tags applies to each song)
  Noisy
Useful because:
  Cold start problem
  Query-by-keywords

Machine Learning for Tag Annotation
A straightforward approach is to treat each tag independently as a classification problem.

Tag models
Identify songs associated with tag t
Merge all features either directly or by model merging
Estimate p(x | t)

Direct multi-label classifiers
Alternatives to individual tag classifiers:
  K-NN multi-label classifier: a straightforward extension that requires a strategy for label merging (union or intersection are possibilities)
  Multi-layer perceptron: simply train directly with multi-label ground truth
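
The K-NN variant can be sketched as follows. The Euclidean distance, the feature vectors, and the function name `knn_multilabel` are illustrative assumptions; only the union and intersection merging strategies come from the slide.

```python
import math


def knn_multilabel(train, query, k=3, merge="union"):
    """Predict a tag set for `query` from its k nearest training songs.

    `train` is a list of (feature_vector, tag_set) pairs.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    tag_sets = [tags for _, tags in neighbours]
    if merge == "union":
        return set.union(*tag_sets)
    if merge == "intersection":
        return set.intersection(*tag_sets)
    raise ValueError("merge must be 'union' or 'intersection'")


# Hypothetical 2-D audio features with their ground-truth tag sets.
train = [
    ((0.10, 0.20), {"rock", "guitar"}),
    ((0.20, 0.10), {"rock", "loud"}),
    ((0.15, 0.15), {"rock", "guitar", "loud"}),
    ((0.90, 0.80), {"jazz", "saxophone"}),
    ((0.80, 0.90), {"jazz", "piano"}),
]
query = (0.12, 0.18)
union_tags = knn_multilabel(train, query, k=3, merge="union")
common_tags = knn_multilabel(train, query, k=3, merge="intersection")
print(sorted(union_tags), sorted(common_tags))
```

Union merging favours recall (any tag of any neighbour is predicted), while intersection merging favours precision (only tags shared by all neighbours survive).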

Tag co-occurrence

Stacking

Stacking II

How can stacking help?

Other terms/variants
The main idea behind stacking, i.e., using the output of a classification stage as the input to a subsequent classification stage, has been proposed under several different names:
  Correction approach (using binary outputs)
  Anchor classification (for example, classification into artists used as a feature for genre classification)
  Semantic space retrieval
  Cascaded classification (in computer vision)
  Stacked generalization (in the classification literature)
  Context modeling (in autotagging)
  Cost-sensitive stacking (variant)
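
A minimal sketch of the two-stage idea, assuming the first stage has already produced a vector of tag affinities for every song. The second stage here is a nearest-centroid classifier over those vectors; this choice, the function names, and the toy numbers are assumptions for illustration, not the method used in the results shown in these slides.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train_stage2(stage1_outputs, ground_truth):
    """For each tag, store centroids of the stage-1 vectors of positive and negative songs.

    Assumes every tag has at least one positive and one negative training song.
    """
    models = []
    for t in range(len(ground_truth[0])):
        pos = [s for s, g in zip(stage1_outputs, ground_truth) if g[t] == 1]
        neg = [s for s, g in zip(stage1_outputs, ground_truth) if g[t] == 0]
        models.append((centroid(pos), centroid(neg)))
    return models


def stage2_predict(models, stage1_output):
    """Predict tag t when the stage-1 vector is closer to t's positive centroid."""
    return [
        1 if math.dist(stage1_output, pos) < math.dist(stage1_output, neg) else 0
        for pos, neg in models
    ]


# Hypothetical stage-1 affinities for the tags (female voice, woman singing).
stage1_train = [[0.9, 0.8], [0.4, 0.9], [0.1, 0.2], [0.2, 0.1]]
truth_train = [[1, 1], [1, 1], [0, 0], [0, 0]]
models = train_stage2(stage1_train, truth_train)

# Stage 1 is unsure about "female voice" (0.3) but confident about "woman singing".
prediction = stage2_predict(models, [0.3, 0.9])
print(prediction)
```

Because the second stage sees the whole affinity vector, a confident score for one tag can raise the prediction for a co-occurring tag that the first stage scored low.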

Combining taggers/bag of systems

Datasets
There are several datasets that have been used to train and evaluate auto-tagging. They differ in the amount of data they contain and the source of the ground truth tag information.
  MajorMiner
  Magnatagatune
  CAL500 (the most widely used one)
  CAL10K
  MediaEval
Reproducibility: a common dataset is not enough; ideally, exact details about the cross-validation folding process and evaluation scripts should also be included.

Magnatagatune
26K sound clips from magnatune.com
Human annotation from the Tag-a-tune game
Audio features from the Echo Nest
230 artists
183 tags

CAL-10K Dataset
Number of tracks: 10866
Tags: 1053 (genre and acoustic tags)
Tags/Track: min = 2, max = 25, µ = 10.9, σ = 4.57, median = 11
Most used tags: major key tonality (4547), acoustic rhythm guitars (2296), a vocal-centric aesthetic (2163), extensive vamping (2130)
Least used tags: cocky lyrics (1), psychedelic rock influences (1), breathy vocal sound (1), well-articulated trombone solo (1), lead flute (1)
Tags collected using survey
Available at: http://cosmal.ucsd.edu/cal/projects/annret/

Tagging evaluation metrics
The inputs to an autotagging evaluation metric are the predicted tags (a #tags by #tracks binary matrix) or tag affinities (a #tags by #tracks matrix of reals) and the associated ground truth (a binary matrix).
Asymmetry between positives and negatives makes classification accuracy a poor metric. Retrieval metrics are better choices.
If the output of the auto-tagging system is affinities, then many metrics require binarization.
Common binarization variants: select the k top-scoring tags for each track, or threshold the affinities of each tag so that its frequency matches the tag prior in the training set.
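
Both binarization variants can be sketched as follows. The matrix orientation (rows are tracks, columns are tags), the function names, and the toy affinities and priors are assumptions for illustration.

```python
def binarize_top_k(affinities, k):
    """Keep the k highest-scoring tags of each track (rows = tracks, columns = tags)."""
    binary = []
    for row in affinities:
        top = sorted(range(len(row)), key=lambda j: -row[j])[:k]
        binary.append([1 if j in top else 0 for j in range(len(row))])
    return binary


def binarize_to_priors(affinities, priors):
    """For each tag, mark the top-scoring fraction of tracks given by its training prior."""
    n_tracks = len(affinities)
    binary = [[0] * len(priors) for _ in range(n_tracks)]
    for j, prior in enumerate(priors):
        n_positive = round(prior * n_tracks)
        ranked = sorted(range(n_tracks), key=lambda i: -affinities[i][j])
        for i in ranked[:n_positive]:
            binary[i][j] = 1
    return binary


# Hypothetical affinities for 4 tracks and 3 tags.
affinities = [
    [0.9, 0.2, 0.4],
    [0.1, 0.8, 0.7],
    [0.6, 0.3, 0.5],
    [0.2, 0.1, 0.9],
]
top2 = binarize_top_k(affinities, k=2)
matched = binarize_to_priors(affinities, priors=[0.5, 0.25, 0.5])
print(top2)
print(matched)
```

Top-k binarization fixes the number of tags per track, whereas prior matching fixes the number of tracks per tag, so the two can produce quite different binary matrices from the same affinities.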

Annotation vs retrieval
One possibility would be to convert the matrices into vectors and then use classification evaluation metrics. This approach has the disadvantage that popular tags will dominate, and performance on less-frequent tags (which one could argue are more important) will be irrelevant.
Therefore the common approach is to treat each tag column separately and then average across tags (retrieval), or alternatively to treat each track row separately and average across tracks (annotation).
Validation schemes are similar to classification: cross-validation, repeated cross-validation, and bootstrapping.

Annotation Metrics
Based on counting TP, FP, TN, FN:
  Precision
  Recall
  F-measure
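
A sketch of the counting-based metrics, together with the three ways of aggregating them over a tag matrix (flattened, per tag, per track). Rows are assumed to be tracks and columns tags; returning 0.0 when a denominator is zero is an assumed convention, and the toy data is hypothetical.

```python
def precision_recall_f(truth, predicted):
    """Precision, recall and F-measure of two equal-length binary lists."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def per_tag_f(truth, predicted):
    """Average F-measure over tag columns."""
    n_tags = len(truth[0])
    scores = [
        precision_recall_f([r[j] for r in truth], [r[j] for r in predicted])[2]
        for j in range(n_tags)
    ]
    return sum(scores) / n_tags


def per_track_f(truth, predicted):
    """Average F-measure over track rows."""
    scores = [precision_recall_f(t, p)[2] for t, p in zip(truth, predicted)]
    return sum(scores) / len(scores)


def flattened_f(truth, predicted):
    """F-measure after flattening both matrices into vectors."""
    flat_truth = [v for row in truth for v in row]
    flat_predicted = [v for row in predicted for v in row]
    return precision_recall_f(flat_truth, flat_predicted)[2]


# Toy data: column 0 is a popular tag, column 1 a rare tag that is never predicted.
truth = [[1, 0], [1, 0], [1, 0], [1, 1]]
predicted = [[1, 0], [1, 0], [1, 0], [1, 0]]
print(flattened_f(truth, predicted), per_tag_f(truth, predicted), per_track_f(truth, predicted))
```

In this toy case the flattened score stays high even though the rare tag is missed entirely, while the per-tag average drops to 0.5, which illustrates why popular tags dominate the flattened view.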

Annotation Metrics based on rank
When using affinities it is possible to use rank correlation metrics:
  Spearman's rank correlation coefficient ρ
  Kendall's tau τ
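
Both coefficients can be computed directly from two lists of scores. The sketch below assumes there are no tied values (it uses the simple Spearman formula and Kendall's tau-a); the predicted and reference affinities are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """Rank positions (1 = largest value); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rho(x, y):
    """Spearman's rho from squared rank differences (valid without ties)."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical affinities of one track for four tags.
predicted = [0.9, 0.1, 0.5, 0.3]
reference = [0.8, 0.2, 0.3, 0.6]
rho = spearman_rho(predicted, reference)
tau = kendall_tau(predicted, reference)
print(rho, tau)
```

Here only one pair of tags is ordered differently in the two lists, so both coefficients are positive but below 1.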

Retrieval measures - Mean Average Precision
Precision at N is the number of relevant songs among the top N retrieved, divided by N.
Rather than choosing N, one can average precision for different N and then take the mean over a set of queries (tags).
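
A short sketch of these measures. It assumes the usual convention of averaging precision at the ranks where relevant songs appear; the slide only says "different N", so this choice, like the toy rankings, is an assumption.

```python
def precision_at_n(ranked_relevance, n):
    """Fraction of the top n retrieved songs that are relevant."""
    return sum(ranked_relevance[:n]) / n


def average_precision(ranked_relevance):
    """Mean of precision at each rank where a relevant song appears."""
    hits = [n for n, rel in enumerate(ranked_relevance, start=1) if rel]
    if not hits:
        return 0.0
    return sum(precision_at_n(ranked_relevance, n) for n in hits) / len(hits)


def mean_average_precision(rankings):
    """Mean of the average precision over all queries (tags)."""
    return sum(average_precision(r) for r in rankings.values()) / len(rankings)


# Hypothetical ranked results per tag query: 1 = relevant song, 0 = not relevant.
rankings = {
    "happy": [1, 0, 1, 0],
    "saxophone": [0, 1, 1, 0],
}
map_score = mean_average_precision(rankings)
print(map_score)
```

The happy query scores higher than the saxophone query because its first relevant song is ranked at the very top.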

Retrieval measures - AUC-ROC

Stacking results I

Stacking results II

Stacking results III

Stacking results IV

Stacking results V

MIREX Tag Annotation Task
The Music Information Retrieval Evaluation Exchange (MIREX) audio tag annotation task started in 2008
MajorMiner dataset (2300 tracks, 45 tags)
Mood tag dataset (6490 tracks, 135 tags)
10 second clips
3-fold cross-validation
Binary relevance (F-measure, precision, recall)
Affinity ranking (AUC-ROC, Precision at 3, 6, 9, 12, 15)

MIREX 2012 F-measure

MIREX 2012 AUC-ROC

History of MIREX tagging

Open questions
Should the tag annotations be sanitized or should the machine learning part handle it?
Do auto-taggers generalize outside their collections?
Stacking seems to improve results (even though one paper has shown no improvement). How does stacking perform when dealing with synonyms, antonyms, noisy annotations? Why?
How can multiple sources of tags be combined?

Future work
Weak labeling: in most cases the absence of a tag does NOT imply that the tag would not be considered valid by most users
Explore a continuous grading of semi-supervised learning where the distinction between supervised and unsupervised is not binary
Explore feature clustering of untagged instances
Include additional sources of information (separate from tags) such as artist, genre, album
Multiple instance learning approaches (for example, if genre information is available at the album level)
Statistical relational learning

Future work
The lukewarm start problem: what if some tags are known for the testing data but not all?
  Missing label type of approaches such as EM
  Markov logic inference in structured data
Other ideas:
  Online learning where tags enter the system incrementally and individually rather than all at the same time or for a particular instance
  Taking into account user behavior when interacting with a tag system
  Personalization vs Crowd: would clustering users based on their tagging make sense?