Combining Multi-Engine Machine Translation and Online Learning through Dynamic Phrase Tables

Rico Sennrich
University of Zurich, Institute of Computational Linguistics
30.05.2011

Overview

Multi-Engine Machine Translation
Combine output of multiple translation systems
- Motivation
- Implementation
- Results

Online Learning
In post-editing environment: (partially) retrain system on corrected translation
- Similar implementation as multi-engine MT; results and combination with multi-engine MT

Multi-Engine MT: Setting

Text+Berg Corpus
- Collection of Alpine texts (publication of the Swiss Alpine Club since 1864)
- Since 1957: parallel edition DE-FR; parallel corpus of 4 million tokens
- Research project: domain-specific SMT

System                   BLEU    METEOR
in-domain SMT system     17.18   38.28
Personal Translator 14   13.29   35.68
Google Translate         12.94   34.36

Table: MT performance DE→FR.

Domain-specific translations

DE          Text+Berg                        Europarl
Angriff     tentative ([climbing] attempt)   attaque (attack)
Führer      guide (guide)                    dirigeant (leader)
Pass        col (mountain pass)              passeport (passport)
Spitze      pointe (peak)                    tête (head [of an organisation])
Vorsprung   ressaut (ledge)                  avance (lead)

Multi-Engine MT: Motivation

Do we need a full-fledged SMT system for system combination?
- In WMT system combination tasks, approaches that do not consider source text still work well.
- Target side alignment; confusion network decoding with LM
- Examples: MANY [Bar10], MEMT [HL10]
- Let's see if it helps...

Our observations:
- In-domain system suffers from data-sparseness (high OOV rate).
- Out-of-domain and rule-based systems are worse than in-domain system, but have greater lexical coverage.

Our conclusions:
- Promising strategy: prefer in-domain system for phrases it knows, and choose other systems otherwise.
- We hope to profit from source-side information and source-target alignment.
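The data-sparseness observation can be made concrete by measuring the out-of-vocabulary (OOV) rate of a test set against the training vocabulary. The following is a minimal sketch, assuming whitespace-tokenised text; the function name `oov_rate` and the toy sentences are illustrative and do not come from the talk.

```python
def oov_rate(train_sentences, test_sentences):
    """Fraction of test tokens that never occur in the training data."""
    # Vocabulary of the (hypothetical) in-domain training corpus.
    vocab = {tok for sent in train_sentences for tok in sent.split()}
    test_tokens = [tok for sent in test_sentences for tok in sent.split()]
    if not test_tokens:
        return 0.0
    unknown = sum(1 for tok in test_tokens if tok not in vocab)
    return unknown / len(test_tokens)


# Toy data, invented for illustration only.
train = ["der Führer kennt den Pass", "die Spitze ist nah"]
test = ["der Führer ist ein Konditionswunder"]
print(oov_rate(train, test))
```

A high value on in-domain test data would suggest that external systems with greater lexical coverage have room to contribute.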

Implementation: Architecture

- Moses framework
- Primary system trained on in-domain training data
- Translation hypotheses are integrated through additional phrase table (alternative translation path during decoding)
- Optimization with MERT

Implementation: Related Work

This architecture is similar to [CEF+07].

Image source: Chen et al. (2007): Multi-Engine Machine Translation with an Open-Source (SMT) Decoder. In Proceedings of the Second Workshop on Statistical Machine Translation.

Implementation: Training secondary phrase table

- Trained on translation hypotheses for sentences to be translated
- Dynamic (re-)training for any number of sentences
- Word alignment with MGIZA++ (using existing model from primary system)
- Phrase extraction with Moses heuristics
- Features in phrase table: p(t|s), p(s|t), lexical weights lex(t|s), lex(s|t) (and constant phrase penalty)
- Two different scoring methods to obtain feature values: vanilla and modified
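As a rough illustration of how the two phrase translation probabilities can be estimated from extracted phrase pairs, here is a minimal relative-frequency sketch. It is not the Moses scoring code: lexical weights and the phrase penalty are left out, and the function name and toy phrase pairs are assumptions made for this example.

```python
from collections import Counter


def mle_phrase_scores(phrase_pairs):
    """Relative-frequency estimates (p(t|s), p(s|t)) for each phrase pair.

    phrase_pairs: list of (source_phrase, target_phrase) tuples, one entry
    per extraction, so repeated pairs raise the counts.
    """
    pair_counts = Counter(phrase_pairs)
    src_counts = Counter(s for s, _ in phrase_pairs)
    tgt_counts = Counter(t for _, t in phrase_pairs)
    return {
        (s, t): (c / src_counts[s], c / tgt_counts[t])
        for (s, t), c in pair_counts.items()
    }


# Toy extractions, invented for illustration only.
pairs = [("Pass", "col"), ("Pass", "col"), ("Pass", "passeport"), ("Führer", "guide")]
scores = mle_phrase_scores(pairs)
print(scores[("Pass", "col")])
```

With counts this small the estimates jump between values such as 1/3, 2/3 and 1, which is the weakness that the modified scoring method is meant to address.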

Implementation: Scoring

Vanilla scoring
- Scoring of phrase pairs as implemented in Moses
- Calculations based on Maximum-Likelihood Estimation (MLE)
- Problem: MLE is unreliable if frequencies are low (1/1, 1/2)

Modified scoring
- Add frequencies of primary and secondary corpus
- Secondary corpus has little effect if phrase is frequent in primary corpus: 500/1000 = 0.5 vs. (500+2)/(1000+2) = 0.501
- Secondary corpus has large effect if phrase is rare in primary corpus: 1/3 = 0.333 vs. (1+2)/(3+2) = 0.6
- Fits our strategy of preferring primary corpus where possible, and considering external hypotheses for rare/unknown words
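The arithmetic on this slide can be reproduced with a few lines of code. This is a sketch that only follows the slide's description (add the primary and secondary counts before dividing); the function and parameter names are assumptions, and the actual implementation may differ in detail.

```python
def vanilla_score(pair_count, source_count):
    """Plain MLE estimate from a single corpus."""
    return pair_count / source_count


def modified_score(primary_pair, primary_source, secondary_pair, secondary_source):
    """MLE estimate after adding primary and secondary corpus frequencies."""
    return (primary_pair + secondary_pair) / (primary_source + secondary_source)


# Phrase frequent in the primary corpus: the secondary counts barely matter.
print(vanilla_score(500, 1000), round(modified_score(500, 1000, 2, 2), 3))

# Phrase rare in the primary corpus: the secondary counts shift the estimate.
print(round(vanilla_score(1, 3), 3), modified_score(1, 3, 2, 2))
```

Scored on the secondary corpus alone, the same phrase pair would receive 2/2 = 1.0 in both cases, whatever the primary corpus says about it.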

Evaluation: Systems

- Software from WMT 2010 system combination shared task. Dominant paradigm: output alignment and confusion network decoding
  - MANY (Loïc Barrault) [Bar10]
  - MEMT (Kenneth Heafield) [HL10]
- Concatenation of parallel training corpus and translation hypotheses (slow)
- Dynamic, vanilla scoring
- Dynamic, modified (re-)scoring

Results: Combination

System                   BLEU    METEOR
Personal Translator 14   13.29   35.68
Google Translate         12.94   34.36
in-domain SMT system     17.18   38.28
MANY                     18.23   39.68
MEMT                     18.39   39.01
Concat                   19.11   39.45
Dynamic (vanilla)        19.33   40.00
Dynamic (modified)       20.06   40.59

Table: SMT performance DE→FR for multiple system combination approaches.

Results: Performance with Varying Phrase Table Size

[Plot: BLEU (y-axis) against dynamic corpus size in sentence pairs (x-axis, 2 to 4k) for modified scoring, vanilla scoring, and the baseline.]

Figure: SMT performance DE→FR as a function of dynamic phrase table size. Comparison of vanilla scoring and modified scoring.

Results: Multi-Engine MT

- Multi-engine MT gives large performance boost (2.9 BLEU points over best individual system)
- Re-scoring with frequencies from primary corpus is effective:
  - Performance gain over vanilla scoring (0.7 BLEU points)
  - Performance does not degrade if secondary corpus is small
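The two gains quoted here follow directly from the BLEU scores in the combination results table. A small check, using only scores reported in the talk (the variable names are illustrative):

```python
# BLEU scores copied from the combination results table.
bleu = {
    "Personal Translator 14": 13.29,
    "Google Translate": 12.94,
    "in-domain SMT system": 17.18,
    "Dynamic (vanilla)": 19.33,
    "Dynamic (modified)": 20.06,
}

individual = ["Personal Translator 14", "Google Translate", "in-domain SMT system"]
best_individual = max(bleu[name] for name in individual)

# Gain of the best combination over the best single system, and over vanilla scoring.
gain_over_best = round(bleu["Dynamic (modified)"] - best_individual, 1)
gain_over_vanilla = round(bleu["Dynamic (modified)"] - bleu["Dynamic (vanilla)"], 1)
print(gain_over_best, gain_over_vanilla)
```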

Examples

Source:                      Er ist ein Konditionswunder. (He is in miraculous shape.)
Reference:                   C'est un miracle de condition physique.
System 1 (Moses):            C'est un Konditionswunder.
System 2 (PT 14):            C'est un miracle de condition.
System 3 (Google Translate): Il est un miracle de remise en forme.
Multi-Engine (vanilla):      C'est un miracle de condition.
Multi-Engine (modified):     C'est un miracle de condition.

Examples

Source:                      Wir konnten das Aussehen der Pässe nur ahnen. (We could only guess at the look of the mountain passes.)
Reference:                   Nous ne pouvions que deviner l'aspect des cols.
System 1 (Moses):            nous ne pouvions seulement deviner l'aspect des cols.
System 2 (PT 14):            Nous ne pouvions que nous douter de l'air des passeports.
System 3 (Google Translate): Nous ne pouvions imaginer l'aspect de la passe.
Multi-Engine (vanilla):      nous ne pouvions de l'air des cols de la passe.
Multi-Engine (modified):     nous ne pouvions l'aspect des cols que deviner.

Online Learning

Learning from Previous Translations
- In post-editing environment, how can we use previous, corrected translations to improve SMT quality?
- Hardt and Elming [HE10] propose incremental re-training of secondary phrase table. This is the same principle that we used for multi-engine MT.

Implementation
- We simulate approach with reference translations instead of actual post-editing.
- Alignment/scoring as for multi-engine MT, but with previous reference translations instead of translation hypotheses.
- Phrase table is dynamically rebuilt after each sentence.
- No new MERT; instead, both phrase tables use baseline weights.
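The simulation described above can be sketched as a simple loop. This only illustrates the control flow: `translate` and `rebuild_phrase_table` are hypothetical callables standing in for the decoder and for the alignment, extraction and scoring pipeline, and the toy stubs below are not an SMT system.

```python
def simulate_online_learning(test_set, translate, rebuild_phrase_table):
    """Translate sentence by sentence, learning from each reference afterwards.

    test_set: list of (source, reference) pairs.
    translate: hypothetical callable (source, dynamic_table) -> hypothesis.
    rebuild_phrase_table: hypothetical callable (list of pairs) -> dynamic table.
    """
    seen_pairs = []
    dynamic_table = rebuild_phrase_table(seen_pairs)
    hypotheses = []
    for source, reference in test_set:
        # The sentence is translated before its reference is revealed.
        hypotheses.append(translate(source, dynamic_table))
        # The reference stands in for a post-edited translation.
        seen_pairs.append((source, reference))
        # The dynamic phrase table is rebuilt after each sentence.
        dynamic_table = rebuild_phrase_table(seen_pairs)
    return hypotheses


# Toy stubs: the "phrase table" is a dict, the "decoder" a lookup.
toy_test_set = [("Pass", "col"), ("Pass", "col")]
result = simulate_online_learning(
    toy_test_set,
    translate=lambda src, table: table.get(src, "<unk>"),
    rebuild_phrase_table=dict,
)
print(result)
```

In this toy run, the second occurrence of a source sentence benefits from the reference seen for the first one, which is why text-internal repetition matters for this approach.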

Online Learning

System             BLEU    METEOR
baseline           17.18   38.28
vanilla scoring    16.81   37.61
modified scoring   17.57   38.60

Table: SMT performance DE→FR with online learning system.

Combination of Multi-Engine MT and Online Learning

System            BLEU    METEOR
baseline          17.18   38.28
online learning   17.57   38.60
multi-engine MT   19.93   40.52
combined          20.05   40.61

Table: SMT performance DE→FR with system combining multi-engine MT and online learning.

Results: Online Learning & Combination

- Online learning led to relatively small performance gain
- Incremental re-training more effective for texts with high text-internal repetition (Hardt and Elming [HE10], clinical trial protocols: 4 BLEU points increase)
- Combination of multi-engine MT and online learning possible, but no performance gain in this evaluation

Conclusion

Final Comments
- Multi-engine MT is simple to implement, and promising for people/companies with little training data.
- In-domain system is more than Yet Another Hypothesis
- Approach has strong dependence on primary corpus: your mileage may vary
- Online learning experiments (and combination of both) were below expectations: not necessarily a failure of the technique, but applied to the wrong corpus.

Thank you for your attention!

[Bar10] Barrault, Loïc: MANY: Open source MT system combination at WMT'10. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, pages 277-281, Uppsala, Sweden, July 2010. Association for Computational Linguistics. http://www.aclweb.org/anthology/w10-1740.

[BL05] Banerjee, Satanjeev and Alon Lavie: METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65-72, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics. http://www.aclweb.org/anthology/w/w05/w05-0909.

[CEF+07] Chen, Yu, Andreas Eisele, Christian Federmann, Eva Hasler, Michael Jellinghaus, and Silke Theison: Multi-engine machine translation with an open-source decoder for statistical machine translation. In Proceedings of the Second Workshop on Statistical Machine Translation, StatMT '07, pages 193-196, Morristown, NJ, USA, 2007. Association for Computational Linguistics. http://portal.acm.org/citation.cfm?id=1626355.1626381.

[HE10] Hardt, Daniel and Jakob Elming: Incremental re-training for post-editing SMT. In Conference of the Association for Machine Translation in the Americas 2010 (AMTA 2010), Denver, CO, USA, 2010.

[HL10] Heafield, Kenneth and Alon Lavie: CMU multi-engine machine translation for WMT 2010. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, WMT '10, pages 301-306, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics, ISBN 978-1-932432-71-8. http://portal.acm.org/citation.cfm?id=1868850.1868894.