Accuracy of imputed 50k genotypes from 3k and 6k chips using FImpute version 2

Sargolzaei, M. (1,2), Schenkel, F. (2) and Chesnais, J. (1)
(1) L'Alliance Boviteq, Saint-Hyacinthe, QC, Canada
(2) University of Guelph, Centre for Genetic Improvement of Livestock, Guelph, ON, Canada

Dairy Cattle Breeding and Genetics Committee Meeting, September 13, 2011.

Introduction

The accuracy of GPA depends on many factors, and the quality of the input data is one of the most important. Genomic evaluation uses two sources of data: 1) phenotypes (or proofs derived from them) and 2) genotypes. Traditional proofs (EBVs) used in the estimation set tend to be quite reliable. Genotypes received from the laboratory are also very accurate.

Four different bovine SNP chips are currently commercially available, namely HD, 50kV1, 50kV2 and 3k. A new 6k SNP chip will be available soon. The accuracy of HD and 50k genotypes is extremely high. However, the GoldenGate technology used in the 3k chip results in around 1% genotyping errors. The 3k panel has been used extensively for inexpensive genotyping of a large number of animals, so that more producers can afford the cost of genotyping. However, genotypes imputed from the 3k SNP chip contain more errors, which may affect GPA accuracy for certain animals.

Imputation from 3k has been very successful for animals with immediate genotyped ancestors (in Holstein, most animals are in this group). However, animals with missing pedigree or ungenotyped parents seem to have lower imputation accuracy from 3k to 50k, mainly because of genotyping errors and the difficulty of determining the gametic phase using the 3k panel. The density of the 3k panel is sufficient for capturing close linkage (family) information but not high enough to capture short-range linkage disequilibrium. Imputation accuracy therefore depends mainly on the relationship between the density of the low-density (LD) chip and the number of generations separating an animal from its genotyped ancestors. With a denser LD panel, more information can be recovered from distant 50k or HD genotyped ancestors. Therefore a new LD panel (6k) has been developed by Illumina with Infinium technology to reduce the genotyping error rate and increase genome coverage.

In the present study, the accuracy of imputation from 6k to 50k was assessed and compared to that of the 3k SNP chip.

Materials and Methods

Holstein (HO), Jersey (JE) and Brown Swiss (BS) data sets from the Canadian Dairy Network (CDN) August genomic run were used. There were 42,503 usable SNP on the 50k panel. Table 1 shows simple statistics for each breed. The data set was divided into a reference and a validation group. The validation group included 50k genotyped animals born after 2009 for HO and JE and after 2008 for BS. Three scenarios for validation animals were considered:

1. 2,641 SNP were kept and the rest of the genotypes were set to missing. 1% error was randomly simulated on the validation animals' genotypes.

2. 2,641 SNP were kept and the rest of the genotypes were set to missing (no errors were simulated).
3. 6,701 SNP were kept and the rest of the genotypes were set to missing.

The reference group for family imputation included all other animals, including 3k animals. The reference group for population imputation consisted of 2,000 individuals for HO and JE, and 1,553 for BS.

The genotypes of validation animals (3k/6k) were imputed to 50k based on the information from the reference animals using FImpute version 2. Imputation was done in 3 steps:

a. Genotypes known with certainty were filled.
b. Family imputation was carried out.
c. Population imputation based on haplotypes from the second step was performed.

After imputation, the correct, incorrect and missing call rates were computed for all originally called SNP (low-density SNP were included).

Table 1 - Statistics

Breed         Total     50k      3k       Val      No. ped cnfl *   No. ref **
Holstein      106,437   67,160   39,277   20,031   893              2,000
Jersey        13,248    5,786    7,462    1,289    239              2,000
Brown Swiss   2,335     2,031    305      209      9                1,553

* No. of pedigree conflicts removed (Mendelian error rate >2%).
** No. of reference individuals for population imputation.

Results:

Table 2 - Overall imputation accuracy (% correct) - 3k vs 6k

Breed         3k+1% error   6k      Gain
Holstein      97.81         99.47   1.66
Jersey        97.07         99.12   2.05
Brown Swiss   95.91         98.97   3.06
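The validation design described in Materials and Methods (mask 50k genotypes down to a low-density panel, optionally add errors, then score the imputed result) can be sketched as follows. This is a minimal illustration and not the authors' code: the 0/1/2 genotype coding, the `MISSING` value, the function names, and the way an erroneous call is drawn (uniformly from the two other genotypes) are all assumptions, since the report does not say how the 1% error was simulated.

```python
import numpy as np

MISSING = -1  # hypothetical code for an uncalled genotype


def mask_to_low_density(geno_50k, ld_index, error_rate=0.0, rng=None):
    """Keep only the low-density SNP columns and set all others to missing.

    geno_50k : (animals x SNP) integer array coded 0/1/2 (assumed coding)
    ld_index : column indices of the SNP that are on the low-density panel
    error_rate : fraction of kept calls replaced by a different genotype
    """
    rng = np.random.default_rng() if rng is None else rng
    masked = np.full_like(geno_50k, MISSING)
    kept = geno_50k[:, ld_index].copy()
    if error_rate > 0:
        # Only called genotypes can receive a simulated error.
        flip = (rng.random(kept.shape) < error_rate) & (kept != MISSING)
        # Adding 1 or 2 modulo 3 always yields one of the two other genotypes.
        shift = rng.integers(1, 3, size=kept.shape)
        kept = np.where(flip, (kept + shift) % 3, kept)
    masked[:, ld_index] = kept
    return masked


def call_rates(true_geno, imputed_geno):
    """Return (% correct, % incorrect, % missing) over originally called SNP."""
    called = true_geno != MISSING
    n_called = called.sum()
    missing = called & (imputed_geno == MISSING)
    correct = called & (imputed_geno == true_geno)
    incorrect = called & ~missing & ~correct
    return tuple(100.0 * x.sum() / n_called for x in (correct, incorrect, missing))


if __name__ == "__main__":
    truth = np.array([[0, 1, 2, 1],
                      [2, 2, 0, MISSING]])
    imputed = np.array([[0, 1, 1, MISSING],
                        [2, 2, 0, 0]])
    print(call_rates(truth, imputed))
```

In this sketch the three rates always sum to 100 because every originally called genotype falls into exactly one of the three classes, which matches the way the Correct, Incorrect and Missing columns in the tables add up.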

Table 3 - Imputation accuracy (%) for different scenarios - Holstein

                          3k (no error)                3k + 1% error                6k
Sire     Dam      No.     Correct  Incorrect  Missing  Correct  Incorrect  Missing  Correct  Incorrect  Missing
50k      50k      12,593  99.17    0.83       0.00     98.77    1.23       0.00     99.71    0.29       0.00
50k      3k       903     97.94    1.88       0.18     97.53    2.27       0.20     98.99    0.77       0.24
50k      0k       6,081   97.17    2.83       0.01     96.29    3.71       0.01     99.17    0.83       0.00
50k      Unknown  19      95.12    4.88       0.00     93.99    6.00       0.01     98.55    1.45       0.00
3k       50k      34      96.48    3.50       0.02     95.98    3.97       0.05     98.02    1.94       0.05
3k       3k       18      96.50    3.42       0.08     95.96    3.92       0.13     97.98    1.92       0.10
0k       50k      121     96.61    3.36       0.03     95.96    4.01       0.03     98.65    1.34       0.01
0k       3k       11      93.67    6.19       0.14     92.87    6.95       0.18     97.41    2.40       0.19
0k       0k       157     90.33    9.66       0.02     88.58    11.40      0.02     96.92    3.07       0.01
Unknown  50k      21      95.85    4.13       0.01     94.70    5.28       0.03     98.79    1.21       0.00
Unknown  3k       4       94.22    5.59       0.19     92.94    6.85       0.21     98.22    1.67       0.11
Unknown  0k       35      90.09    9.90       0.00     88.31    11.68      0.01     96.77    3.21       0.02
Unknown  Unknown  34      92.46    7.53       0.01     90.32    9.67       0.01     97.99    2.01       0.00
Overall           20,031  98.37    1.61       0.01     97.81    2.18       0.01     99.47    0.52       0.01
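The Overall row of Table 3 can be checked against the group rows. The sketch below assumes that every animal contributes equally, so that the overall rate is the mean of the group rates weighted by the number of animals; the report does not state how the Overall row was computed, and the helper name `weighted_mean` is illustrative. The figures are the Holstein 6k "Correct" values copied from Table 3.

```python
# (number of animals, % correct with the 6k panel) for each sire/dam group,
# copied from the Holstein rows of Table 3.
groups_6k = [
    (12593, 99.71), (903, 98.99), (6081, 99.17), (19, 98.55),
    (34, 98.02), (18, 97.98),
    (121, 98.65), (11, 97.41), (157, 96.92),
    (21, 98.79), (4, 98.22), (35, 96.77), (34, 97.99),
]


def weighted_mean(groups):
    """Mean of group percentages weighted by the number of animals."""
    total = sum(n for n, _ in groups)
    return sum(n * pct for n, pct in groups) / total


if __name__ == "__main__":
    print(sum(n for n, _ in groups_6k))        # 20031 animals in total
    print(round(weighted_mean(groups_6k), 2))  # 99.47, the reported Overall value
```

Under this assumption the weighted mean reproduces the reported 99.47%, which also shows why the overall Holstein figure is dominated by the large groups whose sire is genotyped with the 50k panel.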

Table 4 - Number of animals with high error rate and missing rate for different scenarios - Holstein

                          3k (no error)                3k + 1% error                6k
Sire     Dam      No.     >5%Err   >10%Err    >5%Miss  >5%Err   >10%Err    >5%Miss  >5%Err   >10%Err    >5%Miss
50k      50k      12,593  5        1          0        5        1          0        2        0          0
50k      3k       903     7        3          0        7        3          0        3        0          0
50k      0k       6,081   470      15         0        1,574    21         0        13       3          0
50k      Unknown  19      5        0          0        15       1          0        0        0          0
3k       50k      34      8        3          0        9        3          0        5        0          0
3k       3k       18      3        1          0        3        2          0        3        0          0
0k       50k      121     24       14         0        32       15         0        11       0          0
0k       3k       11      5        1          0        7        2          0        1        0          0
0k       0k       157     144      65         0        151      103        0        15       4          0
Unknown  50k      21      4        0          0        15       0          0        0        0          0
Unknown  3k       4       2        0          0        4        0          0        0        0          0
Unknown  0k       35      31       11         0        35       19         0        4        2          0
Unknown  Unknown  34      32       2          0        34       11         0        0        0          0
Overall           20,031  740      116        0        1,891    181        0        57       9          0
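Counts such as those in Table 4 can be derived from per-animal rates with a simple threshold count. The sketch below is illustrative only: the function name and the toy input values are made up. It treats the thresholds as cumulative (an animal above 10% is also counted above 5%), which is consistent with Table 4, where the sire 0k / dam 0k group under 3k + 1% error lists 151 and 103 animals out of only 157.

```python
def count_flagged(error_pct, missing_pct):
    """Count animals whose per-animal rates exceed the Table 4 thresholds.

    error_pct, missing_pct : per-animal incorrect and missing rates in percent.
    The >5% error count includes the animals that also exceed 10%.
    """
    return {
        ">5%Err": sum(e > 5 for e in error_pct),
        ">10%Err": sum(e > 10 for e in error_pct),
        ">5%Miss": sum(m > 5 for m in missing_pct),
    }


if __name__ == "__main__":
    # Toy per-animal rates, not taken from the report.
    errors = [0.4, 5.2, 11.7, 9.9, 2.1]
    missing = [0.0, 0.1, 0.0, 6.3, 0.0]
    print(count_flagged(errors, missing))
```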

Table 5 - Imputation accuracy (%) for different scenarios - Jersey

                          3k (no error)                3k + 1% error                6k
Sire     Dam      No.     Correct  Incorrect  Missing  Correct  Incorrect  Missing  Correct  Incorrect  Missing
50k      50k      477     99.14    0.86       0.00     98.80    1.20       0.00     99.67    0.33       0.00
50k      3k       298     97.57    2.20       0.23     97.18    2.57       0.25     98.75    0.97       0.29
50k      0k       463     96.65    3.33       0.02     95.80    4.18       0.02     98.94    1.06       0.00
3k       3k       2       97.08    2.54       0.38     96.78    2.82       0.40     98.43    1.22       0.35
3k       0k       3       95.85    4.13       0.02     94.79    5.19       0.03     98.73    1.22       0.05
0k       50k      8       95.55    4.45       0.00     94.33    5.66       0.00     98.38    1.62       0.00
0k       3k       4       90.27    9.56       0.17     89.30    10.49      0.20     95.59    4.25       0.16
0k       0k       25      92.04    7.95       0.01     90.50    9.47       0.04     97.44    2.55       0.01
Unknown  0k       6       92.68    7.32       0.00     91.19    8.81       0.00     97.97    2.02       0.01
Unknown  Unknown  3       92.94    6.85       0.21     91.59    8.10       0.31     97.73    2.23       0.04
Overall           1,289   97.64    2.30       0.06     97.07    2.87       0.07     99.12    0.82       0.07

Table 6 - Number of animals with high error rate and missing rate for different scenarios - Jersey

                          3k (no error)                3k + 1% error                6k
Sire     Dam      No.     >5%Err   >10%Err    >5%Miss  >5%Err   >10%Err    >5%Miss  >5%Err   >10%Err    >5%Miss
50k      50k      477     0        0          0        0        0          0        0        0          0
50k      3k       298     3        0          0        5        0          0        0        0          0
50k      0k       463     30       3          0        87       3          0        3        2          0
3k       3k       2       0        0          0        0        0          0        0        0          0
3k       0k       3       0        0          0        2        0          0        0        0          0
0k       50k      8       1        1          0        4        1          0        0        0          0
0k       3k       4       3        2          0        4        2          0        2        0          0
0k       0k       25      22       4          0        25       7          0        2        0          0
Unknown  0k       6       5        1          0        6        1          0        1        0          0
Unknown  Unknown  3       3        0          0        3        0          0        0        0          0
Overall           1,289   67       11         0        136      14         0        8        2          0

Table 7 - Imputation accuracy (%) for different scenarios - Brown Swiss

                          3k (no error)                3k + 1% error                6k
Sire     Dam      No.     Correct  Incorrect  Missing  Correct  Incorrect  Missing  Correct  Incorrect  Missing
50k      50k      25      98.97    1.03       0.00     98.61    1.39       0.00     99.57    0.43       0.00
50k      3k       5       97.46    2.42       0.12     97.09    2.79       0.12     98.71    1.00       0.29
50k      0k       164     96.75    3.24       0.01     95.89    4.11       0.00     99.00    1.00       0.00
0k       50k      1       95.85    4.13       0.02     95.41    4.57       0.02     99.00    1.00       0.00
0k       0k       14      92.67    7.33       0.00     90.94    9.04       0.03     97.71    2.28       0.01
Overall           209     96.76    3.24       0.01     95.91    4.08       0.01     98.97    1.02       0.01

Table 8 - Number of animals with high error rate and missing rate for different scenarios - Brown Swiss

                          3k (no error)                3k + 1% error                6k
Sire     Dam      No.     >5%Err   >10%Err    >5%Miss  >5%Err   >10%Err    >5%Miss  >5%Err   >10%Err    >5%Miss
50k      50k      25      0        0          0        0        0          0        0        0          0
50k      3k       5       0        0          0        0        0          0        0        0          0
50k      0k       164     5        1          0        27       1          0        1        0          0
0k       50k      1       0        0          0        0        0          0        0        0          0
0k       0k       14      13       1          0        14       4          0        0        0          0
Overall           209     18       2          0        41       5          0        1        0          0

Table 9 - Percentage of correctly imputed genotypes for each chromosome

       Length   No. of SNP      Holstein          Jersey            Brown Swiss
BTA    (Mbp)    3k      6k      3k+1%    6k       3k+1%    6k       3k+1%    6k
1      158.1    157     388     98.33    99.59    97.86    99.37    96.76    99.23
2      136.7    128     342     98.02    99.51    97.32    99.15    96.29    99.04
3      121.1    115     297     98.01    99.50    97.37    99.17    95.93    98.95
4      120.6    126     298     97.98    99.50    97.33    99.27    96.06    98.96
5      121.1    111     300     97.76    99.55    97.16    99.28    96.11    99.16
6      129.8    114     299     97.79    99.35    97.18    99.10    96.61    99.03
7      112.4    110     274     97.84    99.52    97.11    99.15    95.95    98.92
8      112.9    118     298     98.18    99.56    97.61    99.26    96.15    99.06
9      105.5    110     264     98.22    99.56    97.60    99.23    96.51    99.11
10     103.1    98      266     98.01    99.52    97.27    99.17    96.22    98.96
11     107      105     279     97.86    99.51    96.94    99.08    96.19    99.10
12     90.9     90      223     97.85    99.45    97.20    99.15    95.65    98.90
13     83.8     82      215     97.70    99.45    96.86    99.08    95.51    99.03
14     83.2     86      223     97.87    99.42    96.64    98.95    96.06    98.90
15     84.2     90      221     97.96    99.49    97.48    99.23    96.01    98.90
16     81.3     81      201     97.85    99.46    96.65    98.95    96.30    99.03
17     74.9     71      189     97.45    99.45    96.87    99.15    95.58    98.96
18     65.4     66      173     97.42    99.33    97.06    99.09    95.40    98.82
19     63.5     56      175     97.10    99.36    96.08    98.80    94.42    98.78
20     71.6     72      204     97.79    99.53    97.32    99.25    95.78    99.01
21     71.2     71      182     97.69    99.42    96.72    98.96    95.13    98.77
22     61.2     66      167     97.92    99.45    97.17    99.03    96.57    98.98
23     52.1     50      151     97.06    99.39    96.61    99.05    94.92    98.85
24     62.1     66      173     97.85    99.54    96.87    99.14    96.10    99.16
25     42.8     45      139     97.46    99.39    96.41    98.93    94.54    98.72
26     51       47      146     97.39    99.39    96.43    98.98    95.00    98.79
27     45.4     44      136     97.26    99.45    95.63    98.87    94.32    98.89
28     46.2     48      124     97.22    99.36    96.28    98.85    95.08    98.88
29     51.1     52      134     97.74    99.35    97.16    99.08    96.12    98.71
30 *   15.9     4       17      87.07    95.35    81.02    93.73    78.44    93.00
31 **  143.8    135     203     98.42    99.49    98.06    99.17    97.70    99.29
All    2,669.8  2,614   6,701   97.81    99.47    97.07    99.12    95.91    98.97

* Pseudo-autosomal region
** Sex-specific region
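Marker density per chromosome can be read off Table 9 by dividing the number of SNP by the chromosome length. The short computation below is an illustrative addition rather than part of the report; the function name is hypothetical and only a few rows of the table are copied in. It shows that BTA 30, which has the lowest accuracy in every breed, is also the region with by far the fewest 3k markers per Mbp. The report itself does not attribute the lower accuracy to this, so the link is only a plausible reading.

```python
# (length in Mbp, SNP on the 3k panel, SNP on the 6k panel), copied from Table 9
chromosomes = {
    "BTA 1": (158.1, 157, 388),
    "BTA 30": (15.9, 4, 17),
    "All": (2669.8, 2614, 6701),
}


def snp_per_mbp(length_mbp, n_snp):
    """Average number of SNP per megabase pair."""
    return n_snp / length_mbp


if __name__ == "__main__":
    for name, (length, n_3k, n_6k) in chromosomes.items():
        print(f"{name}: 3k {snp_per_mbp(length, n_3k):.2f} SNP/Mbp, "
              f"6k {snp_per_mbp(length, n_6k):.2f} SNP/Mbp")
```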

Conclusions:

Imputation accuracy using the 3k panel is very high when both parents are genotyped with the 50k panel, and the gain from using the 6k panel is small in this case.

The 6k panel resulted in substantially higher accuracy for animals with little family information, especially for those with both parents missing or ungenotyped.

The 6k panel worked much better than the 3k panel in Brown Swiss and Jersey, mainly because there is less family information in these two breeds.