A Unified Regularized Group PLS Algorithm Scalable to Big Data
A Unified Regularized Group PLS Algorithm Scalable to Big Data
Pierre Lafaye de Micheaux (1), Benoit Liquet (2), Matthew Sutton (3). 21 October 2016.
(1) CREST, ENSAI. (2) Université de Pau et des Pays de l'Adour, LMAP. (3) Queensland University of Technology, Brisbane, Australia.
Big Data PLS Methods, JSTAR 2016, Rennes.
Contents
1. Motivation: integrative analysis for group data
2. Application to an HIV vaccine study
3. PLS approaches: SVD, PLS-W2A, canonical, regression
4. Sparse models: lasso penalty, group penalty, group and sparse group PLS
5. R package: sgPLS
6. Regularized PLS scalable to big data
7. Concluding remarks
Integrative Analysis
Wikipedia: "Data integration involves combining data residing in different sources and providing users with a unified view of these data. This process becomes significant in a variety of situations, which include both commercial and scientific domains."
Systems biology. Integrative analysis: analysis of heterogeneous types of data from inter-platform technologies.
Goal. Combine multiple types of data to contribute to a better understanding of biological mechanisms, and to potentially improve the diagnosis and treatment of complex diseases.
Example: Data definition
X: n observations, p variables. Y: n observations, q variables.
Omics. Y matrix: gene expression; X matrix: SNPs (single nucleotide polymorphisms). Many other data types, such as proteomic and metabolomic data.
Neuroimaging. Y matrix: behavioral variables; X matrix: brain activity (e.g., EEG, fMRI, NIRS).
Neuroimaging genetics. Y matrix: DTI (diffusion tensor imaging); X matrix: SNPs.
Data: Constraints and Aims
Main constraint: collinearity among the variables, or situations with p > n or q > n. But p and q are assumed to be not too large.
Two aims:
1. Symmetric situation. Analyze the association between two blocks of information. Analysis focused on shared information.
2. Asymmetric situation. X matrix = predictors and Y matrix = response variables. Analysis focused on prediction.
Partial Least Squares family: dimension-reduction approaches.
PLS finds pairs of latent vectors ξ = Xu, ω = Yv with maximal covariance, e.g., ξ = u_1 SNP_1 + u_2 SNP_2 + ... + u_p SNP_p. It covers both the symmetric and the asymmetric situations, through a matrix decomposition of X and Y into successive latent variables.
Latent variables are not directly observed but are inferred (through a mathematical model) from other variables that are observed (directly measured). They capture an underlying phenomenon (e.g., health).
PLS and sparse PLS
Classical PLS. Output of PLS: H pairs of latent variables (ξ_h, ω_h), h = 1, ..., H. Reduction method (H << min(p, q)), but no variable selection for extracting the most relevant (original) variables from each latent variable.
Sparse PLS. Sparse PLS selects the relevant SNPs: some coefficients u_l are set exactly to 0,
ξ_h = u_1 SNP_1 + u_2 SNP_2 + u_3 SNP_3 + ... + u_p SNP_p, with, e.g., u_2 = u_3 = 0.
The sPLS components are linear combinations of the selected variables.
Group structures within the data
Natural example: categorical variables form a group of dummy variables in a regression setting.
Genomics: genes within the same pathway have similar functions and act together in regulating a biological system. These genes can add up to have a larger effect and can therefore be detected as a group (i.e., at a pathway or gene set/module level).
We consider that the variables are divided into groups.
Example: p SNPs grouped into K genes (X_j = SNP_j):
X = [ SNP_1, ..., SNP_k | SNP_{k+1}, SNP_{k+2}, ..., SNP_h | ... | SNP_{l+1}, ..., SNP_p ],
where the first block forms gene 1, the second gene 2, ..., and the last gene K.
Example: p genes grouped into K pathways/modules (X_j = gene_j):
X = [ X_1, X_2, ..., X_k | X_{k+1}, X_{k+2}, ..., X_h | ... | X_{l+1}, X_{l+2}, ..., X_p ],
with blocks M_1, M_2, ..., M_K.
Group PLS
Aim: select groups of variables taking into account the data structure.
PLS components: ξ_h = u_1 X_1 + u_2 X_2 + u_3 X_3 + ... + u_p X_p.
Sparse PLS components (sPLS): ξ_h = u_1 X_1 + u_2 X_2 + u_3 X_3 + ... + u_p X_p, with some individual coefficients (e.g., u_2 and u_3) set to 0.
Group PLS components (gPLS): the variables are partitioned into modules 1, ..., K, and the weights of entire modules are set to 0:
ξ_h = (u_1 X_1 + u_2 X_2 + u_3 X_3) + (u_4 X_4 + u_5 X_5 + ...) + ... + (u_{p-1} X_{p-1} + u_p X_p),
where the parentheses delimit modules 1, 2, ..., K and, e.g., all weights of module 2 equal 0 while all weights of module 1 are nonzero.
gPLS selects groups of variables: either all the variables within a group are selected, or none of them are.
... but it does not achieve sparsity within each group ...
Sparse Group PLS
Aim: combine sparsity of groups and sparsity within each group.
Example: X matrix = genes. We might be interested in identifying particularly important genes in pathways of interest.
Sparse PLS components (sPLS): individual coefficients are set to 0, regardless of the group structure.
Group PLS components (gPLS): entire groups of coefficients are set to 0.
Sparse group PLS components (sgPLS): entire groups are set to 0 and, within each selected group, some individual coefficients are also set to 0.
Aims in a regression setting
Select groups of variables taking into account the data structure: either all the variables within a group are selected, or none of them are.
Combine sparsity of groups and sparsity within each group: only the relevant variables within a selected group are retained.
Illustration: Dendritic Cells in Addition to Antiretroviral Treatment (DALIA) trial
Evaluation of the safety and immunogenicity of a vaccine on n = 19 HIV-1 infected patients. The vaccine was injected at weeks 0, 4, 8 and 12 while patients received antiretroviral therapy. Antiretroviral treatment was interrupted at week 24. After vaccination, a deep evaluation of the immune response was performed at week 16. Repeated measurements of the main immune markers and of gene expression were performed every 4 weeks until the end of the trial.
DALIA trial: Question
First results obtained using groups of genes: significant change of gene expression among 69 modules over time before antiretroviral treatment interruption.
How does the gene abundance of these 69 modules, as measured at week 16, correlate with the immune markers measured at week 16?
sPLS, gPLS and sgPLS
Response variables: Y = immune markers, composed of q = 7 cytokines (IL21, IL2, IL13, IFNg, Luminex score, TH1 score, CD4).
Predictor variables: X = expression of p = 5399 genes extracted from the 69 modules.
The structure of the data (modules) is used by gPLS and sgPLS; each gene belongs to one of the 69 modules.
Asymmetric situation.
Results: Modules and number of genes selected
p = 5399; 24 modules selected by gPLS or sgPLS on 3 scores.
Results: Venn diagram
sgPLS selects slightly more genes than sPLS (487 and 420 genes selected, respectively), but sgPLS selects fewer modules than sPLS (21 and 64 groups of genes selected, respectively). Note: all 21 groups of genes selected by sgPLS were included in those selected by sPLS.
sgPLS selects slightly more modules than gPLS (4 more, 14/21 in common). However, gPLS leads to more genes selected than sgPLS (944).
In this application, the sgPLS approach led to a parsimonious selection of modules and genes that seem very relevant biologically.
Chaussabel's functional modules: wikis/module annotation/v2 Trial 8 Modules
[Figure: per-module selection frequencies for the gPLS, sgPLS and sPLS components, with modules annotated by function (apoptosis/survival, cell cycle, inflammation, monocytes, T cells, etc.).]
Stability of the variable selection assessed on 100 bootstrap samples of the DALIA-1 trial data, for the gPLS, sgPLS and sPLS procedures respectively. For each procedure, the modules selected on the original sample are separated from those that were not.
Now some mathematics...
PLS family
PLS = Partial Least Squares, or Projection to Latent Structures.
Four main methods coexist in the literature:
(i) Partial Least Squares Correlation (PLSC), also called PLS-SVD;
(ii) PLS in mode A (PLS-W2A, for Wold's Two-Block, Mode A PLS);
(iii) PLS in mode B (PLS-W2B), also called Canonical Correlation Analysis (CCA);
(iv) Partial Least Squares Regression (PLSR, or PLS2).
Methods (i), (ii) and (iii) are symmetric, while (iv) is asymmetric. They optimise different objective functions. Good news: all of them rely on the singular value decomposition (SVD).
Singular Value Decomposition (SVD)
Definition 1. Let M be a p × q matrix of rank r:
M = U Δ V^T = Σ_{l=1}^{r} δ_l u_l v_l^T,   (1)
where U = (u_l): p × p and V = (v_l): q × q are two orthogonal matrices containing the normalised left (resp. right) singular vectors, and Δ = diag(δ_1, ..., δ_r, 0, ..., 0) holds the ordered singular values δ_1 ≥ δ_2 ≥ ... ≥ δ_r > 0.
Note: fast and efficient algorithms exist to compute the SVD.
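Definition 1 can be checked numerically. The slides work in R; here is a minimal NumPy sketch (illustrative only) that computes the SVD of a small random matrix and rebuilds it as a sum of rank-one terms δ_l u_l v_l^T:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 4))  # a small p x q matrix

# Thin SVD: M = U diag(delta) V^T, singular values in decreasing order
U, delta, Vt = np.linalg.svd(M, full_matrices=False)

# Reconstruction as a sum of rank-one terms delta_l * u_l v_l^T
M_rec = sum(delta[l] * np.outer(U[:, l], Vt[l, :]) for l in range(len(delta)))
print(np.allclose(M, M_rec))  # True
```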
Connection between the SVD and maximum covariance
We were able to describe the optimization problem of the four PLS methods as:
(u*, v*) = argmax_{||u||_2 = ||v||_2 = 1} Cov(X_{h-1} u, Y_{h-1} v),   h = 1, ..., H.
The matrices X_h and Y_h are obtained recursively from X_{h-1} and Y_{h-1}. The four methods differ by the deflation process, chosen so that the above scores or weight vectors satisfy given constraints.
The solution at step h is obtained by computing only the first triplet (δ_1, u_1, v_1) of singular elements of the SVD of M_{h-1} = X_{h-1}^T Y_{h-1}:
(u*, v*) = (u_1, v_1).
Why is this useful?
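A quick numerical illustration of this connection (a NumPy sketch, not the authors' code): the first singular pair of M = X^T Y attains the maximal value u^T M v = δ_1 over unit vectors, which no randomly drawn unit pair beats:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
Y = rng.standard_normal((50, 3))

M = X.T @ Y                        # the p x q matrix M_{h-1}
U, delta, Vt = np.linalg.svd(M)
u1, v1 = U[:, 0], Vt[0, :]         # first pair of singular vectors

# u1^T M v1 equals delta_1, and no other unit pair does better:
best = u1 @ M @ v1
rand_vals = []
for _ in range(1000):
    u = rng.standard_normal(5); u /= np.linalg.norm(u)
    v = rng.standard_normal(3); v /= np.linalg.norm(v)
    rand_vals.append(u @ M @ v)
print(best >= max(rand_vals))  # True
```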
SVD properties
Theorem 2 (Eckart-Young, 1936). The (truncated) SVD of a given matrix M (of rank r) provides the best reconstruction (in a least-squares sense) of M by a matrix of lower rank k:
min_{A of rank k} ||M - A||_F^2 = ||M - Σ_{l=1}^{k} δ_l u_l v_l^T||_F^2 = Σ_{l=k+1}^{r} δ_l^2.
If the minimum is searched over matrices A of rank 1, which are of the form ũṽ^T where ũ, ṽ are non-zero vectors, we obtain
min_{ũ,ṽ} ||M - ũṽ^T||_F^2 = Σ_{l=2}^{r} δ_l^2 = ||M - δ_1 u_1 v_1^T||_F^2.
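Theorem 2 is also easy to verify numerically: the squared Frobenius error of the rank-k truncation equals the sum of the squared discarded singular values (again a NumPy sketch, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((7, 5))
U, d, Vt = np.linalg.svd(M, full_matrices=False)

k = 2
A = U[:, :k] @ np.diag(d[:k]) @ Vt[:k, :]   # best rank-k approximation
err = np.linalg.norm(M - A, "fro") ** 2
tail = np.sum(d[k:] ** 2)                   # sum of discarded delta_l^2
print(np.isclose(err, tail))  # True
```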
Thus, solving
argmin_{ũ,ṽ} ||M_{h-1} - ũṽ^T||_F^2   (2)
and norming the resulting vectors gives us u_1 and v_1. This is another way to solve the PLS optimization problem.
Towards sparse PLS
Shen and Huang (2008) connected (2) (in a PCA context) to least-squares minimisation in regression:
||M_{h-1} - ũṽ^T||_F^2 = ||vec(M_{h-1}) - (I_q ⊗ ũ) ṽ||_2^2 = ||vec(M_{h-1}) - (ṽ ⊗ I_p) ũ||_2^2,
each of the form ||y - Xβ||_2^2. This makes it possible to use many existing variable selection techniques based on regularization penalties.
We propose iterative alternating algorithms to find normed vectors ũ/||ũ|| and ṽ/||ṽ|| that minimise the following penalised sum-of-squares criterion
||M_{h-1} - ũṽ^T||_F^2 + P_λ(ũ, ṽ),
for various penalization terms P_λ(ũ, ṽ). We obtain several sparse versions (in terms of the weights u and v) of the four methods (i)-(iv).
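The vectorisation identity above follows from vec(ũṽ^T) = (I_q ⊗ ũ)ṽ = (ṽ ⊗ I_p)ũ, and can be checked directly (illustrative NumPy sketch; note that vec() stacks columns, hence order="F"):

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 4, 3
M = rng.standard_normal((p, q))
u = rng.standard_normal(p)
v = rng.standard_normal(q)

vec = lambda A: A.flatten(order="F")  # column-stacking vec operator

lhs = np.linalg.norm(M - np.outer(u, v), "fro") ** 2
mid = np.linalg.norm(vec(M) - np.kron(np.eye(q), u.reshape(-1, 1)) @ v) ** 2
rhs = np.linalg.norm(vec(M) - np.kron(v.reshape(-1, 1), np.eye(p)) @ u) ** 2
print(np.isclose(lhs, mid), np.isclose(lhs, rhs))  # True True
```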
Sparse PLS models
For cases (i)-(iv), the aim is to obtain sparse weight vectors u_h and v_h. The associated component scores (i.e., latent variables) are
ξ_h := X_{h-1} u_h and ω_h := Y_{h-1} v_h,  h = 1, ..., H,
for a small number of components. The recursive procedure, with an objective function involving X_{h-1} and Y_{h-1}, yields a decomposition (approximation) of the original matrices X and Y:
X = Ξ_H C_H^T + F_{X,H},  Y = Ω_H D_H^T + F_{Y,H},   (3)
where Ξ_H = (ξ_h) and Ω_H = (ω_h). For the regression mode, we have the multivariate linear regression model Y = X B_PLS + E, with B_PLS = U_H (C_H^T U_H)^{-1} D_H^T and E a matrix of residuals.
Example, case (ii): PLS-W2A
Definition 3. The objective function at step h is
(u_h, v_h) = argmax_{||u||_2 = ||v||_2 = 1} Cov(X_{h-1} u, Y_{h-1} v),
subject to the constraints Cov(ξ_h, ξ_j) = Cov(ω_h, ω_j) = 0 for 1 ≤ j < h.
In order to satisfy these constraints,
X_h = P⊥_{ξ_h} X_{h-1} and Y_h = P⊥_{ω_h} Y_{h-1}  (X_0 = X, Y_0 = Y),
where ξ_h (resp. ω_h) is the score associated with the first left (resp. right) singular vector obtained by applying an SVD to M_{h-1} := X_{h-1}^T Y_{h-1}, h = 1, ..., H.
Regression mode (iv): PLSR, PLS2
The aim of this asymmetric model is prediction. PLS2 finds latent variables that model X and simultaneously predict Y. The difference with PLS-W2A is the deflation step:
X_h = P⊥_{ξ_h} X_{h-1} and Y_h = P⊥_{ξ_h} Y_{h-1}.
The algorithm
Main steps of the iterative algorithm:
1. X_0 = X, Y_0 = Y, h = 1.
2. M_{h-1} := X_{h-1}^T Y_{h-1}.
3. SVD: extraction of the first pair of singular vectors u_h and v_h.
4. Sparsity step: produces sparse weights u_sparse and v_sparse.
5. Latent variables: ξ_h = X_{h-1} u_sparse and ω_h = Y_{h-1} v_sparse.
6. Slope coefficients:
   c_h = X_{h-1}^T ξ_h / (ξ_h^T ξ_h) for both modes;
   d_h = Y_{h-1}^T ξ_h / (ξ_h^T ξ_h) for PLSR (regression mode);
   e_h = Y_{h-1}^T ω_h / (ω_h^T ω_h) for PLS mode A.
7. Deflation:
   X_h = X_{h-1} - ξ_h c_h^T for both modes;
   Y_h = Y_{h-1} - ξ_h d_h^T for PLSR (regression mode);
   Y_h = Y_{h-1} - ω_h e_h^T for PLS mode A.
8. If h = H, stop; else set h = h + 1 and go to step 2.
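The steps above can be sketched in a few lines. This is an illustrative NumPy translation with the sparsity step (step 4) omitted, i.e. the weights are kept dense; it is not the authors' implementation:

```python
import numpy as np

def pls(X, Y, H, mode="regression"):
    """Sketch of the iterative algorithm above (no sparsity step)."""
    Xh, Yh = X.copy(), Y.copy()
    scores, weights = [], []
    for _ in range(H):
        M = Xh.T @ Yh                          # step 2
        U, d, Vt = np.linalg.svd(M)            # step 3
        u, v = U[:, 0], Vt[0, :]               # first singular pair
        xi, om = Xh @ u, Yh @ v                # step 5: latent variables
        c = Xh.T @ xi / (xi @ xi)              # step 6: slope coefficients
        Xh = Xh - np.outer(xi, c)              # step 7: deflation of X
        if mode == "regression":
            dh = Yh.T @ xi / (xi @ xi)
            Yh = Yh - np.outer(xi, dh)         # PLSR deflation of Y
        else:
            e = Yh.T @ om / (om @ om)
            Yh = Yh - np.outer(om, e)          # PLS mode A deflation of Y
        scores.append(xi); weights.append(u)
    return np.column_stack(scores), np.column_stack(weights)

rng = np.random.default_rng(4)
X = rng.standard_normal((30, 6)); Y = rng.standard_normal((30, 2))
Xi, W = pls(X, Y, H=2)
# The deflation makes successive X-scores mutually orthogonal:
print(abs(Xi[:, 0] @ Xi[:, 1]) < 1e-8)  # True
```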
Introducing sparsity
Sparsity implies many zeros in a vector or a matrix. (Credits: Jun Liu, Shuiwang Ji, and Jieping Ye)
Let θ be the model parameters to be estimated. A commonly employed method for estimating θ is
min_θ [ loss(θ) + λ penalty(θ) ].
This is equivalent to:
min_θ loss(θ) subject to the constraint penalty(θ) ≤ z (for some z).
Example: loss(θ) = 0.5 ||θ - v||_2^2 for some fixed vector v.
Why does L_1 induce sparsity? Analysis in 1D (comparison with L_2).
For 0.5 (θ - v)^2 + λ|θ| (nondifferentiable at 0):
- if v ≥ λ, then θ* = v - λ;
- if v ≤ -λ, then θ* = v + λ;
- else, θ* = 0 (sparsity!).
For 0.5 (θ - v)^2 + λθ^2 (differentiable at 0): θ* = v / (1 + 2λ). No sparsity here.
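These two closed forms can be sanity-checked against a brute-force grid minimisation (a small illustrative Python sketch):

```python
import numpy as np

def lasso_1d(v, lam):
    # closed form of argmin 0.5*(theta - v)^2 + lam*|theta|
    return np.sign(v) * max(abs(v) - lam, 0.0)

def ridge_1d(v, lam):
    # closed form of argmin 0.5*(theta - v)^2 + lam*theta^2
    return v / (1.0 + 2.0 * lam)

# Compare both closed forms with a fine grid search
grid = np.linspace(-5, 5, 200001)
for v in (-2.0, 0.3, 1.7):
    lam = 0.5
    f1 = 0.5 * (grid - v) ** 2 + lam * np.abs(grid)
    f2 = 0.5 * (grid - v) ** 2 + lam * grid ** 2
    assert abs(grid[f1.argmin()] - lasso_1d(v, lam)) < 1e-4
    assert abs(grid[f2.argmin()] - ridge_1d(v, lam)) < 1e-4

print(lasso_1d(0.3, 0.5))  # 0.0 -- an exact zero: sparsity
```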
Why does L_1 induce sparsity? Understanding from the projection.
Why does L_1 induce sparsity? Understanding from constrained optimization.
Sparse PLS (sPLS)
In sPLS, the optimisation problem to solve is
min_{u_h, v_h} ||M_h - u_h v_h^T||_F^2 + P_{λ1,h}(u_h) + P_{λ2,h}(v_h),
where ||M_h - u_h v_h^T||_F^2 = Σ_{i=1}^p Σ_{j=1}^q (m_ij - u_ih v_jh)^2 and M_h = X_h^T Y_h at each iteration h, with
P_{λ1,h}(u_h) = Σ_{i=1}^p 2λ_1^h |u_ih| and P_{λ2,h}(v_h) = Σ_{j=1}^q 2λ_2^h |v_jh|.
Iterative solution: apply the thresholding function g_soft(x, λ) = sign(x)(|x| - λ)_+ componentwise
- to the vector M_h v_h to get u_h;
- to the vector M_h^T u_h to get v_h.
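The alternating thresholding updates can be sketched as follows (an illustrative NumPy version on a hypothetical toy cross-product matrix, not the authors' code; the λ values here are arbitrary):

```python
import numpy as np

def g_soft(x, lam):
    """Soft-thresholding, applied componentwise."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def spls_weights(M, lam1, lam2, n_iter=50):
    """Alternate the two soft-thresholding updates, re-norming each time."""
    _, _, Vt = np.linalg.svd(M)
    v = Vt[0, :]                       # initialise with the SVD solution
    u = None
    for _ in range(n_iter):
        u = g_soft(M @ v, lam1)
        u /= np.linalg.norm(u)
        v = g_soft(M.T @ u, lam2)
        v /= np.linalg.norm(v)
    return u, v

# Toy cross-product matrix with two strong rows and one strong column:
a = np.array([3.0, 2.0, 0.01, 0.01])
b = np.array([3.0, 0.01, 0.01])
M = np.outer(a, b)
u, v = spls_weights(M, lam1=0.5, lam2=0.5)
print(np.sum(u == 0), np.sum(v == 0))  # 2 2 -- exact zeros in both weights
```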
Group PLS (gPLS)
X and Y can be divided respectively into K and L sub-matrices (groups) X^(k): n × p_k and Y^(l): n × q_l. Following the same idea as Yuan and Lin (2006), we use group lasso penalties:
P_{λ1}(u) = λ_1 Σ_{k=1}^K √p_k ||u^(k)||_2 and P_{λ2}(v) = λ_2 Σ_{l=1}^L √q_l ||v^(l)||_2,
where u^(k) (resp. v^(l)) is the weight sub-vector associated with the k-th (resp. l-th) block.
In gPLS, the optimisation problem to solve is
min Σ_{k=1}^K Σ_{l=1}^L ||M^(k,l) - u^(k) v^(l)T||_F^2 + P_{λ1}(u) + P_{λ2}(v),
with M^(k,l) = X^(k)T Y^(l).
Remark: if the k-th block is composed of only one variable, then ||u^(k)||_2 = √((u^(k))^2) = |u^(k)|.
The previous objective function can be written as
Σ_{k=1}^K { ||M^(k,·) - u^(k) v^T||_F^2 + λ_1 √p_k ||u^(k)||_2 } + P_{λ2}(v),
where M^(k,·) = X^(k)T Y. For fixed v, we can optimize over the groupwise components of u separately. The first term above expands as
trace[M^(k,·) M^(k,·)T] - 2 trace[u^(k) v^T M^(k,·)T] + trace[u^(k) u^(k)T].
The optimal u^(k) thus minimises
trace[u^(k) u^(k)T] - 2 trace[u^(k) v^T M^(k,·)T] + λ_1 √p_k ||u^(k)||_2.
This objective function is convex, so the optimal solution is characterized by the subgradient equations (subdifferential equal to 0).
Subdifferential
The subderivative, subgradient, and subdifferential generalize the derivative to functions which are not differentiable (e.g., |x| is nondifferentiable at 0). The subdifferential of a function is set-valued.
[Figure: a convex function (blue), nondifferentiable at x_0; the slope of each red line is a subderivative at x_0.]
The set [a, b] of all subderivatives is called the subdifferential of the function f at x_0. If f is convex and its subdifferential at x_0 contains exactly one subderivative, then f is differentiable at x_0.
We have
a = lim_{x → x_0^-} (f(x) - f(x_0)) / (x - x_0) and b = lim_{x → x_0^+} (f(x) - f(x_0)) / (x - x_0).
Example: consider the convex function f(x) = |x|. The subdifferential at the origin is the interval [a, b] = [-1, 1]. The subdifferential at any point x_0 < 0 is the singleton {-1}, while the subdifferential at any point x_0 > 0 is the singleton {1}.
For group k, u^(k) must satisfy that the subdifferential is null:
-2 u^(k) + 2 M^(k,·) v = λ_1 √p_k θ,   (4)
where θ is a subgradient of ||u^(k)||_2 evaluated at u^(k). So,
θ = u^(k) / ||u^(k)||_2 if u^(k) ≠ 0;  θ ∈ {θ : ||θ||_2 ≤ 1} if u^(k) = 0.
We can see that the subgradient equations (4) are satisfied with u^(k) = 0 if
||M^(k,·) v||_2 ≤ λ_1 √p_k / 2.   (5)
For u^(k) ≠ 0, equation (4) gives
-2 u^(k) + 2 M^(k,·) v = λ_1 √p_k u^(k) / ||u^(k)||_2.   (6)
Combining equations (5) and (6), we find:
u^(k) = (1 - λ_1 √p_k / (2 ||M^(k,·) v||_2))_+ M^(k,·) v,  k = 1, ..., K,   (7)
where (a)_+ = max(a, 0).
In the same vein, optimisation over v for a fixed u is also obtained by optimising over groupwise components:
v^(l) = (1 - λ_2 √q_l / (2 ||M^(·,l)T u||_2))_+ M^(·,l)T u,  l = 1, ..., L.   (8)
We thus obtain the following theorem.
Group PLS (gPLS)
Theorem 4. The solution of the group PLS optimisation problem is given by:
u^(k) = (1 - λ_1 √p_k / (2 ||M^(k,·) v||_2))_+ M^(k,·) v  (for fixed v)
and
v^(l) = (1 - λ_2 √q_l / (2 ||M^(·,l)T u||_2))_+ M^(·,l)T u  (for fixed u).
Note: we iterate until convergence of u^(k) and v^(l), using alternately one of the above formulas.
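Theorem 4 gives a closed-form groupwise update. A minimal illustrative sketch in NumPy (the matrix, groups and λ below are made up for the example):

```python
import numpy as np

def gpls_u_update(M, v, groups, lam1):
    """Groupwise closed-form update of Theorem 4 (for fixed v).
    `groups` is a list of index lists, one per block of variables."""
    u = np.zeros(M.shape[0])
    for idx in groups:
        Mv = M[idx, :] @ v
        scale = 1.0 - lam1 * np.sqrt(len(idx)) / (2.0 * np.linalg.norm(Mv))
        u[idx] = max(scale, 0.0) * Mv   # (.)_+ : a weak group is zeroed entirely
    return u

# Two groups of two variables; the second carries a much weaker signal:
M = np.array([[2.0, 0.0], [2.0, 0.0], [0.1, 0.0], [0.1, 0.0]])
v = np.array([1.0, 0.0])
u = gpls_u_update(M, v, groups=[[0, 1], [2, 3]], lam1=1.0)
print(u)  # [1.5 1.5 0.  0. ] -- the whole second group is dropped
```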
Sparse group PLS: sparsity within groups
Following Simon et al. (2013), we introduce sparse group lasso penalties:
P_{λ1}(u) = (1 - α_1) λ_1 Σ_{k=1}^K √p_k ||u^(k)||_2 + α_1 λ_1 ||u||_1,
P_{λ2}(v) = (1 - α_2) λ_2 Σ_{l=1}^L √q_l ||v^(l)||_2 + α_2 λ_2 ||v||_1.
Sparse group PLS (sgPLS)
Theorem 5. The solution of the sparse group PLS optimisation problem is given by:
u^(k) = 0 if ||g_soft(M^(k,·) v, λ_1 α_1 / 2)||_2 ≤ λ_1 (1 - α_1) √p_k / 2, and otherwise
u^(k) = (1 - λ_1 (1 - α_1) √p_k / (2 ||g_soft(M^(k,·) v, λ_1 α_1 / 2)||_2))_+ g_soft(M^(k,·) v, λ_1 α_1 / 2).
Similarly, v^(l) = 0 if ||g_soft(M^(·,l)T u, λ_2 α_2 / 2)||_2 ≤ λ_2 (1 - α_2) √q_l / 2, and otherwise
v^(l) = (1 - λ_2 (1 - α_2) √q_l / (2 ||g_soft(M^(·,l)T u, λ_2 α_2 / 2)||_2))_+ g_soft(M^(·,l)T u, λ_2 α_2 / 2).
The proof is similar to that of Theorem 4 (see our paper in Bioinformatics, 2016).
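The sgPLS update for one group first soft-thresholds inside the group, then applies the group-level shrinkage. An illustrative sketch (toy input values, not from the paper):

```python
import numpy as np

def g_soft(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sgpls_u_group(Mv, lam1, alpha1):
    """Update of Theorem 5 for one group (fixed v); Mv stands for M^(k,.) v."""
    pk = len(Mv)
    g = g_soft(Mv, lam1 * alpha1 / 2.0)              # within-group thresholding
    thr = lam1 * (1.0 - alpha1) * np.sqrt(pk) / 2.0  # group-level threshold
    if np.linalg.norm(g) <= thr:
        return np.zeros_like(Mv)                     # the whole group is dropped
    return (1.0 - thr / np.linalg.norm(g)) * g       # sparse within the group

u = sgpls_u_group(np.array([3.0, 0.2, -2.0]), lam1=1.0, alpha1=0.5)
print(u[1] == 0.0)  # True -- within-group sparsity
```

With alpha1 = 0 the update reduces to the gPLS formula of Theorem 4, and with alpha1 = 1 to pure componentwise soft-thresholding, matching the sPLS case.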
R package: sgPLS
The sgPLS package implements the sPLS, gPLS and sgPLS methods. It includes functions for choosing the tuning parameters related to the predictor matrix for the different sparse PLS models (regression mode).
Some simple code to fit an sgPLS model:
model.sgPLS <- sgPLS(X, Y, ncomp = 2, mode = "regression",
                     keepX = c(4, 4), keepY = c(4, 4),
                     ind.block.x = ind.block.x, ind.block.y = ind.block.y,
                     alpha.x = c(0.5, 0.5), alpha.y = c(0.5, 0.5))
The latest version also includes sparse group Discriminant Analysis.
Regularized PLS scalable to BIG DATA
What happens in a MASSIVE DATA SET context?
Massive datasets: the size of the data is large, and analysing it takes a significant amount of time and computer memory. Emerson & Kane (2012): a dataset is considered large if it exceeds 20% of the RAM (Random Access Memory) of a given machine, and massive if it exceeds 50%.
Case of many observations: two massive data sets
X: n × p matrix and Y: n × q matrix, massive due to a large number of observations. We suppose here that n is very large, but not p nor q.
The PLS algorithm is mainly based on the SVD of M_{h-1} = X_{h-1}^T Y_{h-1}, whose dimension is p × q. This matrix fits into memory, but neither X nor Y does.
Computation of M = X^T Y by chunks
M = X^T Y = Σ_{g=1}^G X_(g)^T Y_(g).
All terms fit (successively) into memory!
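The slides implement this in R with bigmemory and foreach; the same chunked accumulation can be sketched in a few lines of NumPy (illustrative only; with file-backed matrices, e.g. numpy memmap, the loop is identical):

```python
import numpy as np

def crossprod_by_chunks(X, Y, chunk=128):
    """Accumulate M = X^T Y one block of rows at a time, so that only one
    chunk of the tall matrices needs to sit in memory at once."""
    p, q = X.shape[1], Y.shape[1]
    M = np.zeros((p, q))
    for start in range(0, X.shape[0], chunk):
        M += X[start:start + chunk].T @ Y[start:start + chunk]
    return M

rng = np.random.default_rng(6)
X = rng.standard_normal((1000, 4)); Y = rng.standard_normal((1000, 3))
print(np.allclose(crossprod_by_chunks(X, Y), X.T @ Y))  # True
```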
Computation of M = X^T Y by chunks using R
No need to load the big matrices X and Y:
- Use memory-mapped files (called "file-backing") through the bigmemory package to allow matrices to exceed the RAM size. A big.matrix object is created, which supports the use of shared memory for efficiency in parallel computing.
- foreach: a package for running the chunked computation of M in parallel.
Regularized PLS algorithm: the components ("scores") Xu (n × 1) and Yv (n × 1) are easy to compute by chunks and to store in a big.matrix object.
76 Illustration of group PLS with Big Data
Simulated: X (5 GB) and Y (5 GB); n = 560,000 observations, p = 400 and q = 500;
Linked by two latent variables, made up of sparse linear combinations of the original variables;
Both X and Y have a group structure: 20 groups of 20 variables for X and 25 groups of 20 variables for Y;
Only 4 groups in each data set are relevant, and within each of these groups 5 variables are not relevant. Big Data PLS Methods JSTAR 2016, Rennes 49/54
77 Figure 1: Comparison of gPLS and BIG-gPLS (for small n = 1,000) Big Data PLS Methods JSTAR 2016, Rennes 50/54
78 Figure 2: Use of BIG-gPLS. Left: small n. Right: large n. Blue: truth. Red: recovered. Big Data PLS Methods JSTAR 2016, Rennes 51/54
79 Regularized PLS Discriminant Analysis The categorical response variable becomes a dummy matrix in the PLS algorithms: Big Data PLS Methods JSTAR 2016, Rennes 52/54
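The dummy-coding step for PLS-DA is simple: a categorical response with K classes is turned into an n × K indicator matrix, which then plays the role of Y in the PLS algorithm. A minimal sketch (Python/NumPy for illustration; the class labels are made up):

```python
import numpy as np

# Hypothetical categorical response with K = 3 classes, coded 0..K-1.
y = np.array([0, 2, 1, 0, 1])
K = y.max() + 1

# Dummy (indicator) matrix: one column per class, Y[i, k] = 1 iff y[i] == k.
Y = np.eye(K)[y]

assert Y.shape == (5, 3)
assert (Y.sum(axis=1) == 1).all()       # exactly one class per observation
assert (Y.argmax(axis=1) == y).all()    # coding is invertible
```

After this recoding, the same regularized (sparse, group, sparse-group) PLS machinery applies unchanged.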
80 Concluding Remarks and Take Home Message We were able to derive a simple unified algorithm that performs standard, sparse, group and sparse group versions of the four classical PLS algorithms (i)-(iv). (And also PLS-DA.) We used big.memory objects, and a simple trick that makes our procedure scalable to big data (large n). We also parallelized the code for faster computation. This will soon be made available in our new R package: bigsgpls. Eager to apply it to real neuroimaging data sets! We are currently working on a batch version of this algorithm, as well as a large n and large p version of it. Big Data PLS Methods JSTAR 2016, Rennes 53/54
81 References
Yuan M. and Lin Y. (2006). Model Selection and Estimation in Regression with Grouped Variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1).
Simon N., Friedman J., Hastie T. and Tibshirani R. (2013). A Sparse-Group Lasso. Journal of Computational and Graphical Statistics, 22(2).
Liquet B., Lafaye de Micheaux P., Hejblum B. and Thiebaut R. (2016). Group and Sparse Group Partial Least Square Approaches Applied in Genomics Context. Bioinformatics, 32(1).
Lafaye de Micheaux P., Liquet B. and Sutton M. A Unified Parallel Algorithm for Regularized Group PLS Scalable to Big Data (in progress).
Thank you! Questions? Big Data PLS Methods JSTAR 2016, Rennes 54/54