The use of PARAFAC in the analysis of CDOM fluorescence
Kate Murphy (1, 2)
1. Smithsonian Environmental Research Center, Edgewater, USA
2. The University of New South Wales, Dept. of Civil and Environmental Engineering, Sydney, Australia
Challenges in CDOM fluorescence research
Can different sources of DOM be reliably distinguished on the basis of fluorescence?
Which chemical constituents contribute to CDOM fluorescence?
How do environmental variables (e.g. pH, temperature) and processes (e.g. photodegradation) affect fluorescence spectra?
EEMs (Excitation-Emission Matrices)
[Figure: example EEM contour plot with a peak labelled M; axis: emission wavelength (nm)]
What is PARAFAC?
A chemometric decomposition method that uses alternating least squares (ALS) algorithms to estimate the underlying structure of a multiway dataset.
How does it work?
Multiway Data Structure
[Figure: multiway data array decomposed into Component 1, Component 2, through Component N]
PARAFAC model
[Figure: example EEM contour plot (excitation wavelength (nm) vs. emission wavelength (nm)), with line plots of emission and excitation loadings vs. wavelength (nm)]
$$x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk}$$
a: concentration
b: emission spectrum
c: excitation spectrum
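The trilinear model above and an alternating least squares fit can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the N-way toolbox implementation: it applies no non-negativity or unimodality constraints, runs a fixed number of iterations instead of testing convergence, and uses synthetic Gaussian spectra whose peak positions and widths are hypothetical. The names `gauss`, `khatri_rao` and `parafac_als` are introduced here for illustration only.

```python
import numpy as np

def gauss(w, centre, width):
    """Hypothetical Gaussian-shaped spectrum used only to build test data."""
    return np.exp(-0.5 * ((w - centre) / width) ** 2)

def khatri_rao(U, V):
    """Column-wise Kronecker product: row (u * V.shape[0] + v) is U[u] * V[v]."""
    F = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, F)

def parafac_als(X, n_components, n_iter=500, seed=0):
    """Fit x_ijk ~ sum_f a_if * b_jf * c_kf by unconstrained ALS."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, n_components))
    B = rng.random((J, n_components))
    C = rng.random((K, n_components))
    # Unfold the 3-way array along each mode.
    X1 = X.reshape(I, J * K)                      # samples x (emission, excitation)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)   # emission x (samples, excitation)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)   # excitation x (samples, emission)
    for _ in range(n_iter):
        # Update one mode by least squares while the other two are held fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

# Synthetic two-component data set (all values hypothetical, noise-free).
em = np.linspace(300, 600, 31)   # emission wavelengths (nm)
ex = np.linspace(240, 450, 22)   # excitation wavelengths (nm)
A_true = np.random.default_rng(1).random((12, 2))                   # concentrations
B_true = np.column_stack([gauss(em, 350, 25), gauss(em, 450, 40)])  # emission spectra
C_true = np.column_stack([gauss(ex, 280, 15), gauss(ex, 350, 30)])  # excitation spectra
X = np.einsum("if,jf,kf->ijk", A_true, B_true, C_true)

A, B, C = parafac_als(X, n_components=2)
X_hat = np.einsum("if,jf,kf->ijk", A, B, C)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.2e}")
```

The recovered loadings match the true ones only up to the scaling and ordering of components, so they are usually normalised before being compared with reference spectra.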
Model hierarchy
Principal Components Analysis (PCA) (bilinear / 2-way)
  apply constraints: reduce complexity, reduce degrees of freedom, reduce fit
Parallel Factor Analysis (PARAFAC) (trilinear / 3-way)
9 component PARAFAC model
[Figure: contour plots of components C1 to C9]
Raw Data = Model + Residuals
[Figure: EEM contour plots of the raw data, the model and the residuals; bar chart of concentration (Conc.) for components 1 to 9]
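The split of raw data into model and residuals can be checked numerically. The sketch below uses small random loadings and an assumed noise level (both hypothetical) in place of real EEMs, and reports the percentage of variation explained and a per-sample residual sum of squares, which is one simple way to spot poorly modelled samples.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical loadings: 5 samples, 4 emission and 3 excitation wavelengths, 2 components.
A = rng.random((5, 2))
B = rng.random((4, 2))
C = rng.random((3, 2))

model = np.einsum("if,jf,kf->ijk", A, B, C)        # trilinear model part
noise = 0.01 * rng.standard_normal(model.shape)    # assumed measurement noise
raw = model + noise                                # stand-in for measured EEMs

residuals = raw - model
explained = 100.0 * (1.0 - np.sum(residuals ** 2) / np.sum(raw ** 2))
sample_ssq = np.sum(residuals ** 2, axis=(1, 2))   # residual sum of squares per sample

print(f"variation explained: {explained:.2f} %")
print("per-sample residual sum of squares:", sample_ssq)
```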
Assumptions
PARAFAC assumes that:
1. The data structure is approximately trilinear:
   fluorescence increases linearly with concentration;
   emission spectra do not change with excitation wavelength, and vice versa.
2. Additivity: fluorescence results from the linear superposition of N individual fluorescent components (determine N by trial and error, or know it in advance).
3. Uniqueness: no two components have identical spectra.
PARAFAC makes NO assumptions about:
1. Spectral shapes
2. Number of components
3. Structure of parameters and error terms
Advantages of PARAFAC
Unique solution (with few exceptions):
  pure spectra are recovered;
  concentrations can be estimated for each component;
  cf. PCA, where rotational freedom means external information is needed to recover spectra and concentrations.
Fully exploits the second-order advantage: the concentration of an analyte in an unknown mixture can be estimated in the presence of uncalibrated interferents.
Easily interpreted.
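The contrast between rotational freedom in a bilinear model and uniqueness in a trilinear one can be illustrated numerically. In this sketch the loadings are random numbers and `R` is an arbitrary invertible matrix chosen for illustration. Rotating the scores by `R` and the loadings by its inverse leaves the two-way data unchanged, but the same rotation changes the three-way data once a third mode is present.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 2))   # scores (samples x components)
B = rng.random((5, 2))   # emission loadings (wavelengths x components)
C = rng.random((4, 2))   # excitation loadings (wavelengths x components)

R = np.array([[2.0, 1.0],
              [0.5, 3.0]])              # arbitrary invertible matrix
A_rot = A @ R
B_rot = B @ np.linalg.inv(R).T

# Bilinear (2-way) model: rotated factors reproduce exactly the same data.
X2 = A @ B.T
X2_rot = A_rot @ B_rot.T
print("2-way data unchanged by rotation:", np.allclose(X2, X2_rot))

# Trilinear (3-way) model: the same rotation no longer reproduces the data.
X3 = np.einsum("if,jf,kf->ijk", A, B, C)
X3_rot = np.einsum("if,jf,kf->ijk", A_rot, B_rot, C)
print("3-way data unchanged by rotation:", np.allclose(X3, X3_rot))
```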
Modelling with PARAFAC
1. Pre-treatment
   Center and scale
   Remove or down-weight scatter (Raman, Rayleigh)
2. Calibration
   Apply constraints (non-negativity, unimodality)
   Choose number of components
   Software: MATLAB N-way toolbox or PLS toolbox
3. Validation
4. Interpretation
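The scatter-removal step of pre-treatment can be sketched as a mask over the EEM. The function below is an illustration under stated assumptions rather than a prescribed procedure: the band half-width of 15 nm, the water Raman shift of about 3400 cm^-1, the wavelength grids, and the choice to mark removed cells as NaN are all assumptions, and `mask_scatter` is a hypothetical helper name. Cells are masked around first-order Rayleigh scatter (emission = excitation), second-order Rayleigh scatter (emission = 2 x excitation) and the Raman line.

```python
import numpy as np

def mask_scatter(eem, em, ex, width=15.0, raman_shift_cm=3400.0):
    """Return a copy of eem (emission x excitation) with scatter bands set to NaN."""
    em_grid, ex_grid = np.meshgrid(em, ex, indexing="ij")
    # Raman emission wavelength (nm) for each excitation wavelength:
    # 1/em = 1/ex - shift, with the shift converted from cm^-1 to nm^-1.
    raman_em = 1.0 / (1.0 / ex_grid - raman_shift_cm * 1e-7)
    masked = eem.astype(float).copy()
    for centre in (ex_grid, 2.0 * ex_grid, raman_em):
        masked[np.abs(em_grid - centre) <= width] = np.nan
    return masked

# Hypothetical wavelength grids and a flat EEM, used only to show the mask.
em = np.arange(300, 601, 5.0)    # emission wavelengths (nm)
ex = np.arange(240, 451, 10.0)   # excitation wavelengths (nm)
eem = np.ones((em.size, ex.size))
masked = mask_scatter(eem, em, ex)
print("fraction of cells masked:", np.isnan(masked).mean())
```

A fitting routine that handles missing values is then needed. Down-weighting the same cells instead of removing them is the alternative named in the list above.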
Validation
1. Examine residuals
2. Split-half analysis (due to uniqueness)
3. Core consistency, cross-validation, influence plots
4. Compare models of different datasets
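Split-half analysis and inter-model comparison both come down to asking whether two sets of loadings describe the same spectra. One common way to quantify this is Tucker's congruence coefficient, sketched below. The Gaussian loadings stand in for spectra from two independently modelled halves and are hypothetical, as is the 0.95 threshold, which is a frequently used rule of thumb and not a value taken from this presentation.

```python
import numpy as np

def gauss(w, centre, width):
    """Hypothetical Gaussian-shaped loading used only as example data."""
    return np.exp(-0.5 * ((w - centre) / width) ** 2)

def congruence_matrix(L1, L2):
    """Tucker congruence between every column of L1 and every column of L2."""
    L1n = L1 / np.linalg.norm(L1, axis=0)
    L2n = L2 / np.linalg.norm(L2, axis=0)
    return L1n.T @ L2n

wl = np.linspace(300, 600, 61)   # emission wavelengths (nm)

# Loadings from two halves: same components, slightly shifted, in a different order.
half1 = np.column_stack([gauss(wl, 350, 25), gauss(wl, 450, 40)])
half2 = np.column_stack([gauss(wl, 452, 40), gauss(wl, 349, 25)])

cong = congruence_matrix(half1, half2)
best_match = cong.argmax(axis=1)         # best-matching column of half2 for each column of half1
validated = cong.max(axis=1) > 0.95      # assumed threshold

print("best match in half 2 for each half-1 component:", best_match)
print("components passing the threshold:", validated)
```

In practice the same check is applied to both the excitation and the emission loadings, since a component should match in both modes.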
Split-half analysis
[Figure: contour plots and loading line plots for components P1 to P8. Contours: excitation wavelength (nm) vs. emission wavelength (nm). Line plots: loadings vs. wavelength (nm).]
Validation of dye spectra
[Figure: excitation and emission spectra (fluorescence [QSE] vs. wavelength (nm)) comparing the PARAFAC component with a sample; scores vs. concentration plot of C5 fluorescence maximum [QSE] against F 255/58 [QSE]]
Validation of protein constituents
[Figure: excitation (Ex) and emission (Em) loadings vs. wavelength (nm) for C1 compared with tyrosine, and for C6 compared with tryptophan; contour plots of C1 and C6]
Inter-model comparisons
[Figure: excitation (Ex) and emission (Em) loadings vs. wavelength (nm) for C7 compared with S&M*; contour plot of C7]
(S&M* = Stedmon & Markager (in press). Marine Chemistry.)
Inter-model comparisons
[Figure: loadings vs. excitation (LHS) and emission (RHS) wavelength (nm), Kauai model compared with BWE7 model. Panels: C1 cf. P2+P5; C2 cf. P1; C3 cf. P3; C6 cf. P6+P7; C8 cf. P8]
Example: Ships' ballast water
Sampling Effort
[Figure: map of the Pacific Ocean (45 N to 45 S) showing port survey and cruise sampling, with an 8nmi transect]
9 component PARAFAC model
[Figure: contour plots of components C1 to C9, with panel annotations B, PAH, M, A, C, T and "Rhodamine WT dye?"]
Model used EEMs from >7 samples of seawater and ballast water
Humic-like fluorescence
[Figure: contour plots of C2, C3 and C8; fluorescence ratio relative to C3 vs. distance to land (nautical miles), from near shore to open ocean; maps of sampling locations with labels K, Fos and BN]
Decoupling between C2 and C3 fluorescence
[Figure: panels A and B plot log(C2) vs. log(C3) for Harbor, Coast, Shelf and Ocean samples; panels C and D plot C2/C3 vs. C2, with modeled curves (i) to (iv)]
(A) At low C3 concentration, C2 concentrations frequently lie above the conservative dilution curve; (B) the ratio of C2/C3 in seawater is independent of C2 at high concentrations, but at low concentrations, it is driven by the concentration of C2; (C & D) modeled relationships assuming dilution only (i), dilution and increased removal of C3 (ii), dilution and constant production of C2 (iii), or dilution and heterogeneous but generally increasing production of C2 (iv).
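Two of the modeled relationships, dilution only (i) and dilution with constant production of C2 (iii), can be sketched with a simple mixing calculation. The source concentrations, the dilution series and the production term below are hypothetical values chosen for illustration; they are not the parameters behind the curves in the figure, and scenarios (ii) and (iv) are not reproduced.

```python
import numpy as np

# Fraction of source (coastal) water remaining after mixing with component-free water.
dilution = np.logspace(0, -3, 7)

c2_source, c3_source = 10.0, 10.0   # hypothetical source concentrations
c2_production = 0.05                # hypothetical constant production of C2

c3 = c3_source * dilution

# (i) Dilution only: both components scale together, so C2/C3 stays constant.
c2_dilution_only = c2_source * dilution
ratio_i = c2_dilution_only / c3

# (iii) Dilution plus constant production of C2: C2/C3 rises as C3 becomes dilute.
c2_with_production = c2_source * dilution + c2_production
ratio_iii = c2_with_production / c3

for d, r1, r3 in zip(dilution, ratio_i, ratio_iii):
    print(f"dilution {d:8.4f}   C2/C3 (i) {r1:5.2f}   C2/C3 (iii) {r3:5.2f}")
```

Under these assumptions, C2 lies above the conservative dilution line only at low C3, which is the pattern described for panel (A).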
Interpretation of PARAFAC models
[Figure: C3 concentration (QSE) vs. distance to land; inset contour plot of C3; annotations as extracted: "C3*: 37/494" and ".7 ppb"]
Protein-like fluorescence
[Figure: fluorescence F (QSE) of C1, C6 and C7 by region: Rott., NS/EC, E.Shlf, B.Bisc, NEAtlc, TropAt, BrzShf, SaoLui; contour plot of C7]
Public Resources: Educational Materials
Chemometrics group of the Faculty of Life Sciences at the University of Copenhagen: www.models.life.ku.dk
Information on meetings, symposia, new books, software
Downloadable datasets (including fluorescence of amino acids, fish muscle, parma ham, yoghurt, ...)
Web-based tutorials, interactive internet courses, graphical illustrations (movies)
Public Resources: Spectral database
www.models.life.ku.dk
Guidelines for fluorescence spectral correction and calibration procedures.
For a range of compounds and IHSS humic standards, ASCII files containing:
  currently published and available DOM PARAFAC components
  carbon-specific absorption spectra of the individual compounds
  carbon-specific fluorescence excitation-emission matrices (EEMs)
Small datasets of DOM fluorescence for use in PARAFAC tutorials.
Acknowledgements
Funding: The University of Birmingham (Fluoronet), USCG Research & Development Center, Columbia River Aquatic Nuisance Species Initiative (CRANSI), California State Lands Commission, New Zealand Ministry of Fisheries
Host Shipping Companies: NYK Bulkship (USA) LTD., Gateway Maritime Corp. / Sincere Industrial Corp., Matson Navigation Company, Bergesen DY ASA., Sea River Maritime, the Alaska Tanker Company, BP Amoco PLC and Krupp Seeschiffahrt GmbH
Analyses: University of S. Florida, University of Maine, Portland State University, Denmark National Environmental Research Institute
PARAFAC: Thanks to Colin Stedmon for sharing his PARAFAC spectra with me