## General metabolite data processing guide

*Purpose*

This guide is intended to help with processing non-isotopically labeled intracellular and extracellular LCMS metabolite data. The guide assumes you are working with data already extracted into an accessible format, such as Excel (for a guide on how to do this, see **DAN Lab Maven User Guide**). The guide has a companion Excel file, “metabolite data processing walkthrough.xlsx”, in which the methods discussed here are applied to a small, simulated dataset for illustration. The text refers to that file in parentheses, so (1D) in this document indicates Sheet 1, Section D of the Excel file.


*Introduction*

As with many sorts of data, the raw intensity values you have extracted for each metabolite are not the easiest or best way to compare metabolite levels under different conditions. Instead, averaging data from multiple replicates or calculating fold changes between conditions or strains can allow better qualitative comparison. Calculating measures of uncertainty (such as variance or standard deviation) can allow quantitative comparison of conditions or strains by enabling the determination of confidence intervals and the application of hypothesis tests (like ANOVAs or T-tests).

Unfortunately, mass spectrometry data are not normally distributed in linear space, so averages, variances, standard deviations, and other statistics calculated directly from the raw data can be misleading. The lack of a normal distribution also makes parametric hypothesis tests (ANOVAs and T-tests are parametric) inappropriate. Fortunately, LCMS data are generally assumed to be normally distributed after log transformation, allowing these tests to be used, though non-parametric tests (such as Kruskal-Wallis) on the non-transformed data are also appropriate.

Although log transformation or non-parametric testing allows hypothesis testing to be applied to metabolite measurements, the ability of broad-spectrum metabolomics to recover data for a large number of metabolites simultaneously introduces a caveat to the significance of results. Because a dataset contains so many metabolites, each with its own hypothesis test (if using individual tests like T-tests), there is a high chance of false positive results. For example, with a p-value cutoff of 0.05, each T-test has a 5% chance of giving a significant result even when there is no real difference. Performing many tests compounds this problem, so p-values from metabolomics data should usually be corrected for multiple testing, for example by controlling the false discovery rate (FDR).

This guide will walk you through log transformation of the data to restore a normal distribution, then will show you how to determine fold changes, averages, variance, and standard deviation. Next, the guide will describe how to apply the calculated statistics to hypothesis testing and determining confidence intervals. Finally, the guide will demonstrate two methods for correcting for multiple testing after hypothesis testing.

*Log transformation*

Because data from mass spectrometry are not normally distributed in linear space, they do not meet the assumptions required for typical parametric hypothesis tests (ANOVAs, T-tests, etc.) or for calculating confidence intervals. Log transformation of the data restores an approximately normal distribution, allowing these tests to be used, though non-parametric tests (such as Kruskal-Wallis) on the non-transformed data are also appropriate. To log transform your data, take the log of the raw intensity value (1A) of the base peak (also called C12 PARENT or M+0) for each metabolite (1B). The base of the logarithm does not affect the statistics, but use the same base throughout and keep it in mind when interpreting fold changes; common choices are base 2, the natural log, and base 10. Take the arithmetic mean of the log-transformed data directly (1C) to get the average in log space; these averages can be converted back to linear space by raising the log base to the power of the average (see the “average in linear space” values in Section 1G). The arithmetic mean of log-transformed data, once converted back to linear space, is equal to the geometric mean of the raw data: the values in 1G are converted from the averaged logs, but the same values can be obtained with the =GEOMEAN() function on the raw data. Generally, calculate statistics in whichever space (log space, linear space, etc.) the data are normally distributed in. So if the data you are working with are normally distributed in linear space, the arithmetic mean of the untransformed data is the more appropriate average.
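As a minimal sketch of these steps in Python (the intensities are made-up values, not data from the walkthrough), the following log2-transforms three replicate M+0 intensities, averages them in log space, and checks that back-transforming the average gives the same value as a geometric mean:

```python
import math
import statistics

# Made-up raw M+0 intensities for one metabolite, three replicates
raw = [1.2e6, 2.5e6, 1.9e6]

log2_values = [math.log2(x) for x in raw]   # log-transform each replicate
mean_log2 = statistics.mean(log2_values)    # arithmetic mean in log space
mean_linear = 2 ** mean_log2                # back-transform with the same base

# The back-transformed mean equals the geometric mean of the raw data,
# which is what Excel's =GEOMEAN() returns.
geo = statistics.geometric_mean(raw)
print(mean_log2, mean_linear, geo)
```

Base 2 is an arbitrary choice here; `math.log` or `math.log10` work the same way as long as the back-transformation uses the matching base.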

*Fold changes*

Fold changes are easily calculated from log-transformed data: subtract the average log-transformed value for one condition from the average in the other condition (1H). The result is the log fold change (in whichever base you used); negative values indicate a decrease rather than an increase. Convert these to linear space by raising the log base you used for your log transformation to the power of the log fold change (1H). In the spreadsheet, the absolute value function (=ABS()) is applied during conversion to linear space so that the fold change is always reported in the positive direction; keep track of the direction separately using the sign of the log fold change. For example, the result of ~16.7 for glucose-6-phosphate in the spreadsheet is actually a 16.7-fold lower level of glucose-6-phosphate in condition 1 relative to condition 2. Without the absolute value function, the result would be ~0.0598, which is equivalent but harder to interpret by eye.
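The same calculation as a short Python sketch, assuming base-2 logs; the function names are illustrative and not part of the walkthrough. Returning a magnitude plus a direction mirrors the =ABS() approach while keeping the sign information that =ABS() discards:

```python
import math
import statistics


def log2_fold_change(cond1, cond2):
    """log2 fold change of condition 1 relative to condition 2,
    computed as the difference of the mean log2 intensities."""
    mean1 = statistics.mean([math.log2(x) for x in cond1])
    mean2 = statistics.mean([math.log2(x) for x in cond2])
    return mean1 - mean2


def describe_fold_change(lfc):
    """Report a fold change as a positive magnitude (like 2^ABS(lfc))
    plus the direction of the change for condition 1."""
    magnitude = 2 ** abs(lfc)
    if lfc > 0:
        direction = "higher"
    elif lfc < 0:
        direction = "lower"
    else:
        direction = "unchanged"
    return magnitude, direction


# Made-up replicate intensities for two conditions
lfc = log2_fold_change([1.0e5, 1.2e5], [1.6e6, 2.0e6])
magnitude, direction = describe_fold_change(lfc)
```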


*Variance, confidence intervals, and hypothesis testing*

To calculate accurate confidence intervals and perform hypothesis testing, you need to know how widely your data points are spread around the average, that is, their uncertainty. Variance, standard deviation (SD), and standard error (SE) are three closely linked measures of uncertainty. Because your data are normally distributed only in log space, we will first calculate these measures in log space, then use them to generate confidence intervals and perform hypothesis tests. The sample variance of the log-transformed data (1D) is

$$s^{2} = \frac{\sum_{i=1}^{N} \left(R_{i} - R_{avg}\right)^{2}}{N - 1}$$

where $R_{i}$ is the log-transformed value of replicate i, $R_{avg}$ is the average of all replicates, and $N$ is the number of replicates. Because the SD is the square root of the variance, take the square root of this result to get the SD, which is often used for error bars when graphing. In Excel, you can also calculate the SD directly with the =STDEV.S() function on the log-transformed data (1E); squaring the result gives the variance.

The third measure of uncertainty discussed here, the standard error (SE), is needed to calculate confidence intervals and is also sometimes used for error bars when graphing data. The SE is the square root of the variance divided by the number of replicates, or equivalently the SD divided by the square root of N (1F). To calculate a 95% confidence interval, multiply the SE by 1.96, then add the result to (for the maximum) or subtract it from (for the minimum) the average of the log-transformed values. The value 1.96 is the large-sample (normal distribution) multiplier; with the small number of replicates typical of metabolomics experiments, the t-distribution critical value from =T.INV.2T(0.05, N-1) (about 4.30 for N = 3) gives a more accurate, wider interval. Raising the log base you used to the power of each bound returns the confidence interval in linear space (1G); because it is built in log space, this is an interval for the geometric mean and is not symmetric around it in linear space. Hypothesis tests, such as a T-test, can be performed directly on the log-transformed data (1I). Confidence intervals for fold changes can be approximated by using the 95% confidence interval maximums and minimums from 1G to calculate the fold change maximum and minimum (1J).
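A minimal Python sketch of the uncertainty and confidence interval calculations above, using made-up intensities and base-2 logs (`log_space_summary` is a hypothetical helper). The `z` multiplier defaults to 1.96 as in the walkthrough; the second call uses 4.303, the two-sided 95% t critical value for 2 degrees of freedom (N = 3), to show how much wider the interval becomes with few replicates:

```python
import math
import statistics


def log_space_summary(raw, z=1.96):
    """Summarize replicate intensities in log2 space and back-transform a
    confidence interval for the geometric mean to linear space.

    z is the interval multiplier: 1.96 matches the walkthrough; pass a
    t critical value instead when there are only a few replicates."""
    logs = [math.log2(x) for x in raw]
    n = len(logs)
    mean = statistics.mean(logs)
    var = statistics.variance(logs)  # sample variance, divides by N - 1 (Excel VAR.S)
    sd = math.sqrt(var)              # same as statistics.stdev(logs) or Excel STDEV.S
    se = math.sqrt(var / n)          # standard error = SD / sqrt(N)
    lo, hi = mean - z * se, mean + z * se
    return {
        "mean_log2": mean,
        "var": var,
        "sd": sd,
        "se": se,
        "mean_linear": 2 ** mean,         # geometric mean of the raw data
        "ci_linear": (2 ** lo, 2 ** hi),  # asymmetric around mean_linear
    }


replicates = [1.2e6, 2.5e6, 1.9e6]  # made-up example intensities
summary = log_space_summary(replicates)
# t-based alternative for N = 3 (Excel: =T.INV.2T(0.05, 2) is about 4.303)
wide = log_space_summary(replicates, z=4.303)
```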

While the example given here uses T-tests, ANOVAs are sometimes more appropriate statistical tests for your data. T-tests only compare two means at a time, so if you have more than two means (such as an experiment where you compare a wild-type strain to two mutant strains), an ANOVA allows simultaneous comparison of all groups, which also lowers the chance of false positives from performing many T-tests (see the *Correcting for false discovery rate (FDR) and Family-wise Error Rate (FWER)* section below).

To perform an ANOVA in Excel, you need to have the Analysis ToolPak add-in activated. Activate the Analysis ToolPak using one of the following two options:

**Option 1:** File > Options > Add-ins. In the Manage box at the bottom, select “Excel Add-ins” and click Go, then check “Analysis ToolPak” and click OK to activate it.

**Option 2:** If you have a “Developer” tab, go to Developer > Excel Add-ins, then check “Analysis ToolPak” and click OK.

Once the Analysis ToolPak is activated, a new “Data Analysis” button appears on the Data tab.

To run a **single-factor** ANOVA (where you are testing only one variable, such as comparing a wild type to several mutants):

- Data > Data Analysis. This should open a box of test options.

IMAGE 1

- Select “Anova: Single Factor”.

IMAGE 2

- Input Range: select your data (the log-transformed values). If the selection includes headings, check “Labels in first row”; don’t include any other labels. Set “Grouped By” to Columns or Rows to match how your groups are arranged (the example data are in columns). For the Output Range, select a cell or block of cells where you want the output printed; the output fills from the upper-left cell, so the size of the selection doesn’t matter. Click OK!

IMAGE 3

- TAH-DAH!

Here, you can see that the P-value between groups is less than 0.05, meaning that at least one group mean differs significantly from at least one other. Do pairwise T-tests to determine which groups differ significantly (and correct them for multiple testing; see the section on FDR and FWER below). Here, you’ll find that group C is significantly different from all other groups.

IMAGE 4

- If you have two independent variables (for example, you compared multiple strains AND multiple growth conditions), you can run a two-factor ANOVA in Excel using the “Anova: Two-Factor Without Replication” or “Anova: Two-Factor With Replication” tools in the Data Analysis menu. Use “Without Replication” if you have only one replicate for each sample (i.e. one data point for each combination of your two variables), and “With Replication” if you have multiple replicates for each combination.
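For readers who want to see what the “Anova: Single Factor” tool computes, here is a minimal Python sketch of the F statistic (`one_way_anova_f` is a hypothetical helper, not part of Excel, and the input values are made up). Excel reports a P-value for this F; you can reproduce it with =F.DIST.RT(F, df_between, df_within), or in Python with `scipy.stats.f_oneway` if SciPy is installed:

```python
import statistics


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA.

    Each group is a list of log-transformed replicate values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.mean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of replicates around their own group mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up log2 values for three groups
f_stat, df_b, df_w = one_way_anova_f([20.1, 20.4, 19.9], [20.3, 20.0, 20.5], [22.0, 21.7, 22.2])
```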

If you want to compare many dependent variables simultaneously (such as the concentrations of ATP, glucose-6-phosphate, succinate, and glutamate) across your populations, a MANOVA test can be used. If you used an ANOVA or MANOVA to compare multiple groups and got a significant result, you will only know that some of the means from your dataset differ, but will not know which means are significantly different. Pairwise testing, such as T-tests or one-way ANOVAs, will still be necessary to identify which specific means differ.


*Correcting for false discovery rate (FDR) and Family-wise Error Rate (FWER)*

Because metabolomics often produces large datasets, you may find that you perform many hypothesis tests (like a T-test) on a single dataset. Each time you perform a test, there is a chance, equal to the P-value cutoff, that the test is significant merely by chance even though no real difference exists. This means that in large datasets, you are likely to have significant results that come solely from the large number of tests you perform, rather than a true difference between the groups you are comparing. We can correct for this error using a variety of procedures. Here we will present the Bonferroni correction and the Benjamini-Hochberg procedure. Both procedures are commonly used, but have different advantages and disadvantages.
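To put numbers on this, the sketch below (illustrative Python with hypothetical function names, assuming every null hypothesis is true and, for the second function, that the tests are independent) computes the expected number of chance “hits” and the probability of getting at least one. With a cutoff of 0.05, the chance of at least one false positive is about 40% for 10 tests and over 99% for 100 tests:

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of tests called significant by chance alone,
    assuming every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more false positives among independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n_tests in (1, 10, 100):
    print(n_tests, expected_false_positives(n_tests),
          round(prob_at_least_one_false_positive(n_tests), 3))
```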

The Bonferroni correction controls the family-wise error rate (FWER), which is the probability that at least one of your significant results is a false positive. To perform a Bonferroni correction, divide the P-value cutoff you were using for your hypothesis tests by the number of hypotheses you are testing (i.e. if you are testing 10 hypotheses with a cutoff of 0.05, your new cutoff is 0.05/10 = 0.005). This is a highly stringent adjustment and can be too conservative (reducing statistical power), especially in very large datasets or datasets with positively correlated test statistics. Because it limits the chance of even a single false positive across all tests, the cutoff for each individual test becomes very small as the number of tests grows (i.e. if you are testing 10,000 hypotheses with a cutoff of 0.05, your new cutoff is 0.05/10,000 = 0.000005). While the Bonferroni correction is widely used and is sometimes the correct choice, these shortcomings often make other methods of correction, such as the Benjamini-Hochberg procedure, more appropriate.
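A minimal Python sketch of the Bonferroni correction (the function name is hypothetical). It treats a test as significant when p ≤ cutoff; equivalently, you can multiply each p-value by the number of tests (capping at 1) and compare it to the original cutoff:

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant under a Bonferroni correction by
    comparing every p-value to alpha divided by the number of tests."""
    cutoff = alpha / len(p_values)
    return [p <= cutoff for p in p_values]


# Ten made-up p-values: the corrected cutoff is 0.05 / 10 = 0.005
flags = bonferroni_significant([0.001, 0.004, 0.006] + [0.5] * 7)
```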

Unlike the Bonferroni correction, the Benjamini-Hochberg procedure controls the false discovery rate (FDR). While controlling the FWER limits the chance of any false positive at all, controlling the FDR limits the expected *proportion* of false positives among your significant results. In other words, using the Benjamini-Hochberg procedure with an FDR of 0.05 means that, on average, no more than 5% of positive results (rejected null hypotheses) are false positives (unlike an FWER of 0.05, which means that there is only a 5% chance that any of the positive results is a false positive). To conduct the Benjamini-Hochberg procedure (1K), perform your hypothesis tests as normal, then order the p-values from lowest to highest. Assign each p-value a rank from its position on the list (i.e. the lowest p-value is rank 1, the second lowest is rank 2, etc.). Calculate the Benjamini-Hochberg critical value (*k*) for each p-value by dividing its rank by the total number of hypothesis tests you performed (the number of items in the list) and multiplying the result by your acceptable false discovery rate (commonly set around 0.05, but both higher and lower values are used). This gives each p-value its own *k*. Compare each *k* with the corresponding p-value from your initial hypothesis test, and find the largest p-value that is smaller than or equal to its *k*. That hypothesis test and all those above it in the list (those with lower p-values) are considered statistically significant at your chosen FDR, even if some of them are larger than their own *k*; all others are not significant.
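The procedure above, written as a minimal Python sketch (the function name is illustrative, and this is one straightforward way to code it rather than the walkthrough's implementation). The cutoff is set by the largest rank that passes, so a test whose own p-value is above its critical value can still be significant if a higher-ranked test passes:

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans (same order as p_values) marking which tests
    are significant under the Benjamini-Hochberg procedure at the given FDR."""
    m = len(p_values)
    # Indices of the p-values sorted from lowest to highest (rank 1 = lowest)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank whose p-value is at or below its critical value (rank / m) * fdr
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # That test and every test ranked above it (lower p-values) are significant
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant
```

For example, with p-values [0.6, 0.035, 0.01, 0.03] and an FDR of 0.05, the critical values for ranks 1 to 4 are 0.0125, 0.025, 0.0375, and 0.05. The p-value 0.03 (rank 2) is above its critical value of 0.025, but 0.035 (rank 3) passes, so the three lowest p-values are all significant and the function returns [False, True, True, True].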

Last modified: 24 January 2022; TBJ

## Labeled data analysis and processing

**Isotope Tracers**

*Introduction*

Stable isotope tracers are molecules in which the most abundant naturally occurring isotope of one or more atoms has been replaced with a non-radioactive isotope of a different mass. These replacements change the mass of the overall molecule (which we then call “labeled”). Labeled molecules are chemically identical to unlabeled ones (for exceptions see the *Kinetic Isotope Effect* section below), so they behave the same way within chemical systems (like metabolic networks) and for the purposes of separation (such as via LCMS). Despite this chemical equivalence, the increased mass of labeled molecules lets them be distinguished from the unlabeled form of the same molecule by techniques like high-resolution mass spectrometry. These properties can be exploited to investigate metabolic networks and their function by feeding cells labeled tracers, then using high-resolution mass spectrometry to follow labeled atoms as they move through metabolic networks. These techniques are analogous to the perhaps more familiar radioisotope tracer experiments used to investigate nucleic acid metabolism, although stable isotope tracers are much safer to handle.


*Nomenclature*

Before talking about how to interpret labeling data, it is important to first be able to identify what the names of labeled molecules mean and which isotopes are commonly used. Let’s take [1,2-^{13}C_{2}]glucose as an example. This tracer is chemically the same as the saccharide glucose. The brackets enclose the information that defines how the tracer differs from unlabeled glucose. The “^{13}C_{2}” tells you that the isotope of carbon weighing approximately 13 Daltons has replaced the naturally abundant carbon (^{12}C) at 2 positions. The “1,2” at the beginning tells you that the ^{13}C carbons have replaced carbons 1 and 2. You may also see this same molecule written as [1,2-^{13}C]glucose, without the subscript “2”; [1,2-^{13}C_{2}]glucose and [1,2-^{13}C]glucose are equivalent. Complete replacement of a single type of atom with a new isotope is usually represented with a “U” (for uniform), as in [U-^{13}C_{6}]glucose, in which every carbon is a ^{13}C carbon. Other than ^{12}C, ^{13}C is the only stable isotope of carbon, but atoms of other elements can also be replaced with alternate isotopes for labeling.

Besides ^{13}C replacing ^{12}C, common isotope replacements are ^{15}N replacing ^{14}N, ^{18}O replacing ^{16}O, and ^{2}H replacing ^{1}H (^{2}H is also known as deuterium or D). ^{17}O, ^{33}S, ^{34}S, and ^{36}S are also used, but more rarely. Nomenclature for these tracers follows a similar pattern to that for carbon tracers. For example, glutamine has 2 nitrogen atoms that can be ^{14}N (natural) or ^{15}N (labeled). Nitrogen-labeled glutamine is sold as [amide-^{15}N_{1}]glutamine, [alpha-^{15}N_{1}]glutamine, or [U-^{15}N_{2}]glutamine, corresponding to glutamine labeled at the amide, amine, or both positions, respectively. Oxygen- or hydrogen (deuterium)-labeled tracers are named based on the carbon the labeled isotope is bound to. Thus, [1,5-^{2}H]glucose (equivalent to [1,5-D]glucose) has ^{2}H atoms (deuterons) replacing the protons at carbon 1 (the aldehyde carbon) and carbon 5 of glucose. Note that only carbon-bound protons are replaced in D-labeled glucose; the hydroxyl protons are easily exchanged with water (many times per second), so labeling them would be meaningless. While the above naming schemes should be used in publications and presentations, within the lab you may see truncated versions of these names, such as 5D-glucose, which indicates [5-^{2}H_{1}]glucose, or U^{13}C glucose, which is [U-^{13}C_{6}]glucose.

When mass spectrometry tracer data are displayed, you will often see metabolites, such as glucose-6-phosphate, described as “M+3”. This means that the signal detected on the mass spectrometer was equivalent to the mass of the metabolite (M) plus 3 Daltons. In most cases, such as during labeling with ^{13}C tracers, this means that the molecule contains three ^{13}C carbons (though it does not indicate where these carbons are located). Another way to represent the same information is “3-labeled”, which also indicates that 3 carbons are replaced with ^{13}C carbons. These two names diverge when the labeling isotope is more than 1 Dalton heavier than the naturally abundant isotope. For example, during ^{18}O labeling, M+4 glucose-6-phosphate is only 2-labeled: each ^{16}O atom replaced by an ^{18}O atom adds 2 Daltons to the mass of the molecule, so two replacements give M+4, but the molecule has still only been labeled twice.
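The relationship between an M+n mass shift and the number of labeled atoms can be sketched in Python as below. This is a simplification: it uses nominal (integer) mass increments rather than exact masses, assumes only one kind of isotope label is present, and the dictionary and function names are hypothetical:

```python
# Nominal mass added per substituted atom, in Da (exact differences are
# slightly non-integer, e.g. about 1.0034 Da for 13C versus 12C).
NOMINAL_SHIFT_PER_LABEL = {
    "13C": 1, "15N": 1, "2H": 1, "17O": 1, "33S": 1,
    "18O": 2, "34S": 2, "36S": 4,
}


def labels_from_mass_shift(mass_shift, isotope):
    """Convert an M+n mass shift into the number of labeled atoms,
    assuming only one type of isotope label is present."""
    per_label = NOMINAL_SHIFT_PER_LABEL[isotope]
    if mass_shift % per_label:
        raise ValueError(f"M+{mass_shift} is not a whole number of {isotope} labels")
    return mass_shift // per_label


n_18o = labels_from_mass_shift(4, "18O")  # M+4 with 18O -> 2-labeled
```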

*Label purity and the binomial distribution*

When you purchase isotopically labeled molecules, they will be sold with an isotopic enrichment value, usually expressed as a percent. Note that all discussions of tracer purity and labeling distribution in this section assume that the tracer data have been corrected for the natural abundance of the ^{13}C isotope (see the *Natural abundance correction* section below). Often, the enrichment will be written something like “[1,2-^{13}C_{2}, 99%]glucose” or “glucose (1,2-^{13}C_{2}, 99%)”. Either way, unless otherwise specified, the percentage refers to the probability that each individual specified carbon (C1 and C2 in the example) is a ^{13}C carbon. Thus, C1 has a 99% chance of being ^{13}C and C2 has a separate 99% chance of being ^{13}C. To calculate the actual distribution of ^{13}C carbons in a tracer, you have to calculate the probability of each labeling combination. Since C1 and C2 each have an independent 99% chance to be labeled in [1,2-^{13}C_{2}, 99%]glucose, the labeling follows a binomial distribution. To find the probability that a single molecule of glucose has a ^{13}C carbon at both C1 and C2, multiply the probability that C1 is labeled by the probability that C2 is labeled (2A). In this case, 98.01% of the glucose in this standard should be labeled at both positions. To calculate how much isn’t labeled at all, multiply the probability that C1 is not labeled by the probability that C2 is not labeled, resulting in 0.01% (2A). The chance that exactly one of C1 or C2 is labeled is the sum of the probability that only C1 is labeled and the probability that only C2 is labeled, or 1.98% (2A); together, the three cases add up to 100%. For singly labeled tracers, like [6-^{13}C_{1}, 99%]glucose, this is simple: the chance of a labeled carbon at C6 is 99%.

Because each carbon’s chance of being labeled must be taken into account, tracers with the same purity but more potential labeling positions, like [U-^{13}C_{6}, 99%]glucose, have a lower proportion of tracer actually labeled at all positions (2B). Compared to [1,2-^{13}C_{2}, 99%]glucose, which has 98.01% of the tracer labeled at both positions, [U-^{13}C_{6}, 99%]glucose is labeled at all 6 positions only 94.15% of the time, though their purity is equal (2A and 2B). Note that for incompletely labeled forms of the [U-^{13}C_{6}, 99%]glucose tracer, you must account for every position that could be labeled or unlabeled. For example, for M+5 glucose in the [U-^{13}C_{6}, 99%]glucose tracer, you must calculate the probability that all carbons except C1 are labeled, then add the probability that all carbons except C2 are labeled, then all except C3, etc. (2C). Because each of these combinations has the same probability, you can instead multiply the probability of one way of achieving a labeling pattern by the number of possible ways to get that labeling (i.e. by 6 in the case of M+5 glucose in a [U-^{13}C_{6}, 99%]glucose tracer; cell E17 uses this shorter method (2B) while cell D34 (2C) adds each probability manually, but both achieve the same result). Instead of counting manually, you can use the =COMBIN(*x*,*y*) function in Excel (nCr function on a calculator), where *x* is the number of carbons that can be labeled and *y* is the number that are actually labeled (see the “Combinations that produce this labeling” column in 2B).
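The binomial calculation in 2A to 2C can be written compactly in Python (an illustrative sketch; `tracer_isotopologues` is a hypothetical name). Entry k of the returned list is COMBIN(n, k) * purity^k * (1 - purity)^(n - k), the probability of the M+k form, assuming each position is labeled independently and the data are natural-abundance corrected:

```python
from math import comb


def tracer_isotopologues(n_positions, purity):
    """Expected M+0 ... M+n distribution of a tracer whose n labeled positions
    are each independently 13C with probability `purity`."""
    return [comb(n_positions, k) * purity**k * (1 - purity) ** (n_positions - k)
            for k in range(n_positions + 1)]


two = tracer_isotopologues(2, 0.99)  # [1,2-13C2, 99%]glucose: M+0, M+1, M+2
six = tracer_isotopologues(6, 0.99)  # [U-13C6, 99%]glucose: M+0 ... M+6
```

Here `two` works out to 0.01%, 1.98%, and 98.01%, and the last two entries of `six` (M+5 and M+6) are about 5.71% and 94.15%, matching the hand calculations above.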

The same calculations used to determine the proportion of a particular labeled form (such as M+5) of a molecule from the listed purity can be used in reverse to calculate the purity of a tracer from its labeling distribution. This is helpful for determining an unknown tracer purity or checking whether a manufacturer-supplied tracer purity is correct (they often are not). This is especially applicable to metabolic flux analysis (MFA) and other computational techniques using tracers, where an accurate tracer purity is needed for accurate flux fitting. To perform the calculations in reverse, assume that you have used the mass spectrometer to collect data on a solution of a ^{13}C tracer labeled at three carbon positions, of unknown purity, and start with its labeling distribution (1D). Label purity can be calculated from the proportion of any labeled form, but is easiest to calculate from completely labeled or unlabeled tracer. Using M+3 as an example, because the proportion of M+3 equals the chance that all 3 carbons are labeled (the purity cubed), the cube root of the proportion of M+3 gives the purity (1D). To calculate from M+0, use the same formula with the proportion of M+0 in place of M+3. Because M+0 is the chance that none of the 3 carbons are labeled, this returns the chance that a single position is NOT labeled (one minus the purity), so subtract the result from one to get the purity. In our example, the tracer purity is 98.5% (1D).
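Running the calculation in reverse, as a minimal Python sketch (hypothetical function names; the input fractions are assumed to be natural-abundance corrected). For a tracer labeled at n positions, the nth root of the M+n fraction gives the purity, and one minus the nth root of the M+0 fraction gives the same value:

```python
def purity_from_fully_labeled(fraction_full, n_positions):
    """Purity from the fraction of fully labeled tracer (M+n = purity ** n)."""
    return fraction_full ** (1 / n_positions)


def purity_from_unlabeled(fraction_m0, n_positions):
    """Purity from the fraction of unlabeled tracer (M+0 = (1 - purity) ** n)."""
    return 1 - fraction_m0 ** (1 / n_positions)


# Made-up fractions for a tracer labeled at three positions with 98.5% purity
from_m3 = purity_from_fully_labeled(0.985**3, 3)
from_m0 = purity_from_unlabeled(0.015**3, 3)
```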

*Natural abundance correction*

As mentioned previously, the stable isotopes used as tracers also occur naturally, though usually at very low abundances. For example, while ^{12}C is by far the most abundant carbon isotope in nature, any naturally occurring carbon atom has a 1.07% chance of being ^{13}C.

To accurately interpret labeling data, calculate tracer purity, and use methods like MFA, it is often important to correct LCMS tracer data for the signal contributed by these naturally occurring heavy isotopes. ^{13}C is the most naturally abundant of the commonly used tracer isotopes, so natural abundance correction can often be skipped for other tracers (such as ^{2}H or ^{18}O) because their natural abundance has a much smaller effect on the data. There are many ways to correct for natural abundance; our lab currently uses the Matlab script ElemCor, available for download on the lab website or from GitHub. For instructions on how to use ElemCor, see the tutorial, readme, and example files included with ElemCor. Alternatives include AccuCor, which can be downloaded from GitHub and run in R, and a Matlab script written by Jing, which Daniel can provide. Some MFA programs include their own natural abundance correction script; take care that you either use uncorrected data with these programs or turn off the correction.
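To show the idea behind natural abundance correction, here is a deliberately simplified Python sketch that corrects for natural ^{13}C only. It is not how ElemCor or AccuCor work internally: it ignores natural isotopes of H, N, O, and S, tracer impurity, and instrument resolution, and the function names are hypothetical. It builds a matrix giving the chance that a molecule with j tracer-derived ^{13}C atoms appears at M+i because of natural ^{13}C in its remaining carbons, then solves for the corrected distribution. Use the lab's tools for real data:

```python
from math import comb

C13_ABUNDANCE = 0.0107  # natural 13C abundance quoted in this guide


def correction_matrix(n_carbons, a=C13_ABUNDANCE):
    """m[i][j]: probability that a molecule carrying j tracer-derived 13C atoms
    is observed at M+i because of natural 13C among its other n - j carbons."""
    size = n_carbons + 1
    m = [[0.0] * size for _ in range(size)]
    for j in range(size):
        free = n_carbons - j           # carbons not already labeled by the tracer
        for extra in range(free + 1):  # number of natural 13C among those carbons
            m[j + extra][j] = comb(free, extra) * a**extra * (1 - a) ** (free - extra)
    return m


def correct_carbon_only(measured, a=C13_ABUNDANCE):
    """Given measured M+0..M+n fractions (n = number of carbons), solve
    measured = m @ corrected by forward substitution (m is lower triangular),
    clip negative values caused by noise, and renormalize to sum to 1."""
    n = len(measured) - 1
    m = correction_matrix(n, a)
    corrected = []
    for i in range(n + 1):
        # Remove the contribution of lighter isotopologues spilling into M+i
        residual = measured[i] - sum(m[i][j] * corrected[j] for j in range(i))
        corrected.append(residual / m[i][i])
    corrected = [max(c, 0.0) for c in corrected]
    total = sum(corrected)
    return [c / total for c in corrected]
```

Clipping negative values to zero before renormalizing is a simple heuristic for handling measurement noise, not a rigorous fitting method.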

*Determining pathway utilization using tracer data*

Within metabolic systems, molecules made from a labeled molecule can “inherit” its labeled atoms, making them labeled. This can be used to determine which pathway an organism is using. For example, if we want to know whether an organism is using Embden-Meyerhof-Parnas (EMP) or Entner-Doudoroff (ED) glycolysis, we can use ^{13}C labeling to tell these pathways apart (Fig. 1, below). If cells are fed [1-^{13}C_{1}]glucose and are using the EMP pathway (Fig. 1, upper branch), the glucose will be converted into FBP that is labeled at C1. FBP will be split into DHAP and GAP, and DHAP, being formed from C1-C3 of FBP, will be labeled. DHAP is isomerized to GAP, so approximately half of the GAP in the cell will be labeled. GAP supplies the carbon for lower glycolysis, so approximately half of the pool of each lower glycolytic intermediate (3PG, PEP, Pyr) will be labeled if the EMP pathway is being used. If, on the other hand, the ED pathway is the major glycolytic pathway, glucose will be used to produce KDPG (Fig. 1, lower branch), which will thus be labeled at C1. KDPG is split into pyruvate and GAP, with pyruvate being made from C1-C3 of KDPG and GAP from C4-C6. Thus, GAP will be unlabeled, while the pyruvate made directly from KDPG is labeled. Because GAP is unlabeled, the other lower glycolytic intermediates (3PG, PEP) will also be unlabeled. Labeling with [1-^{13}C_{1}]glucose and monitoring the labeling of lower glycolytic intermediates can therefore reveal which glycolytic pathway is operating: if 3PG and PEP are approximately 50% M+1, the EMP pathway is being used; if pyruvate is roughly 50% M+1 and the other lower glycolytic intermediates are unlabeled, the ED pathway is being used. Note also that the position of the ^{13}C carbon in pyruvate differs between the ED and EMP pathways. The EMP pathway places the ^{13}C carbon in the methyl group of pyruvate, while the ED pathway labels the carboxyl carbon of pyruvate. This can be used as additional confirmation of ED/EMP pathway activity if you are using an MS method that can determine positional labeling, or by monitoring metabolites that inherit their carbons from a specific part of pyruvate (for example, the acetyl group of acetyl-CoA is formed from C2 and C3 of pyruvate).
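The atom-mapping logic above can be sketched in Python. This is a hypothetical, simplified model (the dictionaries and function are illustrative names): it assumes a 100% pure, natural-abundance-corrected tracer, complete mixing of the two triose halves by Tpi in the EMP pathway, no reaction reversibility, and no contribution from other pathways such as the pentose phosphate pathway. Pyruvate position 1 is the carboxyl carbon and position 3 the methyl carbon:

```python
# Hypothetical carbon atom maps: for each triose unit made from one glucose,
# map pyruvate carbon (1 = carboxyl, 2 = keto, 3 = methyl) -> glucose carbon.
# Lower glycolysis keeps positions, so GAP/3PG/PEP carbon i -> pyruvate carbon i.
EMP_UNITS = {
    "DHAP-derived": {1: 3, 2: 2, 3: 1},  # glucose C1-C3 -> DHAP -> GAP (Tpi reverses numbering)
    "GAP-derived": {1: 4, 2: 5, 3: 6},   # glucose C4-C6 -> GAP
}
ED_UNITS = {
    "KDPG-derived": {1: 1, 2: 2, 3: 3},  # pyruvate made directly from KDPG C1-C3
    "GAP-derived": {1: 4, 2: 5, 3: 6},   # glucose C4-C6 -> GAP
}


def predict_labeling(units, labeled_glucose_carbons, gap_units):
    """Return (labeled pyruvate positions per unit, labeled fraction of the
    GAP/3PG/PEP pool, labeled fraction of the pyruvate pool).

    gap_units names the units that pass through GAP, and so through 3PG and PEP.
    Every unit is assumed to contribute equally to the pools it feeds."""
    positions = {
        name: sorted(pyr_c for pyr_c, glc_c in mapping.items()
                     if glc_c in labeled_glucose_carbons)
        for name, mapping in units.items()
    }
    gap_fraction = sum(bool(positions[u]) for u in gap_units) / len(gap_units)
    pyr_fraction = sum(bool(p) for p in positions.values()) / len(units)
    return positions, gap_fraction, pyr_fraction


# [1-13C1]glucose: only glucose carbon 1 is labeled
emp = predict_labeling(EMP_UNITS, {1}, gap_units=["DHAP-derived", "GAP-derived"])
ed = predict_labeling(ED_UNITS, {1}, gap_units=["GAP-derived"])
```

For [1-^{13}C_{1}]glucose this model gives 50% labeled GAP/3PG/PEP and 50% labeled pyruvate (at the methyl carbon) for the EMP pathway, and unlabeled GAP/3PG/PEP with 50% labeled pyruvate (at the carboxyl carbon) for the ED pathway, matching the reasoning above.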

IMAGE 5

Figure 1. Labeling with [1-^{13}C_{1}]glucose showing hypothetical labeling if different pathways are used (left); black circles represent ^{12}C carbons and red circles represent ^{13}C carbons. The bar graph (right) shows the proportion of each labeled form detected for each metabolite in extracts of an EMP-utilizing organism.

*Determining reaction reversibility using tracer data*

Besides revealing which pathway an organism is using, tracer data can also be used to measure the reversibility of reactions in a pathway. For example, if a cell using EMP glycolysis is fed [1-^{13}C]glucose (Glc), the glucose is taken up and phosphorylated by a glucokinase (Glk) to glucose-6-phosphate (G6P), which also has a ^{13}C atom at C1 and is thus [1-^{13}C]glucose-6-phosphate (Fig. 2, below). Phosphoglucose isomerase (Pgi) converts G6P to fructose-6-phosphate (F6P), which is phosphorylated by phosphofructokinase (Pfk) to fructose-1,6-bisphosphate (FBP). No carbons are exchanged or lost during these reactions, so each of these molecules is labeled at C1. Fructose bisphosphate aldolase (Fba) splits FBP into dihydroxyacetone phosphate (DHAP) and glyceraldehyde-3-phosphate (GAP). Because DHAP is formed from C1-C3 of FBP and GAP from C4-C6, DHAP is labeled (from C1 of FBP) while GAP is not. DHAP can be isomerized by triosephosphate isomerase (Tpi) to form GAP, resulting in labeled GAP as well.

IMAGE 6

Figure 2. Upper glycolysis labeling with [1-^{13}C_{1}]glucose; black circles represent ^{12}C carbons and red circles represent ^{13}C carbons.

While only the M+1 labeled form is produced by reactions operating in the forward direction, reversibility of the Tpi reaction allows unlabeled DHAP to be produced from unlabeled GAP (Fig. 3, dashed box). Reversibility of the Fba reaction allows two additional labeled forms of FBP to be produced, M+0 and M+2. M+0 is produced when unlabeled DHAP and unlabeled GAP are combined to re-form FBP by the reverse action of Fba. M+2 is produced when labeled DHAP and labeled GAP are combined.

IMAGE 7

Figure 3. Upper glycolysis labeling with [1-^{13}C_{1}]glucose; black circles represent ^{12}C carbons and red circles represent ^{13}C carbons. Dashed boxes indicate labeled forms that only appear due to reversibility of glycolytic reactions.

Utilization of labeled or unlabeled metabolites by an enzyme is proportional to their relative abundance, except where kinetic isotope effects are present (see the *Kinetic Isotope Effect* section). M+0 and M+2 FBP can be converted back to F6P, which can in turn be converted back to G6P. Thus, the proportions of M+0 and M+2 FBP, F6P, and G6P increase with the reversibility of the reactions that normally consume these metabolites (Fig. 4).
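A minimal Python sketch of this reasoning for FBP, assuming that reverse Fba combines DHAP and GAP independently in proportion to their labeled (M+1) and unlabeled fractions, and that forward-made FBP is entirely M+1 (pure tracer, corrected data). `reverse_fraction`, the share of the FBP pool re-formed by reverse Fba, is a hypothetical parameter used only to show that M+0 and M+2 grow as it increases; quantitative estimates require MFA:

```python
def reformed_fbp_distribution(dhap_labeled, gap_labeled):
    """M+0/M+1/M+2 fractions of FBP re-formed by reverse Fba, assuming DHAP and
    GAP are combined independently according to their labeled fractions."""
    m0 = (1 - dhap_labeled) * (1 - gap_labeled)
    m1 = dhap_labeled * (1 - gap_labeled) + (1 - dhap_labeled) * gap_labeled
    m2 = dhap_labeled * gap_labeled
    return [m0, m1, m2]


def fbp_pool(reverse_fraction, dhap_labeled, gap_labeled):
    """Mix forward-made FBP (all M+1 from [1-13C]glucose) with re-formed FBP.
    reverse_fraction is a hypothetical share of the pool made by reverse Fba."""
    forward = [0.0, 1.0, 0.0]
    reformed = reformed_fbp_distribution(dhap_labeled, gap_labeled)
    return [(1 - reverse_fraction) * f + reverse_fraction * r
            for f, r in zip(forward, reformed)]


low = fbp_pool(0.2, 0.5, 0.5)   # made-up inputs: less reversible Fba
high = fbp_pool(0.4, 0.5, 0.5)  # more reversible Fba -> more M+0 and M+2
```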

IMAGE 8

Figure 4. Upper glycolysis labeling with [1-^{13}C_{1}]glucose. The bar graph shows the proportion of each labeled form detected for each metabolite in extracts of two EMP-utilizing organisms, one with greater reaction reversibility.

While this section discusses how labeling data are qualitatively linked to reaction fluxes and reaction reversibility, you may need to determine these parameters quantitatively, such as when trying to measure reaction free energies *in vivo*. Metabolic flux analysis (MFA) and flux balance analysis (FBA) are among the computational techniques that can be used to quantitatively determine fluxes. Within the lab, we have used both Metran and INCA to perform MFA, and both can be obtained from the lab software folder or by contacting their developers directly.


*Kinetic Isotope Effect*

So far in this guide, we have assumed that the stable isotopes used for labeling have no effect on the operation of the enzymes catalyzing metabolic reactions. While this is generally true, there are cases where reaction rates depend on the mass of the participating atoms. In these cases, the (typically heavier) labeled atoms slow the reactions they participate in relative to unlabeled atoms. This effect is termed the kinetic isotope effect (KIE) and should be considered when attempting to explain any labeling data.

The KIE tends to be more pronounced when a reaction mechanism directly acts on a labeled atom. For example, decarboxylation of pyruvate to form CO_{2} and acetaldehyde by pyruvate decarboxylase (Pdc) has a more pronounced KIE when the substrate is pyruvate that contains a single ^{13}C carbon at C1 (the carbon removed to form CO_{2}) than when the substrate contains a single ^{13}C carbon at C3 (the methyl group, which is uninvolved in the reaction mechanism). Thus, pyruvate ^{13}C-labeled at C1 would proceed through the reaction more slowly than pyruvate labeled at C3. Even if both labeled forms of pyruvate are produced in equal proportion, this difference in consumption can interfere with data interpretation.

Because the KIE is caused by the increased mass of heavier isotopes, its size generally increases with the ratio of the mass of the labeled isotope to that of the unlabeled isotope. Thus, while ^{13}C is approximately 1 Dalton heavier than ^{12}C, it is only about 13/12 = 1.08 times the mass of ^{12}C. In contrast, although ^{2}H is also 1 Dalton heavier than ^{1}H, it is 2/1 = 2 times the mass of ^{1}H, making deuterium labeling much more prone to interference from the KIE than carbon labeling.

The KIE can be measured by *in vitro* enzyme assays using labeled and unlabeled substrates to determine its effect on individual reactions, and this information can be used to correct labeling data for the KIE^{1}. While possible, measuring and correcting for the KIE for a large number of reactions can be time- and cost-prohibitive, so the KIE is often assumed to be negligible for non-deuterium tracers, or literature values for similar enzymes are used, though published values exist for relatively few reactions.


Last modified: 24 January 2022; TBJ