Fitting to proteomics

Support for fitting turnover numbers in GECKO and RBA models to supplied proteomics data.

GeckoFitKcats

Fitting of turnover numbers to proteomics data in GECKO models.

class f2xba.GeckoFitKcats(optim, orig_kcats_fname)[source]

Support the fitting of turnover numbers to proteomics data for GECKO models.

Using the optimization result of the original GECKO model, the turnover numbers of the original GECKO model are fitted to the supplied measured protein mass fractions (mg/gP). A configuration file with the fitted turnover numbers is generated. This file can be used to generate a new GECKO model.

Using the COBRApy interface for turnover number fitting:

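import cobra
from f2xba import EcmOptimization, GeckoFitKcats

ecm = cobra.io.read_sbml_model('iML1515_GECKO.xml')
eo = EcmOptimization('iML1515_GECKO.xml', ecm)

# lb_medium: user-defined collection of exchange reaction ids for the medium
ecm.medium = {rid: 1000.0 for rid in lb_medium}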
solution = ecm.optimize()

gfk = GeckoFitKcats(eo, 'iML1515_GECKO_kcats.xlsx')
tot_fitted_mpmf = gfk.process_data(solution.fluxes, measured_mpmfs)
exceeding_max_scale = gfk.update_kcats('iML1515_fitted_kcats.xlsx', target_sat=0.5, max_scale_factor=100.0)
# subsequently, generate a new GECKO model using the fitted turnover numbers.

Using the gurobipy interface for turnover number fitting:

Note: The GUROBI optimizer with gurobipy (https://www.gurobi.com) needs to be installed on your system.

eo = EcmOptimization('iML1515_GECKO.xml')
solution = eo.optimize()

gfk = GeckoFitKcats(eo, 'iML1515_GECKO_kcats.xlsx')
tot_fitted_mpmf = gfk.process_data(solution.fluxes, measured_mpmfs)
exceeding_max_scale = gfk.update_kcats('iML1515_fitted_kcats.xlsx', target_sat=0.5, max_scale_factor=100.0)
# subsequently, generate a new GECKO model using the fitted turnover numbers.

process_data(fluxes, measured_mpmfs)[source]

Process flux solution and proteomics data, prior to updating kcat values.

Provided are the GECKO flux solution for a given condition and corresponding proteomics data in mg protein per g total protein indexed by gene locus of gene product. Processing of data is in preparation of fitting and updating the kcat values for the GECKO model.
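As an illustration of the expected input shape, the following sketch builds a measured_mpmfs dictionary from a small CSV table. The gene loci, the values and the column names are made up for this example, and dropping undetected proteins (value 0) is an assumption, not documented behaviour.

```python
import csv
import io

# Hypothetical proteomics table: gene locus and mg protein / g total protein.
raw = io.StringIO("locus,mpmf\ng0001,0.8\ng0002,2.5\ng0003,0.0\n")

measured_mpmfs = {}
for row in csv.DictReader(raw):
    value = float(row["mpmf"])
    # Assumption: proteins that were not detected are left out.
    if value > 0.0:
        measured_mpmfs[row["locus"]] = value
```

The resulting dictionary maps each gene locus to its measured protein mass fraction and can be passed as the second argument of process_data().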

Kcat fitting can only be done for reactions that carry flux. The coupling factors between reaction flux and protein requirement are updated by scaling the original kcat value. The process allows flux to be shifted between iso-reactions when proteomics suggests that another iso-enzyme carries the flux.

  1. Based on the flux solution we identify the iso-reaction that carries the flux per active reaction. For all iso-reactions we determine the predicted protein cost (mpmf) based on the reaction flux.

  2. Based on the flux solution we sum up all reaction fluxes that could be routed through a protein.

  3. Based on proteomics and reaction fluxes, we identify the iso-reaction that should carry the flux per active reaction. The iso-reaction with the highest measured protein cost is selected. Measured protein costs for promiscuous enzymes are allocated as per predicted flux distribution.
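Step 3 can be sketched as follows. This is a simplified illustration and not the f2xba implementation: the function names and the iso-reaction ids are hypothetical, and splitting the measured protein cost in proportion to absolute predicted flux is one reading of "as per predicted flux distribution".

```python
def select_iso_reaction(measured_costs):
    """Return the iso-reaction id with the highest measured protein cost."""
    return max(measured_costs, key=measured_costs.get)


def allocate_promiscuous(measured_mpmf, predicted_fluxes):
    """Split the measured mpmf of one protein across the reactions it
    catalyzes, proportional to the absolute predicted reaction fluxes."""
    total_flux = sum(abs(flux) for flux in predicted_fluxes.values())
    if total_flux == 0.0:
        return {rid: 0.0 for rid in predicted_fluxes}
    return {rid: measured_mpmf * abs(flux) / total_flux
            for rid, flux in predicted_fluxes.items()}


# Hypothetical measured protein costs (mg/gP) for two iso-reactions.
selected = select_iso_reaction({"R_X_iso1": 0.4, "R_X_iso2": 1.9})

# Hypothetical promiscuous enzyme with 3.0 mg/gP serving two reactions.
shares = allocate_promiscuous(3.0, {"R_A": 2.0, "R_B": -1.0})
```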

Parameters:
  • fluxes (dict or pandas.Series) – reaction fluxes of GECKO solution for given condition

  • measured_mpmfs (dict) – gene loci and related protein mass fractions measured in mg protein / g total protein

Returns:

tot_fitted_mpmf: protein mass fraction used for kcat fitting

Return type:

float

update_kcats(fitted_kcats_fname, target_sat=0.5, max_scale_factor=None, min_kcat=0.01, max_kcat=5000.0)[source]

Fit turnover numbers to proteomics data and export fitted turnover numbers to file.

This requires process_data() to be executed first.

Kcat fitting can only be done for reactions that carry flux. The coupling factors between reaction flux and protein requirement are updated by scaling the original kcat value. The process allows flux to be shifted between iso-reactions when proteomics suggests that another iso-enzyme carries the flux.

Fitting is not applied when the required scaling would exceed max_scale_factor.

In the simplest case, we have a given reaction flux and a single protein measurement. If the predicted protein level is too high, we increase the kcat value for the reaction to make the enzyme more efficient. The scaling factor is the ratio of predicted to measured protein concentration.
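The simplest case can be written out as a short calculation. This is a sketch with a hypothetical helper name and illustrative numbers; it relies only on the statement above that the scaling factor is the ratio of predicted to measured protein.

```python
def scale_kcat(kcat, predicted_mpmf, measured_mpmf):
    """Return (scaling factor, scaled kcat) for a single protein measurement."""
    if measured_mpmf <= 0.0:
        raise ValueError("measured protein mass fraction must be positive")
    factor = predicted_mpmf / measured_mpmf
    return factor, kcat * factor


# The model predicts 4.0 mg/gP but only 2.0 mg/gP were measured,
# so the enzyme has to become twice as efficient.
factor, new_kcat = scale_kcat(kcat=10.0, predicted_mpmf=4.0, measured_mpmf=2.0)
```

At constant flux the protein requirement is inversely proportional to kcat, so doubling kcat halves the predicted protein level and brings it to the measured value.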

If there are iso-reactions, we need to scale the kcat values of the other iso-reactions as well, so that none of them becomes ‘cheaper’ than the iso-reaction carrying the flux.

More complex cases can appear with iso-reactions, when the model uses another iso-reaction than proteomics suggests. In this case we first have to increase the kcat value of the iso-reaction suggested by proteomics, and subsequently we adjust the scaling to the measured protein concentration.

A further kcat scaling is applied to move the model to a given target enzyme saturation level. It is ensured that kcat values fall into the min_kcat, max_kcat range.
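How the max_scale_factor window and the min_kcat, max_kcat range interact can be sketched as below. The helper name is hypothetical, the saturation scaling is left out because its exact form is not described here, and returning None for a skipped record is a choice made for this example only.

```python
def apply_scaling(kcat, factor, max_scale_factor=None,
                  min_kcat=0.01, max_kcat=5000.0):
    """Return the fitted kcat, or None if the factor lies outside
    the window [1/max_scale_factor, max_scale_factor]."""
    if max_scale_factor is not None:
        if not (1.0 / max_scale_factor <= factor <= max_scale_factor):
            return None  # fitting is not applied for this record
    # Keep the fitted value inside the allowed kcat range.
    return min(max(kcat * factor, min_kcat), max_kcat)


fitted = apply_scaling(10.0, 2.0, max_scale_factor=100.0)
skipped = apply_scaling(10.0, 200.0, max_scale_factor=100.0)
capped = apply_scaling(1000.0, 50.0)
```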

Fitted kcat values are exported to fitted_kcats_fname.

Parameters:
  • fitted_kcats_fname (str) – filename for fitted and exported turnover numbers (.xlsx)

  • target_sat (float) – (optional) expected target saturation of fitted model (default: 0.5)

  • max_scale_factor (float) – (optional) maximum scaling [1/factor … factor] (default None)

  • min_kcat (float) – (optional) minimal turnover number in s⁻¹ (default: 0.01)

  • max_kcat (float) – (optional) maximal turnover number in s⁻¹ (default: 5000.0)

Returns:

kcat records not scaled due to exceeding max scaling

Return type:

dict(dict)

RbaFitKcats

Fitting of enzyme efficiencies (turnover numbers) to proteomics data in RBA models. Note: GeckoFitKcats can shift traffic between isoenzymes of a reaction, which is not implemented in RbaFitKcats.

class f2xba.RbaFitKcats(optim, orig_kcats_fname)[source]

Support fitting of turnover numbers to proteomics data.

Usage, with the dictionary measured_mpmfs of measured protein levels:

from f2xba import RbaOptimization, RbaFitKcats

ro = RbaOptimization('RBA_model.xml')
ro.set_medium_conc(ex_mmol_per_l)
solution = ro.solve(gr_min=0.05, gr_max=0.7, bisection_tol=1e-3)
rfk = RbaFitKcats(ro, 'baseline_RBA_kcats.xlsx')
tot_fitted_mpmf = rfk.process_data(solution.fluxes, measured_mpmfs)
exceeding_max_scale = rfk.update_kcats('fitted_RBA_kcats.xlsx', max_scale_factor=2.5)

process_data(var_values, measured_mpmfs)[source]

Process RBA solution and proteomics data, prior to updating kcat values.

The var_values (i.e. optimization variable values) of the RBA solution contain both the reaction fluxes of metabolic reactions in mmol/gDWh, split by catalyzing enzyme (i.e. iso-reaction fluxes), and the predicted enzyme and process machine concentrations in µmol/gDW.

In a first step, the predicted protein concentrations in mmol/gDW are determined from the values of the optimization variables for enzyme and process machine concentrations, together with the enzyme/process machine compositions and protein molecular weights encoded in the RBA model. Protein concentrations resulting from target concentrations, e.g. dummy protein requirements, are included.

Subsequently, the units are converted from mmol/gDW to mpmf (mg protein / g total protein).
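The unit conversion can be illustrated with a small sketch. The function name, protein ids and numbers are hypothetical, and normalising by the summed predicted protein mass is an assumption made for this example; the library may determine total protein differently.

```python
def to_mpmf(conc_mmol_per_gdw, mw_g_per_mol):
    """Convert protein concentrations (mmol/gDW) to mg protein / g total protein."""
    # mmol/gDW * g/mol (= mg/mmol) gives mg protein per gDW.
    mass_mg_per_gdw = {pid: conc * mw_g_per_mol[pid]
                       for pid, conc in conc_mmol_per_gdw.items()}
    # Assumption: total protein is the sum over all predicted proteins.
    total_mg_per_gdw = sum(mass_mg_per_gdw.values())
    # mg per mg total protein, times 1000, gives mg per g total protein.
    return {pid: 1000.0 * mass / total_mg_per_gdw
            for pid, mass in mass_mg_per_gdw.items()}


# Hypothetical proteins: concentrations in mmol/gDW, weights in g/mol.
mpmfs = to_mpmf({"P1": 1e-4, "P2": 3e-4}, {"P1": 30000.0, "P2": 10000.0})
```

Both hypothetical proteins contribute the same mass per gDW here, so each ends up with half of the total protein mass.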

Parameters:
  • var_values (dict or pandas.Series) – values for optimization variables of RBA solution for given condition

  • measured_mpmfs (dict) – gene loci and protein mass fractions measured in mg protein / g total protein

Returns:

tot_fitted_mpmf: protein mass fraction used for kcat fitting

Return type:

float

update_kcats(fitted_kcats_fname, target_sat=0.5, max_scale_factor=None, min_kcat=0.01, log_scale=False)[source]

Fit turnover numbers to proteomics data and export updated turnover numbers to file.

This requires process_data() to be executed first.

The idea is to scale the original turnover numbers of active enzyme-catalyzed reactions so that the predicted protein levels move closer to the measured protein levels. This is only a first approximation, which assumes that the flux distribution does not change significantly. Adjusted turnover numbers will, however, affect flux levels and may affect the flux distribution, which makes the automatic fitting suboptimal.

Per active reaction, an optimal scaling factor is determined. For an enzyme with a single protein component, this scaling factor is simply the ratio of predicted to measured protein mass fraction. Only scaling factors in the range from 1/max_scale_factor to max_scale_factor are considered. For enzyme complexes, a scaling factor weighted with respect to the measured protein mass fractions is determined. Weighting for enzyme complexes can be based on a linear or log scale of the protein mass fractions.
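One possible reading of the linear weighting for enzyme complexes is sketched below. The function name and the protein ids are hypothetical, the weighted arithmetic mean is an assumption, and the log-scale variant is not shown because its exact definition is not given here.

```python
def complex_scale_factor(predicted, measured):
    """Weighted mean of per-protein predicted/measured ratios,
    using the measured mass fractions as linear weights."""
    shared = [pid for pid in measured
              if pid in predicted and measured[pid] > 0.0]
    if not shared:
        return None  # no usable measurement for this enzyme
    total_weight = sum(measured[pid] for pid in shared)
    weighted_sum = sum(measured[pid] * (predicted[pid] / measured[pid])
                       for pid in shared)
    return weighted_sum / total_weight


# Hypothetical two-subunit complex (values in mg/gP).
factor_complex = complex_scale_factor({"A": 4.0, "B": 1.0},
                                      {"A": 2.0, "B": 2.0})
# A single-protein enzyme reduces to the plain ratio.
factor_single = complex_scale_factor({"A": 4.0}, {"A": 2.0})
```

With linear weights the expression reduces to the summed predicted over the summed measured mass fraction of the complex.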

Turnover numbers of all iso-reactions of a given net reaction are rescaled, so that no other iso-reaction becomes more favorable, which would change the set of proteins used.

Fitted kcat values are exported to fitted_kcats_fname.

Parameters:
  • fitted_kcats_fname (str) – filename for fitted and exported turnover numbers (.xlsx)

  • target_sat (float) – expected target saturation of fitted model (default: 0.5)

  • max_scale_factor (float) – maximum scaling [1/factor … factor] (default None)

  • min_kcat (float) – minimal turnover number in s⁻¹ (default: 0.01)

  • log_scale (bool) – select weighting based on linear or log scale protein mass fractions (default: False)

Returns:

records not scaled due to exceeding max scaling

Return type:

dict(dict)