2. Tutorial: Create iML1515 GECKO with modified default values¶

In the preceding tutorial, an initial GECKO model of iML1515 was developed, incorporating minimal specific parametrization, default turnover numbers for metabolic and transporter reactions, and enzyme compositions derived from the BioCyc online database. While the correlation between predicted and measured growth rates was deemed acceptable, the correlation between predicted and measured protein levels was found to be unsatisfactory. In this tutorial, we will methodically modify parameters to enhance the prediction of protein levels, with a particular focus on transporters. To this end, a more thorough understanding of the organism is essential. The BioCyc/EcoCyc web portal offers valuable insights into E. coli, and reviewing relevant implementation details is also advisable. Coupling constraints, which couple reaction flux to protein requirements, are critical to a GECKO model. As previously outlined in the tutorial, these coupling constraints (\(\text{CC}_{ij}\)) are functions of protein copy number in the catalyzing enzyme (\(\text{stoic}_{ij}\)), protein molecular weight (\(\text{MW}_i\)), turnover number (\(\text{kcat}_{ij}\)), number of active sites (\(\text{n_AS}\)), and average enzyme saturation (\(\text{avg_enz_sat}\)).

\(\text{CC}_{ij} = \frac{\text{stoic}_{ij} \cdot \text{MW}_i } {(\text{kcat}_{ij} \cdot \text{n_AS}_{ij} \cdot \text{avg_enz_sat})}\)

The f2xba method involves the extraction of protein requirements from the gene-product associations configured in the foundational GEM. The average enzyme saturation, applicable to all enzymes, is configured in the ECM configuration file. The protein molecular weight is extracted from UniProt or NCBI online data. In the initial model, protein copy numbers were extracted from BioCyc online data, and the number of active sites were configured based on heuristics. The turnover numbers were set to default values, as specified in the XBA configuration file.

It is important to note that lowering the values of coupling constraint will reduce the amount of protein required to catalyze a specific reaction; alternatively, reaction flux will increase for a given protein investment. An increase in the turnover number will reduce the coupling. Within the scope of this study, transporters are delineated as reactions comprising reactants and products residing in multiple cellular compartments. Turnover numbers for several transporters have been measured and can be extracted from Brenda (https://www.brenda-enzymes.org) (Chang et al., 2021) or BioCyc with references to relevant publications. Given the inherent challenges in measuring turnover numbers for membrane-based transporters, we have established a range of default turnover number values for transporters, contingent on their classification as porins, ABC transporters, and so forth. This approach serves to enhance the accuracy of protein predictions.

While the adjustment of the turnover number appears to be a viable proposition, the modification of the copy number of proteins in multimeric enzymes does not initially appear feasible. However, there are instances where this adjustment could be proposed, particularly in the context of transporters. For instance, in the initial GECKO model, an ABC transporter consists of a transporter component with a fixed stoichiometry of the participating proteins, and one periplasmic binding protein, which is not tightly coupled. Proteomics suggests that the ratio between periplasmic binding proteins and related transporter components is much greater than 1.0. By adjusting the copy numbers of periplasmic binding proteins, we can improve protein predictions. ACP, flavodoxin, or glutaredoxin are some of the coenzymes implemented in the iML1515 GEM model. In instances where these coenzymes are present in a reaction as either a reactant or a product, they are also incorporated into the gene-product association for that specific reaction. For the improved GECKO model, the coenzymes will be removed from the gene-product association when they do not function as a catalyst. This approach results in a reduction of the necessary genes required in the GECKO model compared to the FBA model.

The subsequent parameter modifications were derived iteratively. This process entailed modifying parameters, creating versions of GECKO models, analyzing optimization results, identifying areas for improvement, and reviewing relevant literature.

Peter Schubert, Heinrich-Heine University Duesseldorf, Institute for Computational Cell Biology (Prof. Dr. M. Lercher), January, 2025

Step 1: Initial setup¶

In the initial step of the present tutorial and the subsequent tutorials, the environment is established for model creation and optimization. The necessary imports are executed, the model names are defined, media conditions are created, and proteomics data is loaded. Subsequently, the configuration data of a previous model, hereafter referred to as the baseline_model, is extended to create an improved or more realistic target_model.

[1]:

# Required imports and model names
import os
import re
import time
import pandas as pd
import cobra

from f2xba import XbaModel, EcModel
from f2xba import EcmOptimization, EcmResults
from f2xba.utils.mapping_utils import load_parameter_file, write_parameter_file

fba_model = 'iML1515'
baseline_model = 'iML1515_default_GECKO'
target_model = 'iML1515_modified_GECKO'
reference_cond = 'Glucose'

# Create media conditions
media_grs = {'Acetate': ['ac', 0.29], 'Glycerol': ['glyc', 0.47], 'Fructose': ['fru', 0.54],
             'L-Malate': ['mal__L', 0.55], 'Glucose': ['glc__D', 0.66], 'Glucose 6-Phosphate': ['g6p', 0.78]}
base_medium = ['ca2', 'cbl1', 'cl', 'co2', 'cobalt2', 'cu2', 'fe2', 'fe3', 'h2o', 'h', 'k', 'mg2',
               'mn2', 'mobd', 'na1', 'nh4', 'ni2', 'o2', 'pi', 'sel', 'slnt', 'so4', 'tungs', 'zn2']
conditions = {}
exp_grs = {}
for cond, (carbon_sid, exp_gr) in media_grs.items():
    conditions[cond] = {f'EX_{sidx}_e': 1000.0 for sidx in base_medium}
    conditions[cond][f'EX_{carbon_sid}_e'] = 1000.0
    exp_grs[cond] = exp_gr
print(f'{len(conditions)} minimal media conditions created for simulation')

# Load proteomics
fname = os.path.join('data', 'Ecoli_Schmidt_proteomics.xlsx')
with pd.ExcelFile(fname) as xlsx:
    df_mpmf = pd.read_excel(xlsx, sheet_name='proteomics', index_col=0)
    print(f'{len(df_mpmf)} records of proteomics loaded from {fname}')
min_confidence_level = 43.0
df_mpmf = df_mpmf[df_mpmf['confidence'] > min_confidence_level]
print(f'{len(df_mpmf)} records with confidence level above {min_confidence_level}')

6 minimal media conditions created for simulation
2347 records of proteomics loaded from data/Ecoli_Schmidt_proteomics.xlsx
2232 records with confidence level above 43.0

Step 2: Export kcats values and enzyme composition¶

The XBA configuration data employed in the creation of the initial GECKO model is updated, and gene products are removed via the table designated as remove gps. This includes structural proteins (G_b0957), proteins that are not expressed under normal conditions (G_b1319), and coenzymes that do not act as catalysts. Subsequent to the aforementioned modifications, an in-memory instance of XbaModel is generated, and it is utilized to export two tables, which are subsequently stored in the XLSX format.

The first table encompasses the turnover numbers, while the second table comprises the enzyme composition. The table with turnover numbers is indexed by the reaction identifier (key). The rid column provides the identifier of the net reaction, respectively the reaction identifier used in the foundational GEM. The direction of the reaction with respect to the net reaction is specified by the dirxn column, which can be forward (1) or reverse (-1). The catalyzing enzyme identifier (enzyme), the turnover number kcat_per_s in units of \(s^{-1}\), and other informative columns are also included. The table that enumerates enzyme compositions is indexed by the enzyme identifier (eid), and the composition itself specifies the required genes and protein copy numbers. Enzymes can have one or more active sites, and enzymes can be named (name). The table also contains optional information columns.

[2]:

# Create updated XBA parameter file
xba_params = load_parameter_file(os.path.join('data', f'{baseline_model}_xba_parameters.xlsx'))

data = [['G_b0957', 'ompA (30.4 mg/gP) used in R_H2Otex has mainly a structural'],
        ['G_b1319', 'ompG used in R_H2Otex, not expressed under lab conditions'],
        # remove proteins used stoichiometrically in iML1515 (add to RBA model as target)
        ['G_b1094', 'acp, Acyl carrier protein, is non-catalytic'],
        ['G_b2582', 'trxC, Thioredoxin-2, is non-catalytic'],
        ['G_b3781', 'trxA, Thioredoxin-1, is non-catalytic'],
        ['G_b2529', 'iscU, scaffold protein for [Fe-S] cluster assembly, is non-catalytic'],
        ['G_b1679', 'sufE, sulfur carrier protein SufE, is non-catalytic'],
        ['G_b1681', 'sufD, FeS cluster assembly protein sufD, is non-catalytic'],
        ['G_b1682', 'sufC, Probable ATP-dependent transporter SufC, is non-catalytic'],
        ['G_b1683', 'sufB, FeS cluster assembly protein sufB, is non-catalytic'],
        ['G_b3860', 'dsbA, Thiol:disulfide interchange protein DsbA, used stoichiometrically'],
        ['G_b2893', 'dsbC, Thiol:disulfide interchange protein DsbC, used stoichiometrically'],
        ['G_b4136', 'dsbD, Thiol:disulfide interchange protein DsbD, used stoichiometrically'],
        ['G_b0604', 'dsbG, Thiol:disulfide interchange protein DsbB, used stoichiometrically'],
        ['G_b1677', 'lpp, Major outer membrane lipoprotein Lpp, used stoichiometrically'],
        ['G_b0684', 'fldA, Flavodoxin-1, used stoichiometrically'],
        ['G_b2895', 'fldb, Flavodoxin-2, used stoichiometrically'],
        ['G_b0849', 'grxA, Glutaredoxin-1, used stoichiometrically'],
        ['G_b1064', 'grxB, Glutaredoxin-2, used stoichiometrically'],
        ['G_b3610', 'grxC, Glutaredoxin-3, used stoichiometrically'],
        ['G_b1654', 'grxD, Glutaredoxin-4, used stoichiometrically'],
       ]

df = pd.DataFrame(data, columns=['gp_id', 'notes']).set_index('gp_id')
xba_params['remove_gps'] = pd.concat((xba_params['remove_gps'], df))

write_parameter_file(os.path.join('data', f'export_xba_parameters.xlsx'), xba_params)

2 table(s) with parameters loaded from data/iML1515_default_GECKO_xba_parameters.xlsx (Thu Nov 20 18:48:27 2025)
2 table(s) with parameters written to  data/export_xba_parameters.xlsx

[3]:

# Extract kcat and enzyme composition
xba_model = XbaModel(os.path.join('SBML_models', f'{fba_model}.xml'))
xba_model.configure(os.path.join('data', f'export_xba_parameters.xlsx'))

xba_model.export_kcats(os.path.join('data', f'{baseline_model}_kcats.xlsx'))
xba_model.export_enz_composition(os.path.join('data', f'{baseline_model}_enzyme_composition.xlsx'))

loading: SBML_models/iML1515.xml (last modified: Thu Dec  5 10:03:46 2024)
2 table(s) with parameters loaded from data/export_xba_parameters.xlsx (Thu Nov 20 18:52:31 2025)
  22 gene product(s) removed from reactions (1494 gene products remaining)
extracting UniProt protein data from data/uniprot_organism_83333.tsv
1494 proteins created
1203 enzymes added with default stoichiometry
3855 enzymes extracted from Biocyc
1197 enzyme compositions updated from Biocyc Enzyme data
2221 reactions catalyzed by 1203 enzymes
default kcat values configured for ['metabolic', 'transporter'] reactions
   0 enzymes removed due to missing kcat values
1877 constraints (+0); 2712 variables (+0); 1494 genes (-22); 5 parameters (+0)
>>> BASELINE XBA model configured!

5357 reaction kcat values exported to data/iML1515_default_GECKO_kcats.xlsx
1203 enzyme compositions exported to data/iML1515_default_GECKO_enzyme_composition.xlsx

[4]:

# kcats file
load_parameter_file(os.path.join('data', f'{baseline_model}_kcats.xlsx'))['kcats'].head(3)

1 table(s) with parameters loaded from data/iML1515_default_GECKO_kcats.xlsx (Thu Nov 20 18:52:42 2025)

[4]:

	rid	dirxn	enzyme	kcat_per_s	notes	info_active_sites	info_ecns	info_type	info_genes	info_name	info_reaction
key
R_CYTDK2	R_CYTDK2	1	enz_b2066	12.5	XBA model export	1	2.7.1.213, 2.7.1.48	metabolic	b2066	Cytidine kinase (GTP)	M_cytd_c + M_gtp_c => M_cmp_c + M_gdp_c + M_h_c
R_XPPT	R_XPPT	1	enz_b0238	12.5	XBA model export	4	2.4.2.22, 2.4.2.8	metabolic	b0238	Xanthine phosphoribosyltransferase	M_prpp_c + M_xan_c => M_ppi_c + M_xmp_c
R_HXPRT_iso1	R_HXPRT	1	enz_b0125	12.5	XBA model export	4	2.4.2.8	metabolic	b0125	Hypoxanthine phosphoribosyltransferase (Hypoxa...	M_hxan_c + M_prpp_c => M_imp_c + M_ppi_c

[5]:

# enzyme composition file
load_parameter_file(os.path.join('data', f'{baseline_model}_enzyme_composition.xlsx'))['enzymes'].tail(3)

1 table(s) with parameters loaded from data/iML1515_default_GECKO_enzyme_composition.xlsx (Thu Nov 20 18:52:42 2025)

[5]:

	name	composition	active_sites	info_mw_kDa	info_n_reactions	info_sample_rid
eid
enz_b1395	NaN	gene=b1395, stoic=1.0	1	51.733	1	R_HADPCOADH3
enz_b1007_b1012	enz_b1007_b1012	gene=b1007, stoic=1.0; gene=b1012, stoic=1.0	1	59.968	1	R_PYROX
enz_b0784_b0785	molybdopterin synthase	gene=b0784, stoic=2.0; gene=b0785, stoic=2.0	2	51.478	1	R_MPTS

Step 3: Modify turnover numbers and enzyme composition¶

The data tables that were exported in the preceding step can be updated by employing a spreadsheet editor. However, this will not be the method employed in this tutorial. Instead, the tables will be updated using Python code.

3.1: modify turnover numbers¶

measured values¶

In instances where turnover numbers have been measured, they will selectively be used to replace the default values employed in the initial GECKO model.

outer membrane pores¶

The FBA model of iML1515 contains 283 transport reactions across the outer membrane that utilize any of four isoenzymes, namely the gene products of b0241/phoE, b0929/ompF, b1377/ompN, and b2215/ompC. Of these, OmpN, with one copy of ompN, has a molecular weight of 41 kD and is the ‘cheapest’ enzyme. The three other outer membrane pores are trimeric with molecular weights of approximately 117 kD (PhoE), 118 kD (OmpF), and 121 kD (OmpC). Two proteins, b0241/phoE and b1377/ompN, were not included in the proteomics data due to their low expression levels under minimal and rich media conditions, as reported by BioCyc. In contrast, b0929/ompF and b2215/ompC were identified as the top 20 most abundant protein masses in the proteomics data.

The GECKO formalism will invariably select the ‘cheapest’ iso-enzyme for a given reaction, i.e., b1377/ompN. By reducing the turnover numbers of all (sub-)reactions utilizing b1377 or b0241, we can redirect traffic to the other two pores, b0929 and b2215. The assignment of distinct turnover numbers to these (sub-)reactions, contingent on the charge state of the transported metabolite, could facilitate a balanced utilization of the two proteins.

transporter types¶

The model encompasses diverse transporter types, including ABC transporters and aquaporins, for which distinct default turnover values have been established. default kcat values.

Python code¶

It is not necessary to comprehend the Python code presented below in its entirety.

[6]:

# Load original kcats file and modify selected transporter kcat values
specific_tx_kcats = {'R_NADH16pp': [600.0, 'manual - E. coli, pubmed 672041'],
                     'R_NADH17pp': [600.0, 'manual - E. coli, pubmed 672041'],
                     'R_NADH18pp': [600.0, 'manual - E. coli, pubmed 672041'],
                     'R_CYTBO3_4pp': [320.0, 'manual - E. coli, pubmed 8382081'],
                     'R_ATPS4rpp_iso1': [285.0, 'manual - E. coli, pubmed 19411254'],
                     'R_ATPS4rpp_iso2': [285.0, 'manual - E. coli, pubmed 19411254'],
                     'R_FDH4pp_iso2': [563.0, 'manual - E. coli, pubmed 1099093'],
                     'R_FDH5pp_iso2': [563.0, 'manual - E. coli, pubmed 1099093'],
                     'R_SUCDi': [85.0, 'manual - E. coli, pubmed 11803023'],
                     'R_SULabcpp_iso2': [100.0, 'manual - adjusted to proteomics'],
                     'R_SO4t2pp': [0.1, 'manual - adjusted to proteomics'],}
rids_low_tx_flux = {'R_MG2tpp', 'R_MG2uabcpp','R_NI2tpp', 'R_NI2uabcpp', 'R_COBALT2tpp'}
enz2omp = {'enz_b0241': 'phoE', 'enz_b0929': 'ompF', 'enz_b1377': 'ompN', 'enz_b2215': 'ompC'}
default_kcats = {'general-tx': 25.0, 'aquaporin': 5000.0, 'pts': 100.0, 'abc': 50.0, 'low_flux': 2.5,
                 'phoE': 70.0, 'ompN': 30.0,
                 'ompC (ionic)': 275.0, 'ompC (neutral)': 385.0,
                 'ompF (ionic)': 275.0, 'ompF (neutral)': 370.0 }

# Create updated kcats file
df_kcats = load_parameter_file(os.path.join('data', f'{baseline_model}_kcats.xlsx'))['kcats']

# Update selected kcats for transporters
updates = {tx_type: 0 for tx_type in default_kcats.keys()}
n_specific_kcats = 0
for key, row in df_kcats.iterrows():
    if key in specific_tx_kcats:
        df_kcats.at[key, 'kcat_per_s'] = specific_tx_kcats[key][0]
        df_kcats.at[key, 'notes'] = specific_tx_kcats[key][1]
        n_specific_kcats += 1
    elif row['rid'] in rids_low_tx_flux:
        df_kcats.at[key, 'kcat_per_s'] = default_kcats['low_flux']
        df_kcats.at[key, 'notes'] = 'manual low flux reactions'
        updates['low_flux'] += 1
    elif row['info_type'] == 'transporter':
        r = xba_model.reactions[row['rid']]
        reactants = list(r.reactants)
        if len(reactants) == 1 and re.match('M_h2o_', reactants[0]):
            tx_type = 'aquaporin'
        elif 'pts' in row['rid']:
            tx_type = 'pts'
        elif 'abc' in row['rid']:
            tx_type = 'abc'
        elif r.compartment == 'e-p':
            tx_type = enz2omp.get(row['enzyme'], 'general-tx')
            species_charge = xba_model.species[reactants[0]].charge
            if tx_type == 'ompF':
                tx_type = 'ompF (neutral)' if species_charge == 0 else 'ompF (ionic)'
            elif tx_type == 'ompC':
                    tx_type = 'ompC (neutral)' if species_charge == 0 else 'ompC (ionic)'
        else:
            tx_type = 'general-tx'
        df_kcats.at[key, 'kcat_per_s'] = default_kcats[tx_type]
        df_kcats.at[key, 'notes'] = f'default for {tx_type} transporter'
        updates[tx_type] += 1

print(f'{len(df_kcats):4d} kcat records in total')
print(f'{sum(updates.values()) + n_specific_kcats:4d} kcat values updated')
print(f'{n_specific_kcats:9d} kcat records set to specific values')
for tx_type, count in updates.items():
    print(f'{count:9d} kcat records set to {default_kcats[tx_type]:6.1f} s-1 for {tx_type} transporter')

# Write updated kcats to file
write_parameter_file(os.path.join('data', f'{target_model}_kcats.xlsx'), {'kcats': df_kcats})

1 table(s) with parameters loaded from data/iML1515_default_GECKO_kcats.xlsx (Thu Nov 20 18:52:42 2025)
5357 kcat records in total
3009 kcat values updated
       11 kcat records set to specific values
      543 kcat records set to   25.0 s-1 for general-tx transporter
       12 kcat records set to 5000.0 s-1 for aquaporin transporter
       27 kcat records set to  100.0 s-1 for pts transporter
      151 kcat records set to   50.0 s-1 for abc transporter
        7 kcat records set to    2.5 s-1 for low_flux transporter
      562 kcat records set to   70.0 s-1 for phoE transporter
      564 kcat records set to   30.0 s-1 for ompN transporter
      327 kcat records set to  275.0 s-1 for ompC (ionic) transporter
      239 kcat records set to  385.0 s-1 for ompC (neutral) transporter
      327 kcat records set to  275.0 s-1 for ompF (ionic) transporter
      239 kcat records set to  370.0 s-1 for ompF (neutral) transporter
1 table(s) with parameters written to  data/iML1515_modified_GECKO_kcats.xlsx

3.2: modify enzyme compositions¶

According to proteomics data, the copy number of loosely coupled proteins required in ATP binding cassette (ABC) and phosphotransferase system (PTS) transport processes should be modified. Periplasmic binding proteins are identified by name or gene locus. The loosely attached components of PTS transporters are identified by gene locus. The number of active sites is modified for large multienzyme complexes.

[7]:

# Load enzyme composition template file
df_enz_comp = load_parameter_file(os.path.join('data', f'{baseline_model}_enzyme_composition.xlsx'))['enzymes']

# Collect genes related to periplasmic binding proteins. Accessing protein data in xba_model.
binding_proteins = {'b1857', 'b0698', 'b2177'}
for uid, p in xba_model.proteins.items():
    if re.match('.*binding.*protein', p.name) and 'ATP-binding' not in p.name:
        binding_proteins.add(p.locus)

# Update composition of selected enzymes
pts_update = {'b1101':1.0, 'b1817': 7.0, 'b2415': 9.0, 'b2416': 4.0, 'b2417': 12.0}
enz_modified = set()
for eid, row in df_enz_comp.iterrows():
        composition = []
        for kv in row['composition'].split(';'):
            gene = re.match(r'.*gene=(b\d{4})', kv).group(1)
            stoic = float(re.match(r'.*stoic=([0-9\.])', kv).group(1))
            if 'abc' in row['info_sample_rid'] and gene in binding_proteins:
                stoic = 50.0
                enz_modified.add(eid)
            elif gene in pts_update:
                stoic = pts_update[gene]
                enz_modified.add(eid)
            composition.append(f'gene={gene}, stoic={stoic}')
        if eid in enz_modified:
            df_enz_comp.at[eid, 'composition'] = '; '.join(composition)
            df_enz_comp.at[eid, 'notes'] = 'enzyme composition manually updated'
print(f'{len(df_enz_comp):4d} enzymes in total')
print(f'{len(enz_modified):4d} enzyme compositions manually updated')

# update number of active sites in PDH and AGKDH
df_enz_comp.at['enz_b0114_b0115_b0116', 'active_sites'] = 24  # PDH
df_enz_comp.at['enz_b0116_b0726_b0727', 'active_sites'] = 24  # AKGDH

write_parameter_file(os.path.join('data', f'{fba_model}_enzyme_composition_updated.xlsx'),
                     {'enzymes': df_enz_comp})

1 table(s) with parameters loaded from data/iML1515_default_GECKO_enzyme_composition.xlsx (Thu Nov 20 18:52:42 2025)
1203 enzymes in total
  62 enzyme compositions manually updated
1 table(s) with parameters written to  data/iML1515_enzyme_composition_updated.xlsx

Step 4: Create XBA and ECM configuration files¶

The creation of XBA and ECM configuration files for the target model is achieved through the extension of the configuration files from the baseline model. Rather than utilizing the ECM configuration parameter pm2totpm_val_or_paxdb to define the mass fraction of modeled protein to total protein, an alternative approach is explored, employing a protein abundance dataset for E. coli that is downloaded from PaxDB.

4.1 download PaxDb file¶

In order to proceed, it is necessary to navigate to the ‘PaxDb: Protein Abundance Database web portal’ (https://pax-db.org) and select the species ‘Escherichia coli str. K-12 substr. MG1655’. Subsequently, the dataset for ‘E. coli - Whole organism (Integrated)’ (511145-WHOLE_ORGANISM-integrated.txt) should be downloaded. The downloaded file should then be stored in the .\data folder.

PaxDb files for other organisms:¶

f2xba anticipates that the gene product references contained within the protein abundance dataset correspond to the gene labels employed in the model. In instances where PaxDb lacks a protein abundance dataset for a particular organism, it is imperative to ascertain the proportion of modeled protein mass to total protein mass and allocate this value to the ECM configuration parameter designated as pm2totpm_val_or_paxdb.

4.2 update XBA and ECM configuration files¶

XBA configuration¶

The XBA configuration file has been updated, with the following changes: Within the generals table, the biocyc_org_prefix parameter has been substituted with the enzyme_comp_fname parameter, thereby referencing the updated enzyme composition file. Additionally, the kcats_fname parameter has been incorporated, referencing the updated turnover numbers file. The new table, modify_attributes, is used to modify specific model parameters. The spontaneous reaction ‘R_MG2t3_2pp’ is blocked to enforce active transport through alternative reactions, and the reaction ‘R_GLCDpp’ is blocked since it requires the cofactor pyrroloquinoline quinone (PQQ), which is not part of the specified medium conditions.

ECM configuration¶

The ECM configuration involves the update of the parameters pm2totpm_val_or_paxdb, now referencing the PacDb file, and the adjustment of the avg_enz_sat.

[8]:

# Update XBA parameter file
xba_params = load_parameter_file(os.path.join('data', f'export_xba_parameters.xlsx'))

xba_params['general'].drop(index='biocyc_org_prefix', inplace=True)
data = [['kcats_fname', os.path.join('data', f'{target_model}_kcats.xlsx')],
        ['enzyme_comp_fname', os.path.join('data', f'{fba_model}_enzyme_composition_updated.xlsx')]]
df = pd.DataFrame(data, columns=['parameter', 'value']).set_index('parameter')
xba_params['general'] = pd.concat((xba_params['general'], df))

data = [['R_GLCDpp', 'reaction', 'fbc_upper_bound', 'cobra_0_bound',
         'block, as both isoenzymes gcd/b0124 and ylil/b0837 require PQQ as cofactor'],
        ['R_MG2t3_2pp', 'reaction', 'fbc_upper_bound', 'cobra_0_bound',
         'block, as this reaction has no enzyme attached'],]
cols=['id', 'component', 'attribute', 'value', 'notes']
xba_params['modify_attributes'] = pd.DataFrame(data, columns=cols).set_index('id')

write_parameter_file(os.path.join('data', f'{target_model}_xba_parameters.xlsx'), xba_params)

# Update ECM parameter file
ecm_params = load_parameter_file(os.path.join('data', f'{baseline_model}_ecm_parameters.xlsx'),['general'])

ecm_params['general'].at['avg_enz_sat', 'value'] = 0.93
paxdb_fname = os.path.join('data', '511145-WHOLE_ORGANISM-integrated.txt')
ecm_params['general'].at['pm2totpm_val_or_paxdb', 'value'] = paxdb_fname

write_parameter_file(os.path.join('data', f'{target_model}_ecm_parameters.xlsx'), ecm_params)

2 table(s) with parameters loaded from data/export_xba_parameters.xlsx (Thu Nov 20 18:52:31 2025)
3 table(s) with parameters written to  data/iML1515_modified_GECKO_xba_parameters.xlsx
1 table(s) with parameters loaded from data/iML1515_default_GECKO_ecm_parameters.xlsx (Thu Nov 20 18:48:29 2025)
1 table(s) with parameters written to  data/iML1515_modified_GECKO_ecm_parameters.xlsx

Step 5: Create GECKO model¶

This code section bears a strong resemblance to the preceding tutorial, with two notable exceptions. Firstly, validation of the model is no longer performed. Secondly, the model is not exported as a spreadsheet.

[9]:

# Create GECKO model
xba_model = XbaModel(os.path.join('SBML_models', f'{fba_model}.xml'))
xba_model.configure(os.path.join('data', f'{target_model}_xba_parameters.xlsx'))

ec_model = EcModel(xba_model)
ec_model.configure(os.path.join('data', f'{target_model}_ecm_parameters.xlsx'))
ec_model.export(os.path.join('SBML_models', f'{target_model}.xml'))

loading: SBML_models/iML1515.xml (last modified: Thu Dec  5 10:03:46 2024)
3 table(s) with parameters loaded from data/iML1515_modified_GECKO_xba_parameters.xlsx (Thu Nov 20 18:52:54 2025)
  22 gene product(s) removed from reactions (1494 gene products remaining)
   2 attributes on reaction instances updated
extracting UniProt protein data from data/uniprot_organism_83333.tsv
1494 proteins created
1203 enzymes added with default stoichiometry
1 table(s) with parameters loaded from data/iML1515_enzyme_composition_updated.xlsx (Thu Nov 20 18:52:47 2025)
1203 enzyme compositions updated from data/iML1515_enzyme_composition_updated.xlsx
2221 reactions catalyzed by 1203 enzymes
default kcat values configured for ['metabolic', 'transporter'] reactions
1 table(s) with parameters loaded from data/iML1515_modified_GECKO_kcats.xlsx (Thu Nov 20 18:52:45 2025)
5357 kcat values updated from data/iML1515_modified_GECKO_kcats.xlsx
   0 enzymes removed due to missing kcat values
1877 constraints (+0); 2712 variables (+0); 1494 genes (-22); 5 parameters (+0)
>>> BASELINE XBA model configured!

1 table(s) with parameters loaded from data/iML1515_modified_GECKO_ecm_parameters.xlsx (Thu Nov 20 18:52:54 2025)
PaxDb file E.coli - Whole organism (Integrated), year 2023, covering 3743 proteins (total 1000006.5 ppm) loaded from data/511145-WHOLE_ORGANISM-integrated.txt
modeled protein fraction of total protein mass 0.5379 g/g
1494 protein constraints to add
1494 constraint ids added to the model (3371 total constraints)
protein constraint: 306.59 mg/gDW - modeled protein
   1 constraint ids added to the model (3372 total constraints)
1495 protein variables to add
1495 variable ids added to the model (7343 total variables)
3372 constraints (+1495); 7343 variables (+4631); 1494 genes (-22); 11 parameters (+6)
model exported to SBML format: SBML_models/iML1515_modified_GECKO.xml

[9]:

True

Step 4: Load and optimize GECKO model (cobrapy)¶

[10]:

# Load model using cobrapy
start = time.time()
fname = os.path.join('SBML_models', f'{target_model}.xml')
ecm = cobra.io.read_sbml_model(fname)
total_protein = ecm.reactions.get_by_id('V_PC_total').upper_bound

# Load SBML model with EcmOptimization
eo = EcmOptimization(fname, ecm)
sigma = eo.avg_enz_saturation
print(f'total modeled protein: {total_protein:.2f} mg/gDW, average saturation level: {sigma}')
print(f'Duration: {time.time()-start:.1f} s')

Set parameter Username
Set parameter LicenseID to value 2731209
Academic license - for non-commercial use only - expires 2026-10-31
SBML model loaded by sbmlxdf: SBML_models/iML1515_modified_GECKO.xml (Thu Nov 20 18:53:18 2025)
total modeled protein: 306.59 mg/gDW, average saturation level: 0.93
Duration: 18.6 s

[11]:

# Optimize model using cobrapy and analyze results
start = time.time()
pred_results = {}
for cond, medium in conditions.items():
    ecm.medium = medium
    solution = ecm.optimize()
    if solution.status == 'optimal':
        gr = solution.objective_value
        pred_results[cond] = solution
        print(f'{cond:25s}: pred gr: {gr:.3f} h-1 vs. exp {exp_grs[cond]:.3f}, '
              f'diff: {gr - exp_grs[cond]:6.3f}')
    else:
        print(f'{cond} ended with status {solution.status}')

# Analyze results
er = EcmResults(eo, pred_results, df_mpmf)
df_fluxes = er.collect_fluxes()
df_net_fluxes = er.collect_fluxes(net=True)
df_proteins = er.collect_protein_results()
er.plot_grs(exp_grs, highlight='Glucose')

print(f'Protein mass fractions:')
er.report_proteomics_correlation(scale='lin')
er.report_proteomics_correlation(scale='log')
print()
er.report_protein_levels('Glucose')
er.plot_proteins('Glucose', plot_fname=os.path.join('plots', f'{target_model}_proteins_Glucose.pdf'))
er.save_to_escher(df_net_fluxes['Glucose'], os.path.join('escher', target_model))
print(f'Duration: {time.time()-start:.1f} s')

Acetate                  : pred gr: 0.404 h-1 vs. exp 0.290, diff:  0.114
Glycerol                 : pred gr: 0.702 h-1 vs. exp 0.470, diff:  0.232
Fructose                 : pred gr: 0.664 h-1 vs. exp 0.540, diff:  0.124
L-Malate                 : pred gr: 0.735 h-1 vs. exp 0.550, diff:  0.185
Glucose                  : pred gr: 0.661 h-1 vs. exp 0.660, diff:  0.001
Glucose 6-Phosphate      : pred gr: 0.695 h-1 vs. exp 0.780, diff: -0.085

../_images/notebooks_2_Create_iML1515_modified_GECKO_21_1.png

Protein mass fractions:
Acetate                  : r² = 0.1066, p = 3.05e-26 ( 999 proteins lin scale)
Glycerol                 : r² = 0.0315, p = 1.61e-08 ( 999 proteins lin scale)
Fructose                 : r² = 0.1877, p = 5.65e-47 ( 999 proteins lin scale)
Glucose                  : r² = 0.1360, p = 1.55e-33 ( 999 proteins lin scale)
Acetate                  : r² = 0.2412, p = 3.63e-20 ( 309 proteins log scale)
Glycerol                 : r² = 0.1787, p = 5.86e-15 ( 312 proteins log scale)
Fructose                 : r² = 0.2014, p = 8.11e-17 ( 311 proteins log scale)
Glucose                  : r² = 0.2016, p = 2.22e-17 ( 322 proteins log scale)

condition: Glucose
1494 proteins in model with total predicted mass fraction of 537.9 mg/gP
      999 have been measured with mpmf of  514.4 mg/gP vs. 507.8 mg/gP predicted
           762 metabolic proteins measured 414.0 mg/gP vs. 392.2 mg/gP predicted
           237 transport proteins measured 100.4 mg/gP vs. 115.5 mg/gP predicted
      495 proteins not measured vs.  30.1 mg/gP predicted
           495 actual  proteins      30.1 mg/gP predicted
total           : r² = 0.1360, p = 1.55e-33 ( 999 proteins lin scale)
 metabolic      : r² = 0.1035, p = 8.34e-20 ( 762 proteins lin scale)
 transport      : r² = 0.4379, p = 3.11e-31 ( 237 proteins lin scale)
total           : r² = 0.2016, p = 2.22e-17 ( 322 proteins log scale)
 metabolic      : r² = 0.2199, p = 2.83e-16 ( 272 proteins log scale)
 transport      : r² = 0.2573, p = 1.70e-04 (  50 proteins log scale)

../_images/notebooks_2_Create_iML1515_modified_GECKO_21_3.png

1 file(s) exported for "Load reaction data" into Escher maps
Duration: 3.9 s

(Optional) Track progress¶

[12]:

import scipy
import numpy as np

number = 2
xy = np.array([[df_mpmf.at[gene, reference_cond], df_proteins.at[gene, reference_cond]]
                for gene in df_proteins.index if gene in df_mpmf.index])
log10_x, log10_y = er.get_log10_xy(xy)
lin_pearson_r, _ = scipy.stats.pearsonr(xy[:, 0], xy[:, 1])
log_pearson_r, _ = scipy.stats.pearsonr(log10_x, log10_y)

predictions = load_parameter_file('protein_predictions.xlsx')
data = [[number, target_model, lin_pearson_r**2,log_pearson_r**2, len(xy), len(log10_x)]]
cols = ['No', 'model', 'lin r2', 'log r2', 'lin proteins', 'log proteins']
df = pd.DataFrame(data, columns=cols).set_index('No')
if number in predictions[reference_cond].index:
    predictions[reference_cond].drop(index=number, inplace=True)
predictions[reference_cond] = pd.concat((predictions[reference_cond], df)).sort_index()
write_parameter_file('protein_predictions.xlsx', predictions)
predictions[reference_cond]

1 table(s) with parameters loaded from protein_predictions.xlsx (Thu Nov 20 18:50:20 2025)
1 table(s) with parameters written to  protein_predictions.xlsx

[12]:

	model	lin r2	log r2	lin proteins	log proteins
No
1	iML1515_default_GECKO	0.033244	0.190225	1018	299
2	iML1515_modified_GECKO	0.135981	0.201613	999	322

Closing remarks¶

A few parameters, primarily those associated with transporters, were modified, leading to substantial improvements in the predicted protein levels. The modifications were made based on a comprehensive understanding of the organism and the manner in which the reaction network is modeled. This included adjustments to transporter turnover numbers, the enzyme composition of transporters, and the number of active sites for large multienzyme complexes. Additionally, coenzymes that did not function as catalysts were removed from the modeled gene products. It should be noted that metabolic reactions continue to be configured with the default turnover number. In the subsequent tutorial, we will utilize an AI-based tool to predict the turnover numbers of all metabolic reactions.

(Alternative) gurobipy - model optimization¶

[13]:

# Load model using gurobipy
fname = os.path.join('SBML_models', f'{target_model}.xml')
eo = EcmOptimization(fname)
total_protein = eo.get_variable_bounds('V_PC_total')['V_PC_total'][1]
sigma = eo.avg_enz_saturation
print(f'total modeled protein: {total_protein:.2f} mg/gDW, average saturation level: {sigma}')

SBML model loaded by sbmlxdf: SBML_models/iML1515_modified_GECKO.xml (Thu Nov 20 18:53:18 2025)
LP Model of iML1515_GECKO
7343 variables, 3372 constraints, 28438 non-zero matrix coefficients
total modeled protein: 306.59 mg/gDW, average saturation level: 0.93

[14]:

# Optimize model using gurobipy and analyze results
pred_results = {}
for cond, medium in conditions.items():
    eo.medium = medium
    solution = eo.optimize()

    if solution.status == 'optimal':
        pred_results[cond] = solution
        gr = solution.objective_value
        print(f'{cond:25s}: pred gr: {gr:.3f} h-1 vs. exp {exp_grs[cond]:.3f}, diff: {gr - exp_grs[cond]:6.3f}')
    else:
        print(f'{cond} ended with status {solution.status}')

# Analyze results
er = EcmResults(eo, pred_results, df_mpmf)
df_fluxes = er.collect_fluxes()
df_net_fluxes = er.collect_fluxes(net=True)
df_proteins = er.collect_protein_results()
er.plot_grs(exp_grs, highlight='Glucose')

print(f'Protein mass fractions:')
er.report_proteomics_correlation(scale='lin')
er.report_proteomics_correlation(scale='log')
print()
er.report_protein_levels('Glucose')
er.plot_proteins('Glucose', plot_fname=os.path.join('plots', f'{target_model}_proteins_Glucose.pdf'))
er.save_to_escher(df_net_fluxes['Glucose'], os.path.join('escher', target_model))

Acetate                  : pred gr: 0.404 h-1 vs. exp 0.290, diff:  0.114
Glycerol                 : pred gr: 0.702 h-1 vs. exp 0.470, diff:  0.232
Fructose                 : pred gr: 0.664 h-1 vs. exp 0.540, diff:  0.124
L-Malate                 : pred gr: 0.735 h-1 vs. exp 0.550, diff:  0.185
Glucose                  : pred gr: 0.661 h-1 vs. exp 0.660, diff:  0.001
Glucose 6-Phosphate      : pred gr: 0.695 h-1 vs. exp 0.780, diff: -0.085

../_images/notebooks_2_Create_iML1515_modified_GECKO_27_1.png

Protein mass fractions:
Acetate                  : r² = 0.1066, p = 3.05e-26 ( 999 proteins lin scale)
Glycerol                 : r² = 0.0315, p = 1.61e-08 ( 999 proteins lin scale)
Fructose                 : r² = 0.1877, p = 5.65e-47 ( 999 proteins lin scale)
Glucose                  : r² = 0.1360, p = 1.55e-33 ( 999 proteins lin scale)
Acetate                  : r² = 0.2412, p = 3.63e-20 ( 309 proteins log scale)
Glycerol                 : r² = 0.1787, p = 5.86e-15 ( 312 proteins log scale)
Fructose                 : r² = 0.2014, p = 8.11e-17 ( 311 proteins log scale)
Glucose                  : r² = 0.2016, p = 2.22e-17 ( 322 proteins log scale)

condition: Glucose
1494 proteins in model with total predicted mass fraction of 537.9 mg/gP
      999 have been measured with mpmf of  514.4 mg/gP vs. 507.8 mg/gP predicted
           762 metabolic proteins measured 414.0 mg/gP vs. 392.2 mg/gP predicted
           237 transport proteins measured 100.4 mg/gP vs. 115.5 mg/gP predicted
      495 proteins not measured vs.  30.1 mg/gP predicted
           495 actual  proteins      30.1 mg/gP predicted
total           : r² = 0.1360, p = 1.55e-33 ( 999 proteins lin scale)
 metabolic      : r² = 0.1035, p = 8.34e-20 ( 762 proteins lin scale)
 transport      : r² = 0.4379, p = 3.11e-31 ( 237 proteins lin scale)
total           : r² = 0.2016, p = 2.22e-17 ( 322 proteins log scale)
 metabolic      : r² = 0.2199, p = 2.83e-16 ( 272 proteins log scale)
 transport      : r² = 0.2573, p = 1.70e-04 (  50 proteins log scale)

../_images/notebooks_2_Create_iML1515_modified_GECKO_27_3.png

1 file(s) exported for "Load reaction data" into Escher maps

References¶

Chang, A., Jeske, L., Ulbrich, S., Hofmann, J., Koblitz, J., Schomburg, I., Neumann-Schaal, M., Jahn, D., & Schomburg, D. (2021). BRENDA, the ELIXIR core data resource in 2021: new developments and

[ ]: