{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 7. Tutorial: Create iML1515 TFA model\n",
    "\n",
    "The thermodynamics (TD) model imposes additional constraints on the metabolic model, with flux directions being coupled to the transformed Gibbs energy of reaction. These transformations are with respect to compartmental pH and ionic strength and aligned to the calculations performed in the pyTFA package (Salvy et al., 2019). The standard Gibbs energies of formation, which are prerequisites for the calculations, are obtained from the thermodynamics database (Alberty, 2003; Jankowski et al., 2008) utilized by the pyTFA package (https://github.com/EPFL-LCSB/pytfa/blob/master/data/thermo_data.thermodb). This TD database contains data records for more than 18,000 molecules, which are referenced by SEED identifiers (Seaver et al., 2021). The records include molecule names, chemical formula, electrical charge, Gibbs energy of formation estimated using the group contribution method (Mavrovouniotis, 1990), the composition of cues (groups) making up the molecule, and acid dissociation constants. The TD database also includes the list of cues used in estimating the Gibbs energy of formation, together with their estimated Gibbs energy contribution and the estimation error.\n",
    "\n",
    "The implementation of TD constraints is contingent upon the availability of valid TD data records for all participating metabolites. Consequently, the modeler must ensure that the majority of the metabolites in the model have a corresponding TD data record in the TD database. The mapping of model metabolites to TD database entries can be achieved through various methods. One approach involves manual mapping via the table `modify_td_sids` in the TFA configuration file. This approach necessitates the definition of a regular expression pattern (`mid_regex_pattern` of the `general` table) to extract metabolite identifiers from model species identifiers. Alternatively, the name of the model species, in lower case, can be searched in the TD database. In the event of a failure to locate a match, a sorted list of SEED compound identifiers, extracted from the model species MIRIAM annotation data, is examined against entries in the TD database. In the event that a matching TD data record is identified, a final verification is conducted to ensure that the TD data record contains valid data and that the chemical formula and charge are consistent with the data configured for the model species. A discrepancy in protonation state with adjusted charge is permissible.\n",
    "\n",
    "The workflow for generating TD extended models and optimizing these models aligns with the workflows for enzyme constraint and resource balanced constraint models. TD extended models will be exported in SBML-compliant format, thereby facilitating model sharing and downstream processing by SBML-compliant tools. The optimization of the models will be done using cobrapy and alternatively using the gurobipy interface. When employing cobrapy, it is imperative to utilize a solver that supports mixed integer linear programming (MILP), and it is recommended to use a commercial solver such as IBM CPLEX or Gurobi.\n",
    "\n",
    "The extension of a model through the incorporation of thermodynamic constraints necessitates the establishment of a TFA configuration file. The process commences with the creation of an XbaModel, derived from a genome-scale metabolic model and configured without additional data. The XbaModel is then provided as a reference to a TfaModel, which, based on a preliminary TFA configuration file, adds thermodynamic constraints to yield a preliminary TD constraint model. It is important to note that the optimization of this preliminary TD constraint model may result in the identification of infeasible solutions or a substantial reduction in solution spaces. Constraints must be relaxed in such cases, as error estimates may be too narrow, estimated energies may be incorrectly determined, and thermodynamics calculations may be inexact. Additionally, issues inherent to the metabolic model may arise if multistep enzymatic reactions with energy conservation inside the enzyme are split into individual, independent reactions. Employing a model relaxation step that utilizes a temporary TD constraint \"slack\" model can ensure that predicted growth rates are in proximity to the predicted growth rates for the model devoid of TD constraints. In this workflow, we also demonstrate how to constrain selected metabolite concentrations to measured values (Buckstein Michael et al., 2008).\n",
    "\n",
    "Upon completion of this tutorial, the reader will find a description of the variables and constraints that have been incorporated into the genome-scale metabolic model, as well as the technical details about the calculations performed.\n",
    "\n",
    "Peter Schubert, Heinrich-Heine University Duesseldorf, Institute for Computational Cell Biology (Prof. Dr. M. Lercher), March, 2025"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Initial setup\n",
    "\n",
    "In order to facilitate thermodynamics modeling, it is necessary to import the `TfaModel`. Furthermore, in order to optimize the TFA model, we use `FbaOptimization`. To analyze the results, we use `FbaResults`.\n",
    "]\n",
    "Furthermore, the dictionary `metabolite_molar_concs` is created, with minimum and maximum values based on average nucleotide concentrations determined in *E. coli* (Buckstein Michael et al., 2008). This data will be used to constrain selected metabolite concentrations in the TFA model. As the focus remains on the level of FBA modeling, proteomics data is not imported."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "6 minimal media conditions created for simulation\n"
     ]
    }
   ],
   "source": [
    "# Required imports and model names\n",
    "import os\n",
    "import re\n",
    "import time\n",
    "from collections import defaultdict\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import cobra\n",
    "\n",
    "from f2xba import XbaModel, TfaModel\n",
    "from f2xba import FbaOptimization, FbaResults\n",
    "from f2xba.utils.mapping_utils import load_parameter_file, write_parameter_file\n",
    "\n",
    "fba_model = 'iML1515'\n",
    "target_model = 'iML1515_TFA'\n",
    "reference_cond = 'Glucose'\n",
    "biomass_rid = 'BIOMASS_Ec_iML1515_core_75p37M'\n",
    "\n",
    "# Create media conditions\n",
    "media_grs = {'Acetate': ['ac', 0.29], 'Glycerol': ['glyc', 0.47], 'Fructose': ['fru', 0.54], \n",
    "             'L-Malate': ['mal__L', 0.55], 'Glucose': ['glc__D', 0.66], 'Glucose 6-Phosphate': ['g6p', 0.78]}\n",
    "base_medium = ['ca2', 'cbl1', 'cl', 'co2', 'cobalt2', 'cu2', 'fe2', 'fe3', 'h2o', 'h', 'k', 'mg2', \n",
    "               'mn2', 'mobd', 'na1', 'nh4', 'ni2', 'o2', 'pi', 'sel', 'slnt', 'so4', 'tungs', 'zn2']\n",
    "conditions = {}\n",
    "exp_grs = {}\n",
    "for cond, (carbon_sid, exp_gr )in media_grs.items():\n",
    "    conditions[cond] = {f'EX_{sidx}_e': 1000.0 for sidx in base_medium}\n",
    "    conditions[cond][f'EX_{carbon_sid}_e'] = 10.0  # << carbon source uptake in FBA limited at 10.0 mmol/gDWh\n",
    "    exp_grs[cond] = exp_gr\n",
    "print(f'{len(conditions)} minimal media conditions created for simulation')\n",
    "\n",
    "# Metabolite concentrations to be fixed with a margin (Buckstein, 2008)\n",
    "nt_concs_µmol_per_l = {\n",
    "    'atp_c': 3560, 'adp_c': 116, 'datp_c': 181, 'ctp_c': 325, 'dctp_c': 184,\n",
    "    'gtp_c': 1660, 'gdp_c': 203, 'dgtp_c': 92, 'utp_c': 667, 'udp_c': 54,\n",
    "    'dttp_c': 256, 'ppgpp_c': 113, 'accoa_c': 1390}\n",
    "factor = 2.0\n",
    "metabolite_molar_concs = {sid: (µmol_per_l * 1e-6/factor, µmol_per_l*1e-6*factor) \n",
    "                          for sid, µmol_per_l in  nt_concs_µmol_per_l.items()}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### download thermodynamics database\n",
    "\n",
    "In order to incorporate thermodynamic constraints into the model, it is necessary to obtain information on Gibbs energies of formation for metabolites. This information is extracted from a thermodynamics database obtained from the pyTFA (Salvy et al., 2019) repository (https://github.com/EPFL-LCSB/pytfa/raw/refs/heads/master/data/thermo_data.thermodb). This database contains Gibbs energies of formation for metabolites, which have been calculated using the group contribution method (Alberty, 2003; Jankowski et al., 2008)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "thermodynamics database (pyTFA): data/thermo_data.thermodb\n"
     ]
    }
   ],
   "source": [
    "# Download thermodynamics database\n",
    "import wget\n",
    "    \n",
    "base_url = 'https://github.com/EPFL-LCSB/pytfa/raw/refs/heads/master/data/'\n",
    "fname = 'thermo_data.thermodb'\n",
    "\n",
    "full_fname = os.path.join('data', fname)\n",
    "if not os.path.exists(full_fname):\n",
    "    url = base_url + fname\n",
    "    print(f'\\ndownloading {fname} from {url}')\n",
    "    wget.download(url, out=full_fname)\n",
    "print(f'thermodynamics database (pyTFA): {full_fname}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Create XBA and preliminary TFA configuration files\n",
    "\n",
    "It is not necessary to apply any changes to the genome-scale metabolic model; therefore, a XBA configuration file will not be provided.\n",
    "\n",
    "The configuration data pertaining to thermodynamics constraints is collated in multiple tables and exported to a TFA configuration file. The tables designated as `general` and `td_compartments` are obligatory, while the remaining tables are optional. Within the `general` table, the file name of the TD database must be specified in the `thermo_data_fname` parameter. The corresponding file must have the same structure as the file ‘thermo_data.thermodb’ used in the pyTFA package. As we intend to hard link several metabolite identifiers to TD data records using the table ‘modify_td_sids’, we have to assign a regular expression pattern to the parameter `mid_regex_pattern`. This pattern will be used to derive metabolite identifiers from model species identifiers. The table `td_compartments` contains compartment-specific configuration parameters, including the compartment pH, the ionic strength (`ionic_strength_M`) in mol/L, the minimum and maximum metabolite concentrations (`c_min_M`, `c_max_M`) in mol/L, and the membrane potentials (`<other_cid>_mV`) in mV. The latter are calculated by subtracting the local from the \\<other_cid\\> compartment electrical potential. The table `modify_thermo_data` is optional; its utilization is for modifying data in the TD database, which was found to be inconsistent. A specific TD data record is identified by its identifier (`td_sid)`, and the `value` of a certain `attribute` can be changed. The `notes` field is an optional field that is used to annotate modifications. The optional table `modify_td_sids` facilitates the hard linking of selected metabolites to specific TD data records, thereby overruling the automated matching procedure. Another important configuration table, designated `modify_drg0_bounds`, will be added after model relaxation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Data in thermo-db, which requires update\n",
    "mod_thermo_data_cols = ['td_sid', 'attribute', 'value', 'notes']\n",
    "mod_thermo_data = [\n",
    "    ['cpd00637', 'charge_std', 1, 'met__D'],\n",
    "    ['cpd15461', 'charge_std', 2, 'fe3hox_un'],\n",
    "    ['cpd19833', 'charge_std', 0, '3amac'],\n",
    "    ['cpd19835', 'charge_std', 1, 'athtp'],\n",
    "    ['cpd19836', 'charge_std', 0, 'cdg'],\n",
    "    ['cpd19837', 'charge_std', 1, 'chtbs6p'],\n",
    "    ['cpd19838', 'charge_std', 0, 'cph4'],\n",
    "    ['cpd19839', 'charge_std', 1, 'ddcap'],\n",
    "    ['cpd19844', 'charge_std', 1, 'hdcap'],\n",
    "    ['cpd19845', 'charge_std', 1, 'hdceap'],\n",
    "    ['cpd19854', 'charge_std', 1, 'ocdcap'],\n",
    "    ['cpd19855', 'charge_std', 1, 'ocdceap'],\n",
    "    ['cpd19861', 'charge_std', 1, 'rephaccoa'],\n",
    "    ['cpd19863', 'charge_std', 1, 'ttdcap'],\n",
    "    ['cpd19864', 'charge_std', 1, 'ttdceap'],\n",
    "    ['cpd19848', 'charge_std', 1, 'malcoame']\n",
    "]\n",
    "\n",
    "# Model annotations can contain several seed ids, here we hard link selected metabolites to a specific seed id\n",
    "modify_td_sids_data = [['h2o', 'cpd00001'], ['lcts', 'cpd00208'], ['feoxam', 'cpd04761'], \n",
    "                       ['ddcap', 'cpd19839'], ['pheme', 'cpd00028'], ['malcoame', 'cpd19848']]\n",
    "modify_td_sids_cols = ['mid', 'td_sid']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4 table(s) with parameters written to  data/preliminary_tfa_parameters.xlsx\n"
     ]
    }
   ],
   "source": [
    "# preliminary TFA configuration file\n",
    "tfa_params = {}\n",
    "data = [['thermo_data_fname', os.path.join('data', 'thermo_data.thermodb')], \n",
    "        ['mid_regex_pattern', r'^M_(\\w+)_\\w+$']]\n",
    "tfa_params['general'] = pd.DataFrame(data, columns=['parameter', 'value']).set_index('parameter')\n",
    "\n",
    "cols = ['cid', 'ph', 'ionic_strength_M', 'c_min_M', 'c_max_M', 'c_mV', 'p_mV', 'e_mV']\n",
    "data = [['c', 7.5, 0.25, 1.0e-8, 0.05, 0.0, 150.0, 150.0], \n",
    "        ['p', 7.0, 0.0, 1.0e-8, 0.05, -150.0, 0.0, 0.0],\n",
    "        ['e', 7.0, 0.0, 1.0e-8, 0.10, -150.0, 0.0, 0.0]]\n",
    "tfa_params['td_compartments'] = pd.DataFrame(data, columns=cols).set_index('cid')\n",
    "\n",
    "# try models without 'modify_td_sids' and/or without 'modify_thermo_data'\n",
    "tfa_params['modify_td_sids'] = pd.DataFrame(modify_td_sids_data, columns=modify_td_sids_cols).set_index('mid')\n",
    "tfa_params['modify_thermo_data'] = pd.DataFrame(mod_thermo_data, columns=mod_thermo_data_cols).set_index('td_sid')\n",
    "\n",
    "write_parameter_file(os.path.join('data', 'preliminary_tfa_parameters.xlsx'), tfa_params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Create preliminary TFA model\n",
    "\n",
    "The preliminary TFA model will be utilized for constraints relaxation. The creation of the TFA model is analogous to the creation of other extended models, except that we do not provide additional configuration data to the Xba model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "loading: SBML_models/iML1515.xml (last modified: Thu Dec  5 10:03:46 2024)\n",
      "1877 constraints (+0); 2712 variables (+0); 1516 genes (+0); 5 parameters (+0)\n",
      ">>> BASELINE XBA model configured!\n",
      "\n",
      "4 table(s) with parameters loaded from data/preliminary_tfa_parameters.xlsx (Fri Nov 21 11:15:26 2025)\n",
      "  16 thermo data attributes updated\n",
      " 938 metabolites with TD data covering 1546 model species; (331 not supported)\n",
      "1899 reactions supported by TD data; (476 not supported)\n",
      "7221 constraints to add\n",
      "7221 constraint ids added to the model (9098 total constraints)\n",
      "3668 ∆rG'/∆rG'˚ variables to add\n",
      "3668 variable ids added to the model (6380 total variables)\n",
      "2407 forward/reverse use variables to add\n",
      "2407 variable ids added to the model (8787 total variables)\n",
      "1525 log concentration variables to add\n",
      "1525 variable ids added to the model (10312 total variables)\n",
      "1834 TD reactions split in forward/reverse, 0 opened reverse direction\n",
      "2407 fwd/rev reactions to couple with flux direction\n",
      "2407 attributes on reaction instances updated\n",
      "   2 RHS variables to add\n",
      "   2 variable ids added to the model (10887 total variables)\n",
      "3682 parameters\n",
      "1525 species (929 metabolites) with TD data from total 1877 (1169)\n",
      "1834 metabolic/transporter reactions with TD data from total 2375\n",
      "9098 constraints (+7221); 10887 variables (+8175); 1516 genes (+0); 3682 parameters (+3677)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create preliminary TFA model\n",
    "xba_model = XbaModel(os.path.join('SBML_models', f'{fba_model}.xml'))\n",
    "xba_model.configure()\n",
    "\n",
    "tfa_model = TfaModel(xba_model)\n",
    "tfa_model.configure(os.path.join('data', 'preliminary_tfa_parameters.xlsx'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4: Model relaxation (cobrapy)\n",
    "\n",
    "The optimization of our preliminary TFA model may result in infeasible solutions or solutions with significantly reduced solution space. By relaxing constraints, namely the bounds of selected standard-transformed Gibbs energy of reaction variables, we can recover feasible optimization solutions for our selected media conditions. As a reference, we use the predicted growth rates for the original FBA model. Though not required, we perform the relaxation considering stiffer bounds on selected metabolites.\n",
    "\n",
    "The model relaxation workflow initiates with the construction of a slack model, which is subsequently loaded with cobrapy. The FBA growth rates are then determined for utilization as a reference. Consequently, the slack model undergoes optimization across the designated conditions, facilitating the identification of variables necessitating relaxation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1 create slack model\n",
    "\n",
    "The utilization of the function `tfa_model.export_slack_model()` facilitates the generation of a slack model, incorporating positive and negative slack variables within the constraints that determine the transformed Gibbs free energies or reactions. The optimization objective of the slack model is modified to minimize the sum of slack."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3668 slack variables for ∆rG'˙ to add\n",
      "3668 variable ids added to the model (14555 total variables)\n",
      "model exported to SBML format: SBML_models/iML1515_TFA_slack.xml\n"
     ]
    }
   ],
   "source": [
    "# Create slack model\n",
    "slack_fname = os.path.join('SBML_models', f'{target_model}_slack.xml')\n",
    "tfa_model.export_slack_model(slack_fname)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2 loading of slack model\n",
    "\n",
    "The slack is loaded with cobrapy, and the solver parameter ‘presolve’ is activated for model relaxation. Selected optimization variables are reconfigured from ‘continuous’ to ‘binary’ by `FbaOptimization`, and the sign of selected constraints is changed from ‘equality’ to ‘inequality’. The value ranges of selected nucleotide concentrations in the model are constrained by `fo.set_tfa_metab_concentrations()`, which is optional."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Set parameter Username\n",
      "Set parameter LicenseID to value 2731209\n",
      "Academic license - for non-commercial use only - expires 2026-10-31\n",
      "TD Slack model loaded from SBML_models/iML1515_TFA_slack.xml, last modified: Fri Nov 21 11:15:58 2025\n",
      "SBML model loaded by sbmlxdf: SBML_models/iML1515_TFA_slack.xml (Fri Nov 21 11:15:58 2025)\n",
      "Thermodynamic use variables (V_FU_xxx and V_RU_xxx) as binary\n",
      "Thermodynamic constraints (C_F[FR]C_xxx, C_G[FR]C_xxx, C_SU_xxx) ≤ 0\n",
      "1834 TD reaction constraints\n",
      "13 metabolite concentrations constrained\n"
     ]
    }
   ],
   "source": [
    "# Load slack model (cobrapy)\n",
    "tfa_slack_model = cobra.io.read_sbml_model(slack_fname)\n",
    "print(f'TD Slack model loaded from {slack_fname}, last modified: {time.ctime(os.path.getmtime(slack_fname))}')\n",
    "tfa_slack_model.solver.configuration.presolve = True\n",
    "\n",
    "# Reconfigure variables and constraints\n",
    "fo = FbaOptimization(slack_fname, tfa_slack_model)\n",
    "\n",
    "# Set nucleotide concentrations (optional)\n",
    "for sid, (lb, ub) in metabolite_molar_concs.items():\n",
    "    tfa_slack_model.reactions.get_by_id('V_LC_' + sid).bounds = (np.log(lb), np.log(ub))\n",
    "\n",
    "print(f'{len(metabolite_molar_concs)} metabolite concentrations constrained')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.3 FBA growth rates\n",
    "\n",
    "We employ cobrapy to predict the growth rates of the FBA model under our specific media conditions. These predictions subsequently serve as targets for the slack model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "6 growth conditions with FBA growth rates in range [0.21 - 0.91] h-1.\n"
     ]
    }
   ],
   "source": [
    "# Determine FBA growth rates (using cobrapy)\n",
    "fba_fname = os.path.join('SBML_models', f'{fba_model}.xml')\n",
    "gem_model = cobra.io.read_sbml_model(fba_fname)\n",
    "gem_model.objective = biomass_rid\n",
    "\n",
    "fba_grs = {}\n",
    "for condition, medium in conditions.items():\n",
    "    gem_model.medium = medium\n",
    "    fba_grs[condition] = gem_model.slim_optimize()\n",
    "print(f'{len(fba_grs)} growth conditions with FBA growth rates in range ' \n",
    "      f'[{min(fba_grs.values()):.2f} - {max(fba_grs.values()):.2f}] h-1.')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.4 relax variable bounds\n",
    "\n",
    "In this process, instances of standard-transformed Gibbs energy of reaction variables where bounds require modification are identified. The slack model is optimized under specific media conditions, while the lower bound of the biomass reactions is configured at a level of 95% of the respective predicted growth rate of the FBA model. The objective function of the slack model is to minimize the total slack. Any slack encountered is used to modify the bound of the respective standard-transformed Gibbs energy of reaction variable using `fo.update_relaxation()`. The modifications are collected in `drg0_relaxations` and subsequently used to modify the TFA configuration file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Acetate              (0.200 h-1): slack of  167.18(optimal) ->  3 ∆rG'˚ updates\n",
      "Glycerol             (0.470 h-1): slack of    0.00(optimal) ->  0 ∆rG'˚ updates\n",
      "Fructose             (0.833 h-1): slack of    0.00(optimal) ->  0 ∆rG'˚ updates\n",
      "L-Malate             (0.391 h-1): slack of    0.00(optimal) ->  0 ∆rG'˚ updates\n",
      "Glucose              (0.833 h-1): slack of    0.00(optimal) ->  0 ∆rG'˚ updates\n",
      "Glucose 6-Phosphate  (0.860 h-1): slack of    0.00(optimal) ->  0 ∆rG'˚ updates\n",
      "3 dgr0 relaxations required - update \n",
      "Duration: 6.5 s\n",
      "3 variables that need relaxation:\n",
      "{'V_DRG0_MECDPS': {'fbc_lower_bound': 83.39185422440065}, 'V_DRG0_ATPS4rpp': {'fbc_lower_bound': -29.10627304266519}, 'V_DRG0_MALCOAMT': {'fbc_lower_bound': 75.96556319818774}}\n"
     ]
    }
   ],
   "source": [
    "# relax drg0 bounds across several conditions\n",
    "start = time.time()\n",
    "\n",
    "pct_fba_gr = 0.95\n",
    "drg0_relaxations = {}\n",
    "for condition, medium in conditions.items():\n",
    "    \n",
    "    tfa_slack_model.medium = medium\n",
    "    tfa_slack_model.reactions.get_by_id(biomass_rid).lower_bound = pct_fba_gr * fba_grs[condition] \n",
    "\n",
    "    if np.isfinite(tfa_slack_model.slim_optimize()) is False:\n",
    "        print(f'{condition} no solution during slack minimization')\n",
    "    else:\n",
    "        solution = tfa_slack_model.optimize()\n",
    "        gr = solution.fluxes[biomass_rid]\n",
    "        \n",
    "        req_relaxations = fo.update_relaxation(solution.fluxes)\n",
    "        drg0_relaxations.update(req_relaxations)\n",
    "        print(f'{condition:20s} ({gr:5.3f} h-1): slack of {solution.objective_value:7.2f}' \n",
    "              f\"({solution.status}) -> {len(req_relaxations):2d} ∆rG'˚ updates\")\n",
    "        \n",
    "tfa_slack_model.reactions.get_by_id(biomass_rid).lower_bound = 0.0\n",
    "print(f'{len(drg0_relaxations)} dgr0 relaxations required - update ')\n",
    "print(f'Duration: {time.time()-start:.1f} s')\n",
    "print(f'{len(drg0_relaxations)} variables that need relaxation:')\n",
    "print(drg0_relaxations)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 5: Update TFA parameters and create final TFA model\n",
    "\n",
    "The TFA configuration file is extended with the table designated as `modify_drg0_bounds`, which encompasses the parameters determined during the process of model relaxation. For the purpose of model inspection, the option of exporting the model to spreadsheet format is available."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4 table(s) with parameters loaded from data/preliminary_tfa_parameters.xlsx (Fri Nov 21 11:15:26 2025)\n",
      "5 table(s) with parameters written to  data/iML1515_TFA_tfa_parameters.xlsx\n",
      "loading: SBML_models/iML1515.xml (last modified: Thu Dec  5 10:03:46 2024)\n",
      "1877 constraints (+0); 2712 variables (+0); 1516 genes (+0); 5 parameters (+0)\n",
      ">>> BASELINE XBA model configured!\n",
      "\n",
      "5 table(s) with parameters loaded from data/iML1515_TFA_tfa_parameters.xlsx (Fri Nov 21 11:16:34 2025)\n",
      "  16 thermo data attributes updated\n",
      " 938 metabolites with TD data covering 1546 model species; (331 not supported)\n",
      "1899 reactions supported by TD data; (476 not supported)\n",
      "7221 constraints to add\n",
      "7221 constraint ids added to the model (9098 total constraints)\n",
      "3668 ∆rG'/∆rG'˚ variables to add\n",
      "3668 variable ids added to the model (6380 total variables)\n",
      "2407 forward/reverse use variables to add\n",
      "2407 variable ids added to the model (8787 total variables)\n",
      "1525 log concentration variables to add\n",
      "1525 variable ids added to the model (10312 total variables)\n",
      "1834 TD reactions split in forward/reverse, 0 opened reverse direction\n",
      "2407 fwd/rev reactions to couple with flux direction\n",
      "2407 attributes on reaction instances updated\n",
      "   2 RHS variables to add\n",
      "   2 variable ids added to the model (10887 total variables)\n",
      "3682 parameters\n",
      "   3 ∆Gr'˚ variables need relaxation.\n",
      "   3 attributes on parameter instances updated\n",
      "1525 species (929 metabolites) with TD data from total 1877 (1169)\n",
      "1834 metabolic/transporter reactions with TD data from total 2375\n",
      "9098 constraints (+7221); 10887 variables (+8175); 1516 genes (+0); 3682 parameters (+3677)\n",
      "model exported to SBML format: SBML_models/iML1515_TFA.xml\n",
      "model exported to Excel Spreadsheet: xlsx_models/iML1515_TFA.xlsx\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# update TFA configuration file\n",
    "tfa_params = load_parameter_file(os.path.join('data', 'preliminary_tfa_parameters.xlsx'))\n",
    "data = []\n",
    "for var_id, bounds in drg0_relaxations.items():\n",
    "    for bound_id, val in bounds.items():\n",
    "        data.append([var_id, 'reaction', bound_id, val])\n",
    "cols = ['id', 'component', 'attribute', 'value']\n",
    "tfa_params['modify_drg0_bounds'] = pd.DataFrame(data, columns=cols).set_index('id')\n",
    "write_parameter_file(os.path.join('data', f'{target_model}_tfa_parameters.xlsx'), tfa_params)\n",
    "\n",
    "# create TFA model\n",
    "xba_model = XbaModel(os.path.join('SBML_models', f'{fba_model}.xml'))\n",
    "xba_model.configure()\n",
    "tfa_model = TfaModel(xba_model)\n",
    "tfa_model.configure(os.path.join('data', f'{target_model}_tfa_parameters.xlsx'))\n",
    "tfa_model.export(os.path.join('SBML_models', f'{target_model}.xml'))\n",
    "tfa_model.export(os.path.join('xlsx_models', f'{target_model}.xlsx')) # optional export to spreadsheet format"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "---\n",
    "## Step 6. Load and optimize TFA model (cobrapy)\n",
    "\n",
    "In this section, we will demonstrate the utilization of cobrapy for the purpose of TFA model loading and optimization. In the following section, we will show the optimization using the gurobipy interface. The optimization problem is a mixed-integer linear optimization (MILP) problem. While MILP optimization is supported by cobrapy, some variables and constraints need to be modified. This is done automatically by `FbaOptimization`. As an alternative option, in a non-Python environment, these variables and constraints could be executed separately after model loading. The following Python example code is provided:\n",
    "\n",
    "        tfam = cobra.io.read_sbml_model(os.path.join('SBML_models', f'{target_model}.xml'))\n",
    "        for var in tfam.variables:\n",
    "            for vprefix in ['V_FU_', 'V_RU_']: \n",
    "                if re.match(vprefix, var.name) and 'reverse' not in var.name:\n",
    "                    var.type = 'binary'        \n",
    "        for constr in tfam.constraints:\n",
    "            for cprefix in ['C_FFC_', 'C_FRC_', 'C_GFC_', 'C_GRC_', 'C_SU_']:\n",
    "                if re.match(cprefix, constr.name):\n",
    "                    constr.lb = None\n",
    "                    \n",
    "Constraining the concentrations of nucleotides is an optional process that can be carried out by utilizing the function `fo.set_tfa_metab_concentrations()`. In instances where the ‘FbaOptimization’ class is unavailable, the concentrations can be set using the Python code provided below.\n",
    "\n",
    "        import numpy as np\n",
    "        for sid, (lb, ub) in metabolite_molar_concs.items():\n",
    "            tfam.reactions.get_by_id('V_LC_' + sid).bounds = (np.log(lb), np.log(ub))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SBML model loaded by sbmlxdf: SBML_models/iML1515_TFA.xml (Fri Nov 21 11:16:59 2025)\n",
      "Thermodynamic use variables (V_FU_xxx and V_RU_xxx) as binary\n",
      "Thermodynamic constraints (C_F[FR]C_xxx, C_G[FR]C_xxx, C_SU_xxx) ≤ 0\n",
      "1834 TD reaction constraints\n",
      "13 metabolite concentrations constrained\n"
     ]
    }
   ],
   "source": [
    "# Load model using cobrapy\n",
    "fname = os.path.join('SBML_models', f'{target_model}.xml')\n",
    "tfam = cobra.io.read_sbml_model(fname)\n",
    "for sid, (lb, ub) in metabolite_molar_concs.items():\n",
    "    tfam.reactions.get_by_id('V_LC_' + sid).bounds = (np.log(lb), np.log(ub))\n",
    "\n",
    "# Reconfigure variables and constraints\n",
    "fo = FbaOptimization(fname, tfam)\n",
    "\n",
    "# Set nucleotide concentrations (optional)\n",
    "for sid, (lb, ub) in metabolite_molar_concs.items():\n",
    "    tfam.reactions.get_by_id('V_LC_' + sid).bounds = (np.log(lb), np.log(ub))\n",
    "\n",
    "print(f'{len(metabolite_molar_concs)} metabolite concentrations constrained')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Acetate                  : TFA gr: 0.210 h-1 vs. FBA 0.210, diff: -0.000\n",
      "Glycerol                 : TFA gr: 0.495 h-1 vs. FBA 0.495, diff: -0.000\n",
      "Fructose                 : TFA gr: 0.877 h-1 vs. FBA 0.877, diff: -0.000\n",
      "L-Malate                 : TFA gr: 0.411 h-1 vs. FBA 0.411, diff: -0.000\n",
      "Glucose                  : TFA gr: 0.877 h-1 vs. FBA 0.877, diff: -0.000\n",
      "Glucose 6-Phosphate      : TFA gr: 0.905 h-1 vs. FBA 0.905, diff: -0.000\n",
      "1 file(s) exported for \"Load reaction data\" into Escher maps\n",
      "1 file(s) exported for \"Load metabolite data\" into Escher maps\n"
     ]
    }
   ],
   "source": [
    "# Optimize model using cobrapy and analyze results\n",
    "pred_results = {}\n",
    "for cond, medium in conditions.items():\n",
    "    with tfam as model:    \n",
    "        model.medium = medium \n",
    "        solution = model.optimize()\n",
    "        if solution.status == 'optimal':\n",
    "            gr = solution.objective_value\n",
    "            pred_results[cond] = solution\n",
    "            print(f'{cond:25s}: TFA gr: {gr:.3f} h-1 vs. FBA {fba_grs[cond]:.3f}, '\n",
    "                  f'diff: {gr - fba_grs[cond]:6.3f}')\n",
    "        else:    \n",
    "            print(f'{cond} ended with status {solution.status}')\n",
    "\n",
    "fr = FbaResults(fo, pred_results)\n",
    "df_fluxes = fr.collect_fluxes()\n",
    "df_net_fluxes = fr.collect_fluxes(net=True)\n",
    "df_species_conc = fr.collect_species_conc()\n",
    "fr.save_to_escher(df_net_fluxes[reference_cond], os.path.join('escher', target_model))\n",
    "fr.save_to_escher(df_species_conc[reference_cond], os.path.join('escher', target_model))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### species concentrations\n",
    "\n",
    "The function `fr.collect_species_conc()` enables the retrieval of a table that contains metabolite concentrations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1525 species concentrations\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>rank</th>\n",
       "      <th>mean mmol_per_l</th>\n",
       "      <th>stdev</th>\n",
       "      <th>Acetate</th>\n",
       "      <th>Glycerol</th>\n",
       "      <th>Fructose</th>\n",
       "      <th>L-Malate</th>\n",
       "      <th>Glucose</th>\n",
       "      <th>Glucose 6-Phosphate</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sid</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>atp_c</th>\n",
       "      <td>ATP C10H12N5O13P3</td>\n",
       "      <td>1046</td>\n",
       "      <td>1.780000</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>1.7800</td>\n",
       "      <td>1.7800</td>\n",
       "      <td>1.7800</td>\n",
       "      <td>1.7800</td>\n",
       "      <td>1.780000</td>\n",
       "      <td>1.780000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>adp_c</th>\n",
       "      <td>ADP C10H12N5O10P2</td>\n",
       "      <td>1130</td>\n",
       "      <td>0.203000</td>\n",
       "      <td>7.103520e-02</td>\n",
       "      <td>0.2320</td>\n",
       "      <td>0.2320</td>\n",
       "      <td>0.0580</td>\n",
       "      <td>0.2320</td>\n",
       "      <td>0.232000</td>\n",
       "      <td>0.232000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>datp_c</th>\n",
       "      <td>DATP C10H12N5O12P3</td>\n",
       "      <td>1154</td>\n",
       "      <td>0.090500</td>\n",
       "      <td>1.520235e-17</td>\n",
       "      <td>0.0905</td>\n",
       "      <td>0.0905</td>\n",
       "      <td>0.0905</td>\n",
       "      <td>0.0905</td>\n",
       "      <td>0.090500</td>\n",
       "      <td>0.090500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ctp_c</th>\n",
       "      <td>CTP C9H12N3O14P3</td>\n",
       "      <td>1069</td>\n",
       "      <td>0.645782</td>\n",
       "      <td>6.534249e-03</td>\n",
       "      <td>0.6500</td>\n",
       "      <td>0.6500</td>\n",
       "      <td>0.6500</td>\n",
       "      <td>0.6500</td>\n",
       "      <td>0.637346</td>\n",
       "      <td>0.637346</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>dctp_c</th>\n",
       "      <td>DCTP C9H12N3O13P3</td>\n",
       "      <td>1153</td>\n",
       "      <td>0.092000</td>\n",
       "      <td>1.520235e-17</td>\n",
       "      <td>0.0920</td>\n",
       "      <td>0.0920</td>\n",
       "      <td>0.0920</td>\n",
       "      <td>0.0920</td>\n",
       "      <td>0.092000</td>\n",
       "      <td>0.092000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                      name  rank  mean mmol_per_l         stdev  Acetate  \\\n",
       "sid                                                                        \n",
       "atp_c    ATP C10H12N5O13P3  1046         1.780000  0.000000e+00   1.7800   \n",
       "adp_c    ADP C10H12N5O10P2  1130         0.203000  7.103520e-02   0.2320   \n",
       "datp_c  DATP C10H12N5O12P3  1154         0.090500  1.520235e-17   0.0905   \n",
       "ctp_c     CTP C9H12N3O14P3  1069         0.645782  6.534249e-03   0.6500   \n",
       "dctp_c   DCTP C9H12N3O13P3  1153         0.092000  1.520235e-17   0.0920   \n",
       "\n",
       "        Glycerol  Fructose  L-Malate   Glucose  Glucose 6-Phosphate  \n",
       "sid                                                                  \n",
       "atp_c     1.7800    1.7800    1.7800  1.780000             1.780000  \n",
       "adp_c     0.2320    0.0580    0.2320  0.232000             0.232000  \n",
       "datp_c    0.0905    0.0905    0.0905  0.090500             0.090500  \n",
       "ctp_c     0.6500    0.6500    0.6500  0.637346             0.637346  \n",
       "dctp_c    0.0920    0.0920    0.0920  0.092000             0.092000  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(f'{len(df_species_conc)} species concentrations')\n",
    "df_species_conc.loc[list(metabolite_molar_concs)].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Closing remarks\n",
    "\n",
    "The generation of a thermodynamics constraint model of iML1515 is a straightforward process requiring minimal effort in configuration. A preliminary TFA model was used to relax some of the thermodynamics constraints, and the final TFA model predicts similar growth rates as the FBA model. It also determines a consistent set of metabolite concentrations that ensure that flux directions align with negative values of transformed Gibbs energies of reaction. \n",
    "\n",
    "In the subsequent tutorials, we will integrate the TFA-specific configuration with ECM, and the RBA-specific configuration data, to generate thermodynamics constraint TGECKO and TRBA models of iML1515."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "---\n",
    "## (Alternative) gurobipy – TFA model creation\n",
    "\n",
    "The workflow of generating a TFA model using the gurobipy interface involves similar steps as presented above; however, the workflow deviates with the loading of the slack model using `FbaOptimization`. In this section, we will demonstrate the use of gurobipy to predict the growth rates of the FBA model.During parameter relaxation of the slack model, the minimal expected growth rates are set using `fo.set_variable_bounds()`. The update for the TFA configuration file and the creation of the final TFA model remain unchanged."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SBML model loaded by sbmlxdf: SBML_models/iML1515_TFA_slack.xml (Fri Nov 21 11:15:58 2025)\n",
      "MILP Model of iML1515_TFA\n",
      "14555 variables, 9098 constraints, 39194 non-zero matrix coefficients\n",
      "1834 TD reaction constraints\n",
      "13 metabolite concentrations constrained\n"
     ]
    }
   ],
   "source": [
    "# Load slack model (gurobipy)\n",
    "fo = FbaOptimization(slack_fname)\n",
    "\n",
    "# Set nucleotide concentrations (optional)\n",
    "orig_concs = fo.set_tfa_metab_concentrations(metabolite_molar_concs)\n",
    "print(f'{len(metabolite_molar_concs)} metabolite concentrations constrained')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SBML model loaded by sbmlxdf: SBML_models/iML1515.xml (Thu Dec  5 10:03:46 2024)\n",
      "LP Model of iML1515\n",
      "2712 variables, 1877 constraints, 10575 non-zero matrix coefficients\n",
      "6 growth conditions with FBA growth rates in range [0.21 - 0.91] h-1.\n"
     ]
    }
   ],
   "source": [
    "# Determine FBA growth rates (gurobipy)\n",
    "fba_fo = FbaOptimization(os.path.join('SBML_models', f'{fba_model}.xml'))\n",
    "fba_fo.set_objective({biomass_rid: 1.0}, 'max')\n",
    "fba_grs = {}\n",
    "for condition, medium in conditions.items():\n",
    "    fba_fo.medium = medium\n",
    "    fba_grs[condition] = fba_fo.optimize().objective_value\n",
    "print(f'{len(fba_grs)} growth conditions with FBA growth rates in range ' \n",
    "      f'[{min(fba_grs.values()):.2f} - {max(fba_grs.values()):.2f}] h-1.')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Acetate              (0.200 h-1): slack of  167.18(optimal) ->  4 ∆rG'˚ updates\n",
      "Glycerol             (0.470 h-1): slack of    0.00(optimal) ->  0 ∆rG'˚ updates\n",
      "Fructose             (0.833 h-1): slack of    0.00(optimal) ->  0 ∆rG'˚ updates\n",
      "L-Malate             (0.391 h-1): slack of    0.00(optimal) ->  0 ∆rG'˚ updates\n",
      "Glucose              (0.833 h-1): slack of    0.00(optimal) ->  0 ∆rG'˚ updates\n",
      "Glucose 6-Phosphate  (0.860 h-1): slack of    0.00(optimal) ->  0 ∆rG'˚ updates\n",
      "4 dgr0 relaxations required - update \n",
      "Duration: 4.7 s\n",
      "4 variables that need relaxation:\n",
      "{'V_DRG0_MECDPS': {'fbc_lower_bound': 83.39185422440065}, 'V_DRG0_ATPS4rpp': {'fbc_lower_bound': -25.456120671846797}, 'V_DRG0_PRAIS': {'fbc_lower_bound': 62.67390227094072}, 'V_DRG0_MALCOAMT': {'fbc_lower_bound': 75.96556319818777}}\n"
     ]
    }
   ],
   "source": [
    "# relax drg0 bounds across several conditions (gurobipy)\n",
    "start = time.time()\n",
    "\n",
    "pct_fba_gr = 0.95\n",
    "drg0_relaxations = {}\n",
    "for condition, medium in conditions.items():\n",
    "    \n",
    "    fo.medium = medium\n",
    "    orig_bounds = fo.set_variable_bounds({biomass_rid:(pct_fba_gr * fba_grs[condition], None)})\n",
    "\n",
    "    solution = fo.optimize()\n",
    "    if solution.status == 'optimal':\n",
    "        gr = solution.fluxes[biomass_rid]\n",
    "        req_relaxations = fo.update_relaxation(solution.fluxes)\n",
    "        drg0_relaxations.update(req_relaxations)\n",
    "        print(f'{condition:20s} ({gr:5.3f} h-1): slack of {solution.objective_value:7.2f}' \n",
    "              f\"({solution.status}) -> {len(req_relaxations):2d} ∆rG'˚ updates\")\n",
    "    else:\n",
    "        print(f'{condition} no solution during slack minimization')\n",
    "fo.set_variable_bounds({biomass_rid: (0.0, 1000.0)})\n",
    "\n",
    "print(f'{len(drg0_relaxations)} dgr0 relaxations required - update ')\n",
    "print(f'Duration: {time.time()-start:.1f} s')\n",
    "print(f'{len(drg0_relaxations)} variables that need relaxation:')\n",
    "print(drg0_relaxations)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4 table(s) with parameters loaded from data/preliminary_tfa_parameters.xlsx (Fri Apr 11 15:06:56 2025)\n",
      "5 table(s) with parameters written to  data/iML1515_TFA_tfa_parameters.xlsx\n",
      "loading: SBML_models/iML1515.xml (last modified: Thu Dec  5 10:03:46 2024)\n",
      "1877 constraints (+0); 2712 variables (+0); 1516 genes (+0); 5 parameters (+0)\n",
      ">>> BASELINE XBA model configured!\n",
      "\n",
      "5 table(s) with parameters loaded from data/iML1515_TFA_tfa_parameters.xlsx (Fri Apr 11 15:09:45 2025)\n",
      "  16 thermo data attributes updated\n",
      " 938 metabolites with TD data covering 1546 model species; (331 not supported)\n",
      "1899 reactions supported by TD data; (476 not supported)\n",
      "7221 constraints to add\n",
      "7221 constraint ids added to the model (9098 total constraints)\n",
      "3668 ∆rG'/∆rG'˚ variables to add\n",
      "3668 variable ids added to the model (6380 total variables)\n",
      "2407 forward/reverse use variables to add\n",
      "2407 variable ids added to the model (8787 total variables)\n",
      "1525 log concentration variables to add\n",
      "1525 variable ids added to the model (10312 total variables)\n",
      "1834 TD reactions split in forward/reverse, 0 opened reverse direction\n",
      "2407 fwd/rev reactions to couple with flux direction\n",
      "2407 attributes on reaction instances updated\n",
      "   2 RHS variables to add\n",
      "   2 variable ids added to the model (10887 total variables)\n",
      "3682 parameters\n",
      "   3 ∆Gr'˚ variables need relaxation.\n",
      "   3 attributes on parameter instances updated\n",
      "1525 species (929 metabolites) with TD data from total 1877 (1169)\n",
      "1834 metabolic/transporter reactions with TD data from total 2375\n",
      "9098 constraints (+7221); 10887 variables (+8175); 1516 genes (+0); 3682 parameters (+3677)\n",
      "model exported to SBML: SBML_models/iML1515_TFA.xml\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# update TFA configuration file\n",
    "tfa_params = load_parameter_file(os.path.join('data', 'preliminary_tfa_parameters.xlsx'))\n",
    "data = []\n",
    "for var_id, bounds in drg0_relaxations.items():\n",
    "    for bound_id, val in bounds.items():\n",
    "        data.append([var_id, 'reaction', bound_id, val])\n",
    "cols = ['id', 'component', 'attribute', 'value']\n",
    "tfa_params['modify_drg0_bounds'] = pd.DataFrame(data, columns=cols).set_index('id')\n",
    "write_parameter_file(os.path.join('data', f'{target_model}_tfa_parameters.xlsx'), tfa_params)\n",
    "\n",
    "# create TFA model\n",
    "xba_model = XbaModel(os.path.join('SBML_models', f'{fba_model}.xml'))\n",
    "xba_model.configure()\n",
    "tfa_model = TfaModel(xba_model)\n",
    "tfa_model.configure(os.path.join('data', f'{target_model}_tfa_parameters.xlsx'))\n",
    "tfa_model.export(os.path.join('SBML_models', f'{target_model}.xml'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## (Alternative) gurobipy – TFA model optimization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SBML model loaded by sbmlxdf: SBML_models/iML1515_TFA.xml (Fri Nov 21 11:16:59 2025)\n",
      "MILP Model of iML1515_TFA\n",
      "10887 variables, 9098 constraints, 35526 non-zero matrix coefficients\n",
      "1834 TD reaction constraints\n",
      "13 metabolite concentrations constrained\n"
     ]
    }
   ],
   "source": [
    "# Load TFA model using gurobipy\n",
    "fname = os.path.join('SBML_models', f'{target_model}.xml')\n",
    "fo = FbaOptimization(fname)\n",
    "\n",
    "# Set nucleotide concentrations (optional)\n",
    "orig_concs = fo.set_tfa_metab_concentrations(metabolite_molar_concs)\n",
    "print(f'{len(orig_concs)} metabolite concentrations constrained')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Acetate                  : TFA gr: 0.210 h-1 vs. FBA 0.210, diff: -0.000\n",
      "Glycerol                 : TFA gr: 0.495 h-1 vs. FBA 0.495, diff: -0.000\n",
      "Fructose                 : TFA gr: 0.877 h-1 vs. FBA 0.877, diff: -0.000\n",
      "L-Malate                 : TFA gr: 0.411 h-1 vs. FBA 0.411, diff: -0.000\n",
      "Glucose                  : TFA gr: 0.877 h-1 vs. FBA 0.877, diff: -0.000\n",
      "Glucose 6-Phosphate      : TFA gr: 0.905 h-1 vs. FBA 0.905, diff: -0.000\n",
      "1 file(s) exported for \"Load reaction data\" into Escher maps\n",
      "1 file(s) exported for \"Load metabolite data\" into Escher maps\n"
     ]
    }
   ],
   "source": [
    "# Optimize model using gurobipy and analyze results\n",
    "pred_results = {}\n",
    "for cond, medium in conditions.items():\n",
    "    fo.medium = medium\n",
    "    solution = fo.optimize()\n",
    "    if solution.status == 'optimal':\n",
    "        gr = solution.objective_value\n",
    "        pred_results[cond] = solution\n",
    "        print(f'{cond:25s}: TFA gr: {gr:.3f} h-1 vs. FBA {fba_grs[cond]:.3f}, '\n",
    "              f'diff: {gr - fba_grs[cond]:6.3f}')\n",
    "    else:    \n",
    "        print(f'{cond} ended with status {solution.status}')\n",
    "\n",
    "fr = FbaResults(fo, pred_results)\n",
    "df_fluxes = fr.collect_fluxes()\n",
    "df_net_fluxes = fr.collect_fluxes(net=True)\n",
    "df_species_conc = fr.collect_species_conc()\n",
    "fr.save_to_escher(df_net_fluxes[reference_cond], os.path.join('escher', target_model))\n",
    "fr.save_to_escher(df_species_conc[reference_cond], os.path.join('escher', target_model))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "---\n",
    "## Implementation details\n",
    "\n",
    "The ensuing sections delineate the variables and constraints that have been integrated into the genome-scale metabolic model during its conversion into a thermodynamics constraint model. These sections also elucidate the formulas employed to derive the numerical values.\n",
    "\n",
    "### Variables and Constraints\n",
    "\n",
    "The following paragraphs detail the variables and constraints that have been incorporated into the genome-scale metabolic model. To illustrate this, the reaction of fructose-bisphosphate aldolase is examined, which is implemented as a reversible reaction in the iML1515 model with the identifier R_FBA: 'fdp_c -> dhap_c + g3p_c'.  During the extension of the model, the reaction is rendered irreversible, designated as `R_FBA`: 'fdp_c => dhap_c + g3p_c', and a new reaction catalyzing the reverse direction is incorporated. This reverse reaction is designated as `R_FBA_REV`: 'dhap_c + g3p_c => fdp_c'. It is noteworthy that the reverse reaction is not included for reactions that have been configured as irreversible in the original model.\n",
    "\n",
    "Two additional variables are incorporated to represent the transformed Gibbs energy of reaction, designated as `V_DRG_FBA` and `V_DRG0_FBA`, respectively. The former is assigned unlimited bounds, while the latter is constrained to the calculated standard transformed Gibbs energy of reaction plus or minus the estimation error.\n",
    "\n",
    "The log concentration variables, designated as `V_LC_fdp_c`, `V_LC_dhap_c`, and `V_LC_g3p_c`, are incorporated with default bounds as per compartmental configuration, and the concentrations are expressed in units of mol/L. The calculation formula for the transformed Gibbs energy of reaction is implemented by the constraint `C_DRG_FBA`: 'V_DRG_FBA = V_DRG0_FBA - 2.48 V_LC_fdp_c + 2.48 V_LC_dhap_c + 2.48 V_LC_g3p_c' (RT = 2.48 kJ/mol).\n",
    "\n",
    "Two binary variables (values 0 or 1) designated \"V_FU_FBA\" (\"forward use\") and \"V_RU_FBA\" (\"reverse use\") are introduced to couple the transformed Gibbs energy of reaction to the flux direction. The implementation of a \"simultaneous use\" constraint, denoted as `C_SU_FBA`, ensures that only one of the use variables can take the value \"1.\" This is expressed as \"V_FU_FBA + V_RU_FBA ≤ 1.\" Two Gibbs energy coupling constraints, designated as `C_GFC_FBA` and `C_GRC_FBA`, couple the forward and reverse use variables to the transformed Gibbs energy of reaction with inequalities ‘V_DRG_FBA ≤ 999.99 - 1000 V_FU_FBA’, thereby forcing V_DRG_FBA ≤ 0.01 kJ/mol when V_FU_FBA is active, and 'V_DRG_FBA ≥ 1000 V_RU_FBA - 999.99', forcing V_DRG_FBA ≥ 0.01 kJ/mol when V_RU_FBA is active. In a similar fashion, reactions fluxes in forward and reverse direction are coupled to the forward and reverse use variables via the flux coupling constraints C_FFC_FBA: R_FBA ≤ 1000 V_FU_FBA and C_FRC_FBA: R_FBA_REV ≤ 1000 V_FB_FBA. This configuration can be readily verified by exporting the TFA model to the '.xlsx' format.\n",
    "\n",
    "### Calculation Details \n",
    "\n",
    "Thermodynamic constraints couple reaction flux directionality with Gibbs energy of reaction. The transformed Gibbs energy values employed in this context are derived through transformations with respect to compartmental pH and ionic strength at the default temperature of 298.15 K (25˚C). Negative values of the transformed Gibbs energy of reaction will drive the reaction in the forward direction, while positive values will drive it in the reverse direction. The incorporation of thermodynamic constraints into a model is a straightforward process that necessitates minimal configuration input. However, it should be noted that the underlying calculations are of a highly complex nature. The f2xba model is aligned with the formulation implemented in the pyTFA package (Salvy et al., 2019), with the formulas being based on the book by Alberty (Alberty, 2003).\n",
    "\n",
    "The natural log of metabolite concentrations is a variable in the optimization problem, with lower and upper bounds defined in the TFA configuration file and potentially further limited prior to optimization. The factor of gas constant times temperature, 'RT', used in subsequent equations, is evaluated to 2.48 kJ/mol at T = 298.15 K.\n",
    "\n",
    "The transformed Gibbs energy of reaction, denoted by $\\Delta_r \\text G^{'}$, is calculated from the standard transformed Gibbs energy of reaction, denoted by $\\Delta_r \\text G^{'0}$, the metabolite concentrations $c_i$ [mol/L], and the stoichiometric coefficients $\\nu_i$ of reactants (negative) and products (positive) using the following equation 4.5-10 (Alberty, 2003):\n",
    "\n",
    "$\\Delta_r \\text G^{'} = \\Delta_r \\text G^{'0} + \\text{RT} \\sum_i {\\nu_i \\ln c_i}$\n",
    "\n",
    "The group contribution method involves the decomposition of a molecule into cues (groups) for which the Gibbs free energy of formation has been estimated with high confidence. During a reaction, only some of the cues of reactants and products, the net cues, undergo modification, while the majority of the other cues remain unmodified. The TD database contains estimated errors for each of the cues. The estimated errors of the net cues, denoted by $\\text{cue_est_error}_j$, are then utilized to ascertain the estimation error for the Gibbs energy of reaction. This estimation error is subsequently employed to establish the bounds of the variable $\\Delta_r \\text G^{'0}$. The calculation is performed as per pyTFA:\n",
    "\n",
    "$\\text{estimation_error} = \\sqrt {\\sum_j (\\nu_j \\, \\text{cue_est_error}_j)^2}$\n",
    "\n",
    "The standard-transformed Gibbs energy of reaction, denoted by $\\Delta_r \\text G^{'0}$, is derived from the standard-transformed Gibbs energies of formation, denoted by $\\Delta_f \\text G_i^{'0}$, of the reactants and products. This derivation is expressed through the following equation 4.4-2 (Alberty, 2003):\n",
    "\n",
    "$\\Delta_r \\text G^{'0} = \\sum_i \\nu_i \\Delta_f \\text G_i^{'0}$ \n",
    "\n",
    "In the context of a transport process, it is imperative to incorporate electrical work terms, which are calculated from the membrane potentials, denoted by the symbol $\\Delta \\varphi_{sd}$ (destination minus source potential), and the transported charges, denoted by the symbol $z_{sd}$ (source to destination compartment), which can assume positive or negative values. $F$ is the Faraday constant (96.485 kJ mol-1 V-1). To derive the equation, we have consulted the work of Jol (Jol et al., 2010).\n",
    "\n",
    "$\\Delta_r \\text G^{'0} = \\sum_i \\nu_i \\Delta_f \\text G_i^{'0} + F \\Delta \\sum_{sd} \\varphi_{ds} z_{sd}$\n",
    "\n",
    "The reaction network implemented by the genome-scale metabolic model consists of biochemical reactions and biochemical reactants (metabolites). At the molecular level, a biochemical reactant, such as ATP, can be regarded as a group (pseudoisomer group) of related chemical species in different protonation states and different complexations with metal ions, such as $\\text{ATP}^{4-}$, $\\text{HATP}^{3-}$, $\\text{H}_2\\text{ATP}^{2-}$, $\\text{MgATP}^{2-}$ or $\\text{MgHATP}^-$. The f2xba modeling framework considers different protonation states, but not different complexations with metal ions. \n",
    "\n",
    "The Gibbs energy of formation, denoted as $\\Delta_f \\text G^{'0}(I)$, of a biochemical reactant can be determined from the Gibbs energy of formation of the least protonated chemical species, denoted as $\\Delta_f \\text {G1}^{'0}(I)$, in the pseudoisomeric group and the contribution of the other chemical species in the pseudoisomeric group, which are considered by the binding polynomial, denoted as $\\text P(I)$. It is imperative to note that these values are contingent on the isomeric strength $I$ [mol/L] of the compartment, as elucidated in equations 4.5-6 (Alberty, 2003).\n",
    "\t\n",
    "$\\Delta_f \\text G^{'0}(I) = \\Delta_f \\text{G1}^{'0}(I) - \\text{RT} \\ln(\\text P(I))$\n",
    "\n",
    "The determination of the least protonated chemical species is contingent upon the compartmental pH, as defined in the TFA configuration file, in conjunction with the $\\text{pKa}_j$  values and the electrical charge extracted from the pertinent TD data record. The standard-transformed Gibbs energy of formation for the least protonated species, denoted as $\\Delta_f \\text{G1}^{'0}$, is derived from the standard Gibbs energy of formation, denoted as $\\Delta_f \\text G^{0}$, extracted from the corresponding TD database record. The latter is transformed to the compartmental $pH$ using equation 4.10-12 (Alberty, 2003):\n",
    "\n",
    "$\\Delta_f \\text {G1}^{'0} = \\Delta_f \\text G^0 + RT \\ln{(10)} \\sum_j {\\text{pKa}_j}$\n",
    "\n",
    "The standard transformed Gibbs energy of formation for this least protonated state, denoted as $\\Delta_f \\text{G1}^{'0}(I)$ at a given ionic strength, is calculated from its value at zero ionic strength, denoted as $\\Delta_f \\text{G1}^{'0}$, and adjustments with respect to the energy contribution by the number of protons, denoted as $nH$, in the structure and electrical charge, denoted as $z$. Ionic strength $I$, defined as the measure of ion concentration, exerts a significant influence on the activity coefficients employed in equilibrium equations by means of shielding charges. This adjustment is achieved through the utilization of equation 4.4-10 (Alberty, 2003), which is founded on the extended Debye-Hueckel equation with constants $A = 0.51065 \\sqrt\\frac{l}{mol}$ and $B = 1.6\\sqrt\\frac{l}{mol}$.\n",
    "\n",
    "$\\Delta_f \\text{G1}^{'0}(I) = \\Delta_f \\text{G1}^{'0} + \\text{RT} \\cdot \\ln(10) \\cdot \\text{nH} \\cdot \\text{pH}  - \\text{RT} \\cdot \\ln(10) \\cdot (z^2 - \\text{nH}) \\cdot \\frac{A \\sqrt I}{1 + B \\sqrt I}$\n",
    "\n",
    "Ionic strength-adjusted acid dissociation constants, denoted by the symbol $\\text{pKa}^{'}(I)$, are essential for determining the binding polynomial. These constants can be calculated from the corresponding $\\text{pKa}$ values using the following equation 4.10-11 (Alberty, 2003):\n",
    "\n",
    "$\\text{pKa}^{'}(I) = \\text{pKa} - \\sum_j \\nu_j z_j^2 \\cdot \\frac{A \\sqrt I}{1 + B \\sqrt I}$\n",
    "\n",
    "The binding polynomial, denoted by $P(I)$, which accounts for the energy contribution of the other chemical species in the pseudoisomer group, is calculated from the equilibrium constants $K_x$, which relate to the ionic strength-adjusted acid dissociation constants ($K_x = 10^{-\\text{pKa}_x^{'}}$), with $K_1$ having the smallest value (highest pKa) and the compartmental proton concentration $[H^+] = 10^{-pH}$, using equation 4.5-7 (Alberty, 2003).\n",
    "\n",
    "$P(I) = 1 + \\frac {[H^+]} {K_1} + \\frac {[H^+]^2} {K_1 \\cdot K_2} + \\dotsb$\n",
    "\n",
    "The mean number of bound hydrogens, denoted by $\\text{avg_h_binding}$, in the pseudoisomeric group can be calculated from the binding polynomial, as outlined in equation 1.3-9 (Alberty, 2003).\n",
    "\n",
    "$\\text{avg_h_binding} = \\frac {[H+]} {\\text P}  \\frac {d \\text P} {d[H+]} = \\frac{[H+]/K_1 + 2[H+]^2/K_1K_2 + \\dotsb } {\\text P(I)}$\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "---\n",
    "## References\n",
    "\n",
    "- Alberty, R. A. (2003). Thermodynamics of Biochemical Reactions. Massachusetts Institute of Technology Press, Cambridge, MA. \n",
    "- Buckstein Michael, H., He, J., & Rubin, H. (2008). Characterization of Nucleotide Pools as a Function of Physiological State in Escherichia coli. Journal of Bacteriology, 190(2), 718-726. https://doi.org/10.1128/jb.01020-07 \n",
    "- Jankowski, M. D., Henry, C. S., Broadbelt, L. J., & Hatzimanikatis, V. (2008). Group contribution method for thermodynamic analysis of complex metabolic networks. Biophys J, 95(3), 1487-1499. https://doi.org/10.1529/biophysj.107.124784 \n",
    "- Jol, S. J., Kümmel, A., Hatzimanikatis, V., Beard, D. A., & Heinemann, M. (2010). Thermodynamic calculations for biochemical transport and reaction processes in metabolic networks. Biophys J, 99(10), 3139-3144. https://doi.org/10.1016/j.bpj.2010.09.043 \n",
    "- Mavrovouniotis, M. L. (1990). Group contributions for estimating standard gibbs energies of formation of biochemical compounds in aqueous solution. Biotechnol Bioeng, 36(10), 1070-1082. https://doi.org/10.1002/bit.260361013 \n",
    "- Salvy, P., Fengos, G., Ataman, M., Pathier, T., Soh, K. C., & Hatzimanikatis, V. (2019). pyTFA and matTFA: a Python package and a Matlab toolbox for Thermodynamics-based Flux Analysis. Bioinformatics, 35(1), 167-169. https://doi.org/10.1093/bioinformatics/bty499 \n",
    "- Seaver, S. M. D., Liu, F., Zhang, Q., Jeffryes, J., Faria, J. P., Edirisinghe, J. N., Mundy, M., Chia, N., Noor, E., Beber, M. E., Best, A. A., DeJongh, M., Kimbrel, J. A., D'Haeseleer, P., McCorkle, S. R., Bolton, J. R., Pearson, E., Canon, S., Wood-Charlson, E. M.,…Henry, C. S. (2021). The ModelSEED Biochemistry Database for the integration of metabolic annotations and the reconstruction, comparison and analysis of metabolic models for plants, fungi and microbes. Nucleic Acids Res, 49(D1), D575-d588. https://doi.org/10.1093/nar/gkaa746 "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.12",
   "language": "python",
   "name": "python310"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}