Support functions

Managing configuration files

Support functions to load and write configuration files used in extended model generation.

f2xba.utils.mapping_utils.load_parameter_file(fname, sheet_names=None)

Load configuration data from a spreadsheet file.

Configuration files are required during extended model creation. These files are Microsoft Excel spreadsheets (.xlsx) containing one or several sheets.

While configuration files can be created and updated using spreadsheet editors, it may be more convenient to create and modify these files using program code.

By default, this function loads all sheets from the file. Pass sheet_names to load only selected sheets.

from f2xba.utils.mapping_utils import load_parameter_file

xba_params = load_parameter_file('xba_parameters.xlsx')
Parameters:
  • fname (str) – filename of configuration file (.xlsx)

  • sheet_names (list(str)) – (optional) sheet names of tables to import

Returns:

imported tables

Return type:

dict(str, pandas.DataFrame)

f2xba.utils.mapping_utils.write_parameter_file(fname, tables)

Export configuration data to a spreadsheet file.

Configuration files are required during extended model creation. These files are Microsoft Excel spreadsheets (.xlsx) containing one or several sheets.

While configuration files can be created and updated using spreadsheet editors, it may be more convenient to create and modify these files using program code.

from f2xba.utils.mapping_utils import write_parameter_file

write_parameter_file('xba_parameters.xlsx', xba_params)
Parameters:
  • fname (str) – filename of configuration file (.xlsx)

  • tables (dict(str, pandas.DataFrame)) – tables with configuration data, keyed by sheet name

Calculate molecular weights

Calculate molecular weights for DNA, RNA, proteins and metabolites.

f2xba.utils.calc_mw.calc_mw_from_formula(formula)

Calculate the molecular weight of a metabolite from its chemical formula, using the NIST table of standard atomic weights:

https://physics.nist.gov/cgi-bin/Compositions/stand_alone.pl

E.g. 'C10H12N5O7P' for AMP -> 345.050 g/mol

Parameters:

formula (str) – chemical formula, e.g. ‘H2O’

Returns:

molecular weight in Da (g/mol)

Return type:

float
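The calculation described above can be illustrated with a minimal sketch. It is not the f2xba implementation: mw_from_formula_sketch and the ATOMIC_WEIGHTS table are hypothetical names, the atomic weights are rounded values for a handful of elements, and only flat formulas without parentheses or charges are handled.

```python
import re

# Abridged standard atomic weights (g/mol), rounded; assumed values for
# illustration that may differ slightly from the table used by f2xba.
ATOMIC_WEIGHTS = {'H': 1.008, 'C': 12.011, 'N': 14.007,
                  'O': 15.999, 'P': 30.974, 'S': 32.06}


def mw_from_formula_sketch(formula):
    """Sum atomic weights for a flat formula such as 'C6H12O6'."""
    mw = 0.0
    # Each match is an element symbol followed by an optional count.
    for element, count in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        mw += ATOMIC_WEIGHTS[element] * (int(count) if count else 1)
    return mw


print(round(mw_from_formula_sketch('H2O'), 3))      # 18.015
print(round(mw_from_formula_sketch('C6H12O6'), 3))  # 180.156
```

An element missing from the table raises a KeyError, which is usually preferable to silently ignoring part of a formula.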

f2xba.utils.calc_mw.protein_mw_from_aa_comp(aa_dict)

Calculate protein molecular weight from amino acid composition.

Based on the Expasy Compute pI/Mw tool: one H2O is removed from the summed amino acid masses for each peptide bond.

Parameters:

aa_dict (dict(char, float)) – dictionary with amino acid one-letter code and stoichiometry

Returns:

molecular weight in g/mol (Da)

Return type:

float
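The water correction per peptide bond can be sketched as follows. This is an illustration under stated assumptions, not the library code: protein_mw_sketch, AA_MW and WATER_MW are hypothetical names, the table covers only three amino acids, and the masses are derived from rounded standard atomic weights.

```python
# Average masses of free amino acids (g/mol) for three residues only;
# illustrative values computed from rounded standard atomic weights.
AA_MW = {'G': 75.067, 'A': 89.094, 'S': 105.093}
WATER_MW = 18.015


def protein_mw_sketch(aa_dict):
    """Sum free amino acid masses, then remove one H2O per peptide bond."""
    n_residues = sum(aa_dict.values())
    total = sum(AA_MW[aa] * n for aa, n in aa_dict.items())
    # A linear chain of n residues has n - 1 peptide bonds.
    return total - (n_residues - 1) * WATER_MW


print(round(protein_mw_sketch({'G': 2, 'A': 1}), 3))  # 203.198
```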

f2xba.utils.calc_mw.rna_mw_from_nt_comp(nt_dict)

Calculate RNA molecular weight from nucleotide composition.

Parameters:

nt_dict (dict(char, float)) – nucleotide composition (‘A’, ‘C’, ‘G’, ‘U’)

Returns:

molecular weight in g/mol (Da)

Return type:

float

f2xba.utils.calc_mw.dsdna_mw_from_dnt_comp(dnt_dict)

Calculate the molecular weight of double-stranded DNA from deoxynucleotide composition.

The deoxynucleotides of the complementary strand are added to the given composition.

Parameters:

dnt_dict (dict(char, float)) – deoxy nucleotide compositions (‘A’, ‘C’, ‘G’, ‘T’)

Returns:

molecular weight in g/mol (Da)

Return type:

float
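How a complementary strand contributes to the composition can be sketched as below. The helper double_strand_composition is hypothetical and only illustrates the base-pairing bookkeeping. It assumes that dnt_dict describes a single strand, and it does not compute a molecular weight.

```python
# Watson-Crick pairing used to derive the complementary strand.
COMPLEMENT = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}


def double_strand_composition(dnt_dict):
    """Return the composition of both strands given one strand."""
    ds_comp = {nt: 0.0 for nt in COMPLEMENT}
    for nt, n in dnt_dict.items():
        ds_comp[nt] += n              # base on the given strand
        ds_comp[COMPLEMENT[nt]] += n  # paired base on the other strand
    return ds_comp


print(double_strand_composition({'A': 3, 'C': 1, 'G': 2, 'T': 0}))
# {'A': 3.0, 'T': 3.0, 'C': 3.0, 'G': 3.0}
```

In the resulting composition the counts of A and T are always equal, as are those of C and G.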

SGKO

Support functions for single gene knockout analysis.

f2xba.utils.sgko_utils.confusion_matrix(act_classification, pred_classification)

Create a 2D confusion matrix based on actual and predicted classifications.

Statistics, set of items and confusion matrix are returned in a dictionary.

Example: Perform single gene deletion simulation (using gurobipy interface) and plot confusion matrix. keio_ess and keio_red hold lists of genes that are considered essential/redundant for selected condition.

from f2xba.utils.sgko_utils import confusion_matrix

eo = EcmOptimization('iML1515_GECKO.xml')
eo.medium = {rid: 1000.0 for rid in lb_medium}
df_sgko = eo.single_gene_deletion()

act_classification = {gene: False for gene in keio_red}
act_classification.update({gene: True for gene in keio_ess})
pred_classification = (df_sgko['fitness'] < 0.05).to_dict()
pred = confusion_matrix(act_classification, pred_classification)

print('recall:', pred['recall'])
pred['cm']
Parameters:
  • act_classification (dict(str, bool)) – actual classifications

  • pred_classification (dict(str, bool)) – predicted classifications

Returns:

prediction results

Return type:

dict
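For readers unfamiliar with the underlying statistics, the following sketch shows how the four cells of a binary confusion matrix and the recall can be derived from two classification dictionaries. It is not the f2xba implementation: confusion_matrix_sketch is a hypothetical helper, its result keys other than 'recall' are made up for illustration, and restricting the comparison to items present in both dictionaries is an assumption.

```python
def confusion_matrix_sketch(act, pred):
    """Count TP/FP/FN/TN over items present in both dictionaries."""
    counts = {'TP': 0, 'FP': 0, 'FN': 0, 'TN': 0}
    for item in act.keys() & pred.keys():
        if act[item]:
            counts['TP' if pred[item] else 'FN'] += 1
        else:
            counts['FP' if pred[item] else 'TN'] += 1
    positives = counts['TP'] + counts['FN']
    predicted = counts['TP'] + counts['FP']
    return {
        **counts,
        'recall': counts['TP'] / positives if positives else float('nan'),
        'precision': counts['TP'] / predicted if predicted else float('nan'),
    }


act = {'g1': True, 'g2': True, 'g3': False, 'g4': False}
pred = {'g1': True, 'g2': False, 'g3': False, 'g4': True, 'g5': True}
res = confusion_matrix_sketch(act, pred)
print(res['recall'], res['precision'])  # 0.5 0.5
```

Gene g5 has a prediction but no actual classification, so it is left out of the counts.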

f2xba.utils.sgko_utils.export_gene_predictions(pred_results, exp_fitness, pred_fitness, pred_status, uniprot_data, exp_mpmf, fname=None)

Export gene predictions with additional information.

Using the structure returned by confusion_matrix(), a table indexed by gene id is generated. If fname is provided, the table is also written to an Excel file. The table contains additional data extracted from the information passed in the other parameters.

For gene essentiality analysis, set parameter exp_fitness to {}.

Example: Perform single gene deletion simulation (using gurobipy interface) and export prediction results. keio_ess and keio_red hold lists of genes that are considered essential/redundant for selected condition. df_mpmf contains proteomics data for reference. Uniprot data is collected for the organism in question.

from f2xba.uniprot.uniprot_data import UniprotData
from f2xba.utils.sgko_utils import confusion_matrix, export_gene_predictions

uniprot_data = UniprotData(83333, 'data_refs')

eo = EcmOptimization('iML1515_GECKO.xml')
eo.medium = {rid: 1000.0 for rid in lb_medium}
df_sgko = eo.single_gene_deletion()

act_classification = {gene: False for gene in keio_red}
act_classification.update({gene: True for gene in keio_ess})
pred_classification = (df_sgko['fitness'] < 0.05).to_dict()
pred = confusion_matrix(act_classification, pred_classification)

pred_fitness = df_sgko['fitness'].to_dict()
pred_status = df_sgko['status'].to_dict()
exp_mpmf = df_mpmf['LB'].to_dict()
fname = 'essentiality_predictions.xlsx'
df_predictions = export_gene_predictions(pred, {}, pred_fitness, pred_status, uniprot_data, exp_mpmf, fname)
Parameters:
  • pred_results (dict) – SGKO prediction results generated by confusion_matrix()

  • exp_fitness (dict(str, float)) – fitness data from experiment, if available, otherwise {}

  • pred_fitness (dict(str, float)) – fitness data determined from SGKO analysis

  • pred_status (dict(str, str)) – optimization status of SGKO predictions

  • uniprot_data (UniprotData) – instance containing UniProt protein data for given model/organism

  • exp_mpmf (dict(str, float)) – experimental values of protein mass fractions in mg/g

  • fname (str) – (optional) file name of the Excel spreadsheet to write, with .xlsx extension

Returns:

table with detailed prediction data

Return type:

pandas.DataFrame