Support functions
Managing configuration files
Support functions to load and write configuration files used in extended model generation.
- f2xba.utils.mapping_utils.load_parameter_file(fname, sheet_names=None)
Load configuration data from a spreadsheet file.
Configuration files are required during extended model creation. These files are Microsoft Excel spreadsheets (.xlsx) containing one or several sheets.
While configuration files can be created and updated using spreadsheet editors, it may be more convenient to create and modify these files using program code.
This function loads either all sheets (default) or only selected sheets (parameter sheet_names) from the file.
    from f2xba.utils.mapping_utils import load_parameter_file

    xba_params = load_parameter_file('xba_parameters.xlsx')
- Parameters:
fname (str) – filename of configuration file (.xlsx)
sheet_names (list(str)) – (optional) sheet names of tables to import
- Returns:
imported tables
- Return type:
dict(str, pandas.DataFrame)
- f2xba.utils.mapping_utils.write_parameter_file(fname, tables)
Export configuration data to a spreadsheet file.
Configuration files are required during extended model creation. These files are Microsoft Excel spreadsheets (.xlsx) containing one or several sheets.
While configuration files can be created and updated using spreadsheet editors, it may be more convenient to create and modify these files using program code.
    from f2xba.utils.mapping_utils import write_parameter_file

    write_parameter_file('xba_parameters.xlsx', xba_params)
- Parameters:
fname (str) – filename of configuration file (.xlsx)
tables (dict(str, pandas.DataFrame)) – sheet names and tables with configuration data
Calculate molecular weights
Calculate molecular weights for DNA, RNA, proteins and metabolites.
- f2xba.utils.calc_mw.calc_mw_from_formula(formula)
Calculate metabolite molecular weight based on chemical formula, using the NIST atomic weights table (standard atomic weight):
https://physics.nist.gov/cgi-bin/Compositions/stand_alone.pl
E.g. ‘C10H12N5O7P’ for AMP -> 345.050 g/mol
- Parameters:
formula (str) – chemical formula, e.g. ‘H2O’
- Returns:
molecular weight in Da (g/mol)
- Return type:
float
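The calculation can be illustrated with a minimal sketch. The helper name formula_mw, the abridged atomic weight table, and the restriction to formulas without parentheses or charges are assumptions made for this illustration. This is not the f2xba implementation, and the weights used by calc_mw_from_formula may differ in precision.

```python
import re

# Abridged standard atomic weights in g/mol (assumption: only the elements
# most common in metabolites are listed here).
ATOMIC_WEIGHTS = {
    'H': 1.008, 'C': 12.011, 'N': 14.007,
    'O': 15.999, 'P': 30.974, 'S': 32.06,
}

# One element symbol (e.g. 'C' or 'Mg') followed by an optional count.
_TOKEN = re.compile(r'([A-Z][a-z]?)(\d*)')


def formula_mw(formula):
    """Hypothetical helper: sum atomic weights for a flat chemical formula."""
    # Reject anything that is not a plain sequence of element/count pairs.
    if not re.fullmatch(r'(?:[A-Z][a-z]?\d*)+', formula):
        raise ValueError(f'unsupported formula: {formula!r}')
    total = 0.0
    for element, count in _TOKEN.findall(formula):
        # A missing count means one atom; unknown elements raise KeyError.
        total += ATOMIC_WEIGHTS[element] * (int(count) if count else 1)
    return total


print(round(formula_mw('H2O'), 3))
print(round(formula_mw('C6H12O6'), 3))
```

A regular expression is sufficient here because the formulas are flat; nested groups such as Ca(OH)2 would need a proper parser.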
- f2xba.utils.calc_mw.protein_mw_from_aa_comp(aa_dict)
Calculate protein molecular weight from amino acid composition.
Following the Expasy Compute pI/Mw tool, one H2O is removed per peptide bond.
- Parameters:
aa_dict (dict(char, float)) – dictionary with amino acid one-letter code and stoichiometry
- Returns:
molecular weight in g/mol (Da)
- Return type:
float
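The water correction can be sketched as follows. Summing residue masses (amino acid minus one H2O) and adding a single H2O for the chain termini is equivalent to summing free amino acid masses and removing one H2O per peptide bond. The helper name peptide_mw and the table of average residue masses are assumptions made for this illustration, not the values or code used by f2xba.

```python
# Average residue masses in g/mol (free amino acid minus one H2O);
# assumed values, as commonly tabulated for the 20 standard amino acids.
RESIDUE_MASS = {
    'G': 57.0519, 'A': 71.0788, 'S': 87.0782, 'P': 97.1167, 'V': 99.1326,
    'T': 101.1051, 'C': 103.1388, 'L': 113.1594, 'I': 113.1594,
    'N': 114.1038, 'D': 115.0886, 'Q': 128.1307, 'K': 128.1741,
    'E': 129.1155, 'M': 131.1926, 'H': 137.1411, 'F': 147.1766,
    'R': 156.1875, 'Y': 163.1760, 'W': 186.2132,
}
WATER_MASS = 18.01524


def peptide_mw(aa_dict):
    """Hypothetical helper: molecular weight from one-letter composition."""
    residues = sum(RESIDUE_MASS[aa] * count for aa, count in aa_dict.items())
    # n residues form n - 1 peptide bonds, so exactly one H2O remains.
    return residues + WATER_MASS


# Dipeptide Gly-Ala: two amino acids, one peptide bond.
print(round(peptide_mw({'G': 1, 'A': 1}), 2))
```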
- f2xba.utils.calc_mw.rna_mw_from_nt_comp(nt_dict)
Calculate RNA molecular weight from nucleotide composition.
- Parameters:
nt_dict (dict(char, float)) – nucleotide composition (‘A’, ‘C’, ‘G’, ‘U’)
- Returns:
molecular weight in g/mol (Da)
- Return type:
float
- f2xba.utils.calc_mw.dsdna_mw_from_dnt_comp(dnt_dict)
Calculate DNA molecular weight from deoxynucleotide composition (double strand).
The deoxynucleotides of the complementary strand are added to the supplied composition.
- Parameters:
dnt_dict (dict(char, float)) – deoxy nucleotide compositions (‘A’, ‘C’, ‘G’, ‘T’)
- Returns:
molecular weight in g/mol (Da)
- Return type:
float
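How a single-strand composition extends to the double strand can be sketched with plain dictionaries. The helper name double_strand_composition is hypothetical, and the sketch covers only the base-pairing step (A with T, C with G), not the weight calculation performed by dsdna_mw_from_dnt_comp.

```python
# Watson-Crick base pairing between the two DNA strands.
COMPLEMENT = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}


def double_strand_composition(dnt_dict):
    """Hypothetical helper: add complementary-strand deoxynucleotides."""
    ds_comp = {nt: 0.0 for nt in COMPLEMENT}
    for nt, count in dnt_dict.items():
        ds_comp[nt] += count              # nucleotide on the given strand
        ds_comp[COMPLEMENT[nt]] += count  # its partner on the other strand
    return ds_comp


single_strand = {'A': 3, 'C': 1, 'G': 2, 'T': 4}
print(double_strand_composition(single_strand))
```

In the resulting composition the counts of A and T are equal, as are those of C and G.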
SGKO
Support functions for single gene knockout analysis.
- f2xba.utils.sgko_utils.confusion_matrix(act_classification, pred_classification)
Create a 2D confusion matrix based on actual and predicted classifications.
Statistics, set of items and confusion matrix are returned in a dictionary.
Example: Perform single gene deletion simulation (using gurobipy interface) and plot confusion matrix. keio_ess and keio_red hold lists of genes that are considered essential/redundant for selected condition.
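    from f2xba.utils.sgko_utils import confusion_matrix

    # EcmOptimization, lb_medium, keio_ess and keio_red are assumed to be defined beforehand
    eo = EcmOptimization('iML1515_GECKO.xml')
    eo.medium = {rid: 1000.0 for rid in lb_medium}
    df_sgko = eo.single_gene_deletion()

    # actual classification: True for essential genes, False for redundant genes
    act_classification = {gene: False for gene in keio_red}
    act_classification.update({gene: True for gene in keio_ess})

    # predicted classification: essential if fitness drops below 5 %
    pred_classification = (df_sgko['fitness'] < 0.05).to_dict()

    pred = confusion_matrix(act_classification, pred_classification)
    print('recall:', pred['recall'])
    pred['cm']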
- Parameters:
act_classification (dict(str, bool)) – actual classifications
pred_classification (dict(str, bool)) – predicted classifications
- Returns:
prediction results
- Return type:
dict
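The underlying bookkeeping can be sketched in plain Python. The helper name confusion_counts, the restriction to genes present in both dictionaries, the matrix layout, and all dictionary keys other than 'recall' and 'cm' are assumptions made for this illustration; the structure returned by confusion_matrix() may differ.

```python
def confusion_counts(act_classification, pred_classification):
    """Hypothetical helper: 2x2 confusion matrix from two bool dicts."""
    # Only genes with both an actual and a predicted label are compared.
    genes = set(act_classification) & set(pred_classification)
    tp = {g for g in genes if act_classification[g] and pred_classification[g]}
    fn = {g for g in genes if act_classification[g] and not pred_classification[g]}
    fp = {g for g in genes if not act_classification[g] and pred_classification[g]}
    tn = genes - tp - fn - fp
    n_act_pos = len(tp) + len(fn)
    n_pred_pos = len(tp) + len(fp)
    return {
        'tp': tp, 'fn': fn, 'fp': fp, 'tn': tn,
        # Rows: actual (False, True); columns: predicted (False, True).
        'cm': [[len(tn), len(fp)], [len(fn), len(tp)]],
        'recall': len(tp) / n_act_pos if n_act_pos else float('nan'),
        'precision': len(tp) / n_pred_pos if n_pred_pos else float('nan'),
    }


# Made-up fitness values; a gene is predicted essential below 5 % fitness.
fitness = {'g1': 0.0, 'g2': 0.9, 'g3': 1.0, 'g4': 0.01, 'g6': 0.0, 'g7': 0.8}
predicted = {gene: value < 0.05 for gene, value in fitness.items()}
actual = {'g1': True, 'g2': True, 'g3': False, 'g4': False, 'g5': True, 'g7': False}

result = confusion_counts(actual, predicted)
print('recall:', result['recall'])
print(result['cm'])
```

Genes g5 and g6 carry only one of the two labels and are therefore excluded from the counts.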
- f2xba.utils.sgko_utils.export_gene_predictions(pred_results, exp_fitness, pred_fitness, pred_status, uniprot_data, exp_mpmf, fname=None)
Export gene predictions with additional information.
Using the structure returned by confusion_matrix(), a table indexed by gene id is generated. If fname is provided, the table is written to an Excel file. The table contains additional data extracted from the information provided in the parameters.
For gene essentiality analysis, set parameter exp_fitness to {}.
Example: Perform single gene deletion simulation (using gurobipy interface) and export prediction results. keio_ess and keio_red hold lists of genes that are considered essential/redundant for selected condition. df_mpmf contains proteomics data for reference. Uniprot data is collected for the organism in question.
    from f2xba.uniprot.uniprot_data import UniprotData
    from f2xba.utils.sgko_utils import confusion_matrix, export_gene_predictions

    # EcmOptimization, lb_medium, keio_ess, keio_red and df_mpmf are assumed to be defined beforehand
    uniprot_data = UniprotData(83333, 'data_refs')
    eo = EcmOptimization('iML1515_GECKO.xml')
    eo.medium = {rid: 1000.0 for rid in lb_medium}
    df_sgko = eo.single_gene_deletion()

    # actual and predicted classifications (True: essential gene)
    act_classification = {gene: False for gene in keio_red}
    act_classification.update({gene: True for gene in keio_ess})
    pred_classification = (df_sgko['fitness'] < 0.05).to_dict()
    pred = confusion_matrix(act_classification, pred_classification)

    # additional data for the exported table
    pred_fitness = df_sgko['fitness'].to_dict()
    pred_status = df_sgko['status'].to_dict()
    exp_mpmf = df_mpmf['LB'].to_dict()
    fname = 'essentiality_predictions.xlsx'
    df_predictions = export_gene_predictions(pred, {}, pred_fitness, pred_status, uniprot_data, exp_mpmf, fname)
- Parameters:
pred_results (dict) – SGKO prediction results generated by confusion_matrix()
exp_fitness (dict(str, float)) – fitness data from experiment, if available, otherwise {}
pred_fitness (dict(str, float)) – fitness data determined from SGKO analysis
pred_status (dict(str, str)) – optimization status of SGKO predictions
uniprot_data (UniprotData) – instance containing UniProt protein data for given model/organism
exp_mpmf (dict(str, float)) – experimental values of protein mass fractions in mg/g
fname (str) – (optional) file name of the Excel spreadsheet, with .xlsx extension
- Returns:
table with detailed prediction data
- Return type:
pandas.DataFrame