beetroots.approx_optim package

Subpackages

Submodules

beetroots.approx_optim.abstract_approx_optim module

class beetroots.approx_optim.abstract_approx_optim.ApproxParamsOptim(list_lines: List[str], simu_name: str, D: int, D_no_kappa: int, K: int, log10_f_grid_size: int, N_samples_y: int, max_workers: int, sigma_a_raw: ndarray | float, sigma_m: ndarray | float, path_outputs: str, path_models: str, N_clusters_a_priori: int | None, small_size: int = 16, medium_size: int = 20, bigger_size: int = 24)[source]

Bases: ABC

Optimization of an approximation parameter for the likelihood function beetroots.modelling.likelihoods.approx_censored_add_mult.MixingModelsLikelihood. The optimization framework used to adjust this parameter is introduced in Appendix A of Palud et al. [2023].

Parameters:

list_lines (List[str]) – names of the observables for which the likelihood parameter needs to be adjusted
simu_name (str) – name of the simulation, used to name the output folder
D (int) – total number of physical parameters involved in the forward map
D_no_kappa (int) – total number of physical parameters involved in the forward map except for the scaling parameter \(\kappa\) (if it is part of the considered physical parameters)
K (int) – the number of sampled theta values is \(K^D\)
log10_f_grid_size (int) – number of points in the grid on \(\log_{10} f_\ell(\theta)\)
N_samples_y (int) – number of samples for \(y_\ell\)
max_workers (int) – maximum number of workers that can be used for optimization or results extraction
sigma_a_raw (Union[np.ndarray, float]) – standard deviation of the additive Gaussian noise
sigma_m (Union[np.ndarray, float]) – standard deviation parameter of the multiplicative lognormal noise
path_outputs (str) – path to the output folder (to be created), where the run results are to be saved
path_models (str) – path to the folder containing the forward models
N_clusters_a_priori (Optional[int]) – The number of different values of sigma_a_raw to consider for each line (and thus the number of optimization problems to solve per line), computed with a clustering algorithm on the N values of sigma_a_raw. Raises an error if N_clusters_a_priori > N.
small_size (int, optional) – size for basic text, axes titles, xticks and yticks, by default 16
medium_size (int, optional) – size of the axis labels, by default 20
bigger_size (int, optional) – size of the figure title, by default 24

D

total number of physical parameters involved in the forward map

Type:: int

D_no_kappa

total number of physical parameters involved in the forward map except for the scaling parameter \(\kappa\) (if it is part of the considered physical parameters)

Type:: int

K

the number of sampled theta values is K^D_sampling

Type:: int

L

number of observables for which the likelihood parameter needs to be adjusted

Type:: int

MODELS_PATH

path to the folder containing all the already defined and saved models (i.e., polynomials or neural networks)

Type:: str

N

number of pixels / components for which the optimization needs to be performed

Type:: int

N_clusters

The number of different values of sigma_a to consider for each line (and thus the number of optimization problems to solve per line), computed with a clustering algorithm on the N values of sigma_a.

Type:: Optional[int]

N_optim_per_line

number of optimization procedures to run per line

Type:: int

N_samples_theta

number of samples for \(\theta\) used to build the histogram of \(\log_{10} f_\ell(\theta)\). To be defined in the daughter classes.

Type:: int

N_samples_y

number of samples for \(y_\ell\)

Type:: int

check_num_uniques(sigma_a_raw: ndarray, N_clusters_a_priori: int | None) → int | None[source]

Sets the number of clusters to consider, N_clusters, that is, the number of optimization procedures to run per line. There are 4 possible cases:

case 1: set N_clusters to the number of distinct values of sigma_a.
case 2: the number of distinct values of sigma_a is lower than the number of clusters provided by the user. In this case, use the value that minimizes the number of optimization procedures to run.
case 3: use the number of clusters indicated by the user.
case 4: otherwise, run one optimization per pixel, i.e., run self.N optimizations.

Parameters:

sigma_a_raw (np.ndarray of shape (N, L)) – set of standard deviations of the additive Gaussian noise
N_clusters_a_priori (Optional[int]) – number of clusters to consider indicated by the user

Returns:

definitive number of clusters to consider

Return type:

Optional[int]
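
The four cases can be read as a small decision rule. The sketch below is one possible reading, not the library's implementation: the function name choose_n_clusters, the per-line counting of distinct values (keeping the largest count over lines), and the convention that None means "one optimization per pixel" are all assumptions made for illustration.

```python
from typing import Optional

import numpy as np


def choose_n_clusters(
    sigma_a_raw: np.ndarray, n_clusters_a_priori: Optional[int]
) -> Optional[int]:
    """Hypothetical reading of the four cases (not the library code)."""
    n_pixels = sigma_a_raw.shape[0]
    # Assumption: count distinct values per line and keep the largest count.
    n_unique = max(np.unique(col[~np.isnan(col)]).size for col in sigma_a_raw.T)

    if n_clusters_a_priori is None:
        # case 1: fewer distinct values than pixels, one problem per value
        if n_unique < n_pixels:
            return n_unique
        # case 4: every pixel differs, one optimization per pixel
        return None

    if n_clusters_a_priori > n_pixels:
        raise ValueError("N_clusters_a_priori must not exceed N")
    # case 2: fewer distinct values than requested clusters, keep the minimum
    # case 3: otherwise use the user-provided number of clusters
    return min(n_unique, n_clusters_a_priori)
```

Under these assumptions, an (N, L) array with only two distinct noise levels per line yields two optimization problems per line, whatever larger number of clusters the user requested.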

cluster_sigma_a_raw(sigma_a_raw: ndarray) → ndarray[source]

runs self.L k-means clustering algorithms (one per line) on the sets of standard deviations of the additive noise sigma_a. The number of clusters is given by self.N_clusters. The resulting DataFrame is saved as a CSV file.

Parameters:: sigma_a_raw (np.ndarray of shape (N, L)) – array of all the sigma_a values associated with the observations.
Returns:: reduced sigma_a array, with N_clusters rows instead of N (with N_clusters potentially much smaller than N)
Return type:: np.ndarray of shape (N_clusters, L)
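
The per-line reduction described above can be sketched with a plain one-dimensional k-means. This is an illustrative sketch only: the function names are hypothetical, clustering in log10 space and dropping NaN entries are assumptions suggested by plot_clusters_sigma_a, and the quantile initialization is a choice made here, not necessarily the one used by the library.

```python
import numpy as np


def kmeans_1d(values: np.ndarray, n_clusters: int, n_iter: int = 50) -> np.ndarray:
    """Lloyd's algorithm on a 1-D array; returns sorted centroids."""
    # Initialize centroids on evenly spaced quantiles of the data.
    centroids = np.quantile(values, (np.arange(n_clusters) + 0.5) / n_clusters)
    for _ in range(n_iter):
        # Assign each value to its nearest centroid.
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            members = values[labels == k]
            if members.size > 0:  # keep the old centroid if a cluster empties
                centroids[k] = members.mean()
    return np.sort(centroids)


def cluster_sigma_a(sigma_a_raw: np.ndarray, n_clusters: int) -> np.ndarray:
    """Reduce an (N, L) array to (n_clusters, L) centroids, one k-means per line."""
    n_lines = sigma_a_raw.shape[1]
    reduced = np.empty((n_clusters, n_lines))
    for ell in range(n_lines):
        col = sigma_a_raw[:, ell]
        # Assumption: cluster in log10 space and ignore NaN entries.
        log10_col = np.log10(col[~np.isnan(col)])
        reduced[:, ell] = 10 ** kmeans_1d(log10_col, n_clusters)
    return reduced
```

Each column of the output holds the centroids for one line, so the number of optimization problems per line drops from N to n_clusters.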

create_empty_output_folders(simu_name: str, path_outputs: str) → None[source]

creates the directories that receive the results of the likelihood parameter optimization

Parameters:

simu_name (str) – name of the simulation to be run
path_outputs (str) – folder where to write outputs

extract_optimal_params() → DataFrame[source]

extracts the adjusted likelihood parameters from the log files and gathers them in a DataFrame

Returns:: DataFrame with the set of evaluated optimal parameters for each component n and line ell
Return type:: pd.DataFrame

list_lines

names of the observables for which the likelihood parameter needs to be adjusted

Type:: List[str]

log10_f_grid_size

number of points in the grid on \(\log_{10} f_\ell(\theta)\)

Type:: int

max_workers

maximum number of workers that can be used for optimization or results extraction

Type:: int

classmethod parse_args() → Tuple[str, str, str, str][source]

parses the command-line inputs, which should contain:

the name of the input YAML file
path to the data folder
path to the models folder
path to the outputs folder to be created (by default ‘.’)

Returns:

str – name of the input YAML file
str – path to the data folder
str – path to the models folder
str – path to the outputs folder to be created (by default ‘.’)
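
A command-line interface with this shape can be sketched with argparse. The argument names below (yaml_file, path_data, path_models, path_outputs) and the use of positional arguments are assumptions for illustration; only the four inputs and the ‘.’ default come from the description above.

```python
import argparse
from typing import List, Optional, Tuple


def parse_args(argv: Optional[List[str]] = None) -> Tuple[str, str, str, str]:
    """Hypothetical parser; argument names are illustrative assumptions."""
    parser = argparse.ArgumentParser(description="likelihood parameter optimization")
    parser.add_argument("yaml_file", help="name of the input YAML file")
    parser.add_argument("path_data", help="path to the data folder")
    parser.add_argument("path_models", help="path to the models folder")
    # Optional positional argument, falling back to the current directory.
    parser.add_argument("path_outputs", nargs="?", default=".",
                        help="path to the outputs folder to be created")
    args = parser.parse_args(argv)
    return args.yaml_file, args.path_data, args.path_models, args.path_outputs
```

Passing argv=None makes argparse read sys.argv, while passing an explicit list keeps the function easy to test.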

path_centroids

path to the csv file containing the sigma_a centroid values for each cluster (saved in output folder)

Type:: str

path_intermediate_result

path to the intermediate results (per cluster, saved in output folder)

Type:: str

plot_clusters_sigma_a(line: str, log10_sigma_a_ell_nonnan: ndarray, log10_centroids: ndarray) → None[source]

plots a one-dimensional histogram of the log10 of the standard deviation of the additive noise in the observations, along with the computed centroids.

Parameters:

line (str) – name of the line
log10_sigma_a_ell_nonnan (np.ndarray) – non-NaN values of the log10 of sigma_a for the considered line
log10_centroids (np.ndarray of shape (N_clusters,)) – log10 values of the sigma_a centroids for the considered line

plot_hist_log10_f_Theta(log10_f_Theta: ndarray, log10_f_Theta_low: float, log10_f_Theta_high: float, list_log10_f_grid: ndarray, pdf_kde_log10_f_Theta: ndarray, ell: int) → None[source]

plots the histogram of \(\log_{10} f_\ell (\theta)\)

Parameters:

log10_f_Theta (np.ndarray of shape (-1, 1)) – array of values of \(\log_{10} f_\ell (\theta)\) for considered line
log10_f_Theta_low (float) – lower bound for \(\log_{10} f_\ell (\theta)\) for the considered line
log10_f_Theta_high (float) – upper bound for \(\log_{10} f_\ell (\theta)\) for the considered line
list_log10_f_grid (np.ndarray) – grid values of \(\log_{10} f_\ell (\theta)\) for the considered line
pdf_kde_log10_f_Theta (np.ndarray) – pdf of \(\log_{10} f_\ell (\theta)\) evaluated with a kernel density estimator
ell (int) – index of the line
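
For reference, a density such as pdf_kde_log10_f_Theta can be obtained by evaluating a kernel density estimator on list_log10_f_grid. The sketch below assumes a Gaussian kernel with Silverman's rule-of-thumb bandwidth; the estimator actually used by the library is not specified here, and the function name is hypothetical.

```python
import numpy as np


def gaussian_kde_on_grid(samples: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Evaluate a Gaussian KDE of 1-D samples on a grid (Silverman bandwidth)."""
    samples = samples.ravel()  # accepts the (-1, 1) shape used in the docs
    n = samples.size
    # Silverman's rule of thumb for the bandwidth (an assumed choice).
    iqr = np.subtract(*np.percentile(samples, [75, 25]))
    spread = min(samples.std(ddof=1), iqr / 1.34)
    h = 0.9 * spread * n ** (-0.2)
    # Average of Gaussian kernels centered on the samples.
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))
```

The returned array has one density value per grid point and integrates to approximately one when the grid covers the support of the samples.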

plot_hist_log10_f_Theta_with_optim_results(log10_f_Theta: ndarray, log10_f_Theta_low: float, log10_f_Theta_high: float, list_log10_f_grid: ndarray, pdf_kde_log10_f_Theta: ndarray, n: int, ell: int, best_point: ndarray) → None[source]

plots the histogram of \(\log_{10} f_\ell (\theta)\) together with the best point found by the optimization

Parameters:

log10_f_Theta (np.ndarray of shape (-1, 1)) – array of values of \(\log_{10} f_\ell (\theta)\) for considered line
log10_f_Theta_low (float) – lower bound for \(\log_{10} f_\ell (\theta)\) for the considered line
log10_f_Theta_high (float) – upper bound for \(\log_{10} f_\ell (\theta)\) for the considered line
list_log10_f_grid (np.ndarray) – grid values of \(\log_{10} f_\ell (\theta)\) for the considered line
pdf_kde_log10_f_Theta (np.ndarray) – pdf of \(\log_{10} f_\ell (\theta)\) evaluated with a kernel density estimator
n (int) – pixel / component index
ell (int) – index of the line
best_point (np.ndarray) – position for the best point, to be displayed

plot_params_with_sigma_a(df_best: DataFrame) → None[source]

rewrite_logs_correct_json_format() → None[source]

rewrites the log files with correct json format
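
One common reason a log file is not valid JSON is that it stores one JSON object per line rather than a single array. Assuming that is the situation here (the actual log layout is not documented in this section, and the record fields in the usage note are made up), the conversion can be sketched as follows; json_lines_to_array is a hypothetical helper name.

```python
import json
from typing import List


def json_lines_to_array(text: str) -> str:
    """Convert one-JSON-object-per-line text into a single valid JSON array."""
    records: List[dict] = [
        json.loads(line) for line in text.splitlines() if line.strip()
    ]
    return json.dumps(records, indent=2)
```

For example, two lines each holding a record such as {"target": -1.5, "params": {"a0": 0.1, "a1": 2.0}} become a two-element JSON array that json.loads can read in one call.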

sample_theta(lower_bounds: ndarray, upper_bounds: ndarray) → ndarray[source]

samples \(\theta\) with stratified Monte Carlo in a hypercube

Parameters:

K (int) – total number of samples per axis
lower_bounds (np.ndarray of shape (D,)) – lower bounds of cube on \(\theta\)
upper_bounds (np.ndarray of shape (D,)) – upper bounds of cube on \(\theta\)

Returns:

\(\theta\) samples

Return type:

np.ndarray of shape (N_samples, D)
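
Stratified Monte Carlo in a cube can be sketched as follows: split each axis into K equal intervals, which gives \(K^D\) cells, and draw one uniform sample per cell. This is a minimal illustration under those assumptions; the function name and the seed argument are hypothetical, and the library may sample in a transformed (e.g. log) scale.

```python
import itertools

import numpy as np


def sample_theta_stratified(
    K: int, lower_bounds: np.ndarray, upper_bounds: np.ndarray, seed: int = 0
) -> np.ndarray:
    """Draw one uniform sample in each of the K**D cells of a regular grid."""
    rng = np.random.default_rng(seed)
    D = lower_bounds.size
    # Integer coordinates of every cell, shape (K**D, D).
    cells = np.array(list(itertools.product(range(K), repeat=D)), dtype=float)
    # Uniform jitter inside each cell, then rescale from [0, 1]^D to the cube.
    unit = (cells + rng.uniform(size=cells.shape)) / K
    return lower_bounds + unit * (upper_bounds - lower_bounds)
```

Compared with plain uniform sampling, this guarantees that every region of the cube is visited, at the cost of a sample count that grows as \(K^D\).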

save_results_in_data_folder(path_data: str, filename_err: str) → None[source]

Parameters:

path_data (str) – path to the data folder
filename_err (str) – name of the file containing the sigma_a values

save_setup_to_json(n: int, ell: int, pbounds: Dict[str, ndarray]) → None[source]

saves the optimization context and parameters

Parameters:

n (int) – pixel / component index
ell (int) – observable index
pbounds (dict[str, np.ndarray]) – contains the bounds on the parameters to be adjusted

setup_params_bounds() → Tuple[ndarray, ndarray, ndarray, float, float][source]

sets the bounds on the parameters to be adjusted, defined here as the center of the transition interval for a0 and the width of the transition interval for a1

Returns:

log10_f0 (np.ndarray of shape (N, L)) – values of \(\log_{10} f_\ell(\theta)\) at which additive and multiplicative noise variances are equal
bounds_a0_low (np.ndarray of shape (N, L)) – lower bounds on the center of the transition interval (defined as deltas around the log10_f0)
bounds_a0_high (np.ndarray of shape (N, L)) – upper bounds on the center of the transition interval (defined as deltas around the log10_f0)
bounds_a1_low (float) – lower value for the transition interval size
bounds_a1_high (float) – upper value for the transition interval size
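
The point where both noise variances are equal can be worked out under an explicit noise model. The sketch below assumes \(y = \varepsilon_m f + \varepsilon_a\) with \(\varepsilon_a \sim \mathcal{N}(0, \sigma_a^2)\) and \(\log \varepsilon_m \sim \mathcal{N}(0, \sigma_m^2)\); the library may use a different lognormal convention (for instance a unit-mean one), in which case the expression changes. The function names and the delta arguments are hypothetical.

```python
import numpy as np


def log10_f0_equal_variances(sigma_a: np.ndarray, sigma_m: float) -> np.ndarray:
    """log10 of the f value where both noise variances match (see assumptions)."""
    # Assumption: y = eps_m * f + eps_a, eps_a ~ N(0, sigma_a^2),
    # log(eps_m) ~ N(0, sigma_m^2), so Var[eps_m] = (e^{s^2} - 1) e^{s^2}.
    var_eps_m = (np.exp(sigma_m**2) - 1.0) * np.exp(sigma_m**2)
    # Solve f0^2 * Var[eps_m] = sigma_a^2 for f0, in log10 scale.
    return np.log10(sigma_a) - 0.5 * np.log10(var_eps_m)


def bounds_around(log10_f0: np.ndarray, delta_low: float, delta_high: float):
    """Hypothetical a0 bounds, expressed as offsets around log10_f0."""
    return log10_f0 + delta_low, log10_f0 + delta_high
```

Because sigma_a has shape (N, L), log10_f0 and the a0 bounds inherit that shape, while the a1 bounds remain plain floats.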

setup_plot_text_sizes(small_size: int = 16, medium_size: int = 20, bigger_size: int = 24) → None[source]

defines text sizes on matplotlib plots

Parameters:

small_size (int, optional) – size for basic text, axes titles, xticks and yticks, by default 16
medium_size (int, optional) – size of the axis labels, by default 20
bigger_size (int, optional) – size of the figure title, by default 24
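
The three sizes map naturally onto matplotlib's rcParams. The sketch below only builds the settings dictionary so that it runs without matplotlib installed; the helper name text_size_rc_params is hypothetical, and the exact set of keys touched by the library is an assumption based on the parameter descriptions.

```python
from typing import Dict


def text_size_rc_params(
    small_size: int = 16, medium_size: int = 20, bigger_size: int = 24
) -> Dict[str, int]:
    """Map the three sizes onto standard matplotlib rcParams keys."""
    return {
        "font.size": small_size,          # basic text
        "axes.titlesize": small_size,     # axes titles
        "xtick.labelsize": small_size,    # x tick labels
        "ytick.labelsize": small_size,    # y tick labels
        "axes.labelsize": medium_size,    # axis labels
        "figure.titlesize": bigger_size,  # figure title
    }


# Typical use (requires matplotlib):
#     import matplotlib
#     matplotlib.rcParams.update(text_size_rc_params())
```

Applying the dictionary once, before any figure is created, keeps all plots produced during a run visually consistent.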

sigma_a

standard deviations of the additive Gaussian noise

Type:: np.ndarray

sigma_m

standard deviation parameter of the multiplicative lognormal noise

Type:: np.ndarray

beetroots.approx_optim.nn_bo module

class beetroots.approx_optim.nn_bo.ApproxParamsOptimNNBO(list_lines: List[str], simu_name: str, D: int, D_no_kappa: int, K: int, log10_f_grid_size: int, N_samples_y: int, max_workers: int, sigma_a_raw: ndarray | float, sigma_m: ndarray | float, path_outputs: str, path_models: str, N_clusters_a_priori: int | None, small_size: int = 16, medium_size: int = 20, bigger_size: int = 24)[source]

Bases: ApproxParamsOptim, ApproxOptimNN, BayesianOptimizationApproach

class that performs likelihood parameter optimization using Bayesian optimization for a neural network forward map

Parameters:

list_lines (List[str]) – names of the observables for which the likelihood parameter needs to be adjusted
simu_name (str) – name of the simulation, used to name the output folder
D (int) – total number of physical parameters involved in the forward map
D_no_kappa (int) – total number of physical parameters involved in the forward map except for the scaling parameter \(\kappa\) (if it is part of the considered physical parameters)
K (int) – the number of sampled theta values is \(K^D\)
log10_f_grid_size (int) – number of points in the grid on \(\log_{10} f_\ell(\theta)\)
N_samples_y (int) – number of samples for \(y_\ell\)
max_workers (int) – maximum number of workers that can be used for optimization or results extraction
sigma_a_raw (Union[np.ndarray, float]) – standard deviation of the additive Gaussian noise
sigma_m (Union[np.ndarray, float]) – standard deviation parameter of the multiplicative lognormal noise
path_outputs (str) – path to the output folder (to be created), where the run results are to be saved
path_models (str) – path to the folder containing the forward models
N_clusters_a_priori (Optional[int]) – The number of different values of sigma_a_raw to consider for each line (and thus the number of optimization problems to solve per line), computed with a clustering algorithm on the N values of sigma_a_raw. Raises an error if N_clusters_a_priori > N.
small_size (int, optional) – size for basic text, axes titles, xticks and yticks, by default 16
medium_size (int, optional) – size of the axis labels, by default 20
bigger_size (int, optional) – size of the figure title, by default 24

main method of the class, sets up the optimization problems and solves them

Parameters:

dict_forward_model (Dict[str, Union[str, bool, List[bool], List[float], Dict[str, float]]]) – contains the necessary information to load the forward model with the NeuralNetworkApprox
lower_bounds_lin (Union[np.ndarray, List]) – lower bounds on the physical parameters (in linear scale)
upper_bounds_lin (Union[np.ndarray, List]) – upper bounds on the physical parameters (in linear scale)
n_iter (int) – number of iterations for the Bayesian optimization

beetroots.approx_optim.nn_bo_real_data module

class beetroots.approx_optim.nn_bo_real_data.ReadDataRealData(list_lines: List[str])[source]

Bases: SimulationRealData

implements an __init__ method for SimulationRealData

Parameters:: list_lines (List[str]) – observables for which the likelihood approximation parameters are to be adjusted

beetroots.approx_optim.nn_bo_toycase module

Module contents