beetroots.approx_optim package

Subpackages

Submodules

beetroots.approx_optim.abstract_approx_optim module

class beetroots.approx_optim.abstract_approx_optim.ApproxParamsOptim(list_lines: List[str], simu_name: str, D: int, D_no_kappa: int, K: int, log10_f_grid_size: int, N_samples_y: int, max_workers: int, sigma_a_raw: ndarray | float, sigma_m: ndarray | float, path_outputs: str, path_models: str, N_clusters_a_priori: int | None, small_size: int = 16, medium_size: int = 20, bigger_size: int = 24)[source]

Bases: ABC

Optimization of an approximation parameter for the likelihood function beetroots.modelling.likelihoods.approx_censored_add_mult.MixingModelsLikelihood. The optimization framework used to adjust this parameter is introduced in Appendix A of Palud et al. [2023].

Parameters:
  • list_lines (List[str]) – names of the observables for which the likelihood parameter needs to be adjusted

  • simu_name (str) – name of the simulation, used to name the output folder

  • D (int) – total number of physical parameters involved in the forward map

  • D_no_kappa (int) – total number of physical parameters involved in the forward map except for the scaling parameter \(\kappa\) (if it is part of the considered physical parameters)

  • K (int) – the number of sampled theta values is \(K^D\)

  • log10_f_grid_size (int) – number of points in the grid on \(\log_{10} f_\ell(\theta)\)

  • N_samples_y (int) – number of samples for \(y_\ell\)

  • max_workers (int) – maximum number of workers that can be used for optimization or results extraction

  • sigma_a_raw (Union[np.ndarray, float]) – standard deviation of the additive Gaussian noise

  • sigma_m (Union[np.ndarray, float]) – standard deviation parameter of the multiplicative lognormal noise

  • path_outputs (str) – path to the output folder (to be created), where the run results are to be saved

  • path_models (str) – path to the folder containing the forward models

  • N_clusters_a_priori (Optional[int]) – number of different values of sigma_a_raw to consider for each line (and thus the number of optimization problems to solve per line), computed with a clustering algorithm on the N values of sigma_a_raw. Raises an error if N_clusters_a_priori > N.

  • small_size (int, optional) – size for basic text, axes titles, xticks and yticks, by default 16

  • medium_size (int, optional) – size of the axis labels, by default 20

  • bigger_size (int, optional) – size of the figure title, by default 24

D

total number of physical parameters involved in the forward map

Type:

int

D_no_kappa

total number of physical parameters involved in the forward map except for the scaling parameter \(\kappa\) (if it is part of the considered physical parameters)

Type:

int

K

the number of sampled theta values is K^D_sampling

Type:

int

L

number of observables for which the likelihood parameter needs to be adjusted

Type:

int

MODELS_PATH

path to the folder containing all the already defined and saved models (i.e., polynomials or neural networks)

Type:

str

N

number of pixels / components for which the optimization needs to be performed

Type:

int

N_clusters

The number of different values of sigma_a to consider for each line (and thus the number of optimization problems to solve per line), computed with a clustering algorithm on the N values of sigma_a.

Type:

Optional[int]

N_optim_per_line

number of optimization procedures to run per line

Type:

int

N_samples_theta

number of samples for \(\theta\) used to build the histogram of \(\log_{10} f_\ell(\theta)\). To be defined in the daughter classes.

Type:

int

N_samples_y

number of samples for \(y_\ell\)

Type:

int

check_num_uniques(sigma_a_raw: ndarray, N_clusters_a_priori: int | None) int | None[source]

Sets the number of clusters to consider, N_clusters, that is, the number of optimization procedures to run per line. There are 4 possible cases:

  • case 1: set N_clusters to the number of distinct values of sigma_a.

  • case 2: the number of distinct values of sigma_a is lower than the number of clusters provided by the user. In this case, use the value that minimizes the number of optimization procedures to run.

  • case 3: use the number of clusters indicated by the user.

  • case 4: run one optimization per pixel, i.e., run self.N optimizations.

Parameters:
  • sigma_a_raw (np.ndarray of shape (N, L)) – set of standard deviations of the additive noise associated with the observations

  • N_clusters_a_priori (Optional[int]) – number of clusters to consider indicated by the user

Returns:

definitive number of clusters to consider

Return type:

Optional[int]
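The four cases can be read as a small decision function. The sketch below is one plausible reading, not the package's implementation: the conditions that select each case, the helper name choose_n_clusters, and the convention that None means "one optimization per pixel" are all assumptions made for illustration.

```python
from typing import Optional

import numpy as np


def choose_n_clusters(
    sigma_a_raw: np.ndarray, n_clusters_a_priori: Optional[int]
) -> Optional[int]:
    """Hypothetical helper: pick the number of optimizations to run per line."""
    n_pixels, n_lines = sigma_a_raw.shape
    # Largest number of distinct sigma_a values found in any line (column).
    n_unique = max(np.unique(sigma_a_raw[:, ell]).size for ell in range(n_lines))

    if n_clusters_a_priori is None:
        if n_unique < n_pixels:
            return n_unique  # case 1: one optimization per distinct value
        return None  # case 4 (assumed convention): one optimization per pixel
    if n_unique < n_clusters_a_priori:
        return n_unique  # case 2: fewer distinct values than requested clusters
    return n_clusters_a_priori  # case 3: follow the user's choice
```

Under this reading, a map where every pixel shares the same noise level needs a single optimization per line, whatever value the user requested.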

cluster_sigma_a_raw(sigma_a_raw: ndarray) ndarray[source]

runs self.L k-means clustering algorithms (one per line) on the sets of standard deviations of the additive noise sigma_a. The number of clusters is set by self.N_clusters. The resulting dataframe is saved as a csv file.

Parameters:

sigma_a_raw (np.ndarray of shape (N, L)) – array of all the sigma_a values associated with the observations.

Returns:

reduced sigma_a array, with N_clusters lines instead of N (with N_clusters potentially much smaller than N)

Return type:

np.ndarray of shape (N_clusters, L)
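A minimal sketch of the per-line clustering, assuming it is carried out on log10(sigma_a) (suggested by the log10 arguments of plot_clusters_sigma_a, but not confirmed on this page). It uses a small NumPy implementation of Lloyd's k-means in one dimension rather than whichever library the package relies on. The function names, the quantile initialization and the omission of the csv export are choices made for this illustration.

```python
import numpy as np


def kmeans_1d(values: np.ndarray, n_clusters: int, n_iter: int = 100) -> np.ndarray:
    """Lloyd's k-means on 1D data, returning the sorted centroids."""
    # Initialize the centroids at evenly spaced quantiles of the data.
    centroids = np.quantile(values, (np.arange(n_clusters) + 0.5) / n_clusters)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each value.
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        # Update step: mean of each non-empty cluster.
        new_centroids = centroids.copy()
        for k in range(n_clusters):
            if np.any(labels == k):
                new_centroids[k] = values[labels == k].mean()
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return np.sort(centroids)


def cluster_sigma_a(sigma_a_raw: np.ndarray, n_clusters: int) -> np.ndarray:
    """Reduce an (N, L) array of sigma_a to an (n_clusters, L) array of centroids."""
    n_lines = sigma_a_raw.shape[1]
    reduced = np.empty((n_clusters, n_lines))
    for ell in range(n_lines):
        column = sigma_a_raw[:, ell]
        log10_values = np.log10(column[~np.isnan(column)])  # drop NaN entries
        reduced[:, ell] = 10.0 ** kmeans_1d(log10_values, n_clusters)
    return reduced
```

Each line is clustered independently, so the k-th row of the output gathers the k-th smallest centroid of every line rather than a joint cluster across lines.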

create_empty_output_folders(simu_name: str, path_outputs: str) None[source]

creates the directories that receive the results of the likelihood parameter optimization

Parameters:
  • simu_name (str) – name of the simulation to be run

  • path_yaml_file (str) – path of the folder containing the data and yaml files

  • path_outputs (str) – folder where to write outputs

extract_optimal_params() DataFrame[source]

extracts the adjusted likelihood parameters from the log files and gathers them in a DataFrame

Returns:

DataFrame with the set of evaluated optimal parameters for each component n and line ell

Return type:

pd.DataFrame

list_lines

names of the observables for which the likelihood parameter needs to be adjusted

Type:

List[str]

log10_f_grid_size

number of points in the grid on \(\log_{10} f_\ell(\theta)\)

Type:

int

max_workers

maximum number of workers that can be used for optimization or results extraction

Type:

int

classmethod parse_args() Tuple[str, str, str, str][source]

parses the inputs of the command-line, that should contain

  • the name of the input YAML file

  • path to the data folder

  • path to the models folder

  • path to the outputs folder to be created (by default ‘.’)

Returns:

  • str – name of the input YAML file

  • str – path to the data folder

  • str – path to the models folder

  • str – path to the outputs folder to be created (by default ‘.’)
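A command-line interface with these four inputs could be sketched with argparse as follows. The argument names, their positional order and the standalone function form are assumptions for illustration; only the four inputs and the '.' default come from the description above.

```python
import argparse
from typing import List, Optional, Tuple


def parse_args(argv: Optional[List[str]] = None) -> Tuple[str, str, str, str]:
    """Hypothetical parser: argument names are illustrative only."""
    parser = argparse.ArgumentParser(description="likelihood parameter optimization")
    parser.add_argument("yaml_file", help="name of the input YAML file")
    parser.add_argument("path_data", help="path to the data folder")
    parser.add_argument("path_models", help="path to the models folder")
    # Optional positional argument: defaults to the current directory.
    parser.add_argument(
        "path_outputs", nargs="?", default=".", help="path to the outputs folder"
    )
    args = parser.parse_args(argv)
    return args.yaml_file, args.path_data, args.path_models, args.path_outputs
```

Calling parse_args() with no argument reads sys.argv, while passing an explicit list makes the parser easy to exercise in tests.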

path_centroids

path to the csv file containing the sigma_a centroid values for each cluster (saved in output folder)

Type:

str

path_intermediate_result

path to the intermediate results (per cluster, saved in output folder)

Type:

str

plot_clusters_sigma_a(line: str, log10_sigma_a_ell_nonnan: ndarray, log10_centroids: ndarray) None[source]

plots a one-dimensional histogram of the log10 of the standard deviation of the additive noise in the observations, together with the computed centroids.

Parameters:
  • line (str) – name of the line

  • log10_sigma_a_ell_nonnan (np.ndarray) – non-NaN values of the log10 of sigma_a for the considered line

  • log10_centroids (np.ndarray of shape (N_clusters,)) – log10 of the sigma_a centroids for the considered line

plot_hist_log10_f_Theta(log10_f_Theta: ndarray, log10_f_Theta_low: float, log10_f_Theta_high: float, list_log10_f_grid: ndarray, pdf_kde_log10_f_Theta: ndarray, ell: int) None[source]

plots histogram of \(\log_{10} f_\ell(\theta)\)

Parameters:
  • log10_f_Theta (np.ndarray of shape (-1, 1)) – array of values of \(\log_{10} f_\ell (\theta)\) for the considered line

  • log10_f_Theta_low (float) – lower bound for \(\log_{10} f_\ell (\theta)\) for the considered line

  • log10_f_Theta_high (float) – upper bound for \(\log_{10} f_\ell (\theta)\) for the considered line

  • list_log10_f_grid (np.ndarray) – grid values of \(\log_{10} f_\ell (\theta)\) for the considered line

  • pdf_kde_log10_f_Theta (np.ndarray) – pdf of \(\log_{10} f_\ell (\theta)\) evaluated with a kernel density estimator

  • ell (int) – index of the line

plot_hist_log10_f_Theta_with_optim_results(log10_f_Theta: ndarray, log10_f_Theta_low: float, log10_f_Theta_high: float, list_log10_f_grid: ndarray, pdf_kde_log10_f_Theta: ndarray, n: int, ell: int, best_point: ndarray) None[source]

plots histogram of \(\log_{10} f_\ell (\theta)\), together with the best point found by the optimization

Parameters:
  • log10_f_Theta (np.ndarray of shape (-1, 1)) – array of values of \(\log_{10} f_\ell (\theta)\) for the considered line

  • log10_f_Theta_low (float) – lower bound for \(\log_{10} f_\ell (\theta)\) for the considered line

  • log10_f_Theta_high (float) – upper bound for \(\log_{10} f_\ell (\theta)\) for the considered line

  • list_log10_f_grid (np.ndarray) – grid values of \(\log_{10} f_\ell (\theta)\) for the considered line

  • pdf_kde_log10_f_Theta (np.ndarray) – pdf of \(\log_{10} f_\ell (\theta)\) evaluated with a kernel density estimator

  • n (int) – pixel / component index

  • ell (int) – index of the line

  • best_point (np.ndarray) – position for the best point, to be displayed

plot_params_with_sigma_a(df_best: DataFrame) None[source]

rewrite_logs_correct_json_format() None[source]

rewrites the log files with correct json format
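As an illustration of what such a rewrite can look like, the sketch below assumes that each log file holds one JSON object per line (the JSON Lines layout), which is not valid JSON as a whole, and wraps the records into a single JSON array. The log layout, the helper name and the in-place overwrite are assumptions; the actual log format is not described on this page.

```python
import json
from pathlib import Path


def rewrite_log_as_json_array(path_log: str) -> None:
    """Hypothetical helper: turn a JSON Lines log into one valid JSON array."""
    path = Path(path_log)
    # Parse each non-empty line as an independent JSON object.
    records = [
        json.loads(line) for line in path.read_text().splitlines() if line.strip()
    ]
    # Overwrite the file with a single JSON array holding all the records.
    path.write_text(json.dumps(records, indent=2))
```

After the rewrite, the whole file can be loaded with a single json.load call.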

sample_theta(lower_bounds: ndarray, upper_bounds: ndarray) ndarray[source]

samples \(\theta\) in a hypercube with stratified Monte Carlo

Parameters:
  • K (int) – total number of samples per axis

  • lower_bounds (np.ndarray of shape (D,)) – lower bounds of cube on \(\theta\)

  • upper_bounds (np.ndarray of shape (D,)) – upper bounds of cube on \(\theta\)

Returns:

\(\theta\) samples

Return type:

np.ndarray of shape (N_samples, D)
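Stratified Monte Carlo in a hypercube can be sketched as follows: each axis is split into K equal strata, which gives \(K^D\) cells, and one point is drawn uniformly in each cell. This is a generic illustration rather than the package's code. The function name, the seed argument and passing K explicitly are assumptions, and the scale (linear or logarithmic) in which the package samples is not specified here.

```python
import numpy as np


def sample_theta_stratified(
    K: int, lower_bounds: np.ndarray, upper_bounds: np.ndarray, seed: int = 0
) -> np.ndarray:
    """Draw one uniform sample in each of the K**D cells of a regular grid."""
    rng = np.random.default_rng(seed)
    D = lower_bounds.size
    # Integer coordinates of every cell, as an array of shape (K**D, D).
    grid = np.meshgrid(*[np.arange(K)] * D, indexing="ij")
    cells = np.stack(grid, axis=-1).reshape(-1, D)
    # Uniform jitter inside each cell, expressed in the unit cube [0, 1]^D.
    unit_samples = (cells + rng.uniform(size=cells.shape)) / K
    # Rescale from the unit cube to the requested bounds.
    return lower_bounds + unit_samples * (upper_bounds - lower_bounds)
```

Compared with plain uniform sampling, every cell of the grid is guaranteed to contain exactly one sample, which spreads the \(K^D\) points evenly over the cube.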

save_results_in_data_folder(path_data: str, filename_err: str) None[source]

Parameters:
  • path_data (str) – path to the data folder

  • filename_err (str) – name of the file containing the sigma_a values

save_setup_to_json(n: int, ell: int, pbounds: Dict[str, ndarray]) None[source]

save optimization context and parameters

Parameters:
  • n (int) – pixel / component index

  • ell (int) – observable index

  • pbounds (dict[str, np.ndarray]) – contains the bounds on the parameters to be adjusted

setup_params_bounds() Tuple[ndarray, ndarray, ndarray, float, float][source]

sets the bounds on the parameters to be adjusted, defined here as the center of the transition interval for a0 and its width for a1

Returns:

  • log10_f0 (np.ndarray of shape (N, L)) – values of \(\log_{10} f_\ell(\theta)\) at which the additive and multiplicative noise variances are equal

  • bounds_a0_low (np.ndarray of shape (N, L)) – lower bounds on the center of the transition interval (defined as deltas around the log10_f0)

  • bounds_a0_high (np.ndarray of shape (N, L)) – upper bounds on the center of the transition interval (defined as deltas around the log10_f0)

  • bounds_a1_low (float) – lower value for the transition interval size

  • bounds_a1_high (float) – upper value for the transition interval size
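The value log10_f0 has a closed form under a specific noise model. The sketch below assumes \(y = \varepsilon_m f + \varepsilon_a\), with \(\varepsilon_a \sim \mathcal{N}(0, \sigma_a^2)\) and a unit-mean lognormal \(\varepsilon_m\) such that \(\log \varepsilon_m \sim \mathcal{N}(-\sigma_m^2/2, \sigma_m^2)\). This parameterization is an assumption, not taken from this page. The multiplicative term then has variance \(f^2 (e^{\sigma_m^2} - 1)\), which equals \(\sigma_a^2\) at \(f_0 = \sigma_a / \sqrt{e^{\sigma_m^2} - 1}\). The function name is hypothetical.

```python
import numpy as np


def log10_f0(sigma_a: np.ndarray, sigma_m: float) -> np.ndarray:
    """log10 of the model value at which both noise variances coincide.

    Assumes a unit-mean lognormal multiplicative noise (see the text above).
    """
    # Solve f**2 * (exp(sigma_m**2) - 1) == sigma_a**2 for f, in log10 scale.
    return np.log10(sigma_a) - 0.5 * np.log10(np.expm1(sigma_m**2))
```

Below \(f_0\) the additive noise dominates and above it the multiplicative noise does, which is why bounds on the center of the transition interval are naturally expressed as offsets around log10_f0.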

setup_plot_text_sizes(small_size: int = 16, medium_size: int = 20, bigger_size: int = 24) None[source]

defines text sizes on matplotlib plots

Parameters:
  • small_size (int, optional) – size for basic text, axes titles, xticks and yticks, by default 16

  • medium_size (int, optional) – size of the axis labels, by default 20

  • bigger_size (int, optional) – size of the figure title, by default 24
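One way to map the three sizes onto matplotlib settings is through standard rcParams keys. The sketch below only builds the mapping, so that it runs without matplotlib installed. The function name and the exact choice of keys are assumptions based on the parameter descriptions above.

```python
from typing import Dict


def plot_text_sizes(
    small_size: int = 16, medium_size: int = 20, bigger_size: int = 24
) -> Dict[str, int]:
    """Map the three sizes onto standard matplotlib rcParams keys."""
    return {
        "font.size": small_size,  # basic text
        "axes.titlesize": small_size,  # axes titles
        "axes.labelsize": medium_size,  # x and y axis labels
        "xtick.labelsize": small_size,  # x tick labels
        "ytick.labelsize": small_size,  # y tick labels
        "figure.titlesize": bigger_size,  # figure title
    }
```

Passing the returned dictionary to matplotlib.rcParams.update applies the sizes to all subsequent figures.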

sigma_a

standard deviations of the additive Gaussian noise

Type:

np.ndarray

sigma_m

standard deviation parameter of the multiplicative lognormal noise

Type:

np.ndarray

beetroots.approx_optim.nn_bo module

class beetroots.approx_optim.nn_bo.ApproxParamsOptimNNBO(list_lines: List[str], simu_name: str, D: int, D_no_kappa: int, K: int, log10_f_grid_size: int, N_samples_y: int, max_workers: int, sigma_a_raw: ndarray | float, sigma_m: ndarray | float, path_outputs: str, path_models: str, N_clusters_a_priori: int | None, small_size: int = 16, medium_size: int = 20, bigger_size: int = 24)[source]

Bases: ApproxParamsOptim, ApproxOptimNN, BayesianOptimizationApproach

class that performs likelihood parameter optimization using Bayesian optimization for a neural network forward map

Parameters:
  • list_lines (List[str]) – names of the observables for which the likelihood parameter needs to be adjusted

  • simu_name (str) – name of the simulation, used to name the output folder

  • D (int) – total number of physical parameters involved in the forward map

  • D_no_kappa (int) – total number of physical parameters involved in the forward map except for the scaling parameter \(\kappa\) (if it is part of the considered physical parameters)

  • K (int) – the number of sampled theta values is \(K^D\)

  • log10_f_grid_size (int) – number of points in the grid on \(\log_{10} f_\ell(\theta)\)

  • N_samples_y (int) – number of samples for \(y_\ell\)

  • max_workers (int) – maximum number of workers that can be used for optimization or results extraction

  • sigma_a_raw (Union[np.ndarray, float]) – standard deviation of the additive Gaussian noise

  • sigma_m (Union[np.ndarray, float]) – standard deviation parameter of the multiplicative lognormal noise

  • path_outputs (str) – path to the output folder (to be created), where the run results are to be saved

  • path_models (str) – path to the folder containing the forward models

  • N_clusters_a_priori (Optional[int]) – number of different values of sigma_a_raw to consider for each line (and thus the number of optimization problems to solve per line), computed with a clustering algorithm on the N values of sigma_a_raw. Raises an error if N_clusters_a_priori > N.

  • small_size (int, optional) – size for basic text, axes titles, xticks and yticks, by default 16

  • medium_size (int, optional) – size of the axis labels, by default 20

  • bigger_size (int, optional) – size of the figure title, by default 24

main(dict_forward_model: Dict[str, str | bool | List[bool] | List[float]], lower_bounds_lin: ndarray | List, upper_bounds_lin: ndarray | List, n_iter: int)[source]

main method of the class, sets up the optimization problems and solves them

Parameters:
  • dict_forward_model (Dict[str, Union[str, bool, List[bool], List[float], Dict[str, float]]]) – contains the necessary information to load the forward model with the NeuralNetworkApprox

  • lower_bounds_lin (Union[np.ndarray, List]) – lower bounds on the physical parameters (in linear scale)

  • upper_bounds_lin (Union[np.ndarray, List]) – upper bounds on the physical parameters (in linear scale)

  • n_iter (int) – number of iterations for the Bayesian optimization

beetroots.approx_optim.nn_bo_real_data module

class beetroots.approx_optim.nn_bo_real_data.ReadDataRealData(list_lines: List[str])[source]

Bases: SimulationRealData

implements an __init__ method for SimulationRealData

Parameters:

list_lines (List[str]) – observables for which the likelihood approximation parameters are to be adjusted

beetroots.approx_optim.nn_bo_toycase module

Module contents