beetroots.approx_optim package
Subpackages
- beetroots.approx_optim.approach_type package
- beetroots.approx_optim.forward_map package
Submodules
beetroots.approx_optim.abstract_approx_optim module
- class beetroots.approx_optim.abstract_approx_optim.ApproxParamsOptim(list_lines: List[str], simu_name: str, D: int, D_no_kappa: int, K: int, log10_f_grid_size: int, N_samples_y: int, max_workers: int, sigma_a_raw: ndarray | float, sigma_m: ndarray | float, path_outputs: str, path_models: str, N_clusters_a_priori: int | None, small_size: int = 16, medium_size: int = 20, bigger_size: int = 24)[source]
Bases:
ABC
Optimization of an approximation parameter for the likelihood function
beetroots.modelling.likelihoods.approx_censored_add_mult.MixingModelsLikelihood. The optimization framework used to adjust this parameter is introduced in Appendix A of Palud et al. [2023].
- Parameters:
list_lines (List[str]) – names of the observables for which the likelihood parameter needs to be adjusted
simu_name (str) – name of the simulation, used to name the output folder
D (int) – total number of physical parameters involved in the forward map
D_no_kappa (int) – total number of physical parameters involved in the forward map except for the scaling parameter \(\kappa\) (if it is part of the considered physical parameters)
K (int) – the number of sampled theta values is \(K^D\)
log10_f_grid_size (int) – number of points in the grid on \(\log_{10} f_\ell(\theta)\)
N_samples_y (int) – number of samples for \(y_\ell\)
max_workers (int) – maximum number of workers that can be used for optimization or results extraction
sigma_a_raw (Union[np.ndarray, float]) – standard deviation of the additive Gaussian noise
sigma_m (Union[np.ndarray, float]) – standard deviation parameter of the multiplicative lognormal noise
path_outputs (str) – path to the output folder (to be created), where the run results are to be saved
path_models (str) – path to the folder containing the forward models
N_clusters_a_priori (Optional[int]) – The number of different values of sigma_a_raw to consider for each line (and thus the number of optimization problems to solve per line), computed with a clustering algorithm on the N values of sigma_a_raw. Raises an error if N_clusters_a_priori > N.
small_size (int, optional) – size for basic text, axes titles, xticks and yticks, by default 16
medium_size (int, optional) – size of the axis labels, by default 20
bigger_size (int, optional) – size of the figure title, by default 24
- D
total number of physical parameters involved in the forward map
- Type:
int
- D_no_kappa
total number of physical parameters involved in the forward map except for the scaling parameter \(\kappa\) (if it is part of the considered physical parameters)
- Type:
int
- K
the number of sampled theta values is K^D_sampling
- Type:
int
- L
number of observables for which the likelihood parameter needs to be adjusted
- Type:
int
- MODELS_PATH
path to the folder containing all the already defined and saved models (i.e., polynomials or neural networks)
- Type:
str
- N
number of pixels / components for which the optimization needs to be performed
- Type:
int
- N_clusters
The number of different values of sigma_a to consider for each line (and thus the number of optimization problems to solve per line), computed with a clustering algorithm on the N values of sigma_a.
- Type:
Optional[int]
- N_optim_per_line
number of optimization procedures to run per line
- Type:
int
- N_samples_theta
number of samples for \(\theta\) used to build the histogram of \(\log_{10} f_\ell(\theta)\). To be defined in the daughter classes.
- Type:
int
- N_samples_y
number of samples for \(y_\ell\)
- Type:
int
- check_num_uniques(sigma_a_raw: ndarray, N_clusters_a_priori: int | None) int | None[source]
Sets the number of clusters to consider, N_clusters, that is, the number of optimization procedures to run per line. There are 4 possible cases:
case 1: set N_clusters to the number of distinct values of sigma_a.
case 2: the number of distinct values of sigma_a is lower than the number of clusters provided by the user. In this case, use the value that minimizes the number of optimization procedures to run.
case 3: use the number of clusters indicated by the user.
case 4: run one optimization per pixel, i.e., run self.N optimizations.
- Parameters:
sigma_a_raw (np.ndarray of shape (N, L)) – set of standard deviations of the additive noise
N_clusters_a_priori (Optional[int]) – number of clusters to consider indicated by the user
- Returns:
definitive number of clusters to consider
- Return type:
Optional[int]
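Cases 2 and 3 above reduce to a small rule, which can be sketched as follows. This is a minimal illustration and not the package's implementation: the function name `resolve_n_clusters` and its arguments are hypothetical, it handles a single line, and it leaves out cases 1 and 4 because the conditions that trigger them are not spelled out here. Plain Python lists stand in for the NumPy arrays used by the package, so that the sketch has no dependencies.

```python
from typing import Sequence


def resolve_n_clusters(sigma_a_line: Sequence[float], n_clusters_a_priori: int) -> int:
    """Hypothetical helper: pick the number of clusters for one line."""
    n_pixels = len(sigma_a_line)
    if n_clusters_a_priori > n_pixels:
        # The class documentation states that N_clusters_a_priori > N is an error.
        raise ValueError("N_clusters_a_priori must not exceed N")

    n_distinct = len(set(sigma_a_line))
    # Case 2: fewer distinct values than requested clusters, keep the smaller count.
    # Case 3: otherwise, keep the number requested by the user.
    return min(n_distinct, n_clusters_a_priori)


print(resolve_n_clusters([0.1, 0.1, 0.2, 0.2, 0.2], 3))  # case 2
print(resolve_n_clusters([0.1, 0.2, 0.3, 0.4, 0.5], 3))  # case 3
```

Taking the minimum covers both cases at once: clustering two distinct noise levels into three groups would only add redundant optimization runs.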
- cluster_sigma_a_raw(sigma_a_raw: ndarray) ndarray[source]
runs self.L k-means clustering algorithms (one per line) on the sets of standard deviations of the additive noise sigma_a. The number of clusters is given by self.N_clusters. The resulting dataframe is saved as a csv file.
- Parameters:
sigma_a_raw (np.ndarray of shape (N, L)) – array of all the sigma_a values associated with the observations.
- Returns:
reduced sigma_a array, with N_clusters rows instead of N (with N_clusters potentially much smaller than N)
- Return type:
np.ndarray of shape (N_clusters, L)
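To illustrate the reduction for a single line, here is a dependency-free sketch of one-dimensional k-means (Lloyd's algorithm). It is not the package's code: the source does not say which k-means implementation is used, the helper `kmeans_1d` is hypothetical, and clustering in log10 space is an assumption suggested by the log10 centroids shown in `plot_clusters_sigma_a`. Removal of NaN values and the csv export are left out.

```python
import math
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], n_clusters: int, n_iter: int = 50) -> List[float]:
    """Lloyd's algorithm in one dimension; returns sorted centroids."""
    data = sorted(values)
    # Deterministic initialisation: spread the centroids evenly over the sorted data.
    centroids = [
        data[(2 * k + 1) * len(data) // (2 * n_clusters)] for k in range(n_clusters)
    ]
    for _ in range(n_iter):
        groups: List[List[float]] = [[] for _ in range(n_clusters)]
        for x in data:
            nearest = min(range(n_clusters), key=lambda k: abs(x - centroids[k]))
            groups[nearest].append(x)
        # Keep the previous centroid if a cluster ends up empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


# Toy sigma_a values for one line, forming two well-separated noise levels.
sigma_a_line = [1.0e-3, 1.1e-3, 0.9e-3, 1.0e-1, 1.2e-1, 0.8e-1]
log10_sigma_a = [math.log10(s) for s in sigma_a_line]
log10_centroids = kmeans_1d(log10_sigma_a, n_clusters=2)
centroids = [10.0 ** c for c in log10_centroids]
print(centroids)
```

With these toy values, the six pixels collapse to two representative noise levels, so two optimization problems are solved for this line instead of six.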
- create_empty_output_folders(simu_name: str, path_outputs: str) None[source]
creates the directories that receive the results of the likelihood parameter optimization
- Parameters:
simu_name (str) – name of the simulation to be run
path_yaml_file (str) – path of the folder containing the data and yaml files
path_outputs (str) – folder where to write outputs
- extract_optimal_params() DataFrame[source]
extracts the adjusted likelihood parameters from the log files and gathers them in a DataFrame
- Returns:
DataFrame with the set of evaluated optimal parameters for each component n and line ell
- Return type:
pd.DataFrame
- list_lines
names of the observables for which the likelihood parameter needs to be adjusted
- Type:
List[str]
- log10_f_grid_size
number of points in the grid on \(\log_{10} f_\ell(\theta)\)
- Type:
int
- max_workers
maximum number of workers that can be used for optimization or results extraction
- Type:
int
- classmethod parse_args() Tuple[str, str, str, str][source]
parses the inputs of the command-line, that should contain
the name of the input YAML file
path to the data folder
path to the models folder
path to the outputs folder to be created (by default ‘.’)
- Returns:
str – name of the input YAML file
str – path to the data folder
str – path to the models folder
str – path to the outputs folder to be created (by default ‘.’)
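A command-line interface with these four inputs could look like the sketch below. It is only an illustration under assumptions: the source does not say whether the arguments are positional or which names they carry, so `parse_args_sketch` and the argument names are hypothetical.

```python
import argparse
from typing import List, Optional, Tuple


def parse_args_sketch(argv: Optional[List[str]] = None) -> Tuple[str, str, str, str]:
    """Hypothetical stand-in for parse_args: four strings read from the command line."""
    parser = argparse.ArgumentParser(description="Likelihood parameter optimization")
    parser.add_argument("yaml_file", help="name of the input YAML file")
    parser.add_argument("path_data", help="path to the data folder")
    parser.add_argument("path_models", help="path to the models folder")
    # Optional last argument: falls back to the current directory.
    parser.add_argument(
        "path_outputs",
        nargs="?",
        default=".",
        help="path to the outputs folder to be created",
    )
    args = parser.parse_args(argv)
    return args.yaml_file, args.path_data, args.path_models, args.path_outputs


print(parse_args_sketch(["input.yaml", "./data", "./models"]))
```

Omitting the fourth argument yields '.' for the outputs folder, which matches the default stated above.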
- path_centroids
path to the csv file containing the sigma_a centroid values for each cluster (saved in output folder)
- Type:
str
- path_intermediate_result
path to the intermediate results (per cluster, saved in output folder)
- Type:
str
- plot_clusters_sigma_a(line: str, log10_sigma_a_ell_nonnan: ndarray, log10_centroids: ndarray) None[source]
plots a one-dimensional histogram of the log10 of the standard deviation of the additive noise in the observations, together with the computed centroids.
- Parameters:
line (str) – name of the line
log10_sigma_a_ell_nonnan (np.ndarray) – log10 of the non-NaN values of sigma_a for the considered line
log10_centroids (np.ndarray of shape (N_clusters,)) – log10 of the sigma_a centroids for the considered line
- plot_hist_log10_f_Theta(log10_f_Theta: ndarray, log10_f_Theta_low: float, log10_f_Theta_high: float, list_log10_f_grid: ndarray, pdf_kde_log10_f_Theta: ndarray, ell: int) None[source]
plots histogram of \(\log_{10} f_\ell (\theta)\)
- Parameters:
log10_f_Theta (np.ndarray of shape (-1, 1)) – array of values of \(\log_{10} f_\ell (\theta)\) for the considered line
log10_f_Theta_low (float) – lower bound for \(\log_{10} f_\ell (\theta)\) for the considered line
log10_f_Theta_high (float) – upper bound for \(\log_{10} f_\ell (\theta)\) for the considered line
list_log10_f_grid (np.ndarray) – grid values of \(\log_{10} f_\ell (\theta)\) for the considered line
pdf_kde_log10_f_Theta (np.ndarray) – pdf of \(\log_{10} f_\ell (\theta)\) evaluated with a kernel density estimator
ell (int) – index of the line
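The grid and the kernel density estimate passed to this method can be pictured with the sketch below. It is written under assumptions: the source does not specify the kernel, the bandwidth, or how the lower and upper bounds are chosen. The sketch uses a Gaussian kernel with the normal-reference rule-of-thumb bandwidth and takes the sample minimum and maximum as bounds. The helpers `linspace` and `gaussian_kde_on_grid` are hypothetical, and the toy values are invented for illustration.

```python
import math
from typing import List, Sequence


def linspace(low: float, high: float, size: int) -> List[float]:
    """Evenly spaced grid of `size` points between low and high (inclusive)."""
    step = (high - low) / (size - 1)
    return [low + i * step for i in range(size)]


def gaussian_kde_on_grid(samples: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Gaussian kernel density estimate evaluated at each grid point."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    # Normal-reference rule of thumb (an assumption, not taken from the package).
    bandwidth = 1.06 * std * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in samples)
        for g in grid
    ]


# Toy values of log10 f_ell(theta) for one line.
log10_f_Theta = [-2.1, -1.8, -1.5, -1.4, -1.0, -0.9, -0.7, -0.2]
log10_f_grid_size = 50
list_log10_f_grid = linspace(min(log10_f_Theta), max(log10_f_Theta), log10_f_grid_size)
pdf_kde_log10_f_Theta = gaussian_kde_on_grid(log10_f_Theta, list_log10_f_grid)
```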
- plot_hist_log10_f_Theta_with_optim_results(log10_f_Theta: ndarray, log10_f_Theta_low: float, log10_f_Theta_high: float, list_log10_f_grid: ndarray, pdf_kde_log10_f_Theta: ndarray, n: int, ell: int, best_point: ndarray) None[source]
plots histogram of \(\log_{10} f_\ell (\theta)\), with the best point found by the optimization displayed
- Parameters:
log10_f_Theta (np.ndarray of shape (-1, 1)) – array of values of \(\log_{10} f_\ell (\theta)\) for the considered line
log10_f_Theta_low (float) – lower bound for \(\log_{10} f_\ell (\theta)\) for the considered line
log10_f_Theta_high (float) – upper bound for \(\log_{10} f_\ell (\theta)\) for the considered line
list_log10_f_grid (np.ndarray) – grid values of \(\log_{10} f_\ell (\theta)\) for the considered line
pdf_kde_log10_f_Theta (np.ndarray) – pdf of \(\log_{10} f_\ell (\theta)\) evaluated with a kernel density estimator
n (int) – pixel / component index
ell (int) – index of the line
best_point (np.ndarray) – position for the best point, to be displayed
- sample_theta(lower_bounds: ndarray, upper_bounds: ndarray) ndarray[source]
samples \(\theta\) with stratified Monte Carlo in a hypercube
- Parameters:
K (int) – total number of samples per axis
lower_bounds (np.ndarray of shape (D,)) – lower bounds of cube on \(\theta\)
upper_bounds (np.ndarray of shape (D,)) – upper bounds of cube on \(\theta\)
- Returns:
\(\theta\) samples
- Return type:
np.ndarray of shape (N_samples, D)
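Stratified Monte Carlo in a hypercube can be sketched as follows: each axis is split into K equal strata, which gives K^D cells, and one uniform sample is drawn in each cell. This is an illustration under assumptions rather than the package's implementation. The function name `sample_theta_stratified`, the `seed` argument and the use of nested lists in place of an array of shape (N_samples, D) are choices made for this sketch.

```python
import itertools
import random
from typing import List, Sequence


def sample_theta_stratified(
    K: int,
    lower_bounds: Sequence[float],
    upper_bounds: Sequence[float],
    seed: int = 0,
) -> List[List[float]]:
    """Draw one uniform sample in each of the K**D cells of a regular grid."""
    rng = random.Random(seed)
    D = len(lower_bounds)
    widths = [(hi - lo) / K for lo, hi in zip(lower_bounds, upper_bounds)]
    samples = []
    for cell in itertools.product(range(K), repeat=D):
        # `cell` holds the stratum index along each axis.
        samples.append(
            [lo + (i + rng.random()) * w for lo, i, w in zip(lower_bounds, cell, widths)]
        )
    return samples


theta = sample_theta_stratified(K=3, lower_bounds=[0.0, 1.0], upper_bounds=[1.0, 4.0])
print(len(theta))  # 9 samples, i.e. K**D with K=3 and D=2
```

Compared with plain uniform sampling, placing one draw in every cell guarantees that the whole cube is covered, at the cost of a sample count that grows as K^D.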
- save_results_in_data_folder(path_data: str, filename_err: str) None[source]
- Parameters:
path_data (str) – path to the data folder
filename_err (str) – name of the file containing the sigma_a values
- save_setup_to_json(n: int, ell: int, pbounds: Dict[str, ndarray]) None[source]
save optimization context and parameters
- Parameters:
n (int) – pixel / component index
ell (int) – observable index
pbounds (dict[str, np.ndarray]) – contains the bounds on the parameters to be adjusted
- setup_params_bounds() Tuple[ndarray, ndarray, ndarray, float, float][source]
sets the bounds on the parameters to be adjusted, defined here as the center position of the transition interval for a0 and its width for a1
- Returns:
log10_f0 (np.ndarray of shape (N, L)) – values of \(f_\ell(\theta)\) at which additive and multiplicative noise variances are equal
bounds_a0_low (np.ndarray of shape (N, L)) – lower bounds on the center of the transition interval (defined as deltas around the log10_f0)
bounds_a0_high (np.ndarray of shape (N, L)) – upper bounds on the center of the transition interval (defined as deltas around the log10_f0)
bounds_a1_low (float) – lower value for the transition interval size
bounds_a1_high (float) – upper value for the transition interval size
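The quantity log10_f0 marks the point where the two noise variances balance. As a worked sketch under an assumption that is not taken from this page: if the multiplicative noise is lognormal with unit mean and log-scale standard deviation sigma_m, the multiplicative term has variance \(f^2 (e^{\sigma_m^2} - 1)\). Equating it to \(\sigma_a^2\) gives \(f_0 = \sigma_a / \sqrt{e^{\sigma_m^2} - 1}\). The helper names and the symmetric `delta` below are hypothetical; the package may define the noise model and the deltas differently.

```python
import math
from typing import Tuple


def log10_f0(sigma_a: float, sigma_m: float) -> float:
    """log10 of the model value where both noise variances are equal (assumed noise model)."""
    # Assumption: unit-mean lognormal multiplicative noise, so that
    # Var[multiplicative term] = f**2 * (exp(sigma_m**2) - 1).
    return math.log10(sigma_a) - 0.5 * math.log10(math.expm1(sigma_m**2))


def a0_bounds(sigma_a: float, sigma_m: float, delta: float) -> Tuple[float, float]:
    """Bounds on the transition center a0, as deltas around log10_f0 (delta is hypothetical)."""
    center = log10_f0(sigma_a, sigma_m)
    return center - delta, center + delta


print(a0_bounds(sigma_a=1e-2, sigma_m=0.1, delta=1.0))
```

Below f0 the additive noise dominates and above it the multiplicative noise dominates, which is why the center of the transition interval is searched around this value.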
- setup_plot_text_sizes(small_size: int = 16, medium_size: int = 20, bigger_size: int = 24) None[source]
defines text sizes on matplotlib plots
- Parameters:
small_size (int, optional) – size for basic text, axes titles, xticks and yticks, by default 16
medium_size (int, optional) – size of the axis labels, by default 20
bigger_size (int, optional) – size of the figure title, by default 24
- sigma_a
standard deviations of the additive Gaussian noise
- Type:
np.ndarray
- sigma_m
standard deviation parameter of the multiplicative lognormal noise
- Type:
np.ndarray
beetroots.approx_optim.nn_bo module
- class beetroots.approx_optim.nn_bo.ApproxParamsOptimNNBO(list_lines: List[str], simu_name: str, D: int, D_no_kappa: int, K: int, log10_f_grid_size: int, N_samples_y: int, max_workers: int, sigma_a_raw: ndarray | float, sigma_m: ndarray | float, path_outputs: str, path_models: str, N_clusters_a_priori: int | None, small_size: int = 16, medium_size: int = 20, bigger_size: int = 24)[source]
Bases:
ApproxParamsOptim, ApproxOptimNN, BayesianOptimizationApproach
class that performs likelihood parameter optimization using Bayesian optimization for a neural network forward map
- Parameters:
list_lines (List[str]) – names of the observables for which the likelihood parameter needs to be adjusted
simu_name (str) – name of the simulation, used to name the output folder
D (int) – total number of physical parameters involved in the forward map
D_no_kappa (int) – total number of physical parameters involved in the forward map except for the scaling parameter \(\kappa\) (if it is part of the considered physical parameters)
K (int) – the number of sampled theta values is \(K^D\)
log10_f_grid_size (int) – number of points in the grid on \(\log_{10} f_\ell(\theta)\)
N_samples_y (int) – number of samples for \(y_\ell\)
max_workers (int) – maximum number of workers that can be used for optimization or results extraction
sigma_a_raw (Union[np.ndarray, float]) – standard deviation of the additive Gaussian noise
sigma_m (Union[np.ndarray, float]) – standard deviation parameter of the multiplicative lognormal noise
path_outputs (str) – path to the output folder (to be created), where the run results are to be saved
path_models (str) – path to the folder containing the forward models
N_clusters_a_priori (Optional[int]) – The number of different values of sigma_a_raw to consider for each line (and thus the number of optimization problems to solve per line), computed with a clustering algorithm on the N values of sigma_a_raw. Raises an error if N_clusters_a_priori > N.
small_size (int, optional) – size for basic text, axes titles, xticks and yticks, by default 16
medium_size (int, optional) – size of the axis labels, by default 20
bigger_size (int, optional) – size of the figure title, by default 24
- main(dict_forward_model: Dict[str, str | bool | List[bool] | List[float]], lower_bounds_lin: ndarray | List, upper_bounds_lin: ndarray | List, n_iter: int)[source]
main method of the class, sets up the optimization problems and solves them
- Parameters:
dict_forward_model (Dict[str, Union[str, bool, List[bool], List[float], Dict[str, float]]]) – contains the necessary information to load the forward model with the NeuralNetworkApprox
lower_bounds_lin (Union[np.ndarray, List]) – lower bounds on the physical parameters (in linear scale)
upper_bounds_lin (Union[np.ndarray, List]) – upper bounds on the physical parameters (in linear scale)
n_iter (int) – number of iterations for the Bayesian optimization
beetroots.approx_optim.nn_bo_real_data module
- class beetroots.approx_optim.nn_bo_real_data.ReadDataRealData(list_lines: List[str])[source]
Bases:
SimulationRealData
implements an __init__ method for SimulationRealData
- Parameters:
list_lines (List[str]) – observables for which the likelihood approximation parameters are to be adjusted