MID Benchmark Suite -- Datasets
One major component of the Maryland Inverse Design Benchmark Suite is the availability of datasets needed to train the various Inverse Design models. This page describes our general approach to constructing these datasets and provides links to the various environments where the datasets are located. In general, each environment will possess one or more datasets generated specifically for that environment, though it is certainly possible to use datasets generated for one environment in another environment or inverse design (ID) task.
Dataset Management
The $(\bm{d}, \bm{b}, \bm{p})$ (design, condition, performance) tuples generated by the environments are handled by two classes, DataEntry and DataBase. DataBase can be used to produce the dataset object required for inverse design learning.
DataEntry
class midbench.data.DataEntry(designs: Design, conditions: Condition, performances: Performance, specs: Union[SimSpec, OptSpec, str, NoneType]=None)
This is an iterable object for storing and processing the $(\bm{d}, \bm{b}, \bm{p})$ tuples that the user receives from a single simulation or optimization process. It is the output of the Env.simulate() and Env.optimize() methods. It is built for easy indexing and slicing across all three elements.
It can be subclassed to get an environment-specific DataEntry.
Parameters
- designs (Design): The designs.
- conditions (Condition): The boundary conditions.
- performances (Performance): The corresponding performances.
- specs (Union[SimSpec, OptSpec, str, NoneType]): The specifications of the operation that produced this data, recorded for experiment reproducibility. It may also hold free-form text or be left empty.
Methods
- _is_compatible(designs: Design, conditions: Condition, performances: Performance) → bool
  Checks whether the first dimensions of designs, conditions, and performances have compatible sizes when a new DataEntry instance is to be created.
  - Parameters
    - designs (Design): The designs.
    - conditions (Condition): The boundary conditions.
    - performances (Performance): The corresponding performances.
  - Returns
    - True if the sizes are compatible, otherwise False.
- convert(design_specs: DesignSpec, c_units: Union[list, tuple, str], p_units: Union[list, tuple, str]) → DataEntry
  Generates a new data entry in the given formats.
  - Parameters
    - design_specs (DesignSpec): The specifications of the design space to which the current design variables are to be converted.
    - c_units (Union[list, tuple, str]): The target units for the boundary conditions.
    - p_units (Union[list, tuple, str]): The target units for the performances.
  - Returns
    - A new DataEntry instance in the given formats.
- Possibly further statistics and plotting methods.
- __getitem__(idx) → DataEntry
- __iter__()
- __next__() → DataEntry
Attributes
- designs (Design)
- conditions (Condition)
- performances (Performance)
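The indexing, slicing, and iteration behavior described above can be illustrated with a minimal sketch. This is not the midbench implementation. The class name DataEntrySketch is hypothetical, and plain Python lists stand in for the Design, Condition, and Performance types. The sketch also assumes that "compatible" means all three first-dimension sizes are equal.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional, Sequence, Union


@dataclass
class DataEntrySketch:
    """Hypothetical stand-in for midbench.data.DataEntry (not the real API).

    Plain Python sequences replace the Design, Condition and Performance
    types; position along the first dimension identifies a sample.
    """

    designs: Sequence[Any]
    conditions: Sequence[Any]
    performances: Sequence[Any]
    specs: Optional[Any] = None

    def __post_init__(self) -> None:
        # Refuse to build an entry whose (d, b, p) elements are misaligned.
        if not self._is_compatible(self.designs, self.conditions, self.performances):
            raise ValueError("designs, conditions and performances differ in first-dimension size")

    @staticmethod
    def _is_compatible(designs: Sequence[Any], conditions: Sequence[Any],
                       performances: Sequence[Any]) -> bool:
        # Assumption: "compatible" means the three sample counts are equal.
        return len(designs) == len(conditions) == len(performances)

    def __len__(self) -> int:
        return len(self.designs)

    def __getitem__(self, idx: Union[int, slice]) -> "DataEntrySketch":
        if isinstance(idx, int):
            # Normalize negative indices and raise IndexError when out of range.
            i = range(len(self))[idx]
            idx = slice(i, i + 1)
        # Apply the same index to all three elements so the tuples stay aligned.
        return type(self)(self.designs[idx], self.conditions[idx],
                          self.performances[idx], self.specs)

    def __iter__(self) -> Iterator["DataEntrySketch"]:
        # Yield one single-sample entry at a time.
        for i in range(len(self)):
            yield self[i]


entry = DataEntrySketch(
    designs=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    conditions=[1.0, 2.0, 3.0],
    performances=[10.0, 20.0, 30.0],
    specs="toy example",
)
print(len(entry[1:]))                      # 2
print(entry[-1].performances)              # [30.0]
print([e.performances[0] for e in entry])  # [10.0, 20.0, 30.0]
```

Slicing returns a new instance of the same class, so an environment-specific subclass keeps its type. The sketch implements __iter__ as a generator rather than pairing it with a separate __next__ method. Either approach yields one single-sample entry at a time.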
DataBase
class midbench.data.DataBase(*entries: DataEntry)
DataBase is a mutable, list-like object for managing all the data entries a user collects over time and for producing a unified dataset for inverse design training. It should also provide methods for curation, e.g., removing duplicate entries or producing a dataset with a certain degree of diversity.
It can be subclassed to get an environment-specific DataBase.
Methods
- generate_dataset(design_specs: DesignSpec, c_units: Union[list, tuple, str], p_units: Union[list, tuple, str], p_query: Tensor) → Dataset
  Generates a dataset of the given format and size for inverse design training.
  - Parameters
    - design_specs (DesignSpec): The specifications of the design space to which the current design variables are to be converted.
    - c_units (Union[list, tuple, str]): The boundary condition units passed to the convert method.
    - p_units (Union[list, tuple, str]): The performance units passed to the convert method.
    - p_query (Tensor): Only data entries whose performances match p_query are selected to compose the new dataset, which keeps NaN values out of it.
  - Returns
    - A PyTorch or TensorFlow Dataset instance.
- append(entry: DataEntry) → None
  Appends a data entry to the database.
  - Parameters
    - entry (DataEntry): The data entry to be merged into the database.
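- diversify()
- view(class): Reinterprets the data as another compatible DataEntry or DataBase class (see Dataset Compatibility).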
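The list-like storage, duplicate removal, and NaN-free dataset generation described for DataBase can be sketched as follows. Several assumptions apply:

- EntrySketch and DataBaseSketch are hypothetical names.
- Each performance is assumed to be a dict of named metrics. The names "cl" and "cd" are illustrative only.
- p_query is assumed to be a list of performance names rather than a tensor.
- A plain list of tuples stands in for the PyTorch or TensorFlow Dataset.
- Unit conversion is omitted.

```python
import math
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Sequence, Tuple


@dataclass
class EntrySketch:
    """Hypothetical minimal (d, b, p) container; each performance is a dict of named metrics."""

    designs: List[Any]
    conditions: List[Any]
    performances: List[Dict[str, float]]


class DataBaseSketch:
    """Hypothetical stand-in for midbench.data.DataBase: a mutable, list-like store."""

    def __init__(self, *entries: EntrySketch) -> None:
        self._entries: List[EntrySketch] = list(entries)

    def __len__(self) -> int:
        return len(self._entries)

    def __getitem__(self, idx: int) -> EntrySketch:
        return self._entries[idx]

    def append(self, entry: EntrySketch) -> None:
        self._entries.append(entry)

    def _samples(self) -> Iterator[Tuple[Any, Any, Dict[str, float]]]:
        # Flatten all entries into individual (d, b, p) samples.
        for e in self._entries:
            yield from zip(e.designs, e.conditions, e.performances)

    def generate_dataset(self, p_query: Sequence[str]) -> List[Tuple[Any, Any, Tuple[float, ...]]]:
        """Return (d, b, p) tuples whose queried performances are all present and not NaN.

        Assumption: a plain list stands in for the PyTorch or TensorFlow Dataset,
        and unit conversion (design_specs, c_units, p_units) is omitted.
        """
        dataset: List[Tuple[Any, Any, Tuple[float, ...]]] = []
        seen = set()
        for d, b, p in self._samples():
            values = tuple(float(p.get(name, math.nan)) for name in p_query)
            if any(math.isnan(v) for v in values):
                continue  # a queried performance is missing or NaN
            key = (repr(d), repr(b), values)
            if key in seen:
                continue  # simple curation: drop exact duplicate samples
            seen.add(key)
            dataset.append((d, b, values))
        return dataset


db = DataBaseSketch()
db.append(EntrySketch(designs=[[0.1], [0.2]], conditions=[0.5, 0.5],
                      performances=[{"cl": 1.2, "cd": 0.02}, {"cl": 1.1}]))
db.append(EntrySketch(designs=[[0.1]], conditions=[0.5],
                      performances=[{"cl": 1.2, "cd": 0.02}]))
print(db.generate_dataset(["cl", "cd"]))  # [([0.1], 0.5, (1.2, 0.02))]
print(len(db.generate_dataset(["cl"])))   # 2
```

When both metrics are queried, the second sample is dropped because it lacks "cd", and the third is dropped as an exact duplicate of the first. When only "cl" is queried, the second sample qualifies, so two samples remain.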
Dataset Compatibility
Even though DataEntry and DataBase may be made environment-specific through subclassing, different pairs of these classes remain compatible with each other as long as they share the same parent class. After all, what really matters here is the values of the $(\bm{d}, \bm{b}, \bm{p})$ tuple. We provide a view() method, similar to NumPy's ndarray.view(), to enable easy migration between different DataEntry and DataBase classes.
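NumPy's ndarray.view() can return the same data under a different array subclass without copying it. The sketch below applies the same idea to entry classes. The names BaseEntry, AirfoilEntry, and OtherEnvEntry are hypothetical. The rule that the target class must derive from the shared parent is an assumption about how view() decides compatibility.

```python
from typing import Any, List, Type, TypeVar

T = TypeVar("T", bound="BaseEntry")


class BaseEntry:
    """Hypothetical common parent of environment-specific entry classes."""

    def __init__(self, designs: List[Any], conditions: List[Any], performances: List[Any]) -> None:
        self.designs = designs
        self.conditions = conditions
        self.performances = performances

    def view(self, cls: Type[T]) -> T:
        """Reinterpret this entry as another BaseEntry subclass without copying the data."""
        # Assumption: compatibility means the target class shares the parent BaseEntry.
        if not (isinstance(cls, type) and issubclass(cls, BaseEntry)):
            raise TypeError(f"{cls!r} does not share the parent class BaseEntry")
        # Like numpy.ndarray.view(type=...): skip __init__ and share the same containers.
        new = cls.__new__(cls)
        new.designs = self.designs
        new.conditions = self.conditions
        new.performances = self.performances
        return new


class AirfoilEntry(BaseEntry):
    """Hypothetical entry class for an airfoil environment."""


class OtherEnvEntry(BaseEntry):
    """Hypothetical entry class for a different environment."""


airfoil = AirfoilEntry([[0.1, 0.2]], [0.5], [1.2])
other = airfoil.view(OtherEnvEntry)
print(type(other).__name__)              # OtherEnvEntry
print(other.designs is airfoil.designs)  # True
```

Because the view shares the underlying containers, changes made through one class are visible through the other. A copying conversion would instead duplicate the $(\bm{d}, \bm{b}, \bm{p})$ data.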
Available Datasets
SU2 Airfoil 2D