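Overview
The Python API of MLatom (PyAPI for short) has three main parts: mlatom.data, mlatom.models, and mlatom.simulations.
mlatom.data is used to create, manipulate, save, and load chemical data in mlatom.data.atom, mlatom.data.molecule, and mlatom.data.molecular_database objects.
mlatom.models contains many computational chemistry models, based on quantum mechanics and machine learning, that make predictions for molecules. These models fall into 3 categories:
methods: models that use existing methods directly and need no training by the user.
ml_model: machine learning models that must be trained by the user.
model_tree_node: models composed of other model objects (mlatom.models.methods, mlatom.models.ml_model, or mlatom.models.model_tree_node).
mlatom.simulations uses mlatom.models to run molecular simulations such as geometry optimization or dynamics.
A simple example illustrating how these components are used is provided for reference.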
Data
!---------------------------------------------------------------------------!
! data: Module for working with data !
! Implementations by: Pavlo O. Dral, Fuchun Ge, !
! Shuang Zhang, Yi-Fan Hou, Yanchi Ou !
!---------------------------------------------------------------------------!
- class mlatom.data.atom(nuclear_charge: int | None = None, atomic_number: int | None = None, element_symbol: str | None = None, nuclear_mass: float | None = None, xyz_coordinates: ndarray | List | None = None)[source]
Creates an atom object.
- Parameters:
nuclear_charge (int, optional) – the nuclear charge that defines the atom.
atomic_number (int, optional) – the atomic number that defines the atom.
element_symbol (str, optional) – the element symbol that defines the atom.
nuclear_mass (float, optional) – the nuclear mass that defines the atom.
xyz_coordinates (Array-like, optional) – the position of the atom in Cartesian coordinates.
- class mlatom.data.molecule(charge: int = 0, multiplicity: int = 1, atoms: List[atom] = None, pbc: ndarray | bool | None = None, cell: ndarray | None = None)[source]
Creates a molecule object.
- Parameters:
charge (int, optional) – the charge of the molecule.
multiplicity (int, optional) – the multiplicity of the molecule.
atoms (List[atom], optional) – the atoms in the molecule.
Examples
Select an atom by index:
    from mlatom.data import atom, molecule
    at = atom(element_symbol = 'C')
    mol = molecule(atoms = [at])
    print(id(at), id(mol[0]))
- id
The unique ID of this molecule.
- charge
The charge of the molecule.
- multiplicity
The multiplicity of the molecule.
- load(filename: str, format: str):
Loads a molecule object from a dump file.
Updates a molecule object if initialized:
    mol = molecule(); mol.load(filename='mymol.json')
Returns a molecule object if called as class method:
    mol = molecule.load(filename='mymol.json')
- Parameters:
filename (str) – filename or path.
format (str, optional) – currently, only the 'json' format is supported.
- property pbc
The periodic boundary conditions of the molecule. Setting it with mol.pbc = True is equivalent to mol.pbc = [True, True, True].
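The broadcasting behaviour described for pbc can be pictured with a small standalone property setter. This is only a sketch of the idea, not MLatom's implementation; the class name PeriodicBox and its internals are hypothetical.

```python
class PeriodicBox:
    """Hypothetical minimal holder for periodic boundary flags; not MLatom code."""

    def __init__(self):
        self._pbc = [False, False, False]

    @property
    def pbc(self):
        return self._pbc

    @pbc.setter
    def pbc(self, value):
        if isinstance(value, bool):
            # A single flag is broadcast to all three cell directions
            self._pbc = [value] * 3
        else:
            flags = [bool(v) for v in value]
            if len(flags) != 3:
                raise ValueError('pbc needs one flag or three flags')
            self._pbc = flags


box = PeriodicBox()
box.pbc = True
print(box.pbc)  # [True, True, True]
```

Passing three flags explicitly, e.g. box.pbc = [True, False, True], keeps them per direction.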
- property cell
The matrix of 3 vectors that defines the unit cell. Its setter simply wraps ase.geometry.cell.cellpar_to_cell().
- property cell_coordinates: ndarray
The relative coordinates in the cell.
- read_from_xyz_file(filename: str, format: str | None = None) molecule [source]
Loads the molecular geometry from an xyz file.
If format is not specified, the standard xyz format is read. Other supported formats are:
'COLUMBUS'
'NEWTON-X' or 'NX'
'turbomol'
- Parameters:
filename (str) – the name of the file to read.
format (str, optional) – the format of the file.
- read_from_xyz_string(string: str = None, format: str | None = None) molecule [source]
Loads the molecular geometry from an xyz string.
If format is not specified, the standard xyz format is read. Other supported formats are:
'COLUMBUS'
'NEWTON-X' or 'NX'
'turbomol'
- Parameters:
string (str) – the input string.
format (str, optional) – the format of the string.
- read_from_numpy(coordinates: ndarray, species: ndarray) molecule [source]
Loads the molecular structure from a numpy array with coordinates and a numpy array with species.
coordinates must have shape (N, 3) and species must have shape (N,), where N is the number of atoms.
- read_from_smiles_string(smi_string: str) molecule [source]
Generates the molecular structure from the provided SMILES string.
The geometry is generated and optimized with Pybel's make3D() method.
- classmethod from_xyz_file(filename: str, format: str | None = None) molecule [source]
Class method version of molecule.read_from_xyz_file(); returns a molecule object.
- classmethod from_xyz_string(string: str = None, format: str | None = None) molecule [source]
Class method version of molecule.read_from_xyz_string(); returns a molecule object.
- classmethod from_numpy(coordinates: ndarray, species: ndarray) molecule [source]
Class method version of molecule.read_from_numpy(); returns a molecule object.
- classmethod from_smiles_string(smi_string: str) molecule [source]
Class method version of molecule.read_from_smiles_string(); returns a molecule object.
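- add_scalar_property(scalar, property_name: str = 'y') None [source]
Adds a scalar property to the molecule. The property can then be accessed as molecule.<property_name>.
- Parameters:
scalar – the scalar to add.
property_name (str, optional) – the name assigned to the scalar property.
- add_xyz_derivative_property(derivative, property_name: str = 'y', xyz_derivative_property: str = 'xyz_derivatives') None [source]
Adds an xyz derivative property to the molecule.
- Parameters:
derivative – the derivative property to add.
property_name (str, optional) – the name of the associated non-derivative property.
xyz_derivative_property (str, optional) – the name assigned to the derivative property.
- add_xyz_vectorial_property(vector, xyz_vectorial_property: str = 'xyz_vector') None [source]
Adds an xyz vectorial property to the molecule.
- Parameters:
vector – the vector to add.
xyz_vectorial_property (str, optional) – the name assigned to the vectorial property.
- write_file_with_xyz_coordinates(filename: str, format: str | None = None) None [source]
Writes the molecular geometry to a file. If format is not specified, the standard xyz format is written. Other supported formats are:
'COLUMBUS'
'NEWTON-X' or 'NX'
'turbomol'
- Parameters:
filename (str) – the name of the file to write.
format (str, optional) – the format of the file.
- property atomic_numbers: ndarray
The atomic numbers of the atoms in the molecule.
- property element_symbols: ndarray
The element symbols of the atoms in the molecule.
- property smiles: str
The SMILES representation of the molecule.
- property xyz_coordinates: ndarray
The xyz geometry of the molecule.
- property kinetic_energy: float
The kinetic energy (a.u.) computed from the xyz velocities.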
- proliferate(shifts: Iterable | None = None, XYZshifts: Iterable | None = None, Xshifts: Iterable | None = [0], Yshifts: Iterable | None = [0], Zshifts: Iterable | None = [0], PBC_constrained: bool = True) molecule [source]
Proliferates the unit cell by the specified shifts along the cell vectors (called X/Y/Z here).
Returns a new molecule object.
- Parameters:
shifts (Iterable, optional) – the list of shifts to perform. Each shift is a 3D vector whose components are the coefficients applied to the corresponding cell vectors.
XYZshifts (Iterable, optional) – when a single list is given, all possible shifts are generated from the given shift coefficients in all three directions. A list of 3 lists is equivalent to setting X/Y/Zshifts.
Xshifts (Iterable, optional) – all possible shift coefficients in the direction of the first cell vector.
Yshifts (Iterable, optional) – all possible shift coefficients in the direction of the second cell vector.
Zshifts (Iterable, optional) – all possible shift coefficients in the direction of the third cell vector.
PBC_constrained (bool) – controls whether shifts are disabled in directions where the corresponding PBC is false. Only applies to XYZshifts.
Notes
- Priorities for different types of shifts: shifts > XYZshifts > X/Y/Zshifts
Examples
Single H atom in the centre of a cubic cell (2x2x2):
    import mlatom as ml
    import numpy as np
    mol = ml.molecule.from_numpy(np.ones((1, 3)), np.array([1]))
    mol.pbc = True
    mol.cell = 2
Proliferate to get two periods in all three directions, with shifts:
    new_mol = mol.proliferate(
        shifts = [
            [0, 0, 0],
            [1, 0, 0],
            [0, 1, 0],
            [0, 0, 1],
            [1, 1, 0],
            [1, 0, 1],
            [0, 1, 1],
            [1, 1, 1],
        ]
    )
with XYZshifts:
    new_mol = mol.proliferate(XYZshifts=range(2))
    # or
    new_mol = mol.proliferate(XYZshifts=[range(2)]*3)
with X/Y/Zshifts:
    new_mol = mol.proliferate(Xshifts=range(2), Yshifts=(0, 1), Zshifts=[0, 1])
All scripts above will make new_mol.xyz_coordinates be:
    array([[1., 1., 1.],
           [3., 1., 1.],
           [1., 3., 1.],
           [3., 3., 1.],
           [1., 1., 3.],
           [3., 1., 3.],
           [1., 3., 3.],
           [3., 3., 3.]])
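The arithmetic behind these examples can be reproduced without MLatom. The sketch below is a standalone illustration, not the library's code: the function name proliferate_coordinates and its snake_case arguments are hypothetical, PBC_constrained is left out, and the ordering of the generated images is chosen only to match the array shown above.

```python
from itertools import product


def proliferate_coordinates(coordinates, cell, shifts=None, xyz_shifts=None,
                            x_shifts=(0,), y_shifts=(0,), z_shifts=(0,)):
    """Return the coordinates of all periodic images for the requested shifts.

    coordinates: list of [x, y, z] positions; cell: three cell vectors (rows).
    Priority follows the note above: shifts > xyz_shifts > x/y/z_shifts.
    """
    if shifts is None:
        if xyz_shifts is not None:
            xyz_shifts = list(xyz_shifts)
            if len(xyz_shifts) == 3 and all(hasattr(s, '__iter__') for s in xyz_shifts):
                # Three separate lists: one per cell vector
                x_shifts, y_shifts, z_shifts = xyz_shifts
            else:
                # One list: the same coefficients in all three directions
                x_shifts = y_shifts = z_shifts = xyz_shifts
        # The first cell vector varies fastest (illustrative ordering)
        shifts = [(i, j, k) for k, j, i in product(z_shifts, y_shifts, x_shifts)]
    images = []
    for shift in shifts:
        # Offset = linear combination of the cell vectors
        offset = [sum(shift[a] * cell[a][c] for a in range(3)) for c in range(3)]
        for position in coordinates:
            images.append([position[c] + offset[c] for c in range(3)])
    return images


cubic_cell = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
hydrogen = [[1.0, 1.0, 1.0]]
images = proliferate_coordinates(hydrogen, cubic_cell, xyz_shifts=range(2))
print(len(images))  # 8
```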
- property state_energies: ndarray
The electronic state energies of the molecule.
- property state_gradients: ndarray
The electronic state energy gradients of the molecule.
- property energy_gaps: ndarray
The energy gaps between different states.
- property excitation_energies: ndarray
The excitation energies of the molecule from the ground state.
- property nstates: np.int
The number of electronic states.
- get_xyzvib_string(normal_mode=0)[source]
Gets the xyz string with geometries and displacements along the vibrational normal modes.
- view(normal_mode=None, slider=True)[source]
Visualizes the molecule and, if requested, its vibrations. Uses py3Dmol.
- Parameters:
normal_mode (integer, optional) – the index of a normal mode to visualize. Default: None.
slider – show an interactive slider to choose the mode. Default: True (only works if normal_mode is not None).
- class mlatom.data.molecular_database(molecules: List[molecule] = None)[source]
Creates a database of molecule objects.
- Parameters:
molecules (List[molecule]) – the list of molecules to include in the molecular database.
Examples
Select a molecule by index:
    from mlatom.data import atom, molecule, molecular_database
    at = atom(element_symbol = 'C')
    mol = molecule(atoms = [at])
    molDB = molecular_database([mol])
    print(id(mol) == id(molDB[0]))
    # the output should be 'True'
Slice the database like a numpy array:
    from mlatom.data import molecular_database
    molDB = molecular_database.from_xyz_file('devtests/al/h2_fci_db/xyz.dat')
    print(len(molDB))           # 451
    print(len(molDB[:100:4]))   # 25
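The two behaviours shown above (indexing returns the stored object itself, slicing returns a smaller database) can be mimicked with a few lines of plain Python. This is a sketch under the assumption that the container wraps a list; MiniDatabase is a hypothetical stand-in, not the MLatom class.

```python
class MiniDatabase:
    """Hypothetical stand-in for a molecular database; not MLatom code."""

    def __init__(self, molecules=None):
        self.molecules = list(molecules) if molecules is not None else []

    def __len__(self):
        return len(self.molecules)

    def __getitem__(self, item):
        if isinstance(item, slice):
            # A slice yields a new database that shares the same objects
            return MiniDatabase(self.molecules[item])
        # An integer index returns the stored object itself, not a copy
        return self.molecules[item]


db = MiniDatabase([object() for _ in range(451)])
subset = db[:100:4]
print(len(db), len(subset))  # 451 25
```

Indices 0, 4, ..., 96 are selected, which is why 25 entries remain.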
- read_from_xyz_file(filename: str, append: bool = False) molecular_database [source]
Loads molecular geometries from an xyz file.
- Parameters:
filename (str) – the name of the file to read.
append (bool, optional) – if True, appends to the current database; otherwise clears the current database first.
- read_from_xyz_string(string: str, append=False) molecular_database [source]
Loads molecular geometries from an xyz string.
- Parameters:
string (str) – the xyz string to read.
append (bool, optional) – if True, appends to the current database; otherwise clears the current database first.
- read_from_numpy(coordinates: ndarray, species: ndarray, append: bool = False) molecular_database [source]
Loads multiple molecular structures from a numpy array with coordinates and a numpy array with species.
coordinates must have shape (M, N, 3) and species must have shape (M, N), where N is the number of atoms and M is the number of molecules.
- read_from_smiles_file(smi_file: str, append: bool = False) molecular_database [source]
Generates molecular geometries from the provided SMILES file.
The geometries are generated and optimized with Pybel's make3D() method.
- read_from_smiles_string(smi_string: str, append: bool = False) molecular_database [source]
Generates molecular geometries from the provided SMILES string.
The geometries are generated and optimized with Pybel's make3D() method.
- read_from_h5_file(h5_file: str = '', properties: list = None, parallel: bool | int | tuple = False, verbose: bool = False) molecular_database [source]
Generates a molecular database from a formatted h5 file. The first level should be configurations (or ensembles of molecules with the same number of atoms) and the second level should be conformations and their properties. 'species' and 'coordinates' are required to construct a molecule. An example layout of an h5 file:
    /003                    dict
    /003/species            array (624, 3) [int8]
    /003/coordinates        array (624, 3, 3) [float32]
    /003/energies           array (624,) [float64]
    /003/property1          ['wb97x/def2tzvpp']
    /003/property2          array (624, 2) [int8]
If the first two dimensions of a value's shape equal (number_of_configurations, number_of_atoms), the remaining dimensions are assigned to each atom as xyz derivative properties. If the first dimension equals the number of configurations, the corresponding values are assigned to each molecule. If only one value is provided for a property, it is copied into each molecule. For example, in the case above, the properties stored in each molecule object would be: {'energies': float, 'property1': 'wb97x/def2tzvpp', 'property2': numpy.ndarray of shape (2,)}
- Parameters:
h5_file (str) – path to the h5 file.
properties (list) – the properties to be stored in the molecular database. By default, all properties present in the h5 file are stored.
parallel (int or tuple or bool) –
If an int is provided, it is used as the number of workers and the batch size is calculated automatically.
If a tuple is provided, the first value is used as the number of workers and the second value as the batch size.
If a bool is provided, True means that all available CPUs are used and the batch size is adjusted accordingly.
verbose (bool) – whether to print the loading message.
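The shape rules that read_from_h5_file applies to each dataset can be restated as a small decision function. This is a standalone sketch of the rule as worded above, not the library's code; the function name classify_property and the returned labels are hypothetical.

```python
def classify_property(shape, n_configurations, n_atoms):
    """Decide how a dataset of the given shape is distributed over molecules.

    Returns a (rule, per_item_shape) tuple following the three cases in the text.
    """
    shape = tuple(shape)
    if len(shape) >= 2 and shape[:2] == (n_configurations, n_atoms):
        # Leading dimensions match (configurations, atoms): one entry per atom
        return 'per_atom', shape[2:]
    if len(shape) >= 1 and shape[0] == n_configurations:
        # Leading dimension matches configurations: one entry per molecule
        return 'per_molecule', shape[1:]
    # Anything else is treated as a single value shared by all molecules
    return 'copied_to_every_molecule', shape


# Shapes taken from the example layout above: 624 configurations of 3 atoms
print(classify_property((624, 3, 3), 624, 3))  # ('per_atom', (3,))
print(classify_property((624,), 624, 3))       # ('per_molecule', ())
print(classify_property((624, 2), 624, 3))     # ('per_molecule', (2,))
print(classify_property((1,), 624, 3))         # ('copied_to_every_molecule', (1,))
```

A rule of this kind is ambiguous when a per-molecule property happens to have a second dimension equal to the number of atoms; this sketch checks the per-atom case first.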
- classmethod from_xyz_file(filename: str) molecular_database [source]
Class method version of molecular_database.read_from_xyz_file(); returns a molecular_database object.
- classmethod from_xyz_string(string: str) molecular_database [source]
Class method version of molecular_database.read_from_xyz_string(); returns a molecular_database object.
- classmethod from_numpy(coordinates: ndarray, species: ndarray) molecular_database [source]
Class method version of molecular_database.read_from_numpy(); returns a molecular_database object.
- classmethod from_smiles_file(smi_file: str) molecular_database [source]
Class method version of molecular_database.read_from_smiles_file(); returns a molecular_database object.
- classmethod from_smiles_string(smi_string: str | List) molecular_database [source]
Class method version of molecular_database.read_from_smiles_string(); returns a molecular_database object.
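- add_scalar_properties(scalars, property_name: str = 'y') None [source]
Adds scalar properties to the molecules.
- Parameters:
scalars – the scalars to add.
property_name (str, optional) – the name assigned to the scalar property.
- add_scalar_properties_from_file(filename: str, property_name: str = 'y') None [source]
Adds scalar properties from a file to the molecules.
- Parameters:
filename (str) – the text file containing the properties.
property_name (str, optional) – the name assigned to the scalar property.
- add_xyz_derivative_properties(derivatives, property_name: str = 'y', xyz_derivative_property: str = 'xyz_derivatives') None [source]
Adds xyz derivative properties to the molecules.
- Parameters:
derivatives – the derivatives to add.
property_name (str, optional) – the name of the associated non-derivative property.
xyz_derivative_property (str, optional) – the name assigned to the derivative property.
- add_xyz_derivative_properties_from_file(filename: str, property_name: str = 'y', xyz_derivative_property: str = 'xyz_derivatives') None [source]
Adds xyz derivatives from a text file to the molecules.
- Parameters:
filename (str) – the name of the file with the derivatives to add.
property_name (str, optional) – the name of the associated non-derivative property.
xyz_derivative_property (str, optional) – the name assigned to the derivative property.
- add_xyz_vectorial_properties(vectors, xyz_vectorial_property: str = 'xyz_vector') None [source]
Adds xyz vectorial properties to the molecules.
- Parameters:
vectors – the vectors to add.
xyz_vectorial_property (str, optional) – the name assigned to the vectorial property.
- add_xyz_vectorial_properties_from_file(filename: str, xyz_vectorial_property: str = 'xyz_vector') None [source]
Adds xyz vectorial properties from a text file to the molecules.
- Parameters:
filename (str) – the name of the file with the vectorial properties to add.
xyz_vectorial_property (str, optional) – the name assigned to the vectorial property.
- write_file_with_xyz_coordinates(filename: str) None [source]
Writes the molecular geometries to an xyz file.
- Parameters:
filename (str) – the name of the file to write.
- property atomic_numbers: ndarray
A 2D array with the atomic numbers of each atom in all molecules of the database.
- property element_symbols: ndarray
A 2D array with the element symbols of each atom in all molecules of the database.
- property ids
The IDs of the molecules in the database.
- property smiles: str
The SMILES strings of the molecules in the database.
- property nuclear_masses
The nuclear masses of the molecules in the database.
- property charges
The charges of the molecules in the database.
- property multiplicities
The multiplicities of the molecules in the database.
- write_file_with_xyz_derivative_properties(filename, xyz_derivative_property_to_write='xyz_derivatives')[source]
Writes the xyz derivative properties to a file.
- write_file_with_xyz_vectorial_properties(filename, xyz_vectorial_property_to_write='xyz_vector')[source]
Writes the xyz vectorial properties to a file.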
- proliferate(*args, **kwargs) molecular_database [source]
Proliferates the unit cell by the specified shifts along the cell vectors.
Returns a new molecular_database object. See molecule.proliferate() for details on the options.
- split(sampling='random', number_of_splits=2, split_equally=None, fraction_of_points_in_splits=None)[source]
Splits the molecular database.
- Parameters:
sampling (str, optional) – default 'random'. Can also be 'none'.
split_equally (bool, optional) – default False; if set to True, splits 50:50.
fraction_of_points_in_splits (list, optional) – e.g., [0.8, 0.2], which is the default.
indices
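What a fraction-based split does can be illustrated on plain indices. The sketch below is a standalone assumption about the general technique, not MLatom's implementation: the function name split_indices, the seed argument, and the rounding rule are hypothetical.

```python
import random


def split_indices(n_points, fractions=(0.8, 0.2), sampling='random', seed=None):
    """Split range(n_points) into consecutive chunks sized by the given fractions."""
    indices = list(range(n_points))
    if sampling == 'random':
        # Shuffle first so that each split is a random sample
        random.Random(seed).shuffle(indices)
    elif sampling != 'none':
        raise ValueError("sampling must be 'random' or 'none'")
    splits, start = [], 0
    for i, fraction in enumerate(fractions):
        # The last split takes the remainder so no point is lost to rounding
        is_last = i == len(fractions) - 1
        stop = n_points if is_last else start + round(fraction * n_points)
        splits.append(indices[start:stop])
        start = stop
    return splits


train, validate = split_indices(451, fractions=(0.8, 0.2), seed=0)
print(len(train), len(validate))  # 361 90
```

With sampling='none' the original order is kept, so the first split is simply the first 80% of the points.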
- property xyz_coordinates
The xyz coordinates of each atom in every molecule.
- class mlatom.data.molecular_trajectory(steps=None)[source]
A class for storing and accessing molecular trajectory data generated by dynamics or geometry optimization.
- dump(filename=None, format=None)[source]
Dumps the molecular_trajectory object to a file.
Available formats are:
'h5md' (requires the Python modules h5py and pyh5md)
'json'
'plain_text'
- to_database() molecular_database [source]
Returns a molecular database comprising the molecules in the trajectory.
- class mlatom.data.h5md(filename: str, data: Dict[str, Any] = {}, mode: str = 'w')[source]
Saves trajectory data to a file in the H5MD format.
- Parameters:
filename (str) – the name of the h5md file to write.
data (Dict) – the data to store (optional; if provided, the file is closed after the data is stored).
mode (str, optional) – a string that controls the file handling mode (default: 'w' for a new file, 'r+' for an existing file). The options, which are consistent with pyh5md.File(), are listed below:
    r         read-only; the file must exist
    r+        read/write; the file must exist
    w         create the file; overwrite it if it exists
    w- or x   create the file; fail if it exists
Example:
    traj0 = h5md('traj.h5')                   # open 'traj.h5'
    traj1 = h5md('/tmp/test.h5', mode='r')    # open an existing file in readonly mode
    traj2 = h5md('/tmp/traj2.h5', data={'time': 1.0, 'total_energy': -32.1, 'test': 8848})  # add some data to the file, then close the file
    traj0.write(data)                         # write data to opened file
    traj0(data)                               # an alternative way to write data
    data = traj0.export()                     # export the data in the opened file
    data = traj0()                            # an alternative way to export data
    with h5md('test.h5') as traj:             # export with a with statement
        data = traj.export()
    traj0.close()                             # close the file
Notes
Default data paths in the HDF5 file:
- particles/all:
'box', 'gradients', 'mass', 'nad', 'names', 'position', 'species', 'velocities'
- observables:
'angular_momentum', 'generated_random_number', 'kinetic_energy', 'linear_momentum', 'nstatdyn', 'oscillator_strengths', 'populations', 'potential_energy', 'random_seed', 'sh_probabilities', 'total_energy', 'wavefunctions',
and any other keywords
- h5
The HDF5 file object.
- __call__() Dict[str, ndarray]
Exports the data in the opened H5 file.
- Returns:
A dictionary with the trajectory data in the H5 file.
Models
!---------------------------------------------------------------------------!
! models: Module with models !
! Implementations by: Pavlo O. Dral, Fuchun Ge, Yi-Fan Hou, Yuxinxin Chen, !
! Peikun Zheng !
!---------------------------------------------------------------------------!
- class mlatom.models.model[source]
Parent (super) class for models to enable useful features such as logging during geometry optimizations.
- config_multiprocessing()[source]
For scripts that need to be executed before running the model in parallel.
- predict(molecular_database: molecular_database = None, molecule: molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, **kwargs)[source]
Makes predictions for molecular geometries with the model.
- Parameters:
molecular_database (mlatom.data.molecular_database, optional) – a database containing the molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – a molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – use the model to calculate the energy Hessian.
methods
- class mlatom.models.methods(method: str = None, program: str = None, **kwargs)[source]
Creates a model object with the specified method.
- Parameters:
method (str) – the method to use. The available methods are listed below.
program (str, optional) – the program to use.
**kwargs – other method-specific options.
Available methods:
'AIQM1', 'AIQM1@DFT', 'AIQM1@DFT*', 'AM1', 'ANI-1ccx', 'ANI-1x', 'ANI-1x-D4', 'ANI-2x', 'ANI-2x-D4', 'CCSD(T)*/CBS', 'CNDO/2', 'D4', 'DFTB0', 'DFTB2', 'DFTB3', 'GFN2-xTB', 'MINDO/3', 'MNDO', 'MNDO/H', 'MNDO/d', 'MNDO/dH', 'MNDOC', 'ODM2', 'ODM2*', 'ODM3', 'ODM3*', 'OM1', 'OM2', 'OM3', 'PM3', 'PM6', 'RM1', 'SCC-DFTB', 'SCC-DFTB-heats'.
The methods listed above can be used without specifying a program. The required programs still have to be installed, as described in the installation manual.
Available programs and their corresponding methods:
    Program           Methods
    TorchANI          'AIQM1', 'AIQM1@DFT', 'AIQM1@DFT*', 'ANI-1ccx', 'ANI-1x', 'ANI-1x-D4', 'ANI-2x', 'ANI-2x-D4', 'ANI-1xnr'
    dftd4             'AIQM1', 'AIQM1@DFT', 'ANI-1x-D4', 'ANI-2x-D4', 'D4'
    MNDO or Sparrow   'AIQM1', 'AIQM1@DFT', 'AIQM1@DFT*', 'MNDO', 'MNDO/d', 'ODM2*', 'ODM3*', 'OM2', 'OM3', 'PM3', 'SCC-DFTB', 'SCC-DFTB-heats'
    MNDO              'CNDO/2', 'MINDO/3', 'MNDO/H', 'MNDO/dH', 'MNDOC', 'ODM2', 'ODM3', 'OM1', semiempirical OMx, DFTB, NDDO-type methods
    Sparrow           'DFTB0', 'DFTB2', 'DFTB3', 'PM6', 'RM1', semiempirical OMx, DFTB, NDDO-type methods
    xTB               'GFN2-xTB', semiempirical GFNx-TB methods
    Orca              'CCSD(T)*/CBS', DFT
    Gaussian          ab initio methods, DFT
    PySCF             ab initio methods, DFT
- property nthreads
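- predict(*args, **kwargs)[source]
Makes predictions for molecular geometries with the model.
- Parameters:
molecular_database (mlatom.data.molecular_database, optional) – a database containing the molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – a molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – use the model to calculate the energy Hessian.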
AIQM1
ml_model
- class mlatom.models.ml_model[source]
Useful as a superclass for the ML models that need to be trained.
- train(molecular_database: molecular_database, property_to_learn: str | None = 'y', xyz_derivative_property_to_learn: str = None) None [source]
Trains the model with the provided molecular database.
- Parameters:
molecular_database (mlatom.data.molecular_database) – the molecular database used to train the model.
property_to_learn (str, optional) – the label of the property to learn during model training.
xyz_derivative_property_to_learn (str, optional) – the label of the xyz derivative property to learn.
- predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, property_to_predict: str | None = 'estimated_y', calculate_energy_gradients: bool = False, xyz_derivative_property_to_predict: str | None = 'estimated_xyz_derivatives_y', calculate_hessian: bool = False, hessian_to_predict: str | None = 'estimated_hessian_y') None [source]
Makes predictions for molecular geometries with the model.
- Parameters:
molecular_database (mlatom.data.molecular_database, optional) – a database containing the molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – a molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – use the model to calculate the energy Hessian.
property_to_predict (str, optional) – the label under which the predicted property is saved.
xyz_derivative_property_to_predict (str, optional) – the label under which the predicted xyz derivative property is saved.
hessian_to_predict (str, optional) – the label under which the predicted Hessians are saved.
- dump(filename=None, format='json')[source]
Dumps the model class object information to a json file (not to be confused with saving the model itself, i.e., its parameters!).
- calculate_validation_loss(training_kwargs=None, prediction_kwargs=None, cv_splits_molecular_databases=None, calculate_CV_split_errors=False, subtraining_molecular_database=None, validation_molecular_database=None, validation_loss_function=None, validation_loss_function_kwargs={}, debug=False)[source]
Returns the validation loss for the given hyperparameters.
By default, the validation loss is RMSE evaluated as a geometric mean of scalar and vectorial properties, e.g., energies and gradients.
- Parameters:
training_kwargs (dict, optional) – the kwargs to be passed to the yourmodel.train() function.
prediction_kwargs (dict, optional) – the kwargs to be passed to the yourmodel.predict() function.
cv_splits_molecular_databases (list, optional) – the list with cross-validation splits; each element is a molecular_database.
calculate_CV_split_errors (bool, optional) – requests to return the errors for each cross-validation split as a list in addition to the aggregate cross-validation error.
subtraining_molecular_database (molecular_database, optional) – the molecular database for sub-training, to be passed to the yourmodel.train() function.
validation_molecular_database (molecular_database, optional) – the molecular database for validation, to be passed to the yourmodel.predict() function.
validation_loss_function (function, optional) – user-defined validation function.
validation_loss_function_kwargs (dict, optional) – kwargs for the above validation_loss_function.
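The default loss described above, an RMSE combined as a geometric mean over scalar and vectorial properties, can be written out in a few lines. This is a standalone sketch of that formula only; the function names and the nested-list data layout are assumptions, and MLatom's own implementation may differ in detail.

```python
import math


def rmse(reference, estimated):
    """Root-mean-square error between two equally long sequences of numbers."""
    squared = [(r - e) ** 2 for r, e in zip(reference, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def flatten(gradients):
    """Flatten gradients given as molecules -> atoms -> xyz components."""
    return [c for molecule in gradients for atom in molecule for c in atom]


def validation_loss(y_ref, y_est, grad_ref=None, grad_est=None):
    """RMSE of the scalars, or its geometric mean with the RMSE of the gradients."""
    loss_scalar = rmse(y_ref, y_est)
    if grad_ref is None:
        return loss_scalar
    loss_vector = rmse(flatten(grad_ref), flatten(grad_est))
    # Geometric mean of the two errors
    return math.sqrt(loss_scalar * loss_vector)


energies_ref, energies_est = [0.0, 0.0], [3.0, 3.0]
grads_ref, grads_est = [[[0.0, 0.0, 0.0]]], [[[12.0, 12.0, 12.0]]]
print(validation_loss(energies_ref, energies_est, grads_ref, grads_est))  # 6.0
```

Here the energy RMSE is 3 and the gradient RMSE is 12, so the geometric mean is sqrt(3 * 12) = 6.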
- optimize_hyperparameters(hyperparameters=None, training_kwargs=None, prediction_kwargs=None, cv_splits_molecular_databases=None, subtraining_molecular_database=None, validation_molecular_database=None, optimization_algorithm=None, optimization_algorithm_kwargs={}, maximum_evaluations=10000, validation_loss_function=None, validation_loss_function_kwargs={}, debug=False)[source]
Optimizes hyperparameters by minimizing the validation loss.
By default, the validation loss is RMSE evaluated as a geometric mean of scalar and vectorial properties, e.g., energies and gradients.
- Parameters:
hyperparameters (list, required) – the list of strings with the names of the hyperparameters. The hyperparameters themselves must be in yourmodel.hyperparameters, defined as an instance of the hyperparameters class consisting of hyperparameter objects that define the optimization space.
training_kwargs (dict, optional) – the kwargs to be passed to the yourmodel.train() function.
prediction_kwargs (dict, optional) – the kwargs to be passed to the yourmodel.predict() function.
cv_splits_molecular_databases (list, optional) – the list with cross-validation splits; each element is a molecular_database.
calculate_CV_split_errors (bool, optional) – requests to return the errors for each cross-validation split as a list in addition to the aggregate cross-validation error.
subtraining_molecular_database (molecular_database, optional) – the molecular database for sub-training, to be passed to the yourmodel.train() function.
validation_molecular_database (molecular_database, optional) – the molecular database for validation, to be passed to the yourmodel.predict() function.
validation_loss_function (function, optional) – user-defined validation function.
validation_loss_function_kwargs (dict, optional) – kwargs for the above validation_loss_function.
optimization_algorithm (str, required) – the optimization algorithm. No default; must be one of: 'grid' ('brute'), 'TPE', 'Nelder-Mead', 'BFGS', 'L-BFGS-B', 'Powell', 'CG', 'Newton-CG', 'TNC', 'COBYLA', 'SLSQP', 'trust-constr', 'dogleg', 'trust-krylov', 'trust-exact'.
optimization_algorithm_kwargs (dict, optional) – kwargs to be passed to the optimization algorithm, e.g., {'grid_size': 5} (default 9 for the grid search).
maximum_evaluations (int, optional) – the maximum number of optimization evaluations (default: 10000); supported by all optimizers except grid search.
Saves the final hyperparameters in yourmodel.hyperparameters and the validation loss in yourmodel.validation_loss.
- class mlatom.models.hyperparameter(value: Any = None, optimization_space: str = 'linear', dtype: Callable | None = None, name: str = '', minval: Any = None, maxval: Any = None, step: Any = None, choices: Iterable[Any] = [], **kwargs)[source]
Class of hyperparameter objects, containing data that can be used in hyperparameter optimizations.
- Parameters:
value (Any, optional) – the value of the hyperparameter.
optimization_space (str, optional) – defines the space for the hyperparameter. Currently supports 'linear' and 'log'.
dtype (Callable, optional) – a callable object that forces the data type of value. One is chosen automatically if set to None.
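The difference between the 'linear' and 'log' optimization spaces is easiest to see on a grid. The sketch below is a standalone illustration of the general idea and makes its own assumptions: the function name grid_points is hypothetical, the natural logarithm is used for the log space, and nothing here reflects how MLatom actually builds its grids.

```python
import math


def grid_points(minval, maxval, grid_size, optimization_space='linear'):
    """Return grid_size points between minval and maxval, evenly spaced
    either in the value itself ('linear') or in its logarithm ('log')."""
    if grid_size < 2:
        raise ValueError('grid_size must be at least 2')
    if optimization_space == 'linear':
        low, high = minval, maxval
    elif optimization_space == 'log':
        # Assumption: natural logarithm; requires positive bounds
        low, high = math.log(minval), math.log(maxval)
    else:
        raise ValueError("optimization_space must be 'linear' or 'log'")
    step = (high - low) / (grid_size - 1)
    points = [low + i * step for i in range(grid_size)]
    if optimization_space == 'log':
        points = [math.exp(p) for p in points]
    return points


print(grid_points(0, 4, 5))                # evenly spaced values
print(grid_points(1e-2, 1e2, 5, 'log'))    # evenly spaced orders of magnitude
```

A log space is the usual choice for hyperparameters such as regularization strengths, whose useful values span several orders of magnitude.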
- update(new_hyperparameter: hyperparameter) None [source]
Updates the hyperparameter with data from another instance.
- Parameters:
new_hyperparameter (mlatom.models.hyperparameter) – the instance whose data are applied to the current instance.
- class mlatom.models.hyperparameters(dict=None, /, **kwargs)[source]
Class for storing hyperparameters; values are auto-converted to mlatom.models.hyperparameter objects. Inherits from collections.UserDict.
- Initiation:
Initiate with a dictionary, kwargs, or both.
e.g.:
    hyperparameters({'a': 1.0}, b=hyperparameter(value=2, minval=0, maxval=4))
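The auto-conversion of plain values on insertion is a standard collections.UserDict pattern. The sketch below shows that pattern with hypothetical stand-in classes (Hyperparameter and Hyperparameters, capitalised to mark them as not being the MLatom classes); it is not the library's code.

```python
from collections import UserDict


class Hyperparameter:
    """Hypothetical stand-in holding a value and optional bounds."""

    def __init__(self, value=None, minval=None, maxval=None):
        self.value = value
        self.minval = minval
        self.maxval = maxval


class Hyperparameters(UserDict):
    """Dictionary that wraps every plain value in a Hyperparameter object."""

    def __setitem__(self, key, value):
        if not isinstance(value, Hyperparameter):
            # Plain values are converted on the way in
            value = Hyperparameter(value=value)
        super().__setitem__(key, value)


# Both the positional dict and the kwargs pass through __setitem__
hp = Hyperparameters({'a': 1.0}, b=Hyperparameter(value=2, minval=0, maxval=4))
print(type(hp['a']).__name__, hp['a'].value)  # Hyperparameter 1.0
print(hp['b'].minval, hp['b'].maxval)         # 0 4
```

Because UserDict routes its constructor and update() through __setitem__, overriding that one method is enough to cover every way of adding an entry.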
- copy(keys: Iterable[str] | None = None) hyperparameters [source]
Returns a copy of the current instance.
- Parameters:
keys (Iterable[str], optional) – if keys are provided, only the hyperparameters selected by keys are copied, instead of all hyperparameters.
- Returns:
a new instance copied from the current one.
- Return type:
mlatom.models.hyperparameters
- class mlatom.models.kreg(model_file: str | None = None, ml_program: str = 'KREG_API', equilibrium_molecule: molecule | None = None, prior: float = 0, nthreads: int | None = None, hyperparameters: Dict[str, Any] | hyperparameters = {})[source]
Creates a KREG model object.
- Parameters:
model_file (str, optional) – the name of the file the model is dumped to or loaded from.
ml_program (str, optional) – the ML program to use. Available options: 'KREG_API', 'MLatomF'.
equilibrium_molecule (mlatom.data.molecule | None) – the equilibrium geometry used to generate the RE descriptor. If set to None, the geometry with the lowest energy/value is selected.
prior (default - None) – the prior can be 'mean', None (0.0), or any floating-point number.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – updates the hyperparameters of the model with the provided ones.
- train(molecular_database=None, property_to_learn=None, xyz_derivative_property_to_learn=None, save_model=True, invert_matrix=False, matrix_decomposition=None, prior=None, hyperparameters: Dict[str, Any] | hyperparameters = {})[source]
Trains the KREG model with the provided molecular database.
- Parameters:
molecular_database (mlatom.data.molecular_database) – the molecular database used to train the model.
property_to_learn (str, optional) – the label of the property to learn during model training.
xyz_derivative_property_to_learn (str, optional) – the label of the xyz derivative property to learn.
prior (str or float or int, optional) – default zero prior. It can also be 'mean' or any user-defined number.
- predict(molecular_database=None, molecule=None, calculate_energy=False, calculate_energy_gradients=False, calculate_hessian=False, property_to_predict=None, xyz_derivative_property_to_predict=None, hessian_to_predict=None)[source]
Makes predictions for molecular geometries with the model.
- Parameters:
molecular_database (mlatom.data.molecular_database, optional) – a database containing the molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – a molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – use the model to calculate the energy Hessian.
property_to_predict (str, optional) – the label under which the predicted property is saved.
xyz_derivative_property_to_predict (str, optional) – the label under which the predicted xyz derivative property is saved.
hessian_to_predict (str, optional) – the label under which the predicted Hessians are saved.
model_tree_node
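- class mlatom.models.model_tree_node(name=None, parent=None, children=None, operator=None, model=None)[source]
Creates a model tree node object.
- Parameters:
name (str) – the name assigned to the model tree node.
parent – the parent of the model tree node.
children – the children of the model tree node.
operator – the operation to perform when making predictions.
- predict(**kwargs)[source]
Makes predictions for molecular geometries with the model.
- Parameters:
molecular_database (mlatom.data.molecular_database, optional) – a database containing the molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – a molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – use the model to calculate the energy Hessian.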
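How a tree of models can combine the predictions of its children is sketched below in plain Python. This is an illustration of the general idea only: the class TreeNode, the operator names 'predict', 'sum', and 'average', and the use of simple callables as models are all assumptions made for the example and are not taken from this reference.

```python
class TreeNode:
    """Hypothetical model tree node; leaves hold a model, inner nodes combine children."""

    def __init__(self, name=None, children=None, operator=None, model=None):
        self.name = name
        self.children = children or []
        self.operator = operator
        self.model = model

    def predict(self, x):
        if self.operator == 'predict':
            # Leaf node: delegate to the wrapped model (here any callable)
            return self.model(x)
        values = [child.predict(x) for child in self.children]
        if self.operator == 'sum':
            return sum(values)
        if self.operator == 'average':
            return sum(values) / len(values)
        raise ValueError(f'unknown operator: {self.operator}')


# A baseline model plus a correction, summed at the root
baseline = TreeNode(name='baseline', operator='predict', model=lambda x: 2.0 * x)
correction = TreeNode(name='correction', operator='predict', model=lambda x: 0.5)
root = TreeNode(name='total', children=[baseline, correction], operator='sum')
print(root.predict(3.0))  # 6.5
```

Summing a baseline and a correction in this way is the shape of a delta-learning-style composition; averaging the children instead would give an ensemble.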
Interfaces
Interfaces to third-party software.
TorchANI
DeepMD-kit
GAP/QUIP
PhysNet
MACE
sGDML
Gaussian
Orca
DFT-D4
PySCF
Sparrow
xTB
MNDO
Simulations
!---------------------------------------------------------------------------!
! simulations: Module for simulations !
! Implementations by: Pavlo O. Dral !
!---------------------------------------------------------------------------!
Geomopt, freq, DMC
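- class mlatom.simulations.optimize_geometry(model=None, model_predict_kwargs={}, initial_molecule=None, molecule=None, ts=False, program=None, optimization_algorithm=None, maximum_number_of_steps=None, convergence_criterion_for_forces=None, working_directory=None, print_properties=None, dump_trajectory_interval=None, filename=None, format='json', **kwargs)[source]
Geometry optimization.
- Parameters:
model (mlatom.models.model or mlatom.models.methods) – any model or method that provides energies and forces.
initial_molecule (mlatom.data.molecule) – the molecule object to optimize.
ts (bool, optional) – whether to perform a transition state search. Currently only supported with the Gaussian, ASE, or geometric programs.
program (str, optional) – the program used for geometry optimization. Currently supports Gaussian, ASE, scipy, and PySCF.
optimization_algorithm (str, optional) – the optimization algorithm used in ASE. Default: LBFGS (ts=False), dimer (ts=True).
maximum_number_of_steps (int, optional) – the maximum number of steps for ASE, SciPy, and geometric. Default: 200.
convergence_criterion_for_forces (float, optional) – the force convergence criterion in ASE. Default: 0.02 eV/Angstrom.
working_directory (str, optional) – the working directory. Default: '.', i.e., the current directory.
constraints (dict, optional) – constraints for the geometry optimization. Currently only available with program=ASE and program=geometric. For program=ASE, the constraints follow the same conventions as in ASE: constraints={'bonds':[[target,[index0,index1]], ...],'angles':[[target,[index0,index1,index2]], ...],'dihedrals':[[target,[index0,index1,index2,index3]], ...]} (see the FixInternals class in ASE for more information). For program=geometric, provide the name of a constraints file; see Constrained optimization for the file format.
print_properties (None or str, optional) – the properties to print. Default: None. The available option is 'all'.
dump_trajectory_interval (int, optional) – dump the trajectory at every time step (1). Set to None to disable dumping (default).
filename (str, optional) – the file the dumped trajectory is saved to.
format (str, optional) – the format in which the dumped trajectory is saved.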
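The nested layout of the constraints dictionary for program=ASE is easy to get wrong by hand, so a small helper that builds and checks it may be useful. The helper below is a hypothetical convenience written for this example, not part of MLatom or ASE; the target values are placeholders, and their units follow whatever the ASE conventions require.

```python
def build_constraints(bonds=(), angles=(), dihedrals=()):
    """Build a constraints dict of the form
    {'bonds': [[target, [i, j]], ...], 'angles': [...], 'dihedrals': [...]}.

    Each entry is a (target, indices) pair; the number of atom indices is
    checked against the constraint type (2, 3, or 4).
    """
    expected = {'bonds': 2, 'angles': 3, 'dihedrals': 4}
    given = {'bonds': bonds, 'angles': angles, 'dihedrals': dihedrals}
    constraints = {}
    for key, entries in given.items():
        checked = []
        for target, indices in entries:
            if len(indices) != expected[key]:
                raise ValueError(f'{key} need {expected[key]} atom indices')
            checked.append([target, list(indices)])
        if checked:
            # Only constraint types that are actually used end up in the dict
            constraints[key] = checked
    return constraints


constraints = build_constraints(
    bonds=[(1.4, (0, 1))],
    dihedrals=[(180.0, (0, 1, 2, 3))],
)
print(constraints)
# {'bonds': [[1.4, [0, 1]]], 'dihedrals': [[180.0, [0, 1, 2, 3]]]}
```

The resulting dictionary has the shape documented for the constraints parameter above.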
Example:
    import mlatom as ml
    # Initialize molecule
    mol = ml.data.molecule()
    mol.read_from_xyz_file(filename='ethanol.xyz')
    # Initialize methods
    aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO')
    # Run geometry optimization
    geomopt = ml.simulations.optimize_geometry(model = aiqm1, initial_molecule=mol, program = 'ASE')
    # Get the optimized geometry, energy, and gradient
    optmol = geomopt.optimized_molecule
    geo = optmol.get_xyz_coordinates()
    energy = optmol.energy
    gradient = optmol.get_energy_gradients()
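- class mlatom.simulations.freq(model=None, model_predict_kwargs={}, molecule=None, program=None, ir=False, raman=False, normal_mode_normalization='mass deweighted normalized', anharmonic=False, anharmonic_kwargs={}, working_directory=None)[source]
Frequency analysis.
- Parameters:
model (mlatom.models.model or mlatom.models.methods) – any model or method that provides energies, forces, and Hessians.
molecule (mlatom.data.molecule) – the molecule object with the necessary information.
program (str, optional) – the program used for frequency analysis. pyscf and Gaussian are supported.
normal_mode_normalization (str, optional) – normal mode output scheme. It should be one of: mass weighted normalized, mass deweighted unnormalized, and mass deweighted normalized (default).
anharmonic (bool) – whether to run an anharmonic frequency calculation.
working_directory (str, optional) – working directory. Default: '.', i.e., the current directory.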
Example:
import mlatom as ml

# Initialize molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Initialize methods
aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO')
# Run frequency analysis
ml.simulations.freq(model=aiqm1, molecule=mol, program='ASE')
# Get frequencies
frequencies = mol.frequencies
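- class mlatom.simulations.thermochemistry(model=None, molecule=None, program=None, ir=False, raman=False, normal_mode_normalization='mass deweighted normalized')[source]
Calculation of thermochemical properties.
- Parameters:
model (mlatom.models.model or mlatom.models.methods) – any model or method that provides energies, forces, and Hessians.
molecule (mlatom.data.molecule) – the molecule object with the necessary information.
program (str) – the program used for the thermochemical properties calculation. Currently Gaussian and ASE are supported.
normal_mode_normalization (str, optional) – normal mode output scheme. It should be one of: mass weighted normalized, mass deweighted unnormalized, and mass deweighted normalized (default).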
Example:
import mlatom as ml

# Initialize molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Initialize methods
aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO')
# Run thermochemical properties calculation
ml.simulations.thermochemistry(model=aiqm1, molecule=mol, program='ASE')
# Get ZPE and heat of formation
ZPE = mol.ZPE
Hof = mol.DeltaHf298
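After the calculation, the molecule object carries the following thermochemical properties:
ZPE: zero-point energy
DeltaE2U: thermal correction to the energy (only available with Gaussian)
DeltaE2H: thermal correction to the enthalpy (only available with Gaussian)
DeltaE2G: thermal correction to the Gibbs free energy (only available with Gaussian)
U0: internal energy at 0 K
H0: enthalpy at 0 K
U: internal energy (only available with Gaussian)
H: enthalpy
G: Gibbs free energy
S: entropy (only available with Gaussian)
atomization_energy_0K
ZPE_exclusive_atomization_energy_0K
DeltaHf298: heat of formation at 298 K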
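- class mlatom.simulations.dmc(model: model, initial_molecule: molecule = None, initial_molecular_database: molecular_database = None, energy_scaling_factor: float = 1.0)[source]
Run diffusion Monte Carlo simulations with PyVibDMC.
- Parameters:
model (mlatom.models.model) – the potential energy surface model. Its energies should be in Hartree; otherwise, set energy_scaling_factor accordingly.
initial_molecule (mlatom.data.molecule) – the initial geometry of the walkers. Usually the energy-minimum geometry should be provided. By default, every coordinate is scaled by 1.01 to distort the geometry slightly.
energy_scaling_factor (float, optional) – factor by which the model's energy predictions are multiplied.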
- run(run_dir: str = 'DMC', weighting: str = 'discrete', number_of_walkers: int = 5000, number_of_timesteps: int = 10000, equilibration_steps: int = 500, dump_trajectory_interval: int = 500, dump_wavefunction_interval: int = 1000, descendant_weighting_steps: int = 300, time_step: float = 0.024188843265857, initialize: bool = False)[source]
Run the DMC simulation.
- Parameters:
run_dir (str) – folder for the output files.
weighting (str) – 'discrete' or 'continuous'. 'continuous' keeps the ensemble size constant.
number_of_walkers (int) – number of geometries (walkers) that explore the potential energy surface.
number_of_timesteps (int) – number of time steps the simulation runs for.
equilibration_steps (int) – number of equilibration steps.
dump_trajectory_interval (int) – interval at which the walker trajectories are dumped.
dump_wavefunction_interval (int) – interval at which the wavefunction is collected.
descendant_weighting_steps (int) – number of time steps used for descendant weighting of each wavefunction.
time_step (float) – length of each time step in femtoseconds (fs).
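The default time_step of 0.024188843265857 fs equals one atomic unit of time, which is convenient when relating the fs-based argument to step counts. The following is a small sketch of that conversion; the helper names are illustrative and not part of MLatom or PyVibDMC.

```python
AU_TIME_IN_FS = 0.024188843265857  # one atomic unit of time in femtoseconds


def total_propagation_time_fs(number_of_timesteps, time_step_fs=AU_TIME_IN_FS):
    """Total propagation length in fs covered by a given number of steps."""
    return number_of_timesteps * time_step_fs


def time_step_in_atomic_units(time_step_fs):
    """Convert a time step given in fs to atomic units of time."""
    return time_step_fs / AU_TIME_IN_FS


# Defaults of run(): 10000 steps of one atomic unit each
print(total_propagation_time_fs(10000))
print(time_step_in_atomic_units(0.024188843265857))
```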
- mlatom.simulations.numerical_gradients(molecule, model, displacement=1e-05, model_kwargs={}, return_molecular_database=False, nthreads=None)[source]
Calculate numerical gradients. Two-point numerical differentiation is used and the required single-point calculations are run in parallel.
- Parameters:
molecule (mlatom.data.molecule) – the molecule object.
model (mlatom.models.model or mlatom.models.methods) – any model or method which provides energies (takes molecule as an argument).
displacement (float, optional) – displacement of nuclear coordinates in Angstrom (default: 1e-5).
model_kwargs (dict, optional) – kwargs to be passed to model (except for molecule).
return_molecular_database (bool, optional) – whether to return the mlatom.data.molecular_database with the displaced geometries and energies (default: False).
nthreads (int, optional) – number of threads (default: None, using all threads it can find).
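To make the idea of two-point differentiation concrete, here is a minimal sketch that assumes the two points are the symmetric displacements x ± h (a central difference); MLatom's actual scheme may differ in detail. The energy function is a toy stand-in for a model, and all names are illustrative.

```python
def numerical_gradient(energy, coordinates, displacement=1e-5):
    """Central-difference gradient of energy(coordinates) for each coordinate."""
    gradient = []
    for i in range(len(coordinates)):
        plus = list(coordinates)
        minus = list(coordinates)
        plus[i] += displacement
        minus[i] -= displacement
        # two single-point evaluations per coordinate
        gradient.append((energy(plus) - energy(minus)) / (2.0 * displacement))
    return gradient


def toy_energy(x, k=2.0):
    """Harmonic well E = 0.5 * k * sum(x_i^2); the exact gradient is k * x_i."""
    return 0.5 * k * sum(xi * xi for xi in x)


grad = numerical_gradient(toy_energy, [0.1, -0.2, 0.3])
print(grad)
```

Each coordinate needs two independent single-point evaluations, which is why such calculations parallelize well.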
- mlatom.simulations.numerical_hessian(molecule, model, displacement=0.000529167, displacement4grads=1e-05, model_kwargs={})[source]
Calculate numerical Hessians. Two-point numerical differentiation is used and the required single-point calculations are run in parallel.
- Parameters:
molecule (mlatom.data.molecule) – the molecule object.
model (mlatom.models.model or mlatom.models.methods) – any model or method which provides energies (takes molecule as an argument).
displacement (float, optional) – displacement of nuclear coordinates in Angstrom (default: 5.29167e-4).
displacement4grads (float, optional) – displacement of nuclear coordinates in Angstrom (default: 1e-5) when calculating gradients.
model_kwargs (dict, optional) – kwargs to be passed to model (except for molecule).
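The two displacement parameters suggest a nested scheme: gradients are obtained numerically with the small step, and the Hessian is obtained by differencing those gradients with the larger step. The sketch below illustrates this under that assumption with a toy energy function; it is not MLatom's implementation, and the final symmetrization is an added illustrative choice.

```python
def central_gradient(energy, x, h):
    """Central-difference gradient with step h."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((energy(xp) - energy(xm)) / (2.0 * h))
    return grad


def numerical_hessian(energy, x, displacement=5.29167e-4, displacement4grads=1e-5):
    """Hessian from central differences of numerical gradients."""
    n = len(x)
    hessian = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += displacement
        xm[j] -= displacement
        grad_plus = central_gradient(energy, xp, displacement4grads)
        grad_minus = central_gradient(energy, xm, displacement4grads)
        for i in range(n):
            hessian[i][j] = (grad_plus[i] - grad_minus[i]) / (2.0 * displacement)
    # average with the transpose to remove small numerical asymmetries
    return [[0.5 * (hessian[i][j] + hessian[j][i]) for j in range(n)]
            for i in range(n)]


def toy_energy(x):
    """Quadratic form with exact Hessian [[2, 3], [3, 4]]."""
    return x[0] ** 2 + 3.0 * x[0] * x[1] + 2.0 * x[1] ** 2


hess = numerical_hessian(toy_energy, [0.3, -0.1])
print(hess)
```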
Initial conditions
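- mlatom.initial_conditions.generate_initial_conditions(molecule=None, generation_method=None, number_of_initial_conditions=1, file_with_initial_xyz_coordinates=None, file_with_initial_xyz_velocities=None, eliminate_angular_momentum=True, degrees_of_freedom=None, initial_temperature=None, initial_kinetic_energy=None, use_hessian=False, reaction_coordinate_momentum=True, filter_by_energy_window=False, window_filter_kwargs={}, random_seed=None)[source]
Generate initial conditions.
- Parameters:
molecule (data.molecule) – molecule with the necessary information.
generation_method (str) – initial condition generation method, see the table below.
number_of_initial_conditions (int) – number of initial conditions to generate, 1 by default.
file_with_initial_xyz_coordinates (str) – file with the initial XYZ coordinates, only valid for generation_method='user-defined'.
file_with_initial_xyz_velocities (str) – file with the initial XYZ velocities, only valid for generation_method='user-defined'.
eliminate_angular_momentum (bool) – remove angular momentum from the velocities, valid for generation_method='random' and generation_method='wigner'.
degrees_of_freedom (int) – degrees of freedom of the molecule; by default, the translational and rotational degrees of freedom are removed. If set to a negative value, that amount is subtracted from 3N at run time, where N is the number of atoms in the molecule.
initial_temperature (float) – initial temperature in Kelvin, controls the random initial velocities.
initial_kinetic_energy (float) – initial kinetic energy in Hartree, controls the random initial velocities.
random_seed (int) – random seed for the numpy random number generator (do not use it unless you want to obtain the same results every time).
filter_by_energy_window (bool) – filter by excitation energy window.
window_filter_kwargs (dict) – keyword arguments for filtering by the energy window, see the table below.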
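The convention for negative degrees_of_freedom values can be illustrated with a short sketch. It assumes a nonlinear molecule for the default (3N - 6); resolve_degrees_of_freedom is a hypothetical helper, not an MLatom function.

```python
def resolve_degrees_of_freedom(number_of_atoms, degrees_of_freedom=None):
    """Turn the user-facing degrees_of_freedom argument into an absolute count."""
    if degrees_of_freedom is None:
        # default: remove 3 translations and 3 rotations (nonlinear molecule assumed)
        return 3 * number_of_atoms - 6
    if degrees_of_freedom < 0:
        # negative value: subtract its magnitude from 3N
        return 3 * number_of_atoms + degrees_of_freedom
    return degrees_of_freedom


# Ethanol has 9 atoms
print(resolve_degrees_of_freedom(9))
print(resolve_degrees_of_freedom(9, -6))
print(resolve_degrees_of_freedom(9, 27))
```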
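Generation method – Description
'user-defined' – (default) use user-defined initial conditions
'random' – generate random velocities
'maxwell-boltzmann' – randomly generate initial velocities from the Maxwell-Boltzmann distribution
'wigner' – use Wigner sampling as implemented in Newton-X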
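As an illustration of what drawing velocities from the Maxwell-Boltzmann distribution means, the sketch below samples each Cartesian velocity component from a Gaussian with standard deviation sqrt(kB*T/m). It works in SI units with illustrative function names and is not MLatom's implementation (which, for instance, can also remove angular momentum).

```python
import math
import random

KB = 1.380649e-23        # Boltzmann constant in J/K
AMU = 1.66053906660e-27  # atomic mass unit in kg


def maxwell_boltzmann_velocities(masses_amu, temperature, rng):
    """Draw one Cartesian velocity vector (m/s) per atom."""
    velocities = []
    for mass in masses_amu:
        sigma = math.sqrt(KB * temperature / (mass * AMU))
        velocities.append([rng.gauss(0.0, sigma) for _ in range(3)])
    return velocities


def kinetic_temperature(masses_amu, velocities):
    """Instantaneous temperature from the kinetic energy, using 3N degrees of freedom."""
    twice_ekin = sum(m * AMU * sum(v * v for v in vel)
                     for m, vel in zip(masses_amu, velocities))
    return twice_ekin / (3 * len(masses_amu) * KB)


rng = random.Random(0)
masses = [12.011] * 20000  # many carbon atoms, so the sample average is well converged
vels = maxwell_boltzmann_velocities(masses, 300.0, rng)
temperature_check = kinetic_temperature(masses, vels)
print(temperature_check)
```

For a large sample, the kinetic temperature recovered from the velocities should be close to the requested 300 K.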
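window_filter_kwargs – Description
model – a model or method that can calculate excitation energies and oscillator strengths
model_predict_kwargs – keyword arguments for the above model, typically nstates, which specifies how many states to calculate
target_excitation_energy (float) – in eV
window_half_width (float) – in eV
random_seed (int) – random seed for the numpy random number generator (do not use it unless you want to obtain the same results every time)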
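The role of target_excitation_energy and window_half_width can be sketched as a simple window test. This is a deliberately simplified, deterministic version: it assumes excitation energies are stored in Hartree and only checks whether any of them falls inside the window, whereas the actual filter may additionally weight by oscillator strength and draw random numbers (hence random_seed). All names and sample values are illustrative.

```python
HARTREE_TO_EV = 27.211386245988


def in_energy_window(excitation_energies_hartree, target_excitation_energy,
                     window_half_width):
    """True if any excitation energy (converted to eV) lies inside the window."""
    lower = target_excitation_energy - window_half_width
    upper = target_excitation_energy + window_half_width
    return any(lower <= e * HARTREE_TO_EV <= upper
               for e in excitation_energies_hartree)


# Made-up excitation energies (Hartree) for two sampled geometries
samples = {'geom_a': [0.2095, 0.2500], 'geom_b': [0.1800, 0.2300]}
kept = [name for name, energies in samples.items()
        if in_energy_window(energies, 5.7, 0.1)]
print(kept)
```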
- Returns:
A molecular database (ml.data.molecular_database) with number_of_initial_conditions initial conditions
Example:
# Use user-defined initial conditions
init_cond_db = ml.generate_initial_conditions(molecule = mol,
                                              generation_method = 'user-defined',
                                              file_with_initial_xyz_coordinates = 'ethanol.xyz',
                                              file_with_initial_xyz_velocities = 'ethanol.vxyz',
                                              number_of_initial_conditions = 1)
# Generate random velocities
init_cond_db = ml.generate_initial_conditions(molecule = mol,
                                              generation_method = 'random',
                                              initial_temperature = 300,
                                              number_of_initial_conditions = 1)
# Use Wigner sampling
init_cond_db = ml.generate_initial_conditions(molecule = mol,
                                              generation_method = 'wigner',
                                              number_of_initial_conditions = 1)
# Sample with filtering by excitation energy window. Requires the model for calculating excitation energies and oscillator strengths.
model = ml.models.methods(method='AIQM1')
model_predict_kwargs={'nstates':9} # requests calculation of 9 electronic states
window_filter_kwargs={'model':model,
                      'model_predict_kwargs':model_predict_kwargs,
                      'target_excitation_energy':5.7, # eV
                      'window_half_width':0.1, # eV
                      }
init_cond_db = ml.generate_initial_conditions(molecule=mol,
                                              generation_method='wigner',
                                              number_of_initial_conditions=5,
                                              initial_temperature=0,
                                              random_seed=0,
                                              use_hessian=False,
                                              filter_by_energy_window=True,
                                              window_filter_kwargs=window_filter_kwargs)
Note
Use ml.models.methods.predict(molecule=mol, calculate_hessian=True) to obtain the Hessian matrix.
Molecular dynamics
!---------------------------------------------------------------------------!
! md: Module for molecular dynamics !
! Implementations by: Yi-Fan Hou & Pavlo O. Dral !
!---------------------------------------------------------------------------!
- class mlatom.md.md(model=None, model_predict_kwargs={}, molecule_with_initial_conditions=None, molecule=None, ensemble='NVE', thermostat=None, time_step=0.1, maximum_propagation_time=1000, dump_trajectory_interval=None, filename=None, format='h5md', stop_function=None, stop_function_kwargs=None)[source]
Molecular dynamics
- Parameters:
model (mlatom.models.model or mlatom.models.methods) – any model or method that provides energies and forces.
molecule_with_initial_conditions (data.molecule) – the molecule with initial conditions.
ensemble (str, optional) – which ensemble to use.
thermostat (thermostat.Thermostat) – the thermostat applied to the system.
time_step (float) – time step in femtoseconds.
maximum_propagation_time (float) – maximum simulation time in femtoseconds.
dump_trajectory_interval (int, optional) – interval at which the trajectory is dumped. Set to None to disable dumping.
filename (str, optional) – the file that saves the dumped trajectory.
format (str, optional) – format in which the dumped trajectory is saved.
stop_function (any, optional) – user-defined function that stops the MD simulation before maximum_propagation_time.
stop_function_kwargs (Dict, optional) – kwargs of stop_function.
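Ensemble – Description
'NVE' – (default) microcanonical (NVE) ensemble
'NVT' – canonical (NVT) ensemble
Thermostat – Description
ml.md.Andersen_thermostat – Andersen thermostat
ml.md.Nose_Hoover_thermostat – Nose-Hoover thermostat
None – (default) no thermostat is applied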
For theoretical details, see and cite original paper.
Example:
import mlatom as ml

# Initialize molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Initialize methods
aiqm1 = ml.models.methods(method='AIQM1')
# User-defined initial condition
init_cond_db = ml.generate_initial_conditions(molecule = mol,
                                              generation_method = 'user-defined',
                                              file_with_initial_xyz_coordinates = 'ethanol.xyz',
                                              file_with_initial_xyz_velocities = 'ethanol.vxyz')
init_mol = init_cond_db.molecules[0]
# Initialize thermostat
nose_hoover = ml.md.Nose_Hoover_thermostat(temperature=300,molecule=init_mol,degrees_of_freedom=-6)
# Run dynamics
dyn = ml.md(model=aiqm1,
            molecule_with_initial_conditions = init_mol,
            ensemble='NVT',
            thermostat=nose_hoover,
            time_step=0.5,
            maximum_propagation_time = 10.0)
# Dump trajectory
traj = dyn.molecular_trajectory
traj.dump(filename='traj', format='plain_text')
traj.dump(filename='traj.h5', format='h5md')
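Note
The trajectory is saved in ml.md.molecular_trajectory, which is a ml.data.molecular_trajectory class.
Warning
In MLatom, the energy unit is Hartree and the distance unit is Angstrom. Make sure that the units in your model are consistent.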
- class Andersen_thermostat(**kwargs)
Andersen thermostat object
- Parameters:
gamma (float) – collision rate in fs^{-1}, 0.2 by default.
temperature (float) – system temperature in Kelvin, 300 by default.
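The meaning of the collision rate gamma can be illustrated with a textbook Andersen step: during each time step, every atom collides with the heat bath with probability gamma * time_step and, if it does, receives a fresh Maxwell-Boltzmann velocity. The sketch below works in SI units for velocities, uses illustrative names, and is not MLatom's implementation.

```python
import math
import random

KB = 1.380649e-23        # Boltzmann constant in J/K
AMU = 1.66053906660e-27  # atomic mass unit in kg


def andersen_collisions(velocities, masses_amu, temperature=300.0, gamma=0.2,
                        time_step=0.5, rng=random):
    """Resample each atom's velocity with probability gamma * time_step.

    gamma is in fs^-1 and time_step in fs, so their product is dimensionless.
    Returns the new velocities (m/s) and the number of collisions.
    """
    probability = min(1.0, gamma * time_step)
    new_velocities = []
    n_collisions = 0
    for vel, mass in zip(velocities, masses_amu):
        if rng.random() < probability:
            sigma = math.sqrt(KB * temperature / (mass * AMU))
            vel = [rng.gauss(0.0, sigma) for _ in range(3)]
            n_collisions += 1
        new_velocities.append(list(vel))
    return new_velocities, n_collisions


rng = random.Random(1)
start = [[0.0, 0.0, 0.0] for _ in range(1000)]
masses = [12.011] * 1000
new_vels, n_hit = andersen_collisions(start, masses, rng=rng)
print(n_hit)  # roughly gamma * time_step * 1000 atoms are resampled
```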
- class Nose_Hoover_thermostat(**kwargs)
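Nose-Hoover thermostat object
- Parameters:
nose_hoover_chain_length (int) – Nose-Hoover chain length, should be positive, 3 by default.
multiple_time_step (int) – multiple time step, should be positive, 3 by default.
number_of_yoshida_suzuki_steps (int) – number of Yoshida-Suzuki steps, can be any of (1, 3, 5, 7), 7 by default.
nose_hoover_chain_frequency (float) – Nose-Hoover chain frequency in fs^{-1}, 0.0625 by default; it should be comparable to the frequency you want to equilibrate.
temperature (float) – system temperature in Kelvin, 300 by default.
molecule (data.molecule) – the molecule to be equilibrated.
degrees_of_freedom – degrees of freedom of the system.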
Surface-hopping dynamics
!---------------------------------------------------------------------------!
! namd: Module for nonadiabatic molecular dynamics !
! Implementations by: Lina Zhang & Pavlo O. Dral !
!---------------------------------------------------------------------------!
- class mlatom.namd.surface_hopping_md(model=None, model_predict_kwargs={}, molecule_with_initial_conditions=None, molecule=None, ensemble='NVE', thermostat=None, time_step=0.1, maximum_propagation_time=100, dump_trajectory_interval=None, filename=None, format='h5md', stop_function=None, stop_function_kwargs=None, hopping_algorithm='LZBL', nstates=None, initial_state=None, random_seed=<function generate_random_seed>, prevent_back_hop=False, reduce_memory_usage=False, rescale_velocity_direction='along velocities', reduce_kinetic_energy=False)[source]
Surface-hopping molecular dynamics
- Parameters:
model (mlatom.models.model or mlatom.models.methods) – any model or method that provides energies and forces.
model_predict_kwargs (Dict, optional) – keyword arguments for the model prediction.
molecule_with_initial_conditions (data.molecule) – the molecule with initial conditions.
molecule (data.molecule) – works the same as molecule_with_initial_conditions.
ensemble (str, optional) – which ensemble to use.
thermostat (thermostat.Thermostat) – the thermostat applied to the system.
time_step (float) – time step in femtoseconds.
maximum_propagation_time (float) – maximum simulation time in femtoseconds.
dump_trajectory_interval (int, optional) – interval at which the trajectory is dumped. Set to None to disable dumping.
filename (str, optional) – the file that saves the dumped trajectory.
format (str, optional) – format in which the dumped trajectory is saved.
stop_function (any, optional) – user-defined function that stops the MD simulation before maximum_propagation_time.
stop_function_kwargs (Dict, optional) – kwargs of stop_function.
hopping_algorithm (str, optional) – surface-hopping algorithm.
nstates (int) – number of states.
initial_state (int) – initial state.
random_seed (int) – random seed.
prevent_back_hop (bool, optional) – whether to prevent back hopping.
rescale_velocity_direction (string, optional) – direction along which the velocities are rescaled.
reduce_kinetic_energy (bool, optional) – whether to reduce the kinetic energy.
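Ensemble – Description
'NVE' – (default) microcanonical (NVE) ensemble
'NVT' – canonical (NVT) ensemble
Thermostat – Description
ml.md.Andersen_thermostat – Andersen thermostat
ml.md.Nose_Hoover_thermostat – Nose-Hoover thermostat
None – (default) no thermostat is applied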
For theoretical details, see and cite the original paper (to be submitted).
Lina Zhang, Sebastian Pios, Mikołaj Martyka, Fuchun Ge, Yi-Fan Hou, Yuxinxin Chen, Joanna Jankowska, Lipeng Chen, Mario Barbatti, Pavlo O. Dral. MLatom software ecosystem for surface hopping dynamics in Python with quantum mechanical and machine learning methods. 2024, to be submitted. Preprint on arXiv: https://arxiv.org/abs/2404.06189.
Example:
# Propagate multiple LZBL surface-hopping trajectories in parallel
# .. setup dynamics calculations
namd_kwargs = {
    'model': aiqm1,
    'time_step': 0.25,
    'maximum_propagation_time': 5,
    'hopping_algorithm': 'LZBL',
    'nstates': 3,
    'initial_state': 2,
}
# .. run trajectories in parallel
dyns = ml.simulations.run_in_parallel(molecular_database=init_cond_db,
                                      task=ml.namd.surface_hopping_md,
                                      task_kwargs=namd_kwargs,
                                      create_and_keep_temp_directories=True)
trajs = [d.molecular_trajectory for d in dyns]
# Dump the trajectories
itraj=0
for traj in trajs:
    itraj+=1
    traj.dump(filename=f"traj{itraj}.h5",format='h5md')
# Analyze the result of trajectories and make the population plot
ml.namd.analyze_trajs(trajectories=trajs, maximum_propagation_time=5)
ml.namd.plot_population(trajectories=trajs, time_step=0.25,
                        max_propagation_time=5, nstates=3, filename='pop.png', pop_filename='pop.txt')
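Note
The trajectory is saved in ml.md.molecular_trajectory, which is a ml.data.molecular_trajectory class.
Warning
In MLatom, the energy unit is Hartree and the distance unit is Angstrom. Make sure that the units in your model are consistent.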
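The default hopping_algorithm='LZBL' refers to the Landau-Zener-Belyaev-Lebedev approach, in which the hopping probability at a local minimum of the adiabatic energy gap Z is P = exp(-(π/2ħ)·sqrt(Z³ / Z̈)), with Z̈ the second time derivative of the gap. The sketch below evaluates this formula from three consecutive gaps using a finite-difference second derivative; it assumes atomic units (ħ = 1) and non-negative gaps, uses an illustrative function name, and is not MLatom's implementation.

```python
import math


def lzbl_hopping_probability(gap_prev, gap_curr, gap_next, time_step):
    """LZBL probability at the middle of three consecutive energy gaps.

    Gaps are absolute energy differences in Hartree and time_step is in
    atomic units of time (hbar = 1). Returns 0.0 when the middle point is
    not a local minimum of the gap.
    """
    if not (gap_curr < gap_prev and gap_curr < gap_next):
        return 0.0
    # finite-difference second time derivative of the gap (positive at a minimum)
    second_derivative = (gap_prev - 2.0 * gap_curr + gap_next) / time_step ** 2
    return math.exp(-0.5 * math.pi * math.sqrt(gap_curr ** 3 / second_derivative))


probability = lzbl_hopping_probability(0.02, 0.01, 0.02, time_step=10.0)
print(probability)
```

In a trajectory, a hop would be attempted when a uniform random number in [0, 1) is smaller than this probability.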
Spectra
!---------------------------------------------------------------------------!
! spectra: Module for working with spectra !
! Implementations by: Yi-Fan Hou, Fuchun Ge, Bao-Xin Xue, Pavlo O. Dral !
!---------------------------------------------------------------------------!
- class mlatom.spectra.uvvis(x=None, y=None, wavelengths_nm=None, energies_eV=None, molar_absorbance=None, cross_section=None, meta_data=None)[source]
UV/Vis absorption spectrum class
- Parameters:
x (float, np.ndarray) – range of spectra (e.g., wavelength in nm, recommended, or energies in eV)
y (float, np.ndarray) – user-provided intensities (e.g., molar absorbance, recommended, or cross section)
It is better to provide spectrum information explicitly so that the correct conversions to different units are done:
wavelengths_nm (float, np.ndarray) – range of wavelengths in nm
energies_eV (float, np.ndarray) – range of energies in eV
molar_absorbance (float, np.ndarray) – molar absorbance (extinction coefficients) in M^-1 cm^-1
cross_section (float, np.ndarray) – cross section in A^2/molecule
Also, the user is encouraged to provide the meta-data:
meta_data (str) – meta data such as solvent, references, etc.
Example
uvvis = mlatom.spectra.uvvis(
    wavelengths_nm = np.array(…),
    molar_absorbance = np.array(…),
    meta_data = 'solvent: benzene, reference: DOI…'
    )
# spectral properties can be accessed as:
# uvvis.x is equivalent to what is provided by the user, e.g., wavelengths_nm or energies_eV
# uvvis.y is equivalent to what is provided by the user, e.g., molar_absorbance or cross_section
# wavelength range (float, np.ndarray) in nm
uvvis.wavelengths_nm
# molar absorbance (extinction coefficients) (float, np.ndarray) in M^-1 cm^-1
uvvis.molar_absorbance
# energies corresponding to the wavelength range (float, np.ndarray), in eV
uvvis.energies_eV
# absorption cross-section (float, np.ndarray) in A^2/molecule
uvvis.cross_section
- classmethod spc(molecule=None, band_width=0.3, shift=0.0, refractive_index=1.0)[source]
Single-point convolution (SPC) approach for obtaining a UV/vis spectrum by calculating the extinction coefficient (and absorption cross section) from single-point excited-state simulations for a single geometry. The implementation follows http://doi.org/10.1007/s00894-020-04355-y
- Parameters:
molecule (mlatom.data.molecule) – molecule object with excitation_energies (in Hartree, not eV!) and oscillator_strengths
wavelengths_nm (float, np.ndarray) – range of wavelengths in nm (default: np.arange(400, 800))
band_width (float) – band width in eV (default: 0.3 eV)
shift (float) – shift of excitation energies, eV (default: 0 eV)
refractive_index (float) – refractive index (default: 1)
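Because the excitation energies are expected in Hartree while band widths and shifts are given in eV and the grid in nm, it helps to keep the unit conversions explicit. The following sketch uses standard conversion constants; the helper names are illustrative and not part of MLatom.

```python
HARTREE_TO_EV = 27.211386245988  # 1 Hartree in eV
HC_EV_NM = 1239.841984           # h*c in eV*nm


def hartree_to_ev(energy_hartree):
    """Convert an energy from Hartree to eV."""
    return energy_hartree * HARTREE_TO_EV


def ev_to_nm(energy_ev):
    """Convert a photon energy in eV to the corresponding wavelength in nm."""
    return HC_EV_NM / energy_ev


excitation_ev = hartree_to_ev(0.2)
excitation_nm = ev_to_nm(excitation_ev)
print(excitation_ev, excitation_nm)
```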
Example
uvvis = mlatom.spectra.uvvis.spc(
    molecule=mol,
    wavelengths_nm=np.arange(100, 200),
    band_width=0.3)
# spectral properties can be accessed as:
# uvvis.x is equivalent to uvvis.wavelengths_nm
# uvvis.y is equivalent to uvvis.molar_absorbance
# wavelength range (float, np.ndarray) in nm
uvvis.wavelengths_nm
# molar absorbance (extinction coefficients) (float, np.ndarray) in M^-1 cm^-1
uvvis.molar_absorbance
# energies corresponding to the wavelength range (float, np.ndarray), in eV
uvvis.energies_eV
# absorption cross-section (float, np.ndarray) in A^2/molecule
uvvis.cross_section
# quick plot
uvvis.plot(filename='uvvis.png')
- classmethod spc_broadening_func(DeltaE, ff, wavelength_range, band_width, refractive_index=1, shift=0.0)[source]
Spectrum convolution function
- Parameters:
band_width (float) – width of band
DeltaE (float) – vertical excitation energy, eV
ff (float) – oscillator strength
wavelength_range (float, np.ndarray) – range of wavelengths
refractive_index (float) – refractive index
shift (float) – peak shift
- Returns:
extinction coefficients in M^-1 cm^-1
- Return type:
(float, np.ndarray)
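To illustrate what a convolution function of this kind does, the sketch below places a Gaussian band of a given width at the (shifted) excitation energy and evaluates it on a wavelength grid. It returns intensities in arbitrary units, deliberately omitting the physical prefactor and refractive-index dependence that yield extinction coefficients in M^-1 cm^-1, so it should not be read as the formula used in MLatom; gaussian_band is an illustrative name.

```python
import math

HC_EV_NM = 1239.841984  # h*c in eV*nm


def gaussian_band(delta_e, ff, wavelengths_nm, band_width=0.3, shift=0.0):
    """Gaussian line shape on a wavelength grid, in arbitrary units."""
    center = delta_e + shift  # band center in eV
    band = []
    for wavelength in wavelengths_nm:
        energy = HC_EV_NM / wavelength  # grid point converted to eV
        band.append(ff * math.exp(-((energy - center) / band_width) ** 2))
    return band


wavelengths = list(range(150, 401))
band = gaussian_band(delta_e=5.0, ff=0.1, wavelengths_nm=wavelengths)
peak_wavelength = wavelengths[band.index(max(band))]
print(peak_wavelength)
```

The band maximum lands at the grid point closest to the wavelength that corresponds to the excitation energy.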
- classmethod nea(molecular_database=None, wavelengths_nm=None, broadening_width=0.05)[source]
Nuclear ensemble approach (NEA) for obtaining UV/vis spectrum. Implementation follows Theor. Chem. Acc. 2012, 131, 1237.
- Parameters:
molecular_database (mlatom.data.molecular_database) – molecular_database object with molecules containing excitation_energies (in Hartree, not eV!) and oscillator_strengths
wavelengths_nm (float, np.ndarray) – range of wavelengths in nm (default: determined automatically)
broadening_width (float) – broadening factor in eV (default: 0.05 eV)
Example
uvvis = mlatom.spectra.uvvis.nea(molecular_database=db,
                                 wavelengths_nm=wavelengths_nm,
                                 broadening_width=0.02)
# spectral properties can be accessed as:
# uvvis.x is equivalent to uvvis.wavelengths_nm
# uvvis.y is equivalent to uvvis.molar_absorbance
# wavelength range (float, np.ndarray) in nm
uvvis.wavelengths_nm
# molar absorbance (extinction coefficients) (float, np.ndarray) in M^-1 cm^-1
uvvis.molar_absorbance
# energies corresponding to the wavelength range (float, np.ndarray), in eV
uvvis.energies_eV
# absorption cross-section (float, np.ndarray) in A^2/molecule
uvvis.cross_section
# quick plot
uvvis.plot(filename='uvvis.png')
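The essence of the nuclear ensemble approach is averaging narrow bands over many sampled geometries. The sketch below shows only that averaging step, in arbitrary units and on an energy grid; it omits the physical prefactors of the published NEA cross-section formula and is not MLatom's implementation. The ensemble is represented by plain tuples standing in for a molecular database.

```python
import math

HARTREE_TO_EV = 27.211386245988


def nea_spectrum(ensemble, energies_ev, broadening_width=0.05):
    """Average of Gaussian-broadened transitions over an ensemble of geometries.

    ensemble: list of (excitation_energies_hartree, oscillator_strengths),
    one entry per geometry.
    """
    spectrum = [0.0] * len(energies_ev)
    for excitation_energies, oscillator_strengths in ensemble:
        for delta_e, ff in zip(excitation_energies, oscillator_strengths):
            center = delta_e * HARTREE_TO_EV  # Hartree -> eV
            for i, energy in enumerate(energies_ev):
                spectrum[i] += ff * math.exp(-((energy - center) / broadening_width) ** 2)
    return [value / len(ensemble) for value in spectrum]


energies = [5.0 + 0.01 * i for i in range(101)]   # 5.0 to 6.0 eV
ensemble = [([0.20], [0.2]), ([0.21], [0.2])]     # two geometries, one state each
spectrum = nea_spectrum(ensemble, energies)
print(max(spectrum))
```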
Active learning
Initial data sampling
initdata_sampler can be:
'wigner'
'harmonic-quantum-boltzmann'
User-defined ML models
The user has the flexibility to create their own ML model class for AL. Minimum requirements for such a class:
it must have the usual train and predict functions.
the train function must accept the molecular_database parameter.
the predict function must accept the molecule and/or molecular_database parameters.
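The interface requirements can be met by a very small class. The following toy sketch assumes only that iterating over a molecular database yields molecule objects with an energy attribute; MeanEnergyModel and FakeMolecule are hypothetical stand-ins for illustration, with a plain list playing the role of the database.

```python
class MeanEnergyModel:
    """Toy model with the required interface: predicts the mean training energy."""

    def __init__(self):
        self.mean_energy = None

    def train(self, molecular_database=None):
        energies = [mol.energy for mol in molecular_database]
        self.mean_energy = sum(energies) / len(energies)

    def predict(self, molecule=None, molecular_database=None):
        molecules = [] if molecular_database is None else list(molecular_database)
        if molecule is not None:
            molecules.append(molecule)
        for mol in molecules:
            mol.energy = self.mean_energy
            mol.uncertain = False  # a real model would set this from its uncertainty


class FakeMolecule:
    """Stand-in for a molecule object; carries only the attributes used here."""

    def __init__(self, energy=None):
        self.energy = energy


model = MeanEnergyModel()
model.train(molecular_database=[FakeMolecule(-1.0), FakeMolecule(-3.0)])
new_mol = FakeMolecule()
model.predict(molecule=new_mol)
print(new_mol.energy)
```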
A realistic, fully fledged example of a usable ML model class is shown below (it is what we use in the al routine):
import numpy as np
import mlatom as ml

class my_model():
    def __init__(self, al_info = {}, model_file=None, device=None, verbose=False):
        import torch
        if device is None:
            device = 'cuda' if torch.cuda.is_available() else 'cpu'
        if model_file is None:
            if 'mlmodel_file' in al_info.keys():
                self.model_file = al_info['mlmodel_file']
            else:
                self.model_file = 'mlmodel'
                al_info['mlmodel_file'] = self.model_file
        else:
            self.model_file = model_file
            al_info['mlmodel_file'] = self.model_file
        if 'main_mlmodel_file' in al_info.keys():
            main_mlmodel_file = al_info['main_mlmodel_file']
        else:
            main_mlmodel_file = f'{self.model_file}.pt'
            al_info['main_mlmodel_file'] = main_mlmodel_file
        if 'aux_mlmodel_file' in al_info.keys():
            aux_mlmodel_file = al_info['aux_mlmodel_file']
        else:
            aux_mlmodel_file = f'aux_{self.model_file}.pt'
            al_info['aux_mlmodel_file'] = aux_mlmodel_file
        self.device = device
        self.verbose = verbose
        self.main_model = ml.models.ani(model_file=main_mlmodel_file,device=device,verbose=verbose)
        self.aux_model = ml.models.ani(model_file=aux_mlmodel_file,device=device,verbose=verbose)

    def train(self, molecular_database=None, al_info={}):
        if 'working_directory' in al_info.keys():
            workdir = al_info['working_directory']
            self.main_model.model_file = f'{workdir}/{self.model_file}.pt'
            self.aux_model.model_file = f'{workdir}/aux_{self.model_file}.pt'
        validation_set_fraction = 0.1
        [subtraindb, valdb] = molecular_database.split(number_of_splits=2, fraction_of_points_in_splits=[1-validation_set_fraction, validation_set_fraction], sampling='random')
        # train the model on energies and gradients
        self.main_model = ml.models.ani(model_file=self.main_model.model_file,device=self.device,verbose=self.verbose)
        self.main_model.train(molecular_database=subtraindb,validation_molecular_database=valdb,property_to_learn='energy',xyz_derivative_property_to_learn='energy_gradients')
        # train the auxiliary model only on energies
        self.aux_model = ml.models.ani(model_file=self.aux_model.model_file,device=self.device,verbose=self.verbose)
        self.aux_model.train(molecular_database=subtraindb,validation_molecular_database=valdb,property_to_learn='energy')
        if not 'uq_threshold' in al_info.keys():
            self.predict(molecular_database=valdb)
            uqs = valdb.get_property('uq')
            # assumes the helper lives in mlatom's stats module (the original calls it as stats.…)
            al_info['uq_threshold'] = np.median(uqs) + 3*ml.stats.calc_median_absolute_deviation(uqs)
        self.uq_threshold = al_info['uq_threshold']
        # if the models were trained successfully, let's update al info where we can find them
        al_info['main_mlmodel_file'] = self.main_model.model_file
        al_info['aux_mlmodel_file'] = self.aux_model.model_file

    def predict(self, molecule=None, molecular_database=None):
        # predict energies and gradients with the main model
        self.main_model.predict(molecule=molecule, molecular_database=molecular_database,property_to_predict='energy',xyz_derivative_property_to_predict='energy_gradients')
        # predict energies with the auxiliary model
        self.aux_model.predict(molecule=molecule, molecular_database=molecular_database,property_to_predict='aux_energy')
        # calculate uncertainties
        moldb = molecular_database
        if moldb is None:
            # a single molecule was passed: wrap it so the loop below covers it
            moldb = ml.molecular_database()
            moldb.molecules.append(molecule)
        # the threshold is not set yet while train() calibrates it on the validation set
        threshold = getattr(self, 'uq_threshold', None)
        for mol in moldb:
            mol.uq = abs(mol.energy - mol.aux_energy)
            mol.uncertain = threshold is not None and mol.uq > threshold

    # These are useful in some internal al routines, e.g., when we want to make predictions in parallel (if nthreads is not set properly, it may slow down al significantly!)
    @property
    def nthreads(self):
        return self.main_model.nthreads

    @nthreads.setter
    def nthreads(self, value):
        self.main_model.nthreads = value
        self.aux_model.nthreads = value
ml.al(
    ...
    ml_model = my_model,
    # do not use my_model(...), if you want to pass any arguments, use ml_model_kwargs:
    ml_model_kwargs = {...}, # 'al_info' is unnecessary to include, it will be added automatically. If you supply 'al_info' key, it will overwrite the default one so use if you know what you are doing.
    ...
)
As you can see, it is helpful (but not required) if the __init__ and train functions of the ML model class also accept the al_info parameter, which can be used to pass information during active learning from one routine to another.
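The uncertainty threshold used in the model class above is the median of the validation-set uncertainties plus three median absolute deviations (MAD). The sketch below reproduces that rule with the standard library on made-up numbers; it uses the plain, unscaled MAD, whereas the MLatom helper may apply a different scaling constant, and the function names here are illustrative.

```python
import statistics


def median_absolute_deviation(values):
    """Unscaled MAD: median of absolute deviations from the median."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])


def uq_threshold(uqs, n_mad=3.0):
    """Threshold above which a prediction is flagged as uncertain."""
    return statistics.median(uqs) + n_mad * median_absolute_deviation(uqs)


# Made-up uncertainties: |E_main - E_aux| for five validation points
uqs = [0.001, 0.002, 0.003, 0.004, 0.050]
threshold = uq_threshold(uqs)
uncertain = [uq > threshold for uq in uqs]
print(threshold, uncertain)
```

Because both the median and the MAD are robust statistics, a single large outlier such as the last value barely moves the threshold and is itself flagged as uncertain.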
Sampler
Here is a realistic example of the sampler function used in the physics-informed active learning:
def my_sampler(al_info={}, ml_model=None, initcond_sampler=None, initcond_sampler_kwargs={}, maximum_propagation_time=1000, time_step=0.1, ensemble='NVE', thermostat=None, dump_trajs=False, dump_trajectory_interval=None, stop_function=None, batch_parallelization=True):
    moldb2label = ml.data.molecular_database()
    # generate initial conditions
    if type(initcond_sampler) == str:
        if initcond_sampler.casefold() in ['wigner', 'harmonic-quantum-boltzmann']:
            # store the method name before the variable is rebound to the generator function
            initcond_sampler_kwargs['generation_method'] = initcond_sampler
            initcond_sampler = ml.generate_initial_conditions
    import inspect
    # inspect.getargspec was removed in Python 3.11; getfullargspec replaces it
    args = inspect.getfullargspec(initcond_sampler).args
    # Do we need al_info below?
    if 'al_info' in args:
        initial_molecular_database = initcond_sampler(al_info=al_info, **initcond_sampler_kwargs)
    else:
        initial_molecular_database = initcond_sampler(**initcond_sampler_kwargs)
    # run MD in parallel to collect uncertain points
    if batch_parallelization: # Faster way to propagate many trajs with ML
        dyn = ml.md_parallel(model=ml_model,
                             molecular_database=initial_molecular_database,
                             ensemble=ensemble,
                             thermostat=thermostat,
                             time_step=time_step,
                             maximum_propagation_time=maximum_propagation_time,
                             dump_trajectory_interval=dump_trajectory_interval,
                             stop_function=stop_function)
        trajs = dyn.molecular_trajectory
        for itraj in range(len(trajs.steps[0])):
            print(f"Trajectory {itraj} number of steps: {trajs.traj_len[itraj]}")
            if trajs.steps[trajs.traj_len[itraj]][itraj].uncertain:
                print(f'Adding molecule from trajectory {itraj} at time {trajs.traj_len[itraj]*time_step} fs')
                moldb2label.molecules.append(trajs.steps[trajs.traj_len[itraj]][itraj])
            # Dump traj
            if dump_trajs:
                import os
                traj = ml.data.molecular_trajectory()
                for istep in range(trajs.traj_len[itraj]+1):
                    step = ml.data.molecular_trajectory_step()
                    step.step = istep
                    step.time = istep * time_step
                    step.molecule = trajs.steps[istep][itraj]
                    traj.steps.append(step)
                if 'working_directory' in al_info.keys():
                    # double quotes outside so the nested key quotes parse on Python < 3.12
                    dirname = f"{al_info['working_directory']}/trajs"
                else:
                    dirname = 'trajs'
                if not os.path.exists(dirname):
                    os.makedirs(dirname)
                traj.dump(f"{dirname}/traj{itraj}.h5",format='h5md')
    else:
        md_kwargs = {
            'molecular_database': initial_molecular_database,
            'model': ml_model,
            'time_step': time_step,
            'maximum_propagation_time': maximum_propagation_time,
            'ensemble': ensemble,
            'thermostat': thermostat,
            'dump_trajectory_interval': dump_trajectory_interval,
            'stop_function': stop_function
        }
        dyns = ml.simulations.run_in_parallel(molecular_database=initial_molecular_database,
                                              task=ml.md,
                                              task_kwargs=md_kwargs,
                                              create_and_keep_temp_directories=False)
        trajs = [d.molecular_trajectory for d in dyns]
        itraj=0
        for traj in trajs:
            itraj+=1
            print(f"Trajectory {itraj} number of steps: {len(traj.steps)}")
            if traj.steps[-1].molecule.uncertain:
                print('Adding molecule from trajectory %d at time %.2f fs' % (itraj, traj.steps[-1].time))
                moldb2label.molecules.append(traj.steps[-1].molecule)
            # Dump traj
            if dump_trajs:
                import os
                if 'working_directory' in al_info.keys():
                    dirname = f"{al_info['working_directory']}/trajs"
                else:
                    dirname = 'trajs'
                if not os.path.exists(dirname):
                    os.makedirs(dirname)
                traj.dump(f"{dirname}/traj{itraj}.h5",format='h5md')
    # add the source of molecule
    for mol in moldb2label:
        mol.sampling = 'md'
    return moldb2label
ml.al(
    ...
    sampler=my_sampler,
    sampler_kwargs={'time_step': 0.5},
    ...
)