.. _tutorial_krr_julia:

Kernel ridge regression in Julia
=================================

Here we show how to use kernel ridge regression (KRR) in MLatom to train and use your model. This function is based on our implementation in `Julia <https://julialang.org/>`_.

Prerequisites
---------------

- ``MLatom 3.18.0`` or later
- ``Julia 1.10.2`` (Julia v1.4+ should work)
- ``julia 0.6.2`` (this is a Python package, also known as PyJulia)

Naturally, you should have Julia installed in your local environment. You can download it from `its website <https://julialang.org/downloads/>`_ and install it following the `instructions <https://julialang.org/downloads/platform/>`_. After you have successfully downloaded and configured Julia, you can enter ``julia`` in the command line to open an interactive session:

.. code-block:: text

                   _
       _       _ _(_)_     |  Documentation: https://docs.julialang.org
      (_)     | (_) (_)    |
       _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
      | | | | | | |/ _` |  |
      | | |_| | | | (_| |  |  Version 1.10.2 (2024-03-01)
     _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
    |__/                   |

    julia>

.. note::

    As you can see, we are using Julia 1.10.2, but to make PyJulia work you only need Julia v1.4 or later.

If you have no knowledge of Julia, that is totally fine, because we call the Julia functions from Python. Before that, we need to install the bridge package in Python:

.. code-block:: bash

    pip install julia==0.6.2

Then, in Python, run the following code:

.. code-block:: python

    import julia
    julia.install()

But we are not done yet. If your Python interpreter is statically linked to libpython, PyJulia does not fully support it, so we recommend using ``python-jl`` to run your script:

.. code-block:: bash

    python-jl your_script.py

If everything works, you should be able to run the following code:

.. code-block:: python

    from julia import Base
    print(Base.sind(90))

Data and models
-------------------

Here you can download the dataset and models that will be used in the following tutorial.

- :download:`E_FCI_451.dat `
- :download:`E_UHF_451.dat `
- :download:`R_451.dat `

ML database
--------------

We introduce a new data class called ``ml_database``. You can save x and y values in it, and ``ml.models.krr`` will use ``ml_database.x`` and ``ml_database.y`` to train the model. Here is an example of how to work with this class.

.. code-block:: python

    import mlatom as ml

    mlDB = ml.data.ml_database()
    mlDB.read_from_x_file(filename="R_451.dat")  # This will be saved in mlDB.x
    mlDB.read_from_y_file(filename="E_FCI_451.dat", property_name="fci")
    mlDB.read_from_y_file(filename="E_UHF_451.dat", property_name="uhf")
    mlDB.y = mlDB.get_property("fci")

Train a KRR model
-------------------

After loading the ML database, we can train a model.

.. code-block:: python

    # The model will be saved in E_FCI.npz; the Gaussian function is used as the kernel function.
    model = ml.models.krr(model_file="E_FCI.npz", kernel_function='Gaussian', ml_program='mlatom.jl')
    model.hyperparameters['sigma'].value = 4.0
    model.hyperparameters['lambda'].value = 0.00001
    # Set the prior value. The prior will be subtracted from each y value before training
    # (and added back after prediction).
    model.train(ml_database=mlDB, prior=-1.0)

    # Predict values with the model
    model.predict(ml_database=mlDB, property_to_predict='fci_yest')
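The predicted values should be stored in the ML database under the name passed to ``property_to_predict``. Below is a minimal sketch of how one might inspect them, assuming that predicted properties can be read back with ``get_property`` in the same way as the reference properties (the ``y_ref``/``y_pred`` names are ours).

.. code-block:: python

    import numpy as np

    # Assumption (not verified against the MLatom API): predictions requested with
    # property_to_predict='fci_yest' are stored in the ML database under that name
    # and can be read back with get_property, just like the reference properties.
    y_ref  = np.asarray(mlDB.get_property('fci'))       # reference FCI energies
    y_pred = np.asarray(mlDB.get_property('fci_yest'))  # KRR predictions

    # Quick sanity check of the fit on the training data
    print('MAE =', np.mean(np.abs(y_pred - y_ref)))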
We currently support several kernel functions:

- Gaussian kernel function: :math:`k(\textbf{x}_1,\textbf{x}_2) = \text{exp}(-\lvert \textbf{x}_1-\textbf{x}_2 \rvert ^{2} / (2\sigma ^{2}))`
- Periodic Gaussian kernel function: :math:`k(\textbf{x}_1,\textbf{x}_2) = \text{exp}(-2.0(\text{sin} (\pi \lvert \textbf{x}_1-\textbf{x}_2 \rvert))^2 / \sigma ^2)`
- Decaying periodic Gaussian kernel function: :math:`k(\textbf{x}_1,\textbf{x}_2) = \text{exp} (-\lvert \textbf{x}_1-\textbf{x}_2 \rvert ^{2} / (2\sigma ^{2}) - 2.0(\text{sin} (\pi \lvert \textbf{x}_1-\textbf{x}_2 \rvert))^2 / \sigma_{p} ^2)`
- Matern kernel function: :math:`k(\textbf{x}_1,\textbf{x}_2) = \text{exp}(-\lvert \textbf{x}_1-\textbf{x}_2 \rvert / \sigma) \sum \limits _{k=0} ^{n} \frac {(n+k)!} {(2n)!} \begin{pmatrix} n \\ k \end{pmatrix} (2\lvert \textbf{x}_1-\textbf{x}_2 \rvert / \sigma)^{n-k}`

Hyperparameters of these kernel functions:

.. table::
    :align: center

    ============================= ================================== =================================
    kernel function               how to call in MLatom              hyperparameters
    ============================= ================================== =================================
    Gaussian                      ``'Gaussian'``                     ``sigma``
    Periodic Gaussian             ``'periodic_Gaussian'``            ``sigma``, ``period``
    Decaying periodic Gaussian    ``'decaying_periodic_Gaussian'``   ``sigma``, ``period``, ``sigmap``
    Matern                        ``'Matern'``                       ``sigma``, ``nn``
    ============================= ================================== =================================

Default values of the hyperparameters:

.. table::
    :align: center

    ================== =============== ===================================================
    hyperparameters    default value   description
    ================== =============== ===================================================
    ``sigma``          ``1.0``         kernel width :math:`\sigma`
    ``period``         ``2E-35``       period of the periodic kernels
    ``sigmap``         ``100.0``       :math:`\sigma_{p}` of the decaying periodic kernel
    ``nn``             ``1.0``         order :math:`n` of the Matern kernel
    ``lambda``         ``2``           regularization parameter of KRR
    ================== =============== ===================================================
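To use one of the other kernels, pass its name from the table above to ``kernel_function`` and set the corresponding hyperparameters by name, following the same workflow as in the Gaussian example. Below is a minimal sketch for the Matern kernel; the model file name, the property name, and the numeric values are illustrative only, not recommended settings.

.. code-block:: python

    # Matern kernel: hyperparameters 'sigma' and 'nn' (see the tables above).
    # The sigma/nn/lambda values and file/property names below are illustrative only.
    matern_model = ml.models.krr(model_file="E_FCI_matern.npz",
                                 kernel_function='Matern',
                                 ml_program='mlatom.jl')
    matern_model.hyperparameters['sigma'].value = 4.0
    matern_model.hyperparameters['nn'].value = 2
    matern_model.hyperparameters['lambda'].value = 0.00001
    matern_model.train(ml_database=mlDB, prior=-1.0)
    matern_model.predict(ml_database=mlDB, property_to_predict='fci_yest_matern')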
Load a KRR model
------------------

If the model file exists, it will be loaded automatically when the instance is initialized.

.. code-block:: python

    import mlatom as ml

    mlDB = ml.data.ml_database()
    mlDB.read_from_x_file(filename="R_451.dat")

    model = ml.models.krr(model_file='E_FCI.npz', ml_program='mlatom.jl')
    model.predict(ml_database=mlDB, property_to_predict='fci_yest')

Optimize hyperparameters
-------------------------

Below, we show how to optimize the hyperparameters using a grid search.

.. code-block:: python

    import mlatom as ml

    mlDB = ml.data.ml_database()
    mlDB.read_from_x_file(filename="R_451.dat")  # This will be saved in mlDB.x
    mlDB.read_from_y_file(filename="E_FCI_451.dat", property_name="fci")
    mlDB.y = mlDB.get_property("fci")

    # Optimize hyperparameters
    model = ml.models.krr(model_file="E_FCI_grid_search.npz", kernel_function='Gaussian', ml_program='mlatom.jl')
    model.hyperparameters['sigma'].minval = 2**-5    # Set the minimum value of sigma
    model.hyperparameters['sigma'].maxval = 2**9     # Set the maximum value of sigma
    model.hyperparameters['sigma'].optimization_space = 'log'
    model.hyperparameters['lambda'].minval = 2**-35  # Set the minimum value of lambda
    model.hyperparameters['lambda'].maxval = 1.0     # Set the maximum value of lambda
    model.hyperparameters['lambda'].optimization_space = 'log'

    sub, val = mlDB.split(number_of_splits=2, fraction_of_points_in_splits=[0.9, 0.1], sampling='none')
    model.optimize_hyperparameters(subtraining_ml_database=sub, validation_ml_database=val,
                                   optimization_algorithm='grid', hyperparameters=['lambda','sigma'],
                                   training_kwargs={'prior':-1.0},
                                   prediction_kwargs={'property_to_predict':'yest'}, debug=False)
    model.predict(ml_database=mlDB, property_to_predict='fci_yest')

Multi-outputs
--------------

The model can be trained on multiple y values at the same time. Below, we show how to train a model on both the FCI and UHF energies.

.. code-block:: python

    import numpy as np
    import mlatom as ml

    mlDB = ml.data.ml_database()
    mlDB.read_from_x_file(filename="R_451.dat")
    mlDB.read_from_y_file(filename="E_FCI_451.dat", property_name="fci")
    mlDB.read_from_y_file(filename="E_UHF_451.dat", property_name="uhf")
    Ntrain = len(mlDB.y)
    mlDB.y = np.concatenate((mlDB.get_property('fci').reshape((Ntrain,1)), mlDB.get_property('uhf').reshape((Ntrain,1))), axis=1)

    model = ml.models.krr(model_file="E_FCI_multi_output.npz", kernel_function='Gaussian', ml_program='mlatom.jl')

    # Optimize hyperparameters
    model.hyperparameters['sigma'].minval = 2**-5    # Set the minimum value of sigma
    model.hyperparameters['sigma'].maxval = 2**9     # Set the maximum value of sigma
    model.hyperparameters['sigma'].optimization_space = 'log'
    model.hyperparameters['lambda'].minval = 2**-35  # Set the minimum value of lambda
    model.hyperparameters['lambda'].maxval = 1.0     # Set the maximum value of lambda
    model.hyperparameters['lambda'].optimization_space = 'log'

    sub, val = mlDB.split(number_of_splits=2, fraction_of_points_in_splits=[0.9, 0.1], sampling='none')
    model.optimize_hyperparameters(subtraining_ml_database=sub, validation_ml_database=val,
                                   optimization_algorithm='grid', hyperparameters=['lambda','sigma'],
                                   training_kwargs={'prior':-1.0},
                                   prediction_kwargs={'property_to_predict':'yest'}, debug=False)
    model.predict(ml_database=mlDB, property_to_predict='fci_uhf_yest')
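The multi-output model predicts one column per target, in the same order as the columns of ``mlDB.y`` used for training. Below is a minimal sketch of how one might split the predictions back into their FCI and UHF parts and compute simple errors; it assumes that the predicted array is stored under ``fci_uhf_yest``, is retrievable with ``get_property``, and has shape ``(Ntrain, 2)``.

.. code-block:: python

    # Assumptions (not verified against the MLatom API): the multi-output predictions
    # are stored under 'fci_uhf_yest', can be read back with get_property, and their
    # columns follow the order used to build mlDB.y (column 0 = FCI, column 1 = UHF).
    yest = np.asarray(mlDB.get_property('fci_uhf_yest')).reshape((Ntrain, 2))
    fci_pred, uhf_pred = yest[:, 0], yest[:, 1]

    print('RMSE (FCI):', np.sqrt(np.mean((fci_pred - mlDB.get_property('fci'))**2)))
    print('RMSE (UHF):', np.sqrt(np.mean((uhf_pred - mlDB.get_property('uhf'))**2)))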