In the following section, we detail the methodologies that make up our proposed workflow for predicting the conductivity and band gap of spinel oxide materials: DFT for band-structure calculation; principal layers, Green's functions, and the Landauer formalism for the electric conductivity; and machine learning together with fitting of the band structure to a tight-binding Hamiltonian.
All DFT calculations were performed using the Vienna ab initio simulation package (VASP)25 with the Perdew-Burke-Ernzerhof (PBE) functional26. We align the spins in a ferrimagnetic arrangement with a Néel collinear configuration, where the magnetic moments of the tetrahedral cations oppose those of the octahedral cations. Following convergence tests, the calculations for the cubic cells utilized a 600 eV plane-wave energy cutoff and a 3×3×3 k-point mesh centered at the Γ-point. The self-consistent electronic minimization employed a 0.1 meV threshold, and the ionic relaxation had a force threshold of 0.01 eV/Å. The band structure was calculated using k-points along high-symmetry lines, selected based on the implementation in the pymatgen27 library as described in ref. 28. The k-point file is available in the SI. The Hubbard U values applied to Mn, Co, Fe, Ni, and O atoms were 3.9, 3.5, 4.5, 5.5, and 0 eV, respectively, all taken from previous literature works29,30,31,32,33,34,35. The unit cell of all the spinels in the dataset consists of 56 atoms: 32 oxygen, 8 transition-metal ions in tetrahedral sites, and 16 transition-metal ions in octahedral sites (Fig. 8). The dataset contains 190 different combinations of this cell, as explained later in the results.
Figure 8. An example of a normal spinel oxide unit cell geometry (\(\mathrm{Co}_{16}\mathrm{Ni}_{8}\mathrm{O}_{32}\)). The red spheres are oxygen ions, the gray are nickel ions in tetrahedral sites, and the blue are cobalt ions in octahedral sites.
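As an illustration of the band-structure k-point selection mentioned above, the following is a minimal sketch (not the authors' actual script) of generating the high-symmetry path with pymatgen; the input filename and line density are illustrative assumptions.

```python
# Minimal sketch: build the band-structure k-point path with pymatgen.
# The input filename and line density are illustrative, not from the paper.
from pymatgen.core import Structure
from pymatgen.symmetry.bandstructure import HighSymmKpath

structure = Structure.from_file("POSCAR")  # 56-atom spinel unit cell
kpath = HighSymmKpath(structure)           # symmetry-derived high-symmetry path
kpoints, labels = kpath.get_kpoints(line_density=20, coords_are_cartesian=False)

# Write an explicit (Reciprocal) k-point list for a non-self-consistent band run
with open("KPOINTS_band", "w") as f:
    f.write("k-points along high-symmetry lines\n")
    f.write(f"{len(kpoints)}\nReciprocal\n")
    for k, label in zip(kpoints, labels):
        f.write(f"{k[0]:.6f} {k[1]:.6f} {k[2]:.6f} 1.0 {label}\n")
```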
For calculating the current, we followed the well-established Landauer-Büttiker approach to quantum transport36,37. In this formalism, we postulate a potential between two electron reservoirs that possess equilibrium distributions and dissimilar chemical potentials. These reservoirs act as the source and drain of electrons for the left and right electrodes, respectively. The device whose current we intend to evaluate constitutes the scattering region located between the electrodes. The difference between the electrochemical potentials of the reservoirs represents the applied bias on the scattering region. The Landauer-Büttiker formula is
$$I=\frac{2e}{h}\int_{-\infty}^{\infty}T\left(E\right)\left[f\left(E-\mu_{L}\right)-f\left(E-\mu_{R}\right)\right]dE \tag{1}$$
where \(f(E-\mu_{L})\) and \(f(E-\mu_{R})\) are the Fermi-Dirac electron distributions, with \(\mu_{L}\) and \(\mu_{R}\) the chemical potentials of the left and right electrodes, respectively. \(T(E)\) is the transmission function of an electron from the left to the right electrode through the scattering region. In this setup the system is divided into principal layers that interact only with their nearest-neighbor layers, and each layer is described by a tight-binding Hamiltonian \(H_{i}\), where i is the layer number. Since each electrode is semi-infinite, this approach introduces an infinite problem. Fortunately, this problem is solved by using Green's functions. The Green's function of the scattering region, the finite region of interest, is obtained by
$$\mathbf{G}_{S}={\left(\left(E+i\eta\right)\mathbf{S}_{S}-\mathbf{H}_{S}-\boldsymbol{\Sigma}_{L}-\boldsymbol{\Sigma}_{R}\right)}^{-1} \tag{2}$$
where E is the energy of the system and \(\eta\) is a small number. \(\mathbf{S}_{S}\) is the overlap matrix of the scattering region; its terms are defined as the overlap integrals between the basis vectors of the Hamiltonian. \(\mathbf{H}_{S}\) is a block matrix representing the scattering region, and the effect of the semi-infinite electrodes is contained in the finite self-energy matrices \(\boldsymbol{\Sigma}_{L/R}\) of the left (L) and right (R) electrodes.
The transmission formula can be obtained from the Landauer-Büttiker formalism and quantum scattering theory36,37,38,39, and it is given by
$$T\left(E\right)=\mathrm{Tr}\left\{\boldsymbol{\Gamma}_{L}\,\mathbf{G}_{S}^{\dagger}\,\boldsymbol{\Gamma}_{R}\,\mathbf{G}_{S}\right\} \tag{3}$$
where \(\boldsymbol{\Gamma}_{L/R}\) is called the broadening matrix, and it is given by the expression
$$\boldsymbol{\Gamma}_{L/R}=i\left(\boldsymbol{\Sigma}_{L/R}-\boldsymbol{\Sigma}_{L/R}^{\dagger}\right) \tag{4}$$
A technique for deriving the finite self-energy matrices is available in the literature40,41. This procedure results in the following term for the left electrode's self-energy:
$$\hat{\boldsymbol{\Sigma}}_{L}=\mathbf{H}_{L^{\prime},L}^{\dagger}\,\mathbf{g}_{L}\,\mathbf{H}_{L^{\prime},L} \tag{5}$$
where \(\mathbf{H}_{L^{\prime},L}\) is the coupling Hamiltonian between the scattering region and the left electrode, and \(\mathbf{g}_{L}\) is given by the recursive formula
$$\mathbf{g}_{L}^{(n+1)}=\left[\mathbf{H}_{L}-\mathbf{H}_{L^{\prime},L}^{\dagger}\,\mathbf{g}_{L}^{(n)}\,\mathbf{H}_{L^{\prime},L}\right]^{-1} \tag{6}$$
The initial guess is \(\mathbf{g}_{L}^{(0)}=\mathbf{H}_{L^{\prime},L}\), and the iterative procedure continues until \(\left(\mathbf{g}_{L}^{(n-1)}-\mathbf{g}_{L}^{(n)}\right)_{ij}^{2}\le \delta^{2}\), where \(\delta\) is a small number on the order of \(1\times 10^{-15}\); this convergence criterion is achieved within a few iterations. A similar procedure is applied for the right self-energy (for symmetric electrodes and bias, \(\mathbf{g}_{L}=\mathbf{g}_{R}\)). \(\boldsymbol{\Sigma}_{L}\) takes the size of the scattering Hamiltonian and has a nonzero block only in the top-left corner (and the right self-energy only in the bottom-right corner),
$$\boldsymbol{\Sigma}_{L}=\left(\begin{array}{ccccc}\hat{\boldsymbol{\Sigma}}_{L}&0&\cdots&0&0\\ 0&0&\cdots&0&0\\ \vdots&\vdots&\ddots&\vdots&\vdots\\ 0&0&\cdots&0&0\\ 0&0&\cdots&0&0\end{array}\right) \tag{7}$$
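A minimal sketch of this recursion and of the embedding in Eq. (7) is given below; it follows Eq. (6) literally (any energy dependence is assumed to be folded into the \(\mathbf{H}_{L}\) argument), and all names are illustrative.

```python
# Minimal sketch: recursive surface Green's function and self-energy, Eqs. (5)-(7).
import numpy as np

def left_self_energy(H_L, H_coup, n_blocks, delta=1e-15, max_iter=500):
    """H_L: principal-layer Hamiltonian block; H_coup: coupling H_{L',L}.
    Returns Sigma_L embedded in a scattering-region-sized matrix, Eq. (7)."""
    g = H_coup.copy()                                     # initial guess, as in the text
    for _ in range(max_iter):
        g_new = np.linalg.inv(H_L - H_coup.conj().T @ g @ H_coup)  # Eq. (6)
        if np.max(np.abs(g_new - g) ** 2) <= delta ** 2:  # element-wise criterion
            g = g_new
            break
        g = g_new
    sigma_hat = H_coup.conj().T @ g @ H_coup              # Eq. (5)
    nb = H_L.shape[0]
    Sigma_L = np.zeros((n_blocks * nb, n_blocks * nb), dtype=complex)
    Sigma_L[:nb, :nb] = sigma_hat                         # top-left block only, Eq. (7)
    return Sigma_L
```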
Our system is divided into the so-called principal layers42,43,44, where each layer consists of four unit cells of spinel oxide, as illustrated in Fig. 9. Using only four unit cells is sufficient to capture all the necessary information from the TB Hamiltonians that were fitted to the DFT band structure. The layers are interconnected by a coupling Hamiltonian, effectively forming an infinite wire. The Hamiltonian of a single layer of our system is defined as follows:
$$H_{layer}=\left(\begin{array}{cccc}\mathbf{H}_{00}&\mathbf{H}_{010}&\mathbf{H}_{001}&0\\ \mathbf{H}_{010}^{\dagger}&\mathbf{H}_{00}&0&\mathbf{H}_{001}\\ \mathbf{H}_{001}^{\dagger}&0&\mathbf{H}_{00}&\mathbf{H}_{010}\\ 0&\mathbf{H}_{001}^{\dagger}&\mathbf{H}_{010}^{\dagger}&\mathbf{H}_{00}\end{array}\right) \tag{8}$$
where the elements of the Hamiltonian matrix are blocks of matrices. The blocks along the diagonal correspond to the tight-binding Hamiltonian (defined in the next section) of each of the four unit cells within a single layer of spinel oxide. The block matrices along the off-diagonal represent the tight-binding matrix elements that connect each unit cell with its neighboring cells within the same layer; the subscripts on H indicate the spatial direction between a unit cell and its adjacent neighbors. The block matrix below defines the coupling between each layer and its neighboring layers,
Figure 9. A schematic representation of a wire built from principal layers of four unit cells each. The layers are coupled in the [100] direction.
$$H_{layerCoupling}=\left(\begin{array}{cccc}\mathbf{H}_{100}&0&0&0\\ 0&\mathbf{H}_{100}&0&0\\ 0&0&\mathbf{H}_{100}&0\\ 0&0&0&\mathbf{H}_{100}\end{array}\right) \tag{9}$$
To keep the Hamiltonians to a manageable size and to take advantage of the electrodes being made of the same material, we opted to use only two layers to represent our material in the scattering region. This approach essentially yields an infinite material through which the current can flow. The Hamiltonian for the scattering region is expressed as follows:
$$H_{s}=\left(\begin{array}{cc}\mathbf{H}_{layer}&\mathbf{H}_{layerCoupling}\\ \mathbf{H}_{layerCoupling}^{\dagger}&\mathbf{H}_{layer}\end{array}\right) \tag{10}$$
The scattering Hamiltonian is composed of 2×2 blocks representing the two layers and their connection; each of these blocks is composed of 4×4 block matrices built from \(\mathbf{H}_{00},\mathbf{H}_{100},\mathbf{H}_{010},\mathbf{H}_{001}\), and these matrices have size 8×8 (since in this work we fitted eight bands). Thus, in total, the size of \(H_{s}\) is 64×64.
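Assembling Eqs. (8)-(10) from the fitted 8×8 blocks is mechanical; a minimal NumPy sketch (with illustrative names, not the authors' code) is:

```python
# Minimal sketch: assemble the layer, coupling, and scattering Hamiltonians,
# Eqs. (8)-(10), from the fitted 8x8 tight-binding blocks.
import numpy as np

def build_scattering_hamiltonian(H00, H100, H010, H001):
    Z = np.zeros_like(H00)
    dag = lambda M: M.conj().T

    # Eq. (8): one principal layer made of four unit cells
    H_layer = np.block([
        [H00,       H010,      H001,      Z   ],
        [dag(H010), H00,       Z,         H001],
        [dag(H001), Z,         H00,       H010],
        [Z,         dag(H001), dag(H010), H00 ],
    ])

    # Eq. (9): block-diagonal coupling between adjacent layers ([100] direction)
    H_coupling = np.kron(np.eye(4), H100)

    # Eq. (10): two layers represent the scattering region (64x64 for 8 bands)
    H_s = np.block([[H_layer,         H_coupling],
                    [dag(H_coupling), H_layer   ]])
    return H_layer, H_coupling, H_s
```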
The project emphasizes material property prediction and follows a workflow suited to such a task. The workflow includes the following steps: building a dataset by collecting and cleaning data and dividing it into training and test sets; selecting features; and training the model by choosing an appropriate model, tuning its hyperparameters, and optimizing it. The model's performance is then evaluated on the test set and compared to other models, and finally the model is used for making predictions.
To prepare for model training and testing, the dataset is first divided into two sets: a training set and a test set. The latter is used to evaluate the model's ability to predict properties for new data it has not seen during training. Training and testing on the same data is a mistake, since the model can overfit the training data and then perform poorly on new data. A standard approach is to partition 80% of the dataset for training and reserve the remaining 20% for testing.
To set up a dataset for machine learning (ML) models, it should be divided into features and target variables. Features are independent variables grouped into a vector that serves as input for the ML model. Choosing informative and independent features is crucial for model performance. A single input vector represents one sample, and the order of the features must remain consistent. The target variables are the properties to be predicted. For the materials in this project, the A and B elements and the stoichiometric numbers x and y from the formula \(\mathrm{A}_{y}\mathrm{B}_{8-y}\mathrm{A}_{16-x}\mathrm{B}_{x}\mathrm{O}_{32}\) are the most representative features. Oxygen has a constant count and does not add information to the feature vector. The A and B atoms are represented by their atomic numbers: Mn = 25, Fe = 26, Co = 27, and Ni = 28. Feature normalization and scaling are important to ensure the proper functioning of some ML algorithms and to speed up gradient descent. In this project, we applied standardization using the Scikit-learn module45, which involves removing the mean value and scaling to unit variance.
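A minimal sketch of this preparation, combining the feature construction, the 80/20 split, and the standardization described above (the compositions and target values here are illustrative placeholders, not the actual dataset):

```python
# Minimal sketch: featurization, train/test split, and standardization.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

Z = {"Mn": 25, "Fe": 26, "Co": 27, "Ni": 28}   # atomic numbers as features

def featurize(A, B, x, y):
    """One sample: [Z_A, Z_B, x, y]; oxygen is constant and therefore omitted."""
    return [Z[A], Z[B], x, y]

# Illustrative placeholders, not the actual 190-composition dataset
X = np.array([featurize("Ni", "Co", 16, 8), featurize("Mn", "Fe", 8, 4)])
y_target = np.array([1.2e-6, 3.4e-6])          # e.g., calculated currents (A)

X_train, X_test, y_train, y_test = train_test_split(
    X, y_target, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)         # mean removal, unit variance
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)              # scaled with training statistics
```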
For the regression problems, namely predicting current from composition, predicting current from bandwidth, and predicting band gaps, we chose five different machine learning algorithms and compared their performance. For predicting current from composition, these algorithms were kernel ridge regression (KRR), support vector regression (SVR), random forests (RF), neural networks (NN), and an ensemble model that takes the average of the other four algorithms' results. For the prediction of current from bandwidth and for the band gap prediction we used the XGBoost algorithm46. These algorithms were chosen because each has unique strengths in handling different types of data and relationships. Specifically, NN were chosen for their ability to model complex, non-linear relationships, which is often the case when predicting material properties such as electronic conductivity and band gaps; SVR was selected for its effectiveness in high-dimensional spaces, common in materials science data, and for its strong theoretical foundations in statistical learning theory; KRR was chosen for its ability to handle non-linear relationships through kernel functions and for being less sensitive to outliers; and RF was selected for its robustness and ease of use, performing well with both linear and non-linear data.
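The four individual regressors and the averaging ensemble can be set up as in the sketch below, using Scikit-learn with placeholder hyperparameters (the tuned values are not reproduced here):

```python
# Minimal sketch: the four regressors and the averaging ensemble.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

models = {
    "KRR": KernelRidge(kernel="rbf", alpha=1.0),
    "SVR": SVR(kernel="rbf", C=1.0),
    "RF":  RandomForestRegressor(n_estimators=200, random_state=0),
    "NN":  MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                        random_state=0),
}

def ensemble_predict(fitted_models, X):
    """The ensemble model: average of the four individual predictions."""
    return np.mean([m.predict(X) for m in fitted_models.values()], axis=0)
```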
ML models are functions defined by parameters. During training, the model optimizes these parameters to fit the data. Hyperparameters, such as regularization values or the architecture of a neural network, are tuned by the user. To avoid overfitting and to ensure that models perform well on new predictions, cross-validation is used: the training set is split into k folds, and the model is trained on k-1 folds and evaluated on the remaining fold (the validation set), k times. The average performance is reported, and once the hyperparameters are tuned, the model is trained on the whole training set and tested. In evaluating the model, selecting an appropriate metric is crucial. For this work, we chose the mean absolute error (MAE), as it provides a natural measure of the accuracy of a model that predicts physical values such as current. Additionally, we used R² to measure the linearity between the models' predicted values and the true, calculated values.
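A minimal sketch of this cross-validation and evaluation procedure follows; X_train, y_train, X_test, and y_test are assumed from the splitting sketch above, and the regressor here stands in for any of the models being tuned.

```python
# Minimal sketch: k-fold cross-validation, then a single test-set evaluation.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.metrics import mean_absolute_error, r2_score

model = RandomForestRegressor(n_estimators=200, random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
mae_folds = -cross_val_score(model, X_train, y_train, cv=cv,
                             scoring="neg_mean_absolute_error")
print(f"CV MAE: {mae_folds.mean():.3g} +/- {mae_folds.std():.3g}")

# After hyperparameter tuning: retrain on the full training set, test once
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("test MAE:", mean_absolute_error(y_test, y_pred))
print("test R^2:", r2_score(y_test, y_pred))
```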
Traditional tight-binding fitting schemes entail tuning numerous parameters that represent the atomic structure and do not always yield precise results47. A more common approach utilizes Wannier functions, obtained by transforming the extended Bloch functions from DFT calculations, which derives tight-binding parameters from ab initio results without the need for fitting48,49. However, this method demands extensive system knowledge and a trial-and-error process due to the numerous parameters involved. Our objective is to rapidly and precisely compute the tight-binding Hamiltonian for approximately two hundred material samples.
Our approach follows the method proposed by Wang et al., where a parametrized tight-binding Hamiltonian is obtained by employing the back-propagation technique to fit the real-space tight-binding matrices to the DFT band structure50. Here we implemented the same approach but with the PyTorch51 library instead of TensorFlow52, which was used by the original authors. In TB theory the reciprocal-space Hamiltonian for a desired k vector is
$$H^{TB}(k)=\sum_{R}e^{ik\cdot R}H^{R} \tag{11}$$
where R is a lattice vector of the selected real-space Hamiltonian matrices and \(H^{R}\) is the corresponding TB Hamiltonian. \(H^{0}\) (R = (000)) is the real-space TB Hamiltonian matrix of the unit cell, and \(H^{R\ne 0}\) (R = (100), (010), (001), etc.) are the Hamiltonian matrices that couple the unit cell Hamiltonian to the adjacent unit cells in the direction of the R vectors. The TB band structure is obtained from the eigenvalues \(\varepsilon_{n,k}^{TB}\) for every desired k vector via
$$H_{k}^{TB}\,\psi_{n,k}^{TB}=\varepsilon_{n,k}^{TB}\,\psi_{n,k}^{TB} \tag{12}$$
where \(\varepsilon_{n,k}^{TB}\) is the energy of the n-th band at reciprocal vector k.
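A minimal PyTorch sketch of Eqs. (11) and (12) is given below; it assumes fractional k-points (hence the 2π factor in the phase) and real-space blocks satisfying \(H^{-R}=(H^{R})^{\dagger}\), so that \(H^{TB}(k)\) is Hermitian.

```python
# Minimal sketch: build H^TB(k) from real-space blocks and diagonalize,
# Eqs. (11)-(12). Names and conventions are illustrative.
import math
import torch

def tb_bands(H_R: dict, k: torch.Tensor) -> torch.Tensor:
    """H_R maps lattice vectors R (tuples of ints) to real n x n tensors;
    k is a fractional reciprocal vector (float64). Returns sorted eigenvalues."""
    n = next(iter(H_R.values())).shape[0]
    Hk = torch.zeros((n, n), dtype=torch.complex128)
    for R, H in H_R.items():
        R_vec = torch.tensor(R, dtype=torch.float64)
        phase = torch.exp(2j * math.pi * torch.dot(k, R_vec))  # e^{i k . R}
        Hk = Hk + phase * H
    return torch.linalg.eigvalsh(Hk)  # real eigenvalues of Hermitian H(k), Eq. (12)
```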
To fit the TB bands to the DFT bands, the mean squared error loss L between the TB and DFT eigenvalues is minimized,
$$L=\frac{1}{N}\frac{1}{N_{k}}\sum_{i=1}^{N}\sum_{k}\left(\varepsilon_{i,k}^{TB}-\varepsilon_{i,k}^{DFT}\right)^{2} \tag{13}$$
The loss is computed over all bands and k-points, where N represents the total number of bands and \(N_{k}\) the number of sampled k-points. The TB Hamiltonian parameters \(H^{R}\) are updated iteratively using the Adam gradient descent algorithm53. Back-propagation is used to calculate the derivative of the loss with respect to the \(H^{R}\) matrices. The gradient descent algorithm seeks to minimize the loss by moving in the direction of steepest descent, which is opposite to the direction of the gradient, according to the general formula
$$H_{l+1}^{R}=H_{l}^{R}-\alpha\frac{\partial L}{\partial H_{l}^{R}} \tag{14}$$
The variable "l" represents either an epoch number (epoch defines the number of times the entire data set, in this case all the k points, has worked through the learning algorithm) or a single iteration over all k points. In this context, an epoch is considered complete after the loss function has accumulated all the eigenvalues in the band structure, which occurs after calculating all ({H}^{TB}(k)) values. The derivative of the loss with respect to HR is given with matrix derivative algorithmic differentiation rules54.
The fitting process has a predefined numerical threshold for the loss function (mean squared error \(<3\cdot 10^{-5}\,\mathrm{eV}^{2}\), as defined in Eq. 13), which serves as a criterion to terminate the fitting. Additionally, we introduced another criterion that halts the fitting if the loss increases for more than ten epochs, due to spikes or jumps in the loss function during training. If the loss was already low at that point, we keep the results of the fitting process; if the loss was still high, we re-run the process with slightly different initial random weights in the Hamiltonians.
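Combining Eqs. (13) and (14) with these two stopping criteria gives the following minimal sketch, which reuses tb_bands() from the previous sketch; the learning rate, initialization scale, and epoch cap are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch: fit the real-space TB blocks to the DFT bands with Adam,
# Eqs. (13)-(14), with the two stopping criteria described in the text.
import torch

def fit_tb(kpts, eps_dft, R_vectors, n_bands=8, lr=1e-2, max_epochs=5000):
    """kpts: (N_k, 3) fractional k-points; eps_dft: (N_k, N) DFT eigenvalues."""
    # Trainable real-space blocks, one per R vector, small random initial weights
    H_R = {R: (0.1 * torch.randn(n_bands, n_bands, dtype=torch.float64)
               ).requires_grad_() for R in R_vectors}
    opt = torch.optim.Adam(H_R.values(), lr=lr)
    best, rising = float("inf"), 0
    for epoch in range(max_epochs):
        opt.zero_grad()
        eps_tb = torch.stack([tb_bands(H_R, k) for k in kpts])
        loss = torch.mean((eps_tb - eps_dft) ** 2)  # Eq. (13)
        loss.backward()                             # back-propagation
        opt.step()                                  # Eq. (14), via Adam
        if loss.item() < 3e-5:                      # MSE threshold from the text
            break
        if loss.item() > best:                      # halt after ten rising epochs
            rising += 1
            if rising > 10:
                break
        else:
            best, rising = loss.item(), 0
    return H_R
```

Here R_vectors would contain (0, 0, 0) together with the three main directions and their opposites, as discussed next.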
To determine the R vectors for the fitting process, we conducted tests with various sets of vectors, up to 24 in total. After experimentation, we concluded that the three main directions and their opposites were sufficient for a highly accurate fitting of the bands. The size of the matrices used in the fitting is defined by the number of bands being fitted; for instance, for eight bands, the matrix size is 8×8.