Prediction of cell migration potential on human breast cancer cells treated with Albizia lebbeck ethanolic extract using … – Nature.com

Plant material

Fresh stem barks of A. lebbeck were collected during the rainy season (April to October), at the flowering stage, from Tabuli, a town in Gaya Local Government, Kano State, northern Nigeria, and dried at room temperature. The collection of A. lebbeck stem bark followed all applicable international standards, guidelines, and laws. The plant specimen was authenticated by Dr. Bala Sidi Aliyu and deposited with voucher specimen number BUKHAN187 at the herbarium of the Plant Biology Department, Faculty of Science, Bayero University Kano.

Dried Albizia lebbeck stem barks were pulverised to a clear powder and subjected to flask extraction using 99.9% methanol as the extraction solvent. Powdered A. lebbeck stem bark (50 g) was soaked in an Erlenmeyer flask containing methanol (500 mL) and placed under continual shaking for 48 h at room temperature27. The extract was filtered through Whatman No. 1 filter paper and concentrated under reduced pressure using a rotary evaporator. The concentrated extract was dried completely at 40 °C in an oven and stored at 4 °C before the analysis.

The A. lebbeck ethanolic extract (ALEE) was analysed for its total flavonoid content (TFC) and total phenolic content (TPC) using standard spectrophotometric methods28,29. For TFC determination, ALEE (1 mg/mL) was mixed with NaNO2 solution (5%), 10% AlCl3, and 1 M NaOH, and absorbance was measured at 510 nm. For TPC, Folin-Ciocalteu reagent was added to ALEE (10:1), followed by incubation with Na2CO3 (7.5%) and absorbance measurement at 760 nm. Results are presented as mg quercetin equivalents (QE)/g dry extract for TFC and as gallic acid equivalents (µg GAEs/g dry extract) for TPC.

We used gas chromatography-mass spectrometry (GC-MS) to analyse the organic composition of ALEE. The crude extract was first dissolved in ethanol (1 mg/mL) and filtered through a 0.22 µm syringe filter. It was then injected into a Shimadzu GCMS-QP2010 Plus analyser with helium as the carrier gas at a constant flow rate of 1 mL/min. The oven temperature was held at 50 °C for 2 min and then increased at 7 °C/min. Mass spectra were acquired with a quadrupole mass detector at a scan interval of 0.5 s over a full scan range of 25 to 1000 m/z. Finally, the compounds present were identified by matching their spectra against the WILEY7 MS library.

MDA-MB 231 (strongly metastatic) and MCF-7 (weakly metastatic) breast cancer (BCa) cell lines were obtained as a gift from Imperial College London (UK) and stored at the Biotechnology Research Centre (BRC) of Cyprus International University. The BRC ethical committee (BRCEC2011-01) approved the use of these cell lines in our study. We cultured the cells in Dulbecco's Modified Eagle's Medium (DMEM) (Gibco by Life Technologies, USA), supplemented with 2 mM L-glutamine, penicillin, and 10% fetal bovine serum (FBS), and maintained them in a sterile incubator at 37 °C and 5% CO2.

We conducted a trypan blue dye exclusion assay, following the guidelines provided by Fraser et al.31, to measure the level of cytotoxicity in BCa cells. We administered doses of 0, 2.5, 10, 25, 50, 100 and 200 µg/mL to the cells and observed them for 24, 48, and 72 h. After each period, we replaced the medium with a diluted trypan blue solution, prepared by mixing 0.25 mL of the dye with 0.8 mL of medium. This assay determined the extent of cytotoxicity in the cells. Data are presented as averages of 330 measurements.

The proliferation of MDA-MB 231 (strongly metastatic) and MCF-7 (weakly metastatic) BCa cells treated with ALEE was assessed using MTT (3-[4,5-dimethylthiazol-2-yl]-2,5-diphenyltetrazolium bromide) reagent (Sigma-Aldrich) as described by Fraser et al. (1990) with some adjustments. BCa cells (3 × 10^4 cells/mL) cultured in 12-well tissue culture plates were treated with 10, 5, 2.5, and 0 µg/mL of ALEE and incubated for 24, 48, and 72 h. Treatments and culture medium (DMEM) were replaced every 24 h. A microplate reader (ELX 800) was used to measure the absorbance of the treated and control cells at 570 nm. All experiments were performed at least three times in triplicate (n ≥ 3).

A wound-healing assay was carried out to evaluate the anti-metastatic potential of ALEE against strongly metastatic (MDA-MB 231) and weakly metastatic (MCF-7) cells using the method of Fraser et al. with some modifications. Cells were plated in 35 mm culture dishes, and parallel and intersecting lines were drawn on the culture dishes31. Briefly, MCF-7 and MDA-MB 231 cells were plated at 1 × 10^6/mL and 5 × 10^5/mL per dish, respectively, and three scratch lines were made using 200 µL pipette tips after the cells had settled. The initial and subsequent wounds were captured using a camera (Leica, Germany) attached to an inverted microscope at ×100 magnification, and image processing software (ImageJ) was used to analyse the recovered wound area (cell migration) using Eq. (1).

$$\mathrm{MoI}=1-\left(\frac{W_{t}}{W_{0}}\right)$$

(1)

MoI, motility index; \(W_t\), the wound width at 24 or 48 h; \(W_0\), the initial wound width at 0 h.
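
As a minimal illustration of Eq. (1), the motility index can be computed as below. This is a sketch in Python, not the authors' analysis pipeline (the wound measurements in the study were taken with ImageJ); the function name and the example widths are hypothetical.

```python
def motility_index(wound_width_t: float, wound_width_0: float) -> float:
    """Motility index from Eq. (1): MoI = 1 - Wt / W0."""
    if wound_width_0 <= 0:
        raise ValueError("initial wound width W0 must be positive")
    return 1.0 - wound_width_t / wound_width_0


# Hypothetical widths in arbitrary units (not data from the study).
w0 = 400.0   # wound width at 0 h
w24 = 300.0  # wound width at 24 h
print(motility_index(w24, w0))  # 0.25
```

A wound that has not closed at all gives MoI = 0, and a fully closed wound gives MoI = 1.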

Data science is critical in any data-driven model. The accuracy of the data was tested using the XGB, ELM, and MLP algorithms in MATLAB (R2021a). In this work, these models were proposed for in vitro cancer metastasis prediction in MDA-MB 231 and MCF-7 cells. The data were taken from our experimental data set (n80) to reveal the accuracy of the algorithms. Two parameters were used as input variables, i.e. the motility index of the cells and the concentration of the extract, although other variables can also be used to simulate in vitro cancer metastasis in both cell lines. The models used have a single-layer learning algorithm and a fast learning rate, and both the hidden biases and the input layers of the network, which process and distribute data respectively, are chosen randomly. In addition, the models provide details on the effectiveness of the treatment. Choosing a single model that performs best in most circumstances is difficult, but applying various ensemble models can reveal the models that best fit the data. The main objective of our proposed method was to determine the cell migration potential of breast cancer cells treated with ALEE, using the motility index of the cells and the extract concentration as the input parameters. The proposed flowchart of the models is shown in Fig. 1.

Figure 1. Proposed flowchart of experimental data-driven methods.

The XGB algorithm is a commonly used model that is highly efficient and reproducible in analysing and modelling data with various inputs and outputs. The method was first introduced and improved by Friedman et al.32, and it plays an essential role in the classification and regression of data. Its application in extreme learning techniques is well known33. The technique uses a precise setup of complex decision tree algorithms to achieve good performance and run faster than the standard gradient algorithm34. XGB is a machine learning ensemble technique that works similarly to Random Forest and is characterised by its set of classification and regression trees (CART). The model uses parallel processing to enhance learning speed, balances variance and bias, and minimises the risk of overfitting. Furthermore, it differs from a single decision tree (DT) in that every leaf carries an actual score, which enriches interpretations that cannot be defined using the DT. The algorithm has been used in modelling and predicting data and has shown promising results. Because of this ensemble technique's wide application and excellent features, we used it to model and predict the anti-migratory potential of the cells. Given the CART set \([T_1(x_i, y_i), \dots, T_K(x_i, y_i)]\), with the training data of the treated cells' motility index represented as \(x_i\) to predict outcomes \(y_i\), the prediction is determined using \(K\) classification and regression trees, as shown in Eq. (2)35:

$$\widehat{y}_{i}=\sum_{k=1}^{K}f_{k}\left(x_{i}\right),\quad f_{k}\in F$$

(2)

where \(f_k\) represents an independent tree structure with cell motility index scores, and \(F\) denotes the space of all CART. The objective to be optimised is given by Eq. (3)35:

$$obj\left(\theta\right)=\sum_{i=1}^{n}l\left(y_{i},\widehat{y}_{i}\right)+\sum_{i=1}^{t}\Omega\left(f_{i}\right)$$

(3)

The loss function, denoted \(l\), estimates the difference between the target \(y_i\) and the prediction \(\widehat{y}_i\). The regularisation function that penalises the model to avoid over-fitting is denoted \(\Omega\), and \(f_i\) represents the \(i\)-th tree being trained. Furthermore, the prediction \(\widehat{y}_i^{t}\) at step \(t\) can be expressed as35:

$$\widehat{y}_{i}^{t}=\sum_{k=1}^{t}f_{k}\left(x_{i}\right)=\widehat{y}_{i}^{t-1}+f_{t}\left(x_{i}\right)$$

(4)

Substituting the predicted value from Eq. (4) and taking a squared-error loss, Eq. (3) can be expressed as36:

$$obj^{t}=\sum_{i=1}^{n}{\left(y_{i}-\left(\widehat{y}_{i}^{t-1}+f_{t}\left(x_{i}\right)\right)\right)}^{2}+\sum_{i=1}^{t}\Omega\left(f_{i}\right)$$

(5)

It can also be expressed as

$$obj^{t}=\sum_{i=1}^{n}\left[2\left(\widehat{y}_{i}^{t-1}-y_{i}\right)f_{t}\left(x_{i}\right)+{f_{t}\left(x_{i}\right)}^{2}\right]+\Omega\left(f_{t}\right)+constant$$

(6)

For a general loss function, a second-order Taylor expansion of the loss gives Eq. (7)36:

$$obj^{t}=\sum_{i=1}^{n}\left[l\left(y_{i},\widehat{y}_{i}^{t-1}\right)+g_{i}f_{t}\left(x_{i}\right)+\frac{1}{2}h_{i}{f_{t}\left(x_{i}\right)}^{2}\right]+\Omega\left(f_{t}\right)+constant$$

(7)

where \(g_{i}=\partial_{\widehat{y}_{i}^{t-1}}l\left(y_{i},\widehat{y}_{i}^{t-1}\right)\) and \(h_{i}=\partial_{\widehat{y}_{i}^{t-1}}^{2}l\left(y_{i},\widehat{y}_{i}^{t-1}\right)\). Each tree is described by \(f_{t}\left(x\right)=w_{q(x)}\), and the regularisation function is expressed as

$$\Omega\left(f\right)=\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}w_{j}^{2}$$

(8)

where \(T\) represents the total number of leaves in the tree and \(w_j\) is the score of leaf \(j\), and the objective function can be rewritten as

$$obj^{t}\approx\sum_{i=1}^{n}\left[g_{i}w_{q\left(x_{i}\right)}+\frac{1}{2}h_{i}w_{q\left(x_{i}\right)}^{2}\right]+\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}w_{j}^{2}=\sum_{j=1}^{T}\left[\left(\sum_{i\in I_{j}}g_{i}\right)w_{j}+\frac{1}{2}\left(\sum_{i\in I_{j}}h_{i}+\lambda\right)w_{j}^{2}\right]+\gamma T$$

(9)

where \(I_{j}=\{i \mid q\left(x_{i}\right)=j\}\) refers to the data index set of the \(j\)-th leaf. With \(G_{j}=\sum_{i\in I_{j}}g_{i}\) and \(H_{j}=\sum_{i\in I_{j}}h_{i}\), the objective function can be written as

$$obj^{t}=\sum_{j=1}^{T}\left[G_{j}w_{j}+\frac{1}{2}\left(H_{j}+\lambda\right)w_{j}^{2}\right]+\gamma T$$

(10)

For a fixed tree structure \(q(x)\), the optimal leaf weight \(w_{j}^{*}\) and the corresponding value of the objective function are given by Eqs. (11) and (12).

$$w_{j}^{*}=-\frac{G_{j}}{H_{j}+\lambda}$$

(11)

$$obj^{*}=-\frac{1}{2}\sum_{j=1}^{T}\frac{G_{j}^{2}}{H_{j}+\lambda}+\gamma T$$

(12)
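
As a brief sketch of how the optimal leaf weight and objective follow from Eq. (10): for a fixed tree structure, each leaf contributes an independent quadratic in \(w_j\), so (assuming \(H_j+\lambda>0\)) setting its derivative to zero gives the minimiser:

$$\frac{\partial}{\partial w_{j}}\left[G_{j}w_{j}+\frac{1}{2}\left(H_{j}+\lambda\right)w_{j}^{2}\right]=G_{j}+\left(H_{j}+\lambda\right)w_{j}=0\;\Rightarrow\;w_{j}^{*}=-\frac{G_{j}}{H_{j}+\lambda}$$

Substituting \(w_{j}^{*}\) back into the same quadratic gives the contribution of leaf \(j\) to the optimal objective:

$$G_{j}w_{j}^{*}+\frac{1}{2}\left(H_{j}+\lambda\right){\left(w_{j}^{*}\right)}^{2}=-\frac{G_{j}^{2}}{H_{j}+\lambda}+\frac{1}{2}\frac{G_{j}^{2}}{H_{j}+\lambda}=-\frac{1}{2}\frac{G_{j}^{2}}{H_{j}+\lambda}$$

Summing this over the \(T\) leaves and adding the \(\gamma T\) penalty yields the optimal objective value.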

In addition, Eq. (13) gives the gain obtained when a leaf node is split, where the subscripts L and R denote the left and right child nodes, and \(\gamma\) is the regularisation penalty for the additional leaf.

$$Gain=\frac{1}{2}\left[\frac{G_{L}^{2}}{H_{L}+\lambda}+\frac{G_{R}^{2}}{H_{R}+\lambda}-\frac{{\left(G_{L}+G_{R}\right)}^{2}}{H_{L}+H_{R}+\lambda}\right]-\gamma$$

(13)
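
The leaf-weight and split-gain formulas in Eqs. (11) and (13) can be illustrated with a short numerical sketch. It is written in Python with NumPy rather than the MATLAB environment used in the study, and it assumes a squared-error loss, for which \(g_i = 2(\widehat{y}_i^{t-1} - y_i)\) and \(h_i = 2\). The function names, the toy concentrations and motility indices, and the choices of \(\lambda\) and \(\gamma\) are all hypothetical.

```python
import numpy as np


def grad_hess_squared_error(y_true, y_pred):
    """First and second derivatives (g_i, h_i) of the loss (y - y_hat)^2."""
    g = 2.0 * (y_pred - y_true)
    h = 2.0 * np.ones_like(y_true, dtype=float)
    return g, h


def leaf_weight(g, h, lam):
    """Optimal leaf weight, Eq. (11): w* = -G / (H + lambda)."""
    return -g.sum() / (h.sum() + lam)


def split_gain(g, h, left_mask, lam, gamma):
    """Gain from splitting one leaf into left and right children, Eq. (13)."""
    def score(gs, hs):
        return gs.sum() ** 2 / (hs.sum() + lam)

    right_mask = ~left_mask
    return 0.5 * (
        score(g[left_mask], h[left_mask])
        + score(g[right_mask], h[right_mask])
        - score(g, h)
    ) - gamma


# Made-up toy data (not results from the study).
conc = np.array([0.0, 2.5, 5.0, 10.0])  # extract concentration
moi = np.array([0.8, 0.6, 0.4, 0.2])    # motility index targets
y_prev = np.zeros_like(moi)             # prediction from step t-1

g, h = grad_hess_squared_error(moi, y_prev)
left_mask = conc < 5.0                  # candidate split on concentration

print(split_gain(g, h, left_mask, lam=1.0, gamma=0.0))
print(leaf_weight(g[left_mask], h[left_mask], lam=1.0))
```

With \(\lambda = 0\) and a squared-error loss, the optimal leaf weight reduces to the mean residual of the samples in that leaf, which is a convenient sanity check.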

The ELM model is a novel learning algorithm with a single hidden layer that works similarly to a feed-forward neural network (FNN) owing to its approximation potential; it was first introduced by Huang et al.37. Issues such as slower training speed and over-fitting with FNN have been addressed analytically by ELM through inversion and matrix multiplication38. The model has only one hidden layer, and the parameters of its hidden nodes are not calculated through a learning process; they therefore remain constant during both the training and prediction phases. In addition, the ELM hidden biases and input layer are chosen randomly, and the Moore-Penrose generalised inverse determines the output layer. The ELM has shown precision, owing to its robustness, when applied to hydrological modelling39.

The ELM is expressed for a training dataset \(\{\left(x_{1},y_{1}\right),\dots,\left(x_{t},y_{t}\right)\}\). Overall, the inputs are represented as \(x_{1},x_{2},\dots,x_{t}\) and the outputs as \(y_{1},y_{2},\dots,y_{t}\).

For a training dataset of \(N\) samples (\(t=1,2,\dots,N\)), where \(x_{t}\in\mathbb{R}^{d}\) and \(y_{t}\in\mathbb{R}\), the ELM with \(H\) hidden nodes is given by Eq. (14)37:

$$\sum_{i=1}^{H}B_{i}g_{i}\left(\alpha_{i}\cdot x_{t}+\beta_{i}\right)=z_{t},$$

(14)

In Eq. (14), \(i\) represents the index of the hidden-layer node, \(\beta_{i}\) and \(\alpha_{i}\) denote the bias and weight of the random layers, and \(d\) is the number of inputs. Furthermore, the predicted weight of the output layer, the model output and the activation function of the hidden-layer neurons are \(B\in\mathbb{R}^{H}\), \(Z\) (\(z_{t}\in\mathbb{R}\)) and \(G\left(\alpha,\beta,x\right)\), respectively. The best activation function is found to be the sigmoid function40, as follows:

$$G\left(x\right)=\frac{1}{1+\exp(-x)},$$

(15)

In addition, the output layer utilizes a linear activation function, which is shown in the following equation:

$$\sum_{t=1}^{N}\Vert z_{t}-y_{t}\Vert=0,$$

(16)

The value of \(B\) is calculated using the system of linear equations expressed in Eq. (17), \(GB=Y\), where \(G\) is given in Eq. (18):

$$G\left(\alpha,\beta,x\right)=\left[\begin{array}{c}g(x_{1})\\ \vdots\\ g(x_{N})\end{array}\right]={\left[\begin{array}{ccc}g_{1}(\alpha_{1}\cdot x_{1}+\beta_{1})&\cdots&g_{H}(\alpha_{H}\cdot x_{1}+\beta_{H})\\ \vdots&\ddots&\vdots\\ g_{1}(\alpha_{1}\cdot x_{N}+\beta_{1})&\cdots&g_{H}(\alpha_{H}\cdot x_{N}+\beta_{H})\end{array}\right]}_{N\times H}$$

(18)

\(B\) is given in Eq. (19), and \(Y\) in Eq. (20).

$$B={\left[\begin{array}{c}B_{1}^{T}\\ \vdots\\ B_{H}^{T}\end{array}\right]}_{H\times 1}$$

(19)

$$Y={\left[\begin{array}{c}y_{1}^{T}\\ \vdots\\ y_{N}^{T}\end{array}\right]}_{N\times 1}$$

(20)

\(G\) is the hidden-layer output matrix. \(\widehat{B}\) was calculated using the Moore-Penrose generalised inverse \(G^{+}\) of the hidden-layer matrix (see Eq. 21).

$$\widehat{B}=G^{+}Y$$

(21)

Overall, the estimate \(\widehat{y}\), which denotes the predicted MoI of the cells, can be obtained using Eq. (22).

$$\widehat{y}=\sum_{i=1}^{H}\widehat{B}_{i}g_{i}\left(\alpha_{i}\cdot x_{t}+\beta_{i}\right)$$

(22)
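
The ELM procedure described above (random hidden-layer parameters, a sigmoid hidden layer, and output weights obtained from the Moore-Penrose inverse) can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not the MATLAB implementation used in the study: the function names, the number of hidden nodes, the normal distribution used to draw the random weights and the synthetic data are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Sigmoid activation, Eq. (15)."""
    return 1.0 / (1.0 + np.exp(-x))


def elm_fit(X, y, n_hidden, rng):
    """Fit an ELM: random hidden layer, least-squares output weights."""
    n_inputs = X.shape[1]
    alpha = rng.normal(size=(n_inputs, n_hidden))  # random input weights
    beta = rng.normal(size=n_hidden)               # random hidden biases
    G = sigmoid(X @ alpha + beta)                  # N x H hidden-layer matrix, Eq. (18)
    B_hat = np.linalg.pinv(G) @ y                  # output weights, Eq. (21)
    return alpha, beta, B_hat


def elm_predict(X, alpha, beta, B_hat):
    """Predict with the fitted ELM, Eq. (22)."""
    return sigmoid(X @ alpha + beta) @ B_hat


# Synthetic data for illustration only (not data from the study).
rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

alpha, beta, B_hat = elm_fit(X, y, n_hidden=20, rng=rng)
y_hat = elm_predict(X, alpha, beta, B_hat)
train_rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))
print(train_rmse)
```

Only the output weights are fitted; the hidden-layer weights and biases stay at their random initial values, which is why training reduces to a single pseudo-inverse computation.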

The MLP is one of the most commonly applied artificial neural networks (ANNs), an advanced simulation tool composed of information-processing units that was motivated by, and mimics, biological neurons. Like the human central nervous system (CNS), an ANN can solve complex problems with linear and non-linear behaviour by combining features such as parallel processing, generalisation, learning power and decision making41. The general architecture of an ANN consists of three layers with distinct tasks: the input layer, which distributes the data in the network; the hidden layers, which process the information; and the output layer, which, in addition to processing each input vector, presents the result. Neurons are the smallest processing units of the network. A basic characteristic of the MLP is that it uses interconnections between the neurons, without advanced mathematical design, to complete the information processing. Like the general ANN, the MLP architecture comprises an input layer, one or more hidden layers and an output layer (Fig. 2)40.

Figure 2. Schematic diagram of MLP network structure.
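
The layered structure described above can be illustrated with a forward pass through a small MLP. This is only a sketch in Python/NumPy: the text does not specify the layer sizes, activation function or training procedure, so the single hidden layer, the tanh activation, the linear output and the untrained random weights below are assumptions made for illustration.

```python
import numpy as np


def mlp_forward(X, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer MLP with a linear output layer."""
    hidden = np.tanh(X @ W1 + b1)  # hidden layer processes the inputs
    return hidden @ W2 + b2        # output layer combines hidden activations


rng = np.random.default_rng(0)
n_inputs, n_hidden = 2, 5          # hypothetical layer sizes
W1 = rng.normal(size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

X = np.array([[0.0, 0.1], [0.5, 0.3]])  # made-up, scaled inputs
print(mlp_forward(X, W1, b1, W2, b2).shape)  # (2, 1)
```

In practice the weights would be fitted to the training data (for example by backpropagation) rather than left at random values.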

Two metrics were used to evaluate the performance of the artificial intelligence-based models in the current study: the Nash-Sutcliffe coefficient (NS), which measures the fit between the experimental and predicted values, and the root mean square error (RMSE), which quantifies the error of each model.

The RMSE is expressed as:

$$RMSE=\sqrt{\frac{1}{N}\sum_{j=1}^{N}{\left(Y_{obs,j}-Y_{com,j}\right)}^{2}}$$

(23)

The Nash-Sutcliffe coefficient (NS) is expressed as:

$$\text{NS}=1-\left[\frac{\sum_{i=1}^{N}{\left(Q_{obs,i}-Q_{sim,i}\right)}^{2}}{\sum_{i=1}^{N}{\left(Q_{obs,i}-\overline{Q}_{obs}\right)}^{2}}\right],\quad -\infty\le NS\le 1$$

(24)
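
Both metrics are straightforward to compute. The sketch below uses Python with NumPy rather than the MATLAB environment reported for the study; the function names and the observed and predicted values are hypothetical.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean square error, Eq. (23)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe coefficient, Eq. (24); 1 means a perfect fit."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up values for illustration (not results from the study).
obs = [0.2, 0.4, 0.6, 0.8]
pred = [0.25, 0.35, 0.65, 0.75]
print(rmse(obs, pred))
print(nash_sutcliffe(obs, pred))
```

A model that always predicts the mean of the observations scores NS = 0, so negative values indicate a fit worse than that baseline.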
