RAIN: machine learning-based identification for HIV-1 bNAbs – Nature.com

Ethics statement

The research complies with all relevant ethical regulations, and informed consent was obtained from all study participants (n = 25; 16 females and 9 males). Study protocols were approved by the Ethikkommission beider Basel (EKBB; Basel, Switzerland; reference number 342/10), the Ifakara Health Institute Institutional Review Board (reference number IHI/IRB/No. 24-2010), and the National Institute for Medical Research (NIMR; Dar es Salaam, United Republic of Tanzania; reference number NIMR/HQ/R.8a/Vol.IX/1162).

Serum samples from HIV-1-infected individuals were incubated with Protein G Sepharose (GE Life Sciences) at 4 °C for 1 h. IgGs were eluted from chromatography columns using 0.1 M glycine (pH 2.9) into 0.1 M Tris (pH 8.0)70. Samples were run through Zeba Spin Desalting Columns 7K MWCO (Thermo Scientific, 89882). Concentrations of purified IgGs were determined by UV/Vis spectroscopy (A280) on a Nanodrop 2000, and samples were stored at −80 °C.

The CD19+ cell fraction was enriched from PBMCs by positive selection with CD19 magnetic microbeads (Miltenyi Biotech) and subsequently stained on ice for 30 min with the following fluorochrome-labeled antibodies: mouse monoclonal CD20-PE-Cy7 (dilution 1:50, clone L27, catalog no. 335793, BD Biosciences) and F(ab')2-Goat anti-Human IgG Fc secondary antibody, APC (dilution 1:100, RRID: AB_2337695, Jackson ImmunoResearch). Cells were sorted to over 98% purity on a FACS Aria III (BD) using the following gating strategy: circulating memory B cells were sorted as CD20+ IgG+ cells. FACS-sorted cells were collected in 6 µl FCS in Eppendorf tubes that had been pre-coated overnight with 2% BSA.

The 5′ single-cell VDJ libraries were generated using the Chromium Next GEM Single Cell V(D)J Reagent kit v.1, 1.1 or v.2 (10X Genomics) according to the manufacturer's protocol. Paired heavy- and light-chain BCR libraries were prepared from the sorted B-cell populations. Briefly, up to 20,000 memory B cells per well of a 10X chip were loaded in the 10X Genomics Chromium Controller to generate single-cell gel beads in emulsion. After reverse transcription, the gel beads in emulsion were disrupted. Barcoded complementary DNA was isolated and used for the preparation of BCR libraries. All steps followed the manufacturer's instructions in the user guide recommended for 10X Genomics kit v.1, 1.1, or 2. The purified libraries from each time point were pooled separately and sequenced on the NextSeq550 (Illumina), with read length and depth as instructed in the 10X Genomics user guide.

Memory B cells were analyzed by single-cell targeted RNA-seq and BCR-seq using the BD Rhapsody Single-Cell Analysis System71 (BD Biosciences). Briefly, the single-cell suspension was loaded into a BD Rhapsody cartridge with >200,000 microwells, and single-cell capture was achieved by random distribution and gravity precipitation. Next, the bead library was loaded into the microwell cartridge to saturation so that each cell was paired with a bead in a microwell. The cells were lysed in the microwell cartridge to hybridize mRNA molecules onto barcoded capture oligos on the beads. These beads were then retrieved from the microwell cartridge into a single tube for subsequent cDNA synthesis, exonuclease I digestion, and multiplex-PCR-based library construction. Sequencing was performed on a NovaSeq in paired-end mode.

Single-cell suspensions with 1 × 10^5 cells/mL in PBS were prepared. The suspensions were then loaded onto microfluidic devices, and scRNA-seq libraries were constructed according to the Singleron GEXSCOPE protocol in the GEXSCOPE Single-Cell RNA Library Kit (Singleron Biotechnologies)72. Individual libraries were diluted to 4 nM and pooled for sequencing. Pools were sequenced on an Illumina HiSeq X with 150 bp paired-end reads.

Expi293 cells (Thermo Fisher Cat No. A14527) were diluted to a final volume of 0.5 L at a concentration of 2.5 × 10^6 cells/mL in Expi293 media73. Heavy-chain and light-chain plasmids were complexed with polyethyleneimine (Thermo Fisher) and added to the cells. On day five, cells were cleared from the cell culture media by centrifugation at 10,000 × g for 30 min, and the supernatant was subsequently passed through a 0.45-µm filter. The recombinant antibody in the supernatant was purified with the HiTrap Protein A HP column (Cytiva, 17040301) on the ÄKTA pure system (Cytiva). The resin was washed with 75 mL of phosphate-buffered saline (PBS). A total of 25 mL of 0.1 M glycine pH 2.9 was used to elute the antibody from the protein A resin. The acidic pH of the eluted antibody solution was raised to ~7 by the addition of 1 M Tris pH 8.0. The antibody solution was buffer exchanged to PBS on the HiPrep 26/10 Desalting column (GE Healthcare) or by Size Exclusion Chromatography on a Superdex 16/600 HiLoad column (Cytiva), filtered, snap-frozen in liquid nitrogen, and stored at −80 °C.

For Fab production, the heavy chain was engineered with a two-amino-acid glycine-serine linker followed by a six-histidine tag and a stop codon. Light and mutated heavy chains were transfected as described in the previous section. Cell supernatant was harvested five days post-transfection and purified by IMAC chromatography (HisTrap excel, Cytiva) using the elution buffer 25 mM Tris pH 7.4, 150 mM NaCl, 500 mM imidazole. The eluate was buffer exchanged to 25 mM Tris pH 7.4, 150 mM NaCl, 0.085 mM n-dodecyl β-D-maltoside (DDM) on a HiPrep 26/10 Desalting column (GE Healthcare), followed by Size Exclusion Chromatography on a Superdex 16/600 HiLoad column (Cytiva)74. The sample was concentrated using an Amicon filter with a 10 kDa cutoff, snap-frozen, and stored at −80 °C until further use.

BG505 DS-SOSIP trimer75 production and purification were performed as previously described48. Briefly, prefusion-stabilized Env trimer derived from the clade A BG505 strain was stably transfected in CHO-DG44 cells and expressed in ActiCHO P medium with ActiCHO Feed A and B as feed (Cytiva). Cell supernatant was collected by filtration through a Clarisolve 20MS depth filter followed by a Millistak+ F0HC filter (Millipore Sigma) at 60 LMH. Tangential Flow Filtration was used to concentrate the clarified supernatant and exchange it into 20 mM MES, 25 mM NaCl, pH 6.5. The trimer was then purified by ion exchange chromatography as described48. Fractions containing the BG505 DS-SOSIP protein were pooled, sterile-filtered, snap-frozen, and stored at −80 °C.

Neutralization assays with IgGs against the 12-strain global virus panel were performed in 96-well plates as previously described44,76,77. Briefly, 293T-derived HIV-1 Env-pseudotyped virus stocks were generated by cotransfection of an Env expression plasmid and a pSG3ΔEnv backbone. Animal sera were heat-inactivated at 56 °C for 1 h and assessed at 8-point fourfold dilutions starting at a 1:20 dilution. Monoclonal antibodies were tested at 8-point fivefold dilutions starting at 50 µg/ml or 500 µg/ml. Virus stocks and antibodies (or sera) were mixed in a total volume of 50 µL and incubated at 37 °C for 1 h. TZM-bl cells (20 µl, 0.5 million/ml) were then added to the mixture and incubated at 37 °C. Cells were fed with 130 µL cDMEM on day 2, lysed, and assessed for luciferase activity (RLU) on day 3. A nonlinear regression curve was fitted using the five-parameter Hill slope equation. The 50% and 80% inhibitory dilutions (ID50 and ID80) were determined for sera, and the 50% and 80% inhibitory concentrations (IC50 and IC80) were determined for mAbs. All samples were tested in duplicate.
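As an illustration of how IC50 and IC80 values can be read off a fitted five-parameter curve, the following minimal sketch evaluates one common parameterization of the five-parameter logistic and inverts it analytically. The parameterization, the function names, and the parameter values are assumptions made for illustration; they are not taken from the analysis software used in the study.

```python
def five_pl(x, a, d, c, b, g):
    """Five-parameter logistic response at concentration (or dilution) x.

    a: response as x -> 0, d: response as x -> infinity,
    c: mid-range location, b: Hill slope, g: asymmetry factor.
    """
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


def inhibitory_level(y, a, d, c, b, g):
    """Invert five_pl: the x at which the fitted curve reaches response y."""
    return c * (((a - d) / (y - d)) ** (1.0 / g) - 1.0) ** (1.0 / b)


# Hypothetical fitted parameters for one mAb (percent neutralization vs. µg/ml).
params = dict(a=0.0, d=100.0, c=2.0, b=1.0, g=1.0)

ic50 = inhibitory_level(50.0, **params)  # concentration giving 50% neutralization
ic80 = inhibitory_level(80.0, **params)  # concentration giving 80% neutralization
```

With the asymmetry factor g set to 1, the curve reduces to the four-parameter form and the IC50 equals c.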

The biolayer interferometry experiments using SOSIP were performed as follows. All experiments were performed in reaction buffer (TBS pH 7.4 + 0.01% (w/v) BSA + 0.002% (v/v) Tween 20) at 30 °C using an Octet K2 instrument (ForteBio). Protein A (ForteBio) biosensor probes were first equilibrated in reaction buffer for 600 s. IgGs were diluted to 5 µg/ml in reaction buffer and immobilized onto the protein A probes for 300 s, followed by a wash for 300 s in reaction buffer. The binding of SOSIP trimers to the IgGs was then measured at various concentrations for 500 s, followed by dissociation for 300 s in reaction buffer. Analysis was performed using the Octet software with bivalent analyte fitting for antibody binding and 1:1 analyte fitting for the interaction with Fabs. Association and dissociation curves were visualized with GraphPad Prism version 9.0.

The samples were adsorbed to a glow-discharged carbon-coated copper grid 400 mesh (EMS, Hatfield, PA, USA), washed with deionized water, and stained with a 1% uranyl acetate solution for 20 s. Observations were made using an F20 electron microscope (Thermo Fisher, Hillsboro, USA) operated at 200 kV73. Digital images were collected using a Falcon III direct detector camera (Thermo Fisher, Hillsboro, USA) with 4098 × 4098 pixels. Automatic data collection was performed using the EPU software (Thermo Fisher, Hillsboro, USA) at a nominal magnification of ×62,000, corresponding to a pixel size of 1.65 Å, using a defocus range from 1 µm to 2.5 µm. Image preprocessing, two-dimensional classification, and three-dimensional processing were performed using the CryoSPARC software (Version 4.4)78.

BG505 DS-SOSIP trimer complexes were prepared using a stock solution of 5 mg/ml trimer incubated with a threefold molar excess of bNAb4251 for 10 min. To prevent aggregation and interaction of the trimer complexes with the air-water interface during vitrification, the samples were incubated in 25 mM Tris pH 7.4, 150 mM NaCl, 0.085 mM DDM. Samples were applied to plasma-cleaned QUANTIFOIL holey carbon grids (EMS, R2/2 Cu 300 mesh). The grid was plunge frozen using a Vitrobot Mark IV (Thermo Fisher, Hillsboro, USA) with humidity and temperature control.

Grids were screened for particle presence and ice quality on a TFS Glacios microscope (200 kV), and the best grids were transferred to a TFS Titan Krios G4. Cryo-EM data were collected using a TFS Titan Krios G4 transmission electron microscope equipped with a Cold-FEG, on a Falcon IV detector in electron counting mode. Falcon IV gain references were collected just before data collection. Data were collected using TFS EPU v2.12.1 with the aberration-free image shift protocol, recording four micrographs per ice hole. Movies were recorded at a magnification of ×165,000, corresponding to a pixel size of 0.73 Å at the specimen level, with defocus values ranging from 0.9 to 2.4 µm. Exposures were obtained with a total dose of 39.89 e⁻/Å², resulting in an exposure time of ~2.75 s per movie. In total, 15,163 micrographs in EER format were collected.

Data processing was performed with cryoSPARC (Version 4.4), including motion correction and CTF determination78. Particle picking and extraction (extraction box size 350 pixels²) were also carried out in cryoSPARC78. Next, several rounds of reference-free 2D classification were performed to remove artifacts, and selected particles were used for ab initio reconstruction and hetero-refinement. After hetero-refinement, 72,497 particles contributed to an initial 3D reconstruction of 3.8 Å resolution (Fourier shell correlation (FSC) 0.143) with C1 symmetry. A model of a SOSIP trimer (PDB ID 4TVP)79 and AlphaFold2 (ColabFold implementation) models of the 4251 Fab were fitted into the cryo-EM maps with UCSF ChimeraX (Version 1.5). These docked models were extended and rebuilt manually with refinement using Coot (Version 0.9.8.8) and Phenix (Version 1.21)80,81. Figures were prepared in UCSF ChimeraX and PyMOL (Version 4.6)82. The numbering of Fab4251 is based on the Kabat numbering of immunoglobulin models83. Buried surface area measurements were calculated within ChimeraX and PISA84.

For all antigenic sites, paired bNAb sequences were collected from the CATNAP database32 as of January 1, 2022, as nucleotide and amino acid sequences. First, the 249 heavy-chain and 240 light-chain nucleotide sequences were annotated with Igblastn36. Sequences were then processed and analyzed using the Immcantation Framework (http://immcantation.org) with MakeDB.py from Change-O v1.2.0 (with the options --extended --partial). Next, bNAbs were filtered by a dedicated Java script to keep only paired sequences (VH+VK/L) with an annotated CDR3. Each paired antibody was associated with the Env antigenic site it targets, using the information provided in the CATNAP database text file (abs.txt as of January 1, 2022). The 27 CATNAP antibodies for which only protein sequences were available were annotated with IgBlastp followed by MakeDB.py from Change-O v1.2.0 (with the options igblast-aa --extended). In parallel, ANARCI85 was applied to the FASTA protein sequences to identify the junction region. As for the nucleotide sequences, only paired bNAbs with an annotated CDR3 were retained. In total, 255 bNAb sequences were collected. They are distributed across the antigenic sites as follows: 54 bNAbs target the CD4bs, 21 MPER, 98 V1V2, 56 V3, and 26 the interface.

For the training and evaluation of the machine-learning models, paired BCR repertoires of ten healthy donors were collected. The repertoires were obtained from various sources (Supplementary Data File 1) and sequenced using 10X Genomics technology. Annotation and processing of the sequences were done as previously described39 and resulted in a customized AIRR format table containing 14,962 paired BCRs. For HIV-1 immune donors, three different sequencing technologies were employed: 10X Genomics (D1, D2, G3, and G4), Singleron (S4), and BD Rhapsody (B3). Single-cell sequencing data from selected HIV-1 immune donors generated with Singleron technology were processed using celescope v1.14.1 (https://github.com/singleron-RD/CeleScope) in flv_CR mode, which uses cellranger v7.0.1. BD Rhapsody single-cell sequencing data were first processed using the BD Rhapsody Targeted mRNA Analysis Pipeline (version 1.11); then, using a custom script, the generated VDJ_Dominant_Contigs.csv file was converted into cellranger-like output files, namely filtered_contig_annotations.csv and filtered_contig.fasta. Lastly, the 10X Genomics single-cell sequencing data were processed with cellranger v7.0.1. The cellranger output files of the different HIV-1 repertoires were annotated and processed as described earlier, resulting in a table of paired BCRs with AIRR characteristics. The six experiments yielded 2,152 BCRs for D1, 6,195 BCRs for D2, 4,008 BCRs for B3, 3,794 BCRs for G3, 3,112 BCRs for S4, and 4,799 BCRs for G4.

All mAb and bNAb VDJ protein sequences were initially aligned using ANARCI with IMGT format. Subsequently, using a custom R script, two similarity matrices were generated: one encompassing the entire VDJ sequence (VH) and the other focusing solely on the CDRH3 region. For each pair of sequences, a Levenshtein distance was computed and converted to a similarity score ranging from 0 to 1 (a higher score representing a lower Levenshtein distance). Heatmaps were constructed with the pheatmap R package to visualize the following comparisons: all five antigenic site categories of bNAbs, and bNAbs versus mAbs (mAb sequences were downsampled to 500 sequences). Sequences were ranked based on their V and J genes.
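The pairwise similarity computation can be sketched as follows. The authors used a custom R script; this is a Python illustration only, and the normalization (one minus the distance divided by the longer sequence length) is an assumption, since the text states only that scores range from 0 to 1. The example sequences are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def similarity_matrix(sequences):
    """Symmetric matrix of pairwise similarity scores."""
    return [[similarity(s, t) for t in sequences] for s in sequences]


# Hypothetical CDRH3 sequences, for illustration only.
cdrh3 = ["ARDYW", "ARDFW", "ARGGYSSGWYFDY"]
matrix = similarity_matrix(cdrh3)
```

A matrix of this shape can be passed directly to a heatmap routine; rows and columns follow the order of the input sequences.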

Using a custom script, AIRR characteristics were converted into our features of interest. The mutation frequency was calculated from the residue differences between the protein sequence of the BCR and its germline sequence in the FWR1+CDR1+FWR2+CDR2+FWR3 regions (VH gene). The framework mutation frequency was calculated similarly but using only FWR1+FWR2+FWR3. The hydrophobicity of the CDRH3 sequences was computed using a customized score, with aromatic residues having the highest values (1 for W, 0.75 for Y, and 0.5 for F). Residues A, L, I, M, P, and V were set to 0.1, while the remaining residues were set to zero. The values of all residues were summed for each CDRH3. In addition, the lengths of the CDRH3 and CDRL3, and the VH and VL/K genes, were considered as features. Two extra binary (zero or one) features were added for use by the anomaly detection algorithm: VH1 combined with a CDRL3 length of five residues, designed for the bNAbs targeting the CD4bs, and VH1-69 combined with VK3-20 and a GW motif in the CDRH3, designed for the bNAbs targeting MPER.
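The feature definitions above can be sketched in a few lines of Python. This is not the authors' script: the function names are hypothetical, the mutation frequency is assumed to be the fraction of differing positions between pre-aligned, equal-length sequences, and gene calls are assumed to follow IMGT-style names such as IGHV1-69*01.

```python
# Customized hydrophobicity scale from the text; residues not listed score zero.
HYDROPHOBICITY = {"W": 1.0, "Y": 0.75, "F": 0.5,
                  **{residue: 0.1 for residue in "ALIMPV"}}


def cdrh3_hydrophobicity(cdrh3: str) -> float:
    """Sum of per-residue scores over the CDRH3."""
    return sum(HYDROPHOBICITY.get(residue, 0.0) for residue in cdrh3)


def mutation_frequency(sequence: str, germline: str) -> float:
    """Fraction of positions differing from germline (assumes pre-aligned input)."""
    if len(sequence) != len(germline):
        raise ValueError("sequences must be aligned to the same length")
    if not germline:
        return 0.0
    differences = sum(s != g for s, g in zip(sequence, germline))
    return differences / len(germline)


def cd4bs_flag(v_call_heavy: str, cdrl3: str) -> int:
    """1 if the heavy chain uses a VH1 gene and the CDRL3 has five residues."""
    return int(v_call_heavy.startswith("IGHV1-") and len(cdrl3) == 5)


def mper_flag(v_call_heavy: str, v_call_light: str, cdrh3: str) -> int:
    """1 if VH1-69 pairs with VK3-20 and the CDRH3 contains a GW motif."""
    return int(v_call_heavy.startswith("IGHV1-69")
               and v_call_light.startswith("IGKV3-20")
               and "GW" in cdrh3)
```

For example, `cdrh3_hydrophobicity("ARWYF")` sums 0.1 (A), 0 (R), 1 (W), 0.75 (Y), and 0.5 (F).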

Three ML-based approaches were trained on the features table generated from the healthy-donor BCR and bNAb datasets, using Python v3.8.16 and scikit-learn v1.0.2. These algorithms were: Anomaly Detection (AD), Decision Tree (DT), and Random Forest (RF). For each antigenic site, the dataset was partitioned into training, validation, and test sets with a 60:20:20 ratio, setting random.seed to 1 for all models. For the AD model, bNAb data were removed from the training set, since this algorithm trains only on non-anomalous data. For this model, the features with discrete values were first normalized using the preprocessing.normalize method (axis=0) from the scikit-learn library. Features exhibiting significantly different values from the normal distribution were selected for each antigenic site, which included the frequency of mutations in the V genes and in the frameworks. For CD4bs, we added the combined feature VH1+CDRL3 with a length of five residues. For MPER, we included the combined feature VH1-69, VK3-20, and the GW motif in CDRH3. In addition, CDRH3 hydrophobicity was added for MPER, V1V2, and V3. Lastly, CDRH3 length was incorporated for V1V2 and V3. Using the validation set, a multivariate normal random variable was calculated with the multivariate_normal function from the scipy package v1.8.0 and used to set the optimal Epsilon parameter (ε), minimizing the number of false positives. The Epsilon value was set to 619.55 for CD4bs, 231501.41 for MPER, 866803.64 for V1V2, 845445.99 for V3, and 24.36 for interface. Those threshold values were used on the test set to predict a BCR as an anomaly (bNAb) or not. For DT and RF models, V genes (for heavy and light chains) were one-hot encoded as a preprocessing step, resulting in a total of 122 features in the features table. Hyperparameter tuning was conducted using the validation dataset, minimizing the number of false positives.
DT models were trained with a balanced class weight, the entropy criterion for measuring the quality of splits, and a cost-complexity pruning parameter alpha of zero. RF models were trained with 100 estimators, a balanced class weight, the entropy criterion for measuring the quality of splits, maximum samples of 1.0, no maximum tree depth, maximum features of 11 (√122), and bootstrapping to build trees. Matplotlib v3.6.2 was used to generate ROC plots from performance results and to generate the Venn diagrams showing the intersection of the number of true positives or false positives between the three models. The Super Learner Ensembles algorithm was implemented using the ML-Ensemble (mlens) v0.2.3 library. For each antigenic site, the dataset was partitioned into train and test sets with a 75:25 ratio. The Super Learner was created with the precision score as the scorer parameter, k-fold cross-validation with ten folds, and the shuffle option set to true. The following classifiers were used as base models in the Super Learner algorithm: DecisionTreeClassifier, SVC (Support Vector Classification), KNeighborsClassifier, AdaBoostClassifier, BaggingClassifier, RandomForestClassifier, and ExtraTreesClassifier. A LogisticRegression was used as the meta-model, with the solver parameter set to lbfgs.

Flow cytometric data were acquired using BD FACSDiva (v.9.0) software and analyzed using FlowJo (v.10.7.1). Statistical analyses were conducted using R Statistical Software (v4.2.1) and the ggstatsplot package86. The ComplexHeatmap package was used for visualization87. No statistical methods were used to predetermine the sample size. The experiments were not randomized, and investigators were not blinded to allocation during experiments and outcome assessment.

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
