Workflow of ML-guided synthesis of CQDs    
    Synthesis parameters strongly affect the target properties of the resulting samples. However, tuning many parameters to optimize multiple desired properties simultaneously is difficult. Our ML-integrated multi-objective optimization (MOO) strategy tackles this challenge by learning the complex correlations between hydrothermal/solvothermal synthesis parameters and two target properties of CQDs (PL wavelength and PLQY) within a unified MOO formulation, and then recommending optimal conditions that enhance both properties simultaneously. The overall workflow for the ML-guided synthesis of CQDs is shown in Fig. 1 and Supplementary Fig. 1. It primarily consists of four key components: database construction, MOO formulation, MOO recommendation, and experimental verification.
    Using a representative and comprehensive set of synthesis descriptors is vital for optimizing synthesis conditions [36]. We carefully selected eight descriptors to represent the hydrothermal system, one of the most common methods for preparing CQDs: reaction temperature (T), reaction time (t), type of catalyst (C), volume/mass of catalyst (VC), type of solution (S), volume of solution (VS), ramp rate (Rr), and mass of precursor (Mp). To minimize human intervention, the bounds of the synthesis parameters are determined primarily by the constraints of the synthesis method and equipment rather than by expert intuition. For instance, when the hydrothermal/solvothermal method is used to prepare CQDs, the reactor's inner pot is made of polytetrafluoroethylene, so the reaction temperature should not exceed 220 °C. Moreover, the inner pot used in the experiments has a capacity of 25 mL, and the general guidance is to fill no more than 2/3 of this volume. The main considerations of the experimental design in this study are therefore experimental safety and the limitations of the equipment. These practical considerations naturally lead to a vast parameter space of an estimated 20 million possible combinations, as detailed in Supplementary Table 1. Briefly, 2,7-naphthalenediol, together with catalysts such as H2SO4, HAc, ethylenediamine (EDA), and urea, was used to construct the carbon skeleton of the CQDs during the hydrothermal or solvothermal reaction (Supplementary Fig. 2). Different solvents (deionized water, ethanol, N,N-dimethylformamide (DMF), toluene, and formamide) were used to introduce different functional groups into the CQD architectures; combined with the other synthesis parameters, this results in tunable PL emission.
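A minimal Python sketch of how such an equipment-constrained search space can be enumerated as a Cartesian product. The levels in `GRID` are hypothetical placeholders (the actual levels are listed in Supplementary Table 1); only the 220 °C limit and the 2/3-of-25 mL fill limit come from the text, and the feasibility check is simplified to look at the solvent volume alone.

```python
import itertools
import math

MAX_TEMP_C = 220            # PTFE inner pot limit stated in the text
MAX_FILL_ML = 25 * 2 / 3    # fill at most 2/3 of the 25 mL inner pot

# HYPOTHETICAL levels for the eight descriptors (placeholders, not Supplementary Table 1).
GRID = {
    "T": [140, 160, 180, 200, 220],                            # reaction temperature, degC
    "t": [2, 4, 6, 8, 10, 12],                                 # reaction time, h
    "C": ["H2SO4", "HAc", "EDA", "urea"],                      # catalyst type
    "VC": [0.5, 1.0, 1.5],                                     # catalyst volume/mass (placeholder units)
    "S": ["water", "ethanol", "DMF", "toluene", "formamide"],  # solvent type
    "VS": [5, 10, 15, 20],                                     # solvent volume, mL
    "Rr": [1, 5, 10],                                          # ramp rate, degC/min
    "Mp": [0.1, 0.2, 0.3],                                     # precursor mass, g
}


def is_feasible(cond: dict) -> bool:
    """Equipment limits only; simplified to check the solvent volume alone."""
    return cond["T"] <= MAX_TEMP_C and cond["VS"] <= MAX_FILL_ML


def enumerate_space(grid: dict):
    """Yield every feasible condition in the Cartesian product of the grid."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        cond = dict(zip(keys, values))
        if is_feasible(cond):
            yield cond


grid_size = math.prod(len(levels) for levels in GRID.values())
n_feasible = sum(1 for _ in enumerate_space(GRID))
print(grid_size, n_feasible)  # 64800 48600
```

Using a generator keeps memory flat; with the real ~20 million combinations, one would stream conditions into the predictors rather than hold them all in a list.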
To establish the initial training dataset, we collected 23 CQDs synthesized under randomly selected parameter combinations. Each data sample is labeled with its experimentally measured PL wavelength and PLQY (see Methods).
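Drawing random initial conditions does not require building the full grid. The sketch below, assuming hypothetical per-parameter level counts in `RADICES`, treats each condition as a mixed-radix number and samples 23 distinct indices with Python's `random.sample`; it illustrates uniform random selection in general, not the authors' actual sampling procedure.

```python
import math
import random

# HYPOTHETICAL number of levels per descriptor (T, t, C, VC, S, VS, Rr, Mp).
RADICES = [5, 6, 4, 3, 5, 3, 3, 3]


def decode(index: int, radices: list[int]) -> tuple[int, ...]:
    """Map a flat index in [0, prod(radices)) to one level index per descriptor."""
    digits = []
    for r in reversed(radices):
        index, d = divmod(index, r)
        digits.append(d)
    return tuple(reversed(digits))


def sample_initial(n: int, radices: list[int], seed: int = 0) -> list[tuple[int, ...]]:
    """Draw n distinct conditions uniformly at random."""
    rng = random.Random(seed)
    size = math.prod(radices)
    # random.sample accepts a range directly, so the grid of conditions is never built.
    return [decode(i, radices) for i in rng.sample(range(size), n)]


initial = sample_initial(23, RADICES)
```

Because each flat index maps to exactly one condition, distinct indices guarantee distinct conditions.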
    Because the desired properties differ in importance, an effective strategy is needed to evaluate candidate synthesis conditions quantitatively and in a unified manner. We therefore developed a MOO strategy that prioritizes full-color PL emission over PLQY enhancement by assigning an additional reward when the maximum PLQY of a color surpasses a predefined threshold for the first time. Given $N$ explored experimental conditions $\{(x_i, y_i^c, y_i^\gamma) \mid i = 1, 2, \ldots, N\}$, $x_i$ denotes the $i$-th synthesis condition, defined by the eight synthesis parameters, and $y_i^c$ and $y_i^\gamma$ denote the corresponding color label and yield (i.e., PLQY) given $x_i$, where $y_i^c \in \{c_1, c_2, \ldots, c_M\}$ for $M$ possible colors and $y_i^\gamma \in [0, 1]$. The unified objective function is formulated as the sum of the maximum PLQY for each color label, i.e.,
      $$\sum_{c_j} Y_{c_j}^{\max}, \tag{1}$$
    where $j \in \{1, 2, \ldots, M\}$, and $Y_{c_j}^{\max}$ is 0 if $\nexists\, i$ with $y_i^c = c_j$; otherwise
      $$Y_{c_j}^{\max} = \max_i \left[ \Big( y_i^\gamma + R \cdot \mathbb{1}\left(y_i^\gamma \ge \alpha\right) \Big) \cdot \mathbb{1}\left(y_i^c = c_j\right) \right]. \tag{2}$$
    $\mathbb{1}(\cdot)$ is an indicator function that outputs 1 if its argument is true and 0 otherwise. The term $R \cdot \mathbb{1}(y_i^\gamma \ge \alpha)$ gives full-color synthesis higher priority: the PLQY of each color must reach at least $\alpha$ ($\alpha = 0.5$ in our case) to earn an additional reward of $R$ ($R = 10$ in our settings). $R$ can be any real value larger than 1 (the maximum possible PLQY improvement from a single synthesis condition), which ensures that exploring synthesis conditions for colors whose yield has not yet reached $\alpha$ takes priority. We set $R$ to 10 so that the tens digit of the unified objective value directly indicates the number of colors whose maximum PLQY has reached $\alpha$, while the remainder (the units digit and fractional part) reflects the sum of the maximum PLQYs (without the additional reward) over all colors; because the sum of seven PLQYs can never exceed 7, it never carries into the tens digit. As defined by the PL wavelength ranges in Supplementary Table 2, the seven primary colors considered in this work are purple (< 420 nm), blue (≥ 420 and < 460 nm), cyan (≥ 460 and < 490 nm), green (≥ 490 and < 520 nm), yellow (≥ 520 and < 550 nm), orange (≥ 550 and < 610 nm), and red (≥ 610 nm), i.e., $M = 7$. Notably, the proposed MOO formulation unifies the two goals of achieving full color and high PLQY into a single objective function, providing a systematic approach to tuning synthesis parameters for the desired properties.
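Eqs. (1) and (2), together with the color bins listed above, translate directly into a few lines of Python. This is a minimal sketch for illustration; the function and variable names are hypothetical, and a color with no explored samples contributes 0, as the text specifies.

```python
from collections import defaultdict

# Color bins from the text: lower bound inclusive, upper bound exclusive (nm).
COLOR_BINS = [
    ("purple", float("-inf"), 420), ("blue", 420, 460), ("cyan", 460, 490),
    ("green", 490, 520), ("yellow", 520, 550), ("orange", 550, 610),
    ("red", 610, float("inf")),
]


def color_label(pl_nm: float) -> str:
    """Return the color label c_j for a PL peak wavelength."""
    for name, lo, hi in COLOR_BINS:
        if lo <= pl_nm < hi:
            return name
    raise ValueError(f"invalid wavelength: {pl_nm}")


def unified_objective(samples, alpha: float = 0.5, R: float = 10.0) -> float:
    """Eqs. (1)-(2): sum over colors of max_i [y_i + R * 1(y_i >= alpha)].

    samples: iterable of (pl_wavelength_nm, plqy) pairs with plqy in [0, 1].
    Colors with no samples keep the default value 0.
    """
    best = defaultdict(float)
    for pl_nm, plqy in samples:
        score = plqy + R * (plqy >= alpha)
        c = color_label(pl_nm)
        best[c] = max(best[c], score)
    return sum(best.values())


value = unified_objective([(415, 0.52), (430, 0.30), (640, 0.70), (645, 0.40)])
print(round(value, 2))  # 21.52
```

In the toy call, purple and red reach α, so the tens digit is 2, and the remainder 1.52 is the sum of the best PLQYs for purple, blue, and red.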
    The MOO strategy relies on the predictions of ML models. Given the high-dimensional search space and limited experimental data, it is challenging to build models that generalize well to unseen data, especially considering the nonlinear nature of the condition-property relationship [37]. To address this issue, we employed a gradient-boosted decision tree model (XGBoost), which has proven advantageous in handling related materials datasets (see Methods and Supplementary Fig. 3) [30, 38]. In addition, its capability to guide hydrothermal synthesis was demonstrated in our previous work (Supplementary Fig. 4) [21]. Two regression models, with hyperparameters optimized by grid search, were fitted on the given dataset: one for PL wavelength and the other for PLQY. These models were then used to predict the properties of all unexplored candidate synthesis conditions. The search space of candidate conditions is defined by the Cartesian product of all possible values of the eight synthesis parameters, resulting in ~20 million possible combinations (see Supplementary Table 1). The candidate synthesis conditions, i.e., the unexplored regions of the search space, are then ranked by the MOO evaluation strategy using the prediction results.
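The ranking step can be sketched as below. This is a hedged illustration, not the authors' code: `predict_pl` and `predict_qy` are hypothetical stand-ins for the two fitted regressors (any callable works, for example a wrapper around a fitted model's `predict`), and scoring each candidate by the increase it would cause in the unified objective of Eq. (1) is our assumption about how "ranked by the MOO evaluation strategy" could be realized.

```python
import heapq

# Color bins from the text: lower bound inclusive, upper bound exclusive (nm).
COLOR_BINS = [
    ("purple", float("-inf"), 420), ("blue", 420, 460), ("cyan", 460, 490),
    ("green", 490, 520), ("yellow", 520, 550), ("orange", 550, 610),
    ("red", 610, float("inf")),
]


def color_label(pl_nm: float) -> str:
    for name, lo, hi in COLOR_BINS:
        if lo <= pl_nm < hi:
            return name
    raise ValueError(f"invalid wavelength: {pl_nm}")


def best_scores(explored, alpha: float, R: float) -> dict[str, float]:
    """Per-color maxima of y + R * 1(y >= alpha) over measured samples."""
    best: dict[str, float] = {}
    for pl_nm, plqy in explored:
        c = color_label(pl_nm)
        best[c] = max(best.get(c, 0.0), plqy + R * (plqy >= alpha))
    return best


def recommend(candidates, predict_pl, predict_qy, explored,
              k: int = 2, alpha: float = 0.5, R: float = 10.0) -> list:
    """Return the k candidates whose predicted outcome most raises the objective.

    ASSUMPTION: a candidate's score is the increase it would cause in Eq. (1)
    if its predicted (PL, PLQY) were measured exactly.
    """
    best = best_scores(explored, alpha, R)

    def gain(x) -> float:
        pl_nm = predict_pl(x)
        plqy = min(max(predict_qy(x), 0.0), 1.0)  # clip regressor output to [0, 1]
        c = color_label(pl_nm)
        return max(0.0, plqy + R * (plqy >= alpha) - best.get(c, 0.0))

    # heapq.nlargest keeps at most k items in its heap, so candidates can be streamed.
    return heapq.nlargest(k, candidates, key=gain)


# Toy usage: hypothetical linear stand-ins instead of fitted regressors.
explored = [(415, 0.6)]  # one measured purple sample
picks = recommend(range(10), lambda x: 400 + 30 * x, lambda x: 0.1 * x, explored)
print(picks)  # [9, 8]
```

Both toy picks land in the same (red) color bin; whether the top-two batch is additionally diversified is not stated in this section.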
    Finally, the PL wavelengths and PLQYs of the CQDs synthesized under the top two recommended synthesis conditions are verified through experiments and characterization, and the results are added to the training dataset for the next iteration of the MOO design loop. The iterative design loop continues until the objectives are fulfilled, i.e., until the achieved PLQY for all seven colors surpasses 50%. Notably, in prior studies only a limited number of CQDs with short-wavelength fluorescence (e.g., blue and green) have reached PLQYs above 50% [39–41], whereas their long-wavelength counterparts, particularly those with orange and red fluorescence, usually show PLQYs below 20% [42–44]. To demonstrate the efficacy of our ML-powered MOO strategy, we therefore set an ambitious goal: PLQYs exceeding 50% for CQDs of every color. The ability to modulate the PL emission of CQDs holds significant promise for applications ranging from bioimaging and sensing to optoelectronics. Our four-stage workflow is designed to realize an ML-integrated MOO strategy that iteratively guides the hydrothermal synthesis of CQDs toward multiple desired properties while continually improving the models' prediction performance.
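The outer loop (fit, recommend the top two, verify, augment, and stop once every color reaches 50%) can be sketched as follows. Every callable is a hypothetical hook standing in for model fitting, ranking, and the actual synthesis and characterization; the toy run uses made-up 40 nm color bins purely to exercise the control flow.

```python
COLORS = ("purple", "blue", "cyan", "green", "yellow", "orange", "red")


def all_colors_reached(best_qy: dict, alpha: float = 0.5) -> bool:
    """Stopping rule from the text: every color's max PLQY has reached alpha."""
    return all(best_qy.get(c, 0.0) >= alpha for c in COLORS)


def design_loop(dataset, fit, recommend, run_experiment, label,
                alpha=0.5, batch=2, max_iter=50):
    """Fit -> recommend -> verify -> augment, until the stopping rule holds.

    All callables are hypothetical hooks:
      fit(dataset) -> models
      recommend(models, dataset, k) -> list of k conditions
      run_experiment(cond) -> (pl_nm, plqy)   # synthesis + characterization
      label(pl_nm) -> color name
    dataset is a list of (cond, pl_nm, plqy) tuples and is extended in place.
    Returns (iterations used, dataset).
    """
    for it in range(1, max_iter + 1):
        models = fit(dataset)
        for cond in recommend(models, dataset, batch):
            pl_nm, plqy = run_experiment(cond)
            dataset.append((cond, pl_nm, plqy))
        best: dict[str, float] = {}
        for _, pl_nm, plqy in dataset:
            c = label(pl_nm)
            best[c] = max(best.get(c, 0.0), plqy)
        if all_colors_reached(best, alpha):
            return it, dataset
    return max_iter, dataset


def toy_label(pl_nm: float) -> str:
    """Toy bins (40 nm wide from 400 nm), NOT the paper's color ranges."""
    return COLORS[min(int((pl_nm - 400) // 40), 6)]


# Toy run: conditions are integers, and the "experiment" cycles through colors.
iters, data = design_loop(
    dataset=[],
    fit=lambda d: None,
    recommend=lambda m, d, k: list(range(len(d), len(d) + k)),
    run_experiment=lambda c: (405 + 40 * (c % 7), 0.6),
    label=toy_label,
)
print(iters, len(data))  # 4 8
```

Keeping the experiment behind a callable makes the same loop usable with a simulator for testing and with real lab results in practice.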
    To assess the effectiveness of our ML-driven MOO strategy in the hydrothermal synthesis of CQDs, we employed several metrics chosen to determine whether the approach not only meets its dual objectives but also improves prediction accuracy over the iterative process. The unified objective function described above measures how well the two objectives have been realized experimentally and can therefore serve as a quantitative indicator of how effectively our approach guides CQD synthesis. The value of the unified objective function after a given ML-guided synthesis loop is termed the objective utility value. The MOO strategy increases the objective utility value by a large margin of 39.27%, to 75.44, indicating that the maximum PLQY of all seven colors exceeds the target of 0.5 (Fig. 2a). Specifically, at iterations 7 and 19, the number of color labels with a maximum PLQY above 50% increases by one, each time adding a reward of 10. Even on the apparent plateaus, the two insets show that the maximum achieved PLQY continues to increase. For instance, during iterations 8 to 11, the maximum PLQY for cyan emission rises from 59% to 94%, and that for purple emission rises from 52% to 71%. Impressively, our MOO approach fulfilled both objectives within only 20 iterations (i.e., 40 guided experiments).
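Because the reward R = 10 exceeds the number of colors (M = 7), an objective utility value can be split back into its two parts; for the final value of 75.44, this gives seven colors at or above α and a summed maximum PLQY of about 5.44. The hypothetical helper below illustrates that arithmetic. Note that this decoding needs R > M, a slightly stronger condition than the R > 1 required for prioritization alone.

```python
def decode_utility(value: float, R: float = 10.0, n_colors: int = 7) -> tuple[int, float]:
    """Split an objective value into (colors at/above alpha, sum of max PLQYs).

    Valid only if n_colors < R: the PLQY sum is at most n_colors, so it can
    never spill into the reward part.
    """
    if not n_colors < R:
        raise ValueError("decoding requires R > number of colors")
    n_reached = int(value // R)
    return n_reached, value - n_reached * R


n_reached, plqy_sum = decode_utility(75.44)
print(n_reached, round(plqy_sum, 2))  # 7 5.44
```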
            a The MOO's unified objective utility versus design iterations. b Colors explored with newly synthesized experimental conditions. Value ranges of colors defined by PL wavelength: purple (PL < 420 nm), blue (420 nm ≤ PL < 460 nm), cyan (460 nm ≤ PL < 490 nm), green (490 nm ≤ PL < 520 nm), yellow (520 nm ≤ PL < 550 nm), orange (550 nm ≤ PL < 610 nm), and red (610 nm ≤ PL). While high PLQY had already been achieved for red, orange, and blue in the initial dataset, the MOO strategy purposefully enhances the PLQYs of yellow, purple, cyan, and green, respectively, in successive groups of five iterations. c MSE between the predicted and measured target properties. d Covariance matrix showing the correlations among the 8 synthesis parameters (i.e., reaction temperature T, reaction time t, type of catalyst C, volume/mass of catalyst VC, type of solution S, volume of solution VS, ramp rate Rr, and mass of precursor Mp) and the 2 target properties, i.e., PLQY and PL wavelength. e Two-dimensional t-distributed stochastic neighbor embedding (t-SNE) plot of the whole search space, including unexplored (circular points), training (star-shaped points), and explored (square points) conditions, where the latter two sets are colored by their measured PL wavelengths.
    Figure 2b reveals that the MOO strategy systematically explores synthesis conditions for each color that has not yet reached the designed PLQY threshold, starting with yellow in the first 5 iterations and ending with green in the last 5 iterations. Notably, within each set of 5 iterations, a single color shows an enhanced maximum PLQY. Initially, the PLQY for yellow surges to 65%, followed by a significant rise in purple's maximum PLQY from 44% to 71% during the next 5 iterations. The trend continues with cyan and green, whose maximum PLQYs rise to 94% and 83%, respectively. Taking into account both the training set (i.e., the first 23 samples) and the augmented dataset, the peak PLQY for every color exceeds 60%. Several colors approach 70% (including purple, blue, and red), and some approach 100% (including cyan, green, and orange). This further underscores the effectiveness of our ML approach. A more detailed visualization of the PL wavelength and PLQY at each iteration is provided in Supplementary Fig. 5.
    Because the MOO strategy ranks candidate synthesis conditions based on ML predictions, it is vital to evaluate the ML models' performance. We use mean squared error (MSE), a metric commonly used for regression, computed between the PL wavelength and PLQY predicted by the ML models and the experimentally determined values [45]. As shown in Fig. 2c, the MSE of PLQY drops sharply from 0.45 to approximately 0.15 within just four iterations, an error reduction of 64.5%, and eventually stabilizes around 0.1 as the iterative loops progress. Meanwhile, the MSE of PL wavelength remains consistently low, always below 0.1. For a fair comparison, the MSE of PL wavelength is computed after normalizing all values to the range of zero to one; on this scale, an MSE of 0.1 corresponds to a root-mean-square deviation of about 0.32 (√0.1) of the normalized range between the ML-predicted values and the experimental results. These trends indicate that the accuracy of our ML models for both PL wavelength and PLQY improves over the iterations, with predictions aligning more closely with actual values as the models learn from the augmented data. This demonstrates the efficacy of our MOO strategy not only in optimizing multiple desired properties but also in refining the ML models.
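The normalized MSE described above can be computed as follows. This is a minimal sketch: the wavelengths and the 400 to 700 nm normalization bounds are hypothetical (the paper's exact normalization range is not given in this section), and the key point is that predictions and measurements must be scaled with the same fixed bounds.

```python
import math


def min_max_normalize(values, lo: float, hi: float) -> list[float]:
    """Scale values to [0, 1] using fixed bounds shared by predictions and targets."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    return [(v - lo) / (hi - lo) for v in values]


def mse(pred, true) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(true) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)


# HYPOTHETICAL wavelengths (nm) and bounds, not data from the paper.
LO, HI = 400.0, 700.0
true_pl = [420.0, 500.0, 640.0]
pred_pl = [430.0, 480.0, 600.0]
err = mse(min_max_normalize(pred_pl, LO, HI), min_max_normalize(true_pl, LO, HI))
print(round(err, 5), round(math.sqrt(err), 4))  # 0.00778 0.0882
```

Taking the square root converts the MSE back to the scale of the normalized values, which is why an MSE threshold of 0.1 corresponds to a typical deviation of about 0.32 of the range rather than 10%.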
    To unveil the correlations between synthesis parameters and target properties, we further calculated the covariance matrix. As illustrated in Fig. 2d, the eight synthesis parameters generally exhibit low correlation with one another, indicating that each parameter contributes unique and complementary information to the optimization of the CQD synthesis conditions. In terms of their impact on the target properties, reaction time and temperature are found to influence both PL wavelength and PLQY, underscoring the importance of adjusting them precisely, both for experimentalists and for data-driven methods. Apart from reaction time and temperature, PL wavelength and PLQY are governed by distinct sets of synthesis parameters with varying relations. For instance, the type of solution is negatively correlated with PLQY, while solution volume shows a stronger positive correlation with PLQY. This reiterates that, given the high-dimensional search space, the complex interplay between synthesis parameters and multiple target properties can hardly be unraveled without capable ML-integrated methods.
    To visualize how the MOO strategy navigated the expansive search space (~20 million candidate conditions) using only 63 data samples, we projected the initial training, explored, and unexplored conditions into a two-dimensional embedding using t-distributed stochastic neighbor embedding (t-SNE) [46]. As shown in Fig. 2e, no distinct clustering by color can be discerned, which highlights how intricate the relationship between synthesis conditions and target properties is and why an ML-driven approach is needed to decipher it. The efficacy of ML models depends on the quality of the training data, so selecting training data that span as much of the search space as possible is particularly advantageous for the models' generalizability [37]. As observed in Fig. 2e, our ML models benefit from the randomly and sparsely distributed training data, which encourages the models to generalize to previously unseen regions of the search space and to effectively guide the search for optimal synthesis conditions within this intricate multi-objective optimization landscape.
    With the aid of the ML-coupled MOO strategy, we rapidly identified the optimal conditions for full-color CQDs with high PLQY. The ML-recommended synthesis conditions that produced the highest PLQY for each color are detailed in the Methods section. Ten CQDs with the best optical performance were selected for in-depth spectral investigation. Their absorption spectra show strong excitonic absorption bands, and their normalized PL spectra display peaks ranging from 410 nm for purple CQDs (p-CQDs) to 645 nm for red CQDs (r-CQDs), as shown in Fig. 3a and Supplementary Fig. 6. This set encompasses a diverse array of CQD types, including p-CQDs, blue CQDs (b-CQDs, 420 nm), cyan CQDs (c-CQDs, 470 nm), dark cyan CQDs (dc-CQDs, 485 nm), green CQDs (g-CQDs, 490 nm), yellow-green CQDs (yg-CQDs, 530 nm), yellow CQDs (y-CQDs, 540 nm), orange CQDs (o-CQDs, 575 nm), orange-red CQDs (or-CQDs, 605 nm), and r-CQDs. Importantly, the PLQYs of most of these CQDs are above 60% (Supplementary Table 3), exceeding those of the majority of CQDs reported to date (Supplementary Table 4). Photographs of the full-color fluorescence, from purple to red, under UV irradiation are provided in Fig. 3b. The three-dimensional fluorescence spectra further reveal excellent excitation-independent behavior of the CQDs (Supplementary Fig. 7). Furthermore, time-resolved PL spectra reveal a notable trend: the monoexponential lifetimes of the CQDs decrease progressively from 8.6 ns (p-CQDs) to 2.3 ns (r-CQDs) (Supplementary Fig. 8), indicating that the CQD lifetimes shorten as the PL emission shifts toward the red end of the spectrum [47].
Moreover, the CQDs show long-term photostability (>12 hours; Supplementary Fig. 9), making them potential candidates for optoelectronic devices that require stable performance over extended periods. Together, these results demonstrate the high quality and great potential of our synthesized CQDs.
            a Normalized PL spectra of CQDs. b Photographs of CQDs under 365 nm UV irradiation. c HOMO and LUMO energy levels of the CQDs.
    To gain further insight into the properties of the synthesized CQDs, we calculated their bandgap energies from the experimentally obtained absorption band values (Supplementary Fig. 10 and Supplementary Table 5). The calculated bandgap energies gradually decrease from 3.02 eV for p-CQDs to 1.91 eV for r-CQDs. In addition, we measured the highest occupied molecular orbital (HOMO) energy levels of the CQDs using ultraviolet photoelectron spectroscopy. As shown in the energy diagram in Fig. 3c, the HOMO values vary in a wave-like manner without any discernible pattern. Because these energy levels do not follow a simple trend with emission color, this result further suggests the robust predictive and optimizing capability of our ML-integrated MOO strategy, which enabled the screening of these high-quality CQDs from a vast and complex search space using only 40 sets of experiments.
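A common way to relate an absorption-edge wavelength to an optical bandgap is E = hc/λ, with hc ≈ 1239.84 eV·nm. Whether the authors used the absorption onset, a Tauc construction, or another method is specified in Supplementary Fig. 10 and is not reproduced here, so the sketch below is only an assumed conversion. Likewise, `lumo_ev` encodes the assumed convention LUMO = HOMO + Eg; this section does not state how the LUMO levels in Fig. 3c were obtained.

```python
H_C_EV_NM = 1239.84198  # Planck constant x speed of light, in eV*nm


def photon_energy_ev(wavelength_nm: float) -> float:
    """E = h*c / lambda."""
    return H_C_EV_NM / wavelength_nm


def wavelength_nm(energy_ev: float) -> float:
    """lambda = h*c / E."""
    return H_C_EV_NM / energy_ev


def lumo_ev(homo_ev: float, gap_ev: float) -> float:
    """ASSUMED convention: LUMO = HOMO + Eg, energies relative to vacuum (negative)."""
    return homo_ev + gap_ev


# Absorption-edge wavelengths implied by the reported bandgaps.
for gap in (3.02, 1.91):
    print(gap, round(wavelength_nm(gap), 1))  # prints "3.02 410.5", then "1.91 649.1"
```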
    To uncover the mechanism underlying the tunable optical properties of the synthesized CQDs, we carried out a series of characterizations of their morphologies and structures (see Methods). X-ray diffraction (XRD) patterns with a single graphite peak at 26.5° indicate a high degree of graphitization in all CQDs (Supplementary Fig. 11) [15]. Raman spectra show a stronger signal for the ordered G band at 1585 cm⁻¹ than for the disordered D band at 1397 cm⁻¹, further confirming the high degree of graphitization (Supplementary Fig. 12) [48]. Fourier-transform infrared (FT-IR) spectroscopy was then performed to detect the functional groups in the CQDs. The spectra clearly reveal –NH2 and N–C stretching at 3234 and 1457 cm⁻¹, respectively, indicating abundant –NH2 groups on the surface of all CQDs except the orange CQDs (o-CQDs) and yellow CQDs (y-CQDs) (Supplementary Fig. 13) [49]. The C=C aromatic ring stretching at 1510 cm⁻¹ confirms the carbon skeleton, while three oxygen-related peaks, i.e., O–H, C=O, and C–O stretching, are observed at 3480, 1580, and 1240 cm⁻¹, respectively, owing to the abundant hydroxyl groups of the precursor. The FT-IR spectrum of y-CQDs also shows an –SO3 stretching vibration band at 1025 cm⁻¹, confirming the additional functionalization of y-CQDs by –SO3H groups.
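Bragg's law, nλ = 2d·sin θ, converts the 26.5° XRD peak into a lattice spacing. The X-ray source is not stated in this section; assuming Cu Kα1 radiation (λ = 0.15406 nm), the sketch below gives d ≈ 0.336 nm, close to the ~0.335 nm interlayer spacing of graphite.

```python
import math

CU_KALPHA1_NM = 0.15406  # ASSUMED X-ray source: Cu K-alpha1


def d_spacing_nm(two_theta_deg: float, wavelength_nm: float = CU_KALPHA1_NM,
                 order: int = 1) -> float:
    """Bragg's law, n * lambda = 2 * d * sin(theta), solved for d."""
    theta = math.radians(two_theta_deg / 2.0)  # the diffractometer reports 2-theta
    return order * wavelength_nm / (2.0 * math.sin(theta))


print(round(d_spacing_nm(26.5), 3))  # 0.336
```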
    X-ray photoelectron spectroscopy (XPS) was used to further probe the functional groups in the CQDs (Supplementary Figs. 14–23). The XPS survey spectra reveal three main elements in the CQDs, i.e., C, O, and N, except in o-CQDs and y-CQDs: o-CQDs and y-CQDs lack N, and y-CQDs contain S. The high-resolution C 1s spectrum of the CQDs can be deconvoluted into three peaks: a dominant C–C/C=C graphitic carbon peak (284.8 eV), C–O/C–N (286 eV), and carboxylic C=O (288 eV), revealing the structures of the CQDs. The N 1s peak at 399.7 eV indicates the presence of N–C bonds, verifying successful N-doping in the basal-plane network of all CQDs except o-CQDs and y-CQDs. The separated O 1s peaks at 531.5 and 533 eV indicate two forms of oxygen-containing functional groups, C=O and C–O, respectively, consistent with the FT-IR spectra [50]. The S 2p band of y-CQDs can be decomposed into two peaks at 163.5 and 167.4 eV, representing SO3/2P3/2 and SO3/2P1/2, respectively [47, 51]. Combining the results of the structural characterization, we attribute the excellent fluorescence properties of the CQDs to N-doping, which reduces non-radiative sites and promotes the formation of C=O bonds. The C=O bonds play a crucial role in radiative recombination and can increase the PLQY of the CQDs.
    To gain deeper insight into the morphology and microstructure of the CQDs, we then performed transmission electron microscopy (TEM). The TEM images show uniformly shaped, monodisperse nanodots whose average lateral size increases gradually from 1.85 nm for p-CQDs to 2.3 nm for r-CQDs (Fig. 4a and Supplementary Fig. 24). This trend agrees with the corresponding PL wavelengths and provides further evidence for the quantum size effect in the CQDs (Fig. 4a) [47]. High-resolution TEM images further reveal highly crystalline CQD structures with well-resolved lattice fringes (Fig. 4b, c). The measured lattice spacing of 0.21 nm corresponds to the (100) plane of graphite, further corroborating the XRD data. Our analysis suggests that the synthesized CQDs possess graphene-like high crystallinity, which gives rise to their superior fluorescence performance.
            a Lateral size and color of the full-color fluorescent CQDs (inset: dependence of the PL wavelength on the lateral size of the full-color fluorescent CQDs). Data correspond to mean ± standard deviation, n = 3. b, c High-resolution TEM images and fast Fourier transform patterns of p-, b-, c-, g-, y-, o-, and r-CQDs, respectively. d Boxplots of PL wavelength (left)/PLQY (right) and 7 synthesis parameters of CQDs. VC is excluded here because its value range depends on C, so its relationships with the other parameters are not directly interpretable. The labels at the bottom indicate the minimum value (inclusive) of the respective bins; the bins on the left match the color discretization in Supplementary Table 2, whereas the bins on the right are uniform. Each box spans vertically from the 25th to the 75th percentile, with the horizontal line marking the median and the triangle indicating the mean. The upper and lower whiskers extend from the ends of the box to the maximum and minimum data values.
    Following the ML-guided exploration of the search space, we systematically examined the 63 samples using box plots to elucidate the complex interplay between the synthesis parameters and the resulting optical properties of the CQDs. As depicted in Fig. 4d, synthesis at high reaction temperature, with prolonged reaction time, and in low-polarity solvents tends to yield CQDs with longer PL wavelengths. These findings are consistent with general observations in the literature, which suggest that these parameters enhance precursor molecular fusion and nucleation growth, thereby yielding CQDs with larger particle sizes and longer PL wavelengths [47, 52–55]. Moreover, a comprehensive survey of the existing literature implies that precursor and catalyst combinations involving both electron donation and electron acceptance aid in producing long-wavelength CQDs [56, 57]. Interestingly, diverging from these traditional findings, we synthesized long-wavelength red CQDs under ML guidance using 2,7-naphthalenediol, which contains electron-donating groups, as the precursor and EDA, known for its electron-donating functionalities, as the catalyst. This result challenges existing assumptions and offers new insights into the design of long-wavelength CQDs.
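The per-bin statistics behind box plots such as those in Fig. 4d (box from the 25th to the 75th percentile, median line, mean marker, whiskers to the minimum and maximum, as described in the caption) can be sketched with Python's `statistics` module. The records are hypothetical, and the linear (`"inclusive"`) percentile interpolation, which matches NumPy's default, is an assumption, since the plotting method is not stated.

```python
import statistics
from collections import defaultdict

# Color bins from the text: lower bound inclusive, upper bound exclusive (nm).
COLOR_BINS = [
    ("purple", float("-inf"), 420), ("blue", 420, 460), ("cyan", 460, 490),
    ("green", 490, 520), ("yellow", 520, 550), ("orange", 550, 610),
    ("red", 610, float("inf")),
]


def color_label(pl_nm: float) -> str:
    for name, lo, hi in COLOR_BINS:
        if lo <= pl_nm < hi:
            return name
    raise ValueError(f"invalid wavelength: {pl_nm}")


def box_stats(values: list[float]) -> dict[str, float]:
    """Whisker ends = min/max, box = 25th-75th percentile, plus median and mean."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3,
            "max": max(values), "mean": statistics.fmean(values)}


def stats_by_color(samples: list[dict], param: str) -> dict[str, dict[str, float]]:
    """Group one synthesis parameter by the color bin of the PL peak ("PL", nm)."""
    groups = defaultdict(list)
    for s in samples:
        groups[color_label(s["PL"])].append(s[param])
    # statistics.quantiles needs at least two points before Python 3.13.
    return {c: box_stats(v) for c, v in groups.items() if len(v) >= 2}


samples = [  # HYPOTHETICAL records, not the paper's 63 samples
    {"PL": 620, "T": 180}, {"PL": 640, "T": 200}, {"PL": 650, "T": 220},
    {"PL": 630, "T": 200}, {"PL": 410, "T": 160},
]
red = stats_by_color(samples, "T")["red"]
print(red["q1"], red["median"], red["q3"])  # 195.0 200.0 205.0
```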
    Concerning PLQY, we found that catalysts with stronger electron-donating groups (e.g., EDA) led to higher PLQY in CQDs, consistent with earlier observations by our research team [16]. Remarkably, we also uncovered a significant impact of the synthesis parameters on the CQDs' PLQY: in the high-PLQY regime, PLQY correlates strongly and positively with reaction temperature, reaction time, and solvent polarity, relationships that have not been reported in the literature [58–61]. This insight could be applied to similar systems to improve PLQY.
    Aside from the parameters discussed above, other factors such as ramp rate, precursor mass, and solvent volume also influence the properties of CQDs. Overall, the emission color and PLQY of CQDs follow complex, nonlinear trends arising from the interaction of numerous factors. It is worth noting that traditional methods for adjusting CQD properties often reduce PLQY as the PL wavelength redshifts [4, 47, 51, 54]. In contrast, using AI-assisted synthesis, we increased the PLQY of the resulting full-color CQDs to over 60%. This achievement highlights the unique advantages of ML-guided CQD synthesis and confirms the potential of ML-based methods for navigating the complex relationships among diverse synthesis parameters and multiple target properties within a high-dimensional search space.
Machine learning-guided realization of full-color high-quantum-yield carbon quantum dots - Nature.com