The gut microbiome-metabolome dataset collection: a curated resource for integrative meta-analysis | npj Biofilms and Microbiomes – Nature.com

The data resource includes curated and unified data tables from 14 published human gut (fecal) microbiome-metabolome studies from recent years (Table 1, Supplementary Table 1)8,9,10,13,14,15,16,17,18,19,20,21,22,23. Figure 1a highlights the main data sources and key processing steps. For each study we provide 4 processed tables: a genus-level abundance table, a metabolite abundance table, a metabolite identifier mapping table, and a sample metadata table including sample and subject characteristics (Fig. 1b). For studies with shotgun metagenomics we also provide species-level abundance tables. Importantly, microbiome profiles were obtained by processing raw metagenomic sequencing data, whereas for metabolite profiles we used already-processed tables, owing to the substantial differences between metabolomics instruments and approaches. Where possible, both taxa and metabolite identifiers have been unified, allowing comparison across studies (see Methods). The data for each study are provided both as simple text files (.tsv) and as R data files (.RData), and are accessible via a public GitHub repository. We further provide detailed documentation and a usage example in a dedicated Wiki page and via example scripts also available in the repository. New datasets can be added to the resource through Git pull requests, following the instructions provided in the Wiki section "Adding new datasets". Overall, 2900 samples from 1849 individuals are currently included in the resource (Fig. 1c). Most of these studies are case-control studies, i.e., they include two study groups, one consisting of individuals with a specific medical condition and another of healthy control individuals (Table 1).
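As a minimal sketch of how the per-study tables described above might be combined, the following Python snippet reads two tab-separated tables and joins them on a shared sample identifier. The file roles, the `Sample` and `Study.Group` column names, and the genus column labels are assumptions made for illustration; the repository documentation should be consulted for the actual names.

```python
import csv
import io


def read_tsv(handle, key="Sample"):
    """Read a tab-separated table into a dict: {key value: row dict}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[key]: row for row in reader}


def join_on_sample(metadata, abundances):
    """Inner-join two tables that are keyed by the same sample ID."""
    shared = sorted(set(metadata) & set(abundances))
    return [{**metadata[s], **abundances[s]} for s in shared]


# Toy in-memory stand-ins for one study's metadata and genus tables
# (hypothetical column names and values, not taken from the resource).
metadata_tsv = io.StringIO(
    "Sample\tSubject\tStudy.Group\n"
    "S1\tP1\tcase\n"
    "S2\tP2\tcontrol\n"
)
genera_tsv = io.StringIO(
    "Sample\tg__Alistipes\tg__Roseburia\n"
    "S1\t0.031\t0.012\n"
    "S2\t0.027\t0.020\n"
    "S3\t0.010\t0.005\n"
)

metadata = read_tsv(metadata_tsv)
genera = read_tsv(genera_tsv)
merged = join_on_sample(metadata, genera)  # S3 is dropped: no metadata row
```

With real files, the same functions could be called on handles opened with `open(path, newline="")`. Values are kept as strings here and would need converting to floats before analysis.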

Fig. 1. a A highlight of data resources and main processing steps of the curated microbiome-metabolome data resource (see Methods); b A database scheme of the final data products per dataset. Each box describes a specific table and its content and primary key (PK) field. The species table is only available for studies with shotgun metagenomic data; c Data resource summary statistics; d Genera prevalence across datasets. Each bar represents the number of unique genera that appear in at least the specified number of datasets; e Metabolite prevalence across datasets, interpreted as in (d).

The described resource, which includes hundreds of unique metabolites and thousands of unique genera that appear in multiple independent datasets (Fig. 1d, e), could be used for different types of meta-analyses or cross-study comparisons involving paired microbiome and metabolome data across health and disease. We specifically identify 3 main categories of analysis use cases, facilitated by this resource: First, this resource can be used for meta-analysis efforts where associations of different types are compared across some or all datasets, aiming to identify robust and consistent signals. Such associations could be identified via a wide range of statistical methods, univariate or multivariate approaches, and using a wide range of features, e.g. taxa at different ranks, microbiome diversity metrics, sample or subject characteristics, metabolite features, etc. Two examples of such meta-analysis efforts are further described below. Second, this resource can be used to benchmark methods related to the joint analysis of microbiome and metabolome data. For example, machine learning methods for predicting metabolite levels based on taxonomic features have been recently proposed but validated on only a very small set of datasets24,25. Third, researchers analyzing new microbiome-metabolome datasets can use this resource to add support for findings on their own data, using specific datasets from the resource that resemble their own cohort (studies on the same disease, for example, or using an identical metabolomics method).

Indeed, we recently demonstrated the utility of a similar dataset collection in a large-scale meta-analysis of the relationship between gut microbes and metabolites26. In that study we were interested in pinpointing metabolites that are robustly and universally predicted by the microbiota's composition in a healthy population across multiple studies. Using a combination of random forest regression models (for predicting metabolites) and random-effects models (for quantifying robustness), we identified 97 metabolites that were robustly well-predicted by the microbiota's composition. We additionally found that multiple microbiome-metabolite relationships are study-specific, implying that links based on a single study should be interpreted with caution and highlighting the importance of validating findings on additional data sources.

Here, as an additional use-case example, we present another meta-analysis of the microbiome-metabolome relationship, searching for specific genus-metabolite associations that are significant and consistent across multiple datasets (see Methods). For this analysis we included only the 11 non-infant cohorts from our resource, and analyzed a total of 29,708 unique genus-metabolite pairs that appeared in at least 3 different datasets. These pairs included 109 different GTDB genera and 314 metabolites. We used linear models to estimate the association between a specific genus's abundance and a specific metabolite's level, while controlling for disease state (i.e., study group). Overall, 132,391 linear models were fitted, of which 18,075 (13.6%) resulted in a significant genus-metabolite association (i.e., regression coefficient FDR ≤ 0.05). Comparing the direction and significance of associations across datasets, we found multiple genus-metabolite pairs associated in some (and often, all) datasets, but interestingly also pairs with conflicting associations in different datasets (Fig. 2a). Notably, genus-metabolite correlations can stem from a direct involvement of the genus in the production, consumption, or degradation of the metabolite, but also from indirect associations related, for example, to interactions between different gut bacteria, or to co-abundant metabolites present in specific diets. We similarly emphasize that the analyzed metabolites can be endogenous to the host, obtained through diet, microbially produced or transformed, or otherwise acquired from the environment. Finding associations across multiple datasets, as facilitated by our resource, potentially increases the likelihood that such associations are microbially driven and represent ubiquitous microbial metabolism, rather than specific host- or diet-related associations.
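The per-dataset testing described above can be illustrated with a small, dependency-free Python sketch. It is not the analysis code used for the study: it assumes a single continuous predictor (genus abundance), a categorical study-group covariate, and the Benjamini-Hochberg procedure for FDR control, and all names and toy values are hypothetical. The slope is obtained by centering both variables within each study group, which by the Frisch-Waugh-Lovell theorem gives the same coefficient as fitting metabolite ~ genus + group. Converting the t-statistic into a p-value requires the t-distribution and is left to a statistics library.

```python
import math


def adjusted_slope(x, y, group):
    """Slope of y on x, controlling for a categorical group.

    Centers x and y within each group level, then regresses the centered
    values. Returns (slope, t_statistic, residual_degrees_of_freedom).
    """
    levels = sorted(set(group))

    def center(values):
        means = {}
        for g in levels:
            vals = [v for v, gi in zip(values, group) if gi == g]
            means[g] = sum(vals) / len(vals)
        return [v - means[gi] for v, gi in zip(values, group)]

    xc, yc = center(x), center(y)
    sxx = sum(a * a for a in xc)
    sxy = sum(a * b for a, b in zip(xc, yc))
    slope = sxy / sxx
    rss = sum((b - slope * a) ** 2 for a, b in zip(xc, yc))
    df = len(x) - len(levels) - 1  # one mean per group plus one slope
    se = math.sqrt(rss / df / sxx)
    return slope, slope / se, df


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data: six samples, two study groups (hypothetical values).
genus = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
metabolite = [2.1, 3.9, 6.0, 18.0, 20.2, 21.8]
study_group = ["control", "control", "control", "case", "case", "case"]
slope, t_stat, df = adjusted_slope(genus, metabolite, study_group)

# FDR correction over p-values collected from many such models.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Centering within groups keeps the sketch short; with additional covariates a general least-squares routine would be the more natural choice.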

Fig. 2. a Associations between genera and metabolites were tested using linear models, in each dataset independently and controlling for study groups. The dot plot illustrates association results for the top 70 associated metabolites and the top 40 associated genera. Each dot represents a genus-metabolite pair, dot size represents the number of datasets in which the pair was analyzed, and dot colors represent the percent of datasets in which a significant association (positive or negative) was found (see also Methods). A question mark indicates conflicting results between 2 or more datasets, i.e., at least one significant negative association and at least one significant positive association. Metabolites (grid columns) are grouped by their metabolite classes, abbreviated as follows: Ben. Benzenoids, OS Other steroids, Cbxm. Carboximidic acids, COOH Carboxylic acids and derivatives, AA Amino acids, OO Other organic acids, ONC Organonitrogen compounds, CHO Carbohydrates and carbohydrate conjugates, OHC Organoheterocyclic compounds, PPA Phenylpropanoic acids. Genera (grid rows) are grouped by their order taxonomic rank, abbreviated as follows: Actin. Actinomycetales (Actinobacteriota phylum), Bacte. Bacteroidales (Bacteroidota phylum), Lachn. Lachnospirales (Firmicutes_A phylum), Oscil. Oscillospirales (Firmicutes_A phylum), Chris. Christensenellales (Firmicutes_A phylum), Veill. Veillonellales (Firmicutes_C phylum), Enter. Enterobacterales (Proteobacteria phylum). b A bipartite network of consistent genus-metabolite associations, identified by a meta-analysis of 11 different microbiome-metabolome datasets from the curated microbiome-metabolome data resource. Green nodes represent genera, with node sizes proportional to genus average relative abundance, and orange nodes represent metabolites. Edges between genus nodes and metabolite nodes represent a consistent positive (blue) or negative (red) association.
Details about the network nodes and edges are available in Supplementary Table 4.
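The per-pair summaries encoded in panel a (number of datasets, percent with a significant association, and the conflict flag) could be computed along the following lines. This is an illustrative sketch only: the input layout (one coefficient and one FDR value per dataset), the function name, and the 0.05 threshold applied as "at most 0.05" are assumptions, not the procedure used to draw the figure.

```python
def summarize_pair(results, alpha=0.05):
    """Summarize one genus-metabolite pair across datasets.

    `results` holds one (coefficient, fdr) tuple per dataset in which
    the pair was tested.
    """
    positive = sum(1 for coef, fdr in results if fdr <= alpha and coef > 0)
    negative = sum(1 for coef, fdr in results if fdr <= alpha and coef < 0)
    return {
        "n_datasets": len(results),
        "pct_significant": 100.0 * (positive + negative) / len(results),
        # Conflict: significant in both directions in different datasets.
        "conflicting": positive > 0 and negative > 0,
    }


# Hypothetical results for one pair tested in four datasets.
summary = summarize_pair([(0.4, 0.01), (-0.3, 0.02), (0.1, 0.6), (0.2, 0.04)])
```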

Moreover, to determine in a more statistically rigorous manner which genus-metabolite pairs are consistently associated, we conducted a random-effects meta-analysis using semi-partial correlations derived from the linear regression results (as suggested by Aloe and Becker, 2012)27. We identified 1101 consistent associations, involving a total of 104 genera and 195 metabolites (Fig. 2b, Supplementary Table 4; see Methods). Metabolite-associated genera were mostly from the Firmicutes_A phylum but included other phyla as well. Microbe-associated metabolites spanned multiple metabolite classes, with the organic nitrogen compounds super-class being enriched for microbially associated metabolites (odds ratio 3.47 [1.3, ∞], FDR 0.08), and the organic acids and derivatives super-class being specifically enriched for Bacteroidota-associated metabolites (odds ratio 3.21 [2, ∞], FDR 0.0004; see Methods).
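To make the pooling step more concrete, the sketch below converts a regression t-statistic into a semi-partial correlation using the relation given by Aloe and Becker, and pools per-dataset effects with a random-effects model. The DerSimonian-Laird estimator of between-study variance is an assumption chosen for brevity (the text above does not state which estimator was used), the sampling variances are taken as given inputs rather than derived, and all numeric values are hypothetical.

```python
import math


def semipartial_r(t_stat, r_squared, n, p):
    """Semi-partial correlation of a focal predictor from its t-statistic.

    r_squared is the full model's R^2, n the sample size, and p the
    number of predictors in the model.
    """
    return t_stat * math.sqrt((1.0 - r_squared) / (n - p - 1))


def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled_effect, standard_error, tau_squared).
    """
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# One hypothetical dataset: t = 2.0, R^2 = 0.36, 20 samples, 3 predictors.
r_sp = semipartial_r(2.0, 0.36, 20, 3)

# Pooling two hypothetical per-dataset effects with equal variances.
pooled, se, tau2 = random_effects([0.1, 0.5], [0.01, 0.01])
```

In this toy case the two effects disagree, so the estimated between-study variance is positive and the pooled standard error is wider than a fixed-effect analysis would give.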

We additionally examined the bipartite network of consistently associated genera and metabolites, presented in Fig. 2b. A full list of network edges, alongside meta-analysis results, is provided in Supplementary Table 4. We identified several genera with a particularly high number of metabolite associations, including ER4 and Dysosmobacter (both previously classified under the genus Oscillibacter), Alistipes, and the recently reclassified Alistipes_A genus (Fig. 2b-I). Even though most of these genera have a relatively low abundance in the human gut (0.36%, 0.66%, 3.3% and 0.1%, respectively, averaged over all samples and datasets in the analysis), they are connected to the highest number of metabolites in the network (51, 44, 43 and 50, respectively). This observation may be explained by at least two hypotheses: (i) that these bacteria are highly metabolically active in the gut, and/or (ii) that they play central ecological roles in the gut microbial ecosystem. The former hypothesis is supported, for example, by a recent study on the newly isolated human commensal Dysosmobacter welbionis, in which administration of this species to mice was found to strongly influence host metabolism and counteract diet-induced obesity development, with only negligible impact on the overall microbiota composition28. Alistipes commensal species are also well studied for their diverse metabolic functions in the gut29. Another recent study, however, supported the latter hypothesis, reporting that in a gut microbiome analysis of a large Dutch cohort, several Alistipes, Alistipes_A, and unclassified Oscillibacter species were all identified as keystone species, predicted to have an important impact on the entire microbiome structure and function30. Lastly, we note that, analogously to highly associated genera, there are also a few metabolites that are associated with a high number of genera (over 30).
This is perhaps not surprising as some metabolites are imported/exported by dozens of different species31, and may in turn be further associated with additional genera by indirect associations.

Another noteworthy highlight from this network is the set of consistent positive associations between butyrate, a short-chain fatty acid with beneficial effects on intestinal homeostasis, and several genera, including Faecalibacterium, Butyrivibrio (formerly classified as TF0111 genus), Roseburia, Eubacterium_I, Agathobacter, and Lachnospira (Fig. 2b-II; Supplementary Table 4). While the first 5 genera are all known butyrate producers in the gut32,33,34, Lachnospira does not produce butyrate directly but has an indirect positive effect on other butyrate-producing taxa through pectin fermentation35. Interestingly, Flavonifractor is consistently negatively associated with butyrate in our network, despite being a known butyrate producer36. This negative association may reflect an ecological interaction rather than a metabolic one, as Flavonifractor tends to have increased abundance in various host conditions that are also characterized by reduced abundances of major butyrate producers, including disease states, post-antibiotic treatment, and infancy30,36.

Future work on consistent genus-metabolite associations (out of the scope of the current study) could include genomic analyses to infer which associations likely stem from known production/consumption capabilities, which association signals are low due to significant species-level variation that masks genus-level findings, which associations break in disease states, and whether genera associated with multiple metabolites are also key ecological players in microbial interaction networks.

We note that this resource has several obvious limitations. One major limitation is the substantial difference between metabolomics platforms and the impact of the platform used on the set of chemical classes that can be detected. Short-chain fatty acids, for example, which are known to be important microbial metabolites in the gut, are mostly detectable by gas chromatography-mass spectrometry and may therefore be missing from datasets using other metabolomics methods37. With that in mind, it is important to note that the number of datasets in which a metabolite appears should not be used as an indication of its prevalence. Similarly, differences between methods may result in different scales of metabolite values, and hence a direct comparison of metabolite values between studies should be avoided. Lastly, metabolite identification in untargeted metabolomic platforms may vary in its confidence level, which could in turn imply lower confidence in downstream analyses. To allow users of this resource to better address these issues, we provide detailed information about metabolomics methods and identification confidence levels for each dataset in Supplementary Table 3, and specifically mark metabolites with putative identifications (see Methods)38. On the microbiome side, differences between 16S amplicon sequencing and shotgun sequencing, as well as differences in sequencing depth and library preparation, may all affect the resolution and accuracy of the obtained microbiome profiles. We encourage users of this resource to carefully account for these limitations using appropriate analysis approaches (some of which were described above), and to apply caution when interpreting analysis results. Additional recommendations for how to best utilize the resource are available in the Wiki page.
Overall, the Curated Gut Microbiome-Metabolome Data Resource can facilitate a wide and diverse range of integrated microbiome-metabolome analyses, promote the discovery of robust microbe-metabolite links, and allow researchers to easily place newly identified microbe-metabolite findings in the context of other published datasets.
