The Conservatory Project
Advances in plant genomics and genome editing are offering unprecedented opportunities to modify gene function and crop trait variation to meet current and future challenges in agriculture. However, a pivotal question remains: which genomic sequences should we edit to achieve predictable and desirable phenotypic outcomes?
The Conservatory Project (1), spearheaded by the labs of Idan Efroni (Hebrew University), Zachary Lippman (Cold Spring Harbor Laboratory/HHMI), and Madelaine Bartlett (UMass Amherst), offers an algorithm and a genomics-driven data resource designed to enrich for cis-regulatory sequences, which are predicted to control gene expression and phenotypes. Leveraging the explosion of high-quality genomic data, comprising hundreds of plant reference genomes generated by the plant research community, our algorithm identifies Conserved Non-Coding Sequences (CNSs) through a gene-centric, gradual alignment approach that addresses challenges arising from plant genome complexity and evolutionary history (e.g., whole-genome duplications). Using multiple genomes per species, we identify CNSs and date them over evolutionary timescales. Our dataset spans 314 genomes from 284 species across 72 plant families, including eudicots, monocots, gymnosperms, and algae.
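To make the idea of a "conserved non-coding sequence" concrete, here is a toy sketch, not the Conservatory algorithm. It substitutes exact shared k-mer matching for real alignment: it finds k-mers present in the upstream region of an orthologous gene in every species, then merges them into blocks on a chosen reference. The function names, the k = 8 default, and the toy sequences are all hypothetical. The sketch ignores reverse-complement matches, rearrangements, and the duplication-aware, gene-centric ortholog handling described above.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return every length-k substring of seq, uppercased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(upstream_by_species: Dict[str, str], k: int = 8) -> Set[str]:
    """Return the k-mers found in the upstream region of every species."""
    kmer_sets = [kmers(s, k) for s in upstream_by_species.values()]
    if not kmer_sets:
        return set()
    return set.intersection(*kmer_sets)


def conserved_blocks(upstream_by_species: Dict[str, str], reference: str,
                     k: int = 8) -> List[Tuple[int, int, str]]:
    """Merge shared k-mers into (start, end, sequence) blocks on the reference.

    Caveat: a merged block is only guaranteed to be tiled by shared k-mers;
    it is not checked to occur contiguously in every other species, which a
    real pipeline would confirm by alignment.
    """
    common = shared_kmers(upstream_by_species, k)
    ref = upstream_by_species[reference].upper()
    blocks: List[List[int]] = []
    for start in range(len(ref) - k + 1):
        if ref[start:start + k] not in common:
            continue
        end = start + k
        if blocks and start <= blocks[-1][1]:
            # Overlaps or touches the previous block: extend it.
            blocks[-1][1] = max(blocks[-1][1], end)
        else:
            blocks.append([start, end])
    return [(s, e, ref[s:e]) for s, e in blocks]


# Hypothetical upstream sequences of one gene in three species.
UPSTREAM = {
    "species_A": "TTGACGTCAGCATATAAAGG",
    "species_B": "CCGACGTCAGCATTTAAACC",
    "species_C": "AGGACGTCAGCAGGGCCTTA",
}

print(conserved_blocks(UPSTREAM, reference="species_A"))
# [(2, 12, 'GACGTCAGCA')]
```

Exact k-mers keep the example short and dependency-free. Real CNS detection tolerates mismatches and indels, which is why alignment-based methods are used in practice.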
Explore the Conservatory algorithm on GitHub: https://github.com/idanefroni/Conservatory/tree/master
Conservatory Version 2.0 (beta)
The beta version of Conservatory provided here uses the genomes of ten reference species across six families (Arabidopsis thaliana, Arabis alpina, Brachypodium distachyon, Glycine max, Lemna minor, Lactuca saligna, Medicago truncatula, Physalis grisea, Solanum lycopersicum, and Zea mays). We uncovered 1.7 million CNSs, capturing both family-specific and deeply conserved elements. The average CNS length is 80 base pairs. Our analysis categorizes CNSs by age and distribution, providing insights into their roles in vital biological processes such as photosynthesis, polyamine metabolism, and cell-cycle regulation.
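One simple way to think about "dating" a CNS is to take the most ancient clade containing at least one species in which the element is detected. The sketch below illustrates that idea for a tomato reference. It is a hypothetical, parsimony-style simplification, not the project's dating procedure. The LINEAGE table, the clade names chosen, and the two-bin categories are assumptions made for the example, and gene loss or missing data in intermediate lineages is ignored.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical, simplified lineage for a Solanum lycopersicum reference:
# clades ordered from most recent to most ancient, each listing the
# reference-panel species that join the tomato lineage at that clade.
LINEAGE: List[Tuple[str, Set[str]]] = [
    ("Solanaceae", {"Physalis grisea"}),
    ("asterids", {"Lactuca saligna"}),
    ("eudicots", {"Arabidopsis thaliana", "Arabis alpina",
                  "Glycine max", "Medicago truncatula"}),
    ("angiosperms", {"Brachypodium distachyon", "Lemna minor", "Zea mays"}),
]


def cns_age(detected_in: Set[str]) -> Optional[str]:
    """Return the most ancient clade with at least one species carrying the CNS.

    Parsimony assumption: a CNS found in a distant species is taken to have
    been present in the common ancestor; intermediate losses are ignored.
    Returns None if the element is seen in no species other than the reference.
    """
    age = None
    for clade, members in LINEAGE:
        if members & detected_in:
            age = clade  # keep overwriting, so the deepest hit wins
    return age


def cns_category(detected_in: Set[str]) -> str:
    """Hypothetical coarse binning: family-specific vs. deeply conserved."""
    age = cns_age(detected_in)
    if age is None:
        return "unconserved"
    return "family-specific" if age == LINEAGE[0][0] else "deeply conserved"


print(cns_age({"Physalis grisea"}))                 # Solanaceae
print(cns_age({"Physalis grisea", "Glycine max"}))  # eudicots
print(cns_category({"Zea mays"}))                   # deeply conserved
```

Because clades are scanned from recent to ancient, a single detection in a distant species determines the age. This is exactly why such estimates are sensitive to spurious hits and benefit from requiring support in several species.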
Note
The data released here adhere to the Toronto agreement's guidelines for pre-publication data sharing (http://www.nature.com/nature/journal/v461/n7261/full/461168a.html).
As the originators of these CNS data, we retain the right to first publication of our findings. Under the Toronto agreement, researchers may use the CNS sequences and annotations to study individual genes, small sets of genes, and localized regions of the genome. For questions about citing or publishing work based on these pre-release data, please contact us.
Data Release Policy
The NSF-funded Conservatory project, overseen by PI Dave Jackson and co-PIs Madelaine Bartlett, Zach Lippman, and Idan Efroni, offers a beta-version database to the research community. It contains CNS data spanning a broad phylogenetic range of plants and is released in adherence to the Toronto agreement, which grants us the right to first publication, including whole-genome comparisons, structural annotations, and genome-wide association studies. We are actively working on the full official release, anticipated for December 2023. Any redistribution of these data should include the full text of the data use policy.
Publication
1. Hendelman, A., Zebell, S., Rodriguez-Leal, D., Dukler, N., Robitaille, G., Wu, X., Kostyun, J., Tal, L., Wang, P., Bartlett, M.E., Eshed, Y., Efroni, I.*, Lippman, Z.B.* (2021) Conserved pleiotropy of an ancient plant homeobox gene uncovered by cis-regulatory dissection. Cell 184: 1-16. doi: 10.1016/j.cell.2021.02.001. PMID: 33667348. * co-corresponding authors
Contact Us
Should you have any questions or need further clarification on the data or its usage, feel free to reach out at:
Idan Efroni (HUJI):
[email protected]
Zachary Lippman (CSHL/HHMI):
[email protected]
Anat Hendelman (CSHL):
[email protected]
Madelaine Bartlett (UMASS):
[email protected]
David Jackson (CSHL):
[email protected]