Talks and Papers

Presentations given during the short and long programs, along with relevant papers. View the schedule for complete event information, including journal club meetings and social events.

Retreat Research Talk: Vasilis Ntranos
Clustering a million cells: Large-scale scRNA-Seq data analysis
1. Ntranos, V., Kamath, G.M., Zhang, J.M., Pachter, L. and Tse, D.N., 2016. Fast and accurate single-cell RNA-seq analysis by clustering of transcript-compatibility counts. Genome biology, 17(1), p.112.

Retreat Research Talk: Serghei Mangul
Squeezing the last drop out of next generation sequencing data

Retreat Research Talk: Andy Dahl
Adjusting for principal components of molecular phenotypes induces replicating false positives
1. Dahl, A., Guillemot, V., Mefford, J., Aschard, H. and Zaitlen, N., 2017. Adjusting For Principal Components Of Molecular Phenotypes Induces Replicating False Positives. bioRxiv, p.120899.
2. Leek, J.T. and Storey, J.D., 2007. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet, 3(9), p.e161.
3. Leek, J.T. and Storey, J.D., 2008. A general framework for multiple testing dependence. Proceedings of the National Academy of Sciences, 105(48), pp.18718-18723.

Retreat Research Talk: Ilan Gronau
Population phylogenomics: A genealogical perspective
1. Gronau, I., Hubisz, M.J., Gulko, B., Danko, C.G. and Siepel, A., 2011. Bayesian inference of ancient human demography from individual genome sequences. Nature genetics, 43(10), pp.1031-1034.
2. Rasmussen, M.D., Hubisz, M.J., Gronau, I. and Siepel, A., 2014. Genome-wide inference of ancestral recombination graphs. PLoS Genet, 10(5), p.e1004342.

Retreat Research Talk: Na Cai
Heterogeneity in depression
1. Cai, N., Bigdeli, T.B., Kretzschmar, W.W., Li, Y., Liang, J., Hu, J., Peterson, R.E., Bacanu, S., Webb, B.T., Riley, B. and Li, Q., 2017. 11,670 whole-genome sequences representative of the Han Chinese population from the CONVERGE project. Scientific data, 4.
2. Peterson, R.E., Cai, N., Bigdeli, T.B., Li, Y., Reimers, M., Nikulova, A., Webb, B.T., Bacanu, S.A., Riley, B.P., Flint, J. and Kendler, K.S., 2017. The genetic architecture of major depressive disorder in Han Chinese women. JAMA psychiatry, 74(2), pp.162-168.
3. CONVERGE Consortium, 2015. Sparse whole genome sequencing identifies two loci for major depressive disorder. Nature, 523(7562), p.588.
4. Cai, N., Chang, S., Li, Y., Li, Q., Hu, J., Liang, J., Song, L., Kretzschmar, W., Gan, X., Nicod, J. and Rivera, M., 2015. Molecular signatures of major depression. Current Biology, 25(9), pp.1146-1156.

Retreat Research Talk: Marzia Cremona
Functional data analysis testing and linear modeling for high-resolution “omics” data
1. Campos-Sánchez, R., Cremona, M.A., Pini, A., Chiaromonte, F. and Makova, K.D., 2016. Integration and fixation preferences of human and mouse endogenous retroviruses uncovered with functional data analysis. PLoS Comput Biol, 12(6), p.e1004956.
2. Cremona, M.A., Campos-Sánchez, R., Pini, A., Vantini, S., Makova, K.D. and Chiaromonte, F., 2017. Functional data analysis of “Omics” data: how does the genomic landscape influence integration and fixation of endogenous retroviruses?. In Functional Statistics and Related Fields (pp. 87-93). Springer, Cham.
3. Cremona, M.A., Pini, A., Chiaromonte, F. and Vantini, S., 2017. IWTomics: Interval-Wise Testing for Omics Data. R package version 1.0.0.

Retreat Research Talk: David Koslicki
Improving Min Hash for Metagenomic Classification
1. Broder, A.Z., 1997, June. On the resemblance and containment of documents. In Compression and Complexity of Sequences 1997. Proceedings (pp. 21-29). IEEE.
2. Ondov, B.D., Treangen, T.J., Melsted, P., Mallonee, A.B., Bergman, N.H., Koren, S. and Phillippy, A.M., 2016. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biology, 17(1), p.132.

Retreat Research Talk: Pejman Mohammadi
Using ASE data to facilitate diagnosis for unresolved rare diseases
1. Cummings, B.B., Marshall, J.L., Tukiainen, T., Lek, M., Donkervoort, S., Foley, A.R., Bolduc, V., Waddell, L.B., Sandaradura, S.A., O’Grady, G.L. and Estrella, E., 2017. Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Science translational medicine, 9(386), p.eaal5209.
2. Mohammadi, P., Castel, S.E., Brown, A.A. and Lappalainen, T., 2016. Quantifying the regulatory effect size of cis-acting genetic variation using allelic fold change. bioRxiv, p.078717.

Retreat Research Talk: Anil Ori
Integration of longitudinal gene expression with polygenic disease risk establishes human neuronal differentiation as a model to study schizophrenia

Retreat Research Talk: YoSon Park
Large, diverse population cohorts of hiPSCs and derived hepatocyte-like cells reveal functional genetic variation at blood lipid-associated loci
1. Pashos, E.E., Park, Y., Wang, X., Raghavan, A., Yang, W., Abbey, D., Peters, D.T., Arbelaez, J., Hernandez, M., Kuperwasser, N. and Li, W., 2017. Large, Diverse Population Cohorts of hiPSCs and Derived Hepatocyte-like Cells Reveal Functional Genetic Variation at Blood Lipid-Associated Loci. Cell Stem Cell, 20(4), pp.558-570.

Retreat Research Talk: Nikita Alexeev
Estimation of the rate of transpositions and the true evolutionary distance
1. Alexeev, N. and Alekseyev, M.A., 2017. Estimation of the true evolutionary distance under the fragile breakage model. BMC Genomics, 18(4), p.356.
2. Alexeev, N., Aidagulov, R. and Alekseyev, M.A., 2015. A computational method for the rate estimation of evolutionary transpositions. arXiv preprint arXiv:1501.07546.
3. Yancopoulos, S., Attie, O. and Friedberg, R., 2005. Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics, 21(16), pp.3340-3346.
4. Biller, P., Guéguen, L., Knibbe, C. and Tannier, E., 2016. Breaking good: accounting for fragility of genomic regions in rearrangement distance estimation. Genome biology and evolution, 8(5), pp.1427-1439.
5. Lin, Y. and Moret, B.M., 2008. Estimating true evolutionary distances under the DCJ model. Bioinformatics, 24(13), pp.i114-i122.
6. Erdős, P. and Rényi, A., 1960. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1), pp.17-60.

Retreat Research Talk: Loes Olde Loohuis
Transcriptome analysis in whole blood reveals increased microbial diversity in schizophrenia
1. Mangul, S., Loohuis, L.M.O., Ori, A., Jospin, G., Koslicki, D., Yang, H.T., Wu, T., Boks, M.P., Lomen-Hoerth, C., Wiedau-Pazos, M. and Cantor, R., 2016. Total RNA Sequencing reveals microbial communities in human blood and disease specific effects. bioRxiv, p.057570.

Research Talk: Lior Pachter
Title TBA

Tutorial: Ilan Gronau
Demography inference: From parameter estimation to model selection
1. Gronau, I., Hubisz, M.J., Gulko, B., Danko, C.G. and Siepel, A., 2011. Bayesian inference of ancient human demography from individual genome sequences. Nature genetics, 43(10), pp.1031-1034.
2. Hey, J. and Nielsen, R., 2007. Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proceedings of the National Academy of Sciences, 104(8), pp.2785-2790.
3. Rannala, B. and Yang, Z., 2017. Efficient Bayesian species tree inference under the multispecies coalescent. Systematic Biology, p.syw119.
4. Chung, Y. and Hey, J., 2016. Bayesian Analysis of Evolutionary Divergence with Genomic Data Under Diverse Demographic Models. bioRxiv, p.080606.

Research Talk: Jennifer Listgarten
Machine learning for CRISPR gene editing
1. Fusi, N., Smith, I., Doench, J. and Listgarten, J., 2015. In silico predictive modeling of CRISPR/Cas9 guide efficiency. bioRxiv, p.021568.
2. Listgarten, J., Weinstein, M., Elibol, M., Hoang, L., Doench, J. and Fusi, N., 2016. Predicting off-target effects for end-to-end CRISPR guide design. bioRxiv, p.078253.
3. Doench, J.G., Fusi, N., Sullender, M., Hegde, M., Vaimberg, E.W., Donovan, K.F., Smith, I., Tothova, Z., Wilen, C., Orchard, R. and Virgin, H.W., 2016. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nature biotechnology.

Research Talk: Dana Pe’er
Imputation in Single Cell RNA-seq Data
1. van Dijk, D., Nainys, J., Sharma, R., Kathail, P., Carr, A.J., Moon, K.R., Mazutis, L., Wolf, G., Krishnaswamy, S. and Pe’er, D., 2017. MAGIC: A diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data. bioRxiv, p.111591.
2. Prabhakaran, S., Azizi, E. and Pe’er, D., 2016. Dirichlet process mixture model for correcting technical variation in single-cell gene expression data. In Proceedings of The 33rd International Conference on Machine Learning (pp. 1070-1079).

Research Talk: Michael Schatz
Advances in genome sequencing and assembly
1. Schatz, M.C., Delcher, A.L. and Salzberg, S.L., 2010. Assembly of large genomes using second-generation sequencing. Genome research, 20(9), pp.1165-1173.
2. Berlin, K., Koren, S., Chin, C.S., Drake, J.P., Landolin, J.M. and Phillippy, A.M., 2015. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nature biotechnology, 33(6), pp.623-630.
3. Loman, N.J., Quick, J. and Simpson, J.T., 2015. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nature methods, 12(8), pp.733-735.
4. Weisenfeld, N.I., Kumar, V., Shah, P., Church, D. and Jaffe, D.B., 2016. Direct determination of diploid genome sequences. bioRxiv, p.070425.
5. Zook, J.M. et al., 2016. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Scientific data, 3.

Tutorial: Speaker TBA
Title TBA

Tutorial: Alexander Schönhuth
Constructing overlap graphs for de novo assembly of polyploid genomes
1. Baaijens, J., El Aabidine, A.Z., Rivals, E. and Schoenhuth, A., 2016. De novo assembly of viral quasispecies using overlap graphs. bioRxiv, p.080341.
2. Välimäki, N., Ladra, S. and Mäkinen, V., 2012. Approximate all-pairs suffix/prefix overlaps. Information and Computation, 213, pp.49-58.
3. Li, H., 2016. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics, p.btw152.

Research Talk: William (Xiaoquan) Wen
Bayesian approaches for integrative genetic association analysis: Data integration and scalable computation
1. Wen, X., Lee, Y., Luca, F. and Pique-Regi, R., 2016. Efficient integrative multi-SNP association analysis via Deterministic Approximation of Posteriors. The American Journal of Human Genetics, 98(6), pp.1114-1129.
2. Wen, X., Pique-Regi, R. and Luca, F., 2017. Integrating Molecular QTL Data into Genome-wide Genetic Association Analysis: Probabilistic Assessment of Enrichment and Colocalization. PLoS Genet, 13(3), p.e1006646.

Research Talk: James Zou
Deep learning in genomics: Introduction and examples

Research Talk: Ron Shamir
Universal hitting sets and minimizers
1. Orenstein, Y., Pellow, D., Marçais, G., Shamir, R. and Kingsford, C., 2016, August. Compact universal k-mer hitting sets. In International Workshop on Algorithms in Bioinformatics (pp. 257-268). Springer International Publishing.
2. Marçais, G., Pellow, D., Bork, D., Orenstein, Y., Shamir, R. and Kingsford, C., 2017. Improving the performance of minimizers and winnowing schemes. bioRxiv, p.104075.

Tutorial: David Koslicki
Comparative metagenomic analysis
1. Sczyrba, A., Hofmann, P., Belmann, P., Koslicki, D., Janssen, S., Droege, J., Gregor, I., Majda, S., Fiedler, J., Dahms, E. and Bremges, A., 2017. Critical Assessment of Metagenome Interpretation - a benchmark of computational metagenomics software. bioRxiv, p.099127.
2. National Research Council, 2007. The new science of metagenomics: revealing the secrets of our microbial planet. National Academies Press.
3. Ondov, B.D., Treangen, T.J., Melsted, P., Mallonee, A.B., Bergman, N.H., Koren, S. and Phillippy, A.M., 2016. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biology, 17(1), p.132.
4. Hamady, M., Lozupone, C. and Knight, R., 2010. Fast UniFrac: facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data. The ISME journal, 4(1), pp.17-27.
5. McClelland, J. and Koslicki, D., 2016. EMDUnifrac: Exact linear time computation of the Unifrac metric and identification of differentially abundant organisms. arXiv preprint arXiv:1611.04634.

Tutorial: David Tse
How to solve NP-hard assembly problems in linear time
1. Bresler, G., Bresler, M.A. and Tse, D., 2013. Optimal assembly for high throughput shotgun sequencing. BMC bioinformatics, 14(5), p.S18.
2. Shomorony, I., Kim, S.H., Courtade, T.A. and Tse, D.N.C., 2016. Information-optimal genome assembly via sparse read-overlap graphs. Bioinformatics, 32(17), pp.i494-i502.
3. Kamath, G.M., Shomorony, I., Xia, F., Courtade, T. and Tse, D.N., 2017. Hinge: Long-read assembly achieves optimal repeat resolution. Genome Research.
4. Kannan, S., Hui, J., Mazooji, K., Pachter, L. and Tse, D., 2016. Shannon: An Information-Optimal de Novo RNA-Seq Assembler. bioRxiv, p.039230.

Research Talk: Ran Blekhman
Human genomic control of the microbiome
1. Goodrich, J.K., Waters, J.L., Poole, A.C., Sutter, J.L., Koren, O., Blekhman, R., Beaumont, M., Van Treuren, W., Knight, R., Bell, J.T. and Spector, T.D., 2014. Human genetics shape the gut microbiome. Cell, 159(4), pp.789-799.
2. Blekhman, R., Goodrich, J.K., Huang, K., Sun, Q., Bukowski, R., Bell, J.T., Spector, T.D., Keinan, A., Ley, R.E., Gevers, D. and Clark, A.G., 2015. Host genetic variation impacts microbiome composition across human body sites. Genome biology, 16(1), p.191.
3. Burns, M.B., Lynch, J., Starr, T.K., Knights, D. and Blekhman, R., 2015. Virulence genes are a signature of the microbiome in the colorectal tumor microenvironment. Genome medicine, 7(1), p.55.
4. Lynch, J., Tang, K., Sands, J., Sands, M., Tang, E., Mukherjee, S., Knights, D. and Blekhman, R., 2016. HOMINID: A framework for identifying associations between host genetic variation and microbiome composition. bioRxiv, p.081323.
5. Burns, M.B., Montassier, E., Abrahante, J., Starr, T.K., Knights, D. and Blekhman, R., 2016. Discrete mutations in colorectal cancer correlate with defined microbial communities in the tumor microenvironment. bioRxiv, p.090795.

Research Talk: Daniel Wegmann
Tracing the spread of farming using ancient DNA: Bioinformatic challenges and population genetic insights
1. Kousathanas, A., Leuenberger, C., Link, V., Sell, C., Burger, J. and Wegmann, D., 2017. Inferring heterozygosity from ancient and low coverage genomes. Genetics, 205(1), pp.317-332.
2. Broushaki, F., Thomas, M.G., Link, V., López, S., van Dorp, L., Kirsanow, K., Hofmanová, Z., Diekmann, Y., Cassidy, L.M., Díez-del-Molino, D. and Kousathanas, A., 2016. Early Neolithic genomes from the eastern Fertile Crescent. Science, 353(6298), pp.499-503.
3. Hofmanová, Z., Kreutzer, S., Hellenthal, G., Sell, C., Diekmann, Y., Díez-del-Molino, D., van Dorp, L., López, S., Kousathanas, A., Link, V. and Kirsanow, K., 2016. Early farmers from across Europe directly descended from Neolithic Aegeans. Proceedings of the National Academy of Sciences, p.201523951.

Research Talk: Brian Browning
Genotype phasing with large sample sizes
1. Kong, A., Masson, G., Frigge, M.L., Gylfason, A., Zusmanovich, P., Thorleifsson, G., Olason, P.I., Ingason, A., Steinberg, S., Rafnar, T. and Sulem, P., 2008. Detection of sharing by descent, long-range phasing and haplotype imputation. Nature genetics, 40(9), pp.1068-1075.
2. Browning, S.R. and Browning, B.L., 2011. Haplotype phasing: existing methods and new developments. Nature Reviews Genetics, 12(10), pp.703-714.
3. O’Connell, J., Sharp, K., Shrine, N., Wain, L., Hall, I., Tobin, M., Zagury, J.F., Delaneau, O. and Marchini, J., 2016. Haplotype estimation for biobank-scale data sets. Nature Publishing Group.
4. Loh, P.R., Danecek, P., Palamara, P.F., Fuchsberger, C., Reshef, Y.A., Finucane, H.K., Schoenherr, S., Forer, L., McCarthy, S., Abecasis, G.R. and Durbin, R., 2016. Reference-based phasing using the Haplotype Reference Consortium panel. Nature genetics, 48(11), pp.1443-1448.

Research Talk: Or Zuk
Estimating gene-specific selection parameters from human variation data: New methods and applications
1. Zuk, O., Schaffner, S.F., Samocha, K., Do, R., Hechter, E., Kathiresan, S., Daly, M.J., Neale, B.M., Sunyaev, S.R. and Lander, E.S., 2014. Searching for missing heritability: designing rare variant association studies. Proceedings of the National Academy of Sciences, 111(4), pp.E455-E464.
2. Lek, M., Karczewski, K.J., Minikel, E.V., Samocha, K.E., Banks, E., Fennell, T., O’Donnell-Luria, A.H., Ware, J.S., Hill, A.J., Cummings, B.B. and Tukiainen, T., 2016. Analysis of protein-coding genetic variation in 60,706 humans. Nature, 536(7616), pp.285-291.
3. Zuk O. et al. “Estimating and testing for gene-specific and population-specific selection in humans”; in prep.

Tutorial: Itsik Pe’er
Identity by descent in medical and population genomics
1. Palamara, P.F., Francioli, L.C., Wilton, P.R., Genovese, G., Gusev, A., Finucane, H.K., Sankararaman, S., Sunyaev, S.R., de Bakker, P.I., Wakeley, J. and Pe’er, I., 2015. Leveraging distant relatedness to quantify human mutation and gene-conversion rates. The American Journal of Human Genetics, 97(6), pp.775-789.
2. Genome of the Netherlands Consortium, 2014. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nature Genetics, 46(8), pp.818-825.
3. Palamara, P.F. and Pe’er, I., 2013. Inference of historical migration rates via haplotype sharing. Bioinformatics, 29(13), pp.i180-i188.
4. Palamara, P.F., Lencz, T., Darvasi, A. and Pe’er, I., 2012. Length distributions of identity by descent reveal fine-scale demographic history. The American Journal of Human Genetics, 91(5), pp.809-822.
5. Gusev, A., Palamara, P.F., Aponte, G., Zhuang, Z., Darvasi, A., Gregersen, P. and Pe’er, I., 2012. The architecture of long-range haplotypes shared within and across populations. Molecular biology and evolution, 29(2), pp.473-486.
6. Gusev, A., Kenny, E.E., Lowe, J.K., Salit, J., Saxena, R., Kathiresan, S., Altshuler, D.M., Friedman, J.M., Breslow, J.L. and Pe’er, I., 2011. DASH: a method for identical-by-descent haplotype mapping uncovers association with recent variation. The American Journal of Human Genetics, 88(6), pp.706-717.
7. Gusev, A., Lowe, J.K., Stoffel, M., Daly, M.J., Altshuler, D., Breslow, J.L., Friedman, J.M. and Pe’er, I., 2009. Whole population, genome-wide mapping of hidden relatedness. Genome research, 19(2), pp.318-326.

Tutorial: Lior Pachter
Title TBA

Research Talk: Melissa Gymrek
Analyzing complex repeat variation from high throughput sequencing data
1. Gymrek, M., Golan, D., Rosset, S. and Erlich, Y., 2012. lobSTR: a short tandem repeat profiler for personal genomes. Genome research, 22(6), pp.1154-1162.
2. Willems, T., Zielinski, D., Gordon, A., Gymrek, M. and Erlich, Y., 2016. Genome-wide profiling of heritable and de novo STR variations. bioRxiv, p.077727.
3. Gymrek, M., Willems, T., Guilmatre, A., Zeng, H., Markus, B., Georgiev, S., Daly, M.J., Price, A.L., Pritchard, J.K., Sharp, A.J. and Erlich, Y., 2015. Abundant contribution of short tandem repeats to gene expression variation in humans. Nature genetics.
4. Gymrek, M., Willems, T., Reich, D.E. and Erlich, Y., 2016. A framework to interpret short tandem repeat variation in humans. bioRxiv, p.092734.

Research Talk: Jonathan Flint
Using low pass sequence data to analyse complex traits
1. Nicod, J., Davies, R.W., Cai, N., Hassett, C., Goodstadt, L., Cosgrove, C., Yee, B.K., Lionikaite, V., McIntyre, R.E., Remme, C.A. and Lodder, E.M., 2016. Genome-wide association of multiple complex traits in outbred mice by ultra-low-coverage sequencing. Nature genetics, 48(8), pp.912-918.
2. Davies, R.W., Flint, J., Myers, S. and Mott, R., 2016. Rapid genotype imputation from sequence without reference panels. Nature genetics, 48(8), pp.965-969.
3. CONVERGE Consortium, 2015. Sparse whole genome sequencing identifies two loci for major depressive disorder. Nature, 523(7562), p.588.
4. Cai, N., Chang, S., Li, Y., Li, Q., Hu, J., Liang, J., Song, L., Kretzschmar, W., Gan, X., Nicod, J. and Rivera, M., 2015. Molecular signatures of major depression. Current Biology, 25(9), pp.1146-1156.

Research Talk: Elhanan Borenstein
Multi-omic and model-based analysis of the human microbiome
1. Manor, O. and Borenstein, E., 2017. Systematic Characterization and Analysis of the Taxonomic Drivers of Functional Shifts in the Human Microbiome. Cell Host & Microbe, 21(2), pp.254-267.
2. Noecker, C., Eng, A., Srinivasan, S., Theriot, C.M., Young, V.B., Jansson, J.K., Fredricks, D.N. and Borenstein, E., 2016. Metabolic model-based integration of microbiome taxonomic and metabolomic profiles elucidates mechanistic links between ecological and metabolic variation. mSystems, 1(1), pp.e00013-15.
3. Greenblum, S., Carr, R. and Borenstein, E., 2015. Extensive strain-level copy-number variation across human gut microbiome species. Cell, 160(4), pp.583-594.

Research Talk: Saharon Rosset
Quality preserving databases for statistically sound “big data” analysis on public databases
1. Rosset, S., Aharoni, E. and Neuvirth, H., 2014. Novel Statistical Tools for Management of Public Databases Facilitate Community‐Wide Replicability and Control of False Discovery. Genetic epidemiology, 38(5), pp.477-481.
2. Aharoni, E. and Rosset, S., 2014. Generalized α‐investing: definitions, optimality results and application to public databases. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(4), pp.771-794.
3. Aharoni, E., Neuvirth, H. and Rosset, S., 2011. The quality preserving database: A computational framework for encouraging collaboration, enhancing power and controlling false discovery. IEEE/ACM transactions on computational biology and bioinformatics, 8(5), pp.1431-1437.

Research Talk: Jason Ernst
Computational approaches for deciphering the non-coding human genome
1. Ernst, J. and Kellis, M., 2010. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nature biotechnology, 28(8), pp.817-825.
2. Ernst, J., Kheradpour, P., Mikkelsen, T.S., Shoresh, N., Ward, L.D., Epstein, C.B., Zhang, X., Wang, L., Issner, R., Coyne, M. and Ku, M., 2011. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 473(7345), pp.43-49.
3. Ernst, J. and Kellis, M., 2012. ChromHMM: automating chromatin-state discovery and characterization. Nature methods, 9(3), pp.215-216.
4. Ernst, J. and Kellis, M., 2015. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues. Nature biotechnology, 33(4), pp.364-376.
5. Ernst, J., Melnikov, A., Zhang, X., Wang, L., Rogov, P., Mikkelsen, T.S. and Kellis, M., 2016. Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions. Nature Biotechnology, 34(11), pp.1180-1190.

Research Talk: Alex Zelikovsky
Inference of metabolic pathway activity from metatranscriptomic reads
1. Temate-Tiagueu, Y., Al Seesi, S., Mathew, M., Mandric, I., Rodriguez, A., Bean, K., Cheng, Q., Glebova, O., Măndoiu, I., Lopanik, N.B. and Zelikovsky, A., 2016. Inferring metabolic pathway activity levels from RNA-Seq data. BMC genomics, 17(5), p.542.
2. Mathew, M., Bean, K.I., Temate-Tiagueu, Y., Caciula, A., Măndoiu, I.I., Zelikovsky, A. and Lopanik, N.B., 2016. Influence of symbiont-produced bioactive natural products on holobiont fitness in the marine bryozoan, Bugula neritina via protein kinase C (PKC). Marine biology, 163(2), pp.1-17.
3. Glebova, O., Temate‐Tiagueu, Y., Caciula, A., Al Seesi, S., Artyomenko, A., Mangul, S., Lindsay, J., Măndoiu, I.I. and Zelikovsky, A., 2016. Transcriptome Quantification and Differential Expression from NGS Data. Computational Methods for Next Generation Sequencing Data Analysis, pp.301-327.
4. Nicolae, M., Mangul, S., Măndoiu, I.I. and Zelikovsky, A., 2011. Estimation of alternative splicing isoform frequencies from RNA-Seq data. Algorithms for molecular biology, 6(1), p.9.

Research Talk: Cenk Sahinalp
Computational methods for intra-tumor heterogeneity detection and modeling clonal evolution of cancer through the use of bulk and single cell sequencing data
1. El-Kebir, M., Satas, G., Oesper, L. and Raphael, B.J., 2016. Inferring the mutational history of a tumor using multi-state perfect phylogeny mixtures. Cell Systems, 3(1), pp.43-53.
2. El-Kebir, M., Oesper, L., Acheson-Field, H. and Raphael, B.J., 2015. Reconstruction of clonal trees and tumor composition from multi-sample sequencing data. Bioinformatics, 31(12), pp.i62-i70.
3. Oesper, L., Satas, G. and Raphael, B.J., 2014. Quantifying tumor heterogeneity in whole-genome and whole-exome sequencing data. Bioinformatics, 30(24), pp.3532-3540.
4. Salehi, S., Steif, A., Roth, A., Aparicio, S., Bouchard-Côté, A. and Shah, S.P., 2017. ddClone: joint statistical inference of clonal populations from single cell and bulk tumour sequencing data. Genome Biology, 18(1), p.44.
5. Roth, A., McPherson, A., Laks, E., Biele, J., Yap, D., Wan, A., Smith, M.A., Nielsen, C.B., McAlpine, J.N., Aparicio, S. and Bouchard-Côté, A., 2016. Clonal genotype and population structure inference from single-cell tumor sequencing. Nature methods, 13(7), pp.573-576.
6. Ha, G., Roth, A., Khattra, J., Ho, J., Yap, D., Prentice, L.M., Melnyk, N., McPherson, A., Bashashati, A., Laks, E. and Biele, J., 2014. TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data. Genome research, 24(11), pp.1881-1893.
7. Roth, A., Khattra, J., Yap, D., Wan, A., Laks, E., Biele, J., Ha, G., Aparicio, S., Bouchard-Côté, A. and Shah, S.P., 2014. PyClone: statistical inference of clonal population structure in cancer. Nature methods, 11(4), pp.396-398.
8. Jahn, K., Kuipers, J. and Beerenwinkel, N., 2016. Tree inference for single-cell data. Genome biology, 17(1), p.86.

Research Talk: Sagi Snir
A universal PaceMaker as a better explanation of evolution and aging
1. Snir, S., Wolf, Y.I. and Koonin, E.V., 2012. Universal pacemaker of genome evolution. PLoS Comput Biol, 8(11), p.e1002785.
2. Snir, S. and Pellegrini, M., 2016. A Statistical Framework to Identify Deviation from Time Linearity in Epigenetic Aging. PLoS Comput Biol, 12(11), p.e1005183.

Research Talk: David Tse
Maximal correlation and principal component analysis
1. Rényi, A., 1959. On measures of dependence. Acta mathematica hungarica, 10(3-4), pp.441-451.
2. Feizi, S. and Tse, D., 2017. Maximally Correlated Principal Component Analysis. arXiv preprint arXiv:1702.05471.

Tutorial: Saharon Rosset
Stochastic process models for mutations, their estimation from data, and their uses
1. Huelsenbeck, J.P. and Crandall, K.A., 1997. Phylogeny estimation and hypothesis testing using maximum likelihood. Annual Review of Ecology and Systematics, 28(1), pp.437-466.
2. Tamura, K. and Nei, M., 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular biology and evolution, 10(3), pp.512-526.
3. Nielsen, R., 2005. Statistical methods in molecular evolution (Vol. 6). New York: Springer.
4. Whittaker, J.C., Harbord, R.M., Boxall, N., Mackay, I., Dawson, G. and Sibly, R.M., 2003. Likelihood-based estimation of microsatellite mutation rates. Genetics, 164(2), pp.781-787.

Research Talk: Kin Fai Au
Transcriptome analysis at the gene isoform level using hybrid sequencing
1. Au, K.F., Sebastiano, V., Afshar, P.T., Durruthy, J.D., Lee, L., Williams, B.A., van Bakel, H., Schadt, E.E., Reijo-Pera, R.A., Underwood, J.G. and Wong, W.H., 2013. Characterization of the human ESC transcriptome by hybrid sequencing. Proceedings of the National Academy of Sciences, 110(50), pp.E4821-E4830.
2. Weirather, J.L., de Cesare, M., Wang, Y., Piazza, P., Sebastiano, V., Wang, X.J., Buck, D. and Au, K.F., 2017. Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. F1000Research, 6.

Research Talk: Jo Hardin
Prediction intervals for random forests with applications to high throughput data
1. Chen, X. and Ishwaran, H., 2012. Random forests for genomic data analysis. Genomics, 99(6), pp.323-329.
2. Pang, H., Lin, A., Holford, M., Enerson, B.E., Lu, B., Lawton, M.P., Floyd, E. and Zhao, H., 2006. Pathway analysis using random forests classification and regression. Bioinformatics, 22(16), pp.2028-2036.
3. Zhang, J., Hadj-Moussa, H. and Storey, K.B., 2016. Current progress of high-throughput microRNA differential expression analysis and random forest gene selection for model and non-model systems: an R implementation. Journal of Integrative Bioinformatics, 13(5), p.306.

Speaker TBA
Title TBA

Research Talk: Ilan Gronau
Inferring a complex network of interbreeding between modern and archaic humans
1. Green, R.E., Krause, J., Briggs, A.W., Maricic, T., Stenzel, U., Kircher, M., Patterson, N., Li, H., Zhai, W., Fritz, M.H.Y. and Hansen, N.F., 2010. A draft sequence of the Neandertal genome. Science, 328(5979), pp.710-722.
2. Reich, D., Green, R.E., Kircher, M., Krause, J., Patterson, N., Durand, E.Y., Viola, B., Briggs, A.W., Stenzel, U., Johnson, P.L. and Maricic, T., 2010. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature, 468(7327), pp.1053-1060.
3. Meyer, M., Kircher, M., Gansauge, M.T., Li, H., Racimo, F., Mallick, S., Schraiber, J.G., Jay, F., Prüfer, K., De Filippo, C. and Sudmant, P.H., 2012. A high-coverage genome sequence from an archaic Denisovan individual. Science, 338(6104), pp.222-226.
4. Prüfer, K., Racimo, F., Patterson, N., Jay, F., Sankararaman, S., Sawyer, S., Heinze, A., Renaud, G., Sudmant, P.H., De Filippo, C. and Li, H., 2014. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature, 505(7481), pp.43-49.
5. Kuhlwilm, M., Gronau, I., Hubisz, M.J., de Filippo, C., Prado-Martinez, J., Kircher, M., Fu, Q., Burbano, H.A., Lalueza-Fox, C., de La Rasilla, M. and Rosas, A., 2016. Ancient gene flow from early modern humans into Eastern Neanderthals. Nature, 530(7591), pp.429-433.

Tutorial: John Novembre
Computational tools for understanding population structure in genetic variation data
1. Novembre, J. and Stephens, M., 2008. Interpreting principal component analyses of spatial population genetic variation. Nature genetics, 40(5), pp.646-649.
2. Alexander, D.H., Novembre, J. and Lange, K., 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome research, 19(9), pp.1655-1664.
3. Yang, W.Y., Novembre, J., Eskin, E. and Halperin, E., 2012. A model-based approach for analysis of spatial structure in genetic data. Nature genetics, 44(6), pp.725-731.
4. Petkova, D., Novembre, J. and Stephens, M., 2015. Visualizing spatial population structure with estimated effective migration surfaces. Nature Publishing Group.
5. Novembre, J. and Peter, B.M., 2016. Recent advances in the study of fine-scale population structure in humans. Current Opinion in Genetics & Development, 41, pp.98-105.

Tutorial: Ben Raphael
Inferring tumor evolution

Tutorial: Or Zuk
Co-evolution analysis: Methods and applications
1. Felsenstein, J., 1985. Phylogenies and the comparative method. The American Naturalist, 125(1), pp.1-15.
2. Pellegrini, M., Marcotte, E.M., Thompson, M.J., Eisenberg, D. and Yeates, T.O., 1999. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proceedings of the National Academy of Sciences, 96(8), pp.4285-4288.
3. Morcos, F., Pagnani, A., Lunt, B., Bertolino, A., Marks, D.S., Sander, C., Zecchina, R., Onuchic, J.N., Hwa, T. and Weigt, M., 2011. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proceedings of the National Academy of Sciences, 108(49), pp.E1293-E1301.

Research Talk: Jae-Hoon Sul
Increasing generality and power of rare variant tests utilizing extended pedigrees
1. Sul, J.H., Cade, B.E., Cho, M.H., Qiao, D., Silverman, E.K., Redline, S. and Sunyaev, S., 2016. Increasing Generality and Power of Rare-Variant Tests by Utilizing Extended Pedigrees. The American Journal of Human Genetics, 99(4), pp.846-859.
2. Zhu, Y. and Xiong, M., 2012. Family-based association studies for next-generation sequencing. The American Journal of Human Genetics, 90(6), pp.1028-1045.
3. Schaid, D.J., McDonnell, S.K., Sinnwell, J.P. and Thibodeau, S.N., 2013. Multiple genetic variant association testing by collapsing and kernel methods with pedigree or population structured data. Genetic epidemiology, 37(5), pp.409-418.

Research Talk: Fabio Vandin
Computational methods for survival analysis in genome-wide cancer studies
1. Vandin, F., Upfal, E. and Raphael, B.J., 2011. Algorithms for detecting significantly mutated pathways in cancer. Journal of Computational Biology, 18(3), pp.507-522.
2. Raphael, B.J., Dobson, J.R., Oesper, L. and Vandin, F., 2014. Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine. Genome medicine, 6(1), p.5.
3. Leiserson, M.D., Vandin, F., Wu, H.T., Dobson, J.R., Eldridge, J.V., Thomas, J.L., Papoutsaki, A., Kim, Y., Niu, B., McLellan, M. and Lawrence, M.S., 2015. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nature genetics, 47(2), pp.106-114.
4. Vandin, F., Papoutsaki, A., Raphael, B.J. and Upfal, E., 2015. Accurate computation of survival statistics in genome-wide studies. PLoS Comput Biol, 11(5), p.e1004071.
5. Hansen, T. and Vandin, F., 2016. Finding Mutated Subnetworks Associated with Survival in Cancer. arXiv preprint arXiv:1604.02467.

Research Talk: David Koslicki
Using the Earth-mover’s Distance to compare microbial communities
1. McClelland, J. and Koslicki, D., 2016. EMDUnifrac: Exact linear time computation of the Unifrac metric and identification of differentially abundant organisms. arXiv preprint arXiv:1611.04634.
2. Mangul, S. and Koslicki, D., 2016. Reference-free comparison of microbial communities via de Bruijn graphs. bioRxiv, p.055020.
3. Evans, S.N. and Matsen, F.A., 2012. The phylogenetic Kantorovich–Rubinstein metric for environmental sequence samples. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(3), pp.569-592.

Research Talk: Vineet Bafna
Detecting the favored allele in an ongoing selective sweep
1. Ronen, R., Tesler, G., Akbari, A., Zakov, S., Rosenberg, N.A. and Bafna, V., 2015. Predicting carriers of ongoing selective sweeps without knowledge of the favored allele. PLoS Genet, 11(9), p.e1005527.

Research Talk: Francesca Chiaromonte
Statistics for large, complex data and its role in “Omics” research
1. Liu, Y., Chiaromonte, F. and Li, B., 2016. Structured Ordinary Least Squares: A Sufficient Dimension Reduction approach for regressions with partitioned predictors and heterogeneous units. Biometrics.
2. Guo, Z., Li, L., Lu, W. and Li, B., 2015. Groupwise dimension reduction via envelope method. Journal of the American Statistical Association, 110(512), pp.1515-1527.
3. Ma, Y. and Zhu, L., 2013. A review on dimension reduction. International Statistical Review, 81(1), pp.134-150.
4. Bertsimas, D., King, A. and Mazumder, R., 2016. Best subset selection via a modern optimization lens. The Annals of Statistics, 44(2), pp.813-852.
5. Campos-Sánchez, R., Cremona, M.A., Pini, A., Chiaromonte, F. and Makova, K.D., 2016. Integration and fixation preferences of human and mouse endogenous retroviruses uncovered with functional data analysis. PLoS Comput Biol, 12(6), p.e1004956.
6. Cremona, M.A., Sangalli, L.M., Vantini, S., Dellino, G.I., Pelicci, P.G., Secchi, P. and Riva, L., 2015. Peak shape clustering reveals biological insights. BMC bioinformatics, 16(1), p.349.
7. Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pp.267-288.
8. Zou, H. and Hastie, T., 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), pp.301-320.

Research Talk: Sagiv Shifman
Genomics approaches to study neurodevelopmental disorders
1. Ben-David, E. and Shifman, S., 2012. Networks of neuronal genes affected by common and rare variants in autism spectrum disorders. PLoS Genet, 8(3), p.e1002556.
2. Ben-David, E. and Shifman, S., 2013. Combined analysis of exome sequencing points toward a major role for transcription regulation during brain development in autism. Molecular psychiatry, 18(10), p.1054.
3. Shohat, S., Ben-David, E. and Shifman, S., 2016. Varying intolerance of gene pathways to mutational classes explain genetic convergence across neuropsychiatric disorders. bioRxiv, p.054460.
4. Fromer, M., Pocklington, A.J., Kavanagh, D.H., Williams, H.J., Dwyer, S., Gormley, P., Georgieva, L., Rees, E., Palta, P., Ruderfer, D.M. and Carrera, N., 2014. De novo mutations in schizophrenia implicate synaptic networks. Nature, 506(7487), pp.179-184.
5. Parikshak, N.N., Luo, R., Zhang, A., Won, H., Lowe, J.K., Chandran, V., Horvath, S. and Geschwind, D.H., 2013. Integrative functional genomic analyses implicate specific molecular pathways and circuits in autism. Cell, 155(5), pp.1008-1021.
6. Zhang, B. and Horvath, S., 2005. A general framework for weighted gene co-expression network analysis. Statistical applications in genetics and molecular biology, 4(1), p.1128.
7. Xu, X., Wells, A.B., O’Brien, D.R., Nehorai, A. and Dougherty, J.D., 2014. Cell type-specific expression analysis to identify putative cellular mechanisms for neurogenetic disorders. Journal of Neuroscience, 34(4), pp.1420-1431.
8. Deciphering Developmental Disorders Study, 2017. Prevalence and architecture of de novo mutations in developmental disorders. Nature, 542(7642), pp.433-438.

Tutorial: Bogdan Pasaniuc
Genetic correlations to gain insights into relations between traits
1. Mancuso, N., Shi, H., Goddard, P., Kichaev, G., Gusev, A. and Pasaniuc, B., 2017. Integrating Gene Expression with Summary Association Statistics to Identify Genes Associated with 30 Complex Traits. The American Journal of Human Genetics, 100(3), pp.473-487.
2. Pasaniuc, B. and Price, A.L., 2016. Dissecting the genetics of complex traits using summary association statistics. Nature Reviews Genetics.
3. Shi, H., Kichaev, G. and Pasaniuc, B., 2016. Contrasting the genetic architecture of 30 complex traits from summary association data. The American Journal of Human Genetics, 99(1), pp.139-153.
4. Gusev, A., Ko, A., Shi, H., Bhatia, G., Chung, W., Penninx, B.W., Jansen, R., De Geus, E.J., Boomsma, D.I., Wright, F.A. and Sullivan, P.F., 2016. Integrative approaches for large-scale transcriptome-wide association studies. Nature genetics.

Research Talk: Jessica (Jingyi) Li
Neyman-Pearson (NP) classification algorithms and NP Receiver Operating Characteristic (NP-ROC) curves
1. Tong, X., Feng, Y. and Li, J.J., 2016. Neyman-Pearson (NP) classification algorithms and NP receiver operating characteristic (NP-ROC) curves. arXiv preprint arXiv:1608.03109.
2. Li, J.J. and Tong, X., 2016. Genomic Applications of the Neyman–Pearson Classification Paradigm. In Big Data Analytics in Genomics (pp. 145-167). Springer International Publishing.

Tutorial: Fabio Vandin
Computational discovery of significantly mutated genes and pathways in cancer
1. Vandin, F., Upfal, E. and Raphael, B.J., 2011. Algorithms for detecting significantly mutated pathways in cancer. Journal of Computational Biology, 18(3), pp.507-522.
2. Raphael, B.J., Dobson, J.R., Oesper, L. and Vandin, F., 2014. Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine. Genome medicine, 6(1), p.5.
3. Leiserson, M.D., Vandin, F., Wu, H.T., Dobson, J.R., Eldridge, J.V., Thomas, J.L., Papoutsaki, A., Kim, Y., Niu, B., McLellan, M. and Lawrence, M.S., 2015. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nature genetics, 47(2), pp.106-114.
4. Vandin, F., Papoutsaki, A., Raphael, B.J. and Upfal, E., 2015. Accurate computation of survival statistics in genome-wide studies. PLoS Comput Biol, 11(5), p.e1004071.
5. Hansen, T. and Vandin, F., 2016. Finding Mutated Subnetworks Associated with Survival in Cancer. arXiv preprint arXiv:1604.02467.

Research Talk: Sriram Sankararaman
Deep learning for population genomic inference

Research Talk: Kirk Lohmueller
Variation in positive and negative selection across the Tree of Life
1. Galtier, N., 2016. Adaptive protein evolution in animals and the effective population size hypothesis. PLoS Genet, 12(1), p.e1005774.
2. Kim, B.Y., Huber, C.D. and Lohmueller, K.E., 2016. Inference of the distribution of selection coefficients for new nonsynonymous mutations using large samples. bioRxiv, p.071431.

Research Talk: Fereydoun Hormozdiari
Discovery of genetic variants and modules in neurodevelopmental disorders
1. O’Roak, B.J., Vives, L., Girirajan, S., Karakoc, E., Krumm, N., Coe, B.P., Levy, R., Ko, A., Lee, C., Smith, J.D. and Turner, E.H., 2012. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature, 485(7397), pp.246-250.
2. Parikshak, N.N., Gandal, M.J. and Geschwind, D.H., 2015. Systems biology and gene networks in neurodevelopmental and neurodegenerative disorders. Nature Reviews Genetics, 16(8), pp.441-458.
3. Turner, T.N., Hormozdiari, F., Duyzend, M.H., McClymont, S.A., Hook, P.W., Iossifov, I., Raja, A., Baker, C., Hoekzema, K., Stessman, H.A. and Zody, M.C., 2016. Genome sequencing of autism-affected families reveals disruption of putative noncoding regulatory DNA. The American Journal of Human Genetics, 98(1), pp.58-74.
4. Hormozdiari, F., Penn, O., Borenstein, E. and Eichler, E.E., 2015. The discovery of integrated gene networks for autism and related disorders. Genome research, 25(1), pp.142-154.
5. Gilman, S.R., Iossifov, I., Levy, D., Ronemus, M., Wigler, M. and Vitkup, D., 2011. Rare de novo variants associated with autism implicate a large functional network of genes involved in formation and function of synapses. Neuron, 70(5), pp.898-907.
6. Linh and Hormozdiari, RECOMB 2017.

Tutorial: Noah Zaitlen
Covariate adjustment in genetics and genomics
1. Zaitlen, N., Lindström, S., Pasaniuc, B., Cornelis, M., Genovese, G., Pollack, S., Barton, A., Bickeböller, H., Bowden, D.W., Eyre, S. and Freedman, B.I., 2012. Informed conditioning on clinical covariates increases power in case-control association studies. PLoS Genet, 8(11), p.e1003032.
2. Aschard, H., Vilhjalmsson, B., Patel, C., Skurnik, D., Yu, J., Wolpin, B., Kraft, P. and Zaitlen, N., 2016. Playing Musical Chairs in Big Data to Reveal Variables Associations. bioRxiv, p.057190.
3. Aschard, H., Vilhjálmsson, B.J., Joshi, A.D., Price, A.L. and Kraft, P., 2015. Adjusting for heritable covariates can bias effect estimates in genome-wide association studies. The American Journal of Human Genetics, 96(2), pp.329-339.

Research Talk: Barbara Engelhardt
Transcriptional time series responses: Challenges, approaches, and opportunities
1. Heard, N.A., Holmes, C.C. and Stephens, D.A., 2006. A quantitative study of gene regulation involved in the immune response of anopheline mosquitoes: An application of Bayesian hierarchical clustering of curves. Journal of the American Statistical Association, 101(473), pp.18-29.
2. Qin, Z.S., 2006. Clustering microarray gene expression data using weighted Chinese restaurant process. Bioinformatics, 22(16), pp.1988-1997.
3. McDowell, I.C., Manandhar, D., Vockley, C.M., Schmid, A., Reddy, T.E. and Engelhardt, B., 2017. Clustering gene expression time series data using an infinite Gaussian process mixture model. bioRxiv, p.131151.