2019 SUMMER SCHOOL FOR BIG DATA IN BIOLOGY


The Center for Biomedical Research Support will host the Annual Summer School for Big Data in Biology on May 28-31, 2019. The summer school provides a unique hands-on opportunity to acquire valuable skills directly from experts in the field, with courses tailored to novice, intermediate, and advanced users.

Each course will meet for four half-days (either mornings or afternoons) for a total of twelve hours. Instructors will post lectures, datasets, exercises, and course information on a website accessible to enrolled participants. There will be no examinations, but participants may request certificates of completion. Academic credit will not be issued. Please carefully check the specified prerequisite knowledge before enrolling in a course. Payment information for courses is at the bottom of this page.

  1. UTEID: To obtain a UTEID, go here.
  2. TACC: To sign up for a TACC account, go here.

Make sure your Unix and TACC skills are up to date for the Summer School!

Many courses assume familiarity with Unix and TACC. If you need a refresher on these topics, check out the following 3-hour short courses.

Introduction to Unix: Wednesday, May 8. For course details and to register, visit: this site

Introduction to TACC: Wednesday, May 15. For course details and to register, visit: this site

Course Listings

  • Introduction to Core NGS Concepts and Tools
  • Introduction to RNA-Seq
  • Genome Variant Analysis
  • Introduction to Python
  • Principles of Machine Learning for Bioinformatics
  • Introduction to Biocomputing
  • Introduction to Proteomics
  • Practical Approaches to Analyzing Biological Data with R
  • Creating a reproducible data analysis workflow: A Software Carpentry Workshop

Course Descriptions

    Introduction to Core NGS Concepts and Tools


    Day and Time: Tue-Fri 9:00 a.m. – 12:00 p.m. Location: PAR 101
    Description: This course provides an introduction to the concepts and vocabulary of Next Generation Sequencing (NGS), with an emphasis on common protocols, tools, and file formats used in NGS data analysis. Subjects covered include quality assessment and manipulation of raw NGS sequences (FastQC, cutadapt), read mapping (bwa, bowtie2), the Sequence Alignment/Map (SAM) format, and tools for manipulating BAM files (samtools, bedtools). Participants will gain hands-on experience using these and other NGS tools in the Linux command line environment at TACC, as well as exposure to the many bioinformatics resources TACC makes available.
    Instructor: Anna Battenhouse, Associate Research Scientist and Bioinformatics Consultant, CBRS
    Teaching Assistant: Claire McWhite
    Instructor Bio: Anna is a research scientist in the lab of Dr. Edward Marcotte and leads the Biomedical Research Support Facility's mission to support the IT and computational needs of UT Austin's biological sciences community. She has worked extensively with NGS data over the last 10 years and develops and maintains NGS analysis pipeline scripts for UT's BioITeam. Anna received a B.A. in English Literature from Carleton College in 1978. After a long career in commercial software development, Anna began her "retirement career" at UT Austin in 2007 and obtained a B.S. in Biochemistry in 2013.
    Preferred or prerequisite skills: None
    Computer requirement: Students must bring their own laptops. UTEID required for wireless access. Please be sure you know your UT EID when you come to class. To obtain a UTEID, go here

     Please read this disclaimer if you are using the UT ProCard for payment!

    Introduction to RNA-Seq


    Day and Time: Tue-Fri 1:30 p.m. – 4:30 p.m. Location: PAR 103
    Description: This four-day course provides an introduction to methods for analysis of RNA-seq data. It assumes familiarity and comfort with the Linux command line and TACC. The course follows a typical RNA-seq workflow, from quality assessment of raw data through mapping (bwa, kallisto), differential expression analysis (DESeq2), and downstream analysis and visualization. The course also covers analysis methods for single-cell RNA-seq data. Participants will gain hands-on experience using these tools in a Linux command line environment at TACC.
    Instructor: Dhivya Arasappan, Assistant Professor of Practice and Bioinformatics Consultant, CBRS
    Instructor Bio: Dhivya Arasappan has 9 years of experience analyzing NGS data from multiple platforms. Her areas of expertise include RNA-seq analysis (specifically involving large-scale brain expression datasets and coexpression network analysis), de novo genome assembly (particularly using hybrid sequencing data), and benchmarking of bioinformatics tools. She is the research educator for the Big Data in Biology Freshman Research Initiative stream.
    Preferred or prerequisite skills: Familiarity working in a Unix environment. You may register for Introduction to Unix, a 3-hour short course on May 8, by clicking here.
    Computer requirement: Students should have their own laptop computer. UTEID required for wireless access. Please be sure you know your UT EID when you come to class. To obtain a UTEID, go here.

       Please read this disclaimer if you are using the UT ProCard for payment!

    Genome Variant Analysis


    Day and Time: Tue-Fri 9:00 a.m. - 12:00 p.m. Location: PAR 105
    Description: This course is designed to teach you how to identify genomic variants from a variety of NGS library sources (mixed populations, whole genome, enriched/targeted panels, rare variant, amplicon, etc.) for both prokaryotic and eukaryotic organisms. The course emphasizes using existing data sources so that participants can analyze real data in the same step-by-step manner they would use for their own data. The modular nature of the exercises allows participants of all computational skill levels to benefit from both instruction and hands-on practice in the areas that interest them most, while providing introductory resources for analysis types they may encounter in the future. Additional lecture/discussion will focus on the strengths and weaknesses of different sequencing library types, alternative analysis programs, different sequencing platforms, and how to best use TACC resources and existing pipelines to make analysis faster. Major data analysis steps include sequencing quality assessment and improvement, reference genome construction, read mapping, variant calling, visualization, and reporting. Programs and pipelines used include FastQC, Trimmomatic, SPAdes, SAMtools, Bowtie2, bedtools, breseq, IGV, and GATK.
    Instructor: Daniel Deatherage, PhD (Postdoctoral Research Associate, Molecular Biosciences)
    Instructor Bio: Daniel Deatherage earned his doctorate at The Ohio State University studying epigenetic effects of ovarian cancer. His postdoctoral work in Dr. Jeffrey Barrick’s lab has focused on using next generation sequencing to identify ultra-rare mutations within evolving populations and to diagnose failure modes of synthetic biology constructs. In general, he is interested in using next generation sequencing to answer novel questions that may not be answerable by other methods.
    Preferred or prerequisite skills: None.
    Computer requirement: Students must use their own laptops. TACC Account and UTEID required. Please be sure you know both your UT EID and your TACC username when you come to class. To obtain a UTEID, go here. To sign up for a TACC account, go here.

      Please read this disclaimer if you are using the UT ProCard for payment!

    Introduction to Python


    Day and Time: Tue-Fri 9:00 a.m. – 12:00 p.m. Location: PAR 103
    Description: This four-day course will introduce students to basic concepts of scientific computing in the Python language. Trainees will learn introductory topics such as data structures, control flow, functions, file input/output, and data parsing.
    Instructor: Katie Lyons, Graduate Student
    Instructor Bio: Katie Lyons is a graduate student in the Department of Integrative Biology. She learned Python when she arrived at UT, and uses it for data analysis, especially of DNA sequences and phylogenetic trees.
    Preferred or prerequisite skills: None.
    Computer requirement: Students must provide laptops that can connect to the internet and have a Firefox or Chrome browser. UTEID required for wireless access. Please be sure you know your UT EID when you come to class. To obtain a UTEID, go here.

      Please read this disclaimer if you are using the UT ProCard for payment!

    Principles of Machine Learning for Bioinformatics


    Day and Time: Tue-Fri 1:30 p.m. - 4:30 p.m. Location: PAR 105
    Description: This four-day course will introduce a selection of machine learning methods used in bioinformatic analyses, with a focus on RNA-seq gene expression data. We will cover unsupervised learning (dimensionality reduction and clustering); feature selection and extraction; and supervised learning methods for classification (e.g., random forests, SVM, LDA, kNN) and regression (with an emphasis on regularization methods appropriate for high-dimensional problems). Participants will have the opportunity to apply these methods, as implemented in R and Python, to publicly available data.
    Instructor: Dennis Wylie, Research Scientist and Bioinformatics Consultant, CBRS
    Instructor Bio: Dennis Wylie joined the bioinformatics group in 2015. He has experience in NGS data analysis, including variant calling, RNA-seq-based biomarker discovery, and predictive modeling (classification, regression, etc.). Before coming to UT, he earned a PhD in Biophysics from UC Berkeley applying stochastic simulation methods in immunology, did postdoctoral work modeling the transmission of infectious disease, and spent six years as a bioinformatician in industry.
    Preferred or prerequisite skills: This course is recommended for students with some prior knowledge of either R or Python. Participants are expected to provide their own laptops with recent versions of R and/or Python installed. Students will be instructed to download several free software packages (including R packages and Python libraries such as pandas and sklearn).
    Computer requirement: Students should have their own laptop computer. UTEID required for wireless access. Please be sure you know your UT EID when you come to class. To obtain a UTEID, go here.

      Please read this disclaimer if you are using the UT ProCard for payment!

    Introduction to Biocomputing


    Day and Time: Tue-Fri 1:30 p.m. – 4:30 p.m. Location: PAR 101
    Description: An introduction to the Unix command line and Python. Unix basics will include file navigation, pipes, and core utilities. Python basics will cover data types, loops, conditionals, and objects. Once the basics are covered, the focus will turn to bioinformatics applications. No previous programming experience is assumed.
    Instructor: Benni Goetz, Associate Research Scientist and Bioinformatics Consultant, CBRS
    Instructor Bio: Benni is a bioinformatics consultant in the CBRS. Python, Bash, and huge computing clusters are some of his favorite things. In a previous life, Benni studied pure math: differential geometry in particular.
    Preferred or prerequisite skills: No previous programming experience is assumed.
    Computer requirement: Students should have their own laptop computer. UTEID required for wireless access. Please be sure you know your UT EID when you come to class. To obtain a UTEID, go here.

      Please read this disclaimer if you are using the UT ProCard for payment!

    Introduction to Proteomics


    Day and Time: Tue-Fri 1:30 p.m. - 4:30 p.m. Location: PAR 204
    Description: This four-day course will focus on the fundamental knowledge and skillsets needed to utilize mass spectrometry-based proteomics for biological research. We will cover key concepts of experimental design, instrumentation, and data processing. Participants will learn how to use a standard data analysis pipeline to generate peptide and protein identifications and interpret the biological significance of results. Topics will include quantitative proteomics and post-translational modifications. The goal of this course is to provide non-experts with the foundational knowledge necessary to access the power of proteomics for their own research interests.
    Instructor: Dan Boutz, PhD, Research Associate, Molecular Biosciences
    Instructor Bio: Daniel Boutz received his Ph.D. in Molecular Biology from the University of California, Los Angeles. As a post-doctoral researcher and now Research Associate with Edward Marcotte at the Center for Systems and Synthetic Biology at UT Austin, he specializes in the use of mass spectrometry-based proteomics for systems-level studies of cellular biology and immunology.
    Preferred or prerequisite skills: None.
    Computer requirements: Students will need their own laptops. Most of the software we will use is compatible with both Windows and Mac; however, a couple of programs are Windows-only. For these programs, Mac users can still follow the tutorial as presented. UTEID required for wireless access. Please be sure you know your UT EID when you come to class. To obtain a UTEID, go here.

      Please read this disclaimer if you are using the UT ProCard for payment!

    Practical Approaches to Analyzing Biological Data with R


    Day and Time: Tue-Fri 9:00 a.m. - 12:00 p.m. Location: PAR 204
    Description: Modern researchers need basic data literacy. This four-day course will introduce how to use the R programming language to analyze and visualize biological data on small and large scales. We will focus on the practical tools you need to quickly import your data, clean it up, analyze it, and then generate publication-quality plots. Along the way, we’ll briefly address best practices for coding in R and how to effectively find help online. This course uses the tidyverse ecosystem of R packages, and upon completion you’ll have used dplyr, tidyr, ggplot2, and more. Finally, we will introduce Bioconductor, a collection of R packages designed for numerous biology-specific applications such as RNA-sequencing, 16S ribosomal profiling, and genomic sequence analysis. No previous coding experience is required for this course.
    Instructor: Sean Leonard, Graduate Student, Moran and Barrick labs
    Instructor Bio: Prior to starting the UT Cell and Molecular Biology PhD program, Sean received an MS in Biotechnology from UTSA (2014) and a bachelor’s degree from Loyola University Maryland (2006). Between his undergraduate and graduate studies, he served six years as an officer in the United States Army. Sean now studies the gut microbiome of honey bees under the advisement of Dr. Nancy Moran and Dr. Jeffrey Barrick.
    Preferred or prerequisite skills: None
    Computer requirement: Students will need their own laptops. UTEID required for wireless access. Please be sure you know your UT EID when you come to class. To obtain a UTEID, go here.

      Please read this disclaimer if you are using the UT ProCard for payment!

    Creating a reproducible data analysis workflow: A Software Carpentry Workshop


    Day and Time: Tue-Fri 9:00 a.m. – 12:00 p.m. Location: PAR 206
    Description: Have you ever wanted to learn how to code in an inclusive environment? Had to redo the same analysis over and over? Tried to work with many files all at once? Come across a GitHub link in a paper and wondered how it got there? Lost your data or analysis because your computer crashed? Come learn and practice methods for reproducible research and reusable workflows in a workshop that is designed to be welcoming and is open to anyone at any academic level (undergraduate to senior faculty). We encourage people from any field with little to no coding experience to join us.
    We will introduce learners to creating a reproducible scientific workflow that can serve as a model for a current or future project. The workshop covers the bash command line, version control with git, R programming, and how to combine these tools into a single workflow. If you’re struggling with how to make your data analysis more reproducible, backed up, and automated, this is the workshop for you!
    Instructor: Marian Schmidt, Simons Postdoctoral Research Fellow, Ochman lab
    Instructor Bio: When Marian started graduate school in 2012, she had no idea how to program or that she would need it in her work. After creating her first next generation sequencing dataset, she realized that coding was imperative for her research. Over many months, she struggled to teach herself how to code. Since 2015, she has enjoyed teaching programming workshops and making coding more accessible to others. Currently, she is a postdoctoral researcher in the Department of Integrative Biology. Her research focuses on the microbial ecology and evolution of bacterial communities that inhabit freshwater and marine systems.
    Preferred or prerequisite skills: None
    Computer requirement: Students will need their own laptops. UTEID required for wireless access. Please be sure you know your UT EID when you come to class. To obtain a UTEID, go here.

      Please read this disclaimer if you are using the UT ProCard for payment!

    REGISTRATION AND PAYMENT

    We accept personal credit cards (American Express, MasterCard, Visa, Discover), UT ProCards (but please read this for important information regarding the use of the ProCard that could affect your registration), and IDT (interdepartmental transfer). Registration dates and fees are as follows:

    Registration dates: Feb 7, 2019 - May 20, 2019

    Registration fees by category:

    UT System

    • Students and post-docs*: $195 per student per course
    • Faculty or Staff*: $295 per student per course

    Non-UT Schools

    • Students and post-docs*: $275 per student per course
    • Faculty or Staff*: $500 per student per course

    Other

    • Non-Profit: $500 per student per course
    • Government: $500 per student per course
    • Industry: $1000 per student per course
    • Groups of 5 or more from the same agency or institution: 20% discount (email Nicole Elmer for group registration)

    Refund and Cancellation Policy
    A full refund of registration fees, less a $25 cancellation fee, will be available if requested in writing and received by May 24, 2019. No refunds will be made after that date. Please note that course substitutions cannot be made. If you fail to cancel by the deadline and do not attend, you are still responsible for full payment. UT Austin reserves the right to cancel courses and to return all fees in the event of insufficient registration.

    Miscellaneous

    Location: All workshops will take place in the PAR building.

    Food: Beverages and snacks will be served during 15-minute morning and afternoon breaks. There are also soda and snack machines in the MEZ building across the way.

    Parking: Parking on campus is at a premium. Because the Summer School takes place during the break between the spring and summer semesters, the UT Shuttles will not be operating. The nearest parking garages are the Brazos parking garage (BRG) at 210 E. MLK, the San Antonio Garage (SAG) at 2420 San Antonio Street, and the AT&T Executive Education & Conference Center parking garage (CCG) at 1900 University Avenue. Parking in any of the "A," "F," "D," "OV," or "O" spaces on campus may result in a citation. See rates for all campus garages.

    Visitor Information:

    • UT Campus Visitor Center information
    • UT Austin maps
    • Austin and the UT Drag Travel Guide