2016 SUMMER SCHOOL FOR BIG DATA IN BIOLOGY


REGISTRATION IS NOW CLOSED FOR 2016 COURSES.

The Center for Computational Biology and Bioinformatics at The University of Texas at Austin is proud to host the 3rd Annual Summer School for Big Data in Biology May 23–26, 2016. The summer school provides a unique hands-on opportunity to acquire valuable skills directly from experts in the field, with courses tailored to novices as well as intermediate and advanced users.

This year we are offering 11 courses. Each will meet for four half-days (either mornings or afternoons) for a total of twelve hours. Instructors will post lectures, datasets, exercises, and course information on a website accessible to enrolled participants. There will be no examinations, but participants may request certificates of completion. Academic credit will not be issued. Please carefully check the specified prerequisite knowledge before enrolling in a course. Payment information for courses is at the bottom of this page.

We now offer credit for educational professional development opportunities during our 2016 Summer School for Big Data in Biology! Teachers currently working in PK–12 settings can earn 12 hours of Continuing Professional Education (CPE) by attending one of our courses held May 23-26, 2016.

The archive of 2015 courses can be found here.

  1. UTEID: To obtain a UTEID, go here.
  2. TACC: To sign up for a TACC account, go here.

Need a quick refresher before the summer school?

Many courses prefer familiarity with Unix, R, and/or Python. If you need a refresher on these topics, check out the following 3-hour short courses.

R-Data Analysis and Graphics: Tues March 22. For course details and to register, visit: https://stat.utexas.edu/training/software-short-courses

Intro to UNIX for Bioinformatics: Thursday April 28. For course details and to register, visit: http://ccbb.biosci.utexas.edu/shortcourses.html

Intro to R for Bioinformatics: Mon May 5. For course details and to register, visit: http://ccbb.biosci.utexas.edu/shortcourses.html

MORNING COURSES: Mon-Thur, May 23-26, 9 a.m.-12 p.m.
AFTERNOON COURSES: Mon-Thur, May 23-26, 1:30 p.m.-4:30 p.m.

Programming

  • Morning: (New!) Bash Beyond Basics
  • Afternoon: Introduction to Python

DNA and RNA sequencing methods and analyses

  • Morning: Introduction to Core NGS Tools
  • Afternoon: Genome Variant Analysis
  • Morning: (New!) Metagenomic Analysis of Microbial Communities
  • Afternoon: (New!) Clinical Genomics
  • Morning: Machine Learning Methods for Gene Expression Analysis
  • Afternoon: Introduction to RNA-Seq

Proteomics

  • Morning: Introduction to Proteomics

Computational Modeling

  • Morning: Protein Modeling Using Rosetta
  • Morning: Computational Modeling to Study Evolution in Action

Course Descriptions

TOPIC: PROGRAMMING

Bash Beyond Basics

Day and Time: Mon-Thur 9:00 a.m. – 12:00 p.m. Location: MEZ 1.118
Description: The course will focus on being more productive in the Bash shell. We will learn about regular expressions, Unix utilities like cut/sort/join, awk, advanced piping, process substitution, string manipulation, and Bash scripting. Learn to love the command line and increase your productivity with rapid manipulation of bioinformatic data!
Instructor: Benjamin (Benni) Goetz, M.S., Bioinformatics Consultant
Instructor Bio: After graduating from UT Austin with a master's degree in mathematics, Benni Goetz joined the CCBB as a bioinformatics consultant.
Preferred or prerequisite skills: Basic familiarity with the command line is assumed, but nothing beyond basic file manipulation.
Laptop requirement: Students must bring their own laptops. Windows users should have the PuTTY SSH client installed. Mac or Linux users will not need to install anything extra.

 Please read this disclaimer if you are using the UT ProCard for payment!


Introduction to Python

Day and Time: Mon-Thur 1:30 p.m. – 4:30 p.m. Location: MEZ 4.136
Description: This four-day course will introduce students to basic concepts of scientific computing in the Python language. Trainees will learn introductory topics such as data structures, control flow, functions, file input/output, and data parsing.
Instructor: Stephanie Spielman, Ph.D.
Instructor Bio: Stephanie Spielman recently obtained her Ph.D. in Claus Wilke’s lab through the Ecology, Evolution, and Behavior Program. Her research focuses on computational molecular evolution. Stephanie conducts research using Python, R, UNIX, and C++. She is broadly interested in methods in computational molecular evolution and phylogenetics.
Preferred or prerequisite skills: This course is intended for beginners interested in learning fundamental skills in computer programming in Python, with an application to biological and sequence data analysis. The course is geared towards novices and assumes no prior knowledge of computer programming.
Computer Lab: This course will take place in a computer lab. Software will be installed in class.


TOPIC: DNA AND RNA SEQUENCING ANALYSIS

Introduction to Core NGS Tools THIS COURSE IS FULL!

Day and Time: Mon-Thur 9:00 a.m. - 12:00 p.m. Location: MEZ 4.128
Description: This course provides an introduction to common analysis tools and file formats currently used in Next Generation Sequencing (NGS), with emphasis on quality assessment and manipulation of raw NGS sequences (FastQC, cutadapt), read mapping (bwa, bowtie2), the Sequence Alignment Map (SAM) format, and tools for manipulating BAM files (samtools, bedtools). Participants will gain hands-on experience using these and other NGS tools in the Linux command line environment at TACC, as well as exposure to the many bioinformatics resources TACC makes available.
Instructors: Anna Battenhouse and Amelia Weber Hall
Instructor Bios: Anna Battenhouse is an Associate Research Scientist in the Iyer lab. Anna received a B.A. in English Literature from Carleton College in 1978. After a career in commercial software development from 1982-2005, Anna began her retirement career as a Research Associate in the Iyer lab. She obtained her B.S. in Biochemistry from UT Austin in 2013.

Amelia Weber Hall is a graduate student in the Iyer Lab. Amelia received her B.S. in Molecular Genetics from the University of Rochester in 2007. She worked as a laboratory technician for Richard Aldrich in the UT Austin Department of Neuroscience from 2007-2010. In 2010, she began her Microbiology PhD at UT Austin and is currently a 6th year student in the Iyer lab.
Teaching Assistant: Dakota Derryberry
Preferred or prerequisite skills: This course is intended for researchers who are planning to perform, or are just starting to perform, NGS experiments. A background in UNIX is preferred but not required.
Computer Lab: This course will take place in a computer lab with internet access and a terminal program.
UTEID and TACC Account required: Attendees must have UT EIDs for access to our course wiki, as well as accounts on TACC. Please be sure you know both your UT EID and your TACC username when you come to class. To obtain a UTEID, go here. To sign up for a TACC account, go here.


Genome Variant Analysis

Day and Time: Mon-Thur 1:30 p.m. – 4:30 p.m. Location: MEZ 4.144
Description: This four-day course is designed to teach you how to analyze next generation sequencing data via a series of interactive tutorials that provide hands-on familiarity with a variety of analysis tools (such as Trimmomatic, FastQC, SAMtools, Bowtie2, BWA, breseq, IGV, GATK, and more). Major data analysis topics covered will include read pre-processing, analyzing read quality, genome assembly, read alignment, detection of single nucleotide variants, detection of structural variants, visual representation of such variants, rare variant detection within populations, target enrichment strategies, and more. Initially the class will focus on prokaryotic samples, as many of the same principles of analysis will apply. Later portions of the class will let each participant choose between more in-depth prokaryotic analysis or eukaryotic analysis, depending on personal relevance. The class will primarily focus on Illumina sequencing data, but discussion will cover alternative library preparation methods as well as alternative technologies.
Instructor: Daniel E. Deatherage, Ph.D., Postdoctoral Fellow
Instructor Bio: Daniel Deatherage earned his doctorate at The Ohio State University studying epigenetic effects of ovarian cancer. His postdoctoral work in Dr. Jeffrey Barrick’s lab has focused on using next generation sequencing to identify ultra rare mutations within evolving populations. He has been teaching or assistant teaching this class for 4 years.
Teaching Assistant: Sean Leonard
Preferred or prerequisite skills: The interactive tutorials allow self-paced progress, so no background is required; however, familiarity with the command line is helpful and will allow you to complete more content during the course.
Computer Lab: This course will take place in a computer lab with all the necessary programs preinstalled.
TACC Account required: Attendees will need an account on TACC. Please be sure you know your TACC username when you come to class. To sign up for a TACC account, go here.


Metagenomic Analysis of Microbial Communities

Day and Time: Mon-Thur 9:00 a.m. - 12:00 p.m. Location: MEZ 1.102
Description: In this four-day course students will learn how to analyze next-generation metagenomic sequence data from entire microbial communities. We will cover theoretical aspects and provide hands-on sessions that go through all the steps necessary to complete an analysis of microbial communities. The course will primarily focus on targeted amplicon metagenomics (e.g., hypervariable regions of the 16S rRNA gene), but shotgun metagenomics approaches will also be presented. Real next-generation sequencing data (provided by the instructor) will be used in multiple interactive tutorials. These tutorials will cover raw sequence analysis through to statistical analysis and data interpretation. The data will be analyzed using a number of programs including QIIME, LEfSe, Oligotyping, and R.
Instructor: Kasie Raymann, Ph.D., Post-doctoral Fellow with Dr. Nancy Moran and Howard Ochman
Instructor Bio: Kasie Raymann earned a B.S. in biology from Indiana University Bloomington and a Ph.D. in evolutionary biology from Institut Pasteur in Paris, France. Her doctoral research focused on using large-scale comparative genomics and phylogenetics to investigate the evolutionary relationship between Archaea and Eukaryotes. Kasie started her postdoctoral research at the University of Texas at Austin in October 2014, under the mentorship of Nancy Moran and Howard Ochman. As a postdoctoral fellow, she is addressing organismal evolution at a finer scale, within the gut microbiome of animals. Her research seeks to understand the population dynamics of microbial communities and the evolutionary processes that shape communities over time.
Teaching Assistant: Louis Marie Bobay
Preferred or prerequisite skills: Familiarity with basic Linux/Unix command line is recommended.
UTEID and TACC Account required: Attendees must have UT EIDs for access to our course wiki, as well as accounts on TACC. Please be sure you know both your UT EID and your TACC username when you come to class. To obtain a UTEID, go here. To sign up for a TACC account, go here.


Clinical Genomics

Day and Time: Mon-Thur 1:30 p.m. – 4:30 p.m. Location: MEZ 2.102
Description: This four-day course will introduce a selection of genomics methodologies in a clinical and medical context. We will cover genomics data processing and interpretation, quantitative genetics, association between variants and clinical outcomes, cancer genomics, and the ethics/regulatory considerations of developing medical genomics tools for clinicians. The course will have an optional lab component where participants will have the opportunity to explore datasets and learn basic genomics and clinical data analysis.
Instructor: Matthew Cowperthwaite, Ph.D., Director of Research
Instructor Bio: Matthew Cowperthwaite is the Director of Research and Technology for St. David’s Neuroscience and Spine Institute. Since 2008, he has led the overall growth and development of the Institute’s research programs in the computational and biomedical sciences, including cancer genomics, big-data analytics, and image processing. He is actively working on brain-tumor evolutionary genetics and biomarker discovery, as well as the development of predictive analytics for surgical-site infections following lumbar spine fusions. Previously, Matt was the Biomedical Informatics Program Coordinator at the Texas Advanced Computing Center (TACC) at The University of Texas at Austin. In this position, he led the Center’s efforts in the biomedical computing space, including leading grant funding strategies, submitting multi-institutional grant proposals, developing and deploying software stacks for biomedical researchers, and providing consulting services to the TACC biomedical research community. Matt earned a Bachelor of Science in Plant Biology, magna cum laude, from the University of Maryland in College Park, and a Ph.D. from The University of Texas at Austin in Cellular and Molecular Biology under the supervision of Dr. Lauren Ancel Meyers.
Preferred or prerequisite skills: Students interested in the laboratory component should be familiar with Bash/Unix, TACC, and Python/R programming.
Computer Lab: This course will take place in a computer lab with access to TACC.
TACC Account required: Attendees will need an account on TACC. Please be sure you know your TACC username when you come to class. To sign up for a TACC account, go here.


Machine Learning Methods for Gene Expression Analysis

Day and Time: Mon-Thur 9:00 a.m. - 12:00 p.m. Location: MEZ 1.122
Description: This four-day course will introduce a selection of machine learning methods used in bioinformatic analyses of RNA-seq, RT-qPCR, and microarray data. We will cover normalization, unsupervised learning and clustering, feature selection and extraction, and supervised learning methods for classification (e.g., random forests, SVM, LDA, and kNN) and regression (with an emphasis on regularization methods appropriate for high-dimensional problems). Participants will have the opportunity to apply these methods as implemented in R and Python to publicly available data.
Instructor: Dennis Wylie, Ph.D., Bioinformatics Consultant
Instructor Bio: Dennis Wylie joined the CCBB Bioinformatics group in 2015. He has experience in NGS data analysis including variant calling and RNA-Seq-based biomarker discovery and predictive modeling (classification, regression, and time-to-event). Prior to UT, he earned a PhD in Biophysics from UC Berkeley applying stochastic simulation methods to problems in immunology, did postdoctoral work modeling the transmission of infectious disease, and spent six years as a bioinformatician in industry.
Preferred or prerequisite skills: This course is recommended for students with some prior knowledge of either R or Python.
Laptop requirements: Participants are expected to provide their own laptop with R ≥ 3.1 and Python 2.7 installed. Students will be instructed to download several free software packages (including R packages and Python libraries such as pandas and sklearn).
TACC Account required: Attendees will need an account on TACC. Please be sure you know your TACC username when you come to class. To sign up for a TACC account, go here.


Introduction to RNA-Seq

COURSE IS FULL

Day and Time: Mon-Thur 1:30 p.m. - 4:30 p.m. Location: MEZ 4.128
Description: This four-day course provides an introduction to methods for analysis of RNA-seq data. It assumes familiarity and comfort with the Linux command line and TACC. A typical RNA-seq workflow will be featured, starting with quality assessment of raw data and continuing through mapping (bwa, tophat2), differential expression analysis (DESeq, cuffdiff), splice variant analysis (cufflinks), and downstream analyses and visualization (cummeRbund). The course also describes analysis methods for non-traditional RNA-seq experiments such as RipSeq. Participants will gain hands-on experience using these tools in a Linux command line environment at TACC.
Instructor: Dhivya Arasappan, M.S., Bioinformatics Consultant
Instructor Bio: Dhivya Arasappan joined UT's Genome Sequencing and Analysis Facility (GSAF) as a Bioinformatician in 2009. Dhivya has over 6 years of experience analyzing NGS data from multiple platforms including Illumina, PacBio, and SOLiD. Her areas of expertise include de novo genome assembly, particularly using hybrid sequencing data, RNA-Seq analysis, exome analysis, and benchmarking of bioinformatics tools.
Preferred or prerequisite skills: This course is intended for students who are familiar with Unix, TACC, and R programming.
Computer Lab: This course will take place in a computer lab with a terminal, SSH client, and the Broad Integrative Genomics Viewer installed.
TACC Account required: Attendees will need an account on TACC. Please be sure you know your TACC username when you come to class. To sign up for a TACC account, go here.


TOPIC: PROTEOMICS

Introduction to Proteomics

Day and Time: Mon-Thur 9:00 a.m. - 12:00 p.m. Location: MEZ 4.144
Description: This four-day course will focus on the fundamental knowledge and skillsets needed to utilize mass spectrometry-based proteomics for biological research. We will cover key concepts of experimental design, instrumentation, and data processing. Participants will learn how to use a standard data analysis pipeline to generate peptide and protein identifications and interpret the biological significance of results. Topics will include quantitative proteomics and post-translational modifications. The goal of this course is to provide non-experts with the foundational knowledge necessary to access the power of proteomics for their own research interests.
Instructor: Dr. Daniel Boutz, Research Associate
Instructor Bio: Daniel Boutz received his Ph.D. in Molecular Biology from the University of California, Los Angeles. As a post-doctoral researcher and now Research Associate with Edward Marcotte at the Center for Systems and Synthetic Biology at UT Austin, he specializes in the use of mass spectrometry-based proteomics for systems-level studies of cellular biology and immunology.
Teaching Assistant: Andrew Horton
Computer Lab: This course will take place in a Windows computer lab with the software preinstalled.


TOPIC: COMPUTATIONAL MODELING

Protein Modeling Using Rosetta THIS COURSE IS CANCELLED!

Day and Time: Mon-Thur 9:00 a.m. – 12:00 p.m. Location: MEZ TBA
Description: This four-day course is intended to give students a strong foundation and working knowledge for understanding computational protein design, a tool of growing importance in protein science. The course will introduce students to working with the Rosetta suite of software for macromolecular structure prediction and design. Course topics will include visualizing and assessing protein structures; understanding key concepts of scoring and sampling in computational protein design; using various protein design strategies; loop modeling; and homology modeling. Knowledge of general biochemistry and a basic understanding of X-ray crystallography methods for protein structure determination are prerequisites for this course.
Instructor: Kevin Drew, Ph.D., Postdoctoral Fellow
Instructor Bio: Kevin Drew is a computational biologist working as a postdoctoral fellow in Edward Marcotte’s Lab at UT Austin. His main research interests involve protein interactions and protein complexes, specifically understanding their structure, function, and ways to modulate their activity. He received his Ph.D. from New York University, where he worked in the lab of Richard Bonneau. His thesis involved two components: 1) the Human Proteome Folding Project and 2) peptidomimetic design. The Human Proteome Folding Project (HPF and HPF2) was a genome-wide functional annotation of 100 genomes using Rosetta protein structure predictions produced on IBM’s World Community Grid. His work on peptidomimetic design involved developing code within the Rosetta software suite to design inhibitors of protein-protein interactions.
Preferred or prerequisite skills: The intended audience will have basic knowledge of the biochemistry of proteins. Familiarity with Linux or Unix terminals and PyMOL is strongly encouraged but not required.
Computer Lab: This course will take place in a Linux computer lab with PyMOL pre-loaded.


Computational Modeling to Study Evolution in Action THIS COURSE IS CANCELLED!

Day and Time: Mon-Thur 9:00 a.m. – 12:00 p.m. Location: MEZ TBA
Description: This class is about the study of evolution using computational model systems. We will use two different systems for digital evolution, Avida and “Markov Gate Networks,” exploring many different possibilities of using computational systems for evolution research. The first two days of this course will feature a high-level discussion of artificial life approaches to evolutionary questions and then dive into a hands-on introduction to the Avida Digital Evolution Research Platform, a popular artificial life system for biological research. Participants will get experience with setting up and running an Avida experiment in a Unix command-line environment on TACC, as well as with compiling, visualizing, and interpreting the resulting data in Python. On days three and four, we will continue to use Python and introduce the Markov Gate Network modeling framework. We will explore ways to study questions pertaining to neuroevolution, behavior, and artificial intelligence.
Instructors: Arend Hintze, Ph.D., and Emily Dolson
Instructor Bios: Arend Hintze is a Professor of Integrative Biology and of Computer Science and Engineering at Michigan State University. He uses computational model systems to study the evolution of intelligence, artificial intelligence, decision making, and game theory.

Emily Dolson is a third-year Ph.D. student in Computer Science, Ecology, and Evolutionary Biology in Charles Ofria's lab at Michigan State University. She uses Avida to study the fundamental evolutionary principles behind the generation and maintenance of diversity in complex ecosystems.

Preferred or prerequisite skills: This course is designed for researchers who are interested in applying artificial life techniques to address biological questions. Familiarity with Unix, TACC, and Python is preferred. If you bring C++ programming knowledge as well, we can also explore more complex problems.
Computer Requirements: A laptop with a terminal, SSH client, and Jupyter Notebook. (Optional: an installed C++ development environment such as Xcode or Visual Studio will help students with a deeper programming background.)
TACC Account required: Attendees will need an account on TACC. Please be sure you know your TACC username when you come to class. To sign up for a TACC account, go here.

We would like to acknowledge the BEACON Center for the Study of Evolution in Action for supporting the Computational Modeling to Study Evolution in Action course.


REGISTRATION AND PAYMENT

We will accept personal credit cards (American Express, MasterCard, Visa, Discover), UT ProCards (but please read this for important information regarding the use of the ProCard that could affect your registration), and IDT (interdepartmental transfer). Registration dates and fees are as follows:

Registration dates: March 2, 2016 - May 13, 2016

Registration fees by category:

UT Austin or BEACON

  • Students* $175/course
  • Faculty or Staff* $275/course

UT System

  • Students* $275/course
  • Faculty or Staff* $275/course

Non-UT Other

  • Students** $275/course
  • Participants $550/course
  • Groups of 5 or more from same agency or institution: $440 per person/course (Call 512-471-5261 for group registration)

* Our staff will confirm affiliations with UT.
** Non-UT students must send us a copy of their current student identification. Send PDF scan to this email address.

Contact our office at 512-471-5261 for more information.

Refund and Cancellation Policy
A full refund of registration fees, less a $25 cancellation fee, will be available if requested in writing and received by May 16, 2016. No refunds will be made after that date. Please note that course substitutions cannot be made. If you fail to cancel by the deadline and do not attend, you are still responsible for full payment. UT-Austin reserves the right to cancel courses and to return all fees in the event of insufficient registration.

Miscellaneous

Location: All workshops will take place in the MEZ building. Room numbers will be released to each class roster shortly before the course begins.

Food: Beverages and snacks will be served during 15-minute morning and afternoon breaks. There are also soda and snack machines located in the MEZ building.

Parking: Parking on campus is at a premium. Since the Summer School occurs during the break between the spring and summer semesters, the UT Shuttles will not be operating. The nearest parking garages are the Brazos parking garage (BRG) located at 210 E. MLK, the San Antonio Garage (SAG) located at 2420 San Antonio Street, and the AT&T Executive Education & Conference Center parking garage (CCG) located at 1900 University Avenue. Parking in any of the "A," "F," "D," "OV" or "O" spaces on campus might result in the issuance of a citation. Rates for all campus garages.

Visitor Information:

  • UT Campus Visitor Center information
  • UT Austin maps
  • Austin and the UT Drag Travel Guide