
2015 SUMMER SCHOOL FOR BIG DATA IN BIOLOGY

The Center for Computational Biology & Bioinformatics at The University of Texas at Austin is proud to host the 2nd annual Summer School for Big Data in Biology May 26–29, 2015.

The Summer School offers intensive four-day workshops on diverse topics for analysis of large-scale DNA, RNA, and protein datasets. It provides a unique hands-on opportunity to acquire valuable skills directly from experts in the field, with courses tailored to novices as well as intermediate and advanced users.

This year we offer 10 workshops. Each workshop will meet for four half-days (either mornings or afternoons) for a total of twelve hours. Instructors will post lectures, datasets, exercises, and course information on a website accessible to enrolled participants. There will be no examinations, but participants may request certificates of completion. Academic credit will not be issued. Please carefully check the specified prerequisite knowledge before enrolling in a course.

If you have never used TACC, Linux/Unix, or R, but the course suggests familiarity with these, here are some useful links.

    1. UTEID: To obtain a UTEID, go here.
    2. TACC: To sign up for a TACC account, go here.
    3. Linux: For a Linux refresher or to gain basic skills, go here.
    4. R: For an R refresher or to gain basic skills, go through the intro and Data2 sections here.

Morning sessions run 9 a.m.-12 p.m.; afternoon sessions run 1:30 p.m.-4:30 p.m.

Introductory Courses (no programming experience necessary)

  • Introduction to Python (afternoon)
  • Introduction to Core NGS Tools (afternoon)

DNA and RNA sequencing methods and analyses

  • Bacterial Genome Annotation (morning)
  • Introduction to RNA-Seq (afternoon)
  • Genome Variant Analysis (morning)
  • Global Gene Expression Profiling with Tag-Based RNA-Seq (afternoon)
  • Machine Learning Methods for Gene Expression Analysis (morning)

Proteomics and Protein Modeling

  • Introduction to Proteomics (morning)
  • Protein Modeling Using Rosetta (afternoon)

Systems Biology

  • Using Biological Networks to Interpret Data (morning)

Workshop Descriptions

Introduction to Python

Time: 1:30 pm – 4:30 pm
Description: This four-day workshop provides a basic introduction to Python programming in the context of biological computing. Students will learn basic Unix as well as fundamental concepts in Python programming, including data structures, control flow, and code hygiene. The skills and topics introduced in this course will give students a solid foundation in basic programming concepts that they can apply in their own research.
Intended Audience: Anyone interested in performing computational analyses on biological (or other) data is welcome to attend. No background is required, although basic familiarity with Unix and programming concepts may be helpful.
Computer Requirements: Students should bring their own laptop. Software will be installed in class.
Instructor: Stephanie Spielman, Graduate Student
Instructor Bio: Stephanie Spielman is a fourth-year Ph.D. candidate in Claus Wilke’s lab through the EEB program. Her research focuses on computational molecular evolution. She is broadly interested in “best practices” in evolutionary data analysis: in other words, what are the best approaches and methodologies for extracting meaningful biological information from genetic data? Stephanie is also interested in teaching, mentoring, and education.


Bacterial Genome Annotation

Time: 9 am – 12 noon
Description: In this four-day workshop, we'll examine strategies and tools for annotating bacterial sequences. We'll start with advanced Unix core tools to manipulate tables of data. We'll talk about strategies for finding homology in public databases, focusing on BLAST and HMMER. The course will also feature a quick introduction to SQL and relational databases, with applications to finding gene candidates for particular functions and storing annotation data. The focus will be on bacteria, but the tools should be useful for any organism.
Intended Audience: Familiarity with Unix is preferred. All participants will need a TACC account.
Computer Requirements: Students should bring their own laptop. Mac users will already have Terminal installed. Windows users should download PuTTY and WinSCP.
Instructor: Benjamin Goetz, M.S., Bioinformatics Consultant
Instructor Bio: Benni Goetz is a bioinformatics consultant with the Center for Computational Biology & Bioinformatics (CCBB). He has worked with NGS data for the past two years and has implemented some common pipelines on TACC. Before joining CCBB, Benni received his master's degree in mathematics from UT Austin.

 


Introduction to Core NGS Tools

Time: 1:30 pm – 4:30 pm
Description: This four-day workshop provides an introduction to common analysis tools and file formats currently used in next generation sequencing (NGS), with emphasis on read mapping (bwa, bowtie2), the Sequence Alignment Map (SAM) format, and tools for manipulating BAM files (samtools, bedtools). Participants will gain hands-on experience using these and other NGS tools in the Linux command line environment at TACC, as well as exposure to the many bioinformatics resources TACC makes available.
Intended Audience: This course is intended for researchers who are planning or just starting to perform NGS experiments. Attendees must have UT EIDs for access to our course wiki, as well as accounts on TACC. Please be sure you know both your UT EID and your TACC username when you come to class.
Computer Requirements: This course will take place in a computer lab with internet access and a terminal program (e.g. PuTTY).
Instructor: Anna Battenhouse, B.S., Associate Research Scientist
Instructor Bio: After a 20+ year career in commercial software development (Texas Instruments, Motorola, etc.), Anna Battenhouse returned to school in 2005 to study Biochemistry. She joined the Iyer Lab in 2007 as a "retirement career" in computational biology. As part of the Iyer Lab's main research focus in functional genomics (large-scale transcriptional reprogramming in response to diverse stimuli), the lab has been performing high-throughput sequencing experiments since 2007. To date the lab has more than 1,500 NGS datasets from ChIP-seq, RNA-seq, RIP-seq, MNase-seq, and miRNA-seq experiments, among others.
Co-Instructors: Amelia Weber-Hall and Nathan Abell

 


Genome Variant Analysis

Time: 9 am – 12 noon
Description: This four-day workshop is designed to teach you how to get from next-generation sequencing data to the identification of potentially important mutations within a sample or population. Multiple analysis tools will be introduced, including breseq, FastQC, SAMtools, Bowtie2, the Burrows-Wheeler Aligner, and the Integrative Genomics Viewer. Real sequencing results will be used in a wide range of interactive tutorials.
Intended Audience: Familiarity with Linux/Unix is preferred. All participants will need a TACC account.
Computer Requirements: This course will take place in a computer lab.
Instructor: Daniel E. Deatherage, Ph.D., Postdoctoral Fellow
Instructor Bio: Dr. Deatherage earned his doctorate at The Ohio State University studying epigenetic effects of ovarian cancer. His postdoctoral work in Jeff Barrick’s lab has focused on using next-generation sequencing to identify ultra-rare mutations within evolving populations.
Teaching Assistant: Sean Leonard

 


Introduction to RNA-Seq

Time: 1:30 pm – 4:30 pm
Description: This four-day workshop provides an introduction to methods for analysis of RNA-seq data. It assumes familiarity and comfort with the Linux command line and TACC. A typical RNA-seq workflow will be featured, starting from quality assessment of raw data and proceeding through mapping (bwa, tophat2), differential expression analysis (DESeq, cuffdiff), splice variant analysis (cufflinks), and downstream analysis and visualization (cummeRbund). The course also describes analysis methods for non-traditional RNA-seq experiments such as RIP-seq. Participants will gain hands-on experience using these tools in a Linux command line environment at TACC.
Intended Audience: This course is designed for researchers who will have or already have RNA-seq data. Familiarity with Unix, TACC, and R programming is preferred.
Computer Requirements: This course will take place in a computer lab with a terminal, SSH client, and the Broad Integrative Genomics Viewer installed.
Instructor: Dhivya Arasappan, M.S., Bioinformatics Consultant

Instructor Bio: Dhivya Arasappan joined UT's Genome Sequencing and Analysis Facility (GSAF) as a Bioinformatician in 2009. She has 5+ years of experience handling and analyzing NGS data. Her areas of experience include RNA-Seq analysis, de novo genome assembly, identification of variants in exomes, and benchmarking of bioinformatics tools. Prior to UT, she worked at the National Center for Toxicological Research, FDA, where she was involved in the analysis of gene expression data. She has a master's degree in Bioinformatics from Virginia Commonwealth University and a bachelor's degree in Computer Science from Anna University, India.

 


Global Gene Expression Profiling with Tag-based RNA-seq

Time: 1:30 pm – 4:30 pm
Description: Tag-based RNA-seq is a low-cost alternative ($50/sample) to conventional RNA-seq for quantifying the abundances of polyadenylated (protein-coding) transcripts. Its low cost and ease of implementation allow for experimental designs involving extensive biological replication. This results in high power to detect differentially expressed genes and offers the possibility of applying network-based approaches to gene expression analysis. Of the topics the class will cover, only the first two are specific to the tag-based method; the rest are relevant to any RNA-seq flavor.
Topics during this four-day workshop will cover:

  • Dry wet lab: outline of RNA isolation and library preparation procedures
  • From reads to counts: sequence data processing using a Linux-based high-performance computing (HPC) cluster
  • Say "R": identifying differentially expressed genes using generalized linear models (DESeq2 package) and a network approach (WGCNA package)
  • So many changes but what do they all mean? - Functional summaries and visualization.
  • Laying the foundation: de novo assembly and annotation of transcriptomes.

Intended Audience: This course is intended for people who are familiar with Linux and R. All participants should have a TACC account.
Computer Requirements: Students should bring their own laptop with the following pre-installed:
Windows users: Install RStudio, PuTTY, and WinSCP. Log in to TACC’s Lonestar using PuTTY.
Mac users: Install the basic R (http://cran.r-project.org/). Log in to TACC’s Lonestar using the built-in Mac program Terminal.
Instructor: Mikhail V. Matz, Ph.D., Associate Professor
Instructor Bio: The Matz lab studies how the environment shapes the structure and function of the genome, with the focus on reef-building corals. Researchers in the lab look at the inter-relationships between genetic variation, gene expression, and fitness in natural coral populations aiming to understand the process of acclimatization and adaptation to the natural diversity of reef environments and to the effects of climate change. They make extensive use of next-generation sequencing technologies and continuously develop novel laboratory and bioinformatic methods for genomic analysis of non-model organisms.

 


Machine Learning Methods for Gene Expression Analysis

Time: 9 am - 12 noon
Description: This four-day workshop will introduce a selection of machine learning methods used in bioinformatics analyses of RNA-seq, RT-qPCR, and microarray data. We will cover normalization, unsupervised learning and clustering, feature selection and extraction, and supervised learning methods for classification (e.g., random forests, SVM, LDA, kNN) and regression (with an emphasis on regularization methods appropriate for high-dimensional problems). Participants will have the opportunity to apply these methods as implemented in R and Python to publicly available data.
Intended Audience: Some prior knowledge of either R or Python is recommended.
Computer Requirements: Participants are expected to provide their own laptop with R ≥ 3.1 and Python ≥ 2.7 installed. Students will be instructed to download several free software packages, including R packages and Python libraries (including pandas and sklearn).
Instructor: Dennis Wylie, Ph.D., Bioinformatics Consultant
Instructor Bio: Dennis Wylie joined the CCBB Bioinformatics group in 2015. He has experience in NGS data analysis including development and application of SNV-calling algorithms as well as RNA-Seq-based biomarker discovery and predictive modeling (classification, regression, and time-to-event). Prior to UT, he earned a PhD in Biophysics from UC Berkeley applying stochastic simulation methods to problems in immunology and did postdoctoral work in modeling the transmission of infectious diseases on contact networks before spending six years as a bioinformatician in industry.

 


Introduction to Proteomics

Time: 9 am – 12 noon
Description: This four-day workshop will focus on teaching the fundamental knowledge and skillsets needed to understand mass spectrometry-based proteomic data. We will cover key concepts and practical applications of experimental design, instrumentation, and data processing. Participants will learn how to use a standard data analysis pipeline to generate peptide and protein identifications and interpret the biological significance of results. Topics will include quantitative proteomics and post-translational modifications. The goal of this course is to establish a foundational understanding of proteomic data and a working knowledge of standard tools and methods on which to build.
Intended Audience: This course is designed for researchers with a limited-to-intermediate background in protein mass spectrometry and/or bioinformatics. Knowledge of general biochemistry is required, with an emphasis on protein chemistry. No programming experience is needed, but familiarity with Microsoft Excel is strongly encouraged.
Computer Requirements: This course will take place in a computer lab.
Instructor: Daniel Boutz, Ph.D., Research Associate
Instructor Bio: Daniel Boutz received his PhD in Molecular Biology from the University of California, Los Angeles. As a post-doctoral researcher and now Research Associate with Edward Marcotte at the Center for Systems and Synthetic Biology at UT Austin, he specializes in the use of mass spectrometry-based proteomics for systems-level studies of cellular biology and immunology.
Co-Instructor: Viswanadham Sridhara, Ph.D., Bioinformatics Consultant

 


Protein Modeling Using Rosetta

Time: 1:30 pm – 4:30 pm
Description: This four-day workshop is intended to give students a strong foundation and working knowledge for understanding computational protein design, a tool of growing importance in protein science. The course will introduce students to working with the Rosetta suite of software for macromolecular structure prediction and design. Course topics will include visualizing and assessing protein structures; understanding key concepts of scoring and sampling in computational protein design; using various protein design strategies; loop modeling; and homology modeling. Knowledge of general biochemistry and a basic understanding of X-ray crystallography methods for protein structure determination are prerequisites for this course.
Intended Audience: The intended audience will have basic knowledge of the biochemistry of proteins. Familiarity with Linux or Unix terminals and PyMOL is strongly encouraged but not required.
Computer Requirements: This course will take place in a Linux computer lab with PyMOL pre-loaded.
Instructor: Dr. Oana I. Lungu, Postdoctoral Fellow
Instructor Bio: Dr. Oana Lungu earned a B.S. in biochemistry from the University of Minnesota Twin Cities and a Ph.D. in Biochemistry from the University of North Carolina at Chapel Hill. During her graduate studies, she worked on computationally designing, expressing, and characterizing light-activated proteins, which allow for the manipulation of protein signaling networks. As a postdoctoral fellow, she is applying knowledge of computational protein structure modeling to a different problem: antibody structure prediction. Diversity in the sequences and structures of antibodies that constitute the human immune repertoire is critical for mounting an effective response against foreign antigens. Oana’s research under the mentorship of George Georgiou and Andrew Ellington seeks to characterize and understand the structural antibody repertoire and its relationship to antibody sequence in B cell populations.

 


Using Biological Network Analysis to Interpret Data

Time: 9 am – 12 noon
Description: This four-day workshop is a guide to biological networks: their application, analysis, and construction. In the post-genomic era, data sets in biology have grown ever larger and harder to interpret. Biological networks provide conceptual tools for understanding large amounts of experimental data and annotation within a coherent framework. This course will introduce the foundational concepts and tools needed to understand biological networks and use them to reveal the biological meaning concealed in large data sets. The course will include networks based on gene expression, protein-protein interactions, phenotype profiles, and other large scale data sets. This course is approximately 50% lecture and 50% computer lab.
Intended Audience: Biologists with experience handling at least moderately sized data sets (e.g., from RNA-seq, microarrays, proteomics, etc.) will benefit from the course. Half of the lab portion of the course will be taught using R. Familiarity with at least one programming language is STRONGLY preferred, e.g., R, Python, Perl. Students without any programming experience will find this section of the lab difficult to understand. Experience using the command line in Linux/Unix will be helpful.
Computer Requirements: This course will take place in a computer lab with Linux machines running R, RStudio, some R packages, Cytoscape 3.+, and Perl.
Instructor: Kris McGary, Ph.D., Postdoctoral Fellow
Instructor Bio: Dr. Kris McGary received his Ph.D. from the University of Texas at Austin in Cell and Molecular Biology (advisor: Edward Marcotte). His research seeks to understand how biological networks are encoded in the genome, how they evolve, and how these networks shape organismal traits and diseases. Kris is currently a post-doctoral fellow at Vanderbilt University, in the lab of Antonis Rokas.
Teaching Assistant: Jon Laurent



REGISTRATION AND PAYMENT

Registration dates and fees are as follows:

Registration dates: March 12, 2015 - May 18, 2015

Registration fees by category:

UT-Austin or BEACON

  • Students* $175/course
  • Faculty or Staff* $275/course

UT-System

  • Students* $275/course
  • Faculty or Staff* $275/course

Non-UT Other

  • Students** $275/course
  • Participants $550/course
  • Groups of 5 or more from same agency or institution: $440 per person/course (Call 512-471-5261 for group registration)

Refund and Cancellation Policy
A full refund of registration fees, less a $25 cancellation fee, will be available if requested in writing and received by May 17, 2015. No refunds will be made after that date. Please note that course substitutions cannot be made. If you fail to cancel by the deadline and do not attend, you are still responsible for full payment. UT-Austin reserves the right to cancel workshops and to return all fees in the event of insufficient registration.

Miscellaneous

Location: All workshops will take place in the MEZ building. Room numbers will be released to each class roster shortly before the course begins.

Food: Beverages and snacks will be served during 15-minute morning and afternoon breaks. There are also soda and snack machines located in the MEZ building.

Parking: Parking on campus is at a premium. Since the Summer School occurs during the break between the spring and summer semesters, the UT Shuttles will not be operating. The nearest parking garages are the Brazos parking garage (BRG) located at 210 E. MLK, the San Antonio Garage (SAG) located at 2420 San Antonio Street, and the AT&T Executive Education & Conference Center parking garage (CCG) located at 1900 University Avenue. Parking in any of the "A," "F," "D," "OV" or "O" spaces on campus might result in the issuance of a citation. Rates for all campus garages.

Visitor Information:

  • UT Campus Visitor Center information
  • UT Austin maps
  • Austin and the UT Drag Travel Guide

 

OTHER TRAINING OPTIONS

Short Courses

Informal Semester-long Classes

Collaboratorium

For-Credit Courses

Division for Statistics and Scientific Computing (SCS)

TACC (Texas Advanced Computing Center)

See archived 2014 summer school courses here
