Course Description

Bioinformatics encompasses the analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. It represents a major practical application for modern techniques in data mining and simulation. Specific topics to be covered include sequence alignment, large-scale processing, next-generation sequencing data, comparative genomics, phylogenetics, biological database design, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, normalization of microarray data, mining of functional genomics data sets, and machine learning approaches for data integration.

Overall Flow of the Class:
(Module = Group of Lectures)
  • Introduction
  • Module on "the Data" (Genomic, Proteomic & Structural Data), introducing the main data sources (their properties, where to access them, etc.)
  • Module on Databases & Data Science Issues (Knowledge Representation incl. Sem. Web & Privacy, Provenance & Standards)
  • Module on Mining (Alignment & Variant Calling, Supervised & Unsupervised Approaches, Networks)
  • Module on Cell Modeling
  • Module on Molecular Modeling
Lectures:
  • MW 1:00 - 2:15 PM, Bass 305 (plus some Fridays at the same location)
Discussion Section:
  • F 1:00 - 2:00 PM, Bass 405

Different headings for this class (4 variants)

  • CB&B752/CPSC752 - Grad. w/ programming
This graduate-level version of the course consists of lectures, in-class tests, programming assignments, and a final programming project.
  • MB&B452/MCDB452 - Undergrad. 
This undergraduate version of the course consists of lectures, in-class tests, written problem sets, and a final project (a semi-computational section and a literature survey).
  • MB&B752/MCDB752 - Grad. w/o programming 
This graduate-level version of the course consists of lectures, in-class tests, written problem sets, and a final project (a semi-computational section and a literature survey). Unlike CB&B752, no programming is required.
  • MB&B 753a3/MB&B 754a4 - Modules
For graduate students, the course can be broken up into two "modules" (each counting 0.5 credit towards the MB&B course requirement):
753 - Bioinformatics: Practical Application of Data Mining (1st half of term)
754 - Bioinformatics: Practical Application of Simulation (2nd half of term)
Each module consists of lectures, in-class tests, written problem sets, and a final graduate-level written project that is half the length of the full course's final project.
  • Auditing
Auditing is allowed, but we would strongly prefer that you register for the class.


The course is keyed towards CBB graduate students as well as advanced MB&B undergraduates and graduate students wishing to learn about the types of large-scale quantitative analysis that whole-genome sequencing will make possible. It would also be suitable for students from other fields, such as computer science or physics, wanting to learn about an important new biological application for computation.

Students should have:
  1. A basic knowledge of biochemistry and molecular biology. 
  2. A knowledge of basic quantitative concepts, such as single-variable calculus, basic probability and statistics, and basic programming skills.
These prerequisites can be fulfilled by MBB 200 and Mathematics 115, or by permission of the instructor.

Class Requirements
Discussion Section / Readings

Papers will be assigned throughout the course. These papers will be presented and discussed in weekly 60-minute sections with the TFs. A brief summary (a half-page per article) should be submitted at the beginning of each discussion section.

In-class tests: Midterm & Quiz

  • There will be a midterm covering the 1st half of the course.
  • There will be a quiz covering the 2nd half of the course, comprising SIMPLE questions that you should be able to answer from the lectures plus the main readings.
For reference, please refer to the previous quizzes and answer keys from Fall 2012.

Programming Assignments (Req'd for CBB and CS students)

  • There will be FOUR homework assignments. We will try to promote the idea of reproducible research by using a version control system, specifically GitHub, to facilitate homework submission.
  • For Homework 1, you will be given an opportunity to get familiar with GitHub and programming with version control. You can choose to either submit your homework through GitHub OR through email. However, for the later assignments, you will only be able to submit homework through GitHub.
  • For the programming assignments, you can use either R or Python. If you would like to use another programming language, please contact the TAs and request permission.
  • For detailed instructions and information, please refer to the Start up for Homework 1 & Homework Submission Instructions.

Non-programming Assignments (For MB&B and MCDB students)

  • There will be FOUR equivalent homework assignments for MB&B and MCDB students without a programming background. The programming part will be replaced with assignments involving the use of web-based tools or essay questions.

Pages from previous years

2016 is the 19th time Bioinformatics has been taught at Yale. Pages for the 18 previous iterations of the class are available. Look at how things evolve!

  • Homework 3 DUE DATE: April 27th (Wednesday) 2016, 11:59 pm. Homework 3 covers lectures from both Prof. Kleinstein and Prof. O'Hern. Choose to do either MCDB&MBB or CBB&CS ...
    Posted Apr 18, 2016, 11:31 AM by Donghoon Lee
  • Homework 2 DUE DATE: March 9th (Wednesday) 2016, 11:59 pm. Choose to do either MCDB&MBB or CBB&CS homework, depending on your academic affiliation. No late submissions will be accepted ...
    Posted Feb 24, 2016, 8:57 AM by Donghoon Lee
Section Readings
  • TASession1.pdf (Feb 17, 2016): Slides for the first TA section