1 Content


1.1 Schedule and due dates

1.2 Topical outline

1.2.1 Introduction, big picture, review, technical setup

  1. Inspiration, introduction, syllabus, etc.
    1. Slides: Content/00-Inspiration.html
    2. Review syllabus
    3. Review Canvas
    4. Meet your neighbor and exchange contact information
  2. The second scientific revolution (e-science)
  3. Tech setup, ipython, git-classes, code notebooks
  4. Python review
  5. Genetics review
  6. Bioinformatics basics

1.2.1.1 Programming assignment 00 (pa00-platform)

1.2.2 Biological sequence processing

To get the relevant files for this section,
start up the class VM,
navigate to the directory where you want to store the lecture notes,
and then run:
git clone https://gitlab.com/bio-data/sequence-informatics.git
As I add lectures or improve old ones,
they’ll be updated in the repo,
so you can run this inside the repo to get the latest notebook scripts:
git pull
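
The clone-once, pull-afterwards routine can also be scripted. The following is only a minimal sketch, not part of the course materials: the helper name sync_command and its parent_dir parameter are hypothetical, and it assumes the repo is cloned under git's default directory name (the last part of the URL without ".git"). It only builds the command, so nothing touches the network until you choose to run it.

```python
from pathlib import Path


def sync_command(repo_url, parent_dir="."):
    """Return the git command (as an argument list) that fetches or refreshes a repo.

    Hypothetical helper: picks `git clone` when no local copy exists yet,
    and `git pull` when one does.
    """
    # Directory name git would pick by default, e.g. "sequence-informatics"
    name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    target = Path(parent_dir) / name

    # A ".git" entry inside the target means the repo was already cloned
    if (target / ".git").exists():
        return ["git", "-C", str(target), "pull"]
    return ["git", "clone", repo_url, str(target)]
```

To actually run it, pass the result to subprocess.run(..., check=True) from the directory where you keep the lecture notes.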

  1. Introduction and bioinformatics software (read as part of pa00-platform, not lecture)
  2. Pairwise sequence alignment
  3. Database searching and sequence homology
  4. Multiple sequence alignment
  5. Phylogenetic reconstruction
  6. Sequence mapping and clustering, SOM
  7. Diversity informatics
  8. Machine learning in bioinformatics

A fun recent example of a large genomics project that applies some neat machine learning analyses to its data:
https://zoonomiaproject.org/ 
A pop news article about it:
https://www.vice.com/en/article/4a3wwg/scientists-sequenced-dna-of-nearly-every-mammal-on-earth-in-unprecedented-project
One paper using these methods to understand the genetics of brain size expansion throughout evolution:
https://www.science.org/doi/10.1126/science.abm7993 

1.2.2.1 Programming assignment 01

Pairwise alignment and automated PCR primer selection

1.2.2.2 Programming assignment 02

Multiple sequence alignment (MSA) and OTU clustering

1.2.2.3 Programming assignment 03

K-means vs. self-organizing map (SOM) for cancer clustering and dimensionality reduction

1.2.3 Classification of bio-data

git clone https://gitlab.com/bio-data/machine-learning.git

  1. sklearn, Breast Cancer dataset intro
  2. Intro to supervised learning, k-NN on pre-extracted Breast Cancer image features
  3. Regression, Bayes models, and Decision trees on pre-extracted Breast Cancer image features
  4. Random forest, Neural networks, SVM on pre-extracted Breast Cancer image features
  5. Classification of Leukemia sub-type by gene expression data

1.2.3.1 Programming assignment 04

pa04_supervised
Topic: (Human breast cancer diagnosis via gene expression data categorization)

1.2.4 Network/Graph theory in biological and neurological sciences

git clone https://gitlab.com/bio-data/graph-network.git

  1. Formalizing biological networks
  2. Network and Graph theory primer
    1. Reading:
      • http://barabasi.com/f/147.pdf
      • http://networksciencebook.com/ chapter 1, 2
    2. Slides:
  3. Python tools for networks and graphs
  4. Network datasets
  5. Demonstration of graph processing for human brain connectivity
  6. Human diseaseome

1.2.4.1 Programming assignment 07

1.2.5 Vision in bioinformatics / Bioimage informatics

git clone https://gitlab.com/bio-data/computer-vision.git

  1. Basic image encoding and processing
  2. Bioimage informatics

1.2.5.1 Programming assignment 06

Topic: (Cell image nucleus identification and labeling)
https://datasciencebowl.com/competitions/spot-nuclei-speed-cures/
https://www.kaggle.com/c/data-science-bowl-2018
https://www.nature.com/articles/s41592-019-0612-7
https://www.youtube.com/watch?v=Dbiq6l50zO8
https://www.youtube.com/watch?v=eHwkfhmJexs

1.2.6 Computational epidemiology

Maybe, if we get this far:
Content/24-CompEpi.html

1.3 (maybe below, if there’s ever time)

1.3.1 Human whole genome analysis

  5. Practical approach to human genome analysis
    a. +PracticalWGS

1.3.1.1 Programming assignment 06

Topic: script to analyze a sample human genome