The schedule and due dates will be updated on Canvas as we progress
through the semester.
Please check back regularly for changes.
1.2 Topical outline
Slides will usually be posted before class.
The outline will be filled in with slides, readings, and assignments
as the semester moves forward.
The topic schedule may change slightly as the semester
progresses.
mycode.tar.gz is a zip-like archive file.
In Linux/Unix/Mac
Just click on it in your file browser of choice, and it should open
an “archive manager” which can unzip it
At the command line, you can unpack it by running the following in the
directory that contains the file: $ tar -xf mycode.tar.gz
In Windows
you can open it with http://7-zip.org/
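On any platform with Python installed, the same archive can also be unpacked with the standard-library tarfile module. The sketch below is only an illustration, not a required course tool: the helper name unpack_archive is hypothetical, and mycode.tar.gz is the example file name used above.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive_path, dest_dir="."):
    """Extract a tar archive into dest_dir and return the member names.

    unpack_archive is a hypothetical helper name used for illustration.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:*" lets tarfile detect the compression (gzip here) on its own.
    with tarfile.open(archive_path, "r:*") as archive:
        names = archive.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters can reject unsafe
            # members (absolute paths, links pointing outside dest).
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return names


if __name__ == "__main__":
    archive = Path("mycode.tar.gz")  # example file name from the text above
    if archive.exists():
        for name in unpack_archive(archive):
            print(name)
    else:
        print(f"{archive} not found in the current directory")
```

Run from the directory that holds the archive, this does roughly what tar -xf does and prints the names of the extracted files.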
A file named notebook.ipynb or notebook.md is an
example of a Jupyter notebook, which can be opened on your
computer with the Python tool suite detailed here: Content/02-PlatformTools.html
1.2.1 Introduction, big picture, review, technical setup
Topic: Class virtual machine, jupyter, numpy, pandas,
matplotlib
Log into https://git-classes.mst.edu to see this and all future
assignments.
1.2.2 Biological sequence processing
To get the relevant files for this section,
start up the class VM,
navigate to a directory you want to store the lecture notes in,
and then: git clone https://gitlab.com/bio-data/sequence-informatics.git
As I add lectures, or improve old ones,
they’ll be updated in the repo,
so you can just run this in the repo to get the latest notebook
scripts: git pull
Introduction and bioinformatics software (read as part of
pa00-platform, not lecture)
A fun recent example of a large genomics project that applies some neat
machine learning analyses to its data:
https://zoonomiaproject.org/
A pop news article about it:
https://www.vice.com/en/article/4a3wwg/scientists-sequenced-dna-of-nearly-every-mammal-on-earth-in-unprecedented-project
One paper using these methods to understand the genetics of brain size
expansion throughout evolution:
https://www.science.org/doi/10.1126/science.abm7993
1.2.2.1 Programming assignments 01
Pairwise alignment and automated PCR primer selection
1.2.2.2 Programming assignments 02
Multiple sequence alignment (MSA) and OTU clustering
1.2.2.3 Programming assignments 03
K-means vs. self-organizing map (SOM) for cancer clustering and
dimensionality reduction