1 22-DataVis

Previous: 21-FunctionalProg.html

1.1 Screencasts

1.2 Modeling the world with matrices

* Why should you care about crossword?
* Why should you care about matrices?
* What did the first pre-computers process?
* What is a List?
* Just a special case of a matrix
* THE universal user interface (UI)
* What is an image?
* ../../Bioinformatics/Content/20-ImageBasics.html (actually review the top of this page)
* 22-DataVis/data_00a_matplotlib.py (pudb3)
* How do you detect a face?
* How do you detect a simple shape?
* How do you detect a line?
* How do you detect an edge?
* 22-DataVis/data_00b_images.py (spyder)
* How do I do computer vision, or machine learning face recognition?
* How does one store/model a 3D environment, like a realistic game map?
* How does one image a brain, a brain over time?
* How does one keep abstract time series data?
* How does one keep abstract experimental data?
* How does one simulate a game or real-world conflict over a space?

Matrices are deeply intertwined with computation!
Welcome to the MATRIX!
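
To make the "image as a matrix" and "how do you detect an edge" questions concrete, here is a minimal sketch using numpy. The tiny 6x6 image and the threshold of 128 are made up for illustration; this is not the code from the lecture scripts, just one simple way to find an edge by differencing neighboring pixels.

```python
import numpy as np

# A tiny 6x6 grayscale "image": dark (0) on the left, bright (255) on the right.
# The values are invented for illustration.
image = np.zeros((6, 6), dtype=np.int64)
image[:, 3:] = 255

# Horizontal gradient: each pixel minus its left neighbor.
# A large absolute value means brightness jumps there, i.e. a vertical edge.
gradient = np.abs(np.diff(image, axis=1))

# Columns of the gradient where some row shows a strong jump.
# The threshold 128 is an arbitrary assumption.
edge_columns = np.where(gradient.max(axis=0) > 128)[0]

print(image)
print("edge between columns", edge_columns, "and", edge_columns + 1)
```

Detecting lines, shapes, and faces builds on the same idea: arithmetic on a matrix of pixel values, usually with larger neighborhoods than a single left-right difference.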

+++++++++++ Cahoot-22a.1

1.3 ++++++++++++ Lecture 2 starts here

1.4 Better numerical matrices

1.4.1 Code

Step the code!
* 22-DataVis/data_01_numpy.py

1.4.2 numpy

Show these links, but don’t go over them in detail:
* https://scipy-lectures.org/intro/numpy/index.html
* https://numpy.org/doc/stable/user/index.html
* https://numpy.org/doc/stable/user/absolute_beginners.html
* https://numpy.org/doc/stable/user/quickstart.html
* https://numpy.org/doc/stable/user/basics.html
If you are interested in computational math, modeling, physics, AI, or machine learning, I highly suggest you read the above tutorials in full.
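
As a quick taste of what those tutorials cover, the sketch below contrasts a plain list of lists with a numpy array. The variable names and values are illustrative only and are not taken from data_01_numpy.py.

```python
import numpy as np

rows = [[1, 2, 3], [4, 5, 6]]      # a plain Python list of lists
matrix = np.array(rows)            # the same data as a 2D numpy array

repeated = rows * 2                # on a list, * repeats the rows
doubled = matrix * 2               # on an array, * multiplies every element

column_sums = matrix.sum(axis=0)   # sum down each column
transposed = matrix.T              # swap rows and columns
second_column = matrix[:, 1]       # slice out one column, no loop needed

print(matrix.shape, transposed.shape)
print(doubled)
print(column_sums, second_column)
```

The main design point is that numpy operations apply to the whole matrix at once, which is both shorter to write and usually much faster than looping over nested lists.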

++++++ Cahoot-22b.1

1.5 ++++++++++++ Lecture 3 starts here

1.6 Data analysis

"In data science, 85 percent of time spent is preparing data, 10 percent of time is spent complaining about the need to prepare data, and 5 percent of the time is actually analyzing or modeling data..."

**"Datasets are like people... interrogate them enough, and they will tell you whatever you want to hear... whether or not it is true."**

The state of data analysis in many domains of science really is this dark.

If you can’t see the pattern with simple descriptive statistics and graphs, the pattern is probably not real!

1.6.1 Critiques of the publication model

What to do about it??

**Dr. Taylor's Tao of data analysis: Follow the data, and abstract as little as possible!**

Thoughtful abstraction and summary statistics are sometimes needed and helpful, but only rarely, and usually in end-stage analysis or automation rather than in initial exploration (initial bushwhacking science).

1.6.2 Code

1.6.3 Reading statistics

For the one-off little summary, not really for large-scale data analysis:
* https://docs.python.org/3/library/statistics.html
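
A minimal sketch of that kind of one-off summary, using only the standard-library statistics module. The reaction times are made-up values, chosen so that one outlier shows why a single summary number can mislead.

```python
import statistics

# Hypothetical reaction times in milliseconds; one value is an obvious outlier.
reaction_times_ms = [312, 298, 305, 290, 1250, 301, 295]

mean = statistics.mean(reaction_times_ms)      # pulled upward by the outlier
median = statistics.median(reaction_times_ms)  # robust to the outlier
spread = statistics.stdev(reaction_times_ms)   # sample standard deviation

print(f"mean={mean:.1f} median={median} stdev={spread:.1f}")
print(sorted(reaction_times_ms))  # looking at the raw values shows why they differ
```

The gap between the mean and the median here is a small case of "follow the data": printing the sorted values explains the summary better than any further statistic would.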

If we are doing science, how do we organize our data correctly the first time, so as not to have to spend all that time wrangling it?
* Wide, narrow, columns, rows?
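
To illustrate the wide versus narrow question, here is a small sketch with pandas. The column names (subject, day1, day2, score) and the numbers are invented for the example.

```python
import pandas as pd

# Wide layout: one row per subject, one column per measurement day.
wide = pd.DataFrame({
    "subject": ["s1", "s2"],
    "day1": [5.1, 4.8],
    "day2": [5.4, 4.9],
})

# Narrow (long) layout: one row per single observation.
narrow = wide.melt(id_vars="subject", var_name="day", value_name="score")

# Going back from narrow to wide.
wide_again = narrow.pivot(index="subject", columns="day", values="score")

print(wide)
print(narrow)
print(wide_again)
```

Neither layout is always right. The narrow form tends to be easier to filter, group, and plot, while the wide form is often easier for people to read and type in.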

If you are doing data analysis, what language do you use?
* Python
* R
* Matlab
* Julia
Provide some history and context on these and the dataframe.

To learn more:
* https://learnxinyminutes.com/docs/pythonstatcomp/

pandas

Q: How did the panda interpret the data wrong?
A: He was “Bamboozled”!
pandas has created pandamonium in the data science world!

A great way to pander to the needs of your data… as you ponder the dataset’s deeper meaning.

The pandas dataframe allows you to arbitrarily retrieve complex subsets of your data!!!
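
As a small sketch of what retrieving a complex subset can look like, the example below filters rows by two conditions and then summarizes by group. The table contents and column names are hypothetical.

```python
import pandas as pd

# Hypothetical experimental data: one row per trial.
trials = pd.DataFrame({
    "subject": ["s1", "s1", "s2", "s2", "s3"],
    "condition": ["control", "drug", "control", "drug", "drug"],
    "score": [5.1, 6.3, 4.8, 4.9, 7.0],
})

# Rows in the drug condition with a score above 5, keeping two columns.
mask = (trials["condition"] == "drug") & (trials["score"] > 5)
subset = trials.loc[mask, ["subject", "score"]]

# Mean score for each condition.
means = trials.groupby("condition")["score"].mean()

print(subset)
print(means)
```

Note the parentheses around each condition and the use of & rather than the keyword and; both are needed when combining boolean columns in pandas.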

Note: pandas was, and still is, a rapidly evolving package, and its developers have ruthlessly broken backwards compatibility in favor of new optimizations over the years, so these (or any) cheatsheets may not be current.


In the past, you may have pulled data you wanted to analyze into Excel, whereas pandas can do all that and more!

More on data wrangling and analysis

* https://www.tomasbeuzen.com/python-programming-for-data-science/README.html (good interactive ipynb book).
* https://jakevdp.github.io/PythonDataScienceHandbook/ (good book in Jupyter notebooks)
* https://pythonprogramming.net/data-analysis-tutorials/
* http://data-analysis-in-python.org/
* https://pandas.pydata.org/pandas-docs/stable/tutorials.html
* http://shop.oreilly.com/product/0636920023784.do
* Pandas cheat sheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

+++++++++++++ Cahoot-22c.1

1.7 ++++++++++++ Lecture 4 starts here

1.8 More plotting

Data scientists love beautiful data pictures!
* http://www.scipy-lectures.org/intro/matplotlib/index.html
* https://matplotlib.org/tutorials/
* https://matplotlib.org/tutorials/introductory/usage.html (go over in lecture)
* https://matplotlib.org/tutorials/introductory/pyplot.html
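
For orientation before reading those tutorials, here is a minimal sketch of the figure-and-axes style of plotting. The data, labels, and title are illustrative, and the Agg backend line is only an assumption that makes the script run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# One Figure containing one Axes; everything is drawn through the Axes object.
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.set_title("Two curves on one Axes")
ax.legend()
```

To see the result, call plt.show() with an interactive backend, or save it with fig.savefig and a filename of your choice.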

1.9 Jupyter/Quarto/Jupytext/Markdown/Mathematica/Sagemath notebooks

See: ../../Bioinformatics/Content/02-PlatformTools.html

1.10 Follow-ups if you’re interested

General data analysis

These are some resources to actually learn data analysis and science in a focused, sequential way:
* https://jakevdp.github.io/PythonDataScienceHandbook/ (looks like quite a good book, built from Jupyter notebooks)
* http://data-analysis-in-python.org/

+++++++++++ Cahoot-22d.1
What is a Jupyter notebook?

A lab notebook
A format for tutorials
A python interpreter
Sci-fi love story

Next: 23-Regex.html