1. Motivation and plan

I cannot imagine a career more wonderful than that of a scientist.

The day-to-day work in science today involves using computers at all times. Scientists who master their computers and can program them with agility are the ones who enjoy the job the most and are often in great demand: they can carry out unique new research.

I have developed a series of lessons on scientific computing, aimed at kids who have already taken my “Serious Programming For Kids” course [Gal15]. This booklet covers lessons 1-4. I have two goals with these lessons:

  1. introduce the tools and tricks for scientific computing, and
  2. take a tour of diverse scientific problems that demonstrate “really interesting” things you can do with some programming knowledge.

The course teaches scientific computing using Python on the GNU/Linux operating system. There are other possible choices of programming language and operating system, and some of them are adequate, but there are specific reasons for which I chose Python and GNU/Linux. Some are those given in the “Serious Programming for Kids” teacher’s manual, but here are some other reasons which are specific to scientific work:

  • Scientific software often matures into sophisticated programs which need to be executed on production computers and in a reproducible manner. For this the use of a free/open-source operating system and language interpreter is crucial.
  • Much scientific infrastructure is available as an integral part of the GNU/Linux distributions. For example, on a current Debian GNU/Linux or Ubuntu distribution you will find that the GNU Scientific Library, astropy, scipy, a remarkable number of R science packages, and much much more are “just there” as part of the operating system. This comes in part from the fact that the GNU/Linux operating system is developed by hackers for hackers: programming is a seamless part of such systems.
  • Python spread rapidly soon after its initial development. Thanks to some key early developers being part of physics, astronomy and biology research groups, it was rapidly adopted by the scientific community. The result is a vast collection of scientific libraries.
  • Many research projects have very long lives, and the software is used for years after it is first written. My opinion, and that of many who observe the business of scientific computing, is that programs written in Python on a GNU/Linux system will remain stable over that time [1].
  • Reproducibility again: using proprietary software in scientific research makes it impossible to reproduce or verify a result.
  • Reproducibility and verifiability also dictate that scientific software should be able to run in batch mode, rather than through a graphical user interface (GUI). A GUI is not necessarily a bad thing, but after initial exploration of data with a GUI, the scientist then needs to generate a batch program to reproduce the results.

1.1. Notes for teachers

This is a teacher’s manual for the mini courses. In the 10-hour hacking camp workshop which introduces Python from scratch, I teach at a blackboard (or whiteboard nowadays).

This course is quite different: it is for students who have already taken the 10-hour workshop, and already have a laptop ready and running a GNU/Linux distribution.

The format is 1.5 hours, and I lecture with a projector or large TV screen, working on examples in emacs or in the command line.

While I lecture I have the students load this book, usually from my bitbucket site at http://markgalassi.bitbucket.io/, so that they can paste in code samples that are too long to type.

I usually project a couple of terminals (one for python snippets, one for shell commands), a browser window with the relevant chapter of this book, and the emacs editor. This allows the students to see how I do the work.

The lecturing style should be one of quickly getting a juicy example up on their screens: something that gives visible results for the students. Then step back a bit to make sure they understood how we got to it, and then quickly on to the next example.

This is hard work for the students: I have developed this course to include serious material they might otherwise not learn until college, so I often ask the students to “suspend their not understanding” [2] and just latch on to one or two things they can remember. For example I introduce Fourier Analysis (Chapter [chap:looking-deeply]), and when I give that lecture I frequently repeat “remember: it is OK to not understand most of this, but repeat after me the one thing I want you to understand: all these signals look like wild jumbles, but they are made up of simple waves which let us understand part of their musical nature.”

In broad strokes you can think of two main categories of scientific computing effort: analyzing data from experiments, and simulating your own physical situation with a computer program that generates fake (but, we hope, realistic) experimental data. We will look at both of these types, and introduce the words “experiment” and “simulation” as we go through the examples.

The way in which kids approach computers today allows them to not understand some concepts which are very important for scientific programming (and in fact any kind of programming). Because of this we must first get comfortable with the following concepts:

  • What a data file is.
  • How to plot a data file.
  • How to write a program which takes a data file, does some processing of the data, and writes out another file with the processed data.
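
As a minimal sketch of the third concept, the program below reads a data file, processes the numbers, and writes a new data file. It assumes a plain-text file with two whitespace-separated columns and optional comment lines starting with “#”; that layout, the function names and the choice of a running average as the “processing” step are all illustrative assumptions, not something fixed by the course.

```python
def read_data(path):
    """Read (x, y) pairs from a two-column text file.

    Blank lines and lines starting with '#' are skipped.
    """
    rows = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            x, y = line.split()[:2]
            rows.append((float(x), float(y)))
    return rows


def running_average(rows, window=3):
    """Replace each y by the average of itself and the previous window-1 values."""
    smoothed = []
    for i, (x, _) in enumerate(rows):
        start = max(0, i - window + 1)
        ys = [y for _, y in rows[start:i + 1]]
        smoothed.append((x, sum(ys) / len(ys)))
    return smoothed


def write_data(path, rows, header):
    """Write (x, y) pairs as two columns, with one comment line on top."""
    with open(path, "w") as f:
        f.write("# " + header + "\n")
        for x, y in rows:
            f.write(f"{x} {y}\n")


def process_file(in_path, out_path, window=3):
    """Read in_path, smooth the second column, write the result to out_path."""
    rows = read_data(in_path)
    smoothed = running_average(rows, window)
    write_data(out_path, smoothed, "x smoothed_y")
    return smoothed
```

Calling process_file("temperatures.dat", "temperatures_smooth.dat") (both file names are hypothetical) leaves the input untouched and produces a second file in the same format, so the output can be plotted or processed again in exactly the same way as the original.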

Once we have these skills we can:

  • Tell the story of that plot.
  • Generate simulated data.
  • Retrieve data from online sources.
  • Record data from an experiment.
  • Analyze data to go beyond that initial story.
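
To make “generate simulated data” concrete, here is a small sketch of a simulation that produces fake but plausible experimental data: the distance a dropped ball has fallen at regular time steps, with some random measurement noise added. The physical situation, the parameter values and the function name are assumptions chosen only for illustration.

```python
import random


def simulate_falling_ball(n_points=20, dt=0.1, g=9.8, noise=0.05, seed=1234):
    """Return a list of (time, measured_drop) pairs for a ball falling from rest.

    The true drop is 0.5 * g * t**2; each "measurement" adds Gaussian noise
    with standard deviation `noise`. A fixed seed makes the run reproducible.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_points):
        t = i * dt
        true_drop = 0.5 * g * t * t
        measured = true_drop + rng.gauss(0.0, noise)
        rows.append((t, measured))
    return rows
```

Because the random number generator is seeded, running the simulation twice with the same arguments gives the same “experiment”, which fits the emphasis on reproducibility; changing the seed gives a new set of fake measurements of the same fall.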

1.2. Acknowledgements

Thanks to David Palmer and Laura Fortunato for discussing this curriculum with me in detail before I developed it. Thanks to the many students who have taken the course and helped me develop it. Most of all thanks to Leina Gries for close collaboration on the book and for writing parts of it.

1.3. Status of the book

Some chapters are largely complete and just need polishing and proofreading; some have just a title; some are partially written.

Until the status is a bit more uniform, I will be putting a “readiness” status note at the top of the chapter. If you do not see such a status note then the chapter is probably not complete!

1.4. Footnotes

[1] Programs written in the C programming language on a GNU/Linux system will be even more stable, thanks to the maturity and stability of the C standard. C is also a delightful and powerful language, but it is not in the scope of what I teach to younger kids.
[2] A pun on Coleridge’s “suspension of disbelief” – with topics of great complexity it is important for students to be flexible about temporarily accepting a building block that they don’t understand so that they can keep up with the flow.