Introduction

This course/book is designed as your 'second course' in programming. It assumes you have done a first course that taught you the basics of programming in an interpreted language (probably either Python or R), and that you have since had some experience using it in your research.

You might have felt that your code is holding you back, and that if you could make it faster or use less memory you could take your analysis further. Or you might have got a solution but are unsure whether it's the right approach, or whether you could make it better.

The impressive (but sometimes intractable!) code and methods we use for research in bioinformatics can feel intimidating, and I feel there isn't much formal guidance on how to engineer your own code to that standard.

My hope is that this course will give you some extra skills and confidence when developing and improving your research code.

Content

  1. Optimising Python code. We'll start with some ways to make your interpreted code faster, such as arrays, multithreading, JIT compilation and sparse matrices.
  2. HPC. How to effectively use and monitor your computation on HPC systems, including compiling code with different options.
  3. Compiled languages. How using a statically typed language can be faster, how to control memory, and other pros and cons (using Rust as an example). Also understanding how your problem scales, and the typical elements of an algorithmic toolbox for solving problems.
  4. Software engineering. How to turn your research code into a fully fledged software package.
  5. Recursion and closures. Some new patterns, with more challenging examples to implement.

Possible future content

Modules that may be added in future:

  • CUDA programming for GPU parallelism.
  • Foreign function interfaces - linking your compiled code with R and Python.
  • Machine learning/deep learning.
  • Cloud computing.
  • Web development.

Resources

Some of this material is based on: