I have a couple of book reviews in the pipeline, so I am starting a new category for reviews of books I find useful (or not so useful). I wrote this review of a great book months ago, but, like many things in my life, I’m just now getting it online.
Like many people, I have seen my research change in recent years. I have been spending an increasing amount of time in front of a computer and less time at the lab bench. I can’t see myself ever forsaking the wet lab or field experiments, but I’m using computers more than ever before. There’s now so much data to process – mostly text in the form of sequence data – and I’ve become increasingly reliant on a computer to search large data sets and convert data file formats. Even if you aren’t a biologist working in genomics or genetics, new data collection instruments for physiology, ecology, and atmospheric sciences are recording data at incredible rates, and sorting through citations is getting more and more time-consuming. It’s impossible to ignore the data revolution that is taking place, no matter where your foundation within the biological sciences (or physics, chemistry, etc.) lies.
I wish the book Practical Computing For Biologists (and Companion Website), by Steven H. D. Haddock & Casey W. Dunn, had come along sooner, but I am so glad it’s available now, because it teaches exactly what I needed: how to deal with data more efficiently. In terms of my research and use of time, this has been the most important book I’ve read in the last year, perhaps the last decade. If you’re a biologist (or anyone, for that matter) who finds yourself clicking away at a spreadsheet file (such as Excel) or cutting and pasting from online data repositories (such as GenBank, national weather databases, etc.), then this book is for you. In reality, this book is for anyone who wants to use a computer to work more efficiently with data.
The book can be broken up into six sections dedicated to the following topics: (1) manipulating and searching text files, (2) working within your computer’s shell, (3) basic programming for biologists, (4) combining methods (this is a section on database management and tool selection), (5) dealing with graphics for data communication, and (6) advanced topics such as remote computer access and installing software.
This book rightfully devotes a large portion to manipulating text files and other file formats used to store and communicate data. The initial chapters begin with text editing using regular expressions, and what I learned there immediately saved me time when processing large text files and parsing sequencing data. A section at the end of the book on remote access and remote scripting helped me start working with text and files on other computers.
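To give a flavor of the kind of regular-expression work I mean, here is a minimal Python sketch. It is my own illustration, not an example taken from the book: the header layout (`>id|Genus_species|date`), the sample records, and the function name `reformat_headers` are all assumptions made up for this post.

```python
import re

# Hypothetical FASTA-style records; the '>id|Genus_species|date' header
# layout is an assumption chosen only for illustration.
fasta_text = """>seq1|Homo_sapiens|2011-03-04
ATGCCGTA
>seq2|Mus_musculus|2010-11-20
GGCATTAC
"""

# Match a whole header line and capture the ID, genus, and species.
# re.MULTILINE lets ^ and $ anchor at the start and end of each line.
header_pattern = re.compile(
    r"^>(\w+)\|([A-Z][a-z]+)_([a-z]+)\|.*$", re.MULTILINE
)


def reformat_headers(text):
    """Rewrite '>id|Genus_species|date' headers as '>Genus species (id)'."""
    return header_pattern.sub(r">\2 \3 (\1)", text)


if __name__ == "__main__":
    print(reformat_headers(fasta_text))
```

Only the header lines are rewritten; the sequence lines pass through untouched because they never match the pattern. Doing the same thing by hand across thousands of records is exactly the sort of clicking and pasting I used to lose hours to.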
The book focuses on Unix-based platforms (Linux, OS X) due to ease of programming, but it does not ignore DOS-based (Windows) platforms. An appendix at the end of the book is useful for translating commands from one platform to another. When the book recommends specific software, which is rare, the focus is on free, open-source options. Python is the programming language of choice for much of the book, but another appendix helps to sort out the differences among the many programming languages used in biology. The open-source MySQL database platform is covered for storing and communicating data. One important goal of the programming and data organization parts of the book is to improve reproducibility and collaborative work through automation and transparency.
Surprisingly little attention is given to the actual communication of data in graduate coursework and training, so it’s refreshing to see a focus on image basics communicated here across a few chapters in the book. These sections focus on basic image creation and manipulation using both commercial and open-source options.
The book strikes a perfect balance between guiding you through tutorials and nudging you toward self-exploration, with just enough direction to neither annoy nor overwhelm. This text is not a cookbook of solutions but, more importantly, a guide to get you started in data analysis and file format manipulation and to help you think for yourself when addressing your research problems. While this book will help you deal with text, it doesn’t address software for word processing (Word, OpenOffice), presentations (PowerPoint, Keynote), spreadsheets (Excel), or statistics (R, SAS, SPSS, etc.), as covering these would make for a huge book. This book does not cover software for phylogenetics or population genetics, and I don’t think it should.
Just to be clear, I’m not being paid here to promote this book. I just honestly have found this book extremely helpful to my own research and I want to communicate that. I haven’t read many books which have been able to change my life in a self-actualizing way, but this book helped (…and is still helping) me to do what I was doing before, but more efficiently.