Yesterday, I participated in an R workshop hosted by the Québec Centre for Biodiversity Science (better known to members as QCBS, or CSBQ for francophone members). For those who aren’t familiar, R is a free, open-source programming language that allows you to manipulate data, perform statistical analyses, and make pretty plots and graphs for publications, all under the same umbrella. I’ve been hearing about the wonders of R for years from other graduate students, but this is the first opportunity I’ve had to actually learn it. And now that I have some data that I want to turn into pretty graphics for publication, it seemed like a good time to learn something new! The workshop itself, Zero to R Hero, was led by members of the R Montreal user group, who have taken it upon themselves to spread the good news of R to those of us (myself included) who are just starting out. As with any new programming language, there is a steep learning curve, and getting going can be intimidating. The idea of the workshop was to help you get over the first hurdles and start using R for your own research.
As with any course or workshop, you get out what you put in (and you tend to focus on the information you need). Let me put it this way: previous data-handling workshops that I attended were never as useful as this one, where I actually had my own data to play with. It is one thing to play with a sample data set, and quite another to play with your own, especially since real data tends to be full of idiosyncrasies and formatting issues. That was the first thing I discovered about my own data: I will have to spend some time cleaning it up so that it will “play nice” with this new way of handling it. But now, thanks to the workshop, I have the basic tools that will allow me to do this.
Computing workshops can be challenging (just as much for the instructor as for the participant), but they are a necessity. Science is producing more and more data. For example, the computers at CERN, home of the Large Hadron Collider, handle about 1 petabyte (1 million gigabytes) of data per day. Biology is no different. Figuring out the sequence of DNA in a genome has never been faster or cheaper. My own data of this type amounts to about 150 gigabytes of raw data for only a handful of samples. All of this data needs to be transferred, stored, and processed. This requires the development of new software, and even if you aren’t the one writing the code, you need at least a basic understanding of coding in order to get it to run. However, most undergraduate programs don’t prepare you for this reality. The biggest challenge in computing workshops can be the participants’ different levels of computer fluency. If you have never used the command line, you are starting at a disadvantage. This is not to say that learning basic commands is impossible, but as with learning any new language, you need to learn the new vocabulary in order to function. And as with any new language, this gets better with practice.
Learning the new skills that you need to complete your work is a major part of graduate school, and part of the process of becoming an independent researcher. In many cases, it can feel like a very lonely process, but if you look around, there are resources everywhere to help you. Although this was a formal workshop, it was still handled by graduate students who are friends and colleagues. It is often the case in graduate school that the best teachers are your peers. These are the people around you who are also doing the day-to-day work of research. If they’ve been around longer than you, they’ve already learned how to use that piece of equipment/run that software/perform that analysis/deal with that administrator. They have a wealth of knowledge about your office, your department, the university and the city. And eventually, you too will know how to do those things and be able to help your peers!