Cloud-based file hosting services have been around for a while, but with the launch this week of Dropbox-like desktop clients for both Google Drive and Microsoft SkyDrive, they appear to be the hot topic of the moment.
My goal here is not to discuss the pros and cons of each; there are plenty of thorough comparisons out there. What interests me is that, as quickly as the hype about all the cool features arrived, the discussion has shifted towards ill-defined terms of service that can jeopardize the confidentiality of your data. What does this mean for storing your research data?
This article from Time’s Techland blog provides some background on the issue as it concerns the recently launched Google Drive. Basically, the main concern revolves around these lines of Google’s terms of service:
When you upload or otherwise submit content to our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content.
I don’t think any Google employee would actually steal one of my manuscripts or presentations and claim it as their own. But the above lines trouble me when it comes to the confidentiality of our research subjects. Picture, for example, the following plausible (and fairly harmless) scenario: you interview a group of people on a sensitive subject, and you can only publish the results if the subjects are guaranteed anonymity. If you store the interview transcripts (which are supposed to be seen only by you and your research associates) on Google Drive together with the participants’ contact information, it is possible that Google will crawl these documents to serve “better” ads to those participants. To be honest, I am not sure this is desirable, let alone compliant with our own research ethics board (REB) requirements.
(This is why tools like Google Refine do not require you to upload your data to their servers. The data is processed locally, even though you interact with it through the web browser.)
All these file-syncing tools provide wonderful means of collaboration that make our lives as researchers simpler. It is also very convenient not to worry about versioning or about carrying our files around on USB drives. But as cloud-based solutions become more and more popular, shouldn’t we be clearer about what this means for data privacy?
How do you handle your research data in the cloud?