Open Science, Open Data: Lessons from the Montreal Neurological Institute
Posted by: Chantel Ridsdale, RDC Intern
The Montreal Neurological Institute (MNI), known as “The Neuro” and located on the McGill University campus, is an independently funded research and clinical facility focused on nervous system disorders. Home to approximately 1,200 staff (600 clinicians and 600 researchers), it offers greater opportunity for collaboration than most institutions; as the institute puts it, “at The Neuro, science informs medicine and patients inform science.”
The Neuro was founded by Dr. Wilder Penfield in 1934, and in 80 years it has become the largest neuroscience research and clinical centre in Canada and one of the foremost in the world. Furthering its reputation for innovation, the institute officially announced in May 2016, through Nature and Science, that it will adopt an Open Science, Open Data policy guided by five principles:
- All information about a study – results positive or negative, models, algorithms, reagents, and software – will be made publicly available by the time of publication.
- All data and resources generated through new research partnerships must follow the same rules.
- The institute’s biobank (tissue samples and brain-scan data) will be opened up (a nominal fee may apply).
- The institute will not pursue any intellectual property protections for research discoveries.
- Although the institute will not support activities that undermine these open-science principles, it will respect its researchers’ autonomy.
“These principles stem from a few areas,” explains Guy Rouleau, the Neuro’s Director. “We think that by sharing data quickly, we’ll be able to accelerate the discovery of mechanisms and eventually new medicines.” Viviane Poupon, the institute’s director of partnerships and strategic initiatives, explains that the initiative is an attempt to diminish the roughly 95% failure rate of drug candidates. Rouleau goes further, clarifying that “open” means “meaningfully available,” with infrastructure supporting the use of huge amounts of data.
The Neuro produces a vast amount of data, and has developed several ways to deal with it: the Brain Imaging Center (BIC), the Longitudinal Online Research and Imaging System (LORIS), and CBRAIN.
BIC is a large training centre with seven scanners that produce 4,000 scans per year, 120 principal investigators, and hundreds of trainees. The Centre has registered over 30,000 users for training sessions over the last five years.
LORIS is open-source, web-based data and project management software aimed at storing and processing behavioural, clinical, neuroimaging, and genetics data. Patient records are coded for confidentiality, which makes opening and sharing the data possible and simplifies the handling of longitudinal studies. LORIS is linked to a variety of visualization tools and allows users to leverage external tools. Features include project management and study design; data collection; data management and quality control; data visualization; and data sharing. LORIS currently has 400+ international partners.
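Coding patient records for confidentiality is what makes this kind of open sharing workable: each participant gets a stable study code, so visits across a longitudinal study stay linked while the original identifier never leaves the institution. As a minimal illustration of the general technique (a keyed-hash pseudonymization sketch, not a description of how LORIS actually codes records), the `code_patient_id` function and the project secret below are hypothetical:

```python
import hashlib
import hmac

def code_patient_id(patient_id: str, project_secret: bytes) -> str:
    """Derive a stable, non-reversible study code from a patient identifier.

    An HMAC keyed with a per-project secret maps the same patient to the
    same code within a study (preserving longitudinal linkage), while the
    original identifier cannot be recovered from the shared data alone.
    """
    digest = hmac.new(project_secret, patient_id.encode("utf-8"), hashlib.sha256)
    return "SUB-" + digest.hexdigest()[:12].upper()

# The secret stays with the data custodian, off the shared platform.
secret = b"per-project secret held by the institution"

code1 = code_patient_id("hospital-record-42", secret)
code2 = code_patient_id("hospital-record-42", secret)
assert code1 == code2                      # same patient, same code across visits
assert "hospital-record-42" not in code1   # raw identifier is not exposed
```

Because the mapping depends on a secret held by the institution, outside researchers can analyze and link coded records without ever handling identifying information.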
Work on a system to store and process this volume of data began in 1999, and the result was CBRAIN. CBRAIN is a repository that binds together the major brain research facilities (about 200,000 processors) and makes them available to the data community, including the tools and processing power required to interact with data of this quantity and quality. It also handles all data transfer to and from the clusters (data storage units). CBRAIN additionally provides a simple interface that researchers without programming skills can use to collaborate with peers around the globe. Features include convenient secure web access and distributed storage with automated multipoint data transfer. CBRAIN currently has over 400 users.
“CBRAIN is a name that could be misleading,” shares Marc-Etienne Rousseau, CBRAIN’s System Architect and Technology Manager. “It actually does not have anything to do with the brain, and has everything to do with digital bits and bytes.” The platform can be adapted to extremely heterogeneous computing and data sites because most data collection projects require the same data management supports (80% commonality is a good rule of thumb). The challenge is how best to leverage the substantial investments the Neuro and its funding partners (such as the Canada Foundation for Innovation and CANARIE) have made in technology development. RDC intends to facilitate this conversation and encourages feedback from the broader community.
Finally, for Research Data Canada, the Neuro and its systems and best practices highlight what a research organization can do when it embraces a deliberate and sustainable approach to research data management. The Neuro has taken a leadership role in the Open Data/Open Science community by announcing a five-year period during which it will abide by an open approach and share all of its research outputs. Combined with the recent announcement of an $84 million CFREF grant for the Neuro, the next five years should indeed prove interesting.
This public commitment is a model for all publicly funded research organizations in Canada and should be emulated by more labs, especially those whose work carries substantial benefits for society. We hear about efforts to share data openly when public health emergencies like Zika emerge, but imagine how far research and discovery would advance if all research domains took a similar approach. Surely much farther than we see with the current 350-year-old model of knowledge dissemination.