Academic group says it's time for researchers to begin sharing source code

Apr 16, 2012 by Bob Yirka report

(Phys.org) -- A diverse group of academic research scientists from across the U.S. has written a policy paper, published in the journal Science, arguing that the time has come for all science journals to require that computer source code be made available as a condition of publication. Currently, they say, only three of the top twenty journals do so.

The group argues that because computers are now an integral part of research in almost every scientific field, it has become critical that researchers provide the source code for custom-written applications so that their work can be peer reviewed or duplicated by other researchers attempting to verify results.

Not providing source code, they say, is now akin to withholding part of a study's procedure; the result is a “black box” approach to science of a kind not tolerated in virtually any other area of research where results are published. It’s difficult to imagine any other realm of scientific research getting such a pass, and the fact that code is not published openly detracts from the credibility of any study based on it. Articles based on computer simulations, such as many of those about astrophysics or environmental predictions, tend to become meaningless when they are offered without the source code of the simulations behind them.

The team acknowledges that many researchers are clearly reluctant to reveal code they consider amateurish, since computer programming is not their profession, and that some code may have commercial value, but argues that such reasons should no longer be considered sufficient grounds for withholding it. They suggest that requiring researchers to reveal their code would likely result in cleaner, more portable code, and that open-source licensing could be made available for proprietary code.

They also point out that many researchers use public funds to conduct their research, and suggest that the entities providing such funds should require that source code created as part of any research effort be made public, as is the case with other resource materials.

The group also points out that the use of code, both off-the-shelf and custom-written, is likely to become ever more common in research. As time passes, it therefore becomes ever more crucial that such code be made available when results are published; otherwise, peer review and reproducibility will cease to have meaning in a scientific context.

More information: “Shining Light into Black Boxes,” Science, 13 April 2012, Vol. 336, No. 6078, pp. 159-160. DOI: 10.1126/science.1218263

Abstract
The publication and open exchange of knowledge and material form the backbone of scientific progress and reproducibility and are obligatory for publicly funded research. Despite increasing reliance on computing in every domain of scientific endeavor, the computer source code critical to understanding and evaluating computer programs is commonly withheld, effectively rendering these programs “black boxes” in the research work flow. Exempting from basic publication and disclosure standards such a ubiquitous category of research tool carries substantial negative consequences. Eliminating this disparity will require concerted policy action by funding agencies and journal publishers, as well as changes in the way research institutions receiving public funds manage their intellectual property (IP).

User comments: 16

Duude
1.4 / 5 (9) Apr 16, 2012
The black box is critical to guaranteeing government money keeps flowing.
Lurker2358
3.4 / 5 (8) Apr 16, 2012
"They also point out that many researchers use public funds to conduct their research and suggest that entities that provide such funds should require that source code created as part of any research effort be made public, as is the case with other resource materials."

Anything developed using public funding should be considered public property, whether it's an astrophysics simulation, a $1 toy invention, or the next 100 Billion dollar idea, if it was publicly funded it should be public property.
DaFranker
1.8 / 5 (5) Apr 16, 2012
"Anything developed using public funding should be considered public property, whether it's an astrophysics simulation, a $1 toy invention, or the next 100 Billion dollar idea, if it was publicly funded it should be public property."

By extension, if I catch on correctly, a default "Public Domain" patent should also be created automatically, so that rich idiots can't make life hell for anyone who wants to actually use the research concepts / inventions.

But then again, I personally think the US would be better off without their patent system altogether as things stand. Free market, competition, Darwin and all that.
Doug_Huffman
2 / 5 (4) Apr 16, 2012
"But then again, I personally think the US would be better off without their patent system altogether as things stand. Free market, competition, Darwin and all that."

So you own no intellectual property, all of your inventions were signed over to your employer as a condition of employment? Why should I be prevented from benefiting from my intellectual labors? Does your largesse extend also to copyright?
Deathclock
1 / 5 (4) Apr 16, 2012
"They also point out that many researchers use public funds to conduct their research and suggest that entities that provide such funds should require that source code created as part of any research effort be made public, as is the case with other resource materials."

"Anything developed using public funding should be considered public property, whether it's an astrophysics simulation, a $1 toy invention, or the next 100 Billion dollar idea, if it was publicly funded it should be public property."

Smartest thing I've ever seen you post, but who could argue with it?
Jotaf
not rated yet Apr 16, 2012
I agree, it would certainly have a positive impact on the quality of research. I don't want to argue against this sort of policy but it's not as simple as it sounds.

Coding for science is not the same as your typical PHP or C# database application. There's always complex math involved, numerical stability issues, and new things are constantly being discovered as we code. If it was well within my comfort zone I wouldn't call it science, I'd call it engineering.

For that reason, much of research software is a brittle patchwork of barely working, very complex parts. Only the author, and maybe a few more people in the world (if they are willing to invest their time) can understand what's going on. If you make it open-source, you're committing to supporting that software forever. Imagine the horror of debugging this software through e-mail because some grad student doesn't know how to get it to work.
Jotaf
not rated yet Apr 16, 2012
Lately I've been trying to shift my focus more to research that can be more easily open-sourced, that is, 90% of the work is mathematical and in the end you come up with a handful of lines of code that do the same as many huge systems:

http://berkeley.i...eatures/

I cannot tell you how limiting this is. It's an interesting challenge, yes, but there's a lot of stuff that is impossible with this approach. I'm sure many researchers don't have that luxury.
Lurker2358
1 / 5 (3) Apr 16, 2012
"So you own no intellectual property, all of your inventions were signed over to your employer as a condition of employment? Why should I be prevented from benefiting from my intellectual labors? Does your largesse extend also to copyright?"

Actually, many U.S. employers require you to sign over all intellectual property rights to anything you might invent while working for them.

Blizzard Entertainment won their case against the guy who invented their compression algorithms.

Fact is, you don't have rights to anything, even if you think you do, unless you invented it in your own garage on your own time.

Even then, a big company with enough money could win a case against you in a court and steal your idea anyway.
Deathclock
1 / 5 (4) Apr 16, 2012
"Actually, many U.S. employers require you to sign over all intellectual property rights to anything you might invent while working for them."

Yep, I know I hold no claim to any of the software I create at work, including the machine-learning assisted pattern analysis algorithms that we use to find points of interest on data plots...
Sanescience
1 / 5 (3) Apr 17, 2012
yes, Yes, YES!

I have posted in the past about my distrust of software simulations because I write software myself. And while maybe only a few people know how the whole thing works, it can be pretty easy to spot common coding mistakes that take only a couple of lines of code to find.

If only I had a nickel for every off-by-one bug I have found and fixed. And don't even get me started on STL container iterator abuses. And those are the "honest mistakes", not to be confused with the occasional 1000 lines of code to compute a value only for the last line to return a constant as a "test" value, but left in by accident. And I'll share one of my favorite comments I found once:

/*
Here we continue the hack we don't understand given to us by the developer that is no longer in business so we cant ask about it but we cant take it out because with out it the client is shown swear words in large font and a vulgar graphic if a memory leak is detected.
*/
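Sanescience mentions "STL container iterator abuses" without naming one. A common example, offered here only as an assumption about the kind of bug meant, is erasing elements from a std::vector while looping over it. The C++ sketch below keeps the broken loop in a comment, since running it is undefined behavior, and shows the usual fix; the function name remove_negatives is hypothetical.

```cpp
#include <vector>

// Broken pattern (kept as a comment because it is undefined behavior):
//   for (auto it = v.begin(); it != v.end(); ++it)
//       if (*it < 0) v.erase(it);  // 'it' is invalidated by erase(), then incremented
//
// Fixed pattern: vector::erase() returns an iterator to the element that
// followed the erased one, so the loop only advances when nothing was erased.
void remove_negatives(std::vector<int>& v) {
    for (auto it = v.begin(); it != v.end(); ) {
        if (*it < 0) {
            it = v.erase(it);  // continue from the element after the erased one
        } else {
            ++it;              // keep this element and move on
        }
    }
}
```

For example, calling remove_negatives on {3, -1, -2, 4, -5} leaves {3, 4}. The broken version can appear to work on some inputs, which is part of why such bugs survive until someone else reads the code.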
alfie_null
not rated yet Apr 17, 2012
"For that reason, much of research software is a brittle patchwork of barely working, very complex parts. Only the author, and maybe a few more people in the world (if they are willing to invest their time) can understand what's going on. If you make it open-source, you're committing to supporting that software forever. Imagine the horror of debugging this software through e-mail because some grad student doesn't know how to get it to work."

This is the situation for some groups where I work. Lots of Fortran-II code written in the 1960s going to extreme lengths to efficiently utilize limited memory and expensive cpu cycles. Yet the embodied algorithms are important and would benefit from re-implementation using current paradigms. _Exactly_ what open-sourcing would encourage. Absolutely doesn't mean you have to commit anything at all to support.
DaFranker
1 / 5 (4) Apr 17, 2012
"But then again, I personally think the US would be better off without their patent system altogether as things stand. Free market, competition, Darwin and all that."

"So you own no intellectual property, all of your inventions were signed over to your employer as a condition of employment? Why should I be prevented from benefiting from my intellectual labors? Does your largesse extend also to copyright?"

Did you even read my post, or are you just trolling me? I never mentioned copyright. Abolishing patents means nothing CAN prevent you from benefiting from your work. No patents means companies CANNOT be the legal owner of your invention - NO ONE CAN. It's patents, right now, that are being taken by companies and thus legally prevent inventors from doing anything about it.

No patents means it doesn't matter - you can make it, you can profit from it. I also didn't specify that it's perfect - only that it's "better off without" (...) "as things stand".
Jotaf
not rated yet Apr 17, 2012
@Sanescience: Don't worry, we hear regularly from software engineers that they'd do it much better if they were in our place! They assume it's the same thing, but it's not -- believe me, I've been on both sides.

The truth is that we're the ones here in the trenches, and we get things done, while arrogant DB programmers (that say we do it wrong) never left their comfort zone. Yes, scientific code can be messy, as I said it's in the frontiers of knowledge. We're experimenting, there's no set path -- you can't plan it in your head and go (as you do when you're coding a GUI for the 20th time).

It doesn't mean you can dismiss any new algorithm or simulation. Things that work stay, things that don't die off. What's your alternative? If the brightest people in the world can't do it, who can? Do we just forbid new developments?

@alfie_null: In my experience, if they're useful enough they eventually get re-implemented -- generic algorithms and important simulations fall into this category.
Davecoolman
1 / 5 (5) Apr 17, 2012
About time, the stakes are so high that all data should be provable, dependably.
DaFranker
1.8 / 5 (5) Apr 19, 2012
"The truth is that we're the ones here in the trenches, and we get things done, while arrogant DB programmers (that say we do it wrong) never left their comfort zone. Yes, scientific code can be messy, as I said it's in the frontiers of knowledge. We're experimenting, there's no set path -- you can't plan it in your head and go (as you do when you're coding a GUI for the 20th time)."

This is even more reason to make it open-source. Those arrogant DB programmers will be stump-faced by the code or shut up when asked for a better example, and the real coders and hackers can take a crack at the frontier and suggest improvements. You'd be surprised how much a good one can improve scientific code even when they have no idea what the program is doing. Problem is, you won't find those as "professional" programmers "offering their services". You *need* to make the code openly available and let them come to you. It's a different market.
Sanescience
1 / 5 (3) Apr 20, 2012
@Jotaf

Touchy! There are many kinds of programming; my references were to C++, not DB.

"The truth is that we're the ones here in the trenches ... Yes, scientific code can be messy, as I said it's in the frontiers of knowledge. We're experimenting."

I've seen plenty of trenches in my time: I was on the occasional PC game project with the 500-hour work months that led to the lawsuits over working conditions at game companies and reformed labor law.

I must again emphasize that there are (at least) two classes of programming "errors": procedural and implementation. Someone who doesn't know a thing about the simulation math of a model can still look at a line of code and say "hey, are you using any computers with a certain release of gcc or g++? Because if you heavily iterate this line of code it will introduce precision errors using this order of operations in long long variables." And there are innumerable little things like that; no one knows them all.
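The point about order of operations can be illustrated with a simpler case than the one described (no particular gcc release or long long variables are involved here): in single-precision floating point, the same ten additions give different totals depending on the order in which they are performed. The C++ sketch below assumes IEEE 754 float arithmetic with no excess intermediate precision and no -ffast-math, as on typical x86-64 builds; the function names are hypothetical.

```cpp
// Near 1e8, consecutive single-precision floats are 8 apart, so adding a
// value smaller than 4 to 1e8f rounds straight back to 1e8f.

// Add 1.0f to a large value ten times, one addition at a time.
float add_small_terms_last() {
    float total = 1.0e8f;
    for (int i = 0; i < 10; ++i) {
        total += 1.0f;  // each step rounds back to 1e8f
    }
    return total;
}

// Sum the ten small terms first, then add them to the large value once.
float add_small_terms_first() {
    float small = 0.0f;
    for (int i = 0; i < 10; ++i) {
        small += 1.0f;  // exactly 10.0f; small integers are represented exactly
    }
    return 1.0e8f + small;  // 100000010 rounds to the nearest float, 100000008
}
```

Under those assumptions, add_small_terms_last() returns 1e8f while add_small_terms_first() returns 100000008.0f. Neither function contains a "bug" a domain expert would spot in the math, which is the commenter's argument for letting programmers outside the field read the code.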