Feds: Harvard fellow hacked millions of papers

Jul 19, 2011 By JAY LINDSAY , Associated Press

(AP) -- A Harvard University fellow who was studying ethics was charged with hacking into the Massachusetts Institute of Technology's computer network to steal nearly 5 million academic articles.

Aaron Swartz, 24, of Cambridge, was accused of stealing the documents from JSTOR, a popular research subscription service that offers digitized copies of more than 1,000 academic journals and documents, some dating back to the 17th century.

In an indictment released Tuesday, prosecutors say Swartz stole 4.8 million articles between September 2010 and January after breaking into a computer wiring closet on MIT's campus. Swartz, then a student at Harvard's Center for Ethics, downloaded so many documents during one October day that some of JSTOR's computer servers crashed, according to the indictment.

Prosecutors say Swartz intended to distribute the articles on file-sharing websites.

Swartz turned himself in Tuesday and was arraigned at U.S. District Court, where he pleaded not guilty to charges including wire fraud, computer fraud and unlawfully obtaining information from a protected computer. He was released on $100,000 unsecured bond and faces up to 35 years in prison, if convicted.

"Stealing is stealing whether you use a computer command or a crowbar, and whether you take documents, data or dollars," U.S. Attorney Carmen Ortiz said in a statement. "It is equally harmful to the victim whether you sell what you have stolen or give it away."

A call to Swartz's attorney wasn't immediately returned. Swartz is due back in court Sept. 9.

A spokeswoman for JSTOR said Tuesday that Swartz had agreed to return all the articles, so the company can ensure they aren't distributed.

"We don't own any of this content. We really have to be responsible stewards of it," said spokeswoman Heidi McGregor. "We worked hard to find out what was going on. We worked hard to get the data back."

A Harvard spokesman said Swartz was placed on leave from a 10-month fellowship after the university learned about the investigation. He said the fellowship ended last month.

Swartz had legitimate access to JSTOR through Harvard, but the company has usage restrictions that would have prevented such colossal downloads.

The nonprofit JSTOR, founded in 1995, enables libraries to save space, time and labor by digitally storing centuries' worth of academic journals. Its oldest publication is a Proceedings of the Royal Society of London from 1665.

Its annual subscription fees can cost a large research university as much as $50,000.

According to the indictment, Swartz connected a laptop to MIT's system in September 2010 through a basement network wiring closet and registered as a guest under the fictitious name Gary Host, in which the first initial and last name spell "ghost." He then used a software program to "rapidly download an extraordinary volume of articles from JSTOR," according to the indictment.

In the following months, MIT and JSTOR tried to block the recurring and massive downloads, on occasion denying all MIT users access to JSTOR. But Swartz allegedly got around the blocks, in part, by disguising the computer source of the demands for data.

In November and December, Swartz allegedly made 2 million downloads from JSTOR, 100 times the number made during the same period by all legitimate JSTOR users at MIT.

The indictment also alleges that on Jan. 6, Swartz went to the wiring closet to remove the laptop, attempting to shield his identity by holding a bike helmet in front of his face, and seeing his way through its ventilation holes. It said that he fled when MIT police tried to question him that day.

An MIT spokeswoman said the school had no comment on the apparent breach.

McGregor said JSTOR recognizes it's very difficult for any institution at any level to protect its data.

"Hacking is rampant," she said. "Protecting systems is a huge challenge right now for any industry, and in the academic space it's especially challenging because we all want to be as open as we can and have policies that promote use."

User comments : 25

Strings0305
5 / 5 (9) Jul 19, 2011
A sad case of "theft" for something that should be accessible to everyone for free in the first place.
Sonhouse
3 / 5 (6) Jul 19, 2011
Sounds like he failed his ethics boards too. How ironic.
frajo
4.4 / 5 (7) Jul 19, 2011
Sounds like he failed his ethics boards too. How ironic.

Depends on your specific value prioritization.
Callippo
4.7 / 5 (7) Jul 19, 2011
Is it ethical to cover the results of research sponsored mostly from public NSF grants (i.e. money of tax payers) behind paywall..?
Temple
3 / 5 (2) Jul 19, 2011
Is it ethical to cover the results of research sponsored mostly from public NSF grants (i.e. money of tax payers) behind paywall..?


Well, anybody (even non-students) can walk into the library of a university campus and use the public terminals to access JSTOR.

Are you arguing the right for the information to be available publicly or the right for the information to be available from your living room?
Vendicar_Decarian
4.1 / 5 (9) Jul 19, 2011
"Well, anybody (even non-students) can walk into the library of a university campus and use the public terminals to access JSTOR." - Temple.

Well - no...

Access even for professors can cost as much as $50,000 per year.

The positive ethics of acquiring and distributing these articles on line is positive and pure.

Information wants to be free, and all attempts to assist it in its wishes are, by their very nature, ethical.
Burnerjack
2 / 5 (6) Jul 19, 2011
"Cambridge Ethics"? Makes me wonder as to the viability of secure cloud computing. Why spend on R&D if you can just steal someone else's efforts? Pretty sad. Hard to imagine an operation as sophisticated as MIT, let alone the DOD seem defenseless against these intrusions.
Burnerjack
1.8 / 5 (5) Jul 19, 2011
VD, there is nothing ethical about it. I submit that you sir, are morally bankrupt. Your parents should be proud.
skitterlad
2 / 5 (1) Jul 19, 2011
He did it so wrong.

Should have used a lan over power 500MB/s in the closet and then connect it somewhere else on the power grid. They would have to search every where.
gwargh
3.9 / 5 (7) Jul 19, 2011
VD, there is nothing ethical about it. I submit that you sir, are morally bankrupt. Your parents should be proud.

There is nothing ethical about attempting to give free access to information to those who cannot otherwise afford such access? I submit that you sir don't quite have a grasp of the issue, and should refrain from insulting someone from merely one statement.
Vendicar_Decarian
3.5 / 5 (10) Jul 19, 2011
"VD, there is nothing ethical about it." - Tard of Tards

The free flow of information is the foundation of science and enlightenment.

Any attempt to thwart the free flow of information of any kind is therefore a threat to enlightenment and the foundation of science, and therefore is pure and absolute evil, as are those who claim that access to information must be limited and controlled.

bottomlesssoul
4.2 / 5 (5) Jul 19, 2011
@Decarian, You rock. Information cannot be controlled, only people can. It seems all the more ugly that these prestigious universities share the same petty selfish mental state as Metallica.
Vendicar_Decarian
4.2 / 5 (5) Jul 20, 2011
Upon further investigation it appears that the documents that were duplicated were in fact publicly available for download.

The "crime" then is in the volume downloaded, and not in the duplication of the files themselves.

It also appears that these files may have been copied for research purposes rather than re-distribution.
finitesolutions
1 / 5 (4) Jul 20, 2011
They are treating this case as movie piracy. The documents in question are the production of the MIT staff and they reserve the copyright to it. You want the stuff you have to pay.
Nothing is free in capitalism or communism.
You want to consume documents you have to pay.
iiiears
not rated yet Jul 20, 2011
Why does something like this cost 50,000 dollars a year to subscribe to?

I really would like to see Microsoft/Apple/Android add distributed file sharing to their OS. Everyone might have 1-10g of storage that would form a second tier network.
gwargh
not rated yet Jul 20, 2011
They are treating this case as movie piracy. The documents in question are the production of the MIT staff and they reserve the copyright to it. You want the stuff you have to pay.
Nothing is free in capitalism or communism.
You want to consume documents you have to pay.

The documents in question are the "property" of a large journal database, JSTOR, their authors ranging wildly by universities of affiliation. MIT pays to have access to these documents, so no, it is not the production of MIT staff, and most scientists do not own the copyright to their own research (it is transferred to publishers the moment it's published.)
lovenugget
5 / 5 (2) Jul 20, 2011
if hackers wanted to do something truly beneficial to society, they would find the means to upload ALL of the academic articles from ALL of the databases and make them available for free to download. capitalism has seriously interfered with the public's ability to access research articles. as a college student i was issued a username/pass to access JSTOR but it's still an issue to me.

recently it has come to light that database login credentials have hit the black market. this alone is a strong indicator of how desperately folks want access to information and how much the public abhors having to pay money to access it. it seems to me what mr. swartz has done is entirely ethical and would have increased the welfare of humanity. free the man.
Vendicar_Decarian
3 / 5 (2) Jul 21, 2011
While it is unlikely that most people would be able to comprehend what is in these articles - public access to scientific research costs essentially nothing. What does cost is respected peer review and the management and selection of articles to be published. These things take real effort and can not be automated.

It is essential that there remain a distinction between quack publications like "Energy and Environment" and reputable publications like Nature and Science.
Skepticus_Rex
5 / 5 (1) Jul 21, 2011
I really would like to see Microsoft/Apple/Android add distributed file sharing to their OS. Everyone might have 1-10g of storage that would form a second tier network.


Microsoft already has similar technology embedded into the Operating System. It is called BranchCache. All you need for Distributed Mode branchcaching is Windows 7 Ultimate or Windows 7 Enterprise, and to set it up. (There is a little more configuration required, and so forth, but there it is.) True distributed file sharing, however, is a security risk on so many levels. Best to avoid it.
Isaacsname
5 / 5 (1) Jul 24, 2011
Free the knowledge, I say. We have constrained ourselves to the nth degree by not sharing it freely. How many scientists/explorers have wasted time making the same discoveries that others made, solely for the reason they did not know? Go to YouTube and watch laypeople like myself "discovering" some exciting new "effects" based on plywood whirligigs and magnets. Rediscovering the same things that were discovered and quantized over 100 years ago. I'm sure the same thing goes on in labs all over the planet.

VD is correct, information finds a way, from Platonic Idealism to memetics, it just keeps popping up doesn't it ?

:)
TheGhostofOtto1923
1.3 / 5 (3) Jul 25, 2011
Information wants to be free, and all attempts to assist it in its wishes are, by their very nature, ethical.
I think I concluded last time VD that you obviously don't produce anything of intellectual value.

If you did you would understand that people who produce info are WORKING to do so just like everybody else. If someone steals what they have produced and they don't get paid for their efforts then they will have to do something else to earn a living, and this info doesn't get produced.

Same with the people who gather, catalogue, and maintain info for services like JSTOR. They need to be paid for their time. Equipment and terminals have to be housed, powered, maintained and upgraded when needed. How would you propose to pay for this if money is not collected for using it?

What makes you think that the time, effort, and education needed to produce knowledge is not WORTH anything? Maybe you have little respect for knowledge itself?
gwargh
2 / 5 (1) Jul 25, 2011
What makes you think that the time, effort, and education needed to produce knowledge is not WORTH anything? Maybe you have little respect for knowledge itself?

Something tells me you yourself are not a scientist. Here's the main point: Scientists PAY to get their work published. Journals get money to cover publishing and archiving costs from the scientists. They get more money by making other scientists pay to get access to these articles. The main cost, of course, is peer review, and it's one that's worth paying for (although most reviewers also do not get paid, i.e. the cost is mainly in hiring someone to find and manage reviewers). JSTOR gathers articles that are often supposed to fall into open access (most journals allow open access to articles after an initial period of time), and then makes people pay for what essentially should be free. By the point JSTOR "owns" these articles, those who PRODUCED this knowledge have gotten paid, and see no profits from this resale.
TheGhostofOtto1923
1 / 5 (2) Jul 25, 2011
Scientists PAY to get their work published.
And what do they PAY with? Money they earn by generating the data they want to publish. Or from department budgets which are filled by income based on the amount of research they do, and what it is worth.
Journals get money to cover publishing and archiving costs from the scientists.
Again, this research must be WORTH something for the scientists to glean the money needed to publish.
most reviewers also do not get paid
Are you saying they do this on their own time? If you look at university budgets you may find that a large % comes from patents and grants earned by their ability to do research. Part of this WORK is sharing and reviewing this WORK within the community.
JSTOR "owns" these articles, those who PRODUCED this knowledge have gotten paid
MIT, like other institutions, provide the facilities and infrastructure within which this research is done.
cont
TheGhostofOtto1923
1 / 5 (2) Jul 25, 2011
Income is shrinking throughout the academic world. Universities need to generate it in order to survive. JSTOR is one way MIT can increase cash flow from the research that can ONLY be done within their institution.

Additionally, JSTOR doesn't maintain itself. As I said, it is no doubt staffed with people who review, sort, catalogue, and maintain the archive's entries and hardware. This all takes money.

If they didn't provide a collection point for good, solid academic info, where would users go to find it? How would they know they could trust its quality and veracity?

Some aether wizard could post some forgery on a file sharing website and no one would know the difference. Except the bad English would be a dead giveaway I suppose.
Skepticus_Rex
not rated yet Jul 27, 2011
It costs money to store those files. Selling copies of the articles helps pay for the server space and related costs. When people steal copies there is less money to pay the bills, much less the salaries of those who monitor, protect, and preserve the data for the future.

It is similar to what happens in retail business. For every item shoplifted it takes the sale of 20 of the same item just to break even on the cost of the stolen item. Piracy, whether of software or scientific articles, is a form of 'shoplifting' and similar rules apply.
