Computer scientists say it's time to start looking at treatment of data waste
July 19, 2011 by Bob Yirka
(PhysOrg.com) -- As anyone who has used a Windows-based computer for any length of time knows, the longer you have it, the slower it runs. This is because of the accumulation of data files and entries in system logs, information that in many cases isn't really necessary. In other words, our computers slow down due to the accumulation of "waste." Now, two computer scientists from Johns Hopkins University have published a paper on arXiv in which they argue that data waste on computer systems could, and should, be handled in much the same way that physical-world waste is managed.
In their paper, Ragib Hasan and Randal Burns pick up where computer scientists at Cornell University left off after discovering in 1999 that up to 80% of files written to the hard drive by the Windows NT operating system were deleted within five seconds of being created.
Hasan and Burns analyzed three computers: a MacBook laptop, a desktop running Ubuntu Linux and a Fedora Linux fileserver in the University Library (Linux is a variant of the Unix operating system used primarily at educational and research institutions). Their intent was to find out what percentage of the files on each computer had not been accessed since their creation. The percentages they found were 20.6 for the MacBook, 47.4 for the desktop and 57.1 for the server, and the share of disk space those files occupied was 98.5, 38.1 and 99.5 percent respectively. In other words, a large number of files taking up a lot of disk space had never been used again after being created, which is an inefficient use of resources.
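The kind of measurement described above can be approximated on any machine with a short script. What follows is only a minimal sketch, not the authors' actual method: the function name `unaccessed_stats` is hypothetical, and because file creation time is not portably available, the sketch assumes that a last-access time no later than the last-modification time means the file has not been read since it was written. File systems mounted with access-time updates disabled or relaxed will skew the result.

```python
import os
import stat


def unaccessed_stats(root):
    """Return (percent of files, percent of bytes) that look never-read.

    Assumption (not from the paper): a regular file counts as "not accessed
    since creation" when its access time is not later than its modification
    time. This is a rough proxy only.
    """
    total_files = total_bytes = 0
    stale_files = stale_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, devices
            total_files += 1
            total_bytes += info.st_size
            if info.st_atime <= info.st_mtime:
                stale_files += 1
                stale_bytes += info.st_size
    pct_files = 100.0 * stale_files / total_files if total_files else 0.0
    pct_bytes = 100.0 * stale_bytes / total_bytes if total_bytes else 0.0
    return pct_files, pct_bytes
```

Calling `unaccessed_stats` on a directory of interest returns the two percentages as a pair, in the same order as the figures reported in the article (share of files, then share of disk space).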
It is for this reason that the duo suggest a new approach to data waste, one that takes advantage of the research already done on physical waste. Specifically, they suggest a pyramid, similar to the one used by physical waste management companies. The tiers would run from the least desirable option at the bottom to the most desirable at the top, labeled in ascending order Dispose, Recover, Recycle, Reuse and Reduce, with zero data waste being the optimal goal.
In this scheme, Dispose means simply erasing the data; Recover refers to extracting usable components; Recycle means refurbishing components for reuse; Reuse means using those recoverable components in another way; and Reduce, the ultimate goal, means creating software that doesn't create waste data in the first place.
Besides slowing computers down through I/O bottlenecks, data waste can also contribute to faster burnout of flash storage, which has a limited number of lifetime write/rewrite cycles before dying. The authors point out that this will likely become more important as such technology is increasingly used in hand-held computing devices.
More information: The Life and Death of Unwanted Bits: Towards Proactive Waste Data Management in Digital Ecosystems, Ragib Hasan, Randal Burns, arXiv:1106.6062v2 [cs.ET] http://arxiv.org/abs/1106.6062
Abstract
Our everyday data processing activities create massive amounts of data. Like physical waste and trash, unwanted and unused data also pollutes the digital environment by degrading the performance and capacity of storage systems and requiring costly disposal. In this paper, we propose using the lessons from real life waste management in handling waste data. We show the impact of waste data on the performance and operational costs of our computing systems. To allow better waste data management, we define a waste hierarchy for digital objects and provide insights into how to identify and categorize waste data. Finally, we introduce novel ways of reusing, reducing, and recycling data and software to minimize the impact of data wastage.
© 2010 PhysOrg.com
Jul 19, 2011
Rank: 2.3 / 5 (4)
For example, there's a roughly 600 MB folder in the Windows XP operating system that contains various generic drivers for devices like digital pens, ZIP drives, all sorts of obsolete hardware and all sorts of new hardware like Bluetooth dongles which may never ever get used on a particular computer.
But they might - and that's the point.
I have an older netbook with just 4 GB for the system partition. I removed all the extra bits like update uninstallers and the drivers folder, and now the system is much smaller with room to breathe. But if I wanted to plug in a gamepad, which I've never used on the machine, it probably wouldn't recognize it because I removed the drivers from the driver cache.
Jul 19, 2011
Rank: 2 / 5 (4)
Because, on large filesystems, large sector sizes lead to reduced formatting losses because less data is needed to index where the files are. This however means that small files become increasingly wasteful since they won't use all of the disk space appointed to them.
A small text file may need 1.5 sectors, but it will reserve 2. This multiplied by tens of thousands of files is incredibly wasteful.
Of course you could always format a system partition where you have small sectors, and a data partition that has large sectors, except it becomes a problem when one eventually grows out of its bounds and needs to borrow space from the other. Usually it's the system partition, which grows and grows as the user adds software and updates to the software.
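The arithmetic in this comment can be sketched as follows. This is a simplified model with assumptions: what the comment calls a sector is treated here as the file system's allocation unit (often called a cluster or block), the function names and the 4096-byte unit are hypothetical, and metadata overhead and any inline storage of tiny files are ignored.

```python
def allocated_bytes(file_size, cluster_size):
    """Bytes reserved on disk when space is handed out in whole clusters."""
    clusters = -(-file_size // cluster_size)  # ceiling division on integers
    return clusters * cluster_size


def slack_bytes(file_size, cluster_size):
    """Reserved but unused bytes at the end of the file's last cluster."""
    return allocated_bytes(file_size, cluster_size) - file_size


# Illustrative numbers only: a file that needs 1.5 clusters of 4096 bytes.
cluster = 4096
small_file = 6144
per_file = slack_bytes(small_file, cluster)  # half a cluster is left unused
total_slack = per_file * 10_000              # the same loss over 10,000 files
```

With these illustrative numbers, each file reserves two clusters and leaves half a cluster unused, so ten thousand such files would leave about 20 MB reserved but empty.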
Jul 19, 2011
Rank: 1 / 5 (5)
The package manager conks out and s**ts itself because it refuses to delete anything before it has completed the previous operation, which can't commence because there's no space to put the files in.
And in Linux, you Do Not Manually Delete anything unless you're willing to become the package manager yourself and figure out where everything is and where it belongs, which isn't nice because the files are shot all over the directory tree with a cannon.
Jul 19, 2011
Rank: 4.3 / 5 (3)
In the Windows system, it would seem to me to be a bit more efficient to have the drivers stored somewhere on the net, Microsoft-hosted, so that when you need a driver for a device it would be downloaded and used.
Why didn't you read when it said "your install will take X space and you have Y space available"? You could also cancel out of the install, and uninstall the package.
And bits of information aren't scattered across the computer when you install a Windows program? 1/2 of which isn't even cleaned up when you use the Windows uninstall feature. At least with Linux I know it's all been removed.
Jul 19, 2011
Rank: not rated yet
Because it didn't. Because I couldn't, because it wouldn't. (It: the Ubuntu software center thingy)
I know it says something like that somewhere there, but when you look for new software it says one thing, and then downloads ten libraries that weren't included in the package. Ubuntu is supposed to be user friendly, but like always they haven't thought it all the way through and eventually you need manual intervention.
True, but then again a great deal of the software is just a folder in the program files, and a couple lines in the registry.
I especially like the fact that I can move a folder to a different drive, and the program usually works without having to make symbolic links to patch it up.
Jul 19, 2011
Rank: 1 / 5 (1)
Which is why certain types of files must be put into pre-determined places in the file system, and the whole thing starts to look like a bunch of angry octopodes wrestling with their tentacles.
You don't know where one thing ends and another one starts, so you can't open the "programs" folder and see "hey this program X takes Y amount of space on my drive, let's delete or re-locate it". You need a special program to do that, and re-location would break all the hard-coded paths.
It's kinda like 1995 again whenever I use it.
Jul 19, 2011
Rank: not rated yet
I swear that has to be the company motto behind some of the spam I receive, which looks like it was poorly translated to Korean and back at least twice. Recycled, I guess...
Jul 19, 2011
Rank: 1 / 5 (2)
On the disk space usage point, yes, but it would induce a horrible lag: "Generic Device found, Hitachi External DVD drive, downloading drivers, 4% complete (2.35 kb/s)"
Which is actually one of the major points I personally have against Linux. It's downright useless if it isn't tethered to the internet constantly. You can't do -anything- unless you're one click away from Google to ask for help, or to download some library or a patch to something. You can't even carry a software package on a USB stick because it's still missing N other things that you didn't know you didn't have or need.
Oh the joy when installing Linux, and the network refuses to work for some reason.
Jul 19, 2011
Rank: not rated yet
Alternatively, the drivers could be stored in a repository online and downloaded as needed. Another option is for device manufacturers to embed the drivers for the device on-board the device itself.
Any of these, or a combination, would reduce wasted space and time needed in scanning useless files.
Jul 19, 2011
Rank: 5 / 5 (2)
My apps do this all the time. In perl, I say:
use FindBin;
my $appDir = $FindBin::Bin;
If my app then wanted to find all the files owned by the user my app is running as, my app could say:
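my @files = qx/find "$appDir" -user "$ENV{USER}"/;  # quoted so paths with spaces survive the shell
chomp @files;  # strip the trailing newline from each path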
What's more, UNIX has a standard /tmp directory where applications can create their temporary files and not have to worry (much) about cleaning them up. Some variants of UNIX clean the /tmp directory every time the system boots.
Jul 19, 2011
Rank: 5 / 5 (1)
That's like deep articles on cosmology and physics, where they always remember to explain what this Light Year thingy is, while never explaining the stuff you'd need to know. (Like what the interviewer could have asked, and now each reader has to spend an hour googling around.)
Jul 20, 2011
Rank: not rated yet
I'm not working with Ubuntu (it's too windowish) but who told you that software can be installed in /root only? Or what do you mean by "system partition"?
And how do you manage to run out of disk space in the age of TB disks?
You don't have to delete manually. Use your package manager, synaptic, to remove an application.