120 petabytes: IBM building largest data storage array ever

(PhysOrg.com) -- As computers big and small grow ever faster, new and better ways to store data must be developed to keep up with demand. It wasn't all that long ago that a 1 gigabyte hard drive on a personal computer seemed the stuff of science fiction. At the high end, ever faster supercomputers require not just more data storage but the ability to save and retrieve data faster as well; otherwise they would spend more and more of their time doing nothing but searching for that data. To address the situation, IBM is apparently hard at work assembling the largest data array ever, which will use the fastest data storage and retrieval system yet devised, allowing for the storage of 120 petabytes of data (120 million gigabytes) using IBM's newly refined GPFS file system, which is capable of indexing 10 billion files in just forty-three minutes.

Housed in IBM's Almaden, California research facility, the new, as yet unnamed data array is reportedly being underwritten by an unnamed client with a lot of money to spend. Given the state of the U.S. government's finances, it seems more likely the client is someone like Microsoft, Apple or Google, though what purpose such a huge array would serve is a mystery. Traditionally, supercomputers and their huge data servers have been used to model weather, military or physics experiments. Now, however, the focus might be on ways to crunch ever more data, ever faster, which sounds a lot like something Google might be interested in doing, especially when you consider how much data is moving onto the cloud.

The array works by connecting 200,000 standard hard drives (likely 1 terabyte each) in a traditional warehouse, though packed more tightly than usual into special extra-wide drawers. The drives will be kept cool by normal air conditioning in conjunction with a water cooling system, something IBM has been using for years with its supercomputers. When finished, the array will dwarf current systems, which are typically able to store just 15 petabytes of data. To make sure drive failures don't slow down computations on the supercomputer, data is stored redundantly across multiple drives, which are striped so that large chunks of data can be accessed simultaneously. When a drive dies, its data is moved slowly to a replacement, spreading the work over a relatively long period and allowing the processor to focus on its main tasks.
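The striping and rebuild scheme described above can be sketched in a few lines of Python. This is only a toy illustration of the general idea (chunks replicated across drives, reads that route around a failed drive), not IBM's GPFS code; every name and number in it is made up for the example.

    # Toy illustration of striping with replication across a pool of drives.
    # This models the general idea from the article, not IBM's actual GPFS code.
    NUM_DRIVES = 8      # hypothetical small pool (the real array has ~200,000 drives)
    REPLICAS = 2        # each chunk lives on two different drives
    CHUNK_SIZE = 4      # tiny chunks, just for demonstration

    drives = [dict() for _ in range(NUM_DRIVES)]   # drive index -> {chunk_id: data}

    def write_striped(data):
        """Split data into chunks and place each chunk on REPLICAS drives."""
        chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
        for chunk_id, chunk in enumerate(chunks):
            primary = chunk_id % NUM_DRIVES
            for r in range(REPLICAS):
                drives[(primary + r) % NUM_DRIVES][chunk_id] = chunk
        return len(chunks)

    def read_striped(num_chunks, failed=frozenset()):
        """Reassemble the data, reading around any drives marked as failed."""
        parts = []
        for chunk_id in range(num_chunks):
            parts.append(next(d[chunk_id] for i, d in enumerate(drives)
                              if i not in failed and chunk_id in d))
        return b"".join(parts)

    n = write_striped(b"120 petabytes of example data")
    # Even with drive 1 "failed", every chunk is still readable from a replica,
    # and a rebuild could copy those surviving replicas to a fresh drive over time.
    assert read_striped(n, failed={1}) == b"120 petabytes of example data"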

The new array, while it may seem like overkill for the moment, is likely to be followed by even larger systems as more and more data storage becomes necessary. The hope, though, is that some new breakthrough will occur that allows for huge increases in capacity without simply adding more and more conventional media to existing technology.



More information: via Technology Review

© 2011 PhysOrg.com

Citation: 120 petabytes: IBM building largest data storage array ever (2011, August 29) retrieved 21 August 2019 from https://phys.org/news/2011-08-petabytes-ibm-largest-storage-array.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.


User comments

Aug 29, 2011
This is a hot contender for "most vague article, ever"

- We don't know who ordered it [insert speculation about some companies].
- We don't know what it's for [insert speculation about some uses].
- And it's "likely to be followed by even larger systems" [insert speculation that there may be future technologies in the future...Duh].

Result: I don't know what I'm supposed to do with an article like that [insert speculation about the aptitude of the author for journalism].

Aug 29, 2011
The singularity has already happened and needs the storage and speed increase to wipe us out. This is evident from all the other "unexplainable" happenings within computer-controlled systems, such as speed-of-light trading in the stock market. This accounts for the unknowns above.

Aug 29, 2011
This is information technology evolution. HD surveillance cameras will fill a 1 TB disk in hours. Supercomputers need super-capacity data storage. 128-bit processors will also happen in the future.

Aug 29, 2011
The information stored in a 100-trillion-synapse brain?
Mapping of a person's brain after death could lead to immortality.

Aug 30, 2011
Mapping of a person's brain after death could lead to immortality.

Making a (working) copy still means that the (biological) original dies at some point. So from the point of view of the individual who was copied, he hasn't gained anything.

Aug 30, 2011
Yes, this new system could hold all the information in the library of congress several thousand times over, or about 1/8 of all the porn on the internet.

Aug 30, 2011
Mapping of a person's brain after death could lead to immortality.

Making a (working) copy still means that the (biological) original dies at some point. So from the point of view of the individual who was copied, he hasn't gained anything.

I doubt a "working" model is possible using current technology. However, in another decade or so a digital version might work, containing all the memories and thoughts of the deceased. Further ahead, nanotechnology may be able to reproduce the brain and body using this saved data and stored DNA. Cryogenics for the digital age.

Sep 01, 2011
...allowing for the storage of 120 petabytes of data (120 million gigabytes)


Correct me if I am wrong, but 1 petabyte is 1024 gigabytes.
So, 120 PB = 120 * 1024 GB, which equals only 122,880 gigabytes.

Where did the author get 120 million? That's a thousand times more! This article has poor quality written all over it.

Sep 02, 2011
1 petabyte is 1024 gigabytes

No. 1 terabyte is 1000 gigabytes. 1 petabyte is indeed a million gigabytes.

(As an aside: if you go by powers of two then it's not gigabytes (GB), terabytes (TB), etc. but gibibytes, tebibytes, ..., with the abbreviations GiB, TiB, etc.)
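(For readers who want to check the arithmetic, here is a short Python illustration of the decimal versus binary prefixes; the 120 PB figure is the article's, everything else is just unit conversion.)

    GB  = 10**9      # gigabyte, SI decimal prefix
    PB  = 10**15     # petabyte
    GiB = 2**30      # gibibyte, binary prefix
    PiB = 2**50      # pebibyte

    print(120 * PB / GB)    # 120000000.0 -> 120 PB is 120 million GB, as the article says
    print(120 * PB / GiB)   # ~111758709  -> somewhat fewer binary gibibytes
    print(PiB / PB)         # ~1.1259     -> a pebibyte is about 12.6% larger than a petabyte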

Sep 02, 2011
The only time 1024 comes into it is when you list the actual bytes.
1024 bytes = 1kB
So when you're converting from kB to MB to GB and wherever from there, it's all 1 to 1.
But when you go back down to bytes, you need to add the extra 24 bytes for each kB.
So, 1PB = 1,000,000,000,000,000,000kB = 1,024,000,000,000,000,000,000 bytes

In actuality, it's all translating down to bytes, and to binary bits from there. The kB, MB, GB, TB, PB, etc., are just for our convenience.

Sep 02, 2011
The only time 1024 comes into it is when you list the actual bytes.
1024 bytes = 1kB
So when you're converting from kB to MB to GB and wherever from there, it's all 1 to 1.
But when you go back down to bytes, you need to add the extra 24 bytes for each kB.
So, 1PB = 1,000,000,000,000,000,000kB = 1,024,000,000,000,000,000,000 bytes


False.

Binary values are powers of 2. 2^10 = 1k, 2^20 = 1m... 2^20 = 1024 * 1024... not 1024 * 1000...

kilo = 2^10
mega = 2^20
giga = 2^30
tera = 2^40
peta = 2^50

I often wonder where people get their information...

Sep 02, 2011
The reason this is done is for computational efficiency. A division/multiplication by 1024 is a simple 10 bit left/right shift that can be done in hardware using a barrel shifter. A division/multiplication by 1000 takes much longer to accomplish by the processor and is done in a sequence of several steps.
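As a small software-side illustration of that point (the barrel-shifter detail above is the commenter's hardware argument), shifting right by 10 bits and integer-dividing by 1024 give the same result for non-negative integers, while dividing by 1000 has no single-shift equivalent:

    n_bytes = 123_456_789
    assert n_bytes >> 10 == n_bytes // 1024            # kibibytes via a 10-bit shift
    assert n_bytes >> 20 == n_bytes // (1024 * 1024)   # mebibytes via a 20-bit shift
    kB_decimal = n_bytes // 1000                       # no single-shift equivalent for 1000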

Sep 02, 2011
kilo = 2^10
mega = 2^20
giga = 2^30
tera = 2^40
peta = 2^50

kilo literally means "a thousand" (not 2^10). It's derived from a Greek word.

("mega" just means "big", but in SI convention mega, giga, etc. are reserved for powers of ten, not two)


Sep 02, 2011
kilo = 2^10
mega = 2^20
giga = 2^30
tera = 2^40
peta = 2^50

kilo literally means "a thousand" (not 2^10). It's derived from a Greek word.

("mega" just means "big", but in SI convention mega, giga, etc. are reserved for powers of ten, not two)



You're dropping context... I was not giving a lesson on etymology.

Sep 02, 2011
Manufacturers use powers of ten for their products (otherwise they'd have to give you more storage for the same label). So you can bet that the terabyte drives mentioned in the article do not contain 2^40 bytes but merely 10^12 bytes each.
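To put rough numbers on that, assuming, as the article suggests, about 200,000 drives of 10^12 bytes each (the headroom remark at the end is an inference, not something IBM has stated):

    TB  = 10**12   # a marketed "1 TB" drive
    TiB = 2**40    # a binary tebibyte

    print(TB / TiB)               # ~0.9095 -> a marketed terabyte is about 9% short of a tebibyte
    print(200_000 * TB / 10**15)  # 200.0   -> 200,000 such drives hold 200 PB raw, leaving
                                  #            headroom for the redundancy behind the 120 PB figure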

Sep 02, 2011
Manufacturers use powers of ten for their products (otherwise they'd have to give you more storage for the same label). So you can bet that the terabyte drives mentioned in the article do not contain 2^40 bytes but merely 10^12 bytes each.


I wouldn't be surprised... but engineers don't have that "ripping off the customer" bias.

Sep 02, 2011
The reason this is done is for computational efficiency. A division/multiplication by 1024 is a simple 10 bit left/right shift that can be done in hardware using a barrel shifter. A division/multiplication by 1000 takes much longer to accomplish by the processor and is done in a sequence of several steps.


Actually, it's because 1024 is divisible by eight, whereas 1000 is not. Eight bits in a byte... Don't put more into it than there actually is.

Sep 02, 2011
I wouldn't be surprised... but engineers don't have that "ripping off the customer" bias.
Unfortunately, (we) engineers aren't the ones who decide stuff like that. I've seen it often enough. Giving the customer maximum benefit is NEVER a winning strategy in a company - making maximum profit is.

(Just look at Apple. They have perfected the sell-overpriced-junk-through-media-hype marketing ploy. Their trinkets are fully 60% profit and the specs are abominable compared to other products on the market.)

Sep 04, 2011
This is a hot contender for "most vague article, ever"

- We don't know who ordered it [insert speculation about some companies].
- We don't know what it's for [insert speculation about some uses].
- And it's "likely to be followed by even larger systems" [insert speculation that there may be future technologies in the future...Duh].

Result: I don't know what I'm supposed to do with an article like that [insert speculation about the aptitude of the author for journalism].
Hey auntie, did you check the link at the end of the article, or did you just jump at the chance to dis somebody? [insert speculation about the nature of auntie's gut bile and lack of energy for exploring references]

Sep 04, 2011
Yes, I did check it, and that article actually has some substance - my gripe was that the type of write-up the PhysOrg editor did here is completely useless. A naked link would have been equally informative.

Sep 04, 2011
Thanks for correcting me!
I forgot there's Tera after Giga and before Peta!

Sep 06, 2011
The reason this is done is for computational efficiency. A division/multiplication by 1024 is a simple 10 bit left/right shift that can be done in hardware using a barrel shifter. A division/multiplication by 1000 takes much longer to accomplish by the processor and is done in a sequence of several steps.


Actually, it's because 1024 is divisible by eight, whereas 1000 is not. Eight bits in a byte... Don't put more into it than there actually is.


And why are there 8 bits in a byte instead of, say, 10?... Because it is itself a power of two.

The answer is that 1024 is a power of 2, and 1000 is not... as I said.

I'm a software and firmware engineer with 15 years of experience.
