
Supercomputers – the science, the fiction, the fact and the future

Issue: Asia-Pacific II 2005
Article no.: 13
Topic: Supercomputers – the science, the fiction, the fact and the future
Author: Scott Houston
Title: Business Development Manager
Organisation: New Zealand Super Computer Centre
PDF size: 100KB

About author

Scott Houston was the Chief Technical Officer at Weta Digital during the production of ‘The Two Towers’ and ‘The Return of the King’, the second and third of the Lord of the Rings films based on J.R.R. Tolkien’s novels. During that period, Weta Digital built the largest computer processing facility in the Southern Hemisphere. After these movies, Weta Digital and Telecom New Zealand partnered to create the New Zealand Supercomputer Centre, and Mr Houston now represents Weta Digital in this exciting new venture. He has over 20 years’ experience in the IT industry. Previously, he was the New Zealand Regional Manager for Silicon Graphics and the Southern Regional Channel Manager for Compaq in New Zealand.

Article abstract

Supercomputers are used to solve problems that are too complex, or too massive, for standard computers. Most of today’s supercomputers consist of arrays of thousands of commodity processors installed in racks. Grid computing, a form of distributed computing, involves sharing computing, application, data, storage or network resources dynamically, using geographically dispersed equipment. Originally, supercomputers and grid computing were used for massive scientific problems. Today, they are often used to handle the peak computing needs of corporations or to animate films.

Full Article

Like many people of my generation, my first encounter with a supercomputer was watching HAL 9000 in ‘2001: A Space Odyssey’, when I was about seven years old. I wonder now, as I did then, what the limits of technology are and where it will take us. After more than 25 years in the computer industry, I have no fear of a future dominated by ‘The Matrix’ or the Skynet of the Terminator movies; instead, I am excited about the possibilities that previously unimaginable amounts of processing power can give us.

The history of supercomputing

A supercomputer can be defined as the most powerful computer, or array of computers, in existence at the time of its construction. Supercomputers have traditionally been used to solve problems that are too complex, or too massive, for standard computers. One of the first machines widely recognised as a supercomputer appeared in the mid-1970s, when Seymour Cray introduced the Cray-1. Because suitable microprocessors were not yet available, its processor was built from individual integrated circuits. Successive generations of supercomputers were developed by Cray, with other companies such as IBM, NEC and Unisys introducing their own versions in the 1970s and 1980s.

Historically, the fastest supercomputers were proprietary systems developed and optimised for specific needs such as code breaking, weather analysis or nuclear simulation. This performance came with a huge price tag, and only selected researchers within these often-secretive facilities had access to the processing power. In the 1980s, the National Science Foundation (NSF) in the USA created the NSF supercomputing centres. Through this programme and its successor, the Partnership for Advanced Computational Infrastructure (PACI), the NSF centres have given the wider community access to high-end computers that had previously been unavailable to most scientists and IT experts.

The Top 500 list of the world’s fastest supercomputers can be found at www.top500.org. In the top spot is IBM’s Blue Gene/L, with over 32,000 processors linked together; it is owned by the US Department of Energy and is used primarily for nuclear test simulation. Second is the ‘Columbia’ system at NASA Ames, with over 10,000 processors in an SGI Altix server. Third on the list is NEC’s ‘Earth Simulator’, with over 5,000 processors.

Clustered computers

The ability to cluster large numbers of PCs together means the Top 500 list is now dominated by clusters (58.5 per cent of systems), and Intel processors account for 63 per cent of the processors driving the latest list. The landscape is changing rapidly: fully 71 per cent of the Top 500 computers were installed in 2004, and only 14 were installed before 2002. The advent of blade technology has commodified these clusters and enabled them to be sold at a fraction of the cost of traditional proprietary systems. IBM, HP, Dell and Sun have all released blade offerings that let businesses grow their processing capability according to need, at a very granular, even processor-by-processor, level. Server blades are mounted vertically in a chassis, with most vendors supporting over 150 processors in a single 42U rack. This has allowed very low-cost entry into the market; a rack of 150 processors is now available for under US$250k.

Of course, this does not mean that the ‘end is nigh’ for mainframes, as most legacy applications are designed to run on very large systems or as a single system image. Consequently, there is still a lot of work involved in porting code from a legacy mainframe to a blade-cluster environment, where applications are ‘parallelised’ to run on multiple processors; this has created a new business sector that companies such as Massively Parallel and Aspeed are addressing.
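To make the idea of ‘parallelising’ concrete, the sketch below splits one large loop across several worker processes using Python’s standard multiprocessing module. It is a minimal illustration only; the workload and the eight-way chunking are invented for the example and are not drawn from any real porting project.

```python
# A minimal sketch of 'parallelising' a workload: a single large loop is
# split into chunks that run on several processors at once. The function
# and data are purely illustrative placeholders.
from multiprocessing import Pool

def simulate_chunk(chunk):
    """Stand-in for one slice of a larger job, e.g. a batch of frames to render."""
    return sum(x * x for x in chunk)          # trivial placeholder work

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::8] for i in range(8)]   # split the job eight ways
    with Pool(processes=8) as pool:           # one worker per processor
        partial_results = pool.map(simulate_chunk, chunks)
    print(sum(partial_results))               # combine the partial results
```

In real ports the hard part is rarely the mechanics shown here, but working out which parts of a legacy application can safely run independently of one another.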
While blade technology certainly offers greater ‘bang for the buck’ than a traditional mainframe, and thousands of CPUs can generally be fitted into a single computer room, this has led to major problems in keeping so many densely packed CPUs cool. As companies start to port their applications from mainframes to server farms, they are finding that their infrastructure, i.e. switches, UPS power and air-conditioning systems, simply cannot cope with hundreds or thousands of servers in a single room.

Grid and distributed computing

Alongside the growth of these server farms is a new phenomenon called ‘Grid computing’. The term was coined in 1997 by Ian Foster and Carl Kesselman, who are now affectionately known as the ‘Grid fathers’. ‘Grid’ is an analogy: it refers to computing resources being made available like electricity from the power grid. Grid computing is a form of distributed computing that involves coordinating and sharing computing, application, data, storage or network resources across dynamic and geographically dispersed organisations. Grid technologies promise to change the way organisations tackle complex computational problems. However, the vision of large-scale resource sharing is still, in great part, in the hands of the research community; Grid computing is an evolving area, where standards and technology are still being developed to enable this new paradigm.

The TeraGrid, in the USA, is probably the best-known supercomputer grid and is one of the largest grid-based infrastructures ever created. The project was launched by the National Science Foundation in August 2001 with US$53 million to fund four sites: the National Center for Supercomputing Applications (NCSA) at the University of Illinois, Urbana-Champaign; the San Diego Supercomputer Center (SDSC) at the University of California, San Diego; Argonne National Laboratory in Argonne, Illinois; and the Center for Advanced Computing Research (CACR) at the California Institute of Technology in Pasadena. In October 2002, the Pittsburgh Supercomputing Center was added to the partnership when the NSF announced US$35 million in supplementary funding. The TeraGrid provides high-bandwidth (over 40 Gbit/s) connections between these sites and applies the shared computing resources to such scientific challenges as the study of cosmological dark matter, real-time weather forecasting, quantum chemistry and biomolecular electrostatics.

In March 2005, the Large Hadron Collider Computing Grid (LCG) project at CERN, the European Organization for Nuclear Research, announced that the computing Grid it operates now includes more than 100 sites in 31 countries, making it the world’s largest international scientific Grid. The Grid was established to deal with the anticipated huge computing needs of the Large Hadron Collider (LHC), currently being built at CERN near Geneva, Switzerland. The sites participating in the LCG project are primarily universities and research laboratories; they contribute more than 10,000 central processing units (CPUs) and a total of nearly 10 million gigabytes of storage capacity on disk and tape. The ‘Grid’ concept takes a very large problem, such as a nuclear physics experiment, and uses a batch-scheduling system to break it up and run it across a shared pool of resources in various locations.
In the late 1990s, researchers at the University of California, Berkeley, were trying to solve a similar problem: analysing the vast amounts of data from radio telescopes to find signals that might indicate extraterrestrial intelligence. Their novel approach was to break the data down into very small work units that could be processed on a PC at home or at work. They announced plans for SETI@home in 1998, and 400,000 people pre-registered during the following year. In May 1999 they released the Windows and Macintosh versions of the client; within a week, about 200,000 people had downloaded and run it, and today that number has grown to over five million. People in 226 countries run SETI@home and have together dedicated over two million years of CPU time to the search for extraterrestrial intelligence. Today, you can also use the spare CPU cycles on your desktop to predict weather patterns (at www.climateprediction.net) or to search for spinning neutron stars, or pulsars (at http://einstein.phys.uwm.edu/).

Utility computing models

Supercomputing is now moving from its traditional scientific sectors into more commercial markets, and access to these systems has never been easier; this has given rise to the latest paradigm, ‘utility computing’. Many in the computer industry view this as a cyclic move back to a mainframe, bureau-type service, with the fundamental differences that infrastructure costs are now far lower and, with grid computing tools, the processing power need not all be co-located in one place. This gives rise to a vision of data processing jobs that can be launched, via the Internet, from anywhere in the world and run on multiple clusters of computers in various locations.

This could prove to be the Holy Grail for many IT managers and CIOs around the world, for two reasons. The first is the promise of virtually unlimited processing power and storage, available and billed on an as-needed basis, to reduce operating costs. The second is the possibility, at a later date, of leveraging their own infrastructure investments and earning incremental revenue by leasing their ‘spare’ CPU cycles to others. Imagine the capacity available in schools, government departments and businesses if, instead of being turned off in the evenings and at weekends, their PCs could be used to run weather modelling or seismic simulations at a commercial rate. It is not as easy as it seems to coordinate disparate systems in different locations, on different networks and with different operating systems and security levels, but batch-scheduling tools such as Condor, from the University of Wisconsin, and the Globus Toolkit do just that. These are open-source applications and, for a small fee, end-user support is often available. For most IT managers, outside schools and universities, this is still too difficult and probably not worth the effort or the cost.

The future

I believe the development of the grid will lead to truly global commercial supercomputer environments, made up of dedicated sites spread around the world and of subscribers that use or provide data processing capacity on demand. This may even lead to a new currency and exchange: the spot price of a spare CPU, based on a gigahertz-per-hour rating.
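To show what billing on such a rating might look like, here is a back-of-the-envelope sketch; the job size and the spot price below are invented purely for illustration and imply nothing about real market rates.

```python
# Hypothetical 'utility computing' bill rated in gigahertz-hours.
# Every figure here is invented for illustration only.
cpus = 200                    # processors leased for the job
clock_ghz = 3.0               # clock speed of each processor, in GHz
hours = 12                    # wall-clock hours the job runs
spot_price = 0.10             # assumed spot price in US$ per GHz-hour

ghz_hours = cpus * clock_ghz * hours
cost = ghz_hours * spot_price
print(f"{ghz_hours:,.0f} GHz-hours at ${spot_price:.2f}/GHz-hour = US${cost:,.2f}")
```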
Behind the scenes there is a lot of jockeying for position in this new market: IBM, for example, offers its ‘Deep Computing On Demand’ initiative from its large data centre in Poughkeepsie. Sun Microsystems has also been very vocal about entering the market. Sun’s objective, senior executives have said, is to provide grid computing power to corporations, academic institutions and government agencies for specific data processing projects, offering a less costly alternative to buying and assembling the computing capacity themselves. “Grid computing promises to be the fourth important new wave of technology that the company has developed in its 24-year history, after workstations, application servers and Java Web services”, noted Sun Chairman and CEO Scott McNealy. Don’t write off Microsoft in this space either; after losing ground to Linux as the operating system of choice for these large clusters, expect to see Microsoft fight back, and it has the budget and the people to make it happen.

The biggest challenge in this new market won’t be security or bandwidth, neither of which is insurmountable, but the philosophical objections to sending your company’s valuable data outside your firewall or to running external processes and tasks inside your network. As with Seymour Cray and the first supercomputers, it will be the visionaries within the industry and the user base who take the first steps towards commercialising this model, and within a few years we won’t understand how we ever got by without it.

