An emerging technology under research these days, Grid computing is widely used in business to wring more profits and productivity out of IT resources.
Grid computing is a network of computation, just as the internet is a network of communication. Different organizations share their tools and resources and contribute to computational problem-solving. In grid technology, an organization may be anything from PCs and servers to mainframes and supercomputers. These are known as virtual organizations.
Unlike conventional networks that focus on communication among devices, grid computing harnesses unused processing cycles of all computers in a network for solving problems too intensive for any stand-alone machine. It uses the power of idle computers to work on difficult computational problems.
This technology will enable universities and research institutions to share their supercomputers, servers and storage capacity, allowing them to perform massive calculations quickly and relatively cheaply.
A well-known grid-computing project is SETI@home, part of the Search for Extraterrestrial Intelligence (SETI), in which PC users worldwide donate unused processor cycles to help search for signs of extraterrestrial life by analyzing signals coming from outer space. This method saves the project both money and resources.
Uses Grid computing was invented in research and academic institutions. Now, it is also being used by businesses. Some of the uses include:
1. Speeding up trade transactions in the financial services industry and crunching huge volumes of data.
2. Securing and integrating vast stockpiles of data by government agencies.
3. Processing, cleaning, cross-tabulating, and comparing huge amounts of data.
Advantages Due to grid computing, we gain:
1. increased personal productivity because of increased access to the resources required;
2. increased corporate productivity by delivering more products in shorter time periods;
3. a reduction in the time to market for new products and services, because development can occur simultaneously on multiple machines.
Types There are three key classes of grids:
1. Cluster grids. These are the simplest grids, consisting of one or more systems working together to provide a single point of access to users in a single project or department.
2. Campus grids. These enable multiple departments within an organization to share computing resources, so that several tasks can be handled on the shared systems.
3. Global grids. These are collections of campus grids that cross organizational boundaries to create very large virtual systems. Users have access to compute power that exceeds the resources available within their own organization.
Evolution Grid computing came into existence as the latest step in a sequence of earlier technologies. This is described below.
First came the mainframes, which were huge computers kept in specially controlled environments in corporate and university laboratories.
Then came a period of desktop machines, which were mini and microcomputers usable by a large community of users. This was followed by client-server and networking technologies and protocols to hook all these machines together and allow them to communicate.
Finally there was the internet, through which users can share information with any other computer on the planet. You can think of the internet as a network of communication, whereas grid computing is a network of computation.
Grid computing can be understood in another way. Consider two types of parallel computing:
1. Symmetric Multiprocessing (SMP), which is done by servers containing two or more processors. The idea is that putting multiple processors in a computer lets it do more work in a shorter time by allocating different tasks to different processors.
2. Massively Parallel Processing (MPP), which contains many processors all doing their work in parallel.
MPP differs from SMP in that SMP uses a small number of processors (eight or fewer) connected by a common data bus. In contrast, MPP uses an arbitrary number of CPUs linked in an arbitrary way, with some complex processor allocation software built on top.
Approach Grid computing is an extension of the MPP concept. The idea is similar: you have a set of computers connected in an arbitrary way, but these systems can reside in different buildings, cities or continents. An example is the North Carolina BioGrid, which currently spans five sites of the University of North Carolina.
Grid computing distributes the processing capability over a number of sites. It is not one vast parallel system but is the combination of many small systems, possibly on different sites. They can be used independently or pooled into any parallel system depending on the demand. Generally speaking, two average-sized systems are less costly to procure than one big one.
Why is it important? Grid computing uses the power of idle computers to perform massive calculations quickly and cheaply.
Most IT departments are being forced to do more with less. Budgets are tight, resources are thin, and skilled human resources can be scarce or expensive.
To top it off, corporate managers know that their organizations have an abundance of idle, unused computing capacity. Mainframes are idle 40 per cent of the time. UNIX servers are actually “serving” something less than 10 per cent of the time. Most PCs do nothing for the better part of the day. These companies don’t need more horsepower, but more efficient use of the horsepower they already have.
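To make these idle figures concrete, here is a minimal arithmetic sketch. The idle fractions come from the numbers quoted above (with PCs taken as idle for at least half the day); the fleet sizes in `HYPOTHETICAL_FLEET` are invented purely for illustration, and counting "machine-hours" ignores the fact that a mainframe hour and a PC hour are not equally powerful.

```python
# Idle fractions taken from the figures quoted in the text.
IDLE_FRACTION = {
    "mainframe": 0.40,    # idle 40 per cent of the time
    "unix_server": 0.90,  # "serving" under 10 per cent, so idle at least 90
    "pc": 0.50,           # assumption: idle for at least half the day
}

# Hypothetical fleet sizes, not taken from any real organization.
HYPOTHETICAL_FLEET = {"mainframe": 2, "unix_server": 50, "pc": 1000}


def idle_machine_hours_per_day(fleet, idle_fraction, hours_per_day=24):
    """Return idle machine-hours per day for each class of machine."""
    return {
        kind: count * idle_fraction[kind] * hours_per_day
        for kind, count in fleet.items()
    }


idle = idle_machine_hours_per_day(HYPOTHETICAL_FLEET, IDLE_FRACTION)
total_idle = sum(idle.values())

for kind, hours in idle.items():
    print(f"{kind}: {hours:.1f} idle machine-hours per day")
print(f"total: {total_idle:.1f} idle machine-hours per day")
```

Even under these rough assumptions, the desktop PCs account for most of the unused hours, which is the capacity a grid tries to put to work.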
With grid computing, resources can be used efficiently. They can be shared across networks.
Key components Grid computing contains the following key components:
1. Security
2. Data management
3. Resource management
4. Information services
Can I build one today? Both proprietary vendor tools and open source tools are available for anyone who needs to build a grid. A good place to start is to download the open source Globus Toolkit 3.0 (GT3). GT3 is the first full-scale implementation of the Open Grid Services Infrastructure (OGSI) standard. The toolkit was developed by the Globus Project and is a set of services and software libraries to support grids and grid applications.
GT3 includes software for security, information infrastructure, resource management, data management and communication. Also available are Commodity Grid Kits (CoG) that provide access to grid services through a particular framework, including Java, Python, and Perl.
Types of tools — Infrastructure. This includes schedulers and resource managers, messaging systems and file transfer mechanisms like GridFTP.
— Directory services. These are based on past successful models, such as LDAP, DNS, network management protocols, and indexing services.
— Schedulers and load balancers. These maximize efficiency. Schedulers ensure that jobs are completed in order and load balancers distribute tasks and data management across systems to decrease the chance of bottlenecks.
— Developer tools. These are used to build a grid. They focus on different niches (file transfer, communications, environment control) and range from utilities to full-blown APIs.
— Security. This means authentication and authorization, that is, controlling who can access grid resources.
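The division of labor between schedulers and load balancers in the list above can be illustrated with a small sketch. It is not how any particular grid toolkit works: the function name `schedule_and_balance`, the job tuples and the node names are all hypothetical. Jobs are taken in submission order (the scheduler's role), and each one goes to whichever node currently has the least estimated work (the load balancer's role).

```python
import heapq
from collections import deque


def schedule_and_balance(jobs, node_names):
    """Assign jobs, in submission order, to the least-loaded node.

    jobs: list of (job_id, estimated_cost) tuples in submission order.
    node_names: list of node identifiers.
    Returns a dict mapping each node name to the job ids it received.
    """
    queue = deque(jobs)  # FIFO queue: jobs are dispatched in order
    # Min-heap of (current_load, node_name): cheapest node is always on top
    heap = [(0, name) for name in node_names]
    heapq.heapify(heap)
    assignment = {name: [] for name in node_names}

    while queue:
        job_id, cost = queue.popleft()
        load, name = heapq.heappop(heap)           # least-loaded node
        assignment[name].append(job_id)
        heapq.heappush(heap, (load + cost, name))  # record its new load
    return assignment


# Hypothetical workload: one expensive job followed by three cheap ones.
plan = schedule_and_balance(
    [("render", 5), ("index", 1), ("backup", 1), ("report", 1)],
    ["node-1", "node-2"],
)
print(plan)
```

In this example the expensive job occupies one node while the three cheap jobs all land on the other, which is the bottleneck-avoiding behavior the text describes. A real scheduler would also handle priorities, failures and data locality.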
Closer look From a developer’s perspective, grids are composed of virtual organizations using a common suite of protocols. These protocols allow users and applications to run services in a secure manner.
Virtual organizations can be servers or desktop PCs in a single room, or systems scattered around the world connected via the internet. All these systems are able to work together because of certain protocols, which control connectivity, resource allocation and management.
The architecture is defined in the Open Grid Services Architecture (OGSA) standard, developed by the Global Grid Forum (GGF). OGSA defines grid services and structure, and GGF tries to bring all the above protocols under it. OGSA is responsible for describing a well-defined set of interfaces from which systems can be built. These interfaces are all based on open standards like the Web Services Description Language (WSDL).
When grid experts talk about an individual service (for example, an information query), they call it a service instance. Service instances might run at scheduled times or at arbitrary times.
Good services provide virtualization. A good set of services can hide the complexity of certain requests. Solid virtualization can transform computing into a ubiquitous grid that is more akin to our current electric and water utilities.
Think about it When you plug an appliance into the wall, you don’t know how electricity flows into that appliance, nor do you know where it comes from. It just works, and you access that electricity grid to perform a task. On the computing front, imagine being able to lease a query tool from a grid only when you need it, and not having to worry about databases, browsers, and operating systems.
Challenges There are many challenges with respect to this technology. For instance, a grid must be able to quickly ascertain what resources are available on any computer that joins it, without being bogged down by a slow or outdated system.
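One common way to approach this discovery problem is to have each machine advertise its own resources to a registry, and to ignore advertisements that have gone stale. The sketch below assumes that design; the class `ResourceRegistry`, its methods and the resource keys (`cpus`, `memory_gb`) are hypothetical names chosen for illustration, not part of any grid standard.

```python
import time


class ResourceRegistry:
    """Tracks what each node last advertised and ignores stale entries."""

    def __init__(self, max_age_seconds=30.0, clock=time.monotonic):
        self._max_age = max_age_seconds
        self._clock = clock    # injectable so the sketch can be tested
        self._entries = {}     # node name -> (time last seen, resources)

    def advertise(self, node, resources):
        """Called by a node when it joins, and periodically afterwards."""
        self._entries[node] = (self._clock(), dict(resources))

    def find(self, **minimums):
        """Return nodes seen recently that meet every minimum requirement."""
        now = self._clock()
        matches = []
        for node, (seen, resources) in self._entries.items():
            if now - seen > self._max_age:
                continue  # slow or vanished node: skip it, do not wait
            if all(resources.get(key, 0) >= need
                   for key, need in minimums.items()):
                matches.append(node)
        return sorted(matches)


registry = ResourceRegistry(max_age_seconds=30.0)
registry.advertise("pc-17", {"cpus": 4, "memory_gb": 8})
registry.advertise("server-2", {"cpus": 16, "memory_gb": 64})
print(registry.find(cpus=8))  # only nodes with at least 8 CPUs
```

Because lookups read only what nodes have already reported, a slow or outdated machine cannot hold up a query; it simply drops out of the results once its last advertisement is too old.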
Another huge issue is making applications work in a grid environment. Right now, most applications work in server or desktop environments, where one set of processors does the work. On a grid, the work can be parceled out to as many systems as are needed. The results are then assembled and sent back to the requesting system.
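The parcel-out-and-assemble pattern can be sketched in a few lines. This is a single-machine stand-in, not a real grid: worker threads play the role of remote systems, and the names `split_into_chunks`, `run_on_grid` and `sum_of_squares` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal chunks."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when items are scarce
            chunks.append(items[start:end])
        start = end
    return chunks


def run_on_grid(items, work, n_nodes=4):
    """Parcel the work out, run each piece, and collect partial results."""
    chunks = split_into_chunks(items, n_nodes)
    # Threads stand in for remote systems; map() keeps results in chunk order
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        return list(pool.map(work, chunks))


def sum_of_squares(chunk):
    """The piece of work each 'system' performs on its own chunk."""
    return sum(x * x for x in chunk)


partials = run_on_grid(list(range(1, 101)), sum_of_squares, n_nodes=4)
total = sum(partials)  # assemble the partial results into the final answer
print(partials, total)
```

The application-porting difficulty the text mentions shows up even here: the job only parallelizes because it can be cut into independent chunks whose results combine with a simple sum.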
Once those applications are ported over to a grid environment, you have to start worrying about how the data is shared, chopped, sifted, moved around, secured, and managed.
The user or application that requested the data needs to be the only entity that gets the data back, and it has to be intelligible.
Security is definitely an enormous requirement, for you don’t want everyone accessing grid resources.
Hence, those who add their systems to a grid will want to control who has access to use their resources. Reliability and performance are also important; if the grid isn’t efficient, then it is of little use.
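The two halves of access control, authentication (who are you?) and authorization (what may you use?), can be separated as in the sketch below. Everything in it is illustrative: the shared `SECRET_KEY`, the token scheme and the `ACL` table are assumptions made for the example, and production grids rely on much stronger mechanisms than a single shared secret.

```python
import hashlib
import hmac

# Demo-only secret, hypothetical; never hard-code real keys.
SECRET_KEY = b"demo-only-secret"

# Hypothetical access-control list: resource name -> users allowed to use it.
ACL = {
    "cluster-a": {"alice"},
    "storage-b": {"alice", "bob"},
}


def issue_token(user):
    """Create a token that proves the grid vouched for this user name."""
    return hmac.new(SECRET_KEY, user.encode("utf-8"), hashlib.sha256).hexdigest()


def authenticate(user, token):
    """Authentication: is the caller really who they claim to be?"""
    return hmac.compare_digest(issue_token(user), token)


def authorize(user, resource):
    """Authorization: has the resource owner granted this user access?"""
    return user in ACL.get(resource, set())


def can_access(user, token, resource):
    """Both checks must pass before a grid resource is used."""
    return authenticate(user, token) and authorize(user, resource)


bob_token = issue_token("bob")
print(can_access("bob", bob_token, "storage-b"))  # authenticated and allowed
print(can_access("bob", bob_token, "cluster-a"))  # authenticated, not allowed
```

Keeping the access list per resource reflects the point made above: owners who contribute systems to the grid decide for themselves who may use them.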
Grid computing remains virtually unexplored, and so many groups are turning their attention towards emerging open standards.
The writer mhrmazhar@yahoo.com is a software engineering student