Thursday, May 15, 2008

Workload Partitioning (WPAR) in AIX 6.1

The most popular innovation of IBM® AIX® Version 6.1 is clearly workload partitioning (WPARs). Once you get past the marketing hype, you'll need to determine the value that WPARs can provide in your environment. What can WPARs do that logical partitions (LPARs) could not? How and when should you use WPARs? Equally important, when should you not use workload partitioning? Finally, how do you create, configure, and administer workload partitions? This article discusses all of these topics.

Introduction

WPARs are a bold new innovation introduced in AIX 6.1. They allow administrators to virtualize the operating system, which means fewer operating system images on your IBM System p™ partitioned server. Prior to WPARs, you needed to create a new logical partition (LPAR) for each new "isolated" environment. This is no longer necessary (on AIX 6.1 only), as there are many circumstances in which you can get along fine with multiple WPARs within one LPAR. Why is this important? Every LPAR requires its own operating system image and a certain number of physical resources. While you can virtualize many of these resources, there are still some physical resources that must be allocated to each system. Furthermore, you need to install patches and technology upgrades on each LPAR, and each LPAR requires its own archiving and DR strategy. It also takes time to create an LPAR, and you must do so outside of AIX, through a Hardware Management Console (HMC) or the Integrated Virtualization Manager (IVM).

WPARs are much simpler to manage and can be created from the AIX command line or through SMIT; LPARs cannot. By far the biggest disadvantage of LPARs is maintaining multiple OS images, along with the risk of over-committing expensive hardware resources such as CPU and RAM. In other words, while partitioning helps you consolidate and virtualize hardware within a single box, operating system virtualization through WPAR technology goes one step further and allows an even more granular approach to resource management. Because WPARs share a single OS image, they make the most efficient use of CPU, RAM, and I/O resources.

Rather than a replacement for LPARs, WPARs complement them, allowing you to further virtualize application workloads through operating system virtualization. WPARs also allow new applications to be deployed much more quickly, which is an important side benefit. On the other side of the coin, it's important to understand the limitations of WPARs. For example, each LPAR is a single point of failure for all WPARs created within it. In the event of an LPAR problem (or a scheduled system outage, for that matter), all underlying WPARs are also affected.


WPARs: How and when to use them

This section further defines the different types of workload partitions and discusses scenarios where WPARs should be used.

As discussed earlier, Workload Partitions (WPARs) are virtualized operating system environments that are created within a single AIX (only supported on AIX 6.1) image. While they may be self-contained in the sense that each WPAR has its own private execution environment with its own filesystems and network addresses, they still run inside the global environment. The global environment -- the actual LPAR -- owns all the physical resources of the logical partition. It is important to also note that the global environment can see all the processes running inside the specific WPARs.

There are two types of WPARs: system workload partitions and application workload partitions. The system WPAR is much closer to a complete version of AIX. The system WPAR has its own dedicated, completely writable filesystems along with its own inetd and cron. Application WPARs are real, lightweight versions of virtualized OS environments. They are extremely limited and can only run application processes, not system daemons such as inetd or cron. One cannot even define remote access to this environment. These are only temporarily objects; they actually disintegrate when the final process of the application partition ends, and as such, are more geared to execute processes than entire applications. Overall, WPARs have no real dependency on hardware and can even be used on POWER4 systems that do not support IBM's PowerVM (formerly known as APV). For AIX administrators, the huge advantage of WPARs is the flexibility of creating new environments without having to create and manage new AIX partitions. Let's look at some scenarios that call for the use of WPARs.

There are two types of WPARs: system workload partitions and application workload partitions. The system WPAR is much closer to a complete version of AIX: it has its own dedicated, completely writable filesystems along with its own inetd and cron. Application WPARs are lightweight virtualized OS environments. They are extremely limited and can only run application processes, not system daemons such as inetd or cron; you cannot even define remote access to this environment. They are only temporary objects; they actually disintegrate when the final process of the application partition ends and, as such, are geared more toward executing processes than entire applications. Overall, WPARs have no real dependency on hardware and can even be used on POWER4 systems that do not support IBM's PowerVM (formerly known as APV). For AIX administrators, the huge advantage of WPARs is the flexibility of creating new environments without having to create and manage new AIX partitions. Let's look at some scenarios that call for the use of WPARs.

Application/workload isolation

WPARs are tailor-made for working with test and/or QA and development environments. Most larger organizations have at least three environments for their applications. These include development, test, and production. Some environments have as many as five, including demo/training and stress/integration environments. Let's use an example of a common three-tier application environment: Web, application server, and database server. In the land of the LPARs, in an environment where one has five isolated environments, you would need to create 15 LPARs. This is where the WPAR has the most value. In this environment, we would need to create just five LPARs. How is that?

In Table 1, we have five different environments, each consisting of a Web server, an application server, and a database server. If we wanted to isolate our environments, the only way to do this would be through logical partitioning. That would involve architecting 15 logical partitions. Of course, we could run some of our Web, application, and database servers on one LPAR, but if we did that, how would we really mimic our production environment (which would run on separate partitions)? In today's world of 99.9% availability, it is extremely common to give each application environment its own home. With WPARs, we can now do that without having separate AIX images.


Table 1. Web portal -- LPARs only
 
Development (3 LPARs)   Demo/Training (3 LPARs)   Test (3 LPARs)    Pre-Prod (3 LPARs)   Production (3 LPARs)
1. Dweb01               4. Trweb01                7. Tstweb01       10. Ppweb01          13. Pweb01
2. Dapp01               5. Trapp01                8. Tstapp01       11. Ppapp01          14. Papp01
3. Dora01               6. Trora01                9. Tstora01       12. Ppora01          15. Pora01

Table 2 illustrates how that is done. Each environment would have its own LPAR, with three WPARs created within each LPAR. Now let's imagine if we had four Web servers, two application servers, and two database servers supporting this environment. Yikes! AIX administrators supporting Fortune 500 companies know what I'm talking about. It can be a nightmare maintaining all these environments. WPARs dramatically simplify the overall work effort involved in administering this environment, while at the same time minimizing the expense of having to assign physical resources to logical partitions.


Table 2. Web portal -- WPARs inside of LPARs
 
Development         Demo/Training       Test                Pre-Prod            Production
(1 LPAR, 3 WPARs)   (1 LPAR, 3 WPARs)   (1 LPAR, 3 WPARs)   (1 LPAR, 3 WPARs)   (1 LPAR, 3 WPARs)
1. Dwparweb01       2. Trwparweb01      3. Tstwparweb01     4. Ppwparweb01      5. Pwparweb01
1. Dwparapp01       2. Trwparapp01      3. Tstwparapp01     4. Ppwparapp01      5. Pwparapp01
1. Dwparora01       2. Trwparora01      3. Tstwparora01     4. Ppwparora01      5. Pwparora01
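A layout like Table 2's lends itself to scripting from the global environment. The sketch below is a dry run: it only echoes the mkwpar commands it would execute, and the environment prefixes and tier names are illustrative, not mandated by anything in AIX.

```shell
#!/bin/sh
# Dry-run sketch: print the mkwpar commands that would create three tier
# WPARs (web, app, ora) inside each of the five environment LPARs.
# On a real AIX 6.1 system you would run the commands instead of echoing them.
gen_wpar_cmds() {
    for env in d tr tst pp p; do      # dev, training, test, pre-prod, prod
        for tier in web app ora; do
            echo "mkwpar -n ${env}wpar${tier}01"
        done
    done
}
gen_wpar_cmds
```

Running the real commands in sequence would build all 15 WPARs across the five LPARs, one environment at a time.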

Playing nicely in the sandbox

In virtually every environment I've managed, my staff has begged to have sandbox environments in which to work. These environments would be used only by the systems administrators. It is here that administrators have the opportunity to install new software, test out new patches, install new technology levels, and generally be free to break the system without any effect on the business. Unfortunately, the sandbox is always the first environment that must be given up when a new application needs to be deployed. With WPARs, you can quickly create an isolated environment in which to play. My preference is to have several WPAR sandboxes within an overall LPAR sandbox, each owned by a different administrator, and this is now far less of a luxury than it used to be. Looking at this from another perspective, these WPARs are the training ground for new administrators to learn and practice their craft. With WPARs, sandboxes can be managed much more efficiently and created without having to assign dedicated devices to them.

Quickly testing an application

The application WPAR can be created in just a few seconds. What better way is there to quickly troubleshoot an application or wayward process? Because these are temporary resources, they are destroyed as soon as the application ends, simplifying the manageability of these partitions.


WPARS: When not to use them

This section discusses situations and scenarios where you may not want to use WPARs.

Security

As stated previously, WPAR processes can be seen by the global environment from the central LPAR. If you are running a highly secure type of system, this may be a problem for you from a security standpoint. Further, the root administrator of your LPAR will now have access to your workload partition, possibly compromising the security that the application may require.

Performance

Each WPAR within an LPAR shares the system resources of that LPAR. You need to be that much more careful when architecting your system and also when stress testing it. For example, if you're running a performance benchmark on your pre-production system after a new build has been deployed, and developers are working on the application server while you are testing the database, all of this happens within one LPAR sharing the same resources. Your teams will need to understand that they are now competing for the same resources.

Availability

If you are in an environment where it is very difficult to bring a system down, it's important to note that when performing maintenance on an LPAR, every WPAR defined within it will be affected. Likewise, if there is a system panic and AIX crashes, every WPAR goes down with it. From this standpoint, LPARs without WPARs can provide increased availability across your environment, albeit at a cost that may be prohibitive.

Production

I'm extremely conservative when it comes to production. I like to run each tier in production within its own logical partition. I do this because I like the granularity and complete OS isolation that LPARs provide, without having multiple environments (Web, application, and database) to worry about.

Physical devices

Physical devices are not supported within a WPAR. While there is a way to export devices, this can be a big problem for applications that require non-exportable devices. In this case, they would be restricted to running only in the global environment. For example, Oracle RAC is not supported in Solaris zones because of this limitation, and it is unlikely to work in a WPAR environment for the very same reason.


Creating, configuring, and administering WPARs

This section shows how to create, configure, and administer both system and application WPARs.

System WPARs

The mkwpar command creates the WPAR, installs the filesystems, and prepares the system (see Listing 1). It also synchronizes the root section of the installed software.


Listing 1. The mkwpar command
 
                
lpar5ml162f_pub[/] > mkwpar -n devpayrollWPAR01
mkwpar: Creating file systems...
 /
 /home
 /opt
 /proc
 /tmp
 /usr
 /var

<< End of Success Section >>

FILESET STATISTICS
------------------
  241  Selected to be installed, of which:
      241  Passed pre-installation verification
  ----
  241  Total to be installed

+-----------------------------------------------------------------------------+
                         Installing Software...
+-----------------------------------------------------------------------------+


Filesets processed:  6 of 241  (Total time:  2 secs).

installp:  APPLYING software for:
        X11.base.smt 6.1.0.1
Filesets processed:  7 of 241  (Total time:  3 secs).
installp:  APPLYING software for:
        X11.help.EN_US.Dt.helpinfo 6.1.0.0
Filesets processed:  8 of 241  (Total time:  3 secs).
installp:  APPLYING software for:
        bos.acct 6.1.0.1
Filesets processed:  9 of 241  (Total time:  3 secs).
installp:  APPLYING software for:
        bos.acct 6.1.0.2
Filesets processed:  10 of 241  (Total time:  4 secs).
installp:  APPLYING software for:
        bos.adt.base 6.1.0.0
        bos.adt.insttools 6.1.0.0
Filesets processed:  12 of 241  (Total time:  4 secs).
installp:  APPLYING software for:
        bos.compat.links 6.1.0.0
        bos.compat.net 6.1.0.0
        bos.compat.termcap 6.1.0.0

Workload partition devpayrollWPAR01 created successfully.
mkwpar: 0960-390 To start the workload partition, execute the 
following as root: startwpar [-v] devpayrollWPAR01

 

Depending on the type of system you are using, this generally takes between two and four minutes. It took me two minutes and 40 seconds, installing 241 filesets on a one-CPU POWER5 processor running at 1654 MHz. To check the status of the WPAR, use the lswpar command (see Listing 2).


Listing 2. Use the lswpar command to check the status of the WPAR
 
                
lpar5ml162f_pub[/] > lswpar
Name              State  Type  Hostname          Directory
-------------------------------------------------------------------------
MyTestWpar1       A      S     MyTestWpar1       /wpars/MyTestWpar1
MyTestWpar2       A      S     MyTestWpar2       /wpars/MyTestWpar2
devpayrollWPAR01  D      S     devpayrollWPAR01  /wpars/devpayrollWPAR01

 

In this case, it is still in what is called the "defined state." We'll need to use the startwpar command to make it active (see Listing 3).


Listing 3. Using the startwpar command
 
                
lpar5ml162f_pub[/] > startwpar -v devpayrollWPAR01
Starting workload partition devpayrollWPAR01.
Mounting all workload partition file systems.
Mounting /wpars/devpayrollWPAR01
Mounting /wpars/devpayrollWPAR01/home
Mounting /wpars/devpayrollWPAR01/opt
Mounting /wpars/devpayrollWPAR01/proc
Mounting /wpars/devpayrollWPAR01/tmp
Mounting /wpars/devpayrollWPAR01/usr
Mounting /wpars/devpayrollWPAR01/var
Loading workload partition.
$corral_t = {
              'name' => 'devpayrollWPAR01',
              'wlm_cpu' => [
                             undef,
                             undef,
                             undef,
                             undef
                           ],
              'path' => '/wpars/devpayrollWPAR01',
              'hostname' => 'devpayrollWPAR01',
              'wlm_procVirtMem' => [
                                     -1,
                                     undef
                                   ],
              'wlm_mem' => [
                             undef,
                             undef,
                             undef,
                             undef
                           ],
              'key' => 3,
              'vips' => [],
              'wlm_rset' => undef,
              'opts' => 4,
              'id' => 0
            };
Exporting workload partition devices.
Starting workload partition subsystem cor_devpayrollWPAR01.
0513-059 The cor_devpayrollWPAR01 Subsystem has been started. Subsystem PID is 753708.
Verifying workload partition startup.
Return Status = SUCCESS.
lpar5ml162f_pub[/] >

 

You can now see that it is in the active state (see Listing 4).


Listing 4. The WPAR is in an active state
 
                
lpar5ml162f_pub[/] > lswpar
Name              State  Type  Hostname          Directory
-------------------------------------------------------------------------
MyTestWpar1       A      S     MyTestWpar1       /wpars/MyTestWpar1
MyTestWpar2       A      S     MyTestWpar2       /wpars/MyTestWpar2
devpayrollWPAR01  A      S     devpayrollWPAR01  /wpars/devpayrollWPAR01

To log in, we'll use the clogin command with the hostname of the WPAR. 

Let's log in: lpar5ml162f_pub[/] > clogin devpayrollWPAR01
*******************************************************************************
*                                                                             *
*                                                                             *
*  Welcome to AIX Version 6.1!                                                *
*                                                                             *
*                                                                             *
*  Please see the README file in /usr/lpp/bos for information pertinent to    *
*  this release of the AIX Operating System.                                  *
*                                                                             *
*                                                                             *
*******************************************************************************

 

Let's run some standard AIX commands (see Listing 5).


Listing 5. Some standard AIX commands
 
                
# hostname
devpayrollWPAR01
# w
  10:59AM   up 13 mins,  1 user,  load average: 0.00, 0.00, 0.00
User     tty          login@       idle      JCPU      PCPU what
root     Global      10:59AM          1         0         0 -
# whoami
root
# ps -ef
     UID    PID   PPID   C    STIME    TTY  TIME CMD
    root 258064 573578   0 10:47:42      -  0:00 /usr/sbin/sshd
    root 340006 573578   0 10:47:55      -  0:00 /usr/sbin/rsct/bin/IBM.Servic
    root 356468 573578   0 10:47:56      -  0:00 /usr/sbin/rsct/bin/IBM.AuditR
    root 421948 573578   0 10:47:41      -  0:00 /usr/sbin/rpc.lockd -d 0
    root 471122      1   0 10:47:23      -  0:00 /usr/lib/errdemon
    root 504032 573578   0 10:47:42      -  0:00 /usr/dt/bin/dtlogin
    root 508124 643204  28 11:00:15      ?  0:00 ps -ef
    root 512114 573578   0 10:47:39      -  0:00 /usr/sbin/portmap
    root 561344 573578   0 10:47:56      -  0:00 /usr/sbin/rsct/bin/IBM.CSMAge
    root 573578      1   0 10:47:33      -  0:02 /usr/sbin/srcmstr
    root 602286      1   0 10:47:41      -  0:00 /usr/sbin/cron
    root 606358 573578   0 10:47:41      -  0:00 /usr/sbin/qdaemon
    root 630928      1   0 10:59:02      ?  0:00 clogin devpayrollWPAR01
    root 635076 573578   0 10:47:39      -  0:00 sendmail: accepting connectio
    root 643204 630928   0 10:59:02      ?  0:00 -ksh
    root 651276 573578   0 10:47:39      -  0:00 /usr/sbin/biod 6
    root 655560 573578   0 10:47:41      -  0:00 /usr/sbin/writesrv
    root 737494 573578   0 10:47:54      -  0:00 /usr/sbin/rsct/bin/rmcd -a IB
    root 741406 573578   0 10:47:39      -  0:00 /usr/sbin/inetd
    root 749714 573578   0 10:47:38      -  0:00 /usr/sbin/syslogd
    root      1      0   0 10:47:21      -  0:00 /etc/init
#

 

The WPAR's systems administrator can start and stop processes from the WPAR using the SRC or the command line, just as they would from the global environment. As the global (LPAR) systems administrator, you will note that a WPAR has many filesystems; the WPAR environment is created under /wpars (see Listing 6).


Listing 6. The WPAR environment under /wpars
 
                
lpar5ml162f_pub[/wpars/devpayrollWPAR01/wpars] > hostname
lpar5ml162f_pub
# df -k
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4           131072     19472   86%     8278    62% /
/dev/hd2          3538944    150480   96%    91842    70% /usr
/dev/hd9var        262144    246796    6%      522     1% /var
/dev/hd3           262144    259540    1%       56     1% /tmp
/dev/hd1           131072    130688    1%        8     1% /home
/dev/hd11admin      131072    130708    1%        5     1% /admin
/proc                   -         -    -         -     -  /proc
/dev/hd10opt       262144    119804   55%     3048    11% /opt
/dev/fslv12        131072    103476   22%     2244     9% /wpars/devpayrollWPAR01/ora01
/dev/fslv13        131072    128660    2%        5     1% /wpars/devpayrollWPAR01/home
/opt               262144    119804   55%     3048    11% /wpars/devpayrollWPAR01/opt
/proc                   -         -    -         -     -  /wpars/devpayrollWPAR01/proc
/dev/fslv14        131072    128424    3%        9     1% /wpars/devpayrollWPAR01/tmp
/usr              3538944    150480   96%    91842    70% /wpars/devpayrollWPAR01/usr
/dev/fslv15        131072    116448   12%      370     2% /wpars/devpayrollWPAR01/var

Here is the view from the WPAR:

# hostname
devpayrollWPAR01
# df -k
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/fslv12        131072    103476   22%     2244     9% /
/dev/fslv13        131072    128660    2%        5     1% /home
/opt               262144    119804   55%     3048    11% /opt
/proc                   -         -    -         -     -  /proc
/dev/fslv14        131072    128424    3%        9     1% /tmp
/usr              3538944    150480   96%    91842    70% /usr
/dev/fslv15        131072    116448   12%      370     2% /var

 

Creating filesystems

Let's turn our attention back to the global environment and create a filesystem through SMIT. You cannot create a filesystem or volume group from within the WPAR, only from the global environment (LPAR).

We need to make sure that the full path of the filesystem (including the WPAR path) is specified (see Figure 1).


Figure 1. The full path of the filesystem is specified in SMIT
 

Figure 2 shows that the file system has been created successfully.


Figure 2. The file system has been created successfully
 

After it's successfully created, you'll need to make one minor change to the filesystem: the mount group needs to be explicitly defined (see Figure 3). You can do this through the SMIT fastpath # smit chjfs2. Note that this step is not necessary when you create the filesystem from the command line.


Figure 3. Explicitly defining the mount group
 

Now let's turn back to the WPAR, where you'll create the mountpoint and mount the newly created filesystem (see Listing 7).


Listing 7. Creating the mountpoint and mounting the filesystem
 
                
# mkdir ora
# pwd
/
# mount ora /ora01
# df -k
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/fslv12        131072    103444   22%     2246     9% /
/dev/fslv13        131072    128660    2%        5     1% /home
/opt               262144    119804   55%     3048    11% /opt
/proc                   -         -    -         -     -  /proc
/dev/fslv14        131072    128424    3%        9     1% /tmp
/usr              3538944    150480   96%    91842    70% /usr
/dev/fslv15        131072    116448   12%      370     2% /var
/ora               131072    103444   22%     2246     9% /ora01
#

 

Note that you also cannot increase the size of a filesystem from the WPAR, only from the global environment. You also cannot serve NFS filesystems from within the WPAR; only NFS clients are supported.
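Because filesystem creation happens only in the global environment, the SMIT steps above can also be scripted there with crfs. This is a dry-run sketch: it only prints the command it would run, and the volume group, size, and mount-point name are illustrative assumptions, so verify the crfs flags (including -u for the mount group) on your own system.

```shell
#!/bin/sh
# Dry-run sketch: print a crfs command that would create a JFS2 filesystem
# under the WPAR's path and set the mount group in one step.
# "rootvg", "size=128M", and the "/data" mount point are illustrative.
make_wpar_fs_cmd() {
    wpar=$1
    echo "crfs -v jfs2 -g rootvg -a size=128M -m /wpars/${wpar}/data -u ${wpar}"
}
make_wpar_fs_cmd devpayrollWPAR01
```

Setting the mount group on the command line is what makes the separate SMIT change step unnecessary.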

Backups

Remember, there are no physical devices in a WPAR. When backing up the WPAR environment, you need to use the savewpar command, again from the global environment (see Listing 8).


Listing 8. Using the savewpar command
 
                
lpar5ml162f_pub[/wpars/devpayrollWPAR01/wpars] > savewpar 
                   -f /admin/payroll.backup devpayrollWPAR01

Creating information file for workload partition devpayrollWPAR01.

Creating list of files to back up.
Backing up 2829 files
2829 of 2829 files (100%)
0512-038 savewpar: Backup Completed Successfully.
lpar5ml162f_pub[/wpars/devpayrollWPAR01/wpars] >

 

You can restore using the restwpar command.
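If you run several WPARs per LPAR, the savewpar step can be looped over every WPAR from the global environment. The sketch below only echoes the commands rather than running them, the /admin backup path is an assumption, and the list of WPAR names is simulated; on AIX 6.1 you would feed it real names (for example, from lswpar output).

```shell
#!/bin/sh
# Dry-run sketch: for each WPAR name, echo the savewpar command that would
# back it up to /admin (path is illustrative). The name list is simulated;
# on a real system, derive it from lswpar output instead.
backup_all_wpars() {
    wpar_names="MyTestWpar1
MyTestWpar2
devpayrollWPAR01"
    for w in $wpar_names; do
        echo "savewpar -f /admin/${w}.backup ${w}"
    done
}
backup_all_wpars
```

Each resulting archive can later be restored with restwpar, as noted above.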

Users and groups

You can maintain users and groups within the WPAR, either from the command line or through SMIT. You should understand that the root user for this environment does not have access to the global environment, only to the WPAR (see Listing 9).


Listing 9. Maintaining users and groups within the WPAR
 
                
# mkuser test
# mkgroup testing
# hostname
devpayrollWPAR01
# lsuser
Usage: lsuser [-R load_module] [ -c | -f ] [ -a attr attr ... ] 
                                    { "ALL" | user1,user2 ... }
# lsuser test
test id=204 pgrp=staff groups=staff home=/home/test shell=/usr/bin/ksh 
login=true su=true rlogin=true daemon=true admin=false sugroups=ALL admgroups= 
tpath=nosak ttys=ALL expires=0 auth1=SYSTEM auth2=NONE umask=22 registry=files 
SYSTEM=compat logintimes= loginretries=0 pwdwarntime=0 account_locked=false 
minage=0 maxage=0 maxexpired=-1 minalpha=0 minother=0 mindiff=0 maxrepeats=8 minlen=0 
histexpire=0 histsize=0 pwdchecks= dictionlist= default_roles= fsize=2097151 cpu=-1 
data=262144 stack=65536 core=2097151 rss=65536 nofiles=2000 roles=
# lsgroup testing
testing id=203 admin=false users= registry=files
#

 

Now let's turn our attention back to the global environment. You can clearly see in Listing 10 that the user was not created in the global environment, only within that specific WPAR.


Listing 10. The user was not created in the global environment
 
                
lpar5ml162f_pub[/wpars/devpayrollWPAR01/wpars] > lsuser test
3004-687 User "test" does not exist.
lpar5ml162f_pub[/wpars/devpayrollWPAR01/wpars] >

 

WPAR manager

It's worth noting that there is a graphical tool called WPAR Manager, which is Java™ based and allows for the centralized management of WPARs (see Figure 4).


Figure 4. WPAR Manager
 

While a thorough review of this utility is outside the scope of this article, it's definitely worth looking at, because using it will increase your ability to manage the overall environment. It will also help you harness innovations such as Workload Partition Manager and WPAR Mobility. Workload Partition Manager allows for resource optimization, letting you distribute workloads more efficiently throughout your managed system. WPAR Mobility allows you to move running partitions from one frame to another, which increases the availability of workloads during scheduled outages.

Application WPARs

To reiterate, an application WPAR is defined as a WPAR that allows an application and/or a process to run inside of it, similar to a wrapper. It is only temporary, not a permanent object, and it will end when the application and/or process ends. To create one, use the wparexec command.


Listing 11. Using the wparexec command to create an application WPAR
 
                
lpar5ml162f_pub[/wpars/devpayrollWPAR01/wpars] > wparexec -n templs1 /usr/bin/ls
Starting workload partition templs1.
Mounting all workload partition file systems.
Loading workload partition.
devpayrollWPAR01
Shutting down all workload partition processes.
lpar5ml162f_pub[/wpars/devpayrollWPAR01/wpars] >

 

To see how the process works, run lswpar while the application is executing; you will see the transient WPAR in the list (see Listing 12).


Listing 12. Seeing the creation of the WPAR
 
                
 lpar5ml162f_pub[/] > lswpar
Name              State  Type  Hostname          Directory
-------------------------------------------------------------------------
MyTestWpar1       A      S     MyTestWpar1       /wpars/MyTestWpar1
MyTestWpar2       A      S     MyTestWpar2       /wpars/MyTestWpar2
devpayrollWPAR01  A      S     devpayrollWPAR01  /wpars/devpayrollWPAR01
evpayrollWPAR01   D      S     evpayrollWPAR01   /wpars/evpayrollWPAR01
templs1           T      A     templs1           /

 

When the process completes, the WPAR is gone, just as fast as it was created (see Listing 13).


Listing 13. The process is gone
 
                
lpar5ml162f_pub[/] > lswpar
Name              State  Type  Hostname          Directory
-------------------------------------------------------------------------
MyTestWpar1       A      S     MyTestWpar1       /wpars/MyTestWpar1
MyTestWpar2       A      S     MyTestWpar2       /wpars/MyTestWpar2
devpayrollWPAR01  A      S     devpayrollWPAR01  /wpars/devpayrollWPAR01
evpayrollWPAR01   D      S     evpayrollWPAR01   /wpars/evpayrollWPAR01
lpar5ml162f_pub[/] >

 

Truthfully, although it's impressive that you can create application WPARs in a matter of seconds (a feature that Solaris does not have), I think they are most useful for providing additional flexibility for testing purposes.


Summary

This article introduced WPARs and discussed the context in which to use them. It looked at various scenarios in which WPARs should be used, and discussed the installation, configuration, and administration of WPARs and how they relate to the global (LPAR) environment. You added users, created filesystems, and backed up WPARs. The article also introduced utilities such as WPAR Manager, which can help you manage the WPAR environment. It looked at the different types of WPARs that are available, the limitations of application WPARs compared to system WPARs, and scenarios in which WPARs may not be the right choice. The bottom line is that WPARs are an important innovation of AIX 6.1 and, used judiciously, can increase your ability to effectively manage your system and reduce cost to the business.

Using Simple Network Management Protocol

The Simple Network Management Protocol (SNMP) is built in to many devices, but often the tools and software that can read and parse this information are too large and complicated when you only want to check a quick statistic or track a particular device or issue. This article looks at some simplified methods for getting SNMP information from your devices and how to integrate this information into the rest of your network's data map.

About this series

The typical UNIX® administrator has a key range of utilities, tricks, and systems he or she uses regularly to aid in the process of administration. There are key utilities, command line chains, and scripts that are used to simplify different processes. Some of these tools come with the operating system, but a majority of the tricks come through years of experience and a desire to ease the system administrator's life. The focus of this series is on getting the most from the available tools across a range of different UNIX environments, including methods of simplifying administration in a heterogeneous environment.


SNMP basics

There are many ways you can monitor your UNIX server. See the Resources for some examples of the type of monitoring available. Monitoring a single server is not a problem, but monitoring the same information across a number of servers can present problems. If one of the servers you are in charge of runs out of disk space, you want to know about it before it starts to affect your users and clients.

Monitoring multiple servers in this way, especially if they use a variety of different operating systems, can be a problem. The differences in command line tools, output formats, values, and other information all complicate what should otherwise be a simple process. What is required is a solution that provides a generic interface to the information that works, irrespective of the UNIX variant you are using.

The Simple Network Management Protocol (SNMP) provides a method for managing information about different systems. An agent runs on each system and reports information using SNMP to different managing systems.

SNMP is often a built-in component of network devices such as routers and switches, and is frequently the only method available for retrieving statistics and status information remotely (without logging in to some sort of interface). On most hosts, you need to explicitly run SNMP agent software to expose information about the host over the SNMP protocol.

Information can be retrieved from an agent explicitly, by requesting it with a GET request, or the agent can broadcast information to management systems using TRAP or INFORM messages. In addition, managing systems can set information and parameters on the agent, but this is usually only used to change the network configuration.

The types of information that can be shared can be quite varied. It can be everything from network settings, statistics, and metric data for network interfaces, through to monitoring CPU load and disk space.

The SNMP standard does not define what information the agent returns; instead, the available information is defined by Management Information Bases (MIBs). MIBs define the structure of the information that is returned and are organized into a hierarchy of object identifiers (OIDs). You access information within an agent by requesting data at a specific location within the MIB structure.

For example, some of the more common IDs are shown in Listing 1.


Listing 1. SNMP object IDs
 
                
sysDescr.0      1.3.6.1.2.1.1.1.0
sysObjectId.0   1.3.6.1.2.1.1.2.0
sysUpTime.0     1.3.6.1.2.1.1.3.0
sysContact.0    1.3.6.1.2.1.1.4.0
sysName.0       1.3.6.1.2.1.1.5.0
sysLocation.0   1.3.6.1.2.1.1.6.0
sysServices.0   1.3.6.1.2.1.1.7.0
ifNumber.0      1.3.6.1.2.1.2.1.0

 

You can see from this list that the OIDs are numerical and, effectively, in sequence. When obtaining information, you can use a GET request to obtain a specific value, or GETNEXT to get the property that follows the last one you read. You can also use the names. The names shown above are all part of the system tree, so you can read the uptime value by requesting the OID 'system.sysUpTime.0'.
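The tree ordering that GETNEXT walks can be sketched in a few lines. This is purely illustrative (Python rather than any SNMP tool): OID components compare as integers, not as text, which is why a plain string sort of OIDs gives the wrong order.

```python
# Illustrative sketch: the component-wise numeric ordering that a
# GETNEXT walk of the MIB tree follows.

def oid_key(oid: str):
    """Convert a dotted OID string into a tuple of integers for comparison."""
    return tuple(int(part) for part in oid.strip(".").split("."))

oids = [
    "1.3.6.1.2.1.1.10.0",  # hypothetical OID, numerically after ...1.2.0
    "1.3.6.1.2.1.1.2.0",   # sysObjectId.0
    "1.3.6.1.2.1.1.1.0",   # sysDescr.0
]

walk_order = sorted(oids, key=oid_key)
# A plain string sort would place "...1.10.0" before "...1.2.0";
# the numeric tree order does not.
```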

The values that you read are also of specific types. You can read integer, floating-point, and string values, all of which are defined as 'scalar' objects. Some of these types carry specific significance: for example, time interval values are reported as 'timeticks,' or hundredths of a second, and need to be converted into a more readable, human form before being displayed. There are also MIB objects that return tabular data. This is handled by returning additional OID instances that can be grouped together to make an SNMP table.
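The timeticks conversion just described is straightforward to reproduce. Here is a small sketch (in Python, purely for illustration) that turns hundredths of a second into the days, hours:minutes:seconds form shown by the command-line tools later in this article:

```python
def format_timeticks(ticks: int) -> str:
    """Convert SNMP timeticks (hundredths of a second) into a
    human-readable 'D days, H:MM:SS.hh' string."""
    hundredths = ticks % 100
    seconds = ticks // 100
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return "%d days, %d:%02d:%02d.%02d" % (days, hours, minutes, seconds, hundredths)

# The sysUpTime value that appears later, in Listing 2:
print(format_timeticks(34145553))  # 3 days, 22:50:55.53
```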

From a security perspective, SNMP agents can be associated with a specific community, and managing systems access information by using the community as a method of validating their access to the agent. In Version 1 of the SNMP standard, the community string was the only method of securing or restricting access. With Version 2 of the SNMP standard, the security was improved, but could be complex to handle. With Version 3, considered the current version since 2004, the standard was improved with explicit authentication and access control systems.



 

Getting SNMP statistics

There are many different ways of obtaining information from SNMP systems, including using professional management tools, programming interfaces, and command line tools.

Of the latter, probably the best known and easiest to use is the snmpwalk command, which is part of a larger suite of SNMP tools that allow you to obtain information from SNMP agents directly from the command line. This command will walk the entire subtree of a given management value and return all the information about the system contained within the subtree.

For example, Listing 2 shows the output when querying a local system for all the information within the 'system' tree.


Listing 2. 'Walking' an SNMP tree
 
                
$ snmpwalk -Os -c MCSLP -v 1 localhost system
sysDescr.0 = STRING: Linux tweedledum 2.6.23-gentoo-r8 
             #1 SMP Tue Feb 12 16:32:14 GMT 2008 x86_64
sysObjectID.0 = OID: netSnmpAgentOIDs.10
sysUpTimeInstance = Timeticks: (34145553) 3 days, 22:50:55.53
sysContact.0 = STRING: root@Unknown
sysName.0 = STRING: tweedledum
sysLocation.0 = STRING: serverroom
sysORLastChange.0 = Timeticks: (0) 0:00:00.00
sysORID.1 = OID: snmpFrameworkMIBCompliance
sysORID.2 = OID: snmpMPDCompliance
sysORID.3 = OID: usmMIBCompliance
sysORID.4 = OID: snmpMIB
sysORID.5 = OID: tcpMIB
sysORID.6 = OID: ip
sysORID.7 = OID: udpMIB
sysORID.8 = OID: vacmBasicGroup
sysORDescr.1 = STRING: The SNMP Management Architecture MIB.
sysORDescr.2 = STRING: The MIB for Message Processing and Dispatching.
sysORDescr.3 = STRING: The management information definitions for 
                                    the SNMP User-based Security Model.
sysORDescr.4 = STRING: The MIB module for SNMPv2 entities
sysORDescr.5 = STRING: The MIB module for managing TCP implementations
sysORDescr.6 = STRING: The MIB module for managing IP and ICMP implementations
sysORDescr.7 = STRING: The MIB module for managing UDP implementations
sysORDescr.8 = STRING: View-based Access Control Model for SNMP.
sysORUpTime.1 = Timeticks: (0) 0:00:00.00
sysORUpTime.2 = Timeticks: (0) 0:00:00.00
sysORUpTime.3 = Timeticks: (0) 0:00:00.00
sysORUpTime.4 = Timeticks: (0) 0:00:00.00
sysORUpTime.5 = Timeticks: (0) 0:00:00.00
sysORUpTime.6 = Timeticks: (0) 0:00:00.00
sysORUpTime.7 = Timeticks: (0) 0:00:00.00
sysORUpTime.8 = Timeticks: (0) 0:00:00.00

 

You can see here a range of information about the host, including the operating system (in sysDescr.0), the amount of time that the system has been available (sysUpTimeInstance), and the location of the machine. The interval time here is shown in both its original value (timeticks) and the converted, human-readable days, hours:minutes:seconds.

The uptime or availability of a machine is a very common use for SNMP, as it provides probably the most convenient and efficient method for determining whether a machine is up and processing requests. Other solutions described in past parts of this series include ping, or using rwho and ruptime; the latter two are comparatively CPU and network intensive and not very friendly in terms of their resource utilization.

Note, however, the limitation of the uptime described here: it is the uptime of the SNMP agent, not the uptime of the entire machine. In most situations the two are the same, especially for devices with built-in SNMP monitoring, such as network routers and switches. For computers that expose their status through SNMP, there may be a discrepancy between system and SNMP agent uptime.

You can get a quicker idea of the status of a machine through SNMP using snmpstatus. This obtains a number of data points from a specified SNMP agent, including the IP address, description, uptime, and network statistics (packets sent/received, and IP packets sent/received). For example, querying a Solaris host returns the simplified information shown in Listing 3.


Listing 3. Simplified information
 
                
$ snmpstatus -v1 -c public t1000
[192.168.0.26]=>[SunOS t1000 5.11 snv_81 sun4v] Up: 2:12:10.20
Interfaces: 4, Recv/Trans packets: 643/160 | IP: 456/60
2 interfaces are down!

 

This machine has recently been rebooted (hence the low uptime and packet statistics). The snmpstatus command has also determined that two of the interfaces on the machine (which has four Ethernet ports) are down. This is a good example of the sort of warning information that SNMP can provide to help notify you of an issue that requires further investigation.

For obtaining a specific piece of information, you can use the snmpget command, which reads one or more OIDs directly and reports their value. For special types, it will also convert to a human-readable format. For example, to get the system description and uptime, use the following command (in Listing 4).


Listing 4. Getting system description and uptime information
 
                
$ snmpget -v1 -c public t1000 system.sysUpTime.0 system.sysContact.0
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (867411) 2:24:34.11
SNMPv2-MIB::sysContact.0 = STRING: "System administrator"

 

In isolation, all of these methods are useful, but in reality, you need to be able to monitor and track multiple machines and multiple OIDs to get a full picture of what is going on. We can do this by using one of the many programmable interfaces to SNMP.



 

Getting SNMP data programmatically

The Net::SNMP module for Perl obtains information from one or more agents using SNMP. Other, similar interfaces are available for other languages, including Python, Ruby, and PHP (see Resources). The interface works by creating a session that communicates (and, if necessary, authenticates) with the SNMP agent on the desired host. Once you have an active and valid session, you can request data from the agent directly for one or more OIDs. The information is returned in the form of a hash, keyed by OID with the corresponding values.

Listing 5 shows a very simple script that will obtain the system uptime for each of the hosts supplied on the command line.


Listing 5. Getting a single SNMP agent property with Perl and Net::SNMP
 
                
#! /usr/local/bin/perl

use strict;

use Net::SNMP;

my $uptimeOID = '1.3.6.1.2.1.1.3.0';

foreach my $host (@ARGV)
{
    my ($session, $error) = Net::SNMP->session(
        -hostname  =>  $host,
        -community => 'public',
        -port      => 161
        );

    if (!defined($session))
    {
        warn("ERROR for $host: $error\n");
        next;    # skip this host; without this, get_request() on an
                 # undefined session would abort the script
    }

    my $result = $session->get_request(
        -varbindlist => [$uptimeOID]
        );

    if (!defined($result))
    {
        warn ("ERROR: " . $session->error . "\n");
    }
    else
    {
        printf("Uptime for %s: %s\n",$host, $result->{$uptimeOID});
    }

    $session->close;
}

 

In the script, we've provided the full numerical OID for the system.sysUpTime property. You have to supply the list of OIDs to the get_request() method as a reference to an array, and then pull the information back out of the hash that is returned. In Listing 5, we build the array reference dynamically during the call, and then use the OID as the hash key when printing out the result.

Using the script, we can get a list of the uptimes for each host supplied on the command line (see Listing 6).


Listing 6. List of uptimes for each host
 
                
$ perl uptime.pl tweedledum t1000
Uptime for tweedledum: 4 minutes, 52.52
Uptime for t1000: 6 minutes, 26.12

 

Of course, watching this information manually is hardly efficient.


 

Tracking SNMP data over time

Viewing a single SNMP OID property at one point in time is not always very useful. Often you want to monitor something over time (for example, availability), or you want to watch for changes in particular values. A good example is disk space. SNMP can be configured to record all sorts of information, and disk space is a common metric to monitor so that you can identify not only when free space reaches a particular level, but also when there is a significant change, which might signify a problem.

For example, Listing 7 shows a callback-based solution that constantly monitors the disk space. The script outputs a running total, but it could be configured to output only the warning message that is triggered when there is a reduction in disk space.


Listing 7. Getting a running view of SNMP properties
 
                
#! /usr/local/bin/perl

use strict;
use warnings;
use Net::SNMP qw(snmp_dispatcher);

my $diskspaceOID = '1.3.6.1.4.1.2021.9.1.7.1';

foreach my $host (@ARGV)
{
    my ($session, $error) = Net::SNMP->session(
        -hostname    => $host,
        -nonblocking => 0x1,
        );

    if (!defined($session))
    {
        warn "ERROR: $host produced $error - not monitoring\n"
    }
    else
    {
        my ($last_poll) = (0);

        $session->get_request(
            -varbindlist => [$diskspaceOID],
            -callback    => [
                 \&diskspace_cb, \$last_poll
            ]
            );
    }
}

snmp_dispatcher();

exit 0;

sub diskspace_cb
{
    my ($session, $last_poll) = @_;

    if (!defined($session->var_bind_list))
    {
        printf("%-15s  ERROR: %s\n", $session->hostname, $session->error);
    }
    else
    {
        my $space = $session->var_bind_list->{$diskspaceOID};

        if ($space < ${$last_poll})
        {
            my $diff = ((${$last_poll}-$space)/${$last_poll})*100;
            printf("WARNING: %s has lost %0.2f%% diskspace\n",
                   $session->hostname,$diff);
        }

        printf("%-15s  Ok (%s)\n",
               $session->hostname,
               $space
               );

        ${$last_poll} = $space;
    }

    $session->get_request(
        -delay       => 60,
        -varbindlist => [$diskspaceOID]
        );
}

 

The script is in two parts, and uses some functionality within the Net::SNMP module that allows you to call a function when an SNMP value is obtained from a host, coupled with the ability to continually monitor hosts and SNMP objects in a simple, but efficient, loop.

The first part sets up each host to monitor the information. We are only monitoring one piece of information, but we could monitor others as part of the solution. The object is configured as 'non-blocking,' so that the script will not wait if the host cannot be reached, but simply move on to the next host. Finally, in the call to get_request(), we submit the callback information. The first argument here is the name of the function to be called when the response is received from the agent. The second is an argument that will be supplied to the function when it is called.

We'll use this argument to be able to record and track the previous value returned by the SNMP call. Within the callback function, we compare the newly returned value and the previous value. If there's a reduction, we calculate the percentage reduction and then report a warning.

The final part of the callback is to specify that another retrieval should occur, here specifying that the next retrieval should be delayed by 60 seconds. The existing callback information is retained. In effect, the script obtains the value from the SNMP agent, calls the callback function, which then queues up another retrieval in the future. Because the same callback is already defined, the process repeats in an endless loop.
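The poll, compare, and re-queue pattern itself is language independent. The following minimal sketch (Python, with a hypothetical fetch_value() standing in for the SNMP GET, and a bounded loop instead of the endless dispatcher) shows the core of the logic:

```python
import time

def monitor(fetch_value, cycles, delay=60, sleep=time.sleep):
    """Repeatedly fetch a value and warn when it drops between polls.
    fetch_value is a hypothetical stand-in for the real SNMP GET;
    cycles bounds the loop here, where the Perl version runs forever."""
    warnings = []
    last = None
    for _ in range(cycles):
        value = fetch_value()
        if last is not None and value < last:
            pct = (last - value) / last * 100
            warnings.append("lost %0.2f%% diskspace" % pct)
        last = value       # remember the previous value, like $last_poll
        sleep(delay)       # wait before the next poll
    return warnings
```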

Incidentally, the script uses the dskAvail OID value and calculates the percentage difference based on the last and new values. The dskTable tree that this property is part of actually has a disk percentage property that we could have queried instead of calculating the value manually. However, that value is probably not fine-grained enough to be useful.

You can see this property and current values by using snmpwalk to output the dskTable tree, which itself is part of the UCD MIB (Listing 8).


Listing 8. Getting a dump of available MIB data
 
                
$ snmpwalk -v 1 localhost -c public UCD-SNMP-MIB::dskTable
UCD-SNMP-MIB::dskIndex.1 = INTEGER: 1
UCD-SNMP-MIB::dskPath.1 = STRING: /
UCD-SNMP-MIB::dskDevice.1 = STRING: /dev/sda3
UCD-SNMP-MIB::dskMinimum.1 = INTEGER: 100000
UCD-SNMP-MIB::dskMinPercent.1 = INTEGER: -1
UCD-SNMP-MIB::dskTotal.1 = INTEGER: 72793272
UCD-SNMP-MIB::dskAvail.1 = INTEGER: 62024000
UCD-SNMP-MIB::dskUsed.1 = INTEGER: 7071512
UCD-SNMP-MIB::dskPercent.1 = INTEGER: 10
UCD-SNMP-MIB::dskPercentNode.1 = INTEGER: 3
UCD-SNMP-MIB::dskErrorFlag.1 = INTEGER: noError(0)
UCD-SNMP-MIB::dskErrorMsg.1 = STRING:

 

To find the property in the first place, you can dump all the known properties by using snmptranslate and filter the output with grep to find the entries you want: $ snmptranslate -Ts | grep dsk.

To get a numerical value, use snmptranslate and provide the name with the -On option (see Listing 9).


Listing 9. Using snmptranslate
 
                
$ snmptranslate -On UCD-SNMP-MIB::dskAvail 
.1.3.6.1.4.1.2021.9.1.7

 

Running the script, we get a running commentary (and warnings) for the disk space usage on the specified host. See Listing 10.


Listing 10. Monitoring disk space automatically
 
                
$ perl diskspace-auto.pl tweedledum
tweedledum       Ok (50319024)
WARNING: tweedledum has lost 2.67% diskspace
tweedledum       Ok (48976392)
WARNING: tweedledum has lost 1.65% diskspace
tweedledum       Ok (48166292)
tweedledum       Ok (48166292)
tweedledum       Ok (48166292)
tweedledum       Ok (48166292)

 

You can see from this output that a significant amount of the available space on this disk has been lost on the specified host. To monitor more hosts, just add more hostnames on the command line.



 

Publishing information through an SNMP agent

The SNMP package includes a daemon, snmpd, which can be configured to expose a variety of information using the SNMP protocol. The information to be exposed is controlled through the snmpd.conf file (typically /etc/snmp/snmpd.conf, although the location varies between platforms).

For example, Listing 11 shows the snmpd.conf file on the host used in the earlier examples in this article.


Listing 11. Sample snmpd.conf file
 
                
syslocation  serverroom
proc  imapd 20 10
disk  / 100000
load  5 10 10

 

Each of these lines populates different information. In the example, we set the location of the machine, and then configure some specific items to monitor.

The proc section monitors a specific process, shown here as a monitor for the IMAP daemons for a mail service. The numbers following the process name specify the maximum number of processes allowed to be running and the minimum number that should be running. You can use this to make sure that a particular service is running, and that you haven't exceeded capacity in a way that might indicate a fault. When the process count falls outside these limits, an error is flagged and an SNMP trap can be generated.

For the disk, you specify the path to the directory to be monitored and the minimum size (in kilobytes) that the disk should have free. Again, an SNMP trap is triggered if the disk space dips below this value.

Finally, the load line sets the maximum acceptable CPU load averages over 1, 5, and 15 minutes. These are the same intervals reported by the uptime command. Like the other configured limits, a trap is raised when these limits are exceeded.
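The three checks described above amount to simple threshold comparisons. As a hedged sketch (the check_limits function and its arguments are hypothetical, mirroring the snmpd.conf semantics rather than any Net-SNMP API):

```python
def check_limits(proc_count, proc_max, proc_min,
                 disk_free_kb, disk_min_kb,
                 load_1m, load_max_1m):
    """Mirror the thresholds from the sample snmpd.conf: flag an error
    when the process count falls outside [min, max], when free disk
    space dips below the minimum, or when the 1-minute load average
    exceeds its limit."""
    errors = []
    if proc_count > proc_max or proc_count < proc_min:
        errors.append("proc")
    if disk_free_kb < disk_min_kb:
        errors.append("disk")
    if load_1m > load_max_1m:
        errors.append("load")
    return errors
```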

Manually setting this information is not difficult, but also not ideal. A simple menu-based solution, snmpconf, is available if you want a more straightforward method of setting the configuration.


 

Summary

Monitoring your servers and devices is a process that can be very complex, especially as the number of devices in your network increases. SNMP is an efficient, and extensible, method for exposing and reporting this information. Because the interface is consistent across all the devices, you can get uptime, network statistics, disk space, and even process monitoring using the same methods across multiple hosts.

In this article, we've looked both at the basics of SNMP and at how to read specific values from different hosts. Using the Net::SNMP Perl module, we have also examined methods for reading information, using both one-shot and continual monitoring solutions. Finally, we examined how to configure additional information to be exposed by the snmpd daemon so that you can monitor the systems you need on your network.

Speaking UNIX: Inside TCP/IP

The Internet has played a huge role in the advancement of technology, business, and everyday life for huge numbers of the world's people. Configuring a computer to communicate over a network and connecting to the Internet has become an essential task for administrators. This article shows how to configure a server running IBM® AIX® to connect to and use the Internet.

As defined in Wikipedia, the Internet is a worldwide, publicly accessible series of interconnected computer networks that transmit data by packet switching using the standard Internet Protocol (IP). The Internet is, in a sense, the largest network in the world and spans the globe many times over.

The Internet and the Web are not the same thing, although people interchange the terms frequently. The World Wide Web (WWW), or the Web, is a collection of hypertext documents containing images, audio and video clips, and other files interlinked and accessed over the Internet.

A simplistic example of the two is when you connect to your Internet Service Provider (ISP) using a cable modem. Connecting to your ISP using a cable does exactly that: It connects you to your ISP's network and to the Internet, but you are still not using the Web—not until you open a Web browser, such as Mozilla Firefox or Apple Safari, and connect to a Web site.

Configuring TCP/IP and the network adapter

For a server running the AIX operating system to communicate over a network and connect to the Internet, you must configure the network adapter or, depending on the system, edit multiple files to set up TCP/IP. However, IBM has made this task easy with one simple switch inside the System Management Interface Tool (SMIT).

Before beginning to configure the network adapter, first document the following information:

  • IP address to assign to the network adapter
  • Host name of the target server
  • Name of the domain of which the target server is a member
  • Subnet mask
  • Name servers
  • Gateway address

To configure a network adapter on a server running AIX, perform the following steps:

  1. Log in to the system as root or su – to root.
  2. Start the SMIT program, and then choose Communications Applications and Services > TCP/IP > Minimum Configuration & Startup.

    Tip: SMIT has many shortcuts, or fast paths, to allow you to quickly get to the menu or task needed. In this case, simply typing smitty mktcpip bypasses having to navigate through the three previous menus.

  3. Select the network adapter you want to configure from the list shown in Figure 1, and then press Enter. For this example, en2 is used.

    Figure 1. Available network adapters
    Available network adapters

     

    After you select the desired network adapter, a new window is built that displays all the settings you need to configure the network adapter.

  4. Taking the information you documented earlier, type the host name, IP address, subnet mask, domain name, name server IP address, and default gateway address. If you want the network adapter to start as soon as you've made your changes, change START Now to Yes, as shown in Figure 2.

    Figure 2. Minimum configuration settings for the network adapter
    Minimum configuration settings

     
  5. Verify the information you typed, and then press Enter.

    AIX makes the changes requested and starts the TCP/IP daemons (if they haven't already been started). In Figure 3, note that the TCP/IP daemons were already running, as en0 and en1 are configured on this AIX system.



    Figure 3. Network adapter changes in progress
    Changes in progress

     
  6. Exit SMIT by pressing either F10 or Esc + 0 (zero).

 

DNS

A Domain Name System (DNS) server translates domain names into the IP addresses and locations of other computers or Web sites. Without DNS, you would need to enter the IP address into a Web browser. For example, if you didn't have access to DNS and wanted to view IBM's Web site, you would have to type 129.42.18.103 instead of www.ibm.com. DNS eases the use of Web browsing over the Internet as well as connecting to other servers over a network. It's much easier to remember www.ibm.com than 129.42.18.103!

Another advantage of using DNS is that, from time to time, IP addresses change on servers. For instance, a server may need to move from one location to another, or a server may be replaced with new equipment. Performing such moves sometimes requires changing the IP address on the server after it reaches its new home because of a different network scheme at the new location. When this happens, it's much easier on users to remember the name of the server than to keep track of the old and new IP addresses. If the server move was successful, users will never know the difference.

As mentioned earlier, when setting up the network adapter, you entered the IP address of a name server. This server is your primary DNS server. It is wise to have several DNS servers to rely on in case one should fail during an address lookup. When multiple DNS servers are configured and the first doesn't have the information or is unavailable, the lookup request moves to the second DNS server, and so on.

To add other DNS servers, you must modify the /etc/resolv.conf file. Listing 1 provides an example of such a file.


Listing 1. An /etc/resolv.conf file
 
                
domain  ATC-DOMAIN.com

nameserver      10.20.30.23
nameserver      10.20.30.24
nameserver 10.20.30.25

search  atc-domain2.com atc-domain3.com atc-domain4.com

options debug

 

The sections that follow provide descriptions of each parameter used in Listing 1.

domain

The domain parameter instructs the resolver to append <domain name> to any lookup string that does not end with a . (period). For example, if the string entered for lookup is ibm, the actual string used is ibm.ATC-DOMAIN.com.

If no domain entry is present, the domain is derived from the local host name (for example, given ATC-AIX1.ATC-DOMAIN.com, everything after the first period is used).

Note: Only one domain entry can be used in the /etc/resolv.conf file.

nameserver

The nameserver parameter tells the resolver which DNS servers to query when resolving IP addresses and host names. The resolver queries each name server in the order provided in /etc/resolv.conf until the lookup has been properly resolved.

Note: At most three nameserver entries can be used in the /etc/resolv.conf file.

search

The search parameter provides a list of domains for the resolver to use when resolving an IP address or host name. Only one domain or search entry can be in effect; if domain is used, the search list defaults to the value of domain.

Note: Although you can add several domain names to the search option, there is a limit of 1,024 characters.

options

The options parameter provides an extra means of debugging and adjusts the lookup function to your liking:

  • debug: This option turns on debugging for the resolver.
  • ndots:<N>: If a name contains <N> or more periods, the resolver attempts to resolve the string as-is first, without appending the search domain list.

    For more information on DNS, see Resources.
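To make the resolution order concrete, here is a short sketch (Python, illustrative only; the system resolver performs this parsing itself) that extracts the ordered nameserver list and effective search list from resolv.conf-style text:

```python
def parse_resolv_conf(text):
    """Extract the ordered nameserver list and search domains from
    resolv.conf-style text. The resolver tries nameservers in this
    order, moving on to the next when one fails to answer."""
    nameservers, search = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if fields[0] == "nameserver" and len(nameservers) < 3:
            nameservers.append(fields[1])  # at most three are used
        elif fields[0] in ("search", "domain"):
            search = fields[1:]            # last domain/search entry wins
    return nameservers, search
```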



 

Testing the Internet connection

Now that you've configured the network adapter and modified /etc/resolv.conf to your liking, you can test your Internet connection. There are many ways to test your connection, so I cover only a couple of the basic, but useful, troubleshooting tools.

The ping command

One of the easiest ways to verify that you've configured your network adapter correctly and can communicate with the Internet is to ping an IP address. The ping command is a tool for testing whether the target is reachable from your server and its network. Basically, ping sends Internet Control Message Protocol (ICMP) echo packets from your server to the destination server and waits for a response. If a response is received, you have connectivity to the destination server. Using ping is a simple and quick way to determine whether there is a problem, how quickly data travels between servers, and whether you have connectivity at all.

The following example confirms that I have connectivity to Google.com's IP address, 64.233.167.99:

                ping 64.233.167.99
PING 64.233.167.99: (64.233.167.99): 56 data bytes
64 bytes from 64.233.167.99: icmp_seq=0 ttl=240 time=40 ms
64 bytes from 64.233.167.99: icmp_seq=1 ttl=240 time=41 ms
64 bytes from 64.233.167.99: icmp_seq=2 ttl=240 time=48 ms
64 bytes from 64.233.167.99: icmp_seq=3 ttl=240 time=40 ms
^C
----64.233.167.99 PING Statistics----
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 40/42/48 ms

 

Sometimes, however, ping can't be the only tool used to confirm connectivity to a server, because firewalls may block ICMP requests. The following example shows ICMP being blocked, simulating no connectivity to IBM.com's IP address, 129.42.18.103:

                ping 129.42.18.103
PING 129.42.18.103: (129.42.18.103): 56 data bytes
^C
----129.42.18.103 PING Statistics----
6 packets transmitted, 0 packets received, 100% packet loss

 
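When ICMP is filtered like this, a TCP connection attempt to a port the server is known to serve (port 80 on a Web server, for example) is a useful fallback test. A hedged sketch in Python:

```python
import socket

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the
    timeout; a fallback reachability check for hosts that drop ICMP."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, tcp_reachable("www.ibm.com", 80) would be expected to succeed even though the ping above fails, because the Web port is still open.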

So far, I've only attempted to ping IP addresses. After you've confirmed this first troubleshooting step, it's also a good test to attempt to ping the actual host name that is resolved in DNS:

                ping google.com
PING google.com: (64.233.167.99): 56 data bytes
64 bytes from 64.233.167.99: icmp_seq=0 ttl=240 time=40 ms
64 bytes from 64.233.167.99: icmp_seq=1 ttl=240 time=43 ms
^C
----google.com PING Statistics----
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 40/41/43 ms

 

The traceroute command

Although ping may have failed on one attempt, this doesn't necessarily mean that you don't have connectivity to the Internet. The following example shows that ping failed when trying to communicate with ATC-AIX2:

                ping ATC-AIX2
PING ATC-AIX2: (10.20.90.41): 56 data bytes
^C
----10.20.90.41 PING Statistics----
6 packets transmitted, 0 packets received, 100% packet loss

 

Here's the same test using the traceroute command:

                traceroute ATC-AIX2

traceroute to ATC-AIX2.ATC-DOMAIN.com (10.20.90.41) from ATC-AIX1.ATC-DOMAIN.com 
    (10.20.30.40), 30 hops max
outgoing MTU = 1500
 1  10.20.30.254 (10.20.30.254)  8 ms  3 ms  3 ms
 2  10.20.30.252 (10.20.30.252)  4 ms  4 ms  3 ms
 3  19.16.15.240 (19.16.15.240)  5 ms  5 ms  5 ms
 4  17.30.11.23 (17.30.11.23)  4 ms  5 ms  4 ms
 5  10.20.90.252 (10.20.90.252)  4 ms  5 ms  4 ms
 6  10.20.90.252 (10.20.90.254)  8 ms  5 ms  4 ms
 7  10.20.90.41 (10.20.90.41) 8 ms  6 ms  5 ms

 

The traceroute command can be a helpful troubleshooting tool. If your traceroute results in failure, the output can lead you in the right direction—namely, which server or network equipment may be blocking your access.

The nslookup and dig commands

With the ping and traceroute commands, notice that host names were primarily used. Using host names is helpful for users, because they don't need to memorize difficult IP addresses. One method to determine whether DNS is in fact working is to use the name server lookup, or nslookup, command. Using nslookup can provide host name information as well as the IP addresses associated with the host name. This command is useful if users report an issue when they try to connect to a server but don't get a response. In such a situation, it could be that their DNS information isn't updated and old addresses are being used, which you can verify quickly with nslookup.
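The same check is easy to script. A minimal sketch (Python, using the system resolver, which consults the name servers configured in /etc/resolv.conf):

```python
import socket

def resolve(hostname):
    """Return the sorted set of IPv4 addresses for a host name, as
    reported by the system resolver; a scripted stand-in for nslookup."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})
```

Calling resolve("ibm.com") would be expected to return the same addresses that nslookup reports below.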

The following code displays the IP addresses associated with IBM.com:

                nslookup ibm.com
Server:  ATC-AIX1.ATC-DOMAIN.com
Address:  10.20.30.40

Non-authoritative answer:
Name:    ibm.com
Addresses:  129.42.17.103, 129.42.18.103, 129.42.16.103

 

A newer program similar to nslookup is dig. The dig command provides the same information as nslookup but with a fuller view of how DNS is set up for the target:

                dig ibm.com

; <<>> DiG 9.2.0 <<>> ibm.com
;; global options:  printcmd
;; Got answer:
;; -<<HEADER<<- opcode: QUERY, status: NOERROR, id: 16463
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 13, ADDITIONAL: 13

;; QUESTION SECTION:
;ibm.com.                       IN      A

;; ANSWER SECTION:
ibm.com.                14740   IN      A       129.42.16.103
ibm.com.                14740   IN      A       129.42.17.103
ibm.com.                14740   IN      A       129.42.18.103

;; AUTHORITY SECTION:
.                       24402   IN      NS      I.ROOT-SERVERS.NET.
.                       24402   IN      NS      G.ROOT-SERVERS.NET.
.                       24402   IN      NS      C.ROOT-SERVERS.NET.
.                       24402   IN      NS      F.ROOT-SERVERS.NET.
.                       24402   IN      NS      M.ROOT-SERVERS.NET.
.                       24402   IN      NS      E.ROOT-SERVERS.NET.
.                       24402   IN      NS      L.ROOT-SERVERS.NET.
.                       24402   IN      NS      D.ROOT-SERVERS.NET.
.                       24402   IN      NS      J.ROOT-SERVERS.NET.
.                       24402   IN      NS      H.ROOT-SERVERS.NET.
.                       24402   IN      NS      A.ROOT-SERVERS.NET.
.                       24402   IN      NS      B.ROOT-SERVERS.NET.
.                       24402   IN      NS      K.ROOT-SERVERS.NET.

;; ADDITIONAL SECTION:
I.ROOT-SERVERS.NET.     31808   IN      A       192.36.148.17
G.ROOT-SERVERS.NET.     2961    IN      A       192.112.36.4
C.ROOT-SERVERS.NET.     36288   IN      A       192.33.4.12
F.ROOT-SERVERS.NET.     40867   IN      A       192.5.5.241
M.ROOT-SERVERS.NET.     15357   IN      A       202.12.27.33
E.ROOT-SERVERS.NET.     26901   IN      A       192.203.230.10
L.ROOT-SERVERS.NET.     21568   IN      A       199.7.83.42
D.ROOT-SERVERS.NET.     9464    IN      A       128.8.10.90
J.ROOT-SERVERS.NET.     35190   IN      A       192.58.128.30
H.ROOT-SERVERS.NET.     7936    IN      A       128.63.2.53
A.ROOT-SERVERS.NET.     35190   IN      A       198.41.0.4
B.ROOT-SERVERS.NET.     29770   IN      A       192.228.79.201
K.ROOT-SERVERS.NET.     16473   IN      A       193.0.14.129

;; Query time: 3 msec
;; SERVER: 10.20.30.40#53(10.20.30.40)
;; WHEN: Wed Mar 12 17:02:32 2008
;; MSG SIZE  rcvd: 492

 

Connect to the Web

After successfully testing the Internet connection and verifying that DNS is set up correctly by using the ping, traceroute, and nslookup commands, you're ready to get on the Web. Simply open your preferred Web browser, type the Uniform Resource Locator (URL) you want to view (see Figure 4), and voilà! Congratulations: You're on the Internet and viewing the Web!


Figure 4. Connecting to the Web
Connecting to the Web


 

Conclusion

Connecting to the Internet and viewing Web sites is easy in AIX. IBM has made configuration of AIX and network adapters easy. Simply configure your network adapter, direct DNS to a valid DNS server, and you'll be surfing the Web in no time! Enjoy!