The Simple Network Management Protocol (SNMP) is built
into many devices, but the tools and software that can
read and parse this information are often too large and
complicated when you only want to check a quick
statistic or track a particular device or issue. This
article looks at some simplified methods for getting
SNMP information from your devices and how to integrate
this information into the rest of your network's data
map.
About this series
The typical UNIX® administrator has a key range of
utilities, tricks, and systems he or she uses regularly to
aid in the process of administration. There are key
utilities, command line chains, and scripts that are used to
simplify different processes. Some of these tools come with
the operating system, but a majority of the tricks come
through years of experience and a desire to ease the system
administrator's life. The focus of this series is on getting
the most from the available tools across a range of
different UNIX environments, including methods of
simplifying administration in a heterogeneous environment.
SNMP basics
There are many ways you can monitor your UNIX server. See
Resources for some examples of the types of monitoring
available.
Monitoring a single server is not a problem, but monitoring
the same information across a number of servers can present
problems. If one of the servers you are in charge of runs
out of disk space, you want to know about it before it
starts to affect your users and clients.
Monitoring multiple servers in this way, especially if
they use a variety of different operating systems, can be a
problem. The differences in command line tools, output
formats, values, and other information all complicate what
should otherwise be a simple process. What is required is a
solution that provides a generic interface to the
information that works, irrespective of the UNIX variant you
are using.
The Simple Network Management Protocol (SNMP) provides a
method for managing information about different systems. An
agent runs on each system and reports information using SNMP
to different managing systems.
SNMP is often a built-in component for network devices
such as routers and switches, and is the only method
available for retrieving statistics and status information
remotely (without logging in to some sort of interface). On
most hosts you will need to explicitly run SNMP software to
expose information about the host over the SNMP protocol.
Information can be retrieved from an agent explicitly, by
requesting it with a GET request, or the agent can send
unsolicited notifications to management systems using TRAP
or INFORM messages. In addition, managing systems can set
information and parameters on the agent (a SET request), but
this is usually only used to change the network
configuration.
The types of information that can be shared can be quite
varied. It can be everything from network settings,
statistics, and metric data for network interfaces, through
to monitoring CPU load and disk space.
The SNMP standard does not define what information the
agent returns; instead, the available information is defined
by Management Information Bases (MIBs). MIBs define the
structure of the information that is returned, and are
organized into a hierarchical structure using object
identifiers (OIDs). You access information within an agent
by requesting data from a specific location within the MIB
structure.
For example, some of the more common IDs are shown in
Listing 1.
Listing 1. SNMP object IDs
sysDescr.0      1.3.6.1.2.1.1.1.0
sysObjectID.0   1.3.6.1.2.1.1.2.0
sysUpTime.0     1.3.6.1.2.1.1.3.0
sysContact.0    1.3.6.1.2.1.1.4.0
sysName.0       1.3.6.1.2.1.1.5.0
sysLocation.0   1.3.6.1.2.1.1.6.0
sysServices.0   1.3.6.1.2.1.1.7.0
ifNumber.0      1.3.6.1.2.1.2.1.0
You can see from this list that the OIDs are numerical and,
effectively, in sequence. When obtaining information you can
use a GET request to obtain a specific value, or GETNEXT to
get the property that follows the last one you read. You can
also use the names. The names shown above (apart from
ifNumber, which belongs to the interfaces tree) are all part
of the system tree, so you can read the uptime, for example,
by using the OID 'system.sysUpTime.0'.
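To make the ordering concrete, here is a minimal sketch in Python (used for illustration instead of the Perl that appears later in this article). It holds the OIDs from Listing 1 in memory and mimics what a GETNEXT does by returning the first entry that sorts after the OID you supply. The table and the function names are hypothetical and belong to no SNMP library. Note that OIDs are compared component by component as numbers, not as strings.

```python
from bisect import bisect_right

# The OIDs from Listing 1, standing in for an agent (illustrative only).
MIB = {
    "1.3.6.1.2.1.1.1.0": "sysDescr.0",
    "1.3.6.1.2.1.1.2.0": "sysObjectID.0",
    "1.3.6.1.2.1.1.3.0": "sysUpTime.0",
    "1.3.6.1.2.1.1.4.0": "sysContact.0",
    "1.3.6.1.2.1.1.5.0": "sysName.0",
    "1.3.6.1.2.1.1.6.0": "sysLocation.0",
    "1.3.6.1.2.1.1.7.0": "sysServices.0",
    "1.3.6.1.2.1.2.1.0": "ifNumber.0",
}


def oid_key(oid):
    """Turn '1.3.6.1' (or '.1.3.6.1') into a tuple of ints for comparison."""
    return tuple(int(part) for part in oid.strip(".").split("."))


SORTED_KEYS = sorted(oid_key(oid) for oid in MIB)


def get_next(oid):
    """Return (oid, name) for the first entry after `oid`, or None at the end."""
    pos = bisect_right(SORTED_KEYS, oid_key(oid))
    if pos == len(SORTED_KEYS):
        return None
    next_oid = ".".join(str(part) for part in SORTED_KEYS[pos])
    return next_oid, MIB[next_oid]
```

Calling get_next() repeatedly, starting from the root of a subtree such as '1.3.6.1.2.1.1', visits every entry below it in order, which is how a tree walk can be built from individual requests.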
The values that you read are also of specific types. You
can read integer, floating point, and string values that are
all defined as 'scalar' objects. Some of these types carry a
specific meaning. For example, time interval values are
reported as 'timeticks,' or hundredths of a second. These
values need to be converted into a more readable form before
being displayed. There are also MIB objects that return
tabular data. This is handled by returning additional OID
instances that can be grouped together to make an SNMP
table.
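As a rough illustration of the timeticks conversion, the Python sketch below turns a raw tick count into a days, hours:minutes:seconds string. The function name is made up for this example, and the output format is modeled on the converted values shown in the command-line listings later in this article. It is not taken from any SNMP library.

```python
def timeticks_to_human(ticks):
    """Format timeticks (hundredths of a second) as [D day(s), ]H:MM:SS.hh."""
    seconds, hundredths = divmod(ticks, 100)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    clock = f"{hours}:{minutes:02d}:{seconds:02d}.{hundredths:02d}"
    if days == 0:
        return clock
    unit = "day" if days == 1 else "days"
    return f"{days} {unit}, {clock}"


print(timeticks_to_human(34145553))  # 3 days, 22:50:55.53
print(timeticks_to_human(867411))    # 2:24:34.11
```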
From a security perspective, SNMP agents can be
associated with a specific community, and managing systems
access information by using the community as a method of
validating their access to the agent. In Version 1 of the
SNMP standard, the community string was the only method of
securing or restricting access. With Version 2 of the SNMP
standard, the security was improved, but could be complex to
handle. With Version 3, considered the current version since
2004, the standard was improved with explicit authentication
and access control systems.
Getting SNMP statistics
There are many different ways of obtaining information
from SNMP systems, including using professional management
tools, programming interfaces, and command line tools.
Of the latter, probably the best known and easiest to use
is the snmpwalk command, which is part of a larger suite of
SNMP tools that allow you to obtain information from SNMP
agents directly from the command line. This command will
walk the entire subtree of a given management value and
return all the information about the system contained within
the subtree.
For example, Listing 2 shows the output when querying a
local system for all the information within the 'system'
tree.
Listing 2. 'Walking' an SNMP tree
$ snmpwalk -Os -c MCSLP -v 1 localhost system
sysDescr.0 = STRING: Linux tweedledum 2.6.23-gentoo-r8
#1 SMP Tue Feb 12 16:32:14 GMT 2008 x86_64
sysObjectID.0 = OID: netSnmpAgentOIDs.10
sysUpTimeInstance = Timeticks: (34145553) 3 days, 22:50:55.53
sysContact.0 = STRING: root@Unknown
sysName.0 = STRING: tweedledum
sysLocation.0 = STRING: serverroom
sysORLastChange.0 = Timeticks: (0) 0:00:00.00
sysORID.1 = OID: snmpFrameworkMIBCompliance
sysORID.2 = OID: snmpMPDCompliance
sysORID.3 = OID: usmMIBCompliance
sysORID.4 = OID: snmpMIB
sysORID.5 = OID: tcpMIB
sysORID.6 = OID: ip
sysORID.7 = OID: udpMIB
sysORID.8 = OID: vacmBasicGroup
sysORDescr.1 = STRING: The SNMP Management Architecture MIB.
sysORDescr.2 = STRING: The MIB for Message Processing and Dispatching.
sysORDescr.3 = STRING: The management information definitions for
the SNMP User-based Security Model.
sysORDescr.4 = STRING: The MIB module for SNMPv2 entities
sysORDescr.5 = STRING: The MIB module for managing TCP implementations
sysORDescr.6 = STRING: The MIB module for managing IP and ICMP implementations
sysORDescr.7 = STRING: The MIB module for managing UDP implementations
sysORDescr.8 = STRING: View-based Access Control Model for SNMP.
sysORUpTime.1 = Timeticks: (0) 0:00:00.00
sysORUpTime.2 = Timeticks: (0) 0:00:00.00
sysORUpTime.3 = Timeticks: (0) 0:00:00.00
sysORUpTime.4 = Timeticks: (0) 0:00:00.00
sysORUpTime.5 = Timeticks: (0) 0:00:00.00
sysORUpTime.6 = Timeticks: (0) 0:00:00.00
sysORUpTime.7 = Timeticks: (0) 0:00:00.00
sysORUpTime.8 = Timeticks: (0) 0:00:00.00
You can see here a range of information about the host, including the
operating system (in sysDescr.0), the amount of
time that the system has been available (sysUpTimeInstance),
and the location of the machine. The interval time here is
shown in both its original value (timeticks) and the
converted, human-readable days, hours:minutes:seconds.
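If you want to feed this output into your own scripts, one option is to split each line into its name, type, and value. The Python sketch below does that for a few lines copied from Listing 2. The function names are hypothetical, and the pattern is based only on the output shown in this article, so treat it as a starting point and not a complete parser.

```python
import re

# 'name = TYPE: value', as printed by snmpwalk in Listing 2.
LINE = re.compile(r"^(\S+) = ([A-Za-z0-9-]+): ?(.*)$")


def parse_walk(text):
    """Map each object name to a (type, value) pair.

    Lines that do not look like 'name = TYPE: value' are treated as a
    continuation of the previous value (the listing wraps long values).
    """
    result = {}
    last = None
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            last, kind, value = match.groups()
            result[last] = (kind, value)
        elif last is not None and line.strip():
            kind, value = result[last]
            result[last] = (kind, value + " " + line.strip())
    return result


def raw_timeticks(value):
    """Extract the raw tick count from '(34145553) 3 days, 22:50:55.53'."""
    return int(value[value.index("(") + 1:value.index(")")])


SAMPLE = """\
sysDescr.0 = STRING: Linux tweedledum 2.6.23-gentoo-r8
#1 SMP Tue Feb 12 16:32:14 GMT 2008 x86_64
sysUpTimeInstance = Timeticks: (34145553) 3 days, 22:50:55.53
sysName.0 = STRING: tweedledum
sysLocation.0 = STRING: serverroom
"""

parsed = parse_walk(SAMPLE)
print(parsed["sysName.0"])  # ('STRING', 'tweedledum')
```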
The uptime or availability of a machine is a very common
use for SNMP, as it provides probably the most convenient
and efficient method for determining whether a machine is up
and processing requests. Other solutions that have been
described in past parts of the series include ping or using
rwho and ruptime. These latter two solutions are very CPU
and network intensive and not very friendly in terms of
their resource utilization.
Note, however, the limitation of the uptime described
here: the value shown is the uptime of the SNMP agent, not
the uptime of the entire machine. In most situations the two
are the same, especially for devices with built-in SNMP
monitoring, such as network routers and switches. For
computers that expose their status through SNMP, there may
be a discrepancy between system and SNMP agent uptime.
You can get a quicker idea of the status of a machine
through SNMP using snmpstatus. This obtains a number of data
points from a specified SNMP agent, including the IP
address, description, uptime, and network statistics
(packets sent/received, and IP packets sent/received). For
example, if you query a Solaris host, you get the
simplified information shown in Listing 3.
Listing 3. Simplified information
$ snmpstatus -v1 -c public t1000
[192.168.0.26]=>[SunOS t1000 5.11 snv_81 sun4v] Up: 2:12:10.20
Interfaces: 4, Recv/Trans packets: 643/160 | IP: 456/60
2 interfaces are down!
This machine has recently been rebooted (hence the low uptime and
packet statistics). The snmpstatus command has also
determined that two of the interfaces on the machine (which
has four Ethernet ports) are down. This is a good example of
the sort of warning information that SNMP can provide to
help notify you of an issue that requires further
investigation.
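To act on a warning like this automatically, you could pull the numbers out of the snmpstatus output. The Python sketch below does so for the text in Listing 3. The function name is hypothetical, and the patterns are guesses based on that single sample, so other agents or versions may well print something different.

```python
import re


def parse_status(text):
    """Pull interface and packet counts out of snmpstatus output.

    Only the fields visible in Listing 3 are handled.
    """
    info = {"interfaces": None, "received": None, "sent": None, "down": 0}
    match = re.search(r"Interfaces: (\d+), Recv/Trans packets: (\d+)/(\d+)", text)
    if match:
        counts = [int(group) for group in match.groups()]
        info["interfaces"], info["received"], info["sent"] = counts
    match = re.search(r"(\d+) interfaces? (?:is|are) down", text)
    if match:
        info["down"] = int(match.group(1))
    return info


SAMPLE = """\
[192.168.0.26]=>[SunOS t1000 5.11 snv_81 sun4v] Up: 2:12:10.20
Interfaces: 4, Recv/Trans packets: 643/160 | IP: 456/60
2 interfaces are down!
"""

status = parse_status(SAMPLE)
if status["down"]:
    print(f"{status['down']} of {status['interfaces']} interfaces are down")
```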
For obtaining a specific piece of information, you can
use the snmpget command, which reads one or more OIDs
directly and reports their values. For special types, it will
also convert the value to a human-readable format. For
example, to get the system uptime and contact, use the
command shown in Listing 4.
Listing 4. Getting system uptime and contact information
$ snmpget -v1 -c public t1000 system.sysUpTime.0 system.sysContact.0
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (867411) 2:24:34.11
SNMPv2-MIB::sysContact.0 = STRING: "System administrator"
In isolation, all of these methods are useful, but in reality, you need
to be able to monitor and track multiple machines and
multiple OIDs to get a full picture of what is going on. We
can do this by using one of the many programmable interfaces
to SNMP.
Getting SNMP data programmatically
The Net::SNMP module for Perl obtains
information from one or more agents using SNMP. Other,
similar, interfaces are available for other languages,
including Python, Ruby, and PHP (see
Resources).
To use the interface, you create a session that
communicates (and if necessary authenticates) with the SNMP
agent on the desired host. Once you have an active and valid
session, you can request data from the agent directly for
one or more OIDs. The information is returned as a hash that
maps each OID to its corresponding value.
Listing 5 shows a very simple script that will obtain the
system uptime for each of the hosts supplied on the command
line.
Listing 5. Getting a single SNMP agent property with Perl and Net::SNMP
#!/usr/local/bin/perl
use strict;
use warnings;
use Net::SNMP;

# Numerical OID for system.sysUpTime.0
my $uptimeOID = '1.3.6.1.2.1.1.3.0';

foreach my $host (@ARGV)
{
    # Open a session with the agent on this host
    my ($session, $error) = Net::SNMP->session(
        -hostname  => $host,
        -community => 'public',
        -port      => 161
    );
    if (!defined($session))
    {
        warn("ERROR for $host: $error\n");
        next;    # no session, so skip to the next host
    }

    # Request the single OID; the result is a hash reference
    my $result = $session->get_request(
        -varbindlist => [$uptimeOID]
    );
    if (!defined($result))
    {
        warn("ERROR: " . $session->error . "\n");
    }
    else
    {
        printf("Uptime for %s: %s\n", $host, $result->{$uptimeOID});
    }
    $session->close;
}
In the script, we've provided the full numerical OID for the
system.sysUpTime property. You have to supply the list of OIDs to
obtain when using the get_request() method as a
reference to an array, and then pull the information back
out from the hash that is returned. In Listing 5 we build
the array reference dynamically during the call, and then
use the OID as the hash key when printing out the result.
Using the script, we can get a list of the uptimes for
each host supplied on the command line (see Listing 6).
Listing 6. List of uptimes for each host
$ perl uptime.pl tweedledum t1000
Uptime for tweedledum: 4 minutes, 52.52
Uptime for t1000: 6 minutes, 26.12
Of course, watching this information manually is hardly efficient.
Tracking SNMP data over time
Viewing a single instance of an SNMP OID property at one
time is not always very useful. Often you want to monitor
something over time (for example, availability), or you want
to monitor for changes in particular values. A good example
is disk space. SNMP can be configured to record all sorts of
information, and disk space is a common thing to monitor, so
that you can identify not only when the disk space reaches a
particular level, but also when there is a significant
change to the disk space, which might signify a problem.
For example, Listing 7 shows a callback-based solution to
constantly monitor the disk space. In the script, we output
a running total, but it could be configured to only output
the warning message that is triggered when there is a
reduction in the disk space.
Listing 7. Getting a running view of SNMP properties
#!/usr/local/bin/perl
use strict;
use warnings;
use Net::SNMP qw(snmp_dispatcher);

# dskAvail for the first entry in the dskTable
my $diskspaceOID = '1.3.6.1.4.1.2021.9.1.7.1';

foreach my $host (@ARGV)
{
    my ($session, $error) = Net::SNMP->session(
        -hostname    => $host,
        -nonblocking => 0x1,
    );
    if (!defined($session))
    {
        warn "ERROR: $host produced $error - not monitoring\n";
    }
    else
    {
        # Per-host record of the previous value, handed to the callback
        my ($last_poll) = (0);
        $session->get_request(
            -varbindlist => [$diskspaceOID],
            -callback    => [
                \&diskspace_cb, \$last_poll
            ]
        );
    }
}

# Process the queued requests
snmp_dispatcher();
exit 0;

sub diskspace_cb
{
    my ($session, $last_poll) = @_;
    if (!defined($session->var_bind_list))
    {
        printf("%-15s ERROR: %s\n", $session->hostname, $session->error);
    }
    else
    {
        my $space = $session->var_bind_list->{$diskspaceOID};
        # Warn if the available space has dropped since the last poll
        if ($space < ${$last_poll})
        {
            my $diff = ((${$last_poll}-$space)/${$last_poll})*100;
            printf("WARNING: %s has lost %0.2f%% diskspace\n",
                   $session->hostname, $diff);
        }
        printf("%-15s Ok (%s)\n",
               $session->hostname,
               $space
        );
        ${$last_poll} = $space;
    }
    # Queue the next retrieval, 60 seconds from now
    $session->get_request(
        -delay       => 60,
        -varbindlist => [$diskspaceOID]
    );
}
The script is in two parts, and uses some functionality within the
Net::SNMP module that allows you to call a
function when an SNMP value is obtained from a host, coupled
with the ability to continually monitor hosts and SNMP
objects in a simple, but efficient, loop.
The first part sets up each host to monitor the
information. We are only monitoring one piece of
information, but we could monitor others as part of the
solution. The session is configured as 'non-blocking,' so
that the script will not wait if the host cannot be reached,
but simply move on to the next host. Finally, in the call to
get_request(), we submit the callback
information. The first argument here is a reference to the
function to be called when the response is received from the
agent. The second is an argument that will be supplied to
the function when it is called.
We'll use this argument to be able to record and track
the previous value returned by the SNMP call. Within the
callback function, we compare the newly returned value and
the previous value. If there's a reduction, we calculate the
percentage reduction and then report a warning.
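The comparison itself is independent of SNMP. As a minimal sketch, here is the same logic in Python, with a small class (a hypothetical name, not part of any library) playing the role of the $last_poll reference. The readings used in the example are the values that appear in the sample run later in this article.

```python
class DiskWatcher:
    """Remember the previous reading and warn when free space shrinks."""

    def __init__(self, host):
        self.host = host
        self.last = 0  # same starting value as $last_poll in Listing 7

    def update(self, space):
        """Record a new reading and return the lines to print."""
        lines = []
        if space < self.last:
            lost = (self.last - space) / self.last * 100
            lines.append(f"WARNING: {self.host} has lost {lost:.2f}% diskspace")
        lines.append(f"{self.host} Ok ({space})")
        self.last = space
        return lines


watcher = DiskWatcher("tweedledum")
for reading in (50319024, 48976392, 48166292, 48166292):
    for line in watcher.update(reading):
        print(line)
```

Because the first reading is compared against zero, the first poll never produces a warning, and the division is only reached once a previous non-zero value has been stored.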
The final part of the callback is to specify that another
retrieval should occur, here specifying that the next
retrieval should be delayed by 60 seconds. The existing
callback information is retained. In effect, the script
obtains the value from the SNMP agent, calls the callback
function, which then queues up another retrieval in the
future. Because the same callback is already defined, the
process repeats in an endless loop.
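The pattern of a callback that queues its own next run can be sketched without SNMP at all. The Python example below uses the standard sched module in place of the Net::SNMP dispatcher, and a stand-in clock so that it finishes instantly instead of waiting a minute between polls. Everything here (the class, the function, and the stop condition once the readings run out) is an assumption made for illustration. The Perl script simply keeps going.

```python
import sched


class FakeClock:
    """Stand-in clock so the example runs instantly instead of sleeping."""

    def __init__(self):
        self.now = 0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def run_polls(readings, delay=60):
    """Poll once per reading; each poll queues the next one `delay` later."""
    clock = FakeClock()
    scheduler = sched.scheduler(clock.time, clock.sleep)
    pending = list(readings)
    seen = []

    def poll():
        seen.append((clock.time(), pending.pop(0)))
        if pending:  # illustrative stop condition; Listing 7 has none
            scheduler.enter(delay, 1, poll)

    scheduler.enter(0, 1, poll)  # first poll happens immediately
    scheduler.run()
    return seen


for when, value in run_polls([50319024, 48976392, 48166292]):
    print(f"t={when}s value={value}")
```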
Incidentally, the script uses the dskAvail OID value, and
calculates the percentage difference based on the last and
new values. The dskTable tree that this property is part of
actually has a disk percentage property that we could have
queried, instead of calculating it manually. However, the
value returned is probably not finely grained enough to be
useful.
You can see this property and current values by using
snmpwalk to output the dskTable tree, which itself is part
of the UCD MIB (Listing 8).
Listing 8. Getting a dump of available MIB data
$ snmpwalk -v 1 localhost -c public UCD-SNMP-MIB::dskTable
UCD-SNMP-MIB::dskIndex.1 = INTEGER: 1
UCD-SNMP-MIB::dskPath.1 = STRING: /
UCD-SNMP-MIB::dskDevice.1 = STRING: /dev/sda3
UCD-SNMP-MIB::dskMinimum.1 = INTEGER: 100000
UCD-SNMP-MIB::dskMinPercent.1 = INTEGER: -1
UCD-SNMP-MIB::dskTotal.1 = INTEGER: 72793272
UCD-SNMP-MIB::dskAvail.1 = INTEGER: 62024000
UCD-SNMP-MIB::dskUsed.1 = INTEGER: 7071512
UCD-SNMP-MIB::dskPercent.1 = INTEGER: 10
UCD-SNMP-MIB::dskPercentNode.1 = INTEGER: 3
UCD-SNMP-MIB::dskErrorFlag.1 = INTEGER: noError(0)
UCD-SNMP-MIB::dskErrorMsg.1 = STRING:
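Each line in this output is one cell of the table: the name before the dot is the column and the number after it is the row index. As a rough sketch, the Python below regroups a few lines from Listing 8 into one dictionary per row. The function name and the pattern are assumptions based on the output shown here, not part of any SNMP toolkit.

```python
import re
from collections import defaultdict

# 'MIB::column.index = TYPE: value', as printed in Listing 8.
ROW = re.compile(r"^(?:[\w-]+::)?(\w+)\.(\d+) = (\w+): ?(.*)$")


def walk_to_table(text):
    """Group 'column.index = TYPE: value' lines into {index: {column: value}}."""
    table = defaultdict(dict)
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            column, index, _kind, value = match.groups()
            table[int(index)][column] = value
    return dict(table)


SAMPLE = """\
UCD-SNMP-MIB::dskPath.1 = STRING: /
UCD-SNMP-MIB::dskDevice.1 = STRING: /dev/sda3
UCD-SNMP-MIB::dskTotal.1 = INTEGER: 72793272
UCD-SNMP-MIB::dskAvail.1 = INTEGER: 62024000
UCD-SNMP-MIB::dskErrorMsg.1 = STRING:
"""

rows = walk_to_table(SAMPLE)
print(rows[1]["dskPath"], rows[1]["dskAvail"])
```

With more than one disk configured, the same column names would appear again with index 2, 3, and so on, and each index becomes its own row.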
To find the property in the first place, you can dump all the known
properties by using snmptranslate. By filtering the output
with grep, you can see the information you want:
$ snmptranslate -Ts | grep dsk
To get the numerical OID for a name, use snmptranslate and
provide the name with the -On option (see Listing 9).
Listing 9. Using snmptranslate
$ snmptranslate -On UCD-SNMP-MIB::dskAvail
.1.3.6.1.4.1.2021.9.1.7
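Note that this is the OID of the dskAvail column, not of a single value. The OID used in Listing 7 is the same number with the row index appended (1, matching dskIndex.1 in Listing 8) and without the leading dot. A one-line Python helper, with a made-up name, shows the relationship:

```python
def instance_oid(column_oid, index):
    """Append a row index to a table column OID, dropping any leading dot."""
    return f"{column_oid.lstrip('.')}.{index}"


print(instance_oid(".1.3.6.1.4.1.2021.9.1.7", 1))  # 1.3.6.1.4.1.2021.9.1.7.1
```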
Running the script, we get a running commentary (and warnings) for the
disk space usage on the specified host. See Listing 10.
Listing 10. Monitoring disk space automatically
$ perl diskspace-auto.pl tweedledum
tweedledum Ok (50319024)
WARNING: tweedledum has lost 2.67% diskspace
tweedledum Ok (48976392)
WARNING: tweedledum has lost 1.65% diskspace
tweedledum Ok (48166292)
tweedledum Ok (48166292)
tweedledum Ok (48166292)
tweedledum Ok (48166292)
You can see from this output that the space available on this
disk on the specified host has dropped noticeably over the
first few polls. To monitor more hosts, just add more
hostnames on the command line.
Publishing information through an SNMP agent
The SNMP package includes a daemon, snmpd, which can be
configured to expose a variety of information using the SNMP
protocol. The configuration for the information to be
exposed is controlled using the /etc/snmpd.conf file.
For example, Listing 11 shows the snmpd.conf file on the
host used in the earlier examples in this article.
Listing 11. Sample snmpd.conf file
syslocation serverroom
proc imapd 20 10
disk / 100000
load 5 10 10
Each of these lines populates different information. In the example, we
set the location of the machine, and then configure some
specific items to monitor.
The proc section monitors a specific process, shown here
as a monitor for the IMAP daemons for a mail service. The
numbers following the option specify the maximum number of
processes allowed to be running, and the minimum number that
should be running. You can use this to make sure that a
particular service is running, and that you haven't exceeded
a capacity that might indicate a fault. When the process
count goes above the MAX value or below the MIN value, the
agent flags an error for that entry (the prErrorFlag value),
which a managing system can poll or, with additional
configuration, receive as an SNMP trap.
For the disk, you specify the path to the directory to be
monitored and the minimum size (in kilobytes) that the disk
should have free. Again, an error is flagged (dskErrorFlag,
which you can see in Listing 8) if the disk space dips below
this value.
Finally, the load line sets the maximum load averages for
1, 5, and 15 minutes. These are the same figures shown in
the output of the uptime command. Like the other configured
limits, an error is flagged when these limits are exceeded.
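To make the meaning of these limits concrete, the Python sketch below reads the three kinds of line from Listing 11 and reports whether a sampled value is within the configured limit. It only models the thresholds as described above. The function names are hypothetical, it handles only lines with all values present, and it is not how snmpd itself is implemented.

```python
def parse_conf(text):
    """Read proc, disk and load lines into a dict of thresholds."""
    limits = {"proc": {}, "disk": {}, "load": None}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "proc" and len(fields) == 4:
            # proc NAME MAX MIN
            limits["proc"][fields[1]] = (int(fields[2]), int(fields[3]))
        elif fields[0] == "disk" and len(fields) == 3:
            # disk PATH MINIMUM-FREE (kilobytes)
            limits["disk"][fields[1]] = int(fields[2])
        elif fields[0] == "load" and len(fields) == 4:
            # load MAX1 MAX5 MAX15
            limits["load"] = tuple(float(value) for value in fields[1:])
    return limits


def proc_ok(count, limit):
    maximum, minimum = limit
    return minimum <= count <= maximum


def disk_ok(free_kb, minimum_kb):
    return free_kb >= minimum_kb


def load_ok(averages, maximums):
    return all(value <= limit for value, limit in zip(averages, maximums))


SAMPLE = """\
syslocation serverroom
proc imapd 20 10
disk / 100000
load 5 10 10
"""

limits = parse_conf(SAMPLE)
print(proc_ok(12, limits["proc"]["imapd"]))      # True
print(disk_ok(62024000, limits["disk"]["/"]))    # True
print(load_ok((6.0, 1.0, 1.0), limits["load"]))  # False
```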
Manually setting this information is not difficult, but
also not ideal. A simple menu-based solution, snmpconf, is
available if you want a more straightforward method of
setting the configuration.
Summary
Monitoring your servers and devices is a process that can
be very complex, especially as the number of devices in your
network increases. SNMP is an efficient, and extensible,
method for exposing and reporting this information. Because
the interface is consistent across all the devices, you can
get uptime, network statistics, disk space, and even process
monitoring using the same methods across multiple hosts.
In this article we've looked both at the basics of SNMP
and also how to read specific values from different hosts.
Using the Net::SNMP Perl module we have also examined
methods for reading information, using both one-hit and
continual monitoring-based solutions. Finally, we examined
the methods for configuring additional information to be
exposed on a system so that you can customize and monitor
the systems you need for your network when using the snmpd
daemon.