sysdig is a brand new system-level exploration and troubleshooting tool that combines the benefits of well-known utilities such as strace, tcpdump, and lsof into one single application. And, as if this were not enough, sysdig also provides the capabilities to save system activity to trace files for later analysis.
In addition, a rich library of scripts (called chisels) is provided along with the installation in order to help you solve common problems or meet monitoring needs, from displaying failed disk I/O operations to finding the files where a given process has spent most time, and everything in between. You can also write your own scripts to enhance sysdig even further according to your needs.
In this article we will first introduce basic sysdig usage, and then explore network analysis with sysdig, including an example of auditing network traffic on a CentOS 7 LAMP server. Please note that the VPS used in the examples was not under significant load, but the traffic it handled is enough to show the basics of these auditing tasks.
Before you begin, please make sure you have these prerequisites.
Log in to your server and follow these steps:
Draios is the company behind sysdig. Before proceeding with the installation itself, import the Draios GPG key; yum will use this key to verify the authenticity of the package you’re about to download.
To manually add the Draios key to your RPM keyring, use the rpm tool with the --import flag:
sudo rpm --import https://s3.amazonaws.com/download.draios.com/DRAIOS-GPG-KEY.public
Then, download the Draios repository and configure yum to use it:
sudo curl -s -o /etc/yum.repos.d/draios.repo http://download.draios.com/stable/rpm/draios.repo
Extra Packages for Enterprise Linux (EPEL) is a repository of high-quality free and open-source software maintained by the Fedora Project for Red Hat Enterprise Linux and its derivatives, such as CentOS. This repository is needed to install the Dynamic Kernel Module Support (DKMS) package, which sysdig requires, along with other dependencies.
sudo yum -y install epel-release
Next, install the kernel headers that match your running kernel. They are needed because sysdig builds a custom kernel module (named sysdig-probe) that it uses to operate.
sudo yum -y install kernel-devel-$(uname -r)
Now we can install sysdig.
sudo yum -y install sysdig
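If you need to repeat this installation on several servers, the steps above can be collected into one small Bash script. This is a sketch, not an official installer: the run helper and the DRY_RUN variable are hypothetical conveniences that print each command instead of executing it, so you can review the sequence before running it for real.

```bash
#!/usr/bin/env bash
# Sketch: the sysdig installation steps from this tutorial as one script.
# DRY_RUN=1 is a hypothetical switch that prints commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"    # review mode: show the command only
  else
    "$@"                  # normal mode: execute the command
  fi
}

install_sysdig() {
  # Import the Draios GPG key so yum can verify the sysdig package
  run sudo rpm --import https://s3.amazonaws.com/download.draios.com/DRAIOS-GPG-KEY.public
  # Add the Draios repository definition
  run sudo curl -s -o /etc/yum.repos.d/draios.repo http://download.draios.com/stable/rpm/draios.repo
  # EPEL provides DKMS, which sysdig needs to build its kernel module
  run sudo yum -y install epel-release
  # Headers for the running kernel, so the sysdig-probe module can be built
  run sudo yum -y install "kernel-devel-$(uname -r)"
  # Finally, sysdig itself
  run sudo yum -y install sysdig
}
```

For example, DRY_RUN=1 install_sysdig only prints the five commands; calling install_sysdig without DRY_RUN executes them in order.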
For security, it’s best to have a non-root user to run sysdig. Create a custom group for sysdig:
sudo groupadd sysdig
Add one or more users to the group. In our example, we’ll add the user sammy.
sudo usermod -aG sysdig sammy
Locate the binary file for sysdig:
which sysdig
You might receive a response like:
/usr/bin/sysdig
Give all members of the sysdig group privileges to run the sysdig executable (and that binary only). Edit the /etc/sudoers file with visudo:
sudo visudo
Add the following lines for the sysdig group in the groups section. Adding the new lines after the
%wheel section is fine. Replace the path with sysdig’s location on your system:
## Same thing without a password
# %wheel        ALL=(ALL)       NOPASSWD: ALL

## sysdig
%sysdig ALL= /usr/bin/sysdig
If you need further clarification on editing the /etc/sudoers file, see the sudoers(5) and visudo(8) manual pages.
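As an alternative to editing /etc/sudoers directly, you can place the rule in its own file under /etc/sudoers.d and have visudo check its syntax before installing it. The sketch below assumes that your /etc/sudoers contains the #includedir /etc/sudoers.d line (the stock CentOS 7 file does) and that sysdig is on your PATH; both function names are hypothetical.

```bash
# Print the sudoers rule that lets members of the sysdig group run
# one specific sysdig binary (and nothing else) through sudo.
sysdig_sudoers_rule() {
  local bin="$1"
  printf '%%sysdig ALL= %s\n' "$bin"
}

# Write the rule to a temporary file, validate it with visudo, and only
# then install it as /etc/sudoers.d/sysdig, so a typo cannot break sudo.
install_sysdig_sudoers() {
  local bin tmp status=0
  bin="$(command -v sysdig)" || { echo "sysdig not found in PATH" >&2; return 1; }
  tmp="$(mktemp)"
  sysdig_sudoers_rule "$bin" > "$tmp"
  if sudo visudo -c -f "$tmp"; then
    # sudoers files should be owned by root and not writable (mode 0440)
    sudo install -m 0440 -o root -g root "$tmp" /etc/sudoers.d/sysdig || status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

Run install_sysdig_sudoers as a user who already has full sudo rights. If visudo reports a syntax error, nothing is copied into /etc/sudoers.d.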
You can run sysdig in two modes.
You can view the real-time stream of server activity live, or you can save records of system operations to a file for later offline analysis.
Since you will most likely want to use the second option, that is what we will cover here. Note that when saving system activity to a file, sysdig records the system events (system calls and other kernel activity) that occur during the capture, so everything that happens on your server during that interval of time will be available for offline analysis.
Note: When you run sysdig commands, please make sure that each option is preceded by a single short dash. Copying and pasting may cause an issue where a single dash is pasted as a long dash and therefore not recognized by the program.
Let’s run a basic sysdig command to capture 1000 events of server activity.
To capture system activity to a compressed file named act1.scap.gz, and limit the capture to 1000 events, run the following command (omit the -n 1000 part if you want to run sysdig for an unspecified period of time). The -z switch enables compression of the trace file.
sudo sysdig -w act1.scap.gz -n 1000 -z
Note: If you omitted the -n switch in the last step, you can interrupt the execution of sysdig by pressing the CTRL + C key combination.
Chisels are sysdig scripts. To display a list of the available chisels, we need to run the following command:
sudo sysdig -cl
In order to audit the network traffic on our CentOS 7 LAMP server using the trace file created by sysdig, we will use the chisels available under the Net category:
Category: Net
-------------
iobytes_net          Show total network I/O bytes
spy_ip               Show the data exchanged with the given IP address
spy_port             Show the data exchanged using the given IP port number
topconns             top network connections by total bytes
topports_server      Top TCP/UDP server ports by R+W bytes
topprocs_net         Top processes by network I/O
A further description of a specific chisel, along with instructions for its use, can be viewed with:
sudo sysdig -i chisel_name
For example, to view information about the spy_ip chisel, run:
sudo sysdig -i spy_ip
Category: Net
-------------
spy_ip          Show the data exchanged with the given IP address
shows the network payloads exchanged with an IP endpoint. You can combine this chisel with the -x, -X or -A sysdig command line switches to customize the screen output
Args:
[ipv4] host_ip - the remote host IP address
[string] disable_color - Set to 'disable_colors' if you want to disable color output
The Args section indicates whether you need to pass arguments to the chisel. In the case of spy_ip, you need to pass an IP address as an argument to the chisel.
Let’s walk through a practical example of how to use sysdig to analyze bandwidth use and see detailed information about network traffic.
To get the best results from this test, you will need to set up a dummy web form on your server so appropriate traffic is generated. If this is a server with a fresh LAMP installation, you can make this form at
<body>
  <form id="loginForm" name="loginForm" method="post" action="login.php">
    <table width="300" border="0" align="center" cellpadding="2" cellspacing="0">
      <tr>
        <td width="112"><b>Username:</b></td>
        <td width="188"><input name="login" type="text" class="textfield" id="login" /></td>
      </tr>
      <tr>
        <td><b>Password:</b></td>
        <td><input name="pass" type="password" class="textfield" id="pass" /></td>
      </tr>
      <tr>
        <td> </td>
        <td><br /> <input type="submit" name="Submit" value="Login" /></td>
      </tr>
    </table>
  </form>
</body>
This isn’t required, but to make everything tidy, you can also create the
<body> <p>Form submitted.</p> </body>
Warning: Please delete this form when you are done testing!
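Start collecting data by issuing the following command. The -s 4096 switch tells sysdig to save up to 4096 bytes of each data buffer to the trace file, so that request and response payloads are available for the analysis later on: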
sudo sysdig -w act1.scap.gz -z -s 4096
Leave sysdig running for a reasonable amount of time. Your command prompt will hang while sysdig runs.
Now, visit your server’s domain or IP address in your web browser. You can visit both existing and non-existing pages to generate some traffic. If you want this specific example to work, you should visit the home page, fill out the login information with anything you like, and submit the login form a few times. In addition, feel free to run queries to your MySQL/MariaDB database as well.
Once you’ve generated some traffic, press CTRL + C to stop sysdig. Then you will be ready to run the analysis queries that we will discuss later in this tutorial.
In a production environment, you could start the sysdig data collection during a busy time on your server.
Before we get into sorting the sysdig data, let’s explain some basic sysdig command elements.
Sysdig provides classes and fields as filters. You can think of classes as objects and fields as properties, following an analogy based on object-oriented programming theory.
You can display the complete list of classes and fields with:
sudo sysdig -l
We will use classes and fields to filter output when analyzing a trace file.
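Filters grow longer as you combine conditions, as in the queries later in this tutorial. A small helper (hypothetical, not part of sysdig) can join field=value conditions with and, so that the whole expression can be passed to sysdig as one quoted argument.

```bash
# Hypothetical helper: join sysdig filter conditions with "and".
# Each argument is one condition, e.g. proc.name=httpd or fd.cip=1.2.3.4.
sysdig_filter() {
  local expr="" cond
  for cond in "$@"; do
    if [ -z "$expr" ]; then
      expr="$cond"
    else
      expr="$expr and $cond"
    fi
  done
  printf '%s\n' "$expr"
}
```

For example, assuming (as in the commands in this article) that the filter follows the chisel name: sudo sysdig -r act1.scap.gz -c topconns "$(sysdig_filter proc.name=httpd)"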
Since we want to audit the network traffic to and from our LAMP server, we will load the trace file
act1.scap.gz and perform the following tests with sysdig:
sudo sysdig -r act1.scap.gz -c topprocs_net
You should see output somewhat like this:
Bytes               Process
------------------------------
331.68KB            httpd
24.14KB             sshd
4.48KB              mysqld
Here you can see that Apache is using the most bandwidth (the httpd process).
Based on this output, you can make an informed judgment about whether you need to increase your available bandwidth to serve your current and estimated future requests. Alternatively, you may want to limit the maximum bandwidth that a given process can use.
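To back that decision with numbers, you can convert the human-readable sizes from topprocs_net into bytes and compute each process’s share of the total. This awk-based sketch assumes the two-column layout shown above and treats the K, M, and G prefixes as powers of 1024; check the units your sysdig version prints before relying on the results.

```bash
# Sketch: read topprocs_net output on stdin and print, per process,
# its traffic in bytes and its percentage of the total.
# Assumption: sizes look like 331.68KB and K/M/G mean powers of 1024.
net_share() {
  awk '
    function to_bytes(s,    v, u) {
      v = s + 0                      # numeric prefix, e.g. 331.68
      u = s
      sub(/^[0-9.]+/, "", u)         # unit suffix, e.g. KB
      u = substr(u, 1, 1)
      if (u == "K") return v * 1024
      if (u == "M") return v * 1024 * 1024
      if (u == "G") return v * 1024 * 1024 * 1024
      return v                       # plain bytes
    }
    $1 !~ /^[0-9]/ { next }          # skip header and separator lines
    {
      n++
      name[n] = $2
      bytes[n] = to_bytes($1)
      total += bytes[n]
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%-10s %12.0f bytes %6.2f%%\n", name[i], bytes[i], (total > 0 ? 100 * bytes[i] / total : 0)
    }'
}
```

Usage: sudo sysdig -r act1.scap.gz -c topprocs_net | net_share. For the sample output above, httpd accounts for about 92% of the network I/O.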
We may also want to know which IPs are using the network bandwidth consumed by httpd, as shown in the previous example.
For that purpose, we will use the topconns chisel (which shows the top network connections by total bytes) and add a filter formed with the class proc and the field name, so that only connections handled by httpd are shown:
sudo sysdig -r act1.scap.gz -c topconns proc.name=httpd
This returns the top network connections to your server for which the process serving the request is httpd, including the source of each connection:
Bytes     Proto     Conn
------------------------------
56.24KB   tcp       184.108.40.206:12574->your_server_ip:80
51.94KB   tcp       220.127.116.11:15249->your_server_ip:80
51.57KB   tcp       18.104.22.168:27832->your_server_ip:80
51.26KB   tcp       22.214.171.124:42487->your_server_ip:80
48.20KB   tcp       126.96.36.199:42483->your_server_ip:80
48.20KB   tcp       188.8.131.52:42493->your_server_ip:80
4.17KB    tcp       184.108.40.206:13879->your_server_ip:80
3.14KB    tcp       220.127.116.11:27873->your_server_ip:80
3.06KB    tcp       18.104.22.168:42484->your_server_ip:80
3.06KB    tcp       22.214.171.124:42494->your_server_ip:80
Note that the original source and destination IP addresses have been obscured for privacy reasons.
This type of query can help you find top bandwidth users that are sending traffic to your server.
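Because one client can open several connections, it can help to add up the bytes per client address. The following awk sketch assumes the Bytes/Proto/Conn layout shown above and IPv4 addresses: it keeps only the part of the Conn column before the first colon, sums the traffic per address, and lists the heaviest clients first. Sizes are converted with K, M, and G as powers of 1024, which is an assumption.

```bash
# Sketch: sum topconns traffic per client address (IPv4 only) and list
# the heaviest clients first, as "address bytes".
top_clients() {
  awk '
    function to_bytes(s,    v, u) {
      v = s + 0                      # numeric prefix, e.g. 56.24
      u = s
      sub(/^[0-9.]+/, "", u)         # unit suffix, e.g. KB
      u = substr(u, 1, 1)
      if (u == "K") return v * 1024
      if (u == "M") return v * 1024 * 1024
      if (u == "G") return v * 1024 * 1024 * 1024
      return v
    }
    $1 !~ /^[0-9]/ { next }          # skip header and separator lines
    {
      client = $3
      sub(/:.*/, "", client)         # keep only the address before the first colon
      total[client] += to_bytes($1)
    }
    END {
      for (c in total) printf "%s %.0f\n", c, total[c]
    }' | sort -k2,2nr
}
```

Usage: sudo sysdig -r act1.scap.gz -c topconns proc.name=httpd | top_clients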
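After looking at the output above you may be wondering what the numbers after the source IP addresses represent. Each connection is shown as client_address:client_port->server_address:server_port, so those numbers are the clients’ source ports, usually ephemeral ports chosen by the client’s operating system. This is why the same client can appear on several lines: once per connection it opened.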
Now we’ll examine the connections between a specific IP address and Apache in more detail.
The echo_fds chisel allows us to display the data that was read and written by processes. When combined with a specific process name and a client IP (such as proc.name=httpd and fd.cip=184.108.40.206 in this case), this chisel shows the data that was exchanged between our LAMP server and that client IP address.
In addition, using the following switches helps us to show results in a more friendly and accurate way:
-s 4096: For each event, read up to 4096 bytes from its buffer (this flag can also be used to specify how many bytes of each data buffer should be saved to disk when saving live data to a trace file for offline analysis)
-A: Print only the text portion of data buffers, and echo end-of-lines (we want to only display human-readable data)
Here’s the command. Be sure to replace 184.108.40.206 with a client IP address from the previous output.
sudo sysdig -r act1.scap.gz -s 4096 -A -c echo_fds fd.cip=184.108.40.206 and proc.name=httpd
You should see quite a bit of output, depending on the number of connections made by that IP address. Here’s an example showing a 404 error:
GET /hi HTTP/1.1
Host: your_server_ip
Connection: keep-alive
Cache-Control: m
------ Write 426B to 220.127.116.11:39003->your_server_ip:80
HTTP/1.1 404 Not Found
Date: Tue, 02 Dec 2014 19:38:16 GMT
Server: Apache/2.4.6 (CentOS) PHP/5.4.16
Content-Length: 200
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /hi was not found on this server.</p>
</body></html>
This type of query can help you figure out exactly what kinds of connections were made by a top bandwidth-using IP address. For example, if you found that the IP address was reaching a certain page very frequently, you could make that page’s assets as small as possible, to reduce bandwidth use. Or, if the traffic doesn’t seem to be legitimate, you could create a new firewall rule to block bandwidth-hogging IPs.
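To see which pages a given client requests most often, you can count the HTTP request lines in the echo_fds output. A rough sketch, assuming the -A text output shown above, where each request begins on its own line with the method, path, and HTTP version; the list of methods matched is an assumption you can extend.

```bash
# Sketch: count HTTP request lines (method + path) in echo_fds -A output
# read from stdin, most frequent first.
request_counts() {
  awk '
    # A request line looks like: GET /hi HTTP/1.1
    $1 ~ /^(GET|HEAD|POST|PUT|DELETE|OPTIONS|PATCH)$/ && $3 ~ /^HTTP\// {
      hits[$1 " " $2]++
    }
    END {
      for (r in hits) printf "%d %s\n", hits[r], r
    }' | sort -k1,1nr
}
```

Usage: sudo sysdig -r act1.scap.gz -s 4096 -A -c echo_fds fd.cip=184.108.40.206 and proc.name=httpd | request_counts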
Depending on the server activity during the capture interval, the trace file may contain quite a lot of events and information. Thus, going through the results of the command in the previous section by hand could take an impractical amount of time. For that reason, we can look for specific words in event buffers.
Suppose we have a set of web applications running on our web server, and we want to make sure that login credentials are not being passed as plain text through forms.
Let’s add a few filter conditions to the command used in the previous example:
sudo sysdig -r act1.scap.gz -A -c echo_fds fd.ip=18.104.22.168 and proc.name=httpd and evt.is_io_read=true and evt.buffer contains form
Here the class evt, along with the field is_io_read, allows us to examine only read events (from the server’s point of view). In addition, evt.buffer allows us to search for a specific word inside the event buffer (the word is form in this case). You can change the search keyword to one that makes sense for your own applications.
The following output shows that a username and password are being passed from the client to the server in plain text (thus becoming readable to anyone with enough expertise):
------ Read 551B from 22.214.171.124:41135->your_server_ip:80
POST /login.php HTTP/1.1
Host: your_server_ip
Connection: keep-alive
Content-Length: 35
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Origin: http://126.96.36.199
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: http://188.8.131.52/
Accept-Encoding: gzip,deflate
Accept-Language: en-US,en;q=0.8

login=sammy&pass=password&Submit=Login
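When this output is long, you can pipe it through a filter that prints only URL-encoded form bodies containing a password-like field, with the value masked so that the report itself does not spread the credential. The field names matched below (pass, passwd, password, pwd) are assumptions; adjust them to the forms your applications use.

```bash
# Sketch: from echo_fds -A output on stdin, print only form-encoded lines
# that carry a password-like field, replacing the field's value.
# Assumed field names: pass, passwd, password, pwd.
find_plaintext_credentials() {
  grep -E '(^|&)(pass|passwd|password|pwd)=' |
    sed -E 's/((^|&)(pass|passwd|password|pwd)=)[^&]*/\1[REDACTED]/g'
}
```

Usage: sudo sysdig -r act1.scap.gz -A -c echo_fds proc.name=httpd and evt.is_io_read=true and evt.buffer contains form | find_plaintext_credentials. For the sample output above, this prints login=sammy&pass=[REDACTED]&Submit=Login.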
Should you find a similar security hole, notify your developer team immediately.
What you can accomplish with sysdig when auditing network traffic on a LAMP server is limited mostly by your imagination and your applications’ needs. We’ve seen how to find top bandwidth users, examine traffic from specific IPs, and search connection data for keywords relevant to your applications.