SSH is the de facto method of connecting to a cloud server. It is durable, and it is extensible — as new encryption standards are developed, they can be used to generate new SSH keys, ensuring that the core protocol remains secure. However, no protocol or software stack is totally foolproof, and SSH being so widely deployed across the internet means that it represents a very predictable attack surface or attack vector through which people can try to gain access.
Any service that is exposed to the network is a potential target in this way. If you review the logs for your SSH service running on any widely trafficked server, you will often see repeated, systematic login attempts that represent brute force attacks by users and bots alike. Although you can make some optimizations to your SSH service to reduce the chance of these attacks succeeding to near-zero, such as disabling password authentication in favor of SSH keys, they can still pose a minor, ongoing liability.
Large-scale production deployments for whom this liability is completely unacceptable will usually implement a VPN such as WireGuard in front of their SSH service, so that it is impossible to connect directly to the default SSH port 22 from the outside internet without additional software abstraction or gateways. These VPN solutions are widely trusted, but will add complexity, and can break some automations or other small software hooks.
Prior to or in addition to committing to a full VPN setup, you can implement a tool called Fail2ban. Fail2ban can significantly mitigate brute force attacks by creating rules that automatically alter your firewall configuration to ban specific IPs after a certain number of unsuccessful login attempts. This will allow your server to harden itself against these access attempts without intervention from you.
In another tutorial, we discussed How to protect SSH with Fail2ban. In this guide, we’ll discuss in more depth how Fail2ban actually works and how you can use this knowledge to modify or extend the behavior of this service.
The purpose of Fail2ban is to monitor the logs of common services to spot patterns in authentication failures.
When Fail2ban is configured to monitor the logs of a service, it uses a filter configured specifically for that service. The filter is designed to identify authentication failures for that service through the use of complex regular expressions. Regular expressions are a common language used for pattern matching. Fail2ban stores these patterns in an internal variable called failregex.
By default, Fail2ban includes filter files for common services. When a log entry from any service, like a web server, matches the failregex in its filter, a predefined action is executed for that service. The action is a variable that can be configured to do many different things, depending on the preferences of the administrator.
The default action is to ban the offending host/IP address by modifying the local firewall rules. You can expand this action to, for example, send an email to your system administrator.
By default, action will be taken when three authentication failures have been detected in 10 minutes, and the default ban time is for 10 minutes. This is configurable.
When using the default iptables action, Fail2ban creates a new set of firewall rules, also called a chain, when the service is started. It adds a new rule to the INPUT chain that sends all TCP traffic directed at port 22 to the new chain. In the new chain, it inserts a single rule that returns to the INPUT chain. The chain and associated rules are removed if the Fail2ban service is stopped.
Fail2ban is configured through several files located within a hierarchy under the /etc/fail2ban/ directory.
The fail2ban.conf file configures some operational settings, such as the way the daemon logs information and the socket and PID file it will use. The main configuration, however, is specified in the files that define the per-application “jails”.
By default, Fail2ban ships with a jail.conf file. However, this file can be overwritten in updates, so you should copy it to a jail.local file and make adjustments there.
If you already have a jail.local file, open it using nano or your favorite text editor:
- sudo nano /etc/fail2ban/jail.local
If you don’t have a jail.local file already, or the file you opened was blank, copy over the jail.conf file and then open the new file:
- sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
- sudo nano /etc/fail2ban/jail.local
We will take a look at the options available here and see how this file interacts with other configuration files on the system.
The first portion of the file will define the defaults for fail2ban policy. These options can be overridden in each individual service’s configuration section.
With the comments removed, the entirety of the default section looks something like this:
[DEFAULT]
ignoreip = 127.0.0.1/8
bantime = 10m
findtime = 10m
maxretry = 3
backend = auto
usedns = warn
destemail = root@localhost
sendername = Fail2Ban
banaction = iptables-multiport
mta = sendmail
protocol = tcp
chain = INPUT
action_ = %(banaction)s[name=%(__name__)s, port="%(port)s", protocol="%(protocol)s", chain="%(chain)s"]
action_mw = %(banaction)s[name=%(__name__)s, port="%(port)s", protocol="%(protocol)s", chain="%(chain)s"]
            %(mta)s-whois[name=%(__name__)s, dest="%(destemail)s", protocol="%(protocol)s", chain="%(chain)s", sendername="%(sendername)s"]
action_mwl = %(banaction)s[name=%(__name__)s, port="%(port)s", protocol="%(protocol)s", chain="%(chain)s"]
             %(mta)s-whois-lines[name=%(__name__)s, dest="%(destemail)s", logpath=%(logpath)s, chain="%(chain)s", sendername="%(sendername)s"]
action = %(action_)s
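Let’s go over what some of this means:
ignoreip: IP addresses or ranges that Fail2ban will never ban. The default, 127.0.0.1/8, covers the local machine itself, so that local traffic cannot lock you out.
bantime: How long a ban lasts. The default is 10 minutes.
findtime: The window of time Fail2ban looks back over when counting failed authentication attempts. With the default of 10 minutes, only failures from the last 10 minutes count.
maxretry: The number of failed attempts tolerated within the findtime window before a ban is instituted.
backend: How Fail2ban watches log files for new entries. The setting of auto means that Fail2ban will try pyinotify, then gamin, and then a polling algorithm, based on what’s available. inotify is a built-in Linux kernel feature for watching files for changes, and pyinotify is a Python interface to inotify, used by Fail2ban.
usedns: Whether Fail2ban performs DNS lookups when a log line contains a hostname instead of an IP address. The warn setting will attempt to look up the hostname and ban that way, but will log the activity for review.
destemail, sendername, and mta: The recipient address, sender name, and mail transfer agent used by actions that send notification emails.
banaction: The action used when the maxretry threshold is reached. It refers to a file in /etc/fail2ban/action.d/ called iptables-multiport.conf. This handles the actual iptables firewall manipulation to ban an IP address. We will look at this later.
protocol and chain: The type of traffic that is banned (tcp) and the iptables chain (INPUT) that receives the rule sending that traffic to Fail2ban’s own chain.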
The rest of the parameters define different actions that can be specified. They pass in some of the parameters that we’ve defined above using variable substitution within text strings like this:
%(var_name)s
This expression is replaced with the value of var_name. Using this, we can tell that the action variable is set to the action_ definition by default (ban only, no mail alerts).
This, in turn, is configured by calling the iptables-multiport action with the list of parameters (service name, port, protocol, and chain) that it needs to perform the ban. The __name__ variable is substituted with the name of the service as specified by the section headers below.
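The %(...)s syntax is the interpolation syntax of Python’s standard configparser module, which Fail2ban’s configuration reader builds on. As a rough illustration, the sketch below loads a trimmed-down, hypothetical jail file (only the keys needed here, not the full jail.conf) and resolves action for an sshd section. One assumption to note: Python 3’s configparser does not fill in __name__ by itself, so the sketch passes the section name in explicitly. Fail2ban does this for you.

```python
import configparser

# Trimmed-down, hypothetical jail file: only the keys needed to resolve the
# default action. This is not the complete jail.conf.
JAIL = """
[DEFAULT]
banaction = iptables-multiport
protocol = tcp
chain = INPUT
action_ = %(banaction)s[name=%(__name__)s, port="%(port)s", protocol="%(protocol)s", chain="%(chain)s"]
action = %(action_)s

[sshd]
port = ssh
"""

# ConfigParser uses BasicInterpolation by default, which expands %(key)s,
# including references nested inside other referenced values.
parser = configparser.ConfigParser()
parser.read_string(JAIL)

# __name__ is supplied by hand here; Fail2ban sets it to the section name itself.
resolved = parser.get("sshd", "action", vars={"__name__": "sshd"})
print(resolved)
# iptables-multiport[name=sshd, port="ssh", protocol="tcp", chain="INPUT"]
```

The port comes from the [sshd] section, while protocol and chain fall back to [DEFAULT]. Pointing action at %(action_mw)s instead would expand to the same ban action plus the %(mta)s-whois mail action.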
Beneath the default section, there are sections for specific services that can be used to override the default settings. This follows a convention of only modifying the parameters that differ from the normal values (convention over configuration).
Each section header is specified like this:
[service_name]
Any section that has the line enabled = true will be read and enabled.
Within each section, the parameters are configured, including the filter file that should be used to parse the logs (minus the file extension) and the location of the log files themselves.
Keeping this in mind, the section that specifies the actions for the SSH service looks like this:
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
maxretry = 6
This enables this section, sets the port to the default “ssh” port (port 22), and raises maxretry to 6 for this service. It tells Fail2ban to look at the log located at /var/log/auth.log for this section and to parse the log using the filtering mechanisms defined in the /etc/fail2ban/filter.d directory in a file called sshd.conf.
All of the other information that it needs is taken from the parameters defined in the [DEFAULT] section. For instance, the action will be set to action_, which will ban the offending IP address using the iptables-multiport banaction, which references a file called iptables-multiport.conf found in /etc/fail2ban/action.d.
As you can see, the actions in the [DEFAULT] section should be general and flexible. Using parameter substitution along with parameters that provide sensible defaults makes it possible to override definitions when necessary.
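To see the override behavior concretely, the sketch below again uses Python’s configparser as a stand-in for Fail2ban’s reader. It lists the jails that have enabled = true and the effective values each one ends up with. The jail file is hypothetical and trimmed. The enabled = false default and the second jail are illustrative assumptions.

```python
import configparser

# Hypothetical, trimmed jail file: [DEFAULT] holds shared values, and each
# service section overrides only what differs.
JAIL = """
[DEFAULT]
enabled = false
bantime = 10m
findtime = 10m
maxretry = 3

[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
maxretry = 6

[nginx-http-auth]
port = http,https
"""

def active_jails(text):
    """Return {jail_name: effective_settings} for sections with enabled = true."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    active = {}
    for name in parser.sections():  # sections() never includes DEFAULT
        jail = parser[name]
        if jail.getboolean("enabled"):
            # Keys missing from the section fall back to [DEFAULT].
            active[name] = dict(jail)
    return active

for name, settings in active_jails(JAIL).items():
    print(name, settings["maxretry"], settings["bantime"], settings["findtime"])
# sshd 6 10m 10m
```

Only sshd is active. It keeps its own maxretry of 6 and inherits bantime and findtime unchanged.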
In order to understand what is going on in our configuration, we need to understand the filter and action files, which do the bulk of the work.
The filter file will determine the lines that fail2ban will look for in the log files to identify offending characteristics. The action file implements all of the actions required, from building up a firewall structure when the service starts, to adding and deleting rules, and tearing down the firewall structure when the service stops.
Let’s look at the filter file that our SSH service called for in the configuration above:
- sudo nano /etc/fail2ban/filter.d/sshd.conf
[INCLUDES]
before = common.conf
[Definition]
_daemon = sshd
failregex = ^%(__prefix_line)s(?:error: PAM: )?[aA]uthentication (?:failure|error) for .* from <HOST>( via \S+)?\s*$
            ^%(__prefix_line)s(?:error: PAM: )?User not known to the underlying authentication module for .* from <HOST>\s*$
            ^%(__prefix_line)sFailed \S+ for .*? from <HOST>(?: port \d*)?(?: ssh\d*)?(: (ruser .*|(\S+ ID \S+ \(serial \d+\) CA )?\S+ %(__md5hex)s(, client user ".*", client host ".*")?))?\s*$
            ^%(__prefix_line)sROOT LOGIN REFUSED.* FROM <HOST>\s*$
            ^%(__prefix_line)s[iI](?:llegal|nvalid) user .* from <HOST>\s*$
            ^%(__prefix_line)sUser .+ from <HOST> not allowed because not listed in AllowUsers\s*$
            ^%(__prefix_line)sUser .+ from <HOST> not allowed because listed in DenyUsers\s*$
            ^%(__prefix_line)sUser .+ from <HOST> not allowed because not in any group\s*$
            ^%(__prefix_line)srefused connect from \S+ \(<HOST>\)\s*$
            ^%(__prefix_line)sUser .+ from <HOST> not allowed because a group is listed in DenyGroups\s*$
            ^%(__prefix_line)sUser .+ from <HOST> not allowed because none of user's groups are listed in AllowGroups\s*$
ignoreregex =
The [INCLUDES] section header specifies other filter files that are read in before or after this file. In our example, the common.conf file is read in and placed before the other lines in this file. This sets up some parameters that we will be using in our configuration.
Next, we have a [Definition] section that defines the actual rules for our filter matches. First, we set the name of the daemon we are monitoring by using the _daemon parameter.
After that, we go through the actual failregex definition, which sets the patterns that will trigger when a matching line in the log file is found. These are regular expressions that match based on the different errors and failures that can be thrown when a user does not authenticate correctly.
Portions of the line like %(__prefix_line)s will be substituted with the value of a parameter set up in the common.conf file that we sourced. This is used to match the leading information, such as the timestamp, hostname, and process name, that operating systems write to log files through the standard system logging facilities. For instance, some lines from /var/log/auth.log might look something like this:
May 6 18:18:52 localhost sshd: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=126.96.36.199
May 6 18:18:54 localhost sshd: Failed password for invalid user phil from 188.8.131.52 port 38354 ssh2
May 6 18:18:54 localhost sshd: Received disconnect from 184.108.40.206: 11: Bye Bye [preauth]
The leading portion of each line (the date, time, hostname, and daemon name, such as May 6 18:18:54 localhost sshd:) is a standard pattern that the operating system inserts to provide more context. After that, there are quite a few different ways that the SSH daemon writes failed attempts to the log.
We see two separate failures in the first two lines above (a PAM authentication error and a password error). The regular expressions defined in the filter are designed to match any of the possible failure lines. You should not have to adjust any of these lines, but you should be aware of the need to catch all of the log entries that signify an unauthorized use error for the application you are trying to protect if you ever have to create a filter file yourself.
At the bottom, you can see an ignoreregex parameter, which is currently blank. It can be used to exclude lines that would otherwise match a failregex, so that specific scenarios do not count as failures. We won’t be adjusting this.
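The matching step can be tried out with Python’s re module, whose syntax these filters are written in. The sketch takes two of the failregex lines above, with the first one trimmed of its optional tail. It expands their placeholders with simplified stand-ins and then applies a hypothetical ignoreregex. PREFIX, HOST, the nagios pattern, and the third log line are all assumptions for illustration. Fail2ban’s real expansions of %(__prefix_line)s and <HOST> are considerably more permissive; for example, they also accept hostnames and IPv6 addresses.

```python
import re

# Simplified stand-ins (assumptions, not Fail2ban's real definitions):
# PREFIX approximates %(__prefix_line)s from common.conf, and HOST
# approximates the capture group substituted for <HOST> (IPv4 only here).
PREFIX = r"\w{3}\s+\d+ [\d:]+ \S+ sshd(?:\[\d+\])?: "
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"

# Two failregex lines from sshd.conf (the first trimmed), kept in placeholder form.
FAILREGEX = [
    r"^%(__prefix_line)sFailed \S+ for .*? from <HOST>(?: port \d*)?(?: ssh\d*)?\s*$",
    r"^%(__prefix_line)s[iI](?:llegal|nvalid) user .* from <HOST>\s*$",
]
# Hypothetical ignoreregex: never count failures for a monitoring account.
IGNOREREGEX = [r" for nagios from "]

def expand(template):
    """Substitute both placeholders, then compile the resulting pattern."""
    return re.compile(template.replace("%(__prefix_line)s", PREFIX).replace("<HOST>", HOST))

FAIL_PATTERNS = [expand(t) for t in FAILREGEX]
IGNORE_PATTERNS = [re.compile(p) for p in IGNOREREGEX]

def offending_host(line):
    """Return the host if the line matches a failregex and no ignoreregex."""
    for pattern in FAIL_PATTERNS:
        match = pattern.match(line)
        if match:
            if any(ignore.search(line) for ignore in IGNORE_PATTERNS):
                return None  # looks like a failure, but is explicitly ignored
            return match.group("host")
    return None

LOG = [
    "May 6 18:18:54 localhost sshd: Failed password for invalid user phil from 188.8.131.52 port 38354 ssh2",
    "May 6 18:18:54 localhost sshd: Received disconnect from 184.108.40.206: 11: Bye Bye [preauth]",
    "May 6 18:19:02 localhost sshd: Failed password for nagios from 203.0.113.7 port 40112 ssh2",
]
for line in LOG:
    print(offending_host(line))
# 188.8.131.52
# None
# None
```

The disconnect message matches no failregex. The nagios failure matches one but is then discarded by the ignoreregex.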
Save and close the file when you are finished examining it.
Now, let’s take a look at the action file. This file is responsible for setting up the firewall with a structure that allows modifications for banning malicious hosts, and for adding and removing those hosts as necessary.
The action that our SSH service invokes is called iptables-multiport. Open the associated file now:
- sudo nano /etc/fail2ban/action.d/iptables-multiport.conf
With the comments removed, this file looks something like this:
[INCLUDES]
before = iptables-blocktype.conf
[Definition]
actionstart = iptables -N fail2ban-<name>
              iptables -A fail2ban-<name> -j RETURN
              iptables -I <chain> -p <protocol> -m multiport --dports <port> -j fail2ban-<name>
actionstop = iptables -D <chain> -p <protocol> -m multiport --dports <port> -j fail2ban-<name>
             iptables -F fail2ban-<name>
             iptables -X fail2ban-<name>
actioncheck = iptables -n -L <chain> | grep -a 'fail2ban-<name>[ \t]'
actionban = iptables -I fail2ban-<name> 1 -s <ip> -j <blocktype>
actionunban = iptables -D fail2ban-<name> -s <ip> -j <blocktype>
[Init]
name = default
port = ssh
protocol = tcp
chain = INPUT
The file starts off by sourcing another action file called iptables-blocktype.conf, which defines the blocktype parameter. This parameter configures the restriction that will be set when a client is banned. By default, the blocktype is set to reject packets from banned clients and reply with an ICMP message stating that the port is unreachable. We will use this in our ban rules below.
Next, we get to the rule definitions themselves. The actionstart action sets up the iptables firewall when the Fail2ban service is started. It creates a new chain, adds a rule to that chain to return to the calling chain, and then inserts a rule at the beginning of the INPUT chain that passes traffic matching the correct protocol and port destinations to the new chain.
It does this by using the values we passed in with the action that we defined in our jail.local file. The name is taken from the section header for each service (through the __name__ substitution), while the port, protocol, and chain are taken from the action line itself in that file.
Here, all of the parameters that are set by the other file are referenced by including the parameter name in angle brackets:
<param_name>
When we move down to the companion actionstop definition, we can see that the firewall commands reverse the actionstart commands. When the Fail2ban service stops, it cleanly removes any firewall rules that it added.
Another action, called actioncheck, makes sure that the proper chain has been created before attempting to add ban rules.
Next, we get to the actual banning rule, called actionban. This rule works by inserting a new rule at the top of our created chain. The rule matches the source IP address of the offending client (the <ip> parameter, which is read from the authorization logs when the maxretry limit is reached). It institutes the block defined by the blocktype parameter that we sourced in the [INCLUDES] section at the top of the file.
The actionunban rule removes this rule. Fail2ban does this automatically when the ban time has elapsed.
Finally, we get to the [Init] section. This just provides some defaults in case the action file is called without passing in all of the appropriate values.
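To make the angle-bracket substitution concrete, the Python sketch below fills the iptables-multiport templates shown above with the values the sshd jail would pass in. It falls back to the [Init] defaults for anything missing. It only builds the command strings and never runs iptables. Three assumptions apply. The blocktype value is what iptables-blocktype.conf is expected to define, matching the description above. The IP address is hypothetical. Fail2ban’s own tag substitution handles more cases than this single regular expression.

```python
import re

# Command templates copied from iptables-multiport.conf as shown above.
TEMPLATES = {
    "actionstart": [
        "iptables -N fail2ban-<name>",
        "iptables -A fail2ban-<name> -j RETURN",
        "iptables -I <chain> -p <protocol> -m multiport --dports <port> -j fail2ban-<name>",
    ],
    "actionban": ["iptables -I fail2ban-<name> 1 -s <ip> -j <blocktype>"],
    "actionunban": ["iptables -D fail2ban-<name> -s <ip> -j <blocktype>"],
}

# [Init] defaults from the action file, plus blocktype as we assume
# iptables-blocktype.conf defines it.
INIT = {
    "name": "default",
    "port": "ssh",
    "protocol": "tcp",
    "chain": "INPUT",
    "blocktype": "REJECT --reject-with icmp-port-unreachable",
}

TAG = re.compile(r"<(\w+)>")

def render(template, values):
    """Replace each <tag> with its value; unknown tags are left untouched."""
    return TAG.sub(lambda m: values.get(m.group(1), m.group(0)), template)

def commands(action, passed):
    """Build (but do not run) the commands for one action.

    Values passed from the jail's action line take priority over [Init].
    """
    values = {**INIT, **passed}
    return [render(t, values) for t in TEMPLATES[action]]

# What the sshd jail would pass in; the banned IP address is hypothetical.
jail_values = {"name": "sshd", "port": "ssh", "protocol": "tcp", "chain": "INPUT"}
for cmd in commands("actionstart", jail_values):
    print(cmd)
print(commands("actionban", dict(jail_values, ip="203.0.113.7"))[0])
# iptables -N fail2ban-sshd
# iptables -A fail2ban-sshd -j RETURN
# iptables -I INPUT -p tcp -m multiport --dports ssh -j fail2ban-sshd
# iptables -I fail2ban-sshd 1 -s 203.0.113.7 -j REJECT --reject-with icmp-port-unreachable
```

Calling commands("actionstart", {}) shows the [Init] fallback at work: the chain would be named fail2ban-default.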
Now that we’ve seen the specifics, let’s go over the process that happens when fail2ban starts.
First, the main fail2ban.conf file is read to determine the conditions that the main process should operate under. It creates the socket, pid, and log files if necessary and begins to use them.
Next, Fail2ban reads the jail.conf file for configuration details. It follows this by reading, in alphabetical order, any files found in the jail.d directory that end in .conf. It adds the settings found in these files to its internal configuration, giving new values preference over the values described in the jail.conf file.
It then searches for a jail.local file and repeats this process, adopting the new values. Finally, it searches the jail.d directory again, reading in alphabetical order files ending in .local.
In our case, we only have a jail.conf file and a jail.local file. In our jail.local file, we only need to define the values that differ from the jail.conf file. The Fail2ban process now has a set of directives loaded into memory that represent a combination of all of the files that it found.
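Python’s configparser follows the same “later file wins” rule when it is given several files in order, so the read order described above can be sketched with it. The file contents below are hypothetical and are written to a temporary directory. jail_files and load are illustrative helpers, not Fail2ban functions.

```python
import configparser
import os
import tempfile

def jail_files(confdir):
    """Jail files in the order described above; later files override earlier ones."""
    jail_d = os.path.join(confdir, "jail.d")

    def in_jail_d(suffix):
        if not os.path.isdir(jail_d):
            return []
        return sorted(os.path.join(jail_d, f) for f in os.listdir(jail_d) if f.endswith(suffix))

    return ([os.path.join(confdir, "jail.conf")] + in_jail_d(".conf")
            + [os.path.join(confdir, "jail.local")] + in_jail_d(".local"))

def load(confdir):
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(jail_files(confdir))  # files that do not exist are skipped silently
    return parser

# Build a throwaway configuration directory with hypothetical contents.
with tempfile.TemporaryDirectory() as confdir:
    os.mkdir(os.path.join(confdir, "jail.d"))
    files = {
        "jail.conf": "[DEFAULT]\nbantime = 10m\nmaxretry = 3\n[sshd]\nport = ssh\n",
        "jail.d/defaults.conf": "[sshd]\nenabled = true\n",
        "jail.local": "[DEFAULT]\nbantime = 1h\n",
    }
    for rel, text in files.items():
        with open(os.path.join(confdir, rel), "w") as fh:
            fh.write(text)
    sshd = load(confdir)["sshd"]
    result = (sshd["enabled"], sshd["bantime"], sshd["maxretry"])

print(*result)
# true 1h 3
```

Each value comes from a different layer. enabled comes from jail.d/defaults.conf, bantime from jail.local, and maxretry from jail.conf.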
It examines each section and searches for an enabled = true directive. If it finds one, it uses the parameters defined under that section to build a policy and decide what actions are required. Any parameters that are not found in the service’s section use the parameters defined in the [DEFAULT] section.
Fail2ban looks for an action directive to figure out what action script to call to implement the banning/unbanning policies. If one is not found, it falls back on the default action determined above.
The action directive consists of the name of the action file(s) that will be read, as well as a key-value dictionary that passes the parameters needed by those files. The values of these often take the form of parameter substitutions that reference the settings configured in the service’s section. The “name” key is usually passed the value of the special __name__ variable, which is set to the value of the section’s header.
Fail2ban then uses this information to find the associated files in the action.d directory. It first looks for the associated action file ending in .conf and then amends the information found there with any settings contained in an accompanying .local file, also found in the action.d directory.
It parses those files to determine the actions that it needs to take. It reads the actionstart value to see the actions it should take to set up the environment. This often includes creating a firewall structure to accommodate banning rules in the future.
The actions defined in this file use the parameters passed to it from the action directive. It will use these values to dynamically create the appropriate rules. If a certain variable wasn’t set, it can look at the default values set in the action file to fill in the blanks.
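As a rough illustration of how a resolved action value breaks down into an action file name plus a key-value dictionary, here is a simplified parser. It is an assumption about the shape of the format, based on the action_ line in [DEFAULT], and not Fail2ban’s actual parser. It handles one action per line; a value such as action_mw spans two lines, one per action. Quoting is what lets a value like "http,https" contain a comma. The jail name and port in the example are hypothetical.

```python
import re

# One key=value option: the value is either "double-quoted" (and may then
# contain commas) or runs up to the next comma.
OPTION = re.compile(r'\s*(\w+)\s*=\s*("[^"]*"|[^,]*)\s*(?:,|$)')

def parse_action(spec):
    """Split 'action-name[key=value, ...]' into (name, {key: value})."""
    match = re.fullmatch(r"\s*([\w.-]+)(?:\[(.*)\])?\s*", spec)
    if match is None:
        raise ValueError(f"unrecognized action: {spec!r}")
    name, body = match.group(1), match.group(2) or ""
    options = {key: value.strip().strip('"') for key, value in OPTION.findall(body)}
    return name, options

# A resolved action line; the jail name and port list are hypothetical.
spec = 'iptables-multiport[name=nginx-http-auth, port="http,https", protocol="tcp", chain="INPUT"]'
print(parse_action(spec))
# ('iptables-multiport', {'name': 'nginx-http-auth', 'port': 'http,https', 'protocol': 'tcp', 'chain': 'INPUT'})
```

The name identifies the file to load from action.d, and the dictionary supplies the values for its <tag> placeholders.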
The parameters for the service in the jail.* files also include the location of the log file, as well as the polling mechanism that should be used to check the file (this is defined by the backend parameter). They also include a filter that should be used to determine whether a line in the log represents a failure.
Fail2ban looks in the filter.d directory to find the matching filter file that ends with .conf. It reads this file to define the patterns that can be used to match offending lines. It then searches for a matching filter file ending with .local to see if any of the default parameters were overwritten.
It uses the regular expressions defined in these files as it reads the service’s log file. It tries each failregex line defined in the filter.d files against every new line written to the service’s log file.
If the regular expression returns a match, it checks the line against the regular expressions defined by the ignoreregex. If this also matches, Fail2ban ignores it. If the line matches an expression in the failregex but does not match an expression in the ignoreregex, an internal counter is incremented for the client that caused the line, and an associated timestamp is created for the event.
Once an event falls outside the window of time set by the findtime parameter in the jail.* files (as determined by its timestamp), the internal counter is decremented again and the event is no longer considered relevant to the banning policy.
If, over the course of time, additional authentication failures are logged, each attempt increments the counter. If the counter reaches the value set by the maxretry parameter within the configured window of time, Fail2ban institutes a ban by calling the actioncheck action for the service, as defined in the action.d/ files for the service. This is to determine whether the actionstart action set up the necessary structure. It then calls the actionban action to ban the offending client. It sets a timestamp for this event as well.
When the amount of time specified by the bantime parameter has elapsed, Fail2ban unbans the client by calling the actionunban action.
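The bookkeeping described in the last few paragraphs can be sketched as a small Python class. This is a hypothetical simplification, not Fail2ban’s code. Times are plain seconds rather than values like 10m, and old failures are pruned only when a new failure for the same host arrives. Comments mark where actionban and actionunban would be called. The defaults mirror the 3 failures, 10 minute window, and 10 minute ban from the [DEFAULT] section above.

```python
from collections import defaultdict, deque

class BanTracker:
    """Hypothetical sketch of findtime / maxretry / bantime bookkeeping.

    All times are plain seconds; Fail2ban itself also accepts values like "10m".
    """

    def __init__(self, maxretry=3, findtime=600, bantime=600):
        self.maxretry = maxretry
        self.findtime = findtime
        self.bantime = bantime
        self.failures = defaultdict(deque)  # host -> timestamps of recent failures
        self.banned = {}                    # host -> time the ban started

    def failure(self, host, now):
        """Record one matched failure; return True if it triggers a ban."""
        if host in self.banned:
            return False
        events = self.failures[host]
        events.append(now)
        # Forget failures older than the findtime window (the newest one always stays).
        while now - events[0] > self.findtime:
            events.popleft()
        if len(events) >= self.maxretry:
            self.banned[host] = now  # actionban would run here
            del self.failures[host]
            return True
        return False

    def expire(self, now):
        """Unban every host whose bantime has elapsed; return those hosts."""
        expired = [h for h, start in self.banned.items() if now - start >= self.bantime]
        for host in expired:
            del self.banned[host]  # actionunban would run here
        return expired

tracker = BanTracker(maxretry=3, findtime=600, bantime=600)
ip = "203.0.113.7"  # hypothetical client address
results = [
    tracker.failure(ip, 0),    # False: first failure
    tracker.failure(ip, 700),  # False: the t=0 failure aged out, count is 1
    tracker.failure(ip, 800),  # False: two failures in the window
    tracker.failure(ip, 900),  # True: third failure within 600 s, so ban
    tracker.expire(1400),      # []: only 500 s into the ban
    tracker.expire(1500),      # ['203.0.113.7']: bantime has elapsed
]
print(results)
```

The failure at t=700 shows the sliding window at work. The first failure no longer counts, so the ban only lands once three failures fall inside a single 600-second span.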
By now you have a fairly in-depth understanding of how fail2ban operates. When you deviate from the standard configuration, it is helpful to know how fail2ban functions in order to manipulate its behavior in a predictable way.
To learn about how to protect other services with fail2ban, you can read How To Protect an Nginx Server with Fail2Ban on Ubuntu 22.04.