How To Use the Python Filter Function

Introduction

The Python built-in filter() function can be used to create a new iterator from an existing iterable (like a list or dictionary) that will efficiently filter out elements using a function that we provide. An iterable is a Python object that can be “iterated over”, that is, it will return items in a sequence such that we can use it in a for loop.

The basic syntax for the filter() function is:

filter(function, iterable)

This will return a filter object, which is an iterator: it produces the matching items one at a time as they are requested. We can use a function like list() to make a list of all the items returned in a filter object.
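As a minimal sketch of this syntax, the following example filters a short list of numbers. The is_even helper and the numbers list are illustrative names only and are not part of the aquarium examples used later in this tutorial:

```python
# Hypothetical helper, used only for illustration.
def is_even(n):
    return n % 2 == 0

numbers = [1, 2, 3, 4, 5, 6]

evens = filter(is_even, numbers)   # creates the filter object; nothing is tested yet
evens_list = list(evens)           # consumes the filter object into a list

print(evens_list)  # [2, 4, 6]
```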

The filter() function provides a way of filtering values that can often be more memory-efficient than a list comprehension, especially when we’re starting to work with larger data sets. A list comprehension builds a complete new list as soon as it runs, so once it has finished we have two lists in memory. By contrast, filter() creates a small object that holds a reference to the provided function and an iterator over the original iterable, and it tests each item only when that item is requested. This takes up less memory, although it does not necessarily make the filtering itself faster.
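The memory difference can be sketched roughly with sys.getsizeof(). This is only an approximation: getsizeof() reports the size of the container object itself, not of the items it refers to, and the exact numbers vary between Python versions. The variable names here are illustrative:

```python
import sys

numbers = list(range(100_000))

# The list comprehension builds a second, fully populated list right away.
evens_list = [n for n in numbers if n % 2 == 0]

# filter() only creates a small iterator object; no items have been tested yet.
evens_iter = filter(lambda n: n % 2 == 0, numbers)

list_size = sys.getsizeof(evens_list)   # size of the list object itself
iter_size = sys.getsizeof(evens_iter)   # size of the filter object itself

print(iter_size < list_size)  # True
```

The saving only lasts as long as we keep the result as an iterator. Calling list() on the filter object builds the full list after all.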

In this tutorial, we’ll review four different ways of using filter(): with two different iterable structures, with a lambda function, and with no defined function.

Using filter() with a Function

The first argument to filter() is a function, which we use to decide whether to include or filter out each item. The function is called once for every item in the iterable passed as the second argument, and each time it returns False (or any other value that Python treats as false), the item is dropped. As this argument is a function, we can either pass a normal function or we can make use of lambda functions, particularly when the expression is less complex.
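The function does not have to return a literal True or False, because filter() only looks at whether the return value is considered true. As a small sketch of that rule, the built-in str.strip method can act as the filtering function. The raw_entries list is a hypothetical input used only for illustration:

```python
# Hypothetical input, used only for illustration.
raw_entries = ["Sammy", "", "   ", "Jo", "\n"]

# str.strip returns the stripped string. An empty result is treated as false,
# so blank and whitespace-only entries are dropped. The kept items are the
# original strings, not the stripped copies.
non_blank = list(filter(str.strip, raw_entries))

print(non_blank)  # ['Sammy', 'Jo']
```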

Following is the syntax of a lambda with filter():

filter(lambda item: expression, iterable)

With a list, like the following, we can incorporate a lambda function with an expression against which we want to evaluate each item from the list:

creature_names = ['Sammy', 'Ashley', 'Jo', 'Olly', 'Jackie', 'Charlie']

To filter this list to find the names of our aquarium creatures that start with a vowel, we can run the following lambda function:

print(list(filter(lambda x: x[0].lower() in 'aeiou', creature_names)))

Here we declare an item in our list as x. Then we set our expression to access the first character of each string (or character “zero”), so x[0]. Lowering the case of each of the names ensures this will match letters to the string in our expression, 'aeiou'.

Finally we pass the iterable creature_names. As described in the introduction, we apply list() to the result in order to create a list from the iterator filter() returns.

The output will be the following:

Output
['Ashley', 'Olly']

This same result can be achieved using a function we define:

creature_names = ['Sammy', 'Ashley', 'Jo', 'Olly', 'Jackie', 'Charlie']

def names_vowels(x):
    return x[0].lower() in 'aeiou'

filtered_names = filter(names_vowels, creature_names)

print(list(filtered_names))

Our function names_vowels defines the expression that we will implement to filter creature_names.

Again, the output would be as follows:

Output
['Ashley', 'Olly']

Overall, lambda functions achieve the same result with filter() as when we use a regular function. As the expressions for filtering our data grow more complex, defining a regular function becomes more worthwhile, because a named function is usually easier to read than a long lambda.
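As a sketch of a case where a named function may read better than a lambda, the following example combines two conditions. The is_long_consonant_name function and its min_length parameter are hypothetical and were chosen only for illustration:

```python
creature_names = ['Sammy', 'Ashley', 'Jo', 'Olly', 'Jackie', 'Charlie']

def is_long_consonant_name(name, min_length=5):
    """Return True for names that do not start with a vowel
    and have at least min_length characters."""
    if not name:                       # guard against empty strings
        return False
    starts_with_vowel = name[0].lower() in 'aeiou'
    return not starts_with_vowel and len(name) >= min_length

long_consonant_names = list(filter(is_long_consonant_name, creature_names))

print(long_consonant_names)  # ['Sammy', 'Jackie', 'Charlie']
```

The same logic could be squeezed into a lambda, but the docstring, the guard clause, and the intermediate variable name would all be lost.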

Using None with filter()

We can pass None as the first argument to filter() to have the returned iterator filter out any value that Python considers “falsy”. Generally, Python considers None, False, anything with a length of 0 (such as an empty list or empty string), and anything numerically equivalent to 0 as false, thus the use of the term “falsy.”
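To see this rule in action, the sketch below compares filter(None, ...) with two spellings that behave the same way: filter(bool, ...) and a list comprehension with a bare if. The values list is illustrative and is not part of the aquarium example:

```python
# Illustrative mix of true and false values.
values = [0, 1, "", "fish", [], [0], None, 0.0, {}, {"tank": 11}]

with_none = list(filter(None, values))
with_bool = list(filter(bool, values))
with_comprehension = [v for v in values if v]

print(with_none)  # [1, 'fish', [0], {'tank': 11}]
print(with_none == with_bool == with_comprehension)  # True
```

Note that [0] is kept: it is a non-empty list, so it counts as true even though the item inside it is 0.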

In the following case we want to filter our list to only show the tank numbers at our aquarium:

aquarium_tanks = [11, False, 18, 21, "", 12, 34, 0, [], {}]

In this code we have a list containing integers (including 0), an empty string, an empty list, an empty dictionary, and a Boolean value.

filtered_tanks = filter(None, aquarium_tanks)

We use the filter() function with None and pass in the aquarium_tanks list as our iterable. Since we have passed None as the first argument, we will check if the items in our list are considered false.

print(list(filtered_tanks))

Then we pass filtered_tanks to the list() function so that a list is printed.

Here we see the output shows only the nonzero integers. All the items that evaluated as false (False, 0, and the empty string, list, and dictionary) were removed by filter():

Output
[11, 18, 21, 12, 34]

Note: If we don’t use list() and print filtered_tanks we would receive a filter object something like this: <filter object at 0x7fafd5903240>. The filter object is an iterable, so we could loop over it with for or we can use list() to turn it into a list, which we’re doing here because it’s a good way to review the results.
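One consequence worth illustrating is that a filter object can be consumed only once. The sketch below reuses the aquarium_tanks list. The first_pass and second_pass names are illustrative:

```python
aquarium_tanks = [11, False, 18, 21, "", 12, 34, 0, [], {}]

filtered_tanks = filter(None, aquarium_tanks)

first_pass = list(filtered_tanks)    # consumes every item
second_pass = list(filtered_tanks)   # the iterator is now exhausted

print(first_pass)   # [11, 18, 21, 12, 34]
print(second_pass)  # []
```

If we need the results more than once, we can store the list, or call filter() again to get a fresh iterator.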

With None we have used filter() to quickly remove items from our list that were considered false.

Using filter() with a List of Dictionaries

When we have a more complex data structure, we can still use filter() to evaluate each of the items. For example, if we have a list of dictionaries, not only do we want to iterate over each item in the list (one of the dictionaries), but we may also want to iterate over each key:value pair in a dictionary in order to evaluate all the data.

As an example, let’s say we have a list of each creature in our aquarium along with different details about each of them:

aquarium_creatures = [
  {"name": "sammy", "species": "shark", "tank number": "11", "type": "fish"},
  {"name": "ashley", "species": "crab", "tank number": "25", "type": "shellfish"},
  {"name": "jo", "species": "guppy", "tank number": "18", "type": "fish"},
  {"name": "jackie", "species": "lobster", "tank number": "21", "type": "shellfish"},
  {"name": "charlie", "species": "clownfish", "tank number": "12", "type": "fish"},
  {"name": "olly", "species": "green turtle", "tank number": "34", "type": "turtle"}
]

We want to filter this data by a search string we give to the function. To have filter() access each dictionary and each item in the dictionaries, we construct a nested function, like the following:

def filter_set(aquarium_creatures, search_string):
    def iterator_func(x):
        for v in x.values():
            if search_string in v:
                return True
        return False
    return filter(iterator_func, aquarium_creatures)

We define a filter_set() function that takes aquarium_creatures and search_string as parameters. In filter_set() we pass our iterator_func() as the function to filter(). The filter_set() function will return the iterator resulting from filter().

The iterator_func() takes x as an argument, which represents an item in our list (that is, a single dictionary).

Next the for loop accesses the values in each key:value pair in our dictionaries and then uses a conditional statement to check whether the search_string is in v, representing a value.

As in our previous examples, filter() keeps an item when the function returns True for it. Because filter() is lazy, iterator_func() runs for each dictionary only when the returned filter object is consumed, not when filter_set() returns. We position return False outside of our loop so that the function checks every value in a dictionary, instead of returning after checking the first value alone.
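The same check can be written more compactly with the built-in any() function. The following is a sketch under a few assumptions rather than a replacement for the function above: filter_set_any, matches, and sample_records are hypothetical names, and the str(v) conversion is an addition so that non-string values would not raise a TypeError:

```python
def filter_set_any(records, search_string):
    """Return a filter object over the dictionaries in which
    any value contains search_string."""
    def matches(record):
        # str(v) is an added assumption so non-string values can be searched too.
        return any(search_string in str(v) for v in record.values())
    return filter(matches, records)

# Small illustrative data set, shortened from the aquarium example.
sample_records = [
    {"name": "sammy", "tank number": "11"},
    {"name": "ashley", "tank number": "25"},
]

matching_names = [r["name"] for r in filter_set_any(sample_records, "2")]

print(matching_names)  # ['ashley']
```

Like the explicit loop, any() stops at the first matching value, so neither version scans more of a dictionary than it needs to.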

We call filter_set() with our list of dictionaries and the search string we want to find matches for:

filtered_records = filter_set(aquarium_creatures, "2")    

Once the function completes we have our filter object stored in the filtered_records variable, which we turn into a list and print:

print(list(filtered_records))      

We’ll see the following output from this program:

Output
[{'name': 'ashley', 'species': 'crab', 'tank number': '25', 'type': 'shellfish'}, {'name': 'jackie', 'species': 'lobster', 'tank number': '21', 'type': 'shellfish'}, {'name': 'charlie', 'species': 'clownfish', 'tank number': '12', 'type': 'fish'}]

We’ve filtered the list of dictionaries with the search string 2. We can see that the three dictionaries whose tank number contains 2 have been returned. Because the function checks every value, a dictionary would also match if 2 appeared in any of its other fields. Using our own nested function allowed us to access every item and efficiently check each against the search string.

Conclusion

In this tutorial, we’ve learned the different ways of using the filter() function in Python. Now you can use filter() with your own function, a lambda function, or with None to filter for items in varying complexities of data structures.

Although in this tutorial we printed the results from filter() immediately in list format, it is likely in our programs we would use the returned filter() object and further manipulate the data.
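As one sketch of what that further manipulation might look like, a filter object can be passed straight into other functions that accept an iterable, such as map() and sorted(). The fish and fish_tanks names are illustrative, and the data is a shortened copy of the aquarium list:

```python
aquarium_creatures = [
    {"name": "sammy", "species": "shark", "tank number": "11", "type": "fish"},
    {"name": "ashley", "species": "crab", "tank number": "25", "type": "shellfish"},
    {"name": "jo", "species": "guppy", "tank number": "18", "type": "fish"},
]

# Keep only the fish, then pull out their tank numbers as integers.
fish = filter(lambda c: c["type"] == "fish", aquarium_creatures)
fish_tanks = sorted(map(lambda c: int(c["tank number"]), fish))

print(fish_tanks)  # [11, 18]
```

No intermediate list of fish is built here: map() pulls items from the filter object one at a time, and only sorted() produces a list at the end.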

If you would like to learn more Python, check out our How To Code in Python 3 series and our Python topic page.


Creative Commons License