CSV files are widely used for storing tabular data. We can easily export data from database tables or Excel files to CSV files. The format is also easy to read, both for humans and for programs. In this tutorial, we will learn how to parse CSV files in Python.
Parsing a file means reading the data from a file into a form the program can work with. The file may be a plain text file, or it may be a spreadsheet.
CSV stands for comma-separated values, i.e. the fields in each record are separated from each other by commas. CSV files are often created by programs that handle large amounts of data. Data can easily be exported to CSV from spreadsheets and databases, and imported from CSV by other programs.
Let's see how to parse a CSV file. Parsing CSV files in Python is quite easy. Python has a built-in csv module that can both read data from and write data to CSV files. The module supports several CSV formats (called dialects), which makes it easier to process files produced by different programs.
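To make the idea of formats concrete, here is a minimal sketch of how the csv module can read the same records written with different delimiters. The sample text and the `parse` helper are made up for illustration; `csv.list_dialects()` and the `delimiter` parameter are part of the standard library.

```python
import csv
import io

# Hypothetical sample: the same records in two different formats.
comma_text = "name,branch\nDavid,MCE\n"
semicolon_text = "name;branch\nDavid;MCE\n"


def parse(text, **fmt):
    """Parse CSV text held in memory and return a list of rows (illustrative helper)."""
    return list(csv.reader(io.StringIO(text, newline=""), **fmt))


comma_rows = parse(comma_text)                         # default "excel" dialect
semicolon_rows = parse(semicolon_text, delimiter=";")  # override the delimiter

# Names of the dialects currently registered with the csv module.
dialects = csv.list_dialects()

print(dialects)
print(comma_rows == semicolon_rows)
```

Passing formatting parameters such as `delimiter` is usually enough for one-off files; a named dialect is handy when the same format is reused in many places.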
Reading CSV files using the inbuilt Python CSV module.
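import csv

# newline='' lets the csv module handle line endings itself
with open('university_records.csv', 'r', newline='') as csv_file:
    reader = csv.reader(csv_file)
    # Each row is returned as a list of strings
    for row in reader:
        print(row)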
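Each row comes back from the reader as a list of strings. As a minimal sketch of what that looks like, the example below parses a small in-memory sample instead of university_records.csv. The header row and column names are assumptions, since the actual file contents are not given. It also shows `csv.DictReader`, which uses the first row as keys.

```python
import csv
import io

# Hypothetical stand-in for university_records.csv; the header row is assumed.
sample = "name,branch,year,cgpa\nDavid,MCE,3,7.8\nLisa,PIE,3,9.1\n"

# csv.reader yields every row, header included, as a list of strings.
rows = list(csv.reader(io.StringIO(sample, newline="")))

# csv.DictReader treats the first row as field names and yields one dict per record.
records = list(csv.DictReader(io.StringIO(sample, newline="")))

for row in rows:
    print(row)

# Values are always strings, so numeric fields must be converted explicitly.
print(records[0]["name"], float(records[0]["cgpa"]))
```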
For writing a file, we have to open it in write mode or append mode. Here, we will append the data to the existing CSV file.
import csv

row = ['David', 'MCE', '3', '7.8']
row1 = ['Lisa', 'PIE', '3', '9.1']
row2 = ['Raymond', 'ECE', '2', '8.5']

# 'a' appends to the existing file; newline='' avoids blank lines between rows on Windows
with open('university_records.csv', 'a', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(row)
    writer.writerow(row1)
    writer.writerow(row2)
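The difference between write mode and append mode can be sketched as follows. The example works in a temporary directory with a made-up file name, so it does not touch university_records.csv. It also uses `writerows`, which writes several rows in one call.

```python
import csv
import os
import tempfile

rows = [
    ["David", "MCE", "3", "7.8"],
    ["Lisa", "PIE", "3", "9.1"],
    ["Raymond", "ECE", "2", "8.5"],
]

# Work in a temporary directory so no real file is modified.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "records_demo.csv")  # hypothetical file name

    # 'w' creates the file, or truncates it if it already exists.
    with open(path, "w", newline="") as csv_file:
        csv.writer(csv_file).writerow(rows[0])

    # 'a' keeps the existing content and adds the new rows at the end.
    with open(path, "a", newline="") as csv_file:
        csv.writer(csv_file).writerows(rows[1:])

    # Read everything back to check what ended up in the file.
    with open(path, "r", newline="") as csv_file:
        written = list(csv.reader(csv_file))

print(written)
```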
There is one more way to work with CSV files, and it is very popular for data analysis work: the pandas library. Pandas is a Python data analysis library. It offers data structures, tools, and operations for working with and manipulating data, which is mostly held in one-dimensional or two-dimensional tables.
To work with CSV files this way, you need to install pandas. Installing pandas is quite simple; run the command below to install it using pip.
$ pip install pandas
Once the installation is complete, you are good to go.
Before you can use pandas to import your CSV data, you need to know where the data file is in your filesystem and what your current working directory is. I suggest keeping your code and the data file in the same directory, so that you can refer to the file by name alone instead of specifying a full path.
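If you are unsure where Python will look for the file, a short check like the following can help. It is only a sketch: `data_path` is a hypothetical helper, not part of pandas, and it uses nothing beyond the standard pathlib module.

```python
from pathlib import Path


def data_path(file_name, base_dir=None):
    """Return an absolute path to file_name inside base_dir (default: current directory)."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return (base / file_name).resolve()


csv_path = data_path("ign.csv")

print("Working directory:", Path.cwd())
print("Looking for:", csv_path)
print("File exists:", csv_path.exists())
```

The resulting path can be passed to `pandas.read_csv`, which accepts path objects as well as strings.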
import pandas
result = pandas.read_csv('ign.csv')
print(result)
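`read_csv` returns a DataFrame. Since the contents of ign.csv are not given, the sketch below reads a small hypothetical sample from memory instead; the column names and values are assumptions chosen for illustration.

```python
import io

import pandas

# Hypothetical sample data standing in for a CSV file on disk.
sample = "name,branch,year,cgpa\nDavid,MCE,3,7.8\nLisa,PIE,3,9.1\n"

# read_csv accepts a file path or any file-like object.
df = pandas.read_csv(io.StringIO(sample))

print(df)                 # the table, with the first row used as column names
print(df.dtypes)          # pandas infers a type for each column
print(df["cgpa"].mean())  # numeric columns can be used directly in calculations
```

Unlike the csv module, which returns every value as a string, pandas infers numeric types for columns such as `year` and `cgpa`.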
Writing CSV files using pandas is as simple as reading. The only new term used is DataFrame. A pandas DataFrame is a two-dimensional, heterogeneous tabular data structure: data is arranged in rows and columns. It consists of three main components (data, rows, and columns), and both axes, rows and columns, are labeled.
from pandas import DataFrame

# Each key is a column name; each value is the list of entries in that column
C = {
    'Programming language': ['Python', 'Java', 'C++'],
    'Designed by': ['Guido van Rossum', 'James Gosling', 'Bjarne Stroustrup'],
    'Appeared': ['1991', '1995', '1985'],
    'Extension': ['.py', '.java', '.cpp'],
}
df = DataFrame(C, columns=['Programming language', 'Designed by', 'Appeared', 'Extension'])
# index=False leaves the row index out of the file; header=True writes the column names
df.to_csv('program_lang.csv', index=False, header=True)
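As a rough illustration of what `to_csv` produces, the sketch below builds a smaller DataFrame and writes it to a string rather than a file (calling `to_csv` without a path returns the CSV text). The reduced column set is chosen only to keep the example short.

```python
import io

from pandas import DataFrame, read_csv

df = DataFrame({
    "Programming language": ["Python", "Java", "C++"],
    "Extension": [".py", ".java", ".cpp"],
})

# Without a path argument, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)     # first column holds the row index 0, 1, 2
print(without_index)  # only the data columns

# Reading the text back gives a DataFrame with the same values.
round_trip = read_csv(io.StringIO(without_index))
print(round_trip)
```

Leaving the index out is usually what you want when the row numbers carry no meaning, since otherwise they come back as an extra unnamed column when the file is read again.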
We learned to parse a CSV file using the built-in csv module and the pandas library. There are other ways to parse files, but programmers do not use them as widely. PlyPlus, PLY, and ANTLR are some of the libraries used for parsing text data.
Now you know how to use the built-in csv module and the powerful pandas library for reading and writing data in CSV format. The code shown above is basic and straightforward. It should be understandable by anyone familiar with Python, so I don't think it needs further explanation. However, manipulating complex data with empty or ambiguous entries is not easy. It requires practice and knowledge of the various tools in pandas.
CSV is a simple and widely supported way of saving and sharing tabular data, and pandas is an excellent alternative to the csv module. You may find it difficult in the beginning, but it isn't so hard to learn. With a little bit of practice, you will master it.
Nice tutorial. In the first example of reading a CSV file, we try to close the file while also using a with statement. The with statement will close your resource, so you don't need to.
- Ankit Rana