By Deva004
I am scraping data from a website and want to write it to two different columns, but all of the data ends up in a single column.
This is the code:
from bs4 import BeautifulSoup
from requests_html import HTMLSession
import csv

s = HTMLSession()
url = f'https://everymac.com/systems/apple/iphone/index-iphone-specs.html'
list_data = []
r = s.get(url)
r.html.render(sleep=1)
soup = BeautifulSoup(r.html.html, 'html.parser')

file = open('OutPut.csv', 'w')
writer = csv.writer(file)
writer.writerow(['Product Name', 'Specification'])

products = soup.select('#contentcenter_specs_externalnav_2 a')
specs = soup.select('#contentcenter_specs_internalnav_2 td')

for item in products:
    a = item.text
    print(a)  # want to write this in column 'Product Name'
    for i in specs:
        b = i.text
        print(b)  # want to write this in column 'Specification'
        writer.writerow([a, b])

file.close()
How can I do that? It would be great if you could help me with this.
Heya,
You’re currently using a nested loop, which means that for each product the inner loop runs over every specification and writes one row per specification. That’s likely not what you want, because it repeats each product name across many rows and pairs it with specifications that belong to other products.
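To make the duplication concrete, here is a minimal sketch that uses two short placeholder lists in place of the scraped elements. The product and specification values are made up for illustration and do not come from the site.

```python
# Placeholder data; in the real script these come from soup.select(...)
products = ['iPhone A', 'iPhone B']
specs = ['Spec 1', 'Spec 2', 'Spec 3']

rows = []
for a in products:
    for b in specs:
        # The inner loop runs in full for every product,
        # so each product name is paired with every spec.
        rows.append([a, b])

print(len(rows))  # 6 rows: 2 products x 3 specs
for row in rows:
    print(row)
```

With real data the row count grows as the number of products multiplied by the number of specifications, which is why the output looks wrong.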
It seems that you’re trying to associate each product with a specific specification. If there’s a 1:1 correspondence between products and specifications, you should only need one loop. However, to do that, you need to make sure that the order of products in the products list matches the order of specifications in the specs list. If this isn’t the case, you may need to adjust your selectors or the way you’re scraping the data.
Assuming that each product is associated with a single specification, you could do something like this:
from bs4 import BeautifulSoup
from requests_html import HTMLSession
import csv

s = HTMLSession()
url = 'https://everymac.com/systems/apple/iphone/index-iphone-specs.html'
r = s.get(url)
r.html.render(sleep=1)
soup = BeautifulSoup(r.html.html, 'html.parser')

products = soup.select('#contentcenter_specs_externalnav_2 a')
specs = soup.select('#contentcenter_specs_internalnav_2 td')

# The with block closes the file automatically when it ends
with open('OutPut.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Product Name', 'Specification'])
    # Walk both lists in step: one product, one spec, one row
    for product, spec in zip(products, specs):
        a = product.text
        print(a)  # goes in the 'Product Name' column
        b = spec.text
        print(b)  # goes in the 'Specification' column
        writer.writerow([a, b])
In the above code, I use the zip() function to iterate over both products and specs at the same time. This assumes that each item in products corresponds to the item at the same index in specs. If this isn’t the case, you’ll need to adjust your selectors or scraping logic accordingly.
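One thing to watch for: zip() stops at the shorter input without raising an error, so a mismatch between the two lists can go unnoticed. The sketch below, again with made-up placeholder lists, shows that behavior along with a simple length check and itertools.zip_longest as one possible way to keep the unmatched items.

```python
from itertools import zip_longest

# Placeholder data with deliberately different lengths
products = ['iPhone A', 'iPhone B', 'iPhone C']
specs = ['Spec 1', 'Spec 2']

# zip() silently drops 'iPhone C' because specs has run out
paired = list(zip(products, specs))
print(paired)  # [('iPhone A', 'Spec 1'), ('iPhone B', 'Spec 2')]

# A quick sanity check before writing the CSV
if len(products) != len(specs):
    print(f'Length mismatch: {len(products)} products, {len(specs)} specs')

# zip_longest() keeps every item and pads the shorter list
padded = list(zip_longest(products, specs, fillvalue=''))
print(padded)
```

If the length check reports a mismatch on the real page, that is a sign the selectors are not returning one specification per product.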
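Also, note that I’ve added newline='' to open(), as the csv module documentation recommends. The csv writer emits its own \r\n line endings, and without newline='' Python’s text mode on Windows translates the \n a second time, producing \r\r\n. That shows up as an extra blank line between rows when you open the CSV file in programs such as Excel.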
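If you want to check the line endings yourself, a small sketch like this writes two rows to a temporary file and reads the raw bytes back. The file name and row values are placeholders.

```python
import csv
import os
import tempfile

# Temporary location so the demo does not touch OutPut.csv
path = os.path.join(tempfile.mkdtemp(), 'demo.csv')

with open(path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Product Name', 'Specification'])
    writer.writerow(['iPhone A', 'Spec 1'])

# Read in binary mode to see the exact bytes on disk
with open(path, 'rb') as f:
    raw = f.read()

print(raw)  # each row ends with a single \r\n
```

With newline='' each row ends in exactly one \r\n on every platform, so no doubled line endings appear.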