python - I am trying to webscrape and the results are output into a csv file

I am trying to scrape a web page with Python and write the results to a CSV file. However, when I run the script I get multiple entries for the same product name. Here is my code:

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parsing
page_soup = soup(page_html, "html.parser")

# grabs each product
containers = page_soup.findAll("div", {
    "class": "item-container"
})

filename = "products.csv"
f = open(filename, "w")

headers = "product_name, shipping\n"

f.write(headers)


for container in containers:
    container = page_soup.findAll("div", {
        "class": "item-info"
    })
    print(container[0].div.a.img["title"])

    container = page_soup.findAll("a", {
        "class": "item-title"
    })
    product_name = container[0].text

    container = page_soup.findAll("li", {
        "class": "price-ship"
    })
    shipping = container[0].text.strip()

    print("product_name: " + product_name)
    print("shipping: " + shipping)

    f.write(product_name.replace(",", "|") + "," + shipping + "\n")

f.close()

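The duplicates come from searching page_soup (the whole page) and taking element [0] each time, which always returns the first product rather than the current container. To pair up several pieces of information about the same item, you could use the zip() function. For writing the CSV file I would recommend the csv module from the standard library; it handles quoting and delimiters automatically: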

from bs4 import BeautifulSoup
import requests
import csv

url = 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'
# the 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(requests.get(url).text, 'lxml')

# newline='' lets the csv module control the line endings
with open('out.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=',',
                            quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csvwriter.writerow(["product_name", "shipping"])
    # pair the n-th title with the n-th shipping element
    titles = soup.select('.item-container .item-title')
    shippings = soup.select('.item-container .price-ship')
    for product_name, shipping in zip(titles, shippings):
        csvwriter.writerow([product_name.get_text(strip=True), shipping.get_text(strip=True)])

The contents of out.csv will be:

product_name,shipping
"EVGA GeForce RTX 2080 Ti XC ULTRA GAMING, 11G-P4-2383-KR, 11GB GDDR6, Dual HDB Fans & RGB LED",Free Shipping
XFX Radeon RX 5700 XT DirectX 12 RX-57XT8MFD6 Video Card,Free Shipping
GIGABYTE GeForce RTX 2060 DirectX 12 GV-N2060GAMINGOC PRO WHITE-6GD Video Card,Free Shipping
"Aorus AD27QD 27"" 144Hz 1440P FreeSync Gaming Monitor + GIGABYTE Radeon RX ...",
ASUS ROG Strix GeForce RTX 2070 DirectX 12 ROG-STRIX-RTX2070-8G-GAMING Video Card,Free Shipping
"EVGA GeForce RTX 2060 SC Ultra GAMING, 06G-P4-2067-KR, 6GB GDDR6, Dual HDB Fans",Free Shipping
PowerColor AMD Radeon RX 5700 XT 8GB GDDR6 AXRX 5700XT 8GBD6-M3DH,Free Shipping
MSI GeForce RTX 2080 DirectX 12 RTX 2080 VENTUS 8G Video Card,Free Shipping
ZOTAC GeForce GTX 1060 DirectX 12 ZT-P10620A-10M Video Card,Free Shipping
ASRock Phantom Gaming X Radeon VII DirectX 12 Radeon VII 16G Video Card,$6.99 Shipping
"Sapphire PULSE Radeon RX 580 8GB GDDR5 PCI-E Dual HDMI / DVI-D / Dual DP OC w/ Backplate (UEFI), 100411P8GOCL",Free Shipping
XFX Radeon RX 590 Fatboy DirectX 12 RX-590P8DFD6 8GB 256-Bit DDR5 PCI Express 3.0 CrossFireX Support Video Card,Free Shipping
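
The quoted rows above are the csv module's minimal quoting at work: a field is wrapped in double quotes only when it contains a comma or a quote character, and an embedded quote is doubled. A minimal sketch of that behaviour, using an in-memory buffer instead of a file; the sample rows are hypothetical, shortened from the output above, and rows_to_csv is a helper name made up for this illustration:

```python
import csv
import io

# Hypothetical sample rows, shortened from the output above
rows = [
    ["product_name", "shipping"],
    ["EVGA GeForce RTX 2060 SC Ultra GAMING, 06G-P4-2067-KR", "Free Shipping"],
    ['Aorus AD27QD 27" 144Hz Gaming Monitor', ""],
    ["ZOTAC GeForce GTX 1060 Video Card", "Free Shipping"],
]


def rows_to_csv(rows):
    """Serialize rows to CSV text with the csv module's default (minimal) quoting."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)

# Reading the text back recovers the original fields, commas and quotes included
parsed = list(csv.reader(io.StringIO(csv_text, newline="")))
print(parsed == rows)
```

This is why there is no need to replace commas in the product name with another character before writing.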

Opening this file in LibreOffice:

[Screenshot: out.csv opened in LibreOffice]
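
One caveat with zip(): it pairs elements by position, so if some product had no shipping element at all, the later rows could shift out of alignment. An alternative, sketched here under assumptions, is to loop over each container and search inside that container only. The HTML below is hypothetical and merely mimics the class names from the question (the live page's markup may differ), and extract_rows is a name made up for this illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that mimics the class names used in the question;
# the real page may be structured differently.
SAMPLE_HTML = """
<div class="item-container">
  <a class="item-title">Card A, 8GB</a>
  <ul><li class="price-ship">Free Shipping</li></ul>
</div>
<div class="item-container">
  <a class="item-title">Card B</a>
</div>
<div class="item-container">
  <a class="item-title">Card C</a>
  <ul><li class="price-ship"> $6.99 Shipping </li></ul>
</div>
"""


def extract_rows(html):
    """Return one [product_name, shipping] row per .item-container element."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for container in soup.select(".item-container"):
        # Search inside this container only, not the whole page
        title = container.select_one(".item-title")
        shipping = container.select_one(".price-ship")
        rows.append([
            title.get_text(strip=True) if title is not None else "",
            shipping.get_text(strip=True) if shipping is not None else "",
        ])
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

The resulting rows can be passed to csvwriter.writerows() in place of the zip() loop; a product without a shipping element then gets an empty shipping field instead of borrowing the next product's value.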
