CSV
Data Format · Beginner
What is CSV?
CSV (Comma-Separated Values) is a simple plain-text format for storing tabular data, like spreadsheets. Each line typically represents one row (record), and commas separate the values (columns). Despite its simplicity, CSV is one of the most portable and widely supported data formats in existence.
CSV is the humble hero of data formats: simple, universal, and readable by virtually every spreadsheet application. When you need to share data with business analysts, load it into a database, or just back it up, CSV has your back.
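CSV Best Practices
A well-formed file starts with a header row, quotes its text fields, and keeps each column's values in one consistent format: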
product_name,price,in_stock,rating
"Wireless Mouse",29.99,true,4.5
"Mechanical Keyboard",149.99,true,4.8
"USB-C Hub",49.99,false,4.2
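Because CSV stores every value as text, the reader has to decide how to interpret each column. Below is a minimal sketch using Python's standard csv module to load the sample above. It assumes the column types implied by the sample (float prices and ratings, lowercase true/false flags); load_products is a hypothetical helper name, not a library function.

```python
import csv
import io

# The sample rows from above, embedded as a string for a self-contained demo.
SAMPLE = """product_name,price,in_stock,rating
"Wireless Mouse",29.99,true,4.5
"Mechanical Keyboard",149.99,true,4.8
"USB-C Hub",49.99,false,4.2
"""


def load_products(text):
    """Hypothetical helper: parse the CSV text and convert columns to assumed types."""
    products = []
    # DictReader uses the header row as the keys for every record.
    for row in csv.DictReader(io.StringIO(text)):
        products.append({
            "product_name": row["product_name"],
            "price": float(row["price"]),
            # CSV has no boolean type; this assumes the literal text "true"/"false".
            "in_stock": row["in_stock"] == "true",
            "rating": float(row["rating"]),
        })
    return products


for product in load_products(SAMPLE):
    print(product)
```

Keeping one format per column (for example, always lowercase true/false) is what makes a simple conversion like this reliable.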
Handling CSV Edge Cases
1. Commas in Values: enclose the field in double quotes.
name,description
"Product A","Contains, comma"
2. Multi-line Values: enclose the field in double quotes; line breaks inside the quotes belong to the field.
id,notes
1,"Line one
Line two
Line three"
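3. Different Delimiters: some files use semicolons or tabs instead of commas, so detect the delimiter before parsing:
import csv

# Auto-detect the delimiter from a sample of the file
with open('data.csv', newline='') as f:
    sample = f.read(1024)
    dialect = csv.Sniffer().sniff(sample, delimiters=',;\t')
    f.seek(0)  # rewind so the reader starts at the first row
    reader = csv.reader(f, dialect)
    for row in reader:
        print(row)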
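The three edge cases above can be exercised end to end with only Python's standard csv module. This is a minimal sketch: the first two sample strings are copied from the examples, the semicolon file is a hypothetical addition, and parse and sniff_and_parse are illustrative helper names rather than library functions.

```python
import csv
import io

# Edge case 1: a quoted field containing a comma.
COMMAS = 'name,description\n"Product A","Contains, comma"\n'
# Edge case 2: a quoted field spanning several physical lines.
MULTILINE = 'id,notes\n1,"Line one\nLine two\nLine three"\n'
# Edge case 3 (hypothetical sample): semicolon delimiter with decimal commas.
SEMICOLONS = "name;price\nMouse;29,99\nKeyboard;149,99\n"


def parse(text):
    """Illustrative helper: parse CSV text with the default (comma) dialect."""
    # For real files, open them with newline='' so the csv module
    # handles line breaks inside quoted fields itself.
    return list(csv.reader(io.StringIO(text)))


def sniff_and_parse(text):
    """Illustrative helper: detect the delimiter (comma, semicolon, or tab), then parse."""
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t")
    return dialect.delimiter, list(csv.reader(io.StringIO(text), dialect))


# Naive splitting breaks the quoted field apart; csv.reader keeps it whole.
print(COMMAS.splitlines()[1].split(","))  # ['"Product A"', '"Contains', ' comma"']
print(parse(COMMAS)[1])                   # ['Product A', 'Contains, comma']

# Four physical lines, but only two records: the header and one data row.
print(len(parse(MULTILINE)))              # 2

delimiter, rows = sniff_and_parse(SEMICOLONS)
print(repr(delimiter), rows[1])           # ';' ['Mouse', '29,99']
```

Sniffer is a heuristic and raises csv.Error when it cannot settle on a delimiter, so limiting the candidates (as the delimiters argument does here) makes detection more predictable.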
Python CSV Mastery
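import csv
import pandas as pd

# Read with pandas (handles quoting, including commas and line breaks inside quoted fields)
df = pd.read_csv('products.csv')
print(df.head())

# Write with every field quoted; index=False omits the DataFrame index column
df.to_csv('output.csv', quoting=csv.QUOTE_ALL, index=False)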
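The quoting argument of to_csv takes a constant from Python's csv module, which is why that module is imported alongside pandas. To see what QUOTE_ALL changes without installing pandas, here is a small sketch with the standard csv.writer; the render helper and sample rows are illustrative assumptions.

```python
import csv
import io

# Illustrative rows: a header plus one product with a numeric price.
ROWS = [["product_name", "price"], ["Wireless Mouse", 29.99]]


def render(rows, quoting):
    """Illustrative helper: write rows to an in-memory buffer with a given quoting mode."""
    buffer = io.StringIO()
    # lineterminator="\n" keeps the output readable; the writer's default is "\r\n".
    writer = csv.writer(buffer, quoting=quoting, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


print(render(ROWS, csv.QUOTE_MINIMAL))  # quotes only fields that need it
print(render(ROWS, csv.QUOTE_ALL))      # quotes every field, numbers included
```

QUOTE_MINIMAL (the default) quotes only fields containing special characters such as the delimiter, the quote character, or a line break; QUOTE_ALL quotes everything, which costs a few bytes but can suit downstream tools that expect every field to be quoted.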
Data pipeline tip: Always validate CSV files against the expected schema (column names, field counts, and value types) before loading them into production databases. A single malformed row can break your entire ETL pipeline.
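A minimal sketch of such a pre-load check using only the standard library, assuming the four-column product schema from the best-practices example. EXPECTED_SCHEMA and validate_csv are hypothetical names; the check compares the header, counts the fields in every record, and confirms each value converts to its expected type.

```python
import csv
import io

# Hypothetical expected schema: column name -> converter that must succeed.
EXPECTED_SCHEMA = {
    "product_name": str,
    "price": float,
    "in_stock": lambda value: {"true": True, "false": False}[value],
    "rating": float,
}


def validate_csv(text, schema=EXPECTED_SCHEMA):
    """Hypothetical validator: return a list of problems; an empty list means the data passed."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != list(schema):
        return [f"header mismatch: expected {list(schema)}, got {header}"]

    errors = []
    converters = list(schema.values())
    # Count data records from 1 (records, not physical lines, since fields may span lines).
    for record_no, row in enumerate(reader, start=1):
        if len(row) != len(header):
            errors.append(f"record {record_no}: expected {len(header)} fields, got {len(row)}")
            continue
        for column, converter, value in zip(header, converters, row):
            try:
                converter(value)
            except (ValueError, KeyError):
                errors.append(f"record {record_no}: bad {column!r} value {value!r}")
    return errors


GOOD = 'product_name,price,in_stock,rating\n"USB-C Hub",49.99,false,4.2\n'
BAD = "product_name,price,in_stock,rating\nUSB-C Hub,49.99,false\nMouse,cheap,true,4.5\n"

print(validate_csv(GOOD))  # []
print(validate_csv(BAD))
```

Collecting every problem instead of stopping at the first one makes it easier to fix a bad export in a single pass before it reaches the database.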