Python vs SQL: Compare Python and SQL commands for DML operations, Group By, Window Functions, and Filters. Learn how SQL subqueries enhance data manipulation and analysis using Python (Pandas, PySpark) and SQL.

Python vs SQL Commands: DML, Group By, Window Functions & Filters

OperationPython (Pandas)SQL
Select Columnsdf[['col1', 'col2']]SELECT col1, col2 FROM table_name;
Filter Rowsdf[df['col1'] > 100]SELECT * FROM table_name WHERE col1 > 100;
Add a Columndf['new_col'] = df['col1'] * 2ALTER TABLE table_name ADD new_col INT; UPDATE table_name SET new_col = col1 * 2;
Drop a Columndf.drop(columns=['col1'], inplace=True)ALTER TABLE table_name DROP COLUMN col1;
Rename a Columndf.rename(columns={'old_name': 'new_name'}, inplace=True)ALTER TABLE table_name RENAME COLUMN old_name TO new_name;
Update Valuesdf.loc[df['col1'] > 100, 'col2'] = 10UPDATE table_name SET col2 = 10 WHERE col1 > 100;
Delete Rowsdf.drop(df[df['col1'] > 100].index, inplace=True)DELETE FROM table_name WHERE col1 > 100;
Insert Datanew_data = pd.DataFrame({'col1': [1,2], 'col2': [3,4]})
df = pd.concat([df, new_data])
INSERT INTO table_name (col1, col2) VALUES (1, 3), (2, 4);
Group Bydf.groupby('col1').agg({'col2': 'sum'})SELECT col1, SUM(col2) FROM table_name GROUP BY col1;
Filter After Groupingdf.groupby('col1').agg({'col2': 'sum'}).loc[lambda x: x['col2'] > 100]SELECT col1, SUM(col2) FROM table_name GROUP BY col1 HAVING SUM(col2) > 100;
Window Functions (Row Number)df['row_num'] = df.groupby('col1').cumcount()SELECT col1, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS row_num FROM table_name;
Window Functions (Rank)df['rank'] = df.groupby('col1')['col2'].rank()SELECT col1, RANK() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS rank FROM table_name;
Window Functions (NTILE)df['quartile'] = df.groupby('col1')['col2'].transform(lambda x: pd.qcut(x, 4, labels=False))SELECT col1, NTILE(4) OVER (PARTITION BY col1 ORDER BY col2 DESC) AS quartile FROM table_name;
Aggregate Functionsdf.groupby('col1').agg({'col2': 'mean'})SELECT col1, AVG(col2) FROM table_name GROUP BY col1;
Sortingdf.sort_values(by='col1', ascending=False)SELECT * FROM table_name ORDER BY col1 DESC;
Distinct Valuesdf['col1'].drop_duplicates()SELECT DISTINCT col1 FROM table_name;
Countdf['col1'].count()SELECT COUNT(col1) FROM table_name;
Join (Inner)pd.merge(df1, df2, on='col1', how='inner')SELECT * FROM table1 INNER JOIN table2 ON table1.col1 = table2.col1;
Join (Left)pd.merge(df1, df2, on='col1', how='left')SELECT * FROM table1 LEFT JOIN table2 ON table1.col1 = table2.col1;
Join (Right)pd.merge(df1, df2, on='col1', how='right')SELECT * FROM table1 RIGHT JOIN table2 ON table1.col1 = table2.col1;
Join (Full Outer)pd.merge(df1, df2, on='col1', how='outer')SELECT * FROM table1 FULL OUTER JOIN table2 ON table1.col1 = table2.col1;
Concatenate DataFramespd.concat([df1, df2], ignore_index=True)Not directly available in SQL, typically handled with UNION ALL.
Uniondf1.append(df2, ignore_index=True)SELECT * FROM table1 UNION ALL SELECT * FROM table2;

Key Differences

  1. DML Commands:
    • Python (Pandas) uses df['column'] syntax for data manipulation, df.drop(), and df.rename() for column operations.
    • SQL uses SELECT, UPDATE, INSERT, DELETE, and ALTER TABLE for DML operations.
  2. Group By Operations:
    • In Python, groupby() is used with aggregate functions like .sum(), .mean(), or .agg().
    • In SQL, GROUP BY is used with SUM(), AVG(), etc., and can include HAVING for post-aggregation filtering.
  3. Window Functions:
    • In Python (Pandas), window functions are implemented using .rank(), .cumcount(), and .transform().
    • In SQL, window functions such as ROW_NUMBER(), RANK(), and NTILE() are implemented using the OVER() clause with PARTITION BY and ORDER BY.
  4. Joins:
    • Both Python and SQL support various join types like inner, left, right, and full outer.
    • Python uses pd.merge() to perform joins, while SQL uses the JOIN keyword with appropriate types.

SQL