Python vs SQL

Python vs SQL: Compare Python and SQL commands for DML operations, Group By, Window Functions, and Filters. Learn how SQL subqueries enhance data manipulation and analysis using Python (Pandas, PySpark) and SQL.

Python vs SQL Commands: DML, Group By, Window Functions & Filters

Operation	Python (Pandas)	SQL
Select Columns	`df[['col1', 'col2']]`	`SELECT col1, col2 FROM table_name;`
Filter Rows	`df[df['col1'] > 100]`	`SELECT * FROM table_name WHERE col1 > 100;`
Add a Column	`df['new_col'] = df['col1'] * 2`	`ALTER TABLE table_name ADD new_col INT; UPDATE table_name SET new_col = col1 * 2;`
Drop a Column	`df.drop(columns=['col1'], inplace=True)`	`ALTER TABLE table_name DROP COLUMN col1;`
Rename a Column	`df.rename(columns={'old_name': 'new_name'}, inplace=True)`	`ALTER TABLE table_name RENAME COLUMN old_name TO new_name;`
Update Values	`df.loc[df['col1'] > 100, 'col2'] = 10`	`UPDATE table_name SET col2 = 10 WHERE col1 > 100;`
Delete Rows	`df.drop(df[df['col1'] > 100].index, inplace=True)`	`DELETE FROM table_name WHERE col1 > 100;`
Insert Data	`new_data = pd.DataFrame({'col1': [1,2], 'col2': [3,4]})` `df = pd.concat([df, new_data])`	`INSERT INTO table_name (col1, col2) VALUES (1, 3), (2, 4);`
Group By	`df.groupby('col1').agg({'col2': 'sum'})`	`SELECT col1, SUM(col2) FROM table_name GROUP BY col1;`
Filter After Grouping	`df.groupby('col1').agg({'col2': 'sum'}).loc[lambda x: x['col2'] > 100]`	`SELECT col1, SUM(col2) FROM table_name GROUP BY col1 HAVING SUM(col2) > 100;`
Window Functions (Row Number)	`df['row_num'] = df.groupby('col1').cumcount()`	`SELECT col1, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS row_num FROM table_name;`
Window Functions (Rank)	`df['rank'] = df.groupby('col1')['col2'].rank()`	`SELECT col1, RANK() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS rank FROM table_name;`
Window Functions (NTILE)	`df['quartile'] = df.groupby('col1')['col2'].transform(lambda x: pd.qcut(x, 4, labels=False))`	`SELECT col1, NTILE(4) OVER (PARTITION BY col1 ORDER BY col2 DESC) AS quartile FROM table_name;`
Aggregate Functions	`df.groupby('col1').agg({'col2': 'mean'})`	`SELECT col1, AVG(col2) FROM table_name GROUP BY col1;`
Sorting	`df.sort_values(by='col1', ascending=False)`	`SELECT * FROM table_name ORDER BY col1 DESC;`
Distinct Values	`df['col1'].drop_duplicates()`	`SELECT DISTINCT col1 FROM table_name;`
Count	`df['col1'].count()`	`SELECT COUNT(col1) FROM table_name;`
Join (Inner)	`pd.merge(df1, df2, on='col1', how='inner')`	`SELECT * FROM table1 INNER JOIN table2 ON table1.col1 = table2.col1;`
Join (Left)	`pd.merge(df1, df2, on='col1', how='left')`	`SELECT * FROM table1 LEFT JOIN table2 ON table1.col1 = table2.col1;`
Join (Right)	`pd.merge(df1, df2, on='col1', how='right')`	`SELECT * FROM table1 RIGHT JOIN table2 ON table1.col1 = table2.col1;`
Join (Full Outer)	`pd.merge(df1, df2, on='col1', how='outer')`	`SELECT * FROM table1 FULL OUTER JOIN table2 ON table1.col1 = table2.col1;`
Concatenate DataFrames	`pd.concat([df1, df2], ignore_index=True)`	Not directly available in SQL, typically handled with `UNION ALL`.
Union	`df1.append(df2, ignore_index=True)`	`SELECT * FROM table1 UNION ALL SELECT * FROM table2;`

Key Differences

DML Commands:
- Python (Pandas) uses df['column'] syntax for data manipulation, df.drop(), and df.rename() for column operations.
- SQL uses SELECT, UPDATE, INSERT, DELETE, and ALTER TABLE for DML operations.
Group By Operations:
- In Python, groupby() is used with aggregate functions like .sum(), .mean(), or .agg().
- In SQL, GROUP BY is used with SUM(), AVG(), etc., and can include HAVING for post-aggregation filtering.
Window Functions:
- In Python (Pandas), window functions are implemented using .rank(), .cumcount(), and .transform().
- In SQL, window functions such as ROW_NUMBER(), RANK(), and NTILE() are implemented using the OVER() clause with PARTITION BY and ORDER BY.
Joins:
- Both Python and SQL support various join types like inner, left, right, and full outer.
- Python uses pd.merge() to perform joins, while SQL uses the JOIN keyword with appropriate types.

SQL

Concepts

Python vs SQL Commands: DML, Group By, Window Functions & Filters

Key Differences

By Darshini

Try to check similar content

Understanding the Executor Node in Apache Spark

How to allocate driver memory and executor memory in Spark

In-Memory Processing in Apache Spark: An Overview for SEO Optimization

Why foreach() is called an action

Python vs SQL

Python vs SQL Commands: DML, Group By, Window Functions & Filters

Key Differences

By Darshini

Related Post

Date functions

Handling NULL in join condition

REVOKE

Try to check similar content

Understanding the Executor Node in Apache Spark

How to allocate driver memory and executor memory in Spark

In-Memory Processing in Apache Spark: An Overview for SEO Optimization

Why foreach() is called an action