Python vs SQL: Compare Python and SQL commands for DML operations, Group By, Window Functions, and Filters. Learn how SQL subqueries enhance data manipulation and analysis using Python (Pandas, PySpark) and SQL.
Python vs SQL Commands: DML, Group By, Window Functions & Filters
Operation | Python (Pandas) | SQL |
---|---|---|
Select Columns | df[['col1', 'col2']] | SELECT col1, col2 FROM table_name; |
Filter Rows | df[df['col1'] > 100] | SELECT * FROM table_name WHERE col1 > 100; |
Add a Column | df['new_col'] = df['col1'] * 2 | ALTER TABLE table_name ADD new_col INT; UPDATE table_name SET new_col = col1 * 2; |
Drop a Column | df.drop(columns=['col1'], inplace=True) | ALTER TABLE table_name DROP COLUMN col1; |
Rename a Column | df.rename(columns={'old_name': 'new_name'}, inplace=True) | ALTER TABLE table_name RENAME COLUMN old_name TO new_name; |
Update Values | df.loc[df['col1'] > 100, 'col2'] = 10 | UPDATE table_name SET col2 = 10 WHERE col1 > 100; |
Delete Rows | df.drop(df[df['col1'] > 100].index, inplace=True) | DELETE FROM table_name WHERE col1 > 100; |
Insert Data | new_data = pd.DataFrame({'col1': [1,2], 'col2': [3,4]}) df = pd.concat([df, new_data]) | INSERT INTO table_name (col1, col2) VALUES (1, 3), (2, 4); |
Group By | df.groupby('col1').agg({'col2': 'sum'}) | SELECT col1, SUM(col2) FROM table_name GROUP BY col1; |
Filter After Grouping | df.groupby('col1').agg({'col2': 'sum'}).loc[lambda x: x['col2'] > 100] | SELECT col1, SUM(col2) FROM table_name GROUP BY col1 HAVING SUM(col2) > 100; |
Window Functions (Row Number) | df['row_num'] = df.groupby('col1').cumcount() | SELECT col1, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS row_num FROM table_name; |
Window Functions (Rank) | df['rank'] = df.groupby('col1')['col2'].rank() | SELECT col1, RANK() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS rank FROM table_name; |
Window Functions (NTILE) | df['quartile'] = df.groupby('col1')['col2'].transform(lambda x: pd.qcut(x, 4, labels=False)) | SELECT col1, NTILE(4) OVER (PARTITION BY col1 ORDER BY col2 DESC) AS quartile FROM table_name; |
Aggregate Functions | df.groupby('col1').agg({'col2': 'mean'}) | SELECT col1, AVG(col2) FROM table_name GROUP BY col1; |
Sorting | df.sort_values(by='col1', ascending=False) | SELECT * FROM table_name ORDER BY col1 DESC; |
Distinct Values | df['col1'].drop_duplicates() | SELECT DISTINCT col1 FROM table_name; |
Count | df['col1'].count() | SELECT COUNT(col1) FROM table_name; |
Join (Inner) | pd.merge(df1, df2, on='col1', how='inner') | SELECT * FROM table1 INNER JOIN table2 ON table1.col1 = table2.col1; |
Join (Left) | pd.merge(df1, df2, on='col1', how='left') | SELECT * FROM table1 LEFT JOIN table2 ON table1.col1 = table2.col1; |
Join (Right) | pd.merge(df1, df2, on='col1', how='right') | SELECT * FROM table1 RIGHT JOIN table2 ON table1.col1 = table2.col1; |
Join (Full Outer) | pd.merge(df1, df2, on='col1', how='outer') | SELECT * FROM table1 FULL OUTER JOIN table2 ON table1.col1 = table2.col1; |
Concatenate DataFrames | pd.concat([df1, df2], ignore_index=True) | Not directly available in SQL, typically handled with UNION ALL . |
Union | df1.append(df2, ignore_index=True) | SELECT * FROM table1 UNION ALL SELECT * FROM table2; |
Key Differences
- DML Commands:
- Python (Pandas) uses
df['column']
syntax for data manipulation,df.drop()
, anddf.rename()
for column operations. - SQL uses
SELECT
,UPDATE
,INSERT
,DELETE
, andALTER TABLE
for DML operations.
- Python (Pandas) uses
- Group By Operations:
- In Python,
groupby()
is used with aggregate functions like.sum()
,.mean()
, or.agg()
. - In SQL,
GROUP BY
is used withSUM()
,AVG()
, etc., and can includeHAVING
for post-aggregation filtering.
- In Python,
- Window Functions:
- In Python (Pandas), window functions are implemented using
.rank()
,.cumcount()
, and.transform()
. - In SQL, window functions such as
ROW_NUMBER()
,RANK()
, andNTILE()
are implemented using theOVER()
clause withPARTITION BY
andORDER BY
.
- In Python (Pandas), window functions are implemented using
- Joins:
- Both Python and SQL support various join types like inner, left, right, and full outer.
- Python uses
pd.merge()
to perform joins, while SQL uses theJOIN
keyword with appropriate types.