The Polars API is
Simple
Consistent
Grammar
Most feature engineering tasks can be expressed with the 7 verbs below.
7 Verbs Get Most Jobs Done
select/slice columns: select
create/transform/assign columns: with_columns
filter/slice/query rows: filter
join/merge other dataframes: join & concat
group dataframe rows: group_by
aggregate groups: agg
sort dataframe: sort
import polars as pl
df = pl.read_csv("StudentsPerformance.csv")
df.head()
shape: (5, 9)
i64 | str | str | str | str | str | i64 | i64 | i64
1 | "female" | "group B" | "bachelor's degree" | "standard" | "none" | 72 | 72 | 74
2 | "female" | "group C" | "some college" | "standard" | "completed" | 69 | 90 | 88
3 | "female" | "group B" | "master's degree" | "standard" | "none" | 90 | 95 | 93
4 | "male" | "group A" | "associate's degree" | "free/reduced" | "none" | 47 | 57 | 44
5 | "male" | "group C" | "some college" | "standard" | "none" | 76 | 78 | 75
1. Select Columns
df.select(pl.col('gender')).head()
shape: (5, 1)
str
"female"
"female"
"female"
"male"
"male"
Selecting two or more columns
df.select(pl.col(['gender', 'math score'])).head()
shape: (5, 2)
str | i64
"female" | 72
"female" | 69
"female" | 90
"male" | 47
"male" | 76
Selecting all the columns
df.select(pl.col('*')).head()
shape: (5, 9)
i64 | str | str | str | str | str | i64 | i64 | i64
1 | "female" | "group B" | "bachelor's degree" | "standard" | "none" | 72 | 72 | 74
2 | "female" | "group C" | "some college" | "standard" | "completed" | 69 | 90 | 88
3 | "female" | "group B" | "master's degree" | "standard" | "none" | 90 | 95 | 93
4 | "male" | "group A" | "associate's degree" | "free/reduced" | "none" | 47 | 57 | 44
5 | "male" | "group C" | "some college" | "standard" | "none" | 76 | 78 | 75
2. Create Columns
Create a new column "sum" by adding math score and reading score
df.with_columns(
    (pl.col('math score') + pl.col('reading score')).alias('sum')
).head()
shape: (5, 10)
i64 | str | str | str | str | str | i64 | i64 | i64 | i64
1 | "female" | "group B" | "bachelor's degree" | "standard" | "none" | 72 | 72 | 74 | 144
2 | "female" | "group C" | "some college" | "standard" | "completed" | 69 | 90 | 88 | 159
3 | "female" | "group B" | "master's degree" | "standard" | "none" | 90 | 95 | 93 | 185
4 | "male" | "group A" | "associate's degree" | "free/reduced" | "none" | 47 | 57 | 44 | 104
5 | "male" | "group C" | "some college" | "standard" | "none" | 76 | 78 | 75 | 154
3. Filter
df.filter(pl.col('gender') == 'female').head()
shape: (5, 9)
i64 | str | str | str | str | str | i64 | i64 | i64
1 | "female" | "group B" | "bachelor's degree" | "standard" | "none" | 72 | 72 | 74
2 | "female" | "group C" | "some college" | "standard" | "completed" | 69 | 90 | 88
3 | "female" | "group B" | "master's degree" | "standard" | "none" | 90 | 95 | 93
6 | "female" | "group B" | "associate's degree" | "standard" | "none" | 71 | 83 | 78
7 | "female" | "group B" | "some college" | "standard" | "completed" | 88 | 95 | 92
df.filter(
    (pl.col('gender') == 'female') &
    (pl.col('race/ethnicity') == 'group B')
).head()
shape: (5, 9)
i64 | str | str | str | str | str | i64 | i64 | i64
1 | "female" | "group B" | "bachelor's degree" | "standard" | "none" | 72 | 72 | 74
3 | "female" | "group B" | "master's degree" | "standard" | "none" | 90 | 95 | 93
6 | "female" | "group B" | "associate's degree" | "standard" | "none" | 71 | 83 | 78
7 | "female" | "group B" | "some college" | "standard" | "completed" | 88 | 95 | 92
10 | "female" | "group B" | "high school" | "free/reduced" | "none" | 38 | 60 | 50
4. Join
df2 = pl.read_csv('LanguageScore.csv')
df.join(df2, on="id").head()
shape: (5, 10)
i64 | str | str | str | str | str | i64 | i64 | i64 | i64
1 | "female" | "group B" | "bachelor's degree" | "standard" | "none" | 72 | 72 | 74 | 74
2 | "female" | "group C" | "some college" | "standard" | "completed" | 69 | 90 | 88 | 67
3 | "female" | "group B" | "master's degree" | "standard" | "none" | 90 | 95 | 93 | 34
4 | "male" | "group A" | "associate's degree" | "free/reduced" | "none" | 47 | 57 | 44 | 33
5 | "male" | "group C" | "some college" | "standard" | "none" | 76 | 78 | 75 | 75
Concat
df2 = df2.drop("id")
pl.concat([df, df2], how="horizontal").head()
shape: (5, 10)
i64 | str | str | str | str | str | i64 | i64 | i64 | i64
1 | "female" | "group B" | "bachelor's degree" | "standard" | "none" | 72 | 72 | 74 | 74
2 | "female" | "group C" | "some college" | "standard" | "completed" | 69 | 90 | 88 | 67
3 | "female" | "group B" | "master's degree" | "standard" | "none" | 90 | 95 | 93 | 34
4 | "male" | "group A" | "associate's degree" | "free/reduced" | "none" | 47 | 57 | 44 | 33
5 | "male" | "group C" | "some college" | "standard" | "none" | 76 | 78 | 75 | 75
5. Group By
Count the rows for each race/ethnicity
df.group_by('race/ethnicity').len()
shape: (5, 2)
str | u32
"group D" | 262
"group A" | 89
"group C" | 319
"group B" | 190
"group E" | 140
6. Aggregate
Average math score for females and males
df.group_by('gender').agg(pl.col('math score').mean().alias('mean_score'))
shape: (2, 2)
str | f64
"female" | 63.633205
"male" | 68.728216
7. Sort
Sort the dataframe by math score in descending order
df.sort('math score', descending=True).head()
shape: (5, 9)
i64 | str | str | str | str | str | i64 | i64 | i64
150 | "male" | "group E" | "associate's degree" | "free/reduced" | "completed" | 100 | 100 | 93
452 | "female" | "group E" | "some college" | "standard" | "none" | 100 | 92 | 97
459 | "female" | "group E" | "bachelor's degree" | "standard" | "none" | 100 | 100 | 100
624 | "male" | "group A" | "some college" | "standard" | "completed" | 100 | 96 | 86
626 | "male" | "group D" | "some college" | "standard" | "completed" | 100 | 97 | 99