Incomplete sheet

This sheet is incomplete and could use some attention. Please submit code snippet suggestions as an issue or PR here.

Pandas Series

Opinionated words of caution:

  • Types are (currently?) a mess, and autocasting does not make this any better.
  • Specify dtype when defining series to avoid surprises further down the pipeline.
  • Missing values are a complete mess: None/Null/NA/NaN are all used interchangeably, despite the existence of both isna and isnull implying otherwise.
  • The representation of missing values differs per type (float uses nan even for NA inputs, object uses NA), which makes the whole thing even more confusing.
  • This seems to be fixed with the new Int64 and Float64 types, although this won't help in practice as autocasting still uses the old types.
  • Avoid indexes unless you have a good reason. Consider using a DataFrame instead.
  • The in operator, counterintuitively, works on the index, not the values.
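The last caution is easy to trip over, so here is a minimal sketch of it. The series s and its labels are made up for illustration; the point is only that in looks at index labels while isin looks at values.

```python
import pandas as pd

# A series with a non-default index, so labels and values clearly differ.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

label_hit = 'a' in s             # 'a' is an index label
value_miss = 10 in s             # 10 is a value, not a label
value_hit = s.isin([10]).any()   # tests the values
value_hit_alt = 10 in s.values   # tests the underlying array

print(label_hit, value_miss, value_hit, value_hit_alt)
```

With the default integer index the two can coincide by accident (0 in s would be true for any non-empty series), which is what makes the behaviour hard to spot.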

Code

import pandas as pd

Create

Action Code Details
From DataFrame column s
data[s]

Create empty series

Action Code Details
Empty series (of object type)
pd.Series(dtype=object)
A bare pd.Series() defaulted to float64 (with a warning) before pandas 2.0
Empty boolean series
pd.Series(dtype=bool)
Empty categorical series with pre-defined categories
pd.Series(pd.Categorical([], categories=['a', 'b', 'c']))
Empty categorical series without defined categories
pd.Series(dtype='category')
Empty int series
pd.Series(dtype=int)
Empty float series
pd.Series(dtype=float)
Empty datetime series
pd.Series(dtype='datetime64[ns]')

Create series of constant values

Action Code Details
Series filled with NAs of length n (of object type)
pd.Series([None] * n)
Constant value v of length n
pd.Series(v, index=range(n))

Create series from a list of values

Action Code Details
Object series from a generic list of values
pd.Series([1, None, 'a'])
Int series from a list of integers
pd.Series([1, '2', 3], dtype='int')
Nullable int series from a list of integers
pd.Series([1, None, 3], dtype='Int64')
Preserves None as <NA>
Nullable int series filled with NA of length n
pd.Series([None] * n, dtype='Int64')
Float series from list of numbers
pd.Series([1, None, 3], dtype='float')
None is converted to NaN!
Nullable float series from a list of numbers
pd.Series([1, None, 3.5], dtype='Float64')
None, NA and NaN are all set to <NA>
Nullable float series filled with NA of length n
pd.Series([None] * n, dtype='Float64')
Categorical series from list of strings
pd.Series(pd.Categorical(['b', 'b', 'a'], categories=['a', 'b', 'c']))
Categorical series filled with NA of length n
pd.Series(pd.Categorical([None] * n, categories=['a', 'b', 'c']))
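To make the dtype differences concrete, the sketch below builds the same list with several dtypes and shows how the missing entry is stored. The variable names are illustrative only.

```python
import pandas as pd

values = [1, None, 3]

as_float = pd.Series(values, dtype='float')      # None is stored as NaN
as_int64 = pd.Series(values, dtype='Int64')      # None is stored as <NA>
as_float64 = pd.Series(values, dtype='Float64')  # None is stored as <NA>
inferred = pd.Series(values)                     # no dtype given: autocast to float64

for name, s in [('float', as_float), ('Int64', as_int64),
                ('Float64', as_float64), ('inferred', inferred)]:
    print(name, s.dtype, repr(s.iloc[1]))
```

Note that the inferred case silently turns a list of integers into floats, which is the autocasting surprise mentioned in the cautions.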

Test

Action Code Details
Is series or subclass
isinstance(x, pd.Series)
Is series and not subclass
type(x) is pd.Series
Empty
x.empty
Not empty
not x.empty
Has length n
len(x) == n
Is boolean series
pd.api.types.is_bool_dtype(x)
Is categorical series
isinstance(x.dtype, pd.CategoricalDtype)
Is ordered categorical series
isinstance(x.dtype, pd.CategoricalDtype) and x.dtype.ordered
Is numeric series
pd.api.types.is_numeric_dtype(x)
For example, int or float
Is integer series
pd.api.types.is_integer_dtype(x)
For example, int or Int64
Is unsigned integer series
pd.api.types.is_unsigned_integer_dtype(x)
Is float series
pd.api.types.is_float_dtype(x)
For example, float or Float64
Is datetime64 series
pd.api.types.is_datetime64_dtype(x)
Is datetime64[ns] series
pd.api.types.is_datetime64_ns_dtype(x)
Is string series
pd.api.types.is_string_dtype(x)
Is object series
pd.api.types.is_object_dtype(x)
Is hashable series
pd.api.types.is_hashable(x)
Always False for a Series itself, since Series objects are unhashable
No duplicate elements (all unique)
x.is_unique
Any duplicate elements
not x.is_unique
Contains NA
x.hasnans
Contains only NA
x.isna().all()
Contains no NA
x.notna().all()
Contains value v, ignoring NAs
x.isin([v]).any()
WARNING: 'v in x' tests the index, not the values!
Contains value v, ignoring NAs
(x == v).any()
Contains any of the values v1, v2, ignoring NAs
x.isin([v1, v2]).any()
Does not contain value v
(x != v).all()
Are elements in increasing order
x.is_monotonic_increasing
Are elements in decreasing order
x.is_monotonic_decreasing
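As a quick illustration of the tests above, here is a small sketch on a nullable integer series. The example values are arbitrary.

```python
import pandas as pd

x = pd.Series([3, None, 1], dtype='Int64')

is_int = pd.api.types.is_integer_dtype(x)      # holds for both int64 and Int64
is_numeric = pd.api.types.is_numeric_dtype(x)
is_float = pd.api.types.is_float_dtype(x)      # an Int64 series is not a float series
has_na = x.hasnans
only_na = x.isna().all()
contains_3 = x.isin([3]).any()                 # the NA entry is not matched

print(is_int, is_numeric, is_float, has_na, only_na, contains_3)
```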

Test boolean series

Action Code Details
All values are True
x.all()
All values are False
not x.any()

Tests for string series

Action Code Details
Contains string s
x.str.contains(s).any()

Assertions

Action Code Details
Assert series equal
pd.testing.assert_series_equal(x, y)
Assert series equal, ignoring the names
pd.testing.assert_series_equal(x, y, check_names=False)
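A minimal sketch of both assertions, using two hypothetical series that differ only in name:

```python
import pandas as pd

x = pd.Series([1, 2, 3], name='left')
y = pd.Series([1, 2, 3], name='right')

# Values, index and dtype match, so this passes once names are ignored.
pd.testing.assert_series_equal(x, y, check_names=False)

# With the default check_names=True the differing names raise AssertionError.
try:
    pd.testing.assert_series_equal(x, y)
    names_differ = False
except AssertionError:
    names_differ = True

print(names_differ)
```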

Extract

Action Code Details
Number of elements
x.size
Number of elements
len(x)
Hash
hash(x.values.tobytes())
Dtype
x.dtype
Number of unique elements, ignoring NAs
x.nunique()
Smallest value, ignoring NAs
x.min()
Index of the smallest value, ignoring NAs
x.idxmin()
Greatest value, ignoring NAs
x.max()
Index of the greatest value, ignoring NAs
x.idxmax()
Count occurrence per value
x.value_counts()
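The following sketch runs a few of these extractions on a small float series with one missing value. The labels are illustrative.

```python
import pandas as pd

x = pd.Series([2.0, None, 5.0, 2.0], index=['a', 'b', 'c', 'd'])

n_elements = x.size          # counts the NaN entry too
n_unique = x.nunique()       # NaN is ignored
smallest = x.min()           # NaN is ignored
where_min = x.idxmin()       # label of the first occurrence of the minimum
counts = x.value_counts()    # NaN is dropped by default

print(n_elements, n_unique, smallest, where_min)
print(counts)
```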

Dtype-specific operations

Action Code Details
Get length of each list element
x.list.len()
Requires a PyArrow-backed list dtype (pandas 2.2 or later)
Get the i-th item of each list element
x.list[i]

Update

Warning: updates may change the dtype of the series!

Action Code Details
Set element at index i to NA
x[i] = pd.NA
Set element at index i to value v
x[i] = v
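One way to sidestep the dtype warning is to use a nullable dtype from the start. The sketch below assumes a default integer index, so x[i] addresses label i. With a plain int64 series the same NA assignment may upcast, warn or be rejected depending on the pandas version, so that case is not shown here.

```python
import pandas as pd

x = pd.Series([1, 2, 3], dtype='Int64')
before = x.dtype

x[0] = pd.NA   # Int64 can hold NA, so no cast is needed
x[1] = 20      # an integer value fits as well

after = x.dtype
print(before, after, x.tolist())
```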

Derive

Cast

Action Code Details
Cast series to numeric dtype
pd.to_numeric(x)
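By default pd.to_numeric raises on values it cannot parse. A minimal sketch with a made-up string series, showing the errors='coerce' option that turns such values into NaN instead:

```python
import pandas as pd

raw = pd.Series(['1', '2.5', 'oops'])

# Default behaviour: an unparsable value raises ValueError.
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True

# errors='coerce' replaces unparsable values with NaN.
coerced = pd.to_numeric(raw, errors='coerce')

print(raised, coerced.tolist())
```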

Order

Action Code Details
Reverse
x[::-1]
Sort in ascending order
x.sort_values()
Nulls are placed last
Sort in descending order
x.sort_values(ascending=False)
Nulls are placed last
Shuffle
x.sample(frac=1)
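A short sketch of the ordering operations on a series with one missing value. The na_position and random_state arguments are not in the table above; they are included as optional extras.

```python
import pandas as pd

x = pd.Series([3.0, None, 1.0])

reversed_x = x[::-1]                               # original index labels are kept
asc = x.sort_values()                              # NaN goes last
desc = x.sort_values(ascending=False)              # NaN still goes last
na_first = x.sort_values(na_position='first')      # NaN goes first instead
shuffled = x.sample(frac=1, random_state=0)        # reproducible shuffle

print(asc.tolist(), desc.tolist(), na_first.tolist())
```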

Transform

Action Code Details
Fill NA with value v
x.fillna(value=v)
Replace value v with NA
x.replace(v, pd.NA)
May cast series to another type if nulls are not supported (example: int)
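The sketch below illustrates both rows. The exact dtype that replace falls back to on a plain int series is not asserted here, since it may differ between pandas versions; the example only checks that it is no longer int64.

```python
import pandas as pd

# fillna on a nullable series keeps the dtype.
x = pd.Series([1, None, 3], dtype='Int64')
filled = x.fillna(value=0)

# replace with NA on a plain int64 series forces a cast,
# because int64 cannot represent a missing value.
ints = pd.Series([1, 2, 3])
replaced = ints.replace(2, pd.NA)
dtype_changed = replaced.dtype != ints.dtype

print(filled.tolist(), filled.dtype)
print(replaced.dtype, dtype_changed)
```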

Mask

Action Code Details
Duplicate mask
x.duplicated()

Cast to type

Action Code Details
Cast to boolean
x.astype(bool)
Cast to integer
x.astype(int)
Cast to float
x.astype(float)
Cast to string
x.astype(str)
Cast to categorical (nominal)
x.astype('category')
Cast to categorical (nominal) with given categories
x.astype(pd.CategoricalDtype(categories=['a', 'b', 'c']))
Cast to ordered categorical (ordinal)
x.astype(pd.CategoricalDtype(ordered=True))
Cast categorical to integer codes
x.cat.codes
Cast ordered categorical to unordered categorical
x.cat.as_unordered()
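For categorical casts that stay within Series, one option is to go through astype with a pd.CategoricalDtype and the .cat accessor. This is a sketch of that route, not the only way to do it; the category labels are the ones used elsewhere in this sheet.

```python
import pandas as pd

x = pd.Series(['b', 'b', 'a'])

cats = ['a', 'b', 'c']
nominal = x.astype(pd.CategoricalDtype(categories=cats))
ordinal = x.astype(pd.CategoricalDtype(categories=cats, ordered=True))

codes = nominal.cat.codes               # position of each value in the category list
unordered = ordinal.cat.as_unordered()  # drops the ordering, keeps the categories

print(codes.tolist(), ordinal.cat.ordered, unordered.cat.ordered)
```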

Grow

Shrink

Action Code Details
First n elements
x.head(n)
Last n elements
x.tail(n)
Slice
x[a:b]
Sample n elements
x.sample(n)
Remove duplicates
x.drop_duplicates()
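A small sketch of the shrinking operations, together with the duplicate mask, on an arbitrary integer series:

```python
import pandas as pd

x = pd.Series([1, 2, 2, 3, 1])

first_two = x.head(2)
last_two = x.tail(2)
middle = x[1:3]                 # positions 1 and 2
dup_mask = x.duplicated()       # True for every repeat after the first occurrence
deduped = x.drop_duplicates()   # keeps first occurrences and their index labels

print(first_two.tolist(), last_two.tolist(), middle.tolist())
print(dup_mask.tolist(), deduped.index.tolist())
```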

Convert

Action Code Details
To list
x.tolist()
To list
list(x)
To set of unique values
set(x)
NA is included only once, nan is only included once
To DataFrame (single column)
x.to_frame()
To NumPy ndarray
x.to_numpy()
To dict (index-value pairs)
x.to_dict()
To JSON string of index-value pairs
x.to_json()
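The conversions above, sketched on a small named series. The name 'n' and the labels are illustrative; the JSON string is parsed back with the standard json module to show its shape.

```python
import json

import pandas as pd

x = pd.Series([1, 2, 2], index=['a', 'b', 'c'], name='n')

as_list = x.tolist()
as_set = set(x)
as_dict = x.to_dict()        # index labels become keys
as_frame = x.to_frame()      # single column named after the series
as_array = x.to_numpy()
as_json = x.to_json()        # a JSON object keyed by index label

print(as_list, as_set, as_dict)
print(list(as_frame.columns), as_array.shape, json.loads(as_json))
```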