pandas

Function

Chapter

Description

pd.DataFrame(data)

Tabular Data and pandas

Create a DataFrame from data, which can be a two-dimensional array or a dictionary

pd.read_csv(filepath)

Tabular Data and pandas

Import a CSV file from filepath as a pandas DataFrame

pd.DataFrame.head(n=5)
pd.Series.head(n=5)

Tabular Data and pandas

View the first n rows of a DataFrame or Series

pd.DataFrame.index
pd.DataFrame.columns

Tabular Data and pandas

View a DataFrame’s row index and column labels

pd.DataFrame.describe()
pd.Series.describe()

Exploratory Data Analysis

View descriptive statistics about a DataFrame or Series

pd.Series.unique()

Exploratory Data Analysis

View unique values in a Series

pd.Series.value_counts()

Exploratory Data Analysis

View the number of times each unique value appears in a Series

df[col]

Tabular Data and pandas

From DataFrame df, return column col as a Series

df[[col]]

Tabular Data and pandas

From DataFrame df, return column col as a DataFrame

df.loc[row, col]

Tabular Data and pandas

From DataFrame df, return rows with index name row and column name col; row can alternatively be a boolean Series

df.iloc[row, col]

Tabular Data and pandas

From DataFrame df, return rows at integer position row and columns at integer position col; row can alternatively be a boolean array (unlike loc, iloc does not accept a boolean Series)

pd.DataFrame.isnull()
pd.Series.isnull()

Data Cleaning

Identify missing values in a DataFrame or Series, returned as a boolean mask

pd.DataFrame.fillna(value)
pd.Series.fillna(value)

Data Cleaning

Fill in missing values in a DataFrame or Series with value

pd.DataFrame.dropna(axis)
pd.Series.dropna()

Data Cleaning

Drop rows or columns with missing values from a DataFrame or Series

pd.DataFrame.drop(labels, axis)

Data Cleaning

Drop rows or columns named labels from DataFrame along axis

pd.DataFrame.rename()

Data Cleaning

Rename specified rows or columns in a DataFrame

pd.DataFrame.replace(to_replace, value)

Data Cleaning

Replace to_replace values with value in DataFrame

pd.DataFrame.reset_index(drop=False)

Data Cleaning

Reset a DataFrame’s index; by default, the old index is retained as a new column unless drop=True is specified

pd.DataFrame.sort_values(by, ascending=True)

Tabular Data and pandas

Sort a DataFrame by specified columns by, in ascending order by default

pd.DataFrame.groupby(by)

Tabular Data and pandas

Return a GroupBy object that contains a DataFrame grouped by the values in the specified columns by

GroupBy.<function>

Tabular Data and pandas

Apply a function <function> to each group in a GroupBy object GroupBy; e.g. mean(), count()

pd.Series.<function>

Tabular Data and pandas

Apply a function <function> to a Series with numerical values; e.g. mean(), max(), median()

pd.Series.str.<function>

Tabular Data and pandas

Apply a function <function> to a Series with string values; e.g. len(), lower(), split()

pd.Series.dt.<property>

Tabular Data and pandas

Extract a property <property> from a Series with Datetime values; e.g. year, month, date

pd.get_dummies(columns, drop_first=False)

Convert the categorical variables in columns to dummy variables; by default, all categories are retained unless drop_first=True is specified

pd.merge(left, right, how, on)

Exploratory Data Analysis; Databases and SQL

Merge two DataFrames left and right together on specified columns on; type of join depends on how

pd.read_sql(sql, con)

Databases and SQL

Run a SQL query sql on a database connection con, and return the result as a pandas DataFrame
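
The entries above can be hard to picture in isolation, so here is a minimal sketch that strings several of them together. The toy data, the column names (city, temp, day, country), and the in-memory SQLite table are all hypothetical and chosen only for illustration; they do not come from the chapters listed in the table.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical toy data, invented for this sketch.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", "Pune", None],
    "temp": [3.0, 19.5, np.nan, 31.0, 12.0],
    "day": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02",
                           "2024-01-02", "2024-01-03"]),
})

# df[col] returns a Series; df[[col]] returns a one-column DataFrame.
temp_series = df["temp"]
temp_frame = df[["temp"]]

# loc selects by label and accepts a boolean Series as the row selector.
warm = df.loc[df["temp"] > 10, "city"]

# iloc selects by integer position; a boolean mask must be a plain array.
mask = (df["temp"] > 10).to_numpy()
warm_by_position = df.iloc[mask, 0]

# Cleaning: count missing values, then drop incomplete rows and
# renumber the index without keeping the old one as a column.
missing_per_column = df.isnull().sum()
clean = df.dropna(axis=0).reset_index(drop=True)

# Alternative to dropping: fill missing temperatures with a placeholder.
filled = df.fillna({"temp": 0.0})

# Grouping: mean temperature per city.
mean_temp = clean.groupby("city")["temp"].mean()

# String and datetime accessors on Series.
clean["city_lower"] = clean["city"].str.lower()
clean["month"] = clean["day"].dt.month

# Merging: a left join keeps every row of `clean`.
countries = pd.DataFrame({"city": ["Oslo", "Lima"],
                          "country": ["Norway", "Peru"]})
merged = pd.merge(clean, countries, how="left", on="city")

# read_sql: query a throwaway in-memory SQLite database.
con = sqlite3.connect(":memory:")
countries.to_sql("countries", con, index=False)
from_sql = pd.read_sql("SELECT city FROM countries ORDER BY city", con)
con.close()
```

In this sketch, the row for Pune has no match in countries, so the left join leaves its country value missing rather than dropping the row; switching how to "inner" would drop it instead.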