Pandas EDA Cheat Sheet

Posted : admin On 1/3/2022

Pandas Cheat Sheet is a quick guide through the basics of Pandas that you will need to get started on wrangling your data with Python. If you want to begin your data science journey with Pandas, you can use it as a handy reference to deal with the data easily.

Pandas basic EDA
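  • pandas.DataFrame.shape --> (row_count, col_count)
  • pandas.DataFrame.shape[0] --> number of rows (records, or samples in the dataset)
  • my_dataframe['my_series_name'].unique() --> returns the unique values of a column (think of the 'radio button choices')
  • my_dataframe.describe() --> returns summary statistics (count, mean, std, min, quartiles, max) for the numeric columns
  • len(my_dataframe['my_series_name'].unique()) --> number of unique values, counting NaN as a value; my_dataframe['my_series_name'].nunique() gives the count directly and skips NaN by default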
  • import os; os.listdir('name_of_directory') --> lists the entries in the named directory; os.listdir('.') lists the current directory
  • import os; len(os.listdir('.')) --> number of entries (files and subdirectories) in the current directory
  • my_dataframe.groupby(['col_1', 'col_2']) --> groups by col_1 first, then by col_2 within each col_1 group
  • Converting a pandas GroupBy output from Series to DataFrame: .groupby() returns a GroupBy object, not a DataFrame. Aggregating it on several keys gives a result indexed by a MultiIndex (also known as a hierarchical index) instead of a single index. To get a flat DataFrame, rename the columns and reset the index with my_groupby.add_suffix('_Count').reset_index(), where my_groupby is an aggregated result such as the output of .count(), or call .size().reset_index(). Note that .size() is called on the GroupBy object, not on the usual DataFrame: pandas.core.groupby.GroupBy.size returns a Series with the number of rows in each group.
  • group = ['col_1', 'col_2']; my_df.groupby(group).size().reset_index(name='column_name') --> one row per group, with the group size in 'column_name'
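  • df = df[(df.col_name < 1) & (df.col_name_2 < 1)] --> filter rows on a compound condition; each condition needs its own parentheses
  • df = df.query("col_name != 'my_value'") --> the same kind of filter written as a query string
  • df.col_name.value_counts() --> number of rows for each distinct value in a column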
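  • df.copy() --> returns a copy of the dataframe, so changes to the copy do not affect the original
  • df.head() --> returns the first 5 rows by default
  • df['col_name'].unique() --> returns the unique values of a column
  • pd.unique(df.col_name) --> another way to get the unique values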
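  • df['col_name'].isnull().sum() --> number of missing values in the column
  • df['col_name'].min() --> smallest value in the column
  • df['col_name'].max() --> largest value in the column
  • df.fillna(0) --> returns a new dataframe with every missing value in the table replaced by zero
  • df.reset_index(drop=True, inplace=True) --> renumbers the index from 0 and discards the old index
  • data.drop(['target'], axis=1, inplace=True) --> removes the target column (or any listed column) in place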
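
The one-liners above can be tried together on a small frame. This is only a minimal sketch: the toy data and the column names (city, kind, score, target) are made up for illustration and are not from any real dataset.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Bergen", "Bergen", "Bergen"],
        "kind": ["a", "b", "a", "a", "b"],
        "score": [0.5, None, 0.2, 1.5, 0.9],
        "target": [1, 0, 0, 0, 1],
    }
)

n_rows, n_cols = df.shape                # (row_count, col_count)
n_cities = df["city"].nunique()          # number of distinct values, NaN skipped
n_missing = df["score"].isnull().sum()   # missing values in one column
kind_counts = df["kind"].value_counts()  # rows per distinct value

# Rows per (city, kind) group as a flat DataFrame with a named count column.
group_sizes = df.groupby(["city", "kind"]).size().reset_index(name="row_count")

# Boolean-mask filter: both conditions must hold for a row to be kept.
low = df[(df["score"] < 1) & (df["target"] < 1)]

# A filter written as a query string instead of a boolean mask.
not_oslo = df.query("city != 'Oslo'")

# Work on a copy so the original frame is left untouched.
features = df.copy()
features = features.fillna(0).drop(["target"], axis=1).reset_index(drop=True)
```

Chaining fillna, drop and reset_index without inplace=True returns a new frame at each step, which is one way to keep the original df available for comparison.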

Pandas Profiling EDA