Python - Pandas Cheat Sheet
Artigo completo em português: https://danielpms.com.br/artigos/python-pandas-guia-rapido-exploracao-dados.html
Pandas is a Python library for data analysis and data manipulation with a lot of tools that help us to work with databases like MySQL, data based on text files like csv, json, html or Excel files.
Free, powerful and with a low learning curve it's a great tool for beginners on Python language and an alternative to people that use Excel as an analysis tool.
To start with Pandas or learn the basics you don't need to install any software, just access an online tool like Google Colab.
My intention in this post is to show a small cheat sheet with basics of Pandas as a kind of beginner's guide or a consultation guide.
Let's go
To load Pandas library
import pandas as pd
Importing data from external source
Import text files // Regular csv separated by commas
pd.read_csv('filename')
// Separated by semicolon
pd.read_csv('filename', sep=';', engine='python')
// CSV in latin encode
pd.read_csv('filename', encoding='iso-8859-1')
// Load just some lines from .csv
pd.read_csv('filename', nrows=integer with the number of rows)
// Load file data with custom column names
pd.read_csv('filename', names=['Column #1','Column #2'], engine='python')
// Load data just from some columns (to save memory)
pd.read_csv('filename', usecols=[0,1,2...])
pd.read_csv('filename', usecols=['Column #1','Column #2'])
// Delimited text file tsv
pd.read_table('filename')
// Json string, json file or URL
pd.read_json('json source')
// Parse HTML as string, file or URL and gets table markups
pd.read_html('html source')
Import data from excel
pd.read_excel('filename')
// Read multiple sheets
file = pd.ExcelFile('filename.xlsx')
df = pd.read_excel(file, 'Sheet Name')
Import data from SQL
pd.read_sql(query, connection_configs)
Exporting data
Exports as csv df.to_csv('filename')
// Export with latin encode
df.to_csv('filename', encoding='iso-8859-1')
// Export CSV without internal index
df.to_csv('filename', index=False)
// Export with semicolon delimiter and header
df.to_csv('filename', sep=';', header='True')
Exports as Excel
df.to_excel('filename')
// For a specific sheet
df.to_excel('path_to_excel_file.xlsx', sheet_name='Sheet Name')
// Without index column
df.to_excel('path_to_excel_file.xlsx', sheet_name='Sheet name', index=False)
Exports as Json
df.to_json('filename')
Exports as HTML table
df.to_html('filename')
Writes to a table into a SQL database
df.to_sql(table, connection_configs)
Data structures
Data Frame
Data Frames (two dimensional data structure) looks like a excel sheet.Creating from a external source
data = pd.read_csv('file_1.csv')
df = pd.DataFrame(data)
Manually created
data = {'name': ['Anne', 'John', 'Beth'], 'age': [23, 18, 86]}
df = pd.DataFrame(data, columns=['name','age'])
Series
Creating series (one dimensional data structure) serie = pd.Series([1,2,'a','b'], index=['one', 'two', 'three', 'four'])
// With auto index
autoSerie = pd.Series(['init', 'my', 'serie'])
Combining, Merging or Joining DataFrames
Standard joins pd.merge(df1, df2, how='left|right|inner|outer' on='index_column_name')
where df1 and df2 are DataFrames, how is the join method and on is the common field into df1 and df2 as reference to join data.Internal Pandas index join
pd.merge(df1, df2, left_index=True, right_index=True, how='left|right|inner|outer')
where left_index and right_index are the Pandas internal indexSelect all rows in df1 DataFrame that have a match em df2 Dataframe
df1[df1['column name'].isin(df2['column name'])]
Select all rows in df1 DataFrame that doesn't match df2 DataFrame rows
df1[~df1['column name'].isin(df1['column name'])]
Columns unique values
Like SQL 'distinct'
Filter DataFrame columns using regular expession
df.drop_duplicates(['column name'])
Show just some columns
Filter DataFrame with specific column names df['column name']
or
df[['column name 1', 'column name 2']]
df.filter(regex='regex expression')
DataFrame Sort
Sort 'column 1' in ascending order and 'column 2' in descending order. Dataframe df was modified with inplace=True. df.sort_values(['column 1','column 2'], ascending=[True,False], inplace=True)
Pandas options
Show all columns from a DataFrame
pd.set_option('display.max_columns', None)
df.head()
Comentários
Postar um comentário