|’,’,| Pandas |’,’,| in Python….

Lalitha
Published in Analytics Vidhya
4 min read · Dec 9, 2020


Pandas is a Python library used for data analysis, manipulation, and cleaning.

Series and data frames are the two data structures in pandas.

Series: a Series is a one-dimensional labelled array that can store any type of data.
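
To make this concrete, here is a minimal sketch of building a Series. The values, labels, and variable names are made up for illustration and are not from the original article.

```python
import pandas as pd

# A Series is a one-dimensional labelled array; these values are illustrative.
temperatures = pd.Series([36, 38, 32], index=["mon", "tue", "wed"])

print(temperatures["tue"])   # look up a value by its label
print(temperatures.dtype)    # the dtype pandas inferred for the values

# A Series holding mixed Python objects falls back to the generic object dtype.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```

If no index is passed, pandas labels the values with integer positions starting at 0.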

What is a DataFrame?

A DataFrame is a two-dimensional, mutable data structure that holds heterogeneous tabular data. It represents data in tabular form (rows and columns), can hold columns of different types, and provides labelled access to rows and columns.

Pandas is one of the most popular Python libraries for data analysis, and the DataFrame is its primary data structure.

Photo by Debbie Molle on Unsplash

Difference between a Series and a DataFrame -

  • A Series is a pandas data structure that holds a single list of values with an index.
  • A DataFrame is a collection of one or more Series that share an index and are used together to analyse data.
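
The relationship between the two structures can be sketched as follows. The column names and values are assumptions chosen to match the weather theme of this article.

```python
import pandas as pd

# Illustrative data; not taken from a real dataset.
df = pd.DataFrame({
    "temperature": [36, 38, 32],
    "event": ["Rain", "Rain", "Sunny"],
})

# Selecting one column by name returns a Series.
column = df["temperature"]
print(type(column).__name__)

# Selecting with a list of names returns a DataFrame, even for one column.
print(type(df[["temperature"]]).__name__)

# Series that share an index can be placed side by side to form a DataFrame.
rebuilt = pd.concat([df["temperature"], df["event"]], axis=1)
print(rebuilt.columns.tolist())
```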

Some of the most common uses of pandas are stock prediction, statistics, and similar data-driven tasks.

One example that helps illustrate this is an API for scraping weather data.

Photo by Hannah Domsic on Unsplash

If the data is stored in a .csv or Excel file, an API can be used to send the latest weather data, and the required data is then reformatted according to the user's request.

Syntax of a dataframe -

class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

Parameters -

data: an ndarray (structured or homogeneous), Iterable, dict, or DataFrame.

A dict can contain Series, arrays, or list-like objects.

index: Index or array-like

Row labels to use for the resulting frame. Defaults to RangeIndex if no index is provided.

columns: Index or array-like

Column labels to use for the resulting frame. Defaults to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

dtype: the data type to force. Only a single dtype is allowed.

copy: a boolean value, default False
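
As a rough illustration of how these parameters fit together, the sketch below passes data, index, columns, and dtype explicitly. The array contents and labels are invented for the example.

```python
import numpy as np
import pandas as pd

# Illustrative 3x2 array: one row per day, one column per measurement.
values = np.array([[36, 6], [38, 7], [32, 3]])

df = pd.DataFrame(
    data=values,
    index=["10/10/2020", "10/11/2020", "10/12/2020"],  # row labels
    columns=["temperature", "wind"],                   # column labels
    dtype="float64",                                   # one dtype forced on all columns
)
print(df)

# Without index/columns, both default to a RangeIndex starting at 0.
default_labels = pd.DataFrame(values)
print(default_labels.index)
print(default_labels.columns.tolist())
```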

Each column of a pandas DataFrame has a single data type.
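
The per-column types can be inspected with the dtypes attribute. The following is a small sketch with made-up values; the exact dtype reported for text columns may vary between pandas versions.

```python
import pandas as pd

# Illustrative data with a text, an integer, and a float column.
df = pd.DataFrame({
    "date": ["10/10/2020", "10/11/2020"],
    "temperature": [36, 38],
    "wind": [6.5, 7.0],
})

# dtypes lists one data type per column.
print(df.dtypes)

# A single column can be converted without touching the others.
df["temperature"] = df["temperature"].astype("float64")
print(df.dtypes)
```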

Different ways of creating DataFrames in pandas -

  • Using CSV files
  • Using Excel files
  • From a dictionary
  • From a list of dictionaries
  • From a list of tuples
  • From a dictionary of ndarrays / lists
  • Using the read_html method

(1) Using CSV file -

Comma-separated values (CSV) files are one of the most common file formats for storing data.

A CSV file can be read into a DataFrame with the read_csv() function.

Syntax -

pandas.read_csv(filepath_or_buffer)

Example -

import pandas as pd
weather = pd.read_csv("../input/mount-rainier-weather-and-climbing data/Rainier_Weather.csv")
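
Since the file above may not be available locally, here is a self-contained sketch that reads CSV text from an in-memory buffer instead of a path. The CSV contents are invented for illustration; read_csv accepts a file-like object as well as a file path.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """date,temperature,wind,event
10/10/2020,36,6,Rain
10/11/2020,38,7,Rain
10/12/2020,32,3,Sunny
"""

# The first line is used as the header row by default.
weather = pd.read_csv(io.StringIO(csv_text))

print(weather.head())   # first rows of the frame
print(weather.shape)    # (number of rows, number of columns)
```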


(2) Using Excel file -

The read_excel() function reads an Excel file into a pandas DataFrame.

The supported file extensions are xls, xlsx, xlsm, xlsb, odf, ods and odt. The function can read a single sheet or a list of sheets.

Syntax -

pandas.read_excel(io, sheet_name=0)

Example -

import pandas as pd
weather = pd.read_excel("weather.xls")

(3) From a dictionary -

Syntax -

classmethod DataFrame.from_dict(data, orient='columns', dtype=None, columns=None)

Example -

weather_data = {
    'date': ['10/10/2020', '10/11/2020', '10/12/2020', '10/13/2020', '10/14/2020'],
    'temperature': [36, 38, 32, 33, 36],
    'wind': [6, 7, 3, 4, 5],
    'event': ['Rain', 'Rain', 'Sunny', 'Rain', 'Sunny'],
    'sunrise': ['7:27AM', '7:28AM', '7:26AM', '7:22AM', '7:20AM'],
    'sunset': ['6:27PM', '6:28PM', '6:26PM', '6:22PM', '6:20PM']
}
pd.DataFrame.from_dict(weather_data)
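
The orient parameter controls whether the dictionary keys become columns (the default) or row labels. Here is a minimal sketch of the orient='index' case, with invented values keyed by date.

```python
import pandas as pd

# Illustrative data: each key is a row label, each list is one row.
by_date = {
    "10/10/2020": [36, 6, "Rain"],
    "10/11/2020": [38, 7, "Rain"],
    "10/12/2020": [32, 3, "Sunny"],
}

# With orient="index" the keys become the index, and columns names the fields.
df = pd.DataFrame.from_dict(
    by_date,
    orient="index",
    columns=["temperature", "wind", "event"],
)
print(df)
```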


(4) From List of dictionaries -

Example -

weather_data = [
    {'date': '10/14/2020', 'temperature': 35, 'wind': 5, 'event': 'Sunny', 'sunrise': '7:20AM', 'sunset': '6:20PM'},
    {'date': '10/13/2020', 'temperature': 38, 'wind': 4, 'event': 'Rain', 'sunrise': '7:22AM', 'sunset': '6:22PM'},
    {'date': '10/12/2020', 'temperature': 32, 'wind': 3, 'event': 'Sunny', 'sunrise': '7:26AM', 'sunset': '6:26PM'},
    {'date': '10/11/2020', 'temperature': 33, 'wind': 7, 'event': 'Rain', 'sunrise': '7:28AM', 'sunset': '6:28PM'},
    {'date': '10/10/2020', 'temperature': 36, 'wind': 6, 'event': 'Rain', 'sunrise': '7:27AM', 'sunset': '6:27PM'}
]
df = pd.DataFrame(weather_data)
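
One property of this approach worth knowing is that the dictionaries do not need identical keys. The sketch below, using invented records, shows that keys missing from a record are filled with NaN.

```python
import pandas as pd

# Illustrative records with deliberately uneven keys.
records = [
    {"date": "10/10/2020", "temperature": 36, "event": "Rain"},
    {"date": "10/11/2020", "temperature": 38},             # no "event" key
    {"date": "10/12/2020", "temperature": 32, "wind": 3},  # extra "wind" key
]

# The columns are the union of all keys; gaps become missing values.
df = pd.DataFrame(records)
print(df)

# Count the missing values in each column.
print(df.isna().sum())
```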


(5) From a dictionary of ndarrays / lists -

Example -

weather_data = {
    'date': ['10/14/2020', '10/13/2020', '10/12/2020', '10/11/2020', '10/10/2020'],
    'temperature': [35, 38, 32, 33, 36],
    'wind': [5, 4, 3, 7, 6],
    'event': ['Sunny', 'Rain', 'Sunny', 'Rain', 'Rain'],
    'sunrise': ['7:20AM', '7:22AM', '7:26AM', '7:28AM', '7:27AM'],
    'sunset': ['6:20PM', '6:22PM', '6:26PM', '6:28PM', '6:27PM']
}
df = pd.DataFrame(weather_data)
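
The example above uses plain lists. The same idea with NumPy arrays might look like the sketch below; the values and the date index are assumptions for illustration. All arrays must have the same length, otherwise pandas raises a ValueError.

```python
import numpy as np
import pandas as pd

# Illustrative arrays of equal length, one per column.
weather_arrays = {
    "temperature": np.array([35, 38, 32, 33, 36]),
    "wind": np.array([5, 4, 3, 7, 6]),
}
dates = ["10/14/2020", "10/13/2020", "10/12/2020", "10/11/2020", "10/10/2020"]

df = pd.DataFrame(weather_arrays, index=dates)
print(df)

# Arrays of different lengths cannot be aligned into columns.
try:
    pd.DataFrame({"a": np.array([1, 2]), "b": np.array([1, 2, 3])})
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```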

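
The list of creation methods earlier in this article also mentions a list of tuples, which is not demonstrated in the sections above. A minimal sketch, assuming each tuple is one row and the column names are supplied separately, could look like this (the data is invented):

```python
import pandas as pd

# Illustrative rows: (date, temperature, wind, event).
rows = [
    ("10/14/2020", 35, 5, "Sunny"),
    ("10/13/2020", 38, 4, "Rain"),
    ("10/12/2020", 32, 3, "Sunny"),
]

# Tuples carry no field names, so the columns are passed explicitly.
df = pd.DataFrame(rows, columns=["date", "temperature", "wind", "event"])
print(df)
```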

Pandas Operations -

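Some of the operations that can be performed are string processing, applying functions to data, and histogramming.

df.mean(numeric_only=True): computes the mean of each numeric column

df.apply(func): applies the function func to each column (or to each row with axis=1)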
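
Here is a small sketch of these operations on invented weather data. Using value_counts() for histogramming is one common choice and is an assumption about what the text has in mind.

```python
import pandas as pd

# Illustrative data with two numeric columns and one text column.
df = pd.DataFrame({
    "temperature": [35, 38, 32, 33, 36],
    "wind": [5, 4, 3, 7, 6],
    "event": ["Sunny", "Rain", "Sunny", "Rain", "Rain"],
})

# Mean of each numeric column; numeric_only=True skips the text column.
means = df.mean(numeric_only=True)
print(means)

# apply() calls the function once per column; here, the range of each column.
ranges = df[["temperature", "wind"]].apply(lambda col: col.max() - col.min())
print(ranges)

# Histogramming: count how often each distinct value occurs.
counts = df["event"].value_counts()
print(counts)
```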

String operations -

Example -

a = pd.Series(['abc', 'xyz', 'zzz'])

a.str.upper(): converts every string in the Series to uppercase

a.str.lower(): converts every string in the Series to lowercase
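
A runnable sketch of the .str accessor follows. The DataFrame at the end is a hypothetical example showing that the same accessor is available on column labels, which is how column names (rather than values) can be changed.

```python
import pandas as pd

a = pd.Series(["abc", "xyz", "zzz"])

# .str methods operate on each string value in the Series.
upper = a.str.upper()
lower = upper.str.lower()
has_z = a.str.contains("z")
print(upper.tolist())
print(has_z.tolist())

# To change column names instead, use the accessor on df.columns.
df = pd.DataFrame({"Temperature": [36], "Wind": [6]})
df.columns = df.columns.str.lower()
print(df.columns.tolist())
```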

Merging dataframes -

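Two DataFrames can be combined using:

  • concat()
  • join()
  • merge()

groupby() is often used alongside these, but it does not combine DataFrames; it splits a single DataFrame into groups so that each group can be aggregated.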
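
As a rough illustration of how these functions differ, the sketch below uses two small invented frames that share a date column. Note that groupby() aggregates within one frame rather than merging two.

```python
import pandas as pd

# Illustrative frames; the dates overlap only partly.
temps = pd.DataFrame({
    "date": ["10/10/2020", "10/11/2020", "10/12/2020"],
    "temperature": [36, 38, 32],
})
winds = pd.DataFrame({
    "date": ["10/10/2020", "10/11/2020", "10/13/2020"],
    "wind": [6, 7, 4],
})

# concat(): stack frames on top of each other (or side by side with axis=1).
stacked = pd.concat([temps, temps], ignore_index=True)

# merge(): SQL-style join on a column; "inner" keeps only dates present in both.
merged = temps.merge(winds, on="date", how="inner")

# join(): combine on the index; "left" keeps every row of the left frame.
joined = temps.set_index("date").join(winds.set_index("date"), how="left")

# groupby(): split one frame into groups and aggregate each group.
events = pd.DataFrame({
    "event": ["Rain", "Rain", "Sunny"],
    "temperature": [36, 38, 32],
})
by_event = events.groupby("event")["temperature"].mean()

print(stacked.shape, merged.shape, joined.shape)
print(by_event)
```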

Thanks for reading…
