Hands-on guide: 50 frequently used Pandas tips, well worth bookmarking!
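This post collects the pandas methods that come up most often in day-to-day work. It is on the long side, but everything in it is practical, so feel free to like and bookmark it if you find it useful. Roughly, it covers the following:
- A first look at DataFrame
- Reading tabular data
- Selecting specific rows
- Plotting with pandas
- Adding rows and columns to a DataFrame
- Statistics and calculations on a DataFrame
- Sorting a DataFrame
- Merging multiple tables
- Working with time series
- Working with string data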
A first look at DataFrame
Let's start by creating a DataFrame from a Python dictionary:

import pandas as pd

data = {"Country": ["Canada", "USA", "UK"],
        "Population": [10.52*10**6, 350.1*10**6, 65.2*10**6]
       }
df = pd.DataFrame(data)
df
When you create a DataFrame from a Python dictionary, the dictionary's keys become the column names and its values become the data in the table.

Output:

  Country   Population
0  Canada   10520000.0
1     USA  350100000.0
2      UK   65200000.0
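To make the key-to-column mapping concrete, here is a minimal sketch that rebuilds the same table and inspects its labels. The variable names `column_names`, `row_labels`, `n_rows` and `n_cols` are only illustrative.

```python
import pandas as pd

# Same data as in the example above
data = {
    "Country": ["Canada", "USA", "UK"],
    "Population": [10.52 * 10**6, 350.1 * 10**6, 65.2 * 10**6],
}
df = pd.DataFrame(data)

# Dictionary keys become the column labels, in insertion order
column_names = list(df.columns)

# With no explicit index, pandas labels the rows 0, 1, 2, ...
row_labels = list(df.index)

# shape is (number of rows, number of columns)
n_rows, n_cols = df.shape

print(column_names, row_labels, n_rows, n_cols)
```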
To pull out a single column, index the DataFrame with the column name:

df["Country"]

Output:

0    Canada
1       USA
2        UK
Name: Country, dtype: object
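It may help to see that single brackets return a one-dimensional Series, while a list of column names returns a DataFrame. The sketch below uses the same three countries; the names `one_column`, `one_column_table` and `two_columns` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Canada", "USA", "UK"],
    "Population": [10.52e6, 350.1e6, 65.2e6],
})

# A single label returns a Series named after the column
one_column = df["Country"]

# A list of labels returns a DataFrame, even when the list has one entry
one_column_table = df[["Country"]]
two_columns = df[["Country", "Population"]]

print(type(one_column).__name__)        # Series
print(type(one_column_table).__name__)  # DataFrame
```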
To see the data type of every column in the table, use the dtypes attribute:

df.dtypes

Output:

Country        object
Population    float64
dtype: object
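Once you know a column's dtype, you can convert it with `astype`. This is a small sketch rather than part of the original walkthrough: it writes the populations as whole-number float literals (an assumption made here so the conversion to integers is exact) and converts the column to `int64`.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Canada", "USA", "UK"],
    # Whole-number floats, so converting to integers loses nothing
    "Population": [10_520_000.0, 350_100_000.0, 65_200_000.0],
})

dtype_before = df["Population"].dtype

# astype returns a new Series; assign it back to replace the column
df["Population"] = df["Population"].astype("int64")

dtype_after = df["Population"].dtype

print(dtype_before, "->", dtype_after)
```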
Reading data
Pandas ships with dedicated functions for reading data. If the file is in CSV format, use read_csv:

import pandas as pd

df = pd.read_csv("titanic.csv")
To look at the first few rows of the table, use head():

df.head(7)

Output:

   PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
1            2         1       1  ...  71.2833   C85         C
2            3         1       3  ...   7.9250   NaN         S
3            4         1       1  ...  53.1000  C123         S
4            5         0       3  ...   8.0500   NaN         S
5            6         0       3  ...   8.4583   NaN         Q
6            7         0       1  ...  51.8625   E46         S
Here we only showed the first 7 rows. Likewise, the tail() method shows the last few rows:

df.tail(7)

Output:

     PassengerId  Survived  Pclass  ...    Fare Cabin  Embarked
884          885         0       3  ...   7.050   NaN         S
885          886         0       3  ...  29.125   NaN         Q
886          887         0       2  ...  13.000   NaN         S
887          888         1       1  ...  30.000   B42         S
888          889         0       3  ...  23.450   NaN         S
889          890         1       1  ...  30.000  C148         C
890          891         0       3  ...   7.750   NaN         Q
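If you do not have titanic.csv at hand, the same calls can be tried on an in-memory file. The sketch below is an illustration only: it feeds `read_csv` a small CSV string through `io.StringIO`, using four of the columns and the seven rows shown in the head() output above.

```python
import io

import pandas as pd

# A tiny stand-in for titanic.csv, built from the rows shown above
csv_text = """PassengerId,Survived,Pclass,Fare
1,0,3,7.25
2,1,1,71.2833
3,1,3,7.925
4,1,1,53.1
5,0,3,8.05
6,0,3,8.4583
7,0,1,51.8625
"""

# read_csv accepts any file-like object, not only a path
df = pd.read_csv(io.StringIO(csv_text))

first_two = df.head(2)    # first 2 rows
last_two = df.tail(2)     # last 2 rows
default_head = df.head()  # with no argument, head() returns 5 rows

print(first_two)
print(last_two)
```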
If the file is in Excel format, pandas has a matching function:

df = pd.read_excel("titanic.xlsx")

The info() method gives a quick first impression of the table:

df.info()

Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  891 non-null    int64
 1   Survived     891 non-null    int64
 2   Pclass       891 non-null    int64
 3   Name         891 non-null    object
 4   Sex          891 non-null    object
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64
 7   Parch        891 non-null    int64
 8   Ticket       891 non-null    object
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object
 11  Embarked     889 non-null    object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
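From this summary we can see which columns contain missing values (here Age, Cabin and Embarked have fewer than 891 non-null entries), the data type of each column, and how much memory the table takes up.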
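The non-null counts reported by info() can also be computed directly. A minimal sketch, using only the Fare and Cabin values from the first five rows shown earlier (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
    "Cabin": [None, "C85", None, "C123", None],
})

# isna() marks missing cells as True; summing counts them per column
missing_per_column = df.isna().sum()

# notna() is the complement and matches the "Non-Null Count" in info()
non_null_per_column = df.notna().sum()

print(missing_per_column)
print(non_null_per_column)
```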
Selecting rows that meet a condition
To select the passengers who are older than 30, we can do the following:

df[df["Age"] > 30]

Output:

     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
1              2         1       1  ...  71.2833   C85         C
3              4         1       1  ...  53.1000  C123         S
4              5         0       3  ...   8.0500   NaN         S
6              7         0       1  ...  51.8625   E46         S
11            12         1       1  ...  26.5500  C103         S
..           ...       ...     ...  ...      ...   ...       ...
873          874         0       3  ...   9.0000   NaN         S
879          880         1       1  ...  83.1583   C50         C
881          882         0       3  ...   7.8958   NaN         S
885          886         0       3  ...  29.1250   NaN         Q
890          891         0       3  ...   7.7500   NaN         Q

[305 rows x 12 columns]
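What happens inside the brackets is worth spelling out: the comparison produces a boolean Series (a mask), and indexing with it keeps only the rows where the mask is True. The sketch below uses a five-row table with made-up ages, including one missing value, to show that rows with a missing Age are dropped by the comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Age": [22.0, 38.0, None, 35.0, 29.0],  # illustrative values, not the real data
})

# The comparison runs row by row; a missing Age compares as False
mask = df["Age"] > 30

# Indexing with the mask keeps the rows where it is True
older = df[mask]

# .loc with the same mask selects the same rows
older_loc = df.loc[mask]

print(mask.tolist())
print(older)
```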
We can also combine several conditions and filter on all of them at once, for example:

survived_under_45 = df[(df["Survived"]==1) 