Kaggle learn

<h2 id="880a4122-deba-4683-8b5e-8d9adc76b3c4" data-toc-id="880a4122-deba-4683-8b5e-8d9adc76b3c4">Summary Function</h2><ul><li><p>describe() shows basic summary statistics for a dataset.</p></li><li><p>When describe() is applied to an entire DataFrame, it computes summary statistics only for the <u>numeric columns</u> by default.</p></li></ul><table style="min-width: 90px"><colgroup><col style="width: 65px"><col style="min-width: 25px"></colgroup><tbody><tr><td colspan="1" rowspan="1" colwidth="65"><p>Numeric</p></td><td colspan="1" rowspan="1"><p><strong>count</strong>, <strong>mean</strong>, <strong>std</strong>, <strong>min</strong>, <strong>25%</strong>, <strong>50%</strong>, <strong>75%</strong>, <strong>max</strong></p></td></tr><tr><td colspan="1" rowspan="1" colwidth="65" style="box-sizing: border-box; margin: 0px; padding: 0.25rem 0.5rem; border: 0.5px solid rgb(221, 221, 221); font: inherit; vertical-align: baseline;"><p>String</p></td><td colspan="1" rowspan="1" style="box-sizing: border-box; margin: 0px; padding: 0.25rem 0.5rem; border: 0.5px solid rgb(221, 221, 221); font: inherit; vertical-align: baseline;"><p><strong>count</strong>, <strong>unique</strong>, <strong>top</strong>, <strong>freq</strong></p></td></tr></tbody></table><pre spellcheck="false"><code>reviews.points.describe()</code></pre><pre spellcheck="false"><code>count    129971.000000
mean         88.447138
             ...      
75%          91.000000
max         100.000000
Name: points, Length: 8, dtype: float64</code></pre><pre spellcheck="false"><code>reviews.taster_name.describe()
</code></pre><pre spellcheck="false"><code>count         103727
unique            19
top       Roger Voss
freq           25514
Name: taster_name, dtype: object</code></pre><h2 id="4cda0f66-09de-499f-859a-7cfdc0283bcd" data-toc-id="4cda0f66-09de-499f-859a-7cfdc0283bcd">Mapping</h2><blockquote><p>Replacing the values in a column with other values.</p></blockquote><p>There are two ways to do mapping: map() and apply().</p><p>map() is a Series method that visits the values one at a time, so it is well suited to a single column.</p><p>apply() is useful for transforming a whole DataFrame with a user-defined function.</p><h3 id="6207afc5-be2b-489b-9b01-91bdb4ac42e7" data-toc-id="6207afc5-be2b-489b-9b01-91bdb4ac42e7">map()</h3><pre spellcheck="false"><code>a = 3
df.col_name_1.map(lambda x: x - a)</code></pre><h3 id="b2a9b508-1d62-4190-ad80-0e160c640f5f" data-toc-id="b2a9b508-1d62-4190-ad80-0e160c640f5f">apply()</h3><ul><li><p>axis='columns' (or 1): apply the function to each row</p></li><li><p>axis='index' (or 0): apply the function to each column</p></li></ul><pre spellcheck="false"><code>df.apply(lambda x: x + 3, axis='columns')</code></pre><p></p><h2 id="682c31ac-7e64-4349-811a-5b365581d823" data-toc-id="682c31ac-7e64-4349-811a-5b365581d823">+Addition</h2><p>To find the index of the largest value, use <code spellcheck="false">.idxmax()</code>.</p><pre spellcheck="false"><code>max_idx = df.col_name1.idxmax()
df.loc[max_idx, 'col_name2']</code></pre><p></p>
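<p>The pieces above (describe(), map(), apply() with each axis, and idxmax()) can be tried together on a small table. This is only a minimal sketch: the DataFrame and its values are made up for illustration and are not the real wine reviews data, and the helper name points_per_price is hypothetical.</p><pre spellcheck="false"><code>import pandas as pd

# Hypothetical toy data standing in for the reviews table (values are made up)
df = pd.DataFrame({
    "points": [87, 92, 90],
    "price": [15.0, 40.0, 25.0],
    "taster_name": ["Roger Voss", "Taster B", "Roger Voss"],
})

# describe() on the whole DataFrame keeps only the numeric columns by default
numeric_summary = df.describe()

# describe() on a string column reports count, unique, top, freq
name_summary = df["taster_name"].describe()

# Series.map: the function receives one value at a time
mean_points = df["points"].mean()
centered = df["points"].map(lambda p: p - mean_points)

# DataFrame.apply with axis="columns": the function receives one row at a time
def points_per_price(row):
    return row["points"] / row["price"]

ratio = df.apply(points_per_price, axis="columns")

# DataFrame.apply with axis="index": the function receives one column at a time
col_max = df[["points", "price"]].apply(lambda col: col.max(), axis="index")

# idxmax() gives the index label of the largest value,
# which loc can then use to look up another column in the same row
best_idx = ratio.idxmax()
best_taster = df.loc[best_idx, "taster_name"]</code></pre><p>In this toy data the first row has the highest points-per-price ratio (87 / 15), so best_idx is 0. Note that map() and apply() return new objects and leave df unchanged.</p>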

[Pandas] Summary Functions and Maps
