EDA) Visualizing crime statistics in Seoul
EDA 2023. 3. 10. 22:17
02. Analysis Seoul Crime¶
1. Project overview¶
2. Data overview¶
In [234]:import numpy as np
import pandas as pd
import openpyxl
In [235]:# Read the data
# thousands=",": parse values such as "1,234" as numbers rather than strings
crime_raw_data = pd.read_csv("../data/02. crime_in_Seoul.csv", thousands=",", encoding="euc-kr")
crime_raw_data.head()
Out[235]:
   구분  죄종  발생검거  건수
0  중부  살인  발생      2.0
1  중부  살인  검거      2.0
2  중부  강도  발생      3.0
3  중부  강도  검거      3.0
4  중부  강간  발생      141.0
In [236]:crime_raw_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65534 entries, 0 to 65533
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   구분      310 non-null    object
 1   죄종      310 non-null    object
 2   발생검거  310 non-null    object
 3   건수      310 non-null    float64
dtypes: float64(1), object(3)
memory usage: 2.0+ MB
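- info(): shows an overview of the data
- The RangeIndex reports 65,534 entries, but every column has only 310 non-null values
- Columns: 구분 (police station), 죄종 (crime type), 발생검거 (occurred / arrested), 건수 (count)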
In [237]:crime_raw_data["죄종"].unique()
Out[237]:array(['살인', '강도', '강간', '절도', '폭력', nan], dtype=object)
- Checking the unique values of a single column confirms that it contains nan
In [238]:crime_raw_data[crime_raw_data["죄종"].isnull()]
Out[238]:
       구분  죄종  발생검거  건수
310    NaN   NaN   NaN       NaN
311    NaN   NaN   NaN       NaN
312    NaN   NaN   NaN       NaN
313    NaN   NaN   NaN       NaN
314    NaN   NaN   NaN       NaN
...    ...   ...   ...       ...
65529  NaN   NaN   NaN       NaN
65530  NaN   NaN   NaN       NaN
65531  NaN   NaN   NaN       NaN
65532  NaN   NaN   NaN       NaN
65533  NaN   NaN   NaN       NaN
65224 rows × 4 columns
In [239]:crime_raw_data = crime_raw_data[crime_raw_data["죄종"].notnull()]
In [240]:crime_raw_data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 310 entries, 0 to 309
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   구분      310 non-null    object
 1   죄종      310 non-null    object
 2   발생검거  310 non-null    object
 3   건수      310 non-null    float64
dtypes: float64(1), object(3)
memory usage: 12.1+ KB
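The filter above keeps only the rows whose 죄종 value is present. As a minimal sketch, the same cleanup can be tried on a tiny synthetic frame that mimics the shape of the file (the frame `raw` and its four rows are assumptions for illustration). It also shows `dropna(how="all")`, an alternative that removes a row only when every column is missing:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the raw file: two real rows followed by two empty rows
raw = pd.DataFrame(
    {
        "구분": ["중부", "중부", np.nan, np.nan],
        "죄종": ["살인", "살인", np.nan, np.nan],
        "발생검거": ["발생", "검거", np.nan, np.nan],
        "건수": [2.0, 2.0, np.nan, np.nan],
    }
)

# Approach used in this notebook: keep rows whose 죄종 is not null
by_mask = raw[raw["죄종"].notnull()]

# Alternative: drop a row only when all of its columns are NaN
by_dropna = raw.dropna(how="all")

print(len(raw), len(by_mask), len(by_dropna))
```

Both approaches give the same result here because the empty rows are empty in every column; they would differ if a row had a missing 죄종 but valid values elsewhere.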
Pandas pivot table¶
- index, columns, values, aggfunc
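As a rough sketch of how these four arguments work together, the snippet below pivots a small made-up table (the frame `sales` and its values are illustrative and are not taken from sales-funnel.xlsx). It passes the string "sum" as aggfunc, which behaves like np.sum here and avoids a deprecation warning that newer pandas versions may emit for NumPy callables:

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "Manager": ["A", "A", "B", "B"],
        "Product": ["CPU", "Software", "CPU", "CPU"],
        "Price": [100, 20, 50, 70],
    }
)

# index   -> becomes the row labels
# columns -> becomes the column labels
# values  -> the column that gets aggregated
# aggfunc -> how values sharing the same (index, columns) pair are combined
table = sales.pivot_table(
    index="Manager",
    columns="Product",
    values="Price",
    aggfunc="sum",
    fill_value=0,  # combinations with no data show 0 instead of NaN
)
print(table)
```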
In [241]:df = pd.read_excel("../data/02. sales-funnel.xlsx") df.head()
Out[241]:
   Account  Name                          Rep            Manager       Product      Quantity  Price  Status
0  714466   Trantow-Barrows               Craig Booker   Debra Henley  CPU          1         30000  presented
1  714466   Trantow-Barrows               Craig Booker   Debra Henley  Software     1         10000  presented
2  714466   Trantow-Barrows               Craig Booker   Debra Henley  Maintenance  2         5000   pending
3  737550   Fritsch, Russel and Anderson  Craig Booker   Debra Henley  CPU          1         35000  declined
4  146832   Kiehn-Spinka                  Daniel Hilton  Debra Henley  CPU          2         65000  won
In [242]:# Use the Name column as the index (the two calls below are equivalent)
pd.pivot_table(df, index="Name")
df.pivot_table(index="Name")
C:\Users\admin\AppData\Local\Temp\ipykernel_9092\964206776.py:2: FutureWarning: pivot_table dropped a column because it failed to aggregate. This behavior is deprecated and will raise in a future version of pandas. Select only the columns that can be aggregated. pd.pivot_table(df, index="Name") C:\Users\admin\AppData\Local\Temp\ipykernel_9092\964206776.py:3: FutureWarning: pivot_table dropped a column because it failed to aggregate. This behavior is deprecated and will raise in a future version of pandas. Select only the columns that can be aggregated. df.pivot_table(index="Name")
Out[242]:
Name                          Account  Price   Quantity
Barton LLC                    740150   35000   1.000000
Fritsch, Russel and Anderson  737550   35000   1.000000
Herman LLC                    141962   65000   2.000000
Jerde-Hilpert                 412290   5000    2.000000
Kassulke, Ondricka and Metz   307599   7000    3.000000
Keeling LLC                   688981   100000  5.000000
Kiehn-Spinka                  146832   65000   2.000000
Koepp Ltd                     729833   35000   2.000000
Kulas Inc                     218895   25000   1.500000
Purdy-Kunde                   163416   30000   1.000000
Stokes LLC                    239344   7500    1.000000
Trantow-Barrows               714466   15000   1.333333
In [243]:# Use a multi-level index
df.pivot_table(index=["Name","Rep","Manager"])
C:\Users\admin\AppData\Local\Temp\ipykernel_9092\3928218144.py:2: FutureWarning: pivot_table dropped a column because it failed to aggregate. This behavior is deprecated and will raise in a future version of pandas. Select only the columns that can be aggregated. df.pivot_table(index=["Name","Rep","Manager"])
Out[243]:
Name                          Rep            Manager        Account  Price   Quantity
Barton LLC                    John Smith     Debra Henley   740150   35000   1.000000
Fritsch, Russel and Anderson  Craig Booker   Debra Henley   737550   35000   1.000000
Herman LLC                    Cedric Moss    Fred Anderson  141962   65000   2.000000
Jerde-Hilpert                 John Smith     Debra Henley   412290   5000    2.000000
Kassulke, Ondricka and Metz   Wendy Yule     Fred Anderson  307599   7000    3.000000
Keeling LLC                   Wendy Yule     Fred Anderson  688981   100000  5.000000
Kiehn-Spinka                  Daniel Hilton  Debra Henley   146832   65000   2.000000
Koepp Ltd                     Wendy Yule     Fred Anderson  729833   35000   2.000000
Kulas Inc                     Daniel Hilton  Debra Henley   218895   25000   1.500000
Purdy-Kunde                   Cedric Moss    Fred Anderson  163416   30000   1.000000
Stokes LLC                    Cedric Moss    Fred Anderson  239344   7500    1.000000
Trantow-Barrows               Craig Booker   Debra Henley   714466   15000   1.333333
Setting values¶
In [244]:df.head()
Out[244]:
   Account  Name                          Rep            Manager       Product      Quantity  Price  Status
0  714466   Trantow-Barrows               Craig Booker   Debra Henley  CPU          1         30000  presented
1  714466   Trantow-Barrows               Craig Booker   Debra Henley  Software     1         10000  presented
2  714466   Trantow-Barrows               Craig Booker   Debra Henley  Maintenance  2         5000   pending
3  737550   Fritsch, Russel and Anderson  Craig Booker   Debra Henley  CPU          1         35000  declined
4  146832   Kiehn-Spinka                  Daniel Hilton  Debra Henley  CPU          2         65000  won
In [245]:df.pivot_table(index=["Manager","Rep"], values="Price")
Out[245]:
Manager        Rep            Price
Debra Henley   Craig Booker   20000.000000
               Daniel Hilton  38333.333333
               John Smith     20000.000000
Fred Anderson  Cedric Moss    27500.000000
               Wendy Yule     44250.000000
In [246]:# Apply sum to the Price column
df.pivot_table(index=["Manager","Rep"], values="Price", aggfunc=np.sum)
Out[246]:
Manager        Rep            Price
Debra Henley   Craig Booker   80000
               Daniel Hilton  115000
               John Smith     40000
Fred Anderson  Cedric Moss    110000
               Wendy Yule     177000
In [247]:df.pivot_table(index=["Manager","Rep"], values="Price", aggfunc=[np.sum, len])
Out[247]:
                              sum     len
Manager        Rep            Price   Price
Debra Henley   Craig Booker   80000   4
               Daniel Hilton  115000  3
               John Smith     40000   2
Fred Anderson  Cedric Moss    110000  4
               Wendy Yule     177000  4
Setting columns¶
In [248]:df.head()
Out[248]:
   Account  Name                          Rep            Manager       Product      Quantity  Price  Status
0  714466   Trantow-Barrows               Craig Booker   Debra Henley  CPU          1         30000  presented
1  714466   Trantow-Barrows               Craig Booker   Debra Henley  Software     1         10000  presented
2  714466   Trantow-Barrows               Craig Booker   Debra Henley  Maintenance  2         5000   pending
3  737550   Fritsch, Russel and Anderson  Craig Booker   Debra Henley  CPU          1         35000  declined
4  146832   Kiehn-Spinka                  Daniel Hilton  Debra Henley  CPU          2         65000  won
In [249]:# Use Product as the columns
df.pivot_table(index=["Manager","Rep"], values="Price", columns="Product", aggfunc=[np.sum, len])
Out[249]:
                              sum                                       len
Product                       CPU       Maintenance  Monitor  Software  CPU  Maintenance  Monitor  Software
Manager        Rep
Debra Henley   Craig Booker   65000.0   5000.0       NaN      10000.0   2.0  1.0          NaN      1.0
               Daniel Hilton  105000.0  NaN          NaN      10000.0   2.0  NaN          NaN      1.0
               John Smith     35000.0   5000.0       NaN      NaN       1.0  1.0          NaN      NaN
Fred Anderson  Cedric Moss    95000.0   5000.0       NaN      10000.0   2.0  1.0          NaN      1.0
               Wendy Yule     165000.0  7000.0       5000.0   NaN       2.0  1.0          1.0      NaN
In [250]:# Replace NaN values with fill_value
df.pivot_table(index=["Manager","Rep"], values="Price", columns="Product", aggfunc=[np.sum, len], fill_value=0)
Out[250]:
                              sum                                     len
Product                       CPU     Maintenance  Monitor  Software  CPU  Maintenance  Monitor  Software
Manager        Rep
Debra Henley   Craig Booker   65000   5000         0        10000     2    1            0        1
               Daniel Hilton  105000  0            0        10000     2    0            0        1
               John Smith     35000   5000         0        0         1    1            0        0
Fred Anderson  Cedric Moss    95000   5000         0        10000     2    1            0        1
               Wendy Yule     165000  7000         5000     0         2    1            1        0
In [251]:# Use two or more index and values columns
df.pivot_table(index=["Manager","Rep","Product"], values=["Price","Quantity"], aggfunc=np.sum, fill_value=0)
Out[251]:
Manager        Rep            Product      Price   Quantity
Debra Henley   Craig Booker   CPU          65000   2
                              Maintenance  5000    2
                              Software     10000   1
               Daniel Hilton  CPU          105000  4
                              Software     10000   1
               John Smith     CPU          35000   1
                              Maintenance  5000    2
Fred Anderson  Cedric Moss    CPU          95000   3
                              Maintenance  5000    1
                              Software     10000   1
               Wendy Yule     CPU          165000  7
                              Maintenance  7000    3
                              Monitor      5000    2
In [252]:# Use two or more aggregation functions
df.pivot_table(
    index=["Manager","Rep"],
    values=["Price","Quantity"],
    columns="Product",
    aggfunc=[np.sum, np.mean],
    fill_value=0,
    margins=True)  # add the grand totals (All)
Out[252]:sum mean Price Quantity Price Quantity Product CPU Maintenance Monitor Software All CPU Maintenance Monitor Software All CPU Maintenance Monitor Software All CPU Maintenance Monitor Software All Manager Rep Debra Henley Craig Booker 65000 5000 0 10000 80000 2 2 0 1 5 32500.000000 5000 0 10000 20000.000000 1.000000 2 0 1 1.250000 Daniel Hilton 105000 0 0 10000 115000 4 0 0 1 5 52500.000000 0 0 10000 38333.333333 2.000000 0 0 1 1.666667 John Smith 35000 5000 0 0 40000 1 2 0 0 3 35000.000000 5000 0 0 20000.000000 1.000000 2 0 0 1.500000 Fred Anderson Cedric Moss 95000 5000 0 10000 110000 3 1 0 1 5 47500.000000 5000 0 10000 27500.000000 1.500000 1 0 1 1.250000 Wendy Yule 165000 7000 5000 0 177000 7 3 2 0 12 82500.000000 7000 5000 0 44250.000000 3.500000 3 2 0 3.000000 All 465000 22000 5000 30000 522000 17 8 2 3 30 51666.666667 5500 5000 10000 30705.882353 1.888889 2 2 1 1.764706
3. Organizing the Seoul crime data¶
In [253]:crime_raw_data.head()
Out[253]:
   구분  죄종  발생검거  건수
0  중부  살인  발생      2.0
1  중부  살인  검거      2.0
2  중부  강도  발생      3.0
3  중부  강도  검거      3.0
4  중부  강간  발생      141.0
In [254]:crime_station = crime_raw_data.pivot_table(index="구분", columns=["죄종","발생검거"], aggfunc=[np.sum])
crime_station.head()
Out[254]:
          sum
          건수
죄종      강간            강도          살인        절도              폭력
발생검거  검거   발생    검거  발생   검거  발생  검거    발생    검거    발생
구분
강남      269.0  339.0  26.0  24.0  3.0   3.0  1129.0  2438.0  2096.0  2336.0
강동      152.0  160.0  13.0  14.0  5.0   4.0  902.0   1754.0  2201.0  2530.0
강북      159.0  217.0  4.0   5.0   6.0   7.0  672.0   1222.0  2482.0  2778.0
강서      239.0  275.0  10.0  10.0  10.0  9.0  1070.0  1952.0  2768.0  3204.0
관악      264.0  322.0  10.0  12.0  7.0   6.0  937.0   2103.0  2707.0  3235.0
In [255]:crime_station.columns  # MultiIndex
Out[255]:MultiIndex([('sum', '건수', '강간', '검거'),
            ('sum', '건수', '강간', '발생'),
            ('sum', '건수', '강도', '검거'),
            ('sum', '건수', '강도', '발생'),
            ('sum', '건수', '살인', '검거'),
            ('sum', '건수', '살인', '발생'),
            ('sum', '건수', '절도', '검거'),
            ('sum', '건수', '절도', '발생'),
            ('sum', '건수', '폭력', '검거'),
            ('sum', '건수', '폭력', '발생')],
           names=[None, None, '죄종', '발생검거'])
In [256]:crime_station["sum","건수","강도","검거"][:5]
Out[256]:구분 강남 26.0 강동 13.0 강북 4.0 강서 10.0 관악 10.0 Name: (sum, 건수, 강도, 검거), dtype: float64
In [257]:crime_station.columns = crime_station.columns.droplevel([0,1])  # drop the first two levels ('sum', '건수') from the multi-level columns
crime_station.columns
Out[257]:MultiIndex([('강간', '검거'), ('강간', '발생'), ('강도', '검거'), ('강도', '발생'), ('살인', '검거'), ('살인', '발생'), ('절도', '검거'), ('절도', '발생'), ('폭력', '검거'), ('폭력', '발생')], names=['죄종', '발생검거'])In [258]:crime_station.head()
Out[258]:
죄종      강간            강도          살인        절도              폭력
발생검거  검거   발생    검거  발생   검거  발생  검거    발생    검거    발생
구분
강남      269.0  339.0  26.0  24.0  3.0   3.0  1129.0  2438.0  2096.0  2336.0
강동      152.0  160.0  13.0  14.0  5.0   4.0  902.0   1754.0  2201.0  2530.0
강북      159.0  217.0  4.0   5.0   6.0   7.0  672.0   1222.0  2482.0  2778.0
강서      239.0  275.0  10.0  10.0  10.0  9.0  1070.0  1952.0  2768.0  3204.0
관악      264.0  322.0  10.0  12.0  7.0   6.0  937.0   2103.0  2707.0  3235.0
4. Installing Python modules¶
pip commands¶
- Python's official package manager
- pip list
- pip install module_name
- pip uninstall module_name
In [259]:!pip list
Package Version -------------------- ----------- anyio 3.5.0 argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 asttokens 2.0.5 attrs 22.1.0 Babel 2.11.0 backcall 0.2.0 beautifulsoup4 4.11.1 bleach 4.1.0 Bottleneck 1.3.5 branca 0.6.0 brotlipy 0.7.0 certifi 2022.12.7 cffi 1.15.1 charset-normalizer 2.0.4 colorama 0.4.6 comm 0.1.2 contourpy 1.0.5 cryptography 38.0.4 cycler 0.11.0 debugpy 1.5.1 decorator 5.1.1 defusedxml 0.7.1 entrypoints 0.4 et-xmlfile 1.1.0 executing 0.8.3 fastjsonschema 2.16.2 flit_core 3.6.0 folium 0.14.0 fonttools 4.25.0 googlemaps 2.5.1 idna 3.4 importlib-metadata 4.11.3 importlib-resources 5.2.0 ipykernel 6.19.2 ipython 8.10.0 ipython-genutils 0.2.0 ipywidgets 7.6.5 jedi 0.18.1 Jinja2 3.1.2 json5 0.9.6 jsonschema 4.17.3 jupyter 1.0.0 jupyter_client 7.4.9 jupyter-console 6.4.4 jupyter_core 5.2.0 jupyter-server 1.23.4 jupyterlab 3.5.3 jupyterlab-pygments 0.1.2 jupyterlab_server 2.19.0 jupyterlab-widgets 1.0.0 kiwisolver 1.4.4 lxml 4.9.1 MarkupSafe 2.1.1 matplotlib 3.6.2 matplotlib-inline 0.1.6 mistune 0.8.4 mkl-fft 1.3.1 mkl-random 1.2.2 mkl-service 2.4.0 munkres 1.1.4 nbclassic 0.5.2 nbclient 0.5.13 nbconvert 6.5.4 nbformat 5.7.0 nest-asyncio 1.5.6 notebook 6.5.2 notebook_shim 0.2.2 numexpr 2.8.4 numpy 1.23.5 openpyxl 3.1.1 packaging 22.0 pandas 1.5.2 pandocfilters 1.5.0 parso 0.8.3 patsy 0.5.3 pickleshare 0.7.5 Pillow 9.3.0 pip 22.3.1 pkgutil_resolve_name 1.3.10 platformdirs 2.5.2 ply 3.11 prometheus-client 0.14.1 prompt-toolkit 3.0.36 psutil 5.9.0 pure-eval 0.2.2 pycparser 2.21 Pygments 2.11.2 pyOpenSSL 22.0.0 pyparsing 3.0.9 PyQt5 5.15.7 PyQt5-sip 12.11.0 pyrsistent 0.18.0 PySocks 1.7.1 python-dateutil 2.8.2 pytz 2022.7 pywin32 305.1 pywinpty 2.0.2 pyzmq 23.2.0 qtconsole 5.4.0 QtPy 2.2.0 requests 2.28.1 scipy 1.10.1 seaborn 0.12.2 Send2Trash 1.8.0 setuptools 65.6.3 sip 6.6.2 six 1.16.0 sniffio 1.2.0 soupsieve 2.3.2.post1 stack-data 0.2.0 statsmodels 0.13.5 terminado 0.17.1 tinycss2 1.2.1 toml 0.10.2 tomli 2.0.1 tornado 6.2 traitlets 5.7.1 
typing_extensions 4.4.0 urllib3 1.26.14 wcwidth 0.2.5 webencodings 0.5.1 websocket-client 0.58.0 wheel 0.38.4 widgetsnbextension 3.5.2 win-inet-pton 1.1.0 wincertstore 0.2 xlrd 2.0.1 zipp 3.11.0
conda commands¶
- conda list
- conda install module_name
- conda uninstall module_name
- conda install -c channel_name module_name
- Installs the module from the specified distribution channel
5. Installing the Google Maps API¶
In [260]:# Google account API key (redacted; do not publish the real key)
conda install -c conda-forge googlemaps
In [261]:import googlemaps
In [262]:gmaps_key = "YOUR_GOOGLE_MAPS_API_KEY"  # placeholder: the real key is redacted
gmaps = googlemaps.Client(key=gmaps_key)
In [263]:gmaps.geocode("서울영등포경찰서",language="ko")
Out[263]:[{'address_components': [{'long_name': '608', 'short_name': '608', 'types': ['premise']}, {'long_name': '국회대로', 'short_name': '국회대로', 'types': ['political', 'sublocality', 'sublocality_level_4']}, {'long_name': '영등포구', 'short_name': '영등포구', 'types': ['political', 'sublocality', 'sublocality_level_1']}, {'long_name': '서울특별시', 'short_name': '서울특별시', 'types': ['administrative_area_level_1', 'political']}, {'long_name': '대한민국', 'short_name': 'KR', 'types': ['country', 'political']}, {'long_name': '150-043', 'short_name': '150-043', 'types': ['postal_code']}], 'formatted_address': '대한민국 서울특별시 영등포구 국회대로 608', 'geometry': {'location': {'lat': 37.5260441, 'lng': 126.9008091}, 'location_type': 'ROOFTOP', 'viewport': {'northeast': {'lat': 37.5273930802915, 'lng': 126.9021580802915}, 'southwest': {'lat': 37.5246951197085, 'lng': 126.8994601197085}}}, 'partial_match': True, 'place_id': 'ChIJ1TimJLaffDURptXOs0Tj6sY', 'plus_code': {'compound_code': 'GWG2+C8 대한민국 서울특별시', 'global_code': '8Q98GWG2+C8'}, 'types': ['establishment', 'point_of_interest', 'police']}]
Python loops¶
A simple for loop example¶
In [264]:for n in [1,2,3,4]:
    print("Number is", n)
Number is 1
Number is 2
Number is 3
Number is 4
A slightly more complex for loop example¶
In [265]:for n in range(0,10):
    print(n ** 2)
0
1
4
9
16
25
36
49
64
81
The code above in one line: list comprehension¶
In [266]:[n ** 2 for n in range(0, 10)]
Out[266]:[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
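As a small sketch of how the comprehension maps onto the loop, the snippet below builds the same list both ways and then adds an optional trailing condition. The variable names are illustrative only:

```python
# Explicit loop: append each square to a list
squares_loop = []
for n in range(0, 10):
    squares_loop.append(n ** 2)

# Equivalent list comprehension
squares_comp = [n ** 2 for n in range(0, 10)]

# A trailing `if` keeps only the items that satisfy the condition
even_squares = [n ** 2 for n in range(0, 10) if n % 2 == 0]

print(squares_loop == squares_comp)
print(even_squares)
```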
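A loop helper that fits Pandas well: iterrows()¶
- Most Pandas DataFrames are two-dimensional
- Looping over one with a plain for loop means repeatedly addressing "the n-th row", which hurts readability
- The iterrows() method makes it easier to loop over a DataFrame
- The only thing to watch is that each iteration yields two items: the index and the row contents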
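The points above can be sketched on a tiny made-up frame (the coordinates below are placeholders chosen for illustration, not geocoded results):

```python
import pandas as pd

df = pd.DataFrame(
    {"lat": [37.5, 37.6], "lng": [127.0, 127.1]},
    index=["강남", "강북"],
)

for idx, row in df.iterrows():
    # idx is the index label; row is a Series holding that row's values
    print(idx, row["lat"], row["lng"])

# Collect only the index labels to show the (index, row) unpacking
labels = [idx for idx, _ in df.iterrows()]
```

Because every row is materialized as a Series, iterrows() is convenient for small tables like this one but tends to be slow on large frames.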
6. Organizing the data with Google Maps¶
In [267]:gmaps.geocode("서울영등포경찰서",language="ko")
Out[267]:[{'address_components': [{'long_name': '608', 'short_name': '608', 'types': ['premise']}, {'long_name': '국회대로', 'short_name': '국회대로', 'types': ['political', 'sublocality', 'sublocality_level_4']}, {'long_name': '영등포구', 'short_name': '영등포구', 'types': ['political', 'sublocality', 'sublocality_level_1']}, {'long_name': '서울특별시', 'short_name': '서울특별시', 'types': ['administrative_area_level_1', 'political']}, {'long_name': '대한민국', 'short_name': 'KR', 'types': ['country', 'political']}, {'long_name': '150-043', 'short_name': '150-043', 'types': ['postal_code']}], 'formatted_address': '대한민국 서울특별시 영등포구 국회대로 608', 'geometry': {'location': {'lat': 37.5260441, 'lng': 126.9008091}, 'location_type': 'ROOFTOP', 'viewport': {'northeast': {'lat': 37.5273930802915, 'lng': 126.9021580802915}, 'southwest': {'lat': 37.5246951197085, 'lng': 126.8994601197085}}}, 'partial_match': True, 'place_id': 'ChIJ1TimJLaffDURptXOs0Tj6sY', 'plus_code': {'compound_code': 'GWG2+C8 대한민국 서울특별시', 'global_code': '8Q98GWG2+C8'}, 'types': ['establishment', 'point_of_interest', 'police']}]In [268]:tmp = gmaps.geocode("서울영등포경찰서",language="ko")
In [269]:tmp[0].get("geometry")["location"]
Out[269]:{'lat': 37.5260441, 'lng': 126.9008091}
In [270]:print(tmp[0].get("geometry")["location"]["lat"])
print(tmp[0].get("geometry")["location"]["lng"])
37.5260441
126.9008091
In [271]:tmp[0].get("formatted_address")
Out[271]:'대한민국 서울특별시 영등포구 국회대로 608'
In [272]:tmp[0].get("formatted_address").split()[2]
Out[272]:'영등포구'
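Splitting formatted_address by position works for the response above, but it depends on the address always having the same word order. As a hedged alternative, the sketch below reads the district from address_components instead. The helper `extract_gu` is a hypothetical name, `sample` is a trimmed copy of the response shown earlier, and the assumption that the district carries the sublocality_level_1 type is taken from that single response rather than guaranteed for every address:

```python
def extract_gu(result):
    """Return the long_name of the sublocality_level_1 component, or None."""
    for component in result.get("address_components", []):
        if "sublocality_level_1" in component.get("types", []):
            return component["long_name"]
    return None


# Trimmed copy of the geocoding response shown above (no network call needed)
sample = {
    "address_components": [
        {"long_name": "608", "short_name": "608", "types": ["premise"]},
        {
            "long_name": "영등포구",
            "short_name": "영등포구",
            "types": ["political", "sublocality", "sublocality_level_1"],
        },
    ],
    "formatted_address": "대한민국 서울특별시 영등포구 국회대로 608",
    "geometry": {"location": {"lat": 37.5260441, "lng": 126.9008091}},
}

gu = extract_gu(sample)
location = sample["geometry"]["location"]
print(gu, location["lat"], location["lng"])
```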
In [273]:crime_station.head()
Out[273]:
죄종      강간            강도          살인        절도              폭력
발생검거  검거   발생    검거  발생   검거  발생  검거    발생    검거    발생
구분
강남      269.0  339.0  26.0  24.0  3.0   3.0  1129.0  2438.0  2096.0  2336.0
강동      152.0  160.0  13.0  14.0  5.0   4.0  902.0   1754.0  2201.0  2530.0
강북      159.0  217.0  4.0   5.0   6.0   7.0  672.0   1222.0  2482.0  2778.0
강서      239.0  275.0  10.0  10.0  10.0  9.0  1070.0  1952.0  2768.0  3204.0
관악      264.0  322.0  10.0  12.0  7.0   6.0  937.0   2103.0  2707.0  3235.0
In [274]:crime_station["구별"] = np.nan
crime_station["lat"] = np.nan
crime_station["lng"] = np.nan
In [275]:crime_station.head()
Out[275]:
죄종      강간            강도          살인        절도              폭력            구별  lat  lng
발생검거  검거   발생    검거  발생   검거  발생  검거    발생    검거    발생
구분
강남      269.0  339.0  26.0  24.0  3.0   3.0  1129.0  2438.0  2096.0  2336.0  NaN   NaN  NaN
강동      152.0  160.0  13.0  14.0  5.0   4.0  902.0   1754.0  2201.0  2530.0  NaN   NaN  NaN
강북      159.0  217.0  4.0   5.0   6.0   7.0  672.0   1222.0  2482.0  2778.0  NaN   NaN  NaN
강서      239.0  275.0  10.0  10.0  10.0  9.0  1070.0  1952.0  2768.0  3204.0  NaN   NaN  NaN
관악      264.0  322.0  10.0  12.0  7.0   6.0  937.0   2103.0  2707.0  3235.0  NaN   NaN  NaN
- Get the name of the district (구) each police station belongs to from the station name
- Prepare columns to hold the district name, latitude, and longitude
- Use a loop to fill in every NaN in the table above
- iterrows()
In [276]:count = 0
for idx, rows in crime_station.iterrows():
    station_name = "서울" + str(idx) + "경찰서"
    tmp = gmaps.geocode(station_name, language="ko")
    tmp_gu = tmp[0].get("formatted_address").split()[2]
    lat = tmp[0].get("geometry")["location"]["lat"]
    lng = tmp[0].get("geometry")["location"]["lng"]
    crime_station.loc[idx, "lat"] = lat
    crime_station.loc[idx, "lng"] = lng
    crime_station.loc[idx, "구별"] = tmp_gu
    print(count)  # confirm the loop is making progress
    count += 1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
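The loop above indexes tmp[0] directly, so it would raise an IndexError if a lookup returned no results. A minimal sketch of a guarded version follows. It runs offline by using `fake_geocode`, a hypothetical stand-in for gmaps.geocode() that returns an empty list for unknown names; the station label "가상" is made up to trigger that case:

```python
import numpy as np
import pandas as pd


def fake_geocode(name):
    """Hypothetical stand-in for gmaps.geocode(); returns [] for unknown names."""
    known = {
        "서울영등포경찰서": [
            {
                "formatted_address": "대한민국 서울특별시 영등포구 국회대로 608",
                "geometry": {"location": {"lat": 37.5260441, "lng": 126.9008091}},
            }
        ]
    }
    return known.get(name, [])


stations = pd.DataFrame(
    {"구별": [None, None], "lat": [np.nan, np.nan], "lng": [np.nan, np.nan]},
    index=["영등포", "가상"],
)

failed = []
for idx in stations.index:
    results = fake_geocode("서울" + str(idx) + "경찰서")
    if not results:
        # No match: remember the station and leave its row as NaN
        failed.append(idx)
        continue
    location = results[0]["geometry"]["location"]
    stations.loc[idx, "lat"] = location["lat"]
    stations.loc[idx, "lng"] = location["lng"]
    stations.loc[idx, "구별"] = results[0]["formatted_address"].split()[2]

print(failed)
```

Collecting the failed names makes it possible to retry or correct them by hand afterwards instead of losing the whole run.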
In [277]:crime_station.head()
Out[277]:
죄종      강간            강도          살인        절도              폭력            구별    lat        lng
발생검거  검거   발생    검거  발생   검거  발생  검거    발생    검거    발생
구분
강남      269.0  339.0  26.0  24.0  3.0   3.0  1129.0  2438.0  2096.0  2336.0  강남구  37.509435  127.066958
강동      152.0  160.0  13.0  14.0  5.0   4.0  902.0   1754.0  2201.0  2530.0  강동구  37.528511  127.126822
강북      159.0  217.0  4.0   5.0   6.0   7.0  672.0   1222.0  2482.0  2778.0  강북구  37.637197  127.027305
강서      239.0  275.0  10.0  10.0  10.0  9.0  1070.0  1952.0  2768.0  3204.0  양천구  37.539783  126.829997
관악      264.0  322.0  10.0  12.0  7.0   6.0  937.0   2103.0  2707.0  3235.0  관악구  37.474395  126.951349
In [278]:crime_station.columns.get_level_values(0) + crime_station.columns.get_level_values(1)
Out[278]:Index(['강간검거', '강간발생', '강도검거', '강도발생', '살인검거', '살인발생', '절도검거', '절도발생', '폭력검거', '폭력발생', '구별', 'lat', 'lng'], dtype='object')
In [279]:crime_station.columns.get_level_values(0)[12]
Out[279]:'lng'
In [280]:tmp = [
    crime_station.columns.get_level_values(0)[n] + crime_station.columns.get_level_values(1)[n]
    for n in range(0, len(crime_station.columns.get_level_values(0)))
]
tmp
Out[280]:['강간검거', '강간발생', '강도검거', '강도발생', '살인검거', '살인발생', '절도검거', '절도발생', '폭력검거', '폭력발생', '구별', 'lat', 'lng']
In [281]:crime_station.columns = tmp
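The same flattening can be written more compactly by joining each column tuple. This is only a sketch on a small made-up frame: the single row and the ("lat", "") column are assumptions that imitate how a column added to a two-level frame gets an empty second level.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("강간", "검거"), ("강간", "발생"), ("lat", "")],
    names=["죄종", "발생검거"],
)
df = pd.DataFrame([[269.0, 339.0, 37.509435]], index=["강남"], columns=columns)

# Iterating over a MultiIndex yields one tuple per column;
# joining the tuple's parts produces a single flat name
df.columns = ["".join(pair) for pair in df.columns]
print(list(df.columns))
```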
In [282]:crime_station.head()
Out[282]:
구분  강간검거  강간발생  강도검거  강도발생  살인검거  살인발생  절도검거  절도발생  폭력검거  폭력발생  구별    lat        lng
강남  269.0     339.0     26.0      24.0      3.0       3.0       1129.0    2438.0    2096.0    2336.0    강남구  37.509435  127.066958
강동  152.0     160.0     13.0      14.0      5.0       4.0       902.0     1754.0    2201.0    2530.0    강동구  37.528511  127.126822
강북  159.0     217.0     4.0       5.0       6.0       7.0       672.0     1222.0    2482.0    2778.0    강북구  37.637197  127.027305
강서  239.0     275.0     10.0      10.0      10.0      9.0       1070.0    1952.0    2768.0    3204.0    양천구  37.539783  126.829997
관악  264.0     322.0     10.0      12.0      7.0       6.0       937.0     2103.0    2707.0    3235.0    관악구  37.474395  126.951349
In [283]:# Save the data
crime_station.to_csv("../data/02. crime_in_Seoul_raw.csv", sep=",", encoding="utf-8")
In [284]:pd.read_csv("../data/02. crime_in_Seoul_raw.csv").head(2)
Out[284]:
   구분  강간검거  강간발생  강도검거  강도발생  살인검거  살인발생  절도검거  절도발생  폭력검거  폭력발생  구별    lat        lng
0  강남  269.0     339.0     26.0      24.0      3.0       3.0       1129.0    2438.0    2096.0    2336.0    강남구  37.509435  127.066958
1  강동  152.0     160.0     13.0      14.0      5.0       4.0       902.0     1754.0    2201.0    2530.0    강동구  37.528511  127.126822
7. Aggregating the data by district¶
In [285]:# index_col=0: use the first column ("구분") as the index
crime_anal_station = pd.read_csv("../data/02. crime_in_Seoul_raw.csv", index_col=0, encoding="utf-8")
crime_anal_station.head()
Out[285]:강간검거 강간발생 강도검거 강도발생 살인검거 살인발생 절도검거 절도발생 폭력검거 폭력발생 구별 lat lng 구분 강남 269.0 339.0 26.0 24.0 3.0 3.0 1129.0 2438.0 2096.0 2336.0 강남구 37.509435 127.066958 강동 152.0 160.0 13.0 14.0 5.0 4.0 902.0 1754.0 2201.0 2530.0 강동구 37.528511 127.126822 강북 159.0 217.0 4.0 5.0 6.0 7.0 672.0 1222.0 2482.0 2778.0 강북구 37.637197 127.027305 강서 239.0 275.0 10.0 10.0 10.0 9.0 1070.0 1952.0 2768.0 3204.0 양천구 37.539783 126.829997 관악 264.0 322.0 10.0 12.0 7.0 6.0 937.0 2103.0 2707.0 3235.0 관악구 37.474395 126.951349 In [286]:crime_anal_gu = pd.pivot_table(crime_anal_station, index="구별", aggfunc = np.sum) del crime_anal_gu["lat"] crime_anal_gu.drop("lng", axis = 1, inplace = True) crime_anal_gu.head()
Out[286]:
구별    강간검거  강간발생  강도검거  강도발생  살인검거  살인발생  절도검거  절도발생  폭력검거  폭력발생
강남구  413.0     516.0     42.0      39.0      5.0       5.0       1918.0    3587.0    3527.0    4002.0
강동구  152.0     160.0     13.0      14.0      5.0       4.0       902.0     1754.0    2201.0    2530.0
강북구  159.0     217.0     4.0       5.0       6.0       7.0       672.0     1222.0    2482.0    2778.0
관악구  264.0     322.0     10.0      12.0      7.0       6.0       937.0     2103.0    2707.0    3235.0
광진구  234.0     279.0     6.0       11.0      4.0       4.0       1057.0    2636.0    2011.0    2392.0
In [287]:# Divide several columns by one other column
crime_anal_gu[["강도검거", "살인검거"]].div(crime_anal_gu["강도발생"], axis=0).head()
Out[287]:
구별    강도검거  살인검거
강남구  1.076923  0.128205
강동구  0.928571  0.357143
강북구  0.800000  1.200000
관악구  0.833333  0.583333
광진구  0.545455  0.363636
In [288]:# Divide several columns by several other columns, pair by pair
num = ["강간검거", "강도검거", "살인검거", "절도검거", "폭력검거"]
den = ["강간발생", "강도발생", "살인발생", "절도발생", "폭력발생"]
crime_anal_gu[num].div(crime_anal_gu[den].values).head()
Out[288]:강간검거 강도검거 살인검거 절도검거 폭력검거 구별 강남구 0.800388 1.076923 1.000000 0.534709 0.881309 강동구 0.950000 0.928571 1.250000 0.514253 0.869960 강북구 0.732719 0.800000 0.857143 0.549918 0.893449 관악구 0.819876 0.833333 1.166667 0.445554 0.836785 광진구 0.838710 0.545455 1.000000 0.400986 0.840719 In [289]:target = ["강간검거율", "강도검거율", "살인검거율", "절도검거율", "폭력검거율"] num = ["강간검거", "강도검거", "살인검거", "절도검거", "폭력검거"] den = ["강간발생", "강도발생", "살인발생", "절도발생", "폭력발생"] crime_anal_gu[target] = crime_anal_gu[num].div(crime_anal_gu[den].values) * 100 crime_anal_gu.head()
Out[289]:강간검거 강간발생 강도검거 강도발생 살인검거 살인발생 절도검거 절도발생 폭력검거 폭력발생 강간검거율 강도검거율 살인검거율 절도검거율 폭력검거율 구별 강남구 413.0 516.0 42.0 39.0 5.0 5.0 1918.0 3587.0 3527.0 4002.0 80.038760 107.692308 100.000000 53.470867 88.130935 강동구 152.0 160.0 13.0 14.0 5.0 4.0 902.0 1754.0 2201.0 2530.0 95.000000 92.857143 125.000000 51.425314 86.996047 강북구 159.0 217.0 4.0 5.0 6.0 7.0 672.0 1222.0 2482.0 2778.0 73.271889 80.000000 85.714286 54.991817 89.344852 관악구 264.0 322.0 10.0 12.0 7.0 6.0 937.0 2103.0 2707.0 3235.0 81.987578 83.333333 116.666667 44.555397 83.678516 광진구 234.0 279.0 6.0 11.0 4.0 4.0 1057.0 2636.0 2011.0 2392.0 83.870968 54.545455 100.000000 40.098634 84.071906 In [290]:del crime_anal_gu["강간검거"] del crime_anal_gu["강도검거"] del crime_anal_gu["살인검거"] del crime_anal_gu["절도검거"] del crime_anal_gu["폭력검거"] crime_anal_gu.head()
Out[290]:
구별    강간발생  강도발생  살인발생  절도발생  폭력발생  강간검거율  강도검거율  살인검거율  절도검거율  폭력검거율
강남구  516.0     39.0      5.0       3587.0    4002.0    80.038760   107.692308  100.000000  53.470867   88.130935
강동구  160.0     14.0      4.0       1754.0    2530.0    95.000000   92.857143   125.000000  51.425314   86.996047
강북구  217.0     5.0       7.0       1222.0    2778.0    73.271889   80.000000   85.714286   54.991817   89.344852
관악구  322.0     12.0      6.0       2103.0    3235.0    81.987578   83.333333   116.666667  44.555397   83.678516
광진구  279.0     11.0      4.0       2636.0    2392.0    83.870968   54.545455   100.000000  40.098634   84.071906
In [291]:# Find arrest rates greater than 100 and replace them with 100
crime_anal_gu[crime_anal_gu[target] > 100] = 100
crime_anal_gu.head()
Out[291]:강간발생 강도발생 살인발생 절도발생 폭력발생 강간검거율 강도검거율 살인검거율 절도검거율 폭력검거율 구별 강남구 516.0 39.0 5.0 3587.0 4002.0 80.038760 100.000000 100.000000 53.470867 88.130935 강동구 160.0 14.0 4.0 1754.0 2530.0 95.000000 92.857143 100.000000 51.425314 86.996047 강북구 217.0 5.0 7.0 1222.0 2778.0 73.271889 80.000000 85.714286 54.991817 89.344852 관악구 322.0 12.0 6.0 2103.0 3235.0 81.987578 83.333333 100.000000 44.555397 83.678516 광진구 279.0 11.0 4.0 2636.0 2392.0 83.870968 54.545455 100.000000 40.098634 84.071906 In [292]:crime_anal_gu.rename(columns = {"강간발생":"강간", "살인발생":"살인", "절도발생":"절도", "폭력발생":"폭력", "강도발생":"강도"}, inplace = True) crime_anal_gu.head()
Out[292]:
구별    강간   강도  살인  절도    폭력    강간검거율  강도검거율  살인검거율  절도검거율  폭력검거율
강남구  516.0  39.0  5.0   3587.0  4002.0  80.038760   100.000000  100.000000  53.470867   88.130935
강동구  160.0  14.0  4.0   1754.0  2530.0  95.000000   92.857143   100.000000  51.425314   86.996047
강북구  217.0  5.0   7.0   1222.0  2778.0  73.271889   80.000000   85.714286   54.991817   89.344852
관악구  322.0  12.0  6.0   2103.0  3235.0  81.987578   83.333333   100.000000  44.555397   83.678516
광진구  279.0  11.0  4.0   2636.0  2392.0  83.870968   54.545455   100.000000  40.098634   84.071906
8. Preparing the data for sorting the crime data¶
In [293]:crime_anal_gu.head()
Out[293]:
구별    강간   강도  살인  절도    폭력    강간검거율  강도검거율  살인검거율  절도검거율  폭력검거율
강남구  516.0  39.0  5.0   3587.0  4002.0  80.038760   100.000000  100.000000  53.470867   88.130935
강동구  160.0  14.0  4.0   1754.0  2530.0  95.000000   92.857143   100.000000  51.425314   86.996047
강북구  217.0  5.0   7.0   1222.0  2778.0  73.271889   80.000000   85.714286   54.991817   89.344852
관악구  322.0  12.0  6.0   2103.0  3235.0  81.987578   83.333333   100.000000  44.555397   83.678516
광진구  279.0  11.0  4.0   2636.0  2392.0  83.870968   54.545455   100.000000  40.098634   84.071906
In [294]:# Normalize by dividing by the maximum: the largest value becomes 1
# (the smallest value is not mapped to 0; that would require min-max scaling)
crime_anal_gu["강도"] / crime_anal_gu["강도"].max()
Out[294]:구별 강남구 1.000000 강동구 0.358974 강북구 0.128205 관악구 0.307692 광진구 0.282051 구로구 0.256410 금천구 0.179487 노원구 0.153846 도봉구 0.128205 동대문구 0.256410 동작구 0.179487 마포구 0.102564 서대문구 0.128205 서초구 0.333333 성동구 0.076923 성북구 0.205128 송파구 0.384615 양천구 0.435897 영등포구 0.487179 용산구 0.230769 은평구 0.230769 종로구 0.307692 중구 0.205128 중랑구 0.358974 Name: 강도, dtype: float64
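To make the difference between the two scalings concrete, here is a minimal sketch on three of the 강도 values shown above (강남구, 강동구, 강북구). Because it uses only this three-district subset, the min-max result is for illustration and does not match what the full table would give:

```python
import pandas as pd

robbery = pd.Series(
    [39.0, 14.0, 5.0],
    index=["강남구", "강동구", "강북구"],
    name="강도",
)

# Scaling used in this notebook: divide by the maximum.
# The maximum becomes 1; the minimum becomes min / max, not 0.
max_scaled = robbery / robbery.max()

# Min-max scaling: the minimum becomes 0 and the maximum becomes 1
min_max_scaled = (robbery - robbery.min()) / (robbery.max() - robbery.min())

print(max_scaled.round(6).tolist())
print(min_max_scaled.round(6).tolist())
```

Dividing by the maximum preserves ratios between districts, which is a reasonable fit for count data that is already non-negative.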
In [295]:col = ["살인", "강도", "강간", "절도", "폭력"]
crime_anal_norm = crime_anal_gu[col] / crime_anal_gu[col].max()
crime_anal_norm.head()
Out[295]:살인 강도 강간 절도 폭력 구별 강남구 0.357143 1.000000 1.000000 0.977118 0.733773 강동구 0.285714 0.358974 0.310078 0.477799 0.463880 강북구 0.500000 0.128205 0.420543 0.332879 0.509351 관악구 0.428571 0.307692 0.624031 0.572868 0.593143 광진구 0.285714 0.282051 0.540698 0.718060 0.438577 In [296]:crime_anal_gu.head(1)
Out[296]:
구별    강간   강도  살인  절도    폭력    강간검거율  강도검거율  살인검거율  절도검거율  폭력검거율
강남구  516.0  39.0  5.0   3587.0  4002.0  80.03876    100.0       100.0       53.470867   88.130935
In [297]:# Add the arrest rates
col2 = ["강간검거율", "강도검거율", "살인검거율", "절도검거율", "폭력검거율"]
crime_anal_norm[col2] = crime_anal_gu[col2]
crime_anal_norm.head()
Out[297]:
구별    살인      강도      강간      절도      폭력      강간검거율  강도검거율  살인검거율  절도검거율  폭력검거율
강남구  0.357143  1.000000  1.000000  0.977118  0.733773  80.038760   100.000000  100.000000  53.470867   88.130935
강동구  0.285714  0.358974  0.310078  0.477799  0.463880  95.000000   92.857143   100.000000  51.425314   86.996047
강북구  0.500000  0.128205  0.420543  0.332879  0.509351  73.271889   80.000000   85.714286   54.991817   89.344852
관악구  0.428571  0.307692  0.624031  0.572868  0.593143  81.987578   83.333333   100.000000  44.555397   83.678516
광진구  0.285714  0.282051  0.540698  0.718060  0.438577  83.870968   54.545455   100.000000  40.098634   84.071906
In [298]:# Add the population and CCTV counts from the per-district CCTV data
result_CCTV = pd.read_csv("../data/01. CCTV_result.csv", index_col="구별", encoding="utf-8")
result_CCTV.head()
Out[298]:소계 최근증가율 인구수 한국인 외국인 고령자 외국인비율 고령자비율 CCTV비율 오차 구별 강남구 3238 150.619195 561052 556164 4888 65060 0.871220 11.596073 0.577130 1549.200326 강동구 1010 166.490765 440359 436223 4136 56161 0.939234 12.753458 0.229358 -544.642322 강북구 831 125.203252 328002 324479 3523 56530 1.074079 17.234651 0.253352 -598.750923 강서구 911 134.793814 608255 601691 6564 76032 1.079153 12.500021 0.149773 -830.268578 관악구 2109 149.290780 520929 503297 17632 70046 3.384722 13.446362 0.404854 464.799395 In [299]:crime_anal_norm[["인구수", "CCTV"]] = result_CCTV[["인구수", "소계"]] crime_anal_norm.head()
Out[299]:
구별    살인      강도      강간      절도      폭력      강간검거율  강도검거율  살인검거율  절도검거율  폭력검거율  인구수  CCTV
강남구  0.357143  1.000000  1.000000  0.977118  0.733773  80.038760   100.000000  100.000000  53.470867   88.130935   561052  3238
강동구  0.285714  0.358974  0.310078  0.477799  0.463880  95.000000   92.857143   100.000000  51.425314   86.996047   440359  1010
강북구  0.500000  0.128205  0.420543  0.332879  0.509351  73.271889   80.000000   85.714286   54.991817   89.344852   328002  831
관악구  0.428571  0.307692  0.624031  0.572868  0.593143  81.987578   83.333333   100.000000  44.555397   83.678516   520929  2109
광진구  0.285714  0.282051  0.540698  0.718060  0.438577  83.870968   54.545455   100.000000  40.098634   84.071906   372298  878
In [300]:# Average the normalized crime counts and use the result as the representative 범죄 (crime) column
col = ["강간", "강도", "살인", "절도", "폭력"]
crime_anal_norm["범죄"] = np.mean(crime_anal_norm[col], axis=1)
crime_anal_norm.head()
Out[300]:살인 강도 강간 절도 폭력 강간검거율 강도검거율 살인검거율 절도검거율 폭력검거율 인구수 CCTV 범죄 구별 강남구 0.357143 1.000000 1.000000 0.977118 0.733773 80.038760 100.000000 100.000000 53.470867 88.130935 561052 3238 0.813607 강동구 0.285714 0.358974 0.310078 0.477799 0.463880 95.000000 92.857143 100.000000 51.425314 86.996047 440359 1010 0.379289 강북구 0.500000 0.128205 0.420543 0.332879 0.509351 73.271889 80.000000 85.714286 54.991817 89.344852 328002 831 0.378196 관악구 0.428571 0.307692 0.624031 0.572868 0.593143 81.987578 83.333333 100.000000 44.555397 83.678516 520929 2109 0.505261 광진구 0.285714 0.282051 0.540698 0.718060 0.438577 83.870968 54.545455 100.000000 40.098634 84.071906 372298 878 0.453020
np.mean()¶
In [301]:np.array([0.357143, 1.000000, 1.000000, 0.977118, 0.733773])
np.mean(np.array([0.357143, 1.000000, 1.000000, 0.977118, 0.733773]))
Out[301]:0.8136068
In [302]:# axis=1: one mean per row / axis=0: one mean per column (pandas uses the same convention)
np.mean(np.array(
    [[0.357143, 1.000000, 1.000000, 0.977118, 0.733773],
     [0.285714, 0.358974, 0.310078, 0.477799, 0.463880]]),
    axis=1
)
Out[302]:array([0.8136068, 0.379289 ])
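The axis argument is easy to misread, so here is a minimal sketch that checks it on the two rows used above. The variable names (`values`, `frame`, and so on) are made up for illustration; the numbers are the ones from the cells above.

```python
import numpy as np
import pandas as pd

# Two rows of normalized crime counts, copied from the cells above
values = np.array([
    [0.357143, 1.000000, 1.000000, 0.977118, 0.733773],
    [0.285714, 0.358974, 0.310078, 0.477799, 0.463880],
])

row_means = values.mean(axis=1)  # collapses the columns: one mean per row
col_means = values.mean(axis=0)  # collapses the rows: one mean per column

# The same data as a DataFrame (index and column labels taken from the notebook)
frame = pd.DataFrame(
    values,
    index=["강남구", "강동구"],
    columns=["살인", "강도", "강간", "절도", "폭력"],
)
frame_row_means = frame.mean(axis=1)  # pandas: also one mean per row

print(row_means)
print(col_means)
print(frame_row_means)
```

With axis=1 both libraries return one value per row, which is why `np.mean(crime_anal_norm[col], axis=1)` yields one representative value per district.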
In [303]:# Use the mean of the arrest rates as the representative value for the "검거" (arrest) column
col = ["강간검거율", "강도검거율", "살인검거율", "절도검거율", "폭력검거율"]
crime_anal_norm["검거"] = np.mean(crime_anal_norm[col], axis=1)
crime_anal_norm.head()
Out[303]:살인 강도 강간 절도 폭력 강간검거율 강도검거율 살인검거율 절도검거율 폭력검거율 인구수 CCTV 범죄 검거 구별 강남구 0.357143 1.000000 1.000000 0.977118 0.733773 80.038760 100.000000 100.000000 53.470867 88.130935 561052 3238 0.813607 84.328112 강동구 0.285714 0.358974 0.310078 0.477799 0.463880 95.000000 92.857143 100.000000 51.425314 86.996047 440359 1010 0.379289 85.255701 강북구 0.500000 0.128205 0.420543 0.332879 0.509351 73.271889 80.000000 85.714286 54.991817 89.344852 328002 831 0.378196 76.664569 관악구 0.428571 0.307692 0.624031 0.572868 0.593143 81.987578 83.333333 100.000000 44.555397 83.678516 520929 2109 0.505261 78.710965 광진구 0.285714 0.282051 0.540698 0.718060 0.438577 83.870968 54.545455 100.000000 40.098634 84.071906 372298 878 0.453020 72.517393
Seaborn¶
In [304]:conda install -y seaborn
Collecting package metadata (current_repodata.json): ...working... done Solving environment: ...working... done # All requested packages already installed. Note: you may need to restart the kernel to use updated packages.
In [305]:import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rc

plt.rcParams["axes.unicode_minus"] = False  # draw minus signs as ASCII hyphens so the Korean font can render them
rc("font", family="Malgun Gothic")  # Korean-capable font (Windows)
%matplotlib inline
Example 1: seaborn basics¶
In [306]:np.linspace(0, 14, 100)
Out[306]:array([ 0. , 0.14141414, 0.28282828, 0.42424242, 0.56565657, 0.70707071, 0.84848485, 0.98989899, 1.13131313, 1.27272727, 1.41414141, 1.55555556, 1.6969697 , 1.83838384, 1.97979798, 2.12121212, 2.26262626, 2.4040404 , 2.54545455, 2.68686869, 2.82828283, 2.96969697, 3.11111111, 3.25252525, 3.39393939, 3.53535354, 3.67676768, 3.81818182, 3.95959596, 4.1010101 , 4.24242424, 4.38383838, 4.52525253, 4.66666667, 4.80808081, 4.94949495, 5.09090909, 5.23232323, 5.37373737, 5.51515152, 5.65656566, 5.7979798 , 5.93939394, 6.08080808, 6.22222222, 6.36363636, 6.50505051, 6.64646465, 6.78787879, 6.92929293, 7.07070707, 7.21212121, 7.35353535, 7.49494949, 7.63636364, 7.77777778, 7.91919192, 8.06060606, 8.2020202 , 8.34343434, 8.48484848, 8.62626263, 8.76767677, 8.90909091, 9.05050505, 9.19191919, 9.33333333, 9.47474747, 9.61616162, 9.75757576, 9.8989899 , 10.04040404, 10.18181818, 10.32323232, 10.46464646, 10.60606061, 10.74747475, 10.88888889, 11.03030303, 11.17171717, 11.31313131, 11.45454545, 11.5959596 , 11.73737374, 11.87878788, 12.02020202, 12.16161616, 12.3030303 , 12.44444444, 12.58585859, 12.72727273, 12.86868687, 13.01010101, 13.15151515, 13.29292929, 13.43434343, 13.57575758, 13.71717172, 13.85858586, 14. ])In [307]:x = np.linspace(0, 14, 100) y1 = np.sin(x) y2 = 2 * np.sin(x + 0.5) y3 = 3 * np.sin(x + 1) y4 = 4 * np.sin(x + 1.5)
In [308]:plt.figure(figsize=(10, 6))
plt.plot(x, y1, x, y2, x, y3, x, y4)
plt.show()
In [309]:# sns.set_style() options: white, whitegrid, dark, darkgrid, ticks
sns.set_style("white")
plt.figure(figsize=(10, 6))
plt.plot(x, y1, x, y2, x, y3, x, y4)
plt.show()
Example 2: seaborn tips data¶
- boxplot
- swarmplot
- lmplot
In [310]:tips = sns.load_dataset("tips") tips
Out[310]:total_bill tip sex smoker day time size 0 16.99 1.01 Female No Sun Dinner 2 1 10.34 1.66 Male No Sun Dinner 3 2 21.01 3.50 Male No Sun Dinner 3 3 23.68 3.31 Male No Sun Dinner 2 4 24.59 3.61 Female No Sun Dinner 4 ... ... ... ... ... ... ... ... 239 29.03 5.92 Male No Sat Dinner 3 240 27.18 2.00 Female Yes Sat Dinner 2 241 22.67 2.00 Male Yes Sat Dinner 2 242 17.82 1.75 Male No Sat Dinner 2 243 18.78 3.00 Female No Thur Dinner 2 244 rows × 7 columns
In [311]:tips.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 244 entries, 0 to 243 Data columns (total 7 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 total_bill 244 non-null float64 1 tip 244 non-null float64 2 sex 244 non-null category 3 smoker 244 non-null category 4 day 244 non-null category 5 time 244 non-null category 6 size 244 non-null int64 dtypes: category(4), float64(2), int64(1) memory usage: 7.4 KB
In [312]:#boxplot plt.figure(figsize = (8, 6)) sns.boxplot(x=tips["total_bill"]) plt.show()
In [313]:# boxplot of total_bill for each day
plt.figure(figsize=(8, 6))
sns.boxplot(x="day", y="total_bill", data=tips)
plt.show()
In [314]:# boxplot hue, palette option plt.figure(figsize = (8, 6)) sns.boxplot(x = "day", y = "total_bill", data = tips, hue ="smoker", palette = "Set3")
Out[314]:<AxesSubplot: xlabel='day', ylabel='total_bill'>
In [315]:# swarmplot
# color: a grayscale level given as a string between "0" (black) and "1" (white)
plt.figure(figsize=(8, 6))
sns.swarmplot(x="day", y="total_bill", data=tips, color="0.7")
Out[315]:<AxesSubplot: xlabel='day', ylabel='total_bill'>
In [316]:# boxplot with swarmplot plt.figure(figsize=(8, 6)) sns.boxplot(x="day", y = "total_bill", data=tips) sns.swarmplot(x="day", y="total_bill", data=tips, color = "0.25" ) plt.show()
In [317]:# lmplot: examine the relationship between total_bill and tip
sns.set_style("darkgrid")
sns.lmplot(x="total_bill", y="tip", data=tips, height=7, hue="smoker");
Example 3: flights data¶
- heatmap
In [318]:flights = sns.load_dataset("flights") flights.head()
Out[318]:year month passengers 0 1949 Jan 112 1 1949 Feb 118 2 1949 Mar 132 3 1949 Apr 129 4 1949 May 121 In [319]:flights.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 144 entries, 0 to 143 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 year 144 non-null int64 1 month 144 non-null category 2 passengers 144 non-null int64 dtypes: category(1), int64(2) memory usage: 2.9 KB
In [320]:# pivot # index, columns, values flights = flights.pivot(index="month", columns = "year", values = "passengers") flights.head()
Out[320]:
month/year 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
Jan 112 115 145 171 196 204 242 284 315 340 360 417
Feb 118 126 150 180 196 188 233 277 301 318 342 391
Mar 132 141 178 193 236 235 267 317 356 362 406 419
Apr 129 135 163 181 235 227 269 313 348 348 396 461
May 121 125 172 183 229 234 270 318 355 363 420 472
In [321]:# heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(data=flights, annot=True, fmt="d")  # annot: print the value in each cell / fmt: number format, "d" renders integers
plt.show()
In [322]:#colormap plt.figure(figsize = (10, 8)) sns.heatmap(flights, annot=True, fmt="d", cmap = "YlGnBu") plt.show()
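sns.heatmap expects a 2-D grid, which is why the long-format flights table is pivoted first. The reshaping step can be sketched on a tiny hand-made table; `long_df` and `wide_df` are illustrative names, and the four passenger counts are taken from the output above.

```python
import pandas as pd

# Long format: one row per (year, month) observation
long_df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [112, 118, 115, 126],
})

# Wide format: months become the row index, years become the columns
wide_df = long_df.pivot(index="month", columns="year", values="passengers")
print(wide_df)
```

Because "month" is a plain string column here, the rows come out in alphabetical order; in the seaborn dataset "month" is a categorical column, so calendar order is preserved.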
Example 4: iris data¶
- pairplot
In [323]:iris = sns.load_dataset("iris") iris.tail()
Out[323]:sepal_length sepal_width petal_length petal_width species 145 6.7 3.0 5.2 2.3 virginica 146 6.3 2.5 5.0 1.9 virginica 147 6.5 3.0 5.2 2.0 virginica 148 6.2 3.4 5.4 2.3 virginica 149 5.9 3.0 5.1 1.8 virginica In [324]:# pairplot sns.set_style("ticks") sns.pairplot(iris) plt.show()
In [325]:iris.head(2)
Out[325]:sepal_length sepal_width petal_length petal_width species 0 5.1 3.5 1.4 0.2 setosa 1 4.9 3.0 1.4 0.2 setosa In [326]:iris["species"].unique()
Out[326]:array(['setosa', 'versicolor', 'virginica'], dtype=object)
In [327]:#hue option sns.pairplot(iris, hue="species")
Out[327]:<seaborn.axisgrid.PairGrid at 0x18f5bc92790>
In [328]:# pairplot on selected columns only
sns.pairplot(iris,
             x_vars=["sepal_width", "sepal_length"],
             y_vars=["petal_width", "petal_length"])
plt.show()
Example 5: anscombe data¶
- lmplot
In [329]:anscombe = sns.load_dataset("anscombe") anscombe.head()
Out[329]:dataset x y 0 I 10.0 8.04 1 I 8.0 6.95 2 I 13.0 7.58 3 I 9.0 8.81 4 I 11.0 8.33 In [330]:anscombe["dataset"].unique()
Out[330]:array(['I', 'II', 'III', 'IV'], dtype=object)
In [331]:sns.set_style("darkgrid")
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'I'"), ci=None, height=7)  # ci: confidence-interval band; None draws no band
plt.show()
In [332]:sns.set_style("darkgrid")
sns.lmplot(
    x="x", y="y", data=anscombe.query("dataset == 'I'"),
    ci=None,  # no confidence-interval band
    height=7,
    scatter_kws={"s": 30})  # marker size for the scatter points
plt.show()
In [333]:# order option
sns.set_style("darkgrid")
sns.lmplot(
    x="x", y="y", data=anscombe.query("dataset == 'II'"),
    order=2,  # fit a second-order polynomial instead of a straight line
    ci=None,  # no confidence-interval band
    height=7,
    scatter_kws={"s": 30})
plt.show()
In [334]:!pip install statsmodels
Requirement already satisfied: statsmodels in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (0.13.5)
In [335]:# outlier
sns.set_style("darkgrid")
sns.lmplot(
    x="x", y="y", data=anscombe.query("dataset == 'III'"),
    robust=True,  # robust regression, less sensitive to the outlier (uses statsmodels)
    ci=None,  # no confidence-interval band
    height=7,
    scatter_kws={"s": 30})
plt.show()
9. Visualizing the Seoul crime data¶
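The order=2 option in the dataset II cell asks for a second-order polynomial fit. A minimal sketch of why that helps, using np.polyfit directly instead of seaborn. The x and y values below are the commonly published Anscombe II numbers typed in by hand; treating them as identical to the seaborn dataset is an assumption.

```python
import numpy as np

# Anscombe dataset II (assumed to match sns.load_dataset("anscombe"), dataset == 'II')
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

# Straight line (degree 1) versus parabola (degree 2)
coef1 = np.polyfit(x, y, deg=1)
coef2 = np.polyfit(x, y, deg=2)

resid1 = y - np.polyval(coef1, x)
resid2 = y - np.polyval(coef2, x)

print("max |residual|, degree 1:", np.abs(resid1).max())
print("max |residual|, degree 2:", np.abs(resid2).max())
```

The degree-2 residuals are far smaller than the degree-1 residuals, which matches what the order=2 plot shows: the points lie almost exactly on a parabola.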
In [336]:import matplotlib.pyplot as plt import seaborn as sns from matplotlib import rc plt.rcParams["axes.unicode_minus"] = False rc("font", family = "Malgun Gothic") %matplotlib inline
In [337]:crime_anal_norm.head()
Out[337]:
구별 살인 강도 강간 절도 폭력 강간검거율 강도검거율 살인검거율 절도검거율 폭력검거율 인구수 CCTV 범죄 검거
강남구 0.357143 1.000000 1.000000 0.977118 0.733773 80.038760 100.000000 100.000000 53.470867 88.130935 561052 3238 0.813607 84.328112
강동구 0.285714 0.358974 0.310078 0.477799 0.463880 95.000000 92.857143 100.000000 51.425314 86.996047 440359 1010 0.379289 85.255701
강북구 0.500000 0.128205 0.420543 0.332879 0.509351 73.271889 80.000000 85.714286 54.991817 89.344852 328002 831 0.378196 76.664569
관악구 0.428571 0.307692 0.624031 0.572868 0.593143 81.987578 83.333333 100.000000 44.555397 83.678516 520929 2109 0.505261 78.710965
광진구 0.285714 0.282051 0.540698 0.718060 0.438577 83.870968 54.545455 100.000000 40.098634 84.071906 372298 878 0.453020 72.517393
In [1]:# pairplot: check the correlations among 강도 (robbery), 살인 (murder), and 폭력 (violence)
sns.pairplot(data=crime_anal_norm, vars=["살인", "강도", "폭력"], kind="reg", height=3);  # kind: scatter, kde, hist, reg
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[1], line 3
1 # pairplot: check the correlations among 강도, 살인, and 폭력
----> 3 sns.pairplot(data=crime_anal_norm, vars=["살인", "강도", "폭력"], kind ="scatter", height = 3)
NameError: name 'sns' is not defined
- The NameError appears because this cell was run in a fresh kernel (note the In [1] prompt) before the import cell; run the imports first and the plot is drawn.
In [339]:crime_anal_norm.head(1)
Out[339]:
구별 살인 강도 강간 절도 폭력 강간검거율 강도검거율 살인검거율 절도검거율 폭력검거율 인구수 CCTV 범죄 검거
강남구 0.357143 1.0 1.0 0.977118 0.733773 80.03876 100.0 100.0 53.470867 88.130935 561052 3238 0.813607 84.328112
In [340]:# Check how "인구수" (population) and "CCTV" relate to "살인" (murder) and "강도" (robbery)
def drawGraph():
    sns.pairplot(
        data=crime_anal_norm,
        x_vars=["인구수", "CCTV"],
        y_vars=["살인", "강도"],
        kind="reg",
        height=4)
    plt.show()

drawGraph()
In [341]:# Check how "인구수" and "CCTV" relate to "살인검거율" (murder arrest rate) and "폭력검거율" (violence arrest rate)
def drawGraph():
    sns.pairplot(
        data=crime_anal_norm,
        x_vars=["인구수", "CCTV"],
        y_vars=["살인검거율", "폭력검거율"],
        kind="reg",
        height=4)
    plt.show()

drawGraph()
In [342]:# Check how "인구수" and "CCTV" relate to "절도검거율" (theft arrest rate) and "강도검거율" (robbery arrest rate)
def drawGraph():
    sns.pairplot(
        data=crime_anal_norm,
        x_vars=["인구수", "CCTV"],
        y_vars=["절도검거율", "강도검거율"],
        kind="reg",
        height=4)
    plt.show()

drawGraph()
In [343]:crime_anal_norm.head(3)
Out[343]:
구별 살인 강도 강간 절도 폭력 강간검거율 강도검거율 살인검거율 절도검거율 폭력검거율 인구수 CCTV 범죄 검거
강남구 0.357143 1.000000 1.000000 0.977118 0.733773 80.038760 100.000000 100.000000 53.470867 88.130935 561052 3238 0.813607 84.328112
강동구 0.285714 0.358974 0.310078 0.477799 0.463880 95.000000 92.857143 100.000000 51.425314 86.996047 440359 1010 0.379289 85.255701
강북구 0.500000 0.128205 0.420543 0.332879 0.509351 73.271889 80.000000 85.714286 54.991817 89.344852 328002 831 0.378196 76.664569
In [344]:# Arrest-rate heatmap, sorted by the "검거" column
def drawGraph():
    # Build the DataFrame to plot
    target_col = ["강간검거율", "강도검거율", "살인검거율", "절도검거율", "폭력검거율", "검거"]
    crime_anal_norm_sort = crime_anal_norm.sort_values(by="검거", ascending=False)  # descending order

    # Figure settings
    plt.figure(figsize=(10, 10))
    sns.heatmap(
        data=crime_anal_norm_sort[target_col],
        annot=True,  # print the value in each cell
        fmt="f",
        linewidths=1,  # width of the lines between cells
        cmap="RdPu"
    )
    plt.title("Arrest rates (sorted by the mean arrest rate)")
    plt.show()

drawGraph()
In [345]:# Crime-count heatmap, sorted by the "범죄" column
def drawGraph():
    # Build the DataFrame to plot
    target_col = ["살인", "강도", "강간", "절도", "폭력", "범죄"]
    crime_anal_norm_sort = crime_anal_norm.sort_values(by="범죄", ascending=False)

    # Figure settings
    plt.figure(figsize=(10, 10))
    sns.heatmap(
        data=crime_anal_norm_sort[target_col],
        annot=True,
        fmt="f",
        linewidths=0.5,
        cmap="RdPu",
    )
    plt.title("Crime levels (sorted by the normalized crime count)")
    plt.show()

drawGraph()
In [346]:# Save the data
crime_anal_norm.to_csv("../data/02. crime_in_Seoul_final.csv", sep=",", encoding="utf-8")
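to_csv writes the index (here the district names under "구별") as the first column of the file, so reading the file back needs index_col=0 to restore it. A small sketch of that round trip, using an in-memory buffer instead of the real file path; the two rows are copied from the "범죄" column above and the names `frame`, `buffer`, and `restored` are illustrative.

```python
import io

import pandas as pd

# A two-row stand-in for crime_anal_norm, indexed by district
frame = pd.DataFrame(
    {"범죄": [0.813607, 0.379289]},
    index=pd.Index(["강남구", "강동구"], name="구별"),
)

# Write to an in-memory text buffer (stands in for the CSV file on disk)
buffer = io.StringIO()
frame.to_csv(buffer, sep=",")
buffer.seek(0)

# index_col=0 turns the first column back into the index
restored = pd.read_csv(buffer, index_col=0)
print(restored)
```

Without index_col=0 the district names would come back as an ordinary column and a fresh 0..n-1 index would be created.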
folium¶
In [347]:!pip install folium
Requirement already satisfied: folium in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (0.14.0)
In [348]:import folium import pandas as pd import json
folium.Map()¶
location: tuple or list, default None. Latitude and longitude of the map (northing, easting).
In [349]:m = folium.Map(location=[37.544564958079896, 127.05582307754338], zoom_start=14)  # zoom_start: 0-18
m
Out[349]:Make this Notebook Trusted to load map: File -> Trust Notebook
save(path)¶
In [350]:m.save("./folium.html")
tiles option¶
- "OpenStreetMap"
- "Mapbox Bright" (Limited levels of zoom for free tiles)
- "Mapbox Control Room" (Limited levels of zoom for free tiles)
- "Stamen" (Terrain, Toner, and Watercolor)
- "Cloudmade" (Must pass API key)
- "Mapbox" (Must pass API key)
- "CartoDB" (positron and dark_matter)
In [351]:m = folium.Map(
    location=[37.544564958079896, 127.05582307754338],
    zoom_start=14,
    tiles="OpenStreetMap")
m
Out[351]:Make this Notebook Trusted to load map: File -> Trust Notebook
folium.Marker()¶
- Add markers to the map
In [352]:m = folium.Map(
    location=[37.544564958079896, 127.05582307754338],  # Seongsu Station
    zoom_start=14,
    tiles="OpenStreetMap")

# Ttukseom Station
folium.Marker((37.54712311308356, 127.04721916917774)).add_to(m)

# Seongsu Station
folium.Marker([37.544564958079896, 127.05582307754338],
              popup="<b>Subway</b>",
              tooltip="<i>성수역</i>"
              ).add_to(m)

# Zerobase
folium.Marker([37.54558642069953, 127.05729705810472],
              popup="<a href='https://zero-base.co.kr/' target='_blank'>제로베이스</a>",
              tooltip="<i>Zerobase</i>"
              ).add_to(m)
m
Out[352]:Make this Notebook Trusted to load map: File -> Trust Notebook
In [353]:m = folium.Map(
    location=[37.544564958079896, 127.05582307754338],  # Seongsu Station
    zoom_start=14,
    tiles="OpenStreetMap")

# Basic icon
folium.Marker((37.54712311308356, 127.04721916917774),
              icon=folium.Icon(color="black", icon="info-sign")
              ).add_to(m)

# Seongsu Station
folium.Marker([37.544564958079896, 127.05582307754338],
              popup="<b>Subway</b>",
              tooltip="<i>성수역</i>",
              icon=folium.Icon(
                  color="red",
                  icon_color="blue",
                  icon="cloud")
              ).add_to(m)

# Zerobase
folium.Marker([37.54558642069953, 127.05729705810472],
              popup="<a href='https://zero-base.co.kr/' target='_blank'>제로베이스</a>",
              tooltip="<i>Zerobase</i>"
              ).add_to(m)

# Custom icon
folium.Marker(
    location=[37.54035903907497, 127.06913328776446],
    popup="<i>건대입구역</i>",
    tooltip="Icon custom",
    icon=folium.Icon(color="purple",
                     icon_color="green",
                     icon="fa-brands fa-instagram",
                     angle=50,
                     prefix="fa"  # fa, glyphicon
                     )).add_to(m)
m
Out[353]:Make this Notebook Trusted to load map: File -> Trust Notebook
folium.ClickForMarker()¶
- Add a marker wherever the map is clicked
In [354]:m = folium.Map(
    location=[37.544564958079896, 127.05582307754338],  # Seongsu Station
    zoom_start=14,
    tiles="OpenStreetMap")

m.add_child(folium.ClickForMarker(popup="ClickForMarker"))
- The keyword argument is popup. The original call passed pop="ClickForMarker" and failed with TypeError: __init__() got an unexpected keyword argument 'pop'.
folium.LatLngPopup()¶
- Show the latitude and longitude of the clicked point in a popup
In [ ]:m = folium.Map(
    location=[37.5301, 127.0403],
    zoom_start=14,
    tiles="OpenStreetMap")

m.add_child(folium.LatLngPopup())
folium.Circle(), folium.CircleMarker()¶
In [ ]:m = folium.Map(
    location=[37.544564958079896, 127.05582307754338],  # Seongsu Station
    zoom_start=14,
    tiles="OpenStreetMap")

# Circle: radius is in meters
folium.Circle([37.5366, 127.0099],
              radius=100,
              fill=False,
              color="#eb9e34",
              fill_color="red",
              popup="Circle popup",
              tooltip="Circle Tooltip").add_to(m)

# CircleMarker: radius is in pixels
folium.CircleMarker([37.5444, 127.0404],
                    radius=100,
                    fill=False,
                    color="#34ebc6",
                    fill_color="blue",
                    popup="Circle popup",
                    tooltip="Circle Tooltip").add_to(m)
m
folium.Choropleth()¶
In [ ]:import json
In [ ]:state_data = pd.read_csv("../data/02. US_Unemployment_Oct2012.csv") state_data.tail(2)
In [ ]:m = folium.Map([43, -102], zoom_start=3)

folium.Choropleth(
    geo_data="../data/02. us-states.json",  # GeoJSON with the state boundary coordinates
    data=state_data,  # Series or DataFrame
    columns=["State", "Unemployment"],
    key_on="feature.id",
    fill_color="BuPu",
    fill_opacity=1,  # 0-1
    line_opacity=1,
    legend_name="Unemployment rate (%)"
).add_to(m)
m
Mapping apartment types¶
In [ ]:import pandas as pd
In [ ]:df = pd.read_csv("../data/서울특별시 동작구_주택유형별 위치 정보 및 세대수 현황_20220818.csv", encoding = "cp949") df.info()
In [ ]:# Drop rows that contain NaN
df = df.dropna()
df.info()
In [ ]:df = df.reset_index(drop=False) df
In [ ]:df = df.rename(columns = {"연번 ":"연번","분류 ":"분류"}) del df["연번"] del df["index"] df
In [ ]:row.위도
In [ ]:df.describe()
In [ ]:# folium
m = folium.Map(location=[37.50589466533131, 126.93450729567374], zoom_start=13)

for idx, row in df.iterrows():
    # Location
    lat, lng = row.위도, row.경도

    # Marker
    folium.Marker(
        location=[lat, lng],
        popup=row.주소,
        tooltip=row.분류,
        icon=folium.Icon(
            icon="home",
            color="lightred" if row.세대수 >= 199 else "lightblue",
            icon_color="darkred" if row.세대수 >= 199 else "darkblue"
        )
    ).add_to(m)

    # Circle
    folium.Circle(
        location=[lat, lng],
        radius=row.세대수 * 0.2,  # circle size proportional to the number of households
        fill=True,
        color="black" if row.세대수 > 518 else "green",
        fill_color="black" if row.세대수 > 518 else "green",
        opacity=1
    ).add_to(m)
m
10. Mapping the Seoul crime data¶
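The loop above repeats the same threshold tests several times. One way to keep the styling rules in one place is a small helper; `marker_style` and its parameter names are hypothetical, and the cutoffs 199 and 518 are simply the values used in the loop above.

```python
def marker_style(households, marker_cutoff=199, circle_cutoff=518):
    """Return the colors and circle radius for one row (hypothetical helper).

    marker_cutoff and circle_cutoff default to the thresholds used in the loop.
    """
    large_marker = households >= marker_cutoff
    large_circle = households > circle_cutoff
    return {
        "marker_color": "lightred" if large_marker else "lightblue",
        "icon_color": "darkred" if large_marker else "darkblue",
        "circle_color": "black" if large_circle else "green",
        "radius": households * 0.2,  # radius proportional to the household count
    }


# Example: styles for a small, a medium, and a large complex
for n in (120, 300, 900):
    print(n, marker_style(n))
```

Inside the loop, `style = marker_style(row.세대수)` could then feed both the folium.Icon and folium.Circle calls.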
In [357]:import json import folium import pandas as pd
In [358]:crime_anal_norm = pd.read_csv("../data/02. crime_in_Seoul_final.csv", index_col=0, encoding="utf-8")  # index_col: use the given column as the index
geo_path = "../data/02. skorea_municipalities_geo_simple.json"
geo_str = json.load(open(geo_path, encoding="utf-8"))
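A choropleth colors a region only when the key found through key_on (here "feature.id") matches a key in the data. The check can be sketched with a tiny hand-made GeoJSON-like dict; assuming that the real file uses district names such as "강남구" as feature ids is a guess based on the data index, not something read from the file.

```python
# A hand-made stand-in for the loaded GeoJSON (structure only, no real geometry)
geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "강남구", "properties": {}, "geometry": None},
        {"type": "Feature", "id": "강동구", "properties": {}, "geometry": None},
    ],
}

# District names as they would appear in the DataFrame index
data_index = ["강남구", "강동구", "강북구"]

feature_ids = {feature["id"] for feature in geo["features"]}

# Districts in the data with no matching shape, and shapes with no data
missing_in_geo = sorted(set(data_index) - feature_ids)
missing_in_data = sorted(feature_ids - set(data_index))

print("in data but not in GeoJSON:", missing_in_geo)
print("in GeoJSON but not in data:", missing_in_data)
```

Running the same comparison against `geo_str["features"]` and `crime_anal_norm.index` is a quick way to spot districts that would stay uncolored.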
In [359]:crime_anal_norm.tail(2)
Out[359]:
구별 살인 강도 강간 절도 폭력 강간검거율 강도검거율 살인검거율 절도검거율 폭력검거율 인구수 CCTV 범죄 검거
중구 0.214286 0.205128 0.383721 0.585671 0.407957 74.747475 87.5 100.0 42.511628 89.707865 134593 1023 0.359353 78.893394
중랑구 0.571429 0.358974 0.317829 0.460637 0.580125 91.463415 100.0 87.5 62.211709 85.714286 412780 916 0.457799 85.377882
In [370]:# Map of the murder counts
my_map = folium.Map(
    location=[37.552, 126.982],
    zoom_start=11,
    tiles="Stamen Toner"
)

folium.Choropleth(
    geo_data=geo_str,  # GeoJSON with the district boundary coordinates
    data=crime_anal_norm["살인"],
    columns=[crime_anal_norm.index, crime_anal_norm["살인"]],
    key_on="feature.id",
    fill_color="PuRd",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="Normalized murder count"
).add_to(my_map)
my_map
Out[370]:Make this Notebook Trusted to load map: File -> Trust Notebook
In [369]:# Map of the five major crimes combined
my_map = folium.Map(
    location=[37.552, 126.982],
    zoom_start=11,
    tiles="Stamen Toner"
)

folium.Choropleth(
    geo_data=geo_str,  # GeoJSON with the district boundary coordinates
    data=crime_anal_norm["범죄"],
    columns=[crime_anal_norm.index, crime_anal_norm["범죄"]],
    key_on="feature.id",
    fill_color="PuRd",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="Normalized count of the five major crimes"
).add_to(my_map)
my_map
Out[369]:Make this Notebook Trusted to load map: File -> Trust Notebook
In [383]:# Crime count relative to population
tmp_criminal = crime_anal_norm["범죄"] / crime_anal_norm["인구수"]

my_map = folium.Map(
    location=[37.552, 126.982],
    zoom_start=11,
    tiles="Stamen Toner"
)

folium.Choropleth(
    geo_data=geo_str,  # GeoJSON with the district boundary coordinates
    data=tmp_criminal,
    columns=[crime_anal_norm.index, tmp_criminal],
    key_on="feature.id",
    fill_color="PuRd",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="Crime count relative to population",
).add_to(my_map)
my_map
Out[383]:Make this Notebook Trusted to load map: File -> Trust Notebook
In [375]:# Combine the per-police-station information with the crime counts
crime_anal_station = pd.read_csv(
    "../data/02. crime_in_Seoul_raw.csv", encoding="utf-8"
)
crime_anal_station.head()
Out[375]:
구분 강간검거 강간발생 강도검거 강도발생 살인검거 살인발생 절도검거 절도발생 폭력검거 폭력발생 구별 lat lng
0 강남 269.0 339.0 26.0 24.0 3.0 3.0 1129.0 2438.0 2096.0 2336.0 강남구 37.509435 127.066958
1 강동 152.0 160.0 13.0 14.0 5.0 4.0 902.0 1754.0 2201.0 2530.0 강동구 37.528511 127.126822
2 강북 159.0 217.0 4.0 5.0 6.0 7.0 672.0 1222.0 2482.0 2778.0 강북구 37.637197 127.027305
3 강서 239.0 275.0 10.0 10.0 10.0 9.0 1070.0 1952.0 2768.0 3204.0 양천구 37.539783 126.829997
4 관악 264.0 322.0 10.0 12.0 7.0 6.0 937.0 2103.0 2707.0 3235.0 관악구 37.474395 126.951349
In [378]:col = ["살인검거", "강도검거", "강간검거", "절도검거", "폭력검거"]
tmp = crime_anal_station[col] / crime_anal_station[col].max()  # scale each column to 0-1 by its maximum
crime_anal_station["검거"] = np.mean(tmp, axis=1)  # axis=1 averages across the columns, one value per row (same meaning in numpy and pandas)
crime_anal_station.tail(2)
Out[378]:
구분 강간검거 강간발생 강도검거 강도발생 살인검거 살인발생 절도검거 절도발생 폭력검거 폭력발생 구별 lat lng 검거
29 중부 96.0 141.0 3.0 3.0 2.0 2.0 485.0 1204.0 1164.0 1335.0 중구 37.563617 126.989652 0.277182
30 혜화 64.0 101.0 6.0 6.0 2.0 2.0 379.0 988.0 842.0 972.0 종로구 37.571968 126.998957 0.240065
In [388]:# Mark the police station locations
my_map = folium.Map(
    location=[37.5502, 126.982],
    zoom_start=11
)

folium.Choropleth(
    geo_data=geo_str,
    data=crime_anal_norm["범죄"],
    columns=[crime_anal_norm.index, crime_anal_norm["범죄"]],
    key_on="feature.id",
    fill_color="PuRd",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="Normalized count of the five major crimes"
).add_to(my_map)

for idx, rows in crime_anal_station.iterrows():
    folium.CircleMarker(
        location=[rows["lat"], rows["lng"]],
        radius=rows["검거"] * 50,  # marker size proportional to the normalized arrest score
        popup=rows["구분"] + ":" + "%.2f" % rows["검거"],
        color="#3186cc",
        fill=True,
        fill_color="#3186cc"
    ).add_to(my_map)
my_map
Out[388]:Make this Notebook Trusted to load map: File -> Trust Notebook
11. Analyzing where crimes occur in Seoul¶
In [393]:# Additional check
crime_loc_row = pd.read_csv(
    "../data/02. crime_in_Seoul_location.csv", thousands=",", encoding="euc-kr")
crime_loc_row.tail(2)
Out[393]:범죄명 장소 발생건수 63 폭력 금융기관 42 64 폭력 기타 26382 In [394]:crime_loc_row.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 65 entries, 0 to 64 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 범죄명 65 non-null object 1 장소 65 non-null object 2 발생건수 65 non-null int64 dtypes: int64(1), object(2) memory usage: 1.6+ KB
In [395]:crime_loc_row["범죄명"].unique()
Out[395]:array(['살인', '강도', '강간.추행', '절도', '폭력'], dtype=object)
In [396]:crime_loc_row["장소"].unique()
Out[396]:array(['아파트, 연립 다세대', '단독주택', '노상', '상점', '숙박업소, 목욕탕', '유흥 접객업소', '사무실', '역, 대합실', '교통수단', '유원지 ', '학교', '금융기관', '기타'], dtype=object)
In [402]:crime_loc = pd.pivot_table(
    crime_loc_row,
    index="장소",
    columns="범죄명",
    aggfunc=[np.sum])
crime_loc.columns = crime_loc.columns.droplevel([0, 1])  # keep only the crime-name level
crime_loc
Out[402]:
장소/범죄명 강간.추행 강도 살인 절도 폭력
교통수단 691 0 0 457 222
금융기관 2 1 1 1081 42
기타 2128 67 65 21734 26382
노상 986 87 22 9329 24535
단독주택 395 15 30 2241 3579
사무실 132 8 1 682 1229
상점 95 34 1 4403 852
숙박업소, 목욕탕 389 9 4 828 303
아파트, 연립 다세대 284 18 12 1504 2839
역, 대합실 181 0 0 356 272
유원지 59 2 2 367 424
유흥 접객업소 398 13 8 2035 2645
학교 33 0 0 400 203
In [406]:crime_loc_norm = crime_loc / crime_loc.max()  # scale each column to 0-1 by its maximum
crime_loc_norm.head()
Out[406]:범죄명 강간.추행 강도 살인 절도 폭력 장소 교통수단 0.324718 0.000000 0.000000 0.021027 0.008415 금융기관 0.000940 0.011494 0.015385 0.049738 0.001592 기타 1.000000 0.770115 1.000000 1.000000 1.000000 노상 0.463346 1.000000 0.338462 0.429235 0.929990 단독주택 0.185620 0.172414 0.461538 0.103110 0.135661 In [407]:crime_loc_norm["종합"] = np.mean(crime_loc_norm, axis = 1) crime_loc_norm.tail(2)
Out[407]:
장소/범죄명 강간.추행 강도 살인 절도 폭력 종합
유흥 접객업소 0.187030 0.149425 0.123077 0.093632 0.100258 0.130684
학교 0.015508 0.000000 0.000000 0.018404 0.007695 0.008321
In [410]:crime_loc_norm_sort = crime_loc_norm.sort_values("종합", ascending=False)

def drawGraph():
    plt.figure(figsize=(10, 10))
    sns.heatmap(
        crime_loc_norm_sort,
        annot=True,
        fmt="f",
        linewidths=0.5,
        cmap="RdPu"
    )
    plt.title("Where crimes occur")
    plt.show()

drawGraph()
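Dividing each column by its own maximum puts every crime type on a 0 to 1 scale, so a row mean is not dominated by the most frequent crime. A minimal sketch on four rows and two columns copied from the pivot table above (the full table has 13 places and 5 crime types); `counts` and `norm` are illustrative names.

```python
import pandas as pd

# Four places and two crime types, values copied from the pivot table above
counts = pd.DataFrame(
    {"살인": [0, 1, 65, 22], "강도": [0, 1, 67, 87]},
    index=["교통수단", "금융기관", "기타", "노상"],
)

# Max normalization: the largest value in each column becomes 1.0
norm = counts / counts.max()

# Row-wise mean of the normalized columns as an overall score
norm["종합"] = norm.mean(axis=1)
print(norm)
```

The "종합" score here differs from the notebook's because only two of the five crime columns are included; the per-column values (for example 1/87 for robbery at 금융기관) match the normalized table above.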